Re: Misaligned BufferDescriptors causing major performance problems on AMD

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Peter Geoghegan <pg(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Misaligned BufferDescriptors causing major performance problems on AMD
Date: 2015-01-01 18:58:02
Message-ID: 20150101185802.GA13930@momjian.us
Lists: pgsql-hackers

On Thu, Jan 1, 2015 at 05:59:25PM +0100, Andres Freund wrote:
> > That seems like a strange approach. I think it's pretty sensible to
> > try to ensure that allocated blocks of shared memory have decent
> > alignment, and we don't have enough of them for aligning on 64-byte
> > boundaries (or even 128-byte boundaries, perhaps) to eat up any
> > meaningful amount of memory. The BUFFERALIGN() stuff, like much else
> > about the way we manage shared memory, has also made its way into the
> > dynamic-shared-memory code. So if we do adjust the alignment that we
> > guarantee for the main shared memory segment, we should perhaps adjust
> > DSM to match. But I guess I don't understand why you'd want to do it
> > that way.
>
> The problem is that just aligning the main allocation to some boundary
> doesn't mean the hot part of the allocation is properly aligned. shmem.c
> in fact can't really do much about that - so fully moving the
> responsibility seems more likely to ensure that future code thinks about
> alignment.

Yes, there are two separate issues here: alignment of the shared memory
allocations and alignment of the objects within them. Since there are
only about 50 cases of these, a worst-case change to force 64-byte
alignment on all of them would only cost about 3.2kB (50 x 64 bytes) of
shared memory.
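
To make the arithmetic concrete, here is a minimal sketch of rounding an address up to a 64-byte boundary and of the worst-case padding estimate. The names CACHE_LINE_SIZE, align_up() and worst_case_padding() are hypothetical and chosen for illustration; they are not taken from the PostgreSQL tree or from the patches, and the 64-byte line size is an assumption based on the figure used in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size; hypothetical name, not a PostgreSQL symbol. */
#define CACHE_LINE_SIZE 64

/*
 * Round addr up to the next multiple of alignment.
 * alignment must be a nonzero power of two.
 */
static uintptr_t
align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + (alignment - 1)) & ~(alignment - 1);
}

/*
 * Simple upper bound on the padding added when each of nobjects
 * allocations is padded out to the given alignment. This counts one full
 * alignment unit per allocation; the true maximum is alignment - 1 bytes
 * each, so the real cost is slightly lower.
 */
static size_t
worst_case_padding(size_t nobjects, size_t alignment)
{
    return nobjects * alignment;
}
```

With these definitions, align_up(100, CACHE_LINE_SIZE) yields 128, and worst_case_padding(50, CACHE_LINE_SIZE) yields 3200 bytes, which is the 3.2k estimate above.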

It might make sense to make them all 64-byte aligned to reduce CPU cache
contention, but we need actual performance numbers to justify that. My
two patches allow individual object alignment to be tested. I have not
been able to measure any performance difference (anything I saw was
under 1%) with:

$ pgbench --initialize --scale 100 pgbench
$ pgbench --protocol prepared --client 32 --jobs 16 --time=100 --select-only pgbench

on my dual-socket, 16-vcore server.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +
