Re: Postgres vs. intel ccNUMA on Linux

Lists: pgsql-hackers
From: James Robinson <jlrobins(at)socialserve(dot)com>
To: Hackers Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Postgres vs. intel ccNUMA on Linux
Date: 2010-09-29 18:45:20
Message-ID: BF587806-E961-4AB8-9ED9-164F3BA55E75@socialserve.com

Hackers,

Any tips / conventional wisdom regarding running postgres on large-
ish memory ccNUMA intel machines, such as a 32G dual-quad-core,
showing two NUMA nodes of 16G each? I expect each postgres backend's
non-shared memory usage to remain nice and reasonably sized, hopefully
staying within the confines of its processor's local memory region,
but how will accesses to shared memory and / or buffer cache play out?
Do people tune their backends via 'numactl' ?

Furthermore, if one had more than one database being served by the
machine, would it be advisable to do this via multiple clusters
instead of a single cluster, tweaking the processor affinity of each
postmaster accordingly, trying to ensure each cluster's shared memory
segments and buffer cache pools remain local for the resulting backends?
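To make the question concrete, the kind of `numactl` tuning I have in mind would look something like this (a hypothetical sketch; the node numbers and data directory path are illustrative only):

```shell
# Option 1 (hypothetical): interleave the postmaster's allocations,
# including shared memory, across both NUMA nodes so no single node
# pays the whole remote-access penalty
numactl --interleave=all pg_ctl -D /var/lib/pgsql/data start

# Option 2 (hypothetical): bind one postmaster entirely to node 0,
# so its CPUs and its memory (shared segments included) stay local
numactl --cpunodebind=0 --membind=0 pg_ctl -D /var/lib/pgsql/data start
```

With the second form, a two-cluster setup would bind one postmaster per node.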

Thanks!
----
James Robinson
Socialserve.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: James Robinson <jlrobins(at)socialserve(dot)com>
Cc: Hackers Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres vs. intel ccNUMA on Linux
Date: 2010-09-30 20:49:47
Message-ID: AANLkTinrD+LcK1VZ7w2vO4wFEsYuwt0HT_NvNHoOm00+@mail.gmail.com

On Wed, Sep 29, 2010 at 2:45 PM, James Robinson
<jlrobins(at)socialserve(dot)com> wrote:
> Hackers,
>
>        Any tips / conventional wisdom regarding running postgres on
> large-ish memory ccNUMA intel machines, such as a 32G dual-quad-core,
> showing two NUMA nodes of 16G each? I expect each postgres backend's
> non-shared memory usage to remain nice and reasonably sized, hopefully
> staying within the confines of its processor's local memory region, but how
> will accesses to shared memory and / or buffer cache play out? Do people
> tune their backends via 'numactl' ?
>
>        Furthermore, if one had more than one database being served by the
> machine, would it be advisable to do this via multiple clusters instead of a
> single cluster, tweaking the processor affinity of each postmaster
> accordingly, trying to ensure each cluster's shared memory segments and
> buffer cache pools remain local for the resulting backends?

I was hoping someone more knowledgeable about this topic would reply, but...

Generally, I don't recommend running more than one postmaster on one
machine, because one big pool for shared_buffers is generally going to
be more efficient than two smaller pools. However, as you say, the
shared memory region might be a problem, particularly for things like
the ProcArray, which are pretty "hot" and are accessed by every
backend during every transaction. But I'm not sure whether the
additional overhead is going to be more or less than the overhead of
splitting the shared_buffers arena in half, so I suspect you're going
to have to benchmark it to find out.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: James Robinson <jlrobins(at)socialserve(dot)com>
Cc: Hackers Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres vs. intel ccNUMA on Linux
Date: 2010-10-01 01:32:38
Message-ID: 4CA53A36.2060305@2ndquadrant.com

James Robinson wrote:
> Any tips / conventional wisdom regarding running postgres on
> large-ish memory ccNUMA intel machines, such as a 32G dual-quad-core,
> showing two NUMA nodes of 16G each? I expect each postgres backend's
> non-shared memory usage to remain nice and reasonably sized, hopefully
> staying within the confines of its processor's local memory region,
> but how will accesses to shared memory and / or buffer cache play out?
> Do people tune their backends via 'numactl' ?

My gut feel here is that the odds this particular area will turn into
your bottleneck are so slim that worrying about it in advance is
premature optimization. If you somehow end up in the unexpected
situation where processor time that might be altered via such fine
control is your bottleneck, as opposed to disks, buffer cache
contention, the ProcArray contention Robert mentioned, WAL contention,
or something else like that--all things you can't segment usefully
here--well maybe at that point I'd start chasing after numactl. As for
how likely that is, all I can say is I've never gotten there before
finding a much more obvious bottleneck first.

However, I recently wrote a little utility to test memory speeds as
increasing numbers of clients do things on a system, and it may provide
you some insight into how your system responds as different numbers of
them do things: http://github.com/gregs1104/stream-scaling
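The kind of pinned-versus-unpinned comparison I mean looks roughly like this (illustrative commands only; the core numbers depend on your topology, which `numactl --hardware` will show):

```shell
# Unpinned run: the scheduler is free to migrate the test's threads
# between sockets, which is where the fluctuating results come from
./stream

# Pinned run (core range is illustrative): lock the test to the four
# cores of one socket so its memory accesses stay on the local node
taskset -c 0-3 ./stream
```

If the pinned numbers are markedly steadier or faster, your workload is feeling the cross-node memory penalty.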

I've gotten results submitted to me where you can see memory speeds
fluctuate on servers where threads bounce between processors and their
associated memory, stuff that goes away if you then lock the test
program to specific cores. If you want to discuss results from trying
that on your system and how that might impact real-world server
behavior, I'd recommend posting about that to the pgsql-performance list
rather than this one. pgsql-hackers is more focused on code-level
issues with PostgreSQL. There really aren't any of those in the area
you're asking about, as the database is blind to what the OS is doing
underneath of it here.

> Furthermore, if one had more than one database being served by the
> machine, would it be advisable to do this via multiple clusters
> instead of a single cluster, tweaking the processor affinity of each
> postmaster accordingly, trying to ensure each cluster's shared memory
> segments and buffer cache pools remain local for the resulting backends?

If you have a database that basically fits in memory, that might
actually work. Note however that the typical useful tuning for
PostgreSQL puts more cache into the operating system side of things than
what's dedicated to the database, and that may end up mixed across
"banks" as it were. I'd still place my money on running into another
limitation first, but the idea is much more sound. What I would try
doing here is running the SELECT-only version of pgbench against both
clusters at once, and see if you really can get more total oomph out of
the system than a single cluster of twice the size. The minute disks
start entering the picture though, you're likely to end up back to where
processor/memory affinity is the least of your concerns.
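A sketch of that benchmark, assuming each postmaster was already started bound to its own node and the two clusters listen on the ports shown (ports, client counts, and database names are all illustrative):

```shell
# Run the SELECT-only pgbench workload against both node-bound
# clusters simultaneously; -S is select-only, -c clients, -j worker
# threads, -T duration in seconds
pgbench -S -c 8 -j 4 -T 300 -p 5432 db0 &
pgbench -S -c 8 -j 4 -T 300 -p 5433 db1 &
wait
```

Then compare the summed transactions per second against a single cluster configured with twice the shared_buffers, driven by one 16-client run.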

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book