Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine

Lists: pgsql-performance
From: Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-15 12:35:29
Message-ID: 20080215123528.GA16532@hyperion.scode.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hello,

my impression has been that in the past, there has been a general
semi-consensus that upping shared_buffers to use the majority of RAM
has not generally been recommended, with reliance on the buffer cache
instead being the recommendation.

Given the changes that have gone into 8.3, in particular with regards
to minimizing the impact of large sequential scans, would it be
correct to say that given that

- enough memory is left for other PG bits (sort mems and whatnot else)
- only PG is running on the machine
- you're on 64 bit so do not run into address space issues
- the database working set is larger than RAM

it would be generally advisable to pump up shared_buffers pretty much
as far as possible instead of relying on the buffer cache?

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>'
Key retrieval: Send an E-Mail to getpgpkey(at)scode(dot)org
E-Mail: peter(dot)schuller(at)infidyne(dot)com Web: http://www.scode.org


From: Kenneth Marshall <ktm(at)rice(dot)edu>
To: Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-15 13:37:34
Message-ID: 20080215133734.GU12156@it.is.rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, Feb 15, 2008 at 01:35:29PM +0100, Peter Schuller wrote:
> Hello,
>
> my impression has been that in the past, there has been a general
> semi-consensus that upping shared_buffers to use the majority of RAM
> has not generally been recommended, with reliance on the buffer cache
> instead being the recommendation.
>
> Given the changes that have gone into 8.3, in particular with regards
> to minimizing the impact of large sequential scans, would it be
> correct to say that given that
>
> - enough memory is left for other PG bits (sort mems and whatnot else)
> - only PG is running on the machine
> - you're on 64 bit so do not run into address space issues
> - the database working set is larger than RAM
>
> it would be generally advisable to pump up shared_buffers pretty much
> as far as possible instead of relying on the buffer cache?
>
> --
> / Peter Schuller
>
> PGP userID: 0xE9758B7D or 'Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>'
> Key retrieval: Send an E-Mail to getpgpkey(at)scode(dot)org
> E-Mail: peter(dot)schuller(at)infidyne(dot)com Web: http://www.scode.org
>
Peter,

PostgreSQL still depends on the OS for file access and caching. I
think that the current recommendation is to have up to 25% of your
RAM in the shared buffer cache.

Ken


From: Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>
To: Kenneth Marshall <ktm(at)rice(dot)edu>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-15 13:58:46
Message-ID: 20080215135846.GA17327@hyperion.scode.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> PostgreSQL still depends on the OS for file access and caching. I
> think that the current recommendation is to have up to 25% of your
> RAM in the shared buffer cache.

This feels strange. Given a reasonable amount of RAM (let's say 8 GB
in this case), I cannot imagine why 75% of that would be efficiently
used for anything but the buffer cache (ignoring work_mem, stacks,
etc). Obviously the OS will need memory to do it's usual stuff
(buffering to do efficient I/O, and so on). But the need for that
should not increase with the amount of RAM in the machine, all else
being equal.

What type of file I/O, other than reading pages of PostgreSQL data
which are eligable for the PostgreSQL buffer cache, does PostgreSQL do
that would take advantage of the operating system caching so much
data?

(Assuming the database is not extreme to the point of file system meta
data being huge.)

If the 25% rule still holds true, even under circumstances where the
assumption is that the PostgreSQL buffer cache is more efficient (in
terms of hit ratio) at caching PostgreSQL database data pages, it
would be useful to understand why in order to understand the
trade-offs involved and make appropriate decisions.

Or is it a matter of PostgreSQL doing non-direct I/O, such that
anything cached in shared_buffers will also be cached by the OS?

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>'
Key retrieval: Send an E-Mail to getpgpkey(at)scode(dot)org
E-Mail: peter(dot)schuller(at)infidyne(dot)com Web: http://www.scode.org


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-15 14:29:05
Message-ID: Pine.GSO.4.64.0802150916380.28161@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Fri, 15 Feb 2008, Peter Schuller wrote:

> Or is it a matter of PostgreSQL doing non-direct I/O, such that
> anything cached in shared_buffers will also be cached by the OS?

PostgreSQL only uses direct I/O for writing to the WAL; everything else
goes through the regular OS buffer cache unless you force it to do
otherwise at the OS level (like some Solaris setups do with
forcedirectio). This is one reason it still make not make sense to give
an extremely high percentage of RAM to PostgreSQL even with improvements
in managing it. Another is that shared_buffers memory has to be
reconciled with disk at every checkpoint, where OS buffers do not. A
third is that your OS may just be more efficient at buffering--it knows
more about the underlying hardware, and the clock-sweep method used
internally by PostgreSQL to simulate a LRU cache is not extremely
sophisticated.

However, don't feel limited by the general 25% rule; it's certainly worth
exploring whether 50% or more works better for your workload. You'll have
to benchmark that yourself though, and I'd suggest using pg_buffercache:
http://www.postgresql.org/docs/8.3/static/pgbuffercache.html to get an
idea just what the pages are being used for.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-15 18:06:11
Message-ID: 200802151006.11858.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Friday 15 February 2008 06:29, Greg Smith wrote:
> PostgreSQL only uses direct I/O for writing to the WAL; everything else
> goes through the regular OS buffer cache unless you force it to do
> otherwise at the OS level (like some Solaris setups do with
> forcedirectio).

Also, note that even when direct I/O is available, most users and benchmark
tests have reported that having PostgreSQL "take over" the entire cache is
not a net performance gain. I believe this is mostly because our I/O and
caching code aren't designed for this kind of operation.

I believe that MyEmma had a different experience on their workload, though.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Erik Jones <erik(at)myemma(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org, Greg Smith <gsmith(at)gregsmith(dot)com>, Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-15 18:37:10
Message-ID: 72BE31DF-ABAF-459B-99A3-4E575EC3E08A@myemma.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


On Feb 15, 2008, at 12:06 PM, Josh Berkus wrote:

> On Friday 15 February 2008 06:29, Greg Smith wrote:
>> PostgreSQL only uses direct I/O for writing to the WAL; everything
>> else
>> goes through the regular OS buffer cache unless you force it to do
>> otherwise at the OS level (like some Solaris setups do with
>> forcedirectio).
>
> Also, note that even when direct I/O is available, most users and
> benchmark
> tests have reported that having PostgreSQL "take over" the entire
> cache is
> not a net performance gain. I believe this is mostly because our I/
> O and
> caching code aren't designed for this kind of operation.
>
> I believe that MyEmma had a different experience on their workload,
> though.

Actually, while we did have shared_buffers set to 4G on an 8G system
when we were running with forcedirectio, the decision to even run
with forcedirectio was a temporary until we were able (welll, forced
to) migrate to a new system with a sane drive configuration. The old
set up was done horribly by a sysadmin who's no longer with us who
set us up with a RAID5 array with both the data and xlogs both
mirrored across all of the disks with no spares. So, I wouldn't
consider the numbers I was seeing then a reliable expectation as that
system was nowhere close to ideal. We've seen much more sane and
consistent numbers on a more normal setup, i.e. without forcedirectio
and with <= 25% system memory.

Erik Jones

DBA | Emma®
erik(at)myemma(dot)com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Erik Jones <erik(at)myemma(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Greg Smith <gsmith(at)gregsmith(dot)com>, Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-15 18:42:32
Message-ID: 20080215104232.54c4a397@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Fri, 15 Feb 2008 12:37:10 -0600
Erik Jones <erik(at)myemma(dot)com> wrote:
>(welll, forced
> to) migrate to a new system with a sane drive configuration. The
> old set up was done horribly by a sysadmin who's no longer with us
> who set us up with a RAID5 array with both the data and xlogs both
> mirrored across all of the disks with no spares.

Is the admin still with us? Or is he fertilizer? I have some know some
great gardeners from Jersey...

Sincerely,

Joshua D. Drake

>
> Erik Jones
>
> DBA | Emma®
> erik(at)myemma(dot)com
> 800.595.4401 or 615.292.5888
> 615.292.0777 (fax)
>
> Emma helps organizations everywhere communicate & market in style.
> Visit us online at http://www.myemma.com
>
>
>
>
> ---------------------------(end of
> broadcast)--------------------------- TIP 3: Have you checked our
> extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

- --
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL SPI Liaison | SPI Director | PostgreSQL political pundit

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)

iD8DBQFHtd0YATb/zqfZUUQRAuwPAJ0Y2VjYMkHhCsQ07Sadj/kT0Yz3wQCgmuCP
eOmndoyvYe+DhH+AOwcyms4=
=qGZE
-----END PGP SIGNATURE-----


From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-15 19:34:33
Message-ID: 47B5E949.7070103@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Greg Smith wrote:
> On Fri, 15 Feb 2008, Peter Schuller wrote:
>
>> Or is it a matter of PostgreSQL doing non-direct I/O, such that
>> anything cached in shared_buffers will also be cached by the OS?
>
> PostgreSQL only uses direct I/O for writing to the WAL; everything
> else goes through the regular OS buffer cache unless you force it to
> do otherwise at the OS level (like some Solaris setups do with
> forcedirectio). This is one reason it still make not make sense to
> give an extremely high percentage of RAM to PostgreSQL even with
> improvements in managing it. Another is that shared_buffers memory
> has to be reconciled with disk at every checkpoint, where OS buffers
> do not. A third is that your OS may just be more efficient at
> buffering--it knows more about the underlying hardware, and the
> clock-sweep method used internally by PostgreSQL to simulate a LRU
> cache is not extremely sophisticated.
>
> However, don't feel limited by the general 25% rule; it's certainly
> worth exploring whether 50% or more works better for your workload.
> You'll have to benchmark that yourself though, and I'd suggest using
> pg_buffercache:
> http://www.postgresql.org/docs/8.3/static/pgbuffercache.html to get an
> idea just what the pages are being used for.
>

As per the test that I have done mostly with forcedirectio on Solaris, I
have seen gains with increasing the buffercache to about somewhere
between 10GB and thats when thing seem to take a turn...

So in my case I am generally comfortable for Postgres to use about
8-10GB beyond which I am cautious.

Also with tests with UFS buffered for table/index and forcedirectio it
seems to perform better with forcedirectio .. However if you do want to
exploit the extra RAM with UFS then you have to do some tunings for UFS
in Solaris.. Now with machines with 32GB becoming common this is
something worth pursuing depending on the storage if it can handle the
directio load or not.

Regards,
Jignesh


From: Erik Jones <erik(at)myemma(dot)com>
To: Joshua D(dot) Drake <jd(at)commandprompt(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Greg Smith <gsmith(at)gregsmith(dot)com>, Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-15 19:46:48
Message-ID: 5801B3E8-BE50-4DD9-A8CA-AEEDA3FA4B72@myemma.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


On Feb 15, 2008, at 12:42 PM, Joshua D. Drake wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Fri, 15 Feb 2008 12:37:10 -0600
> Erik Jones <erik(at)myemma(dot)com> wrote:
>> (welll, forced
>> to) migrate to a new system with a sane drive configuration. The
>> old set up was done horribly by a sysadmin who's no longer with us
>> who set us up with a RAID5 array with both the data and xlogs both
>> mirrored across all of the disks with no spares.
>
> Is the admin still with us? Or is he fertilizer? I have some know some
> great gardeners from Jersey...

Heh, he's definitely no long with us although not in the sense that
he's now "pushin' up daisies"...

Erik Jones

DBA | Emma®
erik(at)myemma(dot)com
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com


From: Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-18 07:39:46
Message-ID: 200802180839.46557.peter.schuller@infidyne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> PostgreSQL only uses direct I/O for writing to the WAL; everything else
> goes through the regular OS buffer cache unless you force it to do
> otherwise at the OS level (like some Solaris setups do with
> forcedirectio). This is one reason it still make not make sense to give
> an extremely high percentage of RAM to PostgreSQL even with improvements
> in managing it.

Ok - thank you for the input (that goes for everyone).

> Another is that shared_buffers memory has to be
> reconciled with disk at every checkpoint, where OS buffers do not.

Hmm. Am I interpreting that correctly in that dirty buffers need to be flushed
to disk at checkpoints? That makes perfect sense - but why would that not be
the case with OS buffers? My understanding is that the point of the
checkpoint is to essentially obsolete old WAL data in order to recycle the
space, which would require flushing the data in question first (i.e.,
normally you just fsync the WAL, but when you want to recycle space you need
fsync() for the barrier and are then free to nuke the old WAL).

--
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>'
Key retrieval: Send an E-Mail to getpgpkey(at)scode(dot)org
E-Mail: peter(dot)schuller(at)infidyne(dot)com Web: http://www.scode.org


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Peter Schuller <peter(dot)schuller(at)infidyne(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: shared_buffers in 8.3 w/ lots of RAM on dedicated PG machine
Date: 2008-02-18 21:24:30
Message-ID: Pine.GSO.4.64.0802181606160.29622@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, 18 Feb 2008, Peter Schuller wrote:

> Am I interpreting that correctly in that dirty buffers need to be
> flushed to disk at checkpoints? That makes perfect sense - but why would
> that not be the case with OS buffers?

All the dirty buffers in the cache are written out as part of the
checkpoint process--all at once in earlier versions, spread out based on
checkpoint_completion_target in 8.3. In the worst case you could
theoretically have to write the entire shared_buffer cache out, however
big it is, if you managed to get it all dirty just before the checkpoint.

Ultimately everything written to the database (again, with the exception
of non-standard direct I/O setups) passes through the OS buffers, so in
that respect the OS buffers will also be flushed when the checkpoint does
its cleansing fsync.

But dirty buffers for less popular pages do get written before the
checkpoint occurs. As there is a need to allocate new pages for the
database to work with, it evicts pages in order to find space, and if the
page given the boot is dirty it gets written to the OS buffer cache.
Those writes trickle out to disk in advance of the checkpoint itself. If
you've pushed the majority of memory into the PostgreSQL cache, that won't
happen as much (more shared_buffers=>less evictions+less OS cache) and
there's a potential for longer, more intensive checkpoints.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD