questions regarding shared_buffers behavior

Lists: pgsql-performance
From: Mark Rostron <mrostron(at)ql2(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: questions regarding shared_buffers behavior
Date: 2010-11-07 20:33:05
Message-ID: FD020D3E50E7FA479567872E5F5F31E3045A218A28@ex01.corp.ql2.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Question regarding the operation of the shared_buffers cache and implications of the pg_X_stat_tables|pg_X_stat_indexes stats.
( I am also aware that this is all complicated by the kernel cache behavior, however, if, for the purpose of these questions, you wouldn't mind assuming that we don't have a kernel cache, and therefore just focus on the behavior of the db cache as an isolated component, it will help - thanks in advance).

What is the procedure that postgres uses to decide whether or not a table/index block will be left in the shared_buffers cache at the end of the operation?

Are there any particular types of *table* access operations that will cause postgres to choose not to retain the table pages in shared_buffers at the end of the operation?
In particular, the activity tracked by:

- Seq_scan

- Seq_tup_read

- Idx_tup_read

- Idx_tup_fetch

Are there any particular types of *index* access operations that will cause postgres to choose not to retain the index pages in shared_buffers at the end of the operation?
In particular, the activity tracked by:

- idx_scan

- Idx_tup_read

- Idx_tup_fetch


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Mark Rostron <mrostron(at)ql2(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: questions regarding shared_buffers behavior
Date: 2010-11-07 23:30:14
Message-ID: 4CD73686.4090807@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Mark Rostron wrote:
>
> What is the procedure that postgres uses to decide whether or not a
> table/index block will be left in the shared_buffers cache at the end
> of the operation?
>

There is no such procedure. When a table or index page is used, its
usage count goes up, which means it's more likely to stay in the cache
for longer afterwards. Processing trying to allocate pages are
constantly circling the buffer cache looking for pages where the usage
count is at 0 to re-use. The only special cases are for sequential
scans and VACUUM, which use continuously re-use a small section of the
buffer cache in some cases instead.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Mark Rostron <mrostron(at)ql2(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: questions regarding shared_buffers behavior
Date: 2010-11-08 00:33:37
Message-ID: FD020D3E50E7FA479567872E5F5F31E3045A218A2F@ex01.corp.ql2.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

> >
> > What is the procedure that postgres uses to decide whether or not a
> > table/index block will be left in the shared_buffers cache at the end
> > of the operation?
> >
>
> The only special cases are for sequential scans and VACUUM, which use continuously re-use a small section of the buffer cache in some cases instead.

Thanks - the part about sequential scans and the re-use of a small section of shared_buffers is the bit I was interested in.
I don't suppose you would be able to tell me how large that re-useable area might be?

Now, with regard to the behavior of table sequential scans: do the stat values in seq_scan and seq_tup_read reflect actual behavior.
I assume they do, but I'm just checking - these would be updated as the result of real I/O as opposed to fuzzy estimates?

Obviously, the reason I am asking this is that I am noticing high machine io levels that would only result from sequential scan activity.
The explain output says otherwise, but the seq_scan stat value for the table kinda correlates.
Hence my enquiry.

Thanks in advance.
Mr


From: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
To: Mark Rostron <mrostron(at)ql2(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: questions regarding shared_buffers behavior
Date: 2010-11-08 03:03:37
Message-ID: AANLkTinYy00+PaDCTdhq0urGJdJtiKUAFPmBdiC3t5Br@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

2010/11/8 Mark Rostron <mrostron(at)ql2(dot)com>:
>> >
>> > What is the procedure that postgres uses to decide whether or not a
>> > table/index block will be left in the shared_buffers cache at the end
>> > of the operation?
>> >
>>
>> The only special cases are for sequential scans and VACUUM, which use continuously re-use a small section of the buffer cache in some cases instead.
>
> Thanks - the part about sequential scans and the re-use of a small section of shared_buffers is the bit I was interested in.
> I don't suppose you would be able to tell me how large that re-useable area might be?

There are 256KB per seqscan and 256KB per vacuum.

I suggest you to go reading src/backend/storage/buffer/README

>
> Now, with regard to the behavior of table sequential scans: do the stat values in seq_scan and seq_tup_read reflect actual behavior.
> I assume they do, but I'm just checking - these would be updated as the result of real I/O as opposed to fuzzy estimates?

They represent the real stat for hit/read from shared_buffers, *not*
from OS buffers.

Getting real statistic from OS has a cost because postgresql don't use
(for other reason) mmap to get data.

>
> Obviously, the reason I am asking this is that I am noticing high machine io levels that would only result from sequential scan activity

You may want to start inspect your postgresql buffer cache with the
contrib module pg_buffercache.
http://www.postgresql.org/docs/9.0/static/pgbuffercache.html

Then if it is not enough you can inspect more precisely your OS cache
with pgfincore but it migh be useless in your situation.
http://villemain.org/projects/pgfincore

> The explain output says otherwise, but the seq_scan stat value for the table kinda correlates.

Starting with 9.0, the contrib module pg_stat_statements provide a lot
of information about buffer access (from shared buffers usage, but
still very valuable information) you should have a look at it if you
have such postgresql version installed.

--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Cédric Villemain <cedric(dot)villemain(dot)debian(at)gmail(dot)com>
Cc: Mark Rostron <mrostron(at)ql2(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: questions regarding shared_buffers behavior
Date: 2010-11-12 21:07:47
Message-ID: AANLkTin2pkLo=gc18h1kKjMFNSLo6ARzbsQgWLyEDzCr@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Sun, Nov 7, 2010 at 10:03 PM, Cédric Villemain
<cedric(dot)villemain(dot)debian(at)gmail(dot)com> wrote:
> 2010/11/8 Mark Rostron <mrostron(at)ql2(dot)com>:
>>> >
>>> > What is the procedure that postgres uses to decide whether or not a
>>> > table/index block will be left in the shared_buffers cache at the end
>>> > of the operation?
>>> >
>>>
>>> The only special cases are for sequential scans and VACUUM, which use continuously re-use a small section of the buffer cache in some cases instead.
>>
>> Thanks - the part about sequential scans and the re-use of a small section of shared_buffers is the bit I was interested in.
>> I don't suppose you would be able to tell me how large that re-useable area might be?
>
> There are 256KB per seqscan and 256KB per vacuum.
>
> I suggest you to go reading src/backend/storage/buffer/README

Note that there is a different, higher limit for the "bulk write"
strategy when using COPY IN or CTAS.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company