Re: Bug: Buffer cache is not scan resistant

Lists: pgsql-hackers
From: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
To: "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-07 03:32:28
Message-ID: C3E62232E3BCF24CBA20D72BFDCB6BF802AF284B@MI8NYCMAIL08.Mi8.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Incidentally, we tried triggering NTA (L2 cache bypass) unconditionally and in various patterns and did not see the substantial gain as with reducing the working set size.

My conclusion: Fixing the OS is not sufficient to alleviate the issue. We see a 2x penalty (1700MB/s versus 3500MB/s) at the higher data rates due to this effect.

- Luke

Msg is shrt cuz m on ma treo

-----Original Message-----
From: Sherry Moore [mailto:sherry(dot)moore(at)sun(dot)com]
Sent: Tuesday, March 06, 2007 10:05 PM Eastern Standard Time
To: Simon Riggs
Cc: Sherry Moore; Tom Lane; Luke Lonergan; Mark Kirkwood; Pavan Deolasee; Gavin Sherry; PGSQL Hackers; Doug Rady
Subject: Re: [HACKERS] Bug: Buffer cache is not scan resistant

Hi Simon,

> and what you haven't said
>
> - all of this is orthogonal to the issue of buffer cache spoiling in
> PostgreSQL itself. That issue does still exist as a non-OS issue, but
> we've been discussing in detail the specific case of L2 cache effects
> with specific kernel calls. All of the test results have been
> stand-alone, so we've not done any measurements in that area. I say this
> because you make the point that reducing the working set size of write
> workloads has no effect on the L2 cache issue, but ISTM its still
> potentially a cache spoiling issue.

What I wanted to point out was that (reiterating to avoid requoting),

- My test was simply to demonstrate that the observed performance
difference with VACUUM was caused by whether the size of the
user buffer caused L2 thrashing.

- In general, application should reduce the size of the working set
to reduce the penalty of TLB misses and cache misses.

- If the application access pattern meets the NTA trigger condition,
the benefit of reducing the working set size will be much smaller.

Whatever I said is probably orthogonal to the buffer cache issue you
guys have been discussing, but I haven't read all the email exchange
on the subject.

Thanks,
Sherry
--
Sherry Moore, Solaris Kernel Development http://blogs.sun.com/sherrym


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-09 19:27:55
Message-ID: 1173468476.3641.340.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2007-03-06 at 22:32 -0500, Luke Lonergan wrote:
> Incidentally, we tried triggering NTA (L2 cache bypass)
> unconditionally and in various patterns and did not see the
> substantial gain as with reducing the working set size.
>
> My conclusion: Fixing the OS is not sufficient to alleviate the issue.
> We see a 2x penalty (1700MB/s versus 3500MB/s) at the higher data
> rates due to this effect.
>
I've implemented buffer recycling, as previously described, patch being
posted now to -patches as "scan_recycle_buffers".

This version includes buffer recycling

- for SeqScans larger than shared buffers, with the objective of
improving L2 cache efficiency *and* reducing the effects of shared
buffer cache spoiling (both as previously discussed on this thread)

- for VACUUMs of any size, with the objective of reducing WAL thrashing
whilst keeping VACUUM's behaviour of not spoiling the buffer cache (as
originally suggested by Itagaki-san, just with a different
implementation).

Behaviour is not activated by default in this patch. To request buffer
recycling, set the USERSET GUC
SET scan_recycle_buffers = N
tested with 1,4,8,16, but only > 8 seems sensible, IMHO.

Patch effects StrategyGetBuffer, so only effects the disk->cache path.
The idea is that if its already in shared buffer cache then we get
substantial benefit already and nothing else is needed. So for the
general case, the patch adds a single if test into the I/O path.

The parameter is picked up at the start of SeqScan and VACUUM
(currently). Any change mid-scan will be ignored.

IMHO its possible to do this and to allow Synch Scans at the same time,
with some thought. There is no need for us to rely on cache spoiling
behaviour of scans to implement that feature as well.

Independent performance tests requested, so that we can discuss this
objectively.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>, "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-12 07:21:25
Message-ID: 20070312155349.6296.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Simon Riggs" <simon(at)2ndquadrant(dot)com> wrote:

> I've implemented buffer recycling, as previously described, patch being
> posted now to -patches as "scan_recycle_buffers".
>
> - for VACUUMs of any size, with the objective of reducing WAL thrashing
> whilst keeping VACUUM's behaviour of not spoiling the buffer cache (as
> originally suggested by Itagaki-san, just with a different
> implementation).

I tested your patch with VACUUM FREEZE. The performance was improved when
I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL traffic,
but this patch should be useful for normal VACUUMs with backgrond jobs!

N | time | WAL flush(*)
-----+-------+-----------
0 | 58.7s | 0.01%
1 | 80.3s | 81.76%
8 | 73.4s | 16.73%
16 | 64.2s | 9.24%
32 | 59.0s | 4.88%
64 | 56.7s | 2.63%
128 | 55.1s | 1.41%

(*) WAL flush is the ratio of the need of fsync to buffer recycle.

# SET scan_recycle_buffers = 0;
# UPDATE accounts SET aid=aid WHERE random() < 0.005;
# CHECKPOINT;
# SET scan_recycle_buffers = <N>;
# VACUUM FREEZE accounts;

BTW, does the patch change the default usage of buffer in vacuum? From what
I've seen, scan_recycle_buffers = 1 is the same as before. With the default
value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers in pool,
just like existing sequential scans. Is this intended?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>, "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-12 09:14:28
Message-ID: 1173690868.3641.515.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> wrote:
>
> > I've implemented buffer recycling, as previously described, patch being
> > posted now to -patches as "scan_recycle_buffers".
> >
> > - for VACUUMs of any size, with the objective of reducing WAL thrashing
> > whilst keeping VACUUM's behaviour of not spoiling the buffer cache (as
> > originally suggested by Itagaki-san, just with a different
> > implementation).
>
> I tested your patch with VACUUM FREEZE. The performance was improved when
> I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL traffic,
> but this patch should be useful for normal VACUUMs with backgrond jobs!

Thanks.

> N | time | WAL flush(*)
> -----+-------+-----------
> 0 | 58.7s | 0.01%
> 1 | 80.3s | 81.76%
> 8 | 73.4s | 16.73%
> 16 | 64.2s | 9.24%
> 32 | 59.0s | 4.88%
> 64 | 56.7s | 2.63%
> 128 | 55.1s | 1.41%
>
> (*) WAL flush is the ratio of the need of fsync to buffer recycle.

Do you have the same measurement without patch applied? I'd be
interested in the current state also (the N=0 path is modified as well
for VACUUM, in this patch).

> # SET scan_recycle_buffers = 0;
> # UPDATE accounts SET aid=aid WHERE random() < 0.005;
> # CHECKPOINT;
> # SET scan_recycle_buffers = <N>;
> # VACUUM FREEZE accounts;

Very good results, thanks. I'd be interested in the same thing for just
VACUUM and for varying ratios of dirty/clean blocks during vacuum.

> BTW, does the patch change the default usage of buffer in vacuum? From what
> I've seen, scan_recycle_buffers = 1 is the same as before.

That was the intention.

> With the default
> value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers in pool,
> just like existing sequential scans. Is this intended?

Yes, but its not very useful for testing to have done that. I'll do
another version within the hour that sets N=0 (only) back to current
behaviour for VACUUM.

One of the objectives of the patch was to prevent VACUUM from tripping
up other backends. I'm confident that it will improve that situation for
OLTP workloads running regular concurrent VACUUMs, but we will need to
wait a couple of days to get those results in also.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Luke Lonergan" <LLonergan(at)greenplum(dot)com>, "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-12 10:08:37
Message-ID: 1173694117.3641.536.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-03-12 at 09:14 +0000, Simon Riggs wrote:
> On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:

> > With the default
> > value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers in pool,
> > just like existing sequential scans. Is this intended?
>
> Yes, but its not very useful for testing to have done that. I'll do
> another version within the hour that sets N=0 (only) back to current
> behaviour for VACUUM.

New test version enclosed, where scan_recycle_buffers = 0 doesn't change
existing VACUUM behaviour.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
scan_recycle_buffers.v3.patch text/x-patch 22.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>, "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-12 14:30:19
Message-ID: 15267.1173709819@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> I tested your patch with VACUUM FREEZE. The performance was improved when
> I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL traffic,
> but this patch should be useful for normal VACUUMs with backgrond jobs!

Proving that you can see a different in a worst-case scenario is not the
same as proving that the patch is useful in normal cases.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>, "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-12 18:40:17
Message-ID: 1173724817.3641.669.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-03-12 at 10:30 -0400, Tom Lane wrote:
> ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> > I tested your patch with VACUUM FREEZE. The performance was improved when
> > I set scan_recycle_buffers > 32. I used VACUUM FREEZE to increase WAL traffic,
> > but this patch should be useful for normal VACUUMs with backgrond jobs!
>
> Proving that you can see a different in a worst-case scenario is not the
> same as proving that the patch is useful in normal cases.

I agree, but I think that this VACUUM FREEZE test does actually
represent the normal case, here's why:

The poor buffer manager behaviour occurs if the block is dirty as a
result of WAL-logged changes. It only takes the removal of a single row
for us to have WAL logged this and dirtied the block.

If we invoke VACUUM from autovacuum, we do this by default when 20% of
the rows have been updated, which means with many distributions of
UPDATEs we'll have touched a very large proportion of blocks before we
VACUUM. That isn't true for *all* distributions of UPDATEs, but it will
soon be. Dead Space Map will deliver only dirty blocks for us.

So running a VACUUM FREEZE is a reasonable simulation of running a large
VACUUM on a real production system with default autovacuum enabled, as
will be the case for 8.3.

It is possible that we run VACUUM when fewer dirty blocks are generated,
but this won't be the common situation and not something we should
optimise for.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-13 04:40:42
Message-ID: 20070313132631.6091.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Simon Riggs" <simon(at)2ndquadrant(dot)com> wrote:

> > > With the default
> > > value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers in pool,
> > > just like existing sequential scans. Is this intended?
> >
> New test version enclosed, where scan_recycle_buffers = 0 doesn't change
> existing VACUUM behaviour.

This is a result with scan_recycle_buffers.v3.patch. I used normal VACUUM
with background load using slowdown-ed pgbench in this instance. I believe
the patch is useful in normal cases, not only for VACUUM FREEZE.

N | time | WAL flush(*)
-----+--------+-----------
0 | 112.8s | 44.3%
1 | 148.9s | 52.1%
8 | 105.1s | 17.6%
16 | 96.9s | 8.7%
32 | 103.9s | 6.3%
64 | 89.4s | 6.6%
128 | 80.0s | 3.8%

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-13 05:16:04
Message-ID: C21B81A4.292B3%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon,

You may know we've built something similar and have seen similar gains.
We're planning a modification that I think you should consider: when there
is a sequential scan of a table larger than the size of shared_buffers, we
are allowing the scan to write through the shared_buffers cache.

The hypothesis is that if a relation is of a size equal to or less than the
size of shared_buffers, it is "cacheable" and should use the standard LRU
approach to provide for reuse.

- Luke

On 3/12/07 3:08 AM, "Simon Riggs" <simon(at)2ndquadrant(dot)com> wrote:

> On Mon, 2007-03-12 at 09:14 +0000, Simon Riggs wrote:
>> On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:
>
>>> With the default
>>> value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers in
>>> pool,
>>> just like existing sequential scans. Is this intended?
>>
>> Yes, but its not very useful for testing to have done that. I'll do
>> another version within the hour that sets N=0 (only) back to current
>> behaviour for VACUUM.
>
> New test version enclosed, where scan_recycle_buffers = 0 doesn't change
> existing VACUUM behaviour.


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
Cc: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-13 09:37:20
Message-ID: 1173778641.3641.793.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-03-12 at 22:16 -0700, Luke Lonergan wrote:

> You may know we've built something similar and have seen similar gains.

Cool

> We're planning a modification that I think you should consider: when there
> is a sequential scan of a table larger than the size of shared_buffers, we
> are allowing the scan to write through the shared_buffers cache.

Write? For which operations?

I was thinking to do this for bulk writes also, but it would require
changes to bgwriter's cleaning sequence. Are you saying to write say ~32
buffers then fsync them, rather than letting bgwriter do that? Then
allow those buffers to be reused?

> The hypothesis is that if a relation is of a size equal to or less than the
> size of shared_buffers, it is "cacheable" and should use the standard LRU
> approach to provide for reuse.

Sounds reasonable. Please say more.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-13 10:08:04
Message-ID: 1173780484.3641.809.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2007-03-13 at 13:40 +0900, ITAGAKI Takahiro wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> wrote:
>
> > > > With the default
> > > > value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers in pool,
> > > > just like existing sequential scans. Is this intended?
> > >
> > New test version enclosed, where scan_recycle_buffers = 0 doesn't change
> > existing VACUUM behaviour.
>
> This is a result with scan_recycle_buffers.v3.patch. I used normal VACUUM
> with background load using slowdown-ed pgbench in this instance. I believe
> the patch is useful in normal cases, not only for VACUUM FREEZE.
>
> N | time | WAL flush(*)
> -----+--------+-----------
> 0 | 112.8s | 44.3%
> 1 | 148.9s | 52.1%
> 8 | 105.1s | 17.6%
> 16 | 96.9s | 8.7%
> 32 | 103.9s | 6.3%
> 64 | 89.4s | 6.6%
> 128 | 80.0s | 3.8%

Looks good.

Not sure what value of N to pick for normal use. The objectives are
i) don't let VACUUMs spoil the cache
ii) speed up standalone VACUUMs
iii) don't let VACUUM cause others to repeatedly WAL flush

I'm thinking N=16 meets all 3 objectives. We could make VACUUM go faster
still, but by knocking more blocks out of cache that someone doing real
work might need. That will slow them down almost as much as forcing them
to flush WAL, so I'd want to be conservative with VACUUM.

Does anybody think we need a new parameter for this, or are we happy at
16 buffers in the recycle loop for VACUUM?

At this point I should note something I haven't mentioned before.
VACUUMs force other backends to flush out WAL only when we have an I/O
bound workload. If the database already fits in memory then BufferAlloc
never needs to run and therefore we don't need to flush WAL. So I can
understand that the effect of WAL flushing may not have been noticed by
many testers.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "Sherry Moore" <sherry(dot)moore(at)sun(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Pavan Deolasee" <pavan(at)enterprisedb(dot)com>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Doug Rady" <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-13 14:36:56
Message-ID: C21C0518.29373%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon,

On 3/13/07 2:37 AM, "Simon Riggs" <simon(at)2ndquadrant(dot)com> wrote:

>> We're planning a modification that I think you should consider: when there
>> is a sequential scan of a table larger than the size of shared_buffers, we
>> are allowing the scan to write through the shared_buffers cache.
>
> Write? For which operations?

I'm actually just referring to the sequential scan "writing into" the shared
buffers cache, sorry for the "write through" :-)

> I was thinking to do this for bulk writes also, but it would require
> changes to bgwriter's cleaning sequence. Are you saying to write say ~32
> buffers then fsync them, rather than letting bgwriter do that? Then
> allow those buffers to be reused?

Off topic, but we think we just found the reason(s) for the abysmal heap
insert performance of pgsql and are working on a fix to that as well. It
involves two main things: the ping-ponging seeks used to extend a relfile
and the bgwriter not flushing aggressively enough. We're hoping to move the
net heap insert rate from 12MB/s for a single stream to something more like
100 MB/s per stream, but it may take a week to get some early results and
find out if we're on the right track. We've been wrong on this before ;-)

- Luke


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, Sherry Moore <sherry(dot)moore(at)sun(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Pavan Deolasee <pavan(at)enterprisedb(dot)com>, Gavin Sherry <swm(at)alcove(dot)com(dot)au>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Rady <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-03-27 01:04:57
Message-ID: 200703270104.l2R14vx16179@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Simon, is this patch ready to be added to the patch queue? I assume not.

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Mon, 2007-03-12 at 09:14 +0000, Simon Riggs wrote:
> > On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:
>
> > > With the default
> > > value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers in pool,
> > > just like existing sequential scans. Is this intended?
> >
> > Yes, but its not very useful for testing to have done that. I'll do
> > another version within the hour that sets N=0 (only) back to current
> > behaviour for VACUUM.
>
> New test version enclosed, where scan_recycle_buffers = 0 doesn't change
> existing VACUUM behaviour.
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Luke Lonergan <LLonergan(at)greenplum(dot)com>, Sherry Moore <sherry(dot)moore(at)sun(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Pavan Deolasee <pavan(at)enterprisedb(dot)com>, Gavin Sherry <swm(at)alcove(dot)com(dot)au>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Doug Rady <drady(at)greenplum(dot)com>
Subject: Re: Bug: Buffer cache is not scan resistant
Date: 2007-04-02 23:15:44
Message-ID: 200704022315.l32NFiP18580@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"test" version, but I am putting in the queue so we can track it there.

Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Mon, 2007-03-12 at 09:14 +0000, Simon Riggs wrote:
> > On Mon, 2007-03-12 at 16:21 +0900, ITAGAKI Takahiro wrote:
>
> > > With the default
> > > value of scan_recycle_buffers(=0), VACUUM seems to use all of buffers in pool,
> > > just like existing sequential scans. Is this intended?
> >
> > Yes, but its not very useful for testing to have done that. I'll do
> > another version within the hour that sets N=0 (only) back to current
> > behaviour for VACUUM.
>
> New test version enclosed, where scan_recycle_buffers = 0 doesn't change
> existing VACUUM behaviour.
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +