Re: WIP: bufmgr rewrite per recent discussions

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-patches(at)postgreSQL(dot)org
Subject: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-15 21:39:10
Message-ID: 17622.1108503550@sss.pgh.pa.us
Lists: pgsql-patches

I don't feel this is quite ready to commit, but here it is if anyone
would like to try some performance testing. Using "pgbench -s 10"
on a single-CPU machine, I find this code a little slower than CVS tip
at shared_buffers = 1000, but noticeably faster (~10% speedup) at
10000 buffers. So it's not a dead loss for single-CPU anyway. What
we need now is some performance measurements on multi-CPU boxes.

The bgwriter algorithm probably needs more work, maybe some more GUC
parameters.

regards, tom lane

Attachment: unknown_filename (text/plain, 165.9 KB)

From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-16 09:52:38
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113110@WEBBASEDDC.webbasedltd.local
Lists: pgsql-patches

Hi Tom,

I compiled and tested your patch on a dual Opteron server with 12GB RAM
running FC3. Here are the results I get with pgbench with a scale-factor of
10 over an average of 6 runs. All other postgresql.conf options were left at
their default values.

CVS tip results
---------------

Shared buffers = 1000
tps = 232.7013962 (including connections establishing)
tps = 250.9295233 (excluding connections establishing)

Shared buffers = 10000
tps = 217.7393482 (including connections establishing)
tps = 236.8184297 (excluding connections establishing)

Shared buffers = 100000
tps = 150.4182863 (including connections establishing)
tps = 178.3881158 (excluding connections establishing)

CVS tip + Tom's Bufmgr patch
----------------------------

Shared buffers = 1000
tps = 248.3239085 (including connections establishing)
tps = 270.0627257 (excluding connections establishing)

Shared buffers = 10000
tps = 249.7294955 (including connections establishing)
tps = 273.6144427 (excluding connections establishing)

Shared buffers = 100000
tps = 197.8116898 (including connections establishing)
tps = 224.362671 (excluding connections establishing)

The interesting thing to note was that going up to 100000 buffers seemed to
cause the performance to go down again, which is something I wouldn't have
expected on a server with such a large amount of RAM, given that it was
suggested ARC would perform better with more shared buffers. But all in all,
it looks like your patch is having a positive effect on performance.

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-16 15:01:12
Message-ID: 23941.1108566072@sss.pgh.pa.us
Lists: pgsql-patches

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> I compiled and tested your patch on a dual Opteron server with 12GB RAM
> running FC3. Here are the results I get with pgbench with a scale-factor of
> 10 over an average of 6 runs.

Thanks for posting these results. What -c and -t settings were you using
with pgbench? (I like to use -c equal to the scale factor and -t of at
least 1000 ... much less than that gives fairly unstable results in my
experience.)
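
For a scale-factor-10 database, that advice works out to an invocation along
these lines (illustrative only; it assumes a database named "pgbench"
initialized with pgbench -i):

    pgbench -i -s 10 pgbench
    pgbench -s 10 -c 10 -t 1000 pgbench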

> The interesting thing to note was that going up to 100000 buffers seemed to
> cause the performance to go down again, which is something I wouldn't have
> expected on a server with such a large amount of RAM, given that it was
> suggested ARC would perform better with more shared buffers.

I think what is probably happening here is the background writer is
eating too many cycles. As of the patch I posted, the bgwriter is still
using its 8.0 control parameters, in which the minimum scan percentage
is 1% of all the buffers (so 1000 buffers scanned in each round in your
last test) and it's willing to write up to 100 dirty buffers per round
by default. I was looking at this yesterday and thinking it seemed
clearly excessive. With a default bgwriter_delay of 200 msec, this
allows the entire buffer array to be scanned every 20 sec, so we're in
effect keeping the thing under constant syncer load.

If you have time to redo your experiment, would you try knocking
bgwriter_maxpages down to 10 to see if it helps at the larger
shared_buffer settings?
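
For reference, these are the stock 8.0 background-writer settings described
above, together with the suggested experiment (the defaults correspond to
the 1% / 100 / 200 msec figures mentioned):

    # postgresql.conf -- 8.0 defaults for the background writer
    #bgwriter_delay = 200        # milliseconds between rounds
    #bgwriter_percent = 1        # % of the buffer pool scanned per round (per the description above)
    #bgwriter_maxpages = 100     # max dirty buffers written per round

    # suggested setting for the larger shared_buffers runs:
    bgwriter_maxpages = 10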

Since yesterday I've improved my patch by converting the bgwriter
percentage variable into a float, so that values smaller than 1% can be
selected, and I've split the two variables into four so that people can
independently control the effort spent on the whole buffer array versus
the buffers just in front of nextVictimBuffer (see BgBufferSync in the
patch). I'm not sure how important that is, but I do think that the 1%
/ 100 default settings are way too high for larger buffer pools. Once
that's in, it will be hard to compare the patch directly against CVS tip,
so trying it now with a smaller maxpages setting for both would be a
fairer comparison.
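
The patch itself isn't quoted here, but in postgresql.conf terms the
four-way split might end up looking something like the sketch below (the
parameter names and values are purely illustrative, not necessarily what the
patch uses -- see BgBufferSync in the patch for the real ones):

    # illustrative only; not necessarily the names used by the patch
    bgwriter_all_percent = 0.5     # % of the whole buffer array scanned per round
    bgwriter_all_maxpages = 10     # cap on writes from that whole-array scan
    bgwriter_lru_percent = 1.0     # % scanned just ahead of nextVictimBuffer
    bgwriter_lru_maxpages = 10     # cap on writes from that LRU-side scan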

I have another couple of small ideas for improving the patch --- I'll
try to get those done and post a revised version this evening.

regards, tom lane


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-16 17:21:52
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113113@WEBBASEDDC.webbasedltd.local
Lists: pgsql-patches

Hi Tom,

> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 16 February 2005 15:01
> To: Mark Cave-Ayland
> Cc: pgsql-patches(at)postgresql(dot)org
> Subject: Re: [PATCHES] WIP: bufmgr rewrite per recent discussions
>
>
> "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> > I compiled and tested your patch on a dual Opteron server with 12GB
> > RAM running FC3. Here are the results I get with pgbench with a
> > scale-factor of 10 over an average of 6 runs.
>
> Thanks for posting these results. What -c and -t settings
> were you using with pgbench? (I like to use -c equal to the
> scale factor and -t of at least 1000 ... much less than that
> gives fairly unstable results in my
> experience.)

Actually I was using the defaults, and copied and pasted into Excel to work
out the average across a number of runs ;)

> I think what is probably happening here is the background
> writer is eating too many cycles. As of the patch I posted,
> the bgwriter is still using its 8.0 control parameters, in
> which the minimum scan percentage is 1% of all the buffers
> (so 1000 buffers scanned in each round in your last test) and
> it's willing to write up to 100 dirty buffers per round by
> default. I was looking at this yesterday and thinking it
> seemed clearly excessive. With a default bgwriter_delay of
> 200 msec, this allows the entire buffer array to be scanned
> every 20 sec, so we're in effect keeping the thing under
> constant syncer load.
>
> If you have time to redo your experiment, would you try
> knocking bgwriter_maxpages down to 10 to see if it helps at
> the larger shared_buffer settings?

OK here are some more results with the different settings. These were done
using the following pgbench command line: pgbench -s 10 -c 10 -t 1000 -d
pgbench

CVS tip: shared_buffers = 1000, bgwriter_maxpages = 10

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
tps = 207.315539 (including connections establishing)
tps = 207.417611 (excluding connections establishing)

CVS tip: shared_buffers = 10000, bgwriter_maxpages = 10

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
tps = 356.937045 (including connections establishing)
tps = 357.344680 (excluding connections establishing)

CVS tip: shared_buffers = 100000, bgwriter_maxpages = 10

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
tps = 343.915227 (including connections establishing)
tps = 344.721566 (excluding connections establishing)

CVS tip + bufmgr patch : shared_buffers = 1000, bgwriter_maxpages = 10

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
tps = 206.480332 (including connections establishing)
tps = 206.581087 (excluding connections establishing)

CVS tip + bufmgr patch : shared_buffers = 10000, bgwriter_maxpages = 10

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
tps = 302.185931 (including connections establishing)
tps = 302.430903 (excluding connections establishing)

CVS tip + bufmgr patch : shared_buffers = 100000, bgwriter_maxpages = 10

transaction type: TPC-B (sort of)
scaling factor: 10
number of clients: 10
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
tps = 375.021808 (including connections establishing)
tps = 375.615606 (excluding connections establishing)

Reducing bgwriter_maxpages definitely seems to have helped with the larger
values for shared_buffers. However, during the test I was still seeing large
pauses that occurred at a rate that seemed inversely proportional to the
number of shared buffers. So with shared buffers set to 1000, the pgbench
test would 'pause' roughly every 5s for about 2-3s before continuing as
quickly as before. With shared buffers set to 100000 there were only 2 or 3
2-3s pauses during the entire duration of the test. As a rule of thumb, it
looked like the pauses occurred during update statements of the form "update
a set b = b + 1". Is the bgwriter supposed to eliminate these types of pauses
altogether?

> Since yesterday I've improved my patch by converting the
> bgwriter percentage variable into a float, so that values
> smaller than 1% can be selected, and I've split the two
> variables into four so that people can independently control
> the effort spent on the whole buffer array versus the buffers
> just in front of nextVictimBuffer (see BgBufferSync in the
> patch). I'm not sure how important that is, but I do think
> that the 1% / 100 default settings are way too high for
> larger buffer pools. Once that's in, it will be hard to
> compare the patch directly against CVS tip, so trying it now
> with a smaller maxpages setting for both would be a fairer comparison.
>
> I have another couple of small ideas for improving the patch
> --- I'll try to get those done and post a revised version
> this evening.

OK I'm just about finished for the day now. If you email the details of
exactly which tests/parameters you would like me to run, I'll try and run
them tomorrow morning when I have a few spare minutes.

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-16 17:27:50
Message-ID: 25402.1108574870@sss.pgh.pa.us
Lists: pgsql-patches

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> Reducing bgwriter_maxpages definitely seems to have helped with the larger
> values for shared_buffers. However, during the test I was still seeing large
> pauses that occurred at a rate that seemed inversely proportional to the
> number of shared buffers. So with shared buffers set to 1000, the pgbench
> test would 'pause' roughly every 5s for about 2-3s before continuing as
> quickly as before. With shared buffers set to 100000 there were only 2 or 3
> 2-3s pauses during the entire duration of the test. As a rule of thumb, it
> looked like the pauses occurred during update statements of the form "update
> a set b = b + 1". Is the bgwriter supposed to eliminate these types of pauses
> altogether?

What do you mean by "pause" exactly? pgbench doesn't emit any output
during a run so I'm not sure what you are watching.

regards, tom lane


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-16 17:57:40
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113114@WEBBASEDDC.webbasedltd.local
Lists: pgsql-patches


> -----Original Message-----
> From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> Sent: 16 February 2005 17:28
> To: Mark Cave-Ayland
> Cc: pgsql-patches(at)postgresql(dot)org
> Subject: Re: [PATCHES] WIP: bufmgr rewrite per recent discussions

(cut)

> What do you mean by "pause" exactly? pgbench doesn't emit
> any output during a run so I'm not sure what you are watching.

During pgbench I get output on the console similar to the following:

client 1 receiving
client 1 sending end
client 6 receiving
client 6 sending insert into history(tid,bid,aid,delta,mtime)
values(31,6,295341,938,'now')
client 7 receiving
client 7 sending update branches set bbalance = bbalance + 449 where bid = 8
client 6 receiving
client 6 sending end
..etc..

Normally it scrolls faster than I can keep up with, but as described above the
test then 'pauses' for a few seconds before continuing at its normal pace.
It's especially noticeable at 1000 shared buffers, and it occurs
with/without using your patch. I've just actually checked the logfile and
I'm seeing a few messages similar to below:

LOG: checkpoints are occurring too frequently (7 seconds apart)
HINT: Consider increasing the configuration parameter
"checkpoint_segments".

I've just tried running again with checkpoint_segments set to 8 but that
doesn't seem to make any difference to the pauses. Perhaps it is something
to do with the drives or filesystem? It's a default FC3 install so it's
using ext3 - this is mirrored across 2 SATA drives with software RAID 1.
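
For anyone reproducing this, the relevant knobs live in postgresql.conf;
something like the following reflects what was tried (8.0's default for
checkpoint_segments is 3, and each segment is 16 MB of WAL):

    # postgresql.conf
    checkpoint_segments = 8       # default is 3; more WAL space between checkpoints
    #checkpoint_timeout = 300     # seconds; 8.0 default left untouched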

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Mark Kirkwood <markir(at)coretech(dot)co(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-17 01:05:15
Message-ID: 4213EDCB.4080401@coretech.co.nz
Lists: pgsql-patches

Tom Lane wrote:
> I don't feel this is quite ready to commit, but here it is if anyone
> would like to try some performance testing. Using "pgbench -s 10"
> on a single-CPU machine, I find this code a little slower than CVS tip
> at shared_buffers = 1000, but noticeably faster (~10% speedup) at
> 10000 buffers. So it's not a dead loss for single-CPU anyway. What
> we need now is some performance measurements on multi-CPU boxes.
>
> The bgwriter algorithm probably needs more work, maybe some more GUC
> parameters.
>

Here are some results for a 2 x PIII 700MHz with 2GB RAM running FreeBSD 5.3:

Pgbench: s=10 c=4 t=1000
Pg: wal_buffers=128 checkpoint_segments=10
shared_buffers=1000|10000

Three runs of each combination were averaged. The figure is the tps
including connection time (with the range in brackets).

8.1 CVS

shared_buffers=1000 tps=129 (129-131)
shared_buffers=10000 tps=146 (145-148)

8.1 CVS + buf patch

shared_buffers=1000 tps=135 (131-138)
shared_buffers=10000 tps=154 (154-155)

regards

Mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-17 01:05:50
Message-ID: 1589.1108602350@sss.pgh.pa.us
Lists: pgsql-patches

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> OK I'm just about finished for the day now. If you email the details of
> exactly which tests/parameters you would like me to run, I'll try and run
> them tomorrow morning when I have a few spare minutes.

I've posted another version of the buffer rewrite patch.

Another thing that might be interesting on a multi-CPU Opteron is to try
to make the shared memory layout more friendly to the CPU cache, which I
believe uses 128-byte cache lines. (Simon was planning to try some of
these things but I haven't heard back about results.) Things to try
here include

1. Change ALIGNOF_BUFFER in src/include/pg_config_manual.h to 128.
This will require a full recompile I think. 2 and 3 don't make any
sense until after you do this.

2. Pad the BufferDesc struct (in src/include/storage/buf_internals.h)
out to be exactly 64 or 128 bytes. (64 would make it exactly 2 buffer
headers per cache line, so two CPUs would contend only when working on
a pair of adjacent headers. 128 would mean no cross-header cache
contention but of course it wastes a lot more storage.) You need only
recompile the files in src/backend/storage/buffer/ after changing
buf_internals.h.

3. Pad the LWLock struct (in src/backend/storage/lmgr/lwlock.c) to some
power of 2 up to 128 bytes --- same issue of space wasted versus
cross-lock contention.
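
To make item 2 concrete, here is a minimal, self-contained illustration of
the padding idea. The struct below is a stand-in with made-up fields, not
the real BufferDesc from buf_internals.h, and the patch may well do this
differently:

    #include <assert.h>
    #include <stdio.h>

    /*
     * Item 1 is just a constant change: in src/include/pg_config_manual.h,
     * set ALIGNOF_BUFFER to 128 so the shared buffer arrays start on a
     * cache-line boundary.
     *
     * Item 2: pad each buffer header to a fixed power-of-two size (64 here)
     * so that no header straddles a 128-byte cache line.
     */
    typedef struct ExampleBufDesc
    {
        unsigned int tag_blocknum;   /* stand-in fields, 16 bytes total */
        unsigned int flags;
        int          refcount;
        int          usage_count;
        char         pad[64 - 16];   /* pad the 16 bytes of fields up to 64 */
    } ExampleBufDesc;

    int main(void)
    {
        assert(sizeof(ExampleBufDesc) == 64);   /* two headers per cache line */
        printf("sizeof(ExampleBufDesc) = %zu\n", sizeof(ExampleBufDesc));
        return 0;
    }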

regards, tom lane


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-17 15:37:10
Message-ID: 9EB50F1A91413F4FA63019487FCD251D113118@WEBBASEDDC.webbasedltd.local
Lists: pgsql-patches

Hi Tom,

Here are the results (tps) from your second patch (LH column is including
connection establishments, RH column is excluding connection establishments)
using the same test, i.e. pgbench -s 10 -c 10 -t 1000 -d pgbench

Shared_buffers        1000                       10000                      100000
                incl.        excl.         incl.        excl.         incl.        excl.
                204.909702   205.01051     345.098727   345.411606    375.812059   376.37741
                195.100496   195.197463    348.791481   349.111363    314.718619   315.139878
                199.637965   199.735195    313.561366   313.803225    365.061177   365.666103
                195.935529   196.029082    325.893744   326.171754    370.040623   370.625072
                196.661374   196.756481    314.468751   314.711517    319.643145   320.099164

Mean:           198.4490132  198.5457462   329.5628138  329.841893    349.0551246  349.5815254

Having a percentage of the shared_buffers scanned in each round means that
no extra tweaking is required for higher values of shared_buffers with the
default settings :)

> Another thing that might be interesting on a multi-CPU
> Opteron is to try to make the shared memory layout more
> friendly to the CPU cache, which I believe uses 128-byte
> cache lines. (Simon was planning to try some of these things
> but I haven't heard back about results.) Things to try here include
>
> 1. Change ALIGNOF_BUFFER in src/include/pg_config_manual.h to
> 128. This will require a full recompile I think. 2 and 3
> don't make any sense until after you do this.

OK.

> 2. Pad the BufferDesc struct (in src/include/storage/buf_internals.h)
> out to be exactly 64 or 128 bytes. (64 would make it exactly
> 2 buffer headers per cache line, so two CPUs would contend
> only when working on a pair of adjacent headers. 128 would
> mean no cross-header cache contention but of course it wastes
> a lot more storage.) You need only recompile the files in
> src/backend/storage/buffer/ after changing buf_internals.h.

Here are the results with the padded BufferDesc structure. First here is the
padding to 64 bytes:

Shared_buffers        1000                       10000                      100000
                incl.        excl.         incl.        excl.         incl.        excl.
                206.862511   206.965854    302.316799   302.581089    317.357151   317.791769
                198.881107   198.974454    352.982754   353.319523    368.020383   368.625353
                200.66022    200.756237    319.80475    320.076327    369.440584   370.032709
                202.076089   202.17038     304.278037   304.520488    309.897702   310.332232
                204.511959   204.612334    314.043021   314.29964     318.424781   318.871094

Mean:           202.5983772  202.6958518   318.6850722  318.9594134   336.6281202  337.1306314

And here are the results padding BufferDesc to 128 bytes:

Shared_buffers        1000                       10000                      100000
                incl.        excl.         incl.        excl.         incl.        excl.
                204.071342   204.177755    368.942576   369.298066    373.385305   374.040511
                203.616738   203.717336    365.15145    365.508939    366.837804   367.487877
                206.353662   206.451992    303.231566   303.491979    312.613215   313.086744
                194.403251   194.497714    311.006837   311.250281    309.072588   309.536229
                192.950395   193.040478    334.19558    334.476809    316.284982   316.776723

Mean:           200.2790776  200.377055    336.5056018  336.8052148   335.6387788  336.1856168

As I see it, there is not much noticeable performance gain (and maybe even a
small loss) with the padding included. I suspect that since the drives are
software RAID 1, better drives would be needed to benchmark this properly.

> 3. Pad the LWLock struct (in
> src/backend/storage/lmgr/lwlock.c) to some power of 2 up to
> 128 bytes --- same issue of space wasted versus cross-lock contention.

Having seen the results above, is it still worth looking at this?

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-17 15:46:24
Message-ID: 7346.1108655184@sss.pgh.pa.us
Lists: pgsql-patches

"Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk> writes:
> As I see it, there is not much noticeable performance gain (and maybe even a
> small loss) with the padding included.

Looks that way. Of course we should never trust a single test case very
far, but this suggests that there's not a whole lot of gold to be mined
by padding out the buffer headers.

>> 3. Pad the LWLock struct (in
>> src/backend/storage/lmgr/lwlock.c) to some power of 2 up to
>> 128 bytes --- same issue of space wasted versus cross-lock contention.

> Having seen the results above, is it still worth looking at this?

Yeah, probably, because there are other possible contention sources
besides buffers that might be alleviated by padding LWLocks. For
instance the buffer manager global locks and the LockMgrLock are
probably all in the same cache line at the moment.
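
Purely as an illustration of what that padding looks like (the field list
here is a stand-in, not the real LWLock from lwlock.c), the usual trick is a
union that forces each lock in the array into its own power-of-two slot:

    #include <stdio.h>

    typedef struct ExampleLWLock     /* stand-in fields only */
    {
        int   mutex;                 /* placeholder for the spinlock */
        char  exclusive;
        int   shared;
        void *head;
        void *tail;
    } ExampleLWLock;

    /*
     * Each element of the lock array now occupies its own 128-byte slot,
     * so two hot locks (say, a buffer-manager lock and the LockMgrLock)
     * can no longer share a cache line.
     */
    typedef union ExampleLWLockPadded
    {
        ExampleLWLock lock;
        char          pad[128];
    } ExampleLWLockPadded;

    int main(void)
    {
        printf("sizeof(ExampleLWLockPadded) = %zu\n",
               sizeof(ExampleLWLockPadded));
        return 0;
    }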

Thanks for running these tests.

regards, tom lane


From: "Mark Cave-Ayland" <m(dot)cave-ayland(at)webbased(dot)co(dot)uk>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: WIP: bufmgr rewrite per recent discussions
Date: 2005-02-22 10:14:08
Message-ID: 9EB50F1A91413F4FA63019487FCD251D11312C@WEBBASEDDC.webbasedltd.local
Lists: pgsql-patches


> -----Original Message-----
> From: pgsql-patches-owner(at)postgresql(dot)org
> [mailto:pgsql-patches-owner(at)postgresql(dot)org] On Behalf Of Tom Lane
> Sent: 17 February 2005 15:46
> To: Mark Cave-Ayland
> Cc: pgsql-patches(at)postgresql(dot)org
> Subject: Re: [PATCHES] WIP: bufmgr rewrite per recent discussions

(cut)

> >> 3. Pad the LWLock struct (in
> >> src/backend/storage/lmgr/lwlock.c) to some power of 2 up to
> >> 128 bytes --- same issue of space wasted versus cross-lock
> contention.
>
> > Having seen the results above, is it still worth looking at this?
>
> Yeah, probably, because there are other possible contention
> sources besides buffers that might be alleviated by padding
> LWLocks. For instance the buffer manager global locks and
> the LockMgrLock are probably all in the same cache line at the moment.

Hi Tom,

Here are the results from the LWLock test. As a refresher, the first table
repeats the results with your second patch with no modifications (columns as
before: tps including / excluding connection establishment):

PATCH #2 No modifications

Shared_buffers        1000                       10000                      100000
                incl.        excl.         incl.        excl.         incl.        excl.
                204.909702   205.01051     345.098727   345.411606    375.812059   376.37741
                195.100496   195.197463    348.791481   349.111363    314.718619   315.139878
                199.637965   199.735195    313.561366   313.803225    365.061177   365.666103
                195.935529   196.029082    325.893744   326.171754    370.040623   370.625072
                196.661374   196.756481    314.468751   314.711517    319.643145   320.099164

Mean:           198.4490132  198.5457462   329.5628138  329.841893    349.0551246  349.5815254

Here are the results with ALIGNOF_BUFFER=128 and padding LWLock to 64 bytes:

PATCH #2 with ALIGNOF_BUFFER = 128 and LWLock padded to 64 bytes

Shared_buffers        1000                       10000                      100000
                incl.        excl.         incl.        excl.         incl.        excl.
                199.672932   199.768756    307.051571   307.299088    367.394745   368.016266
                196.443585   196.532912    344.898219   345.204228    375.300921   375.979186
                191.098411   191.185807    329.485633   329.77679     305.413304   305.841889
                201.110132   201.210049    314.219784   314.476356    314.03306    314.477869
                196.615748   196.706032    337.315295   337.62437     370.537538   371.16593

Mean:           196.9881616  197.0807112   326.5941004  326.8761664   346.5359136  347.096228

And finally here are the results with ALIGNOF_BUFFER = 128 and LWLock padded
to 128 bytes:

PATCH #2 with ALIGNOF_BUFFER = 128 and LWLock padded to 128 bytes

Shared_buffers        1000                       10000                      100000
                incl.        excl.         incl.        excl.         incl.        excl.
                195.357405   195.449704    346.916069   347.235895    373.354775   373.934842
                190.428061   190.515077    323.932436   324.211975    361.908206   362.476886
                206.059573   206.159472    338.288825   338.590642    306.22198    306.618689
                195.336711   195.427757    309.316534   309.56603     305.295391   305.695336
                188.896205   188.983969    322.889651   323.245394    377.673313   378.269907

Mean:           195.215591   195.3071958   328.268703   328.5699872   344.890733   345.399132

So again I don't see any performance improvement. However, I did manage to
find out what was causing the 'freezing' I mentioned in my earlier email:
with fsync temporarily set to false in postgresql.conf, the freezing goes
away, so I'm guessing it's something to do with disk/kernel caches and
buffering. Since the drives are software RAID 1 with ext3, I guess that the
server is running I/O bound under load, which is perhaps why padding the
data structures doesn't seem to make much difference. I'm not sure whether
this makes the test results particularly useful though :(

Kind regards,

Mark.

------------------------
WebBased Ltd
South West Technology Centre
Tamar Science Park
Plymouth
PL6 8BT

T: +44 (0)1752 791021
F: +44 (0)1752 791023
W: http://www.webbased.co.uk