testing ProcArrayLock patches

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: testing ProcArrayLock patches
Date: 2011-11-18 15:11:42
Message-ID: CA+Tgmob5j=UmJKCRQZ5yhy6Fqmp+uZWKBVGEggZ3BQfei48L2Q@mail.gmail.com
Lists: pgsql-hackers

We have three patches in the hopper that all have the same goal:
reduce ProcArrayLock contention. They are:

[1] Pavan's patch (subsequently revised by Heikki) to put the "hot"
members of the PGPROC structure into a separate array
http://archives.postgresql.org/message-id/4EB7C4C9.9070309@enterprisedb.com

[2] my FlexLocks patch, and
http://archives.postgresql.org/message-id/CA+Tgmoax_14rbx8Y6mmgvW64gCQL4ZviDzwEObXEMuiV=TwmxQ@mail.gmail.com

[3] my patch to eliminate some snapshots (I think this is also better
semantics, but at any rate it also improves performance)
http://archives.postgresql.org/message-id/CA+TgmoYDe3dx7xuK_rCPLWy7P67hp96ozyGe_K6W87kfx3YCGw@mail.gmail.com

Interestingly, these all try to reduce ProcArrayLock contention in
different ways: [1] does it by making snapshot-taking scan fewer cache
lines, [2] does it by reducing contention for the spinlock protecting
ProcArrayLock, and [3] does it by taking fewer snapshots. So you
might expect the effects of these patches to be at least partly
additive.

Now the first two patches are the ones that seem to show the most
performance improvement, so I tested both patches individually and
also a combination of the two patches (the combined patch for this is
attached, as there were numerous conflicts). I tested them on two
different machines with completely different architectures: Nate
Boley's AMD 6128 box (which has 32 cores) and an HP Integrity server
(also with 32 cores). On Integrity, I compiled using the aCC
compiler, adjusted the resulting binary with chatr +pi L +pd L, and
ran both pgbench and the server with rtsched -s SCHED_NOAGE -p 178,
which are settings that seem to be necessary for good performance on
that platform. pgbench was run locally on the AMD box; for the Integrity
server, it was run from another machine over a high-speed network
interconnect. Both servers were configured with shared_buffers=8GB,
checkpoint_segments=300, wal_writer_delay=20ms, and
synchronous_commit=off. Some of the other settings were different; on
the Integrity server, I had effective_cache_size=340GB,
checkpoint_timeout=30min, and wal_buffers=16MB, while on the AMD box I
had checkpoint_completion_target=0.9 and maintenance_work_mem=1GB. I
doubt that these differences in settings were material (except that they
probably made reinitializing the database between tests take longer on
the Integrity system, since I forgot to set maintenance_work_mem), but
I could double-check that if anyone is concerned about it.

The results are below. In a nutshell, either patch by itself is very,
very good at 24 clients and up (at 16 clients and below, the gains over
master are much smaller), and both patches together are usually somewhat
better. Which one helps more individually is somewhat variable. Lines
marked "m" are
unpatched master as of commit
ff4fd4bf53c5512427f8ecea08d6ca7777efa2c5. "p" is Pavan's PGPROC patch
(maybe I should have said ppp...) as revised by Heikki; "f" is the
latest version of my FlexLocks patch, and "b" is the combination patch
attached herewith. The number immediately following is the number of
clients used, each with its own pgbench thread (i.e. -c N -j N). As
usual, each number is the median of three five-minute runs at scale
factor 100.
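
For anyone who wants to reproduce the procedure, here is a rough sketch
in Python of how a run could be launched and how three runs reduce to a
single number. This is not the script used for these tests. The -T 300
duration is inferred from "five-minute runs", and the database name
"pgbench" and all function names are assumptions made up for
illustration.

```python
import re
import statistics

# Matches the pgbench summary line quoted in the results, e.g.
# "tps = 631.208073 (including connections establishing)"
TPS_RE = re.compile(r"tps = ([0-9.]+) \(including connections establishing\)")


def pgbench_command(clients, dbname="pgbench", seconds=300):
    """Build a pgbench command line with one thread per client (-c N -j N).

    dbname and seconds are assumptions; the post only says "five-minute runs".
    """
    return ["pgbench", "-c", str(clients), "-j", str(clients),
            "-T", str(seconds), dbname]


def extract_tps(output):
    """Return the tps figure from the text output of one pgbench run."""
    match = TPS_RE.search(output)
    if match is None:
        raise ValueError("no 'including connections establishing' tps line found")
    return float(match.group(1))


def median_tps(outputs):
    """Median tps across the outputs of several runs (three in this post)."""
    return statistics.median(extract_tps(text) for text in outputs)
```

Using the median rather than the mean keeps a single unusually slow or
fast run from skewing the reported figure.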

Since Pavan's patch has the advantage of being quite simple, I'm
thinking we should push that one through to completion first, and then
test all the other possible improvements in this area relative to that
new baseline.

== AMD Opteron 6128, 32 cores, Permanent Tables ==

m01 tps = 631.208073 (including connections establishing)
p01 tps = 631.182923 (including connections establishing)
f01 tps = 636.308562 (including connections establishing)
b01 tps = 629.295507 (including connections establishing)
m08 tps = 4516.479854 (including connections establishing)
p08 tps = 4614.772650 (including connections establishing)
f08 tps = 4652.454768 (including connections establishing)
b08 tps = 4679.363474 (including connections establishing)
m16 tps = 7788.615240 (including connections establishing)
p16 tps = 7824.025406 (including connections establishing)
f16 tps = 7841.876146 (including connections establishing)
b16 tps = 7859.334650 (including connections establishing)
m24 tps = 11720.145052 (including connections establishing)
p24 tps = 12782.696214 (including connections establishing)
f24 tps = 12559.765555 (including connections establishing)
b24 tps = 12891.945766 (including connections establishing)
m32 tps = 10223.015618 (including connections establishing)
p32 tps = 11585.902050 (including connections establishing)
f32 tps = 11626.542744 (including connections establishing)
b32 tps = 11866.969986 (including connections establishing)
m80 tps = 7540.482189 (including connections establishing)
p80 tps = 11598.446238 (including connections establishing)
f80 tps = 11529.752081 (including connections establishing)
b80 tps = 11714.364294 (including connections establishing)

== AMD Opteron 6128, 32 cores, Unlogged Tables ==

m01 tps = 680.398630 (including connections establishing)
p01 tps = 673.293390 (including connections establishing)
f01 tps = 679.993953 (including connections establishing)
b01 tps = 679.377600 (including connections establishing)
m08 tps = 4760.964292 (including connections establishing)
p08 tps = 4870.037842 (including connections establishing)
f08 tps = 5028.719509 (including connections establishing)
b08 tps = 4893.439824 (including connections establishing)
m16 tps = 7997.051705 (including connections establishing)
p16 tps = 8218.884377 (including connections establishing)
f16 tps = 8160.373682 (including connections establishing)
b16 tps = 8144.707958 (including connections establishing)
m24 tps = 13066.867858 (including connections establishing)
p24 tps = 14523.109116 (including connections establishing)
f24 tps = 14098.978673 (including connections establishing)
b24 tps = 14526.330294 (including connections establishing)
m32 tps = 10800.711985 (including connections establishing)
p32 tps = 19159.131614 (including connections establishing)
f32 tps = 22224.839905 (including connections establishing)
b32 tps = 23373.672552 (including connections establishing)
m80 tps = 7885.663468 (including connections establishing)
p80 tps = 17760.149440 (including connections establishing)
f80 tps = 19960.356205 (including connections establishing)
b80 tps = 18665.581069 (including connections establishing)

== HP Integrity, 32 cores, Permanent Tables ==

m01 tps = 883.732295 (including connections establishing)
p01 tps = 866.449154 (including connections establishing)
f01 tps = 924.364403 (including connections establishing)
b01 tps = 926.797302 (including connections establishing)
m08 tps = 6098.047731 (including connections establishing)
p08 tps = 6293.537855 (including connections establishing)
f08 tps = 6059.635731 (including connections establishing)
b08 tps = 6250.132288 (including connections establishing)
m16 tps = 9995.755003 (including connections establishing)
p16 tps = 10654.562946 (including connections establishing)
f16 tps = 10258.008496 (including connections establishing)
b16 tps = 10712.776806 (including connections establishing)
m24 tps = 11646.915026 (including connections establishing)
p24 tps = 13483.345338 (including connections establishing)
f24 tps = 12815.456128 (including connections establishing)
b24 tps = 13506.218109 (including connections establishing)
m32 tps = 10433.315312 (including connections establishing)
p32 tps = 14111.719739 (including connections establishing)
f32 tps = 13990.284158 (including connections establishing)
b32 tps = 14697.189751 (including connections establishing)
m80 tps = 8177.428209 (including connections establishing)
p80 tps = 11343.667289 (including connections establishing)
f80 tps = 11651.244256 (including connections establishing)
b80 tps = 12523.308466 (including connections establishing)

== HP Integrity, 32 cores, Unlogged Tables ==

m01 tps = 949.594327 (including connections establishing)
p01 tps = 958.753925 (including connections establishing)
f01 tps = 931.276655 (including connections establishing)
b01 tps = 943.836646 (including connections establishing)
m08 tps = 6211.621726 (including connections establishing)
p08 tps = 6412.267441 (including connections establishing)
f08 tps = 5843.870591 (including connections establishing)
b08 tps = 6428.415940 (including connections establishing)
m16 tps = 10341.538889 (including connections establishing)
p16 tps = 11161.425798 (including connections establishing)
f16 tps = 10545.954472 (including connections establishing)
b16 tps = 11235.441290 (including connections establishing)
m24 tps = 11859.831632 (including connections establishing)
p24 tps = 14380.766878 (including connections establishing)
f24 tps = 13489.351324 (including connections establishing)
b24 tps = 14579.649665 (including connections establishing)
m32 tps = 10716.208372 (including connections establishing)
p32 tps = 15497.819188 (including connections establishing)
f32 tps = 14590.406972 (including connections establishing)
b32 tps = 15991.920288 (including connections establishing)
m80 tps = 8465.159253 (including connections establishing)
p80 tps = 11945.494890 (including connections establishing)
f80 tps = 14676.324769 (including connections establishing)
b80 tps = 15623.109737 (including connections establishing)
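
To compare the figures above against master, something like the
following Python sketch could be used. It assumes the exact line format
shown in this message (label, client count, then the pgbench tps line);
the function names are hypothetical, and the sample input is copied from
the 32-client rows of the AMD unlogged-tables results.

```python
import re

# Label is one of m/p/f/b followed by the client count, then the tps value.
LINE_RE = re.compile(r"^([mpfb])(\d+) tps = ([0-9.]+)")


def parse_results(text):
    """Map (build, clients) -> tps for every result line found in text."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            key = (match.group(1), int(match.group(2)))
            results[key] = float(match.group(3))
    return results


def speedup_vs_master(results):
    """Ratio of each patched build's tps to master's at the same client count."""
    ratios = {}
    for (build, clients), tps in results.items():
        if build != "m" and ("m", clients) in results:
            ratios[(build, clients)] = tps / results[("m", clients)]
    return ratios


SAMPLE = """\
m32 tps = 10800.711985 (including connections establishing)
p32 tps = 19159.131614 (including connections establishing)
f32 tps = 22224.839905 (including connections establishing)
b32 tps = 23373.672552 (including connections establishing)
"""

for (build, clients), ratio in sorted(speedup_vs_master(parse_results(SAMPLE)).items()):
    print(f"{build}{clients:02d}: {ratio:.2f}x master")
```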

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: flexlock-minimal-pgproc-heikki.patch (application/octet-stream, 145.5 KB)
