Storing hot members of PGPROC out of the band

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Storing hot members of PGPROC out of the band
Date: 2011-11-03 18:12:52
Message-ID: CABOikdOceQ42p4PP5qY+nD-JkrjicJsVasgXT3an7TadmeWHUw@mail.gmail.com
Lists: pgsql-hackers

Hi All,

While working on some performance issues on HP-UX, I noticed
significant data cache misses when accessing PGPROC members. On closer
inspection, it was quite evident that for a large number of clients
(even a few tens), the loop inside GetSnapshotData will take a data
cache miss for almost every PGPROC, because the PGPROC structure is
quite heavy and no more than one structure can fit in a single cache
line. So I experimented with separating the most frequently and
closely accessed members of PGPROC into an out-of-band array. I call
it the PGPROC_MINIMAL structure; it contains xid, xmin, and
vacuumFlags, among others. Basically, all the members commonly
accessed by GetSnapshotData find a place in this minimal structure.

When the PGPROC array is allocated, we also allocate another array of
PGPROC_MINIMAL structures of the same size. While accessing the
ProcArray, simple pointer arithmetic gets us the corresponding
PGPROC_MINIMAL structure. The only exception is the dummy PGPROC for
prepared transactions. A first-cut version of the patch is attached.
It looks big, but most of the changes are cosmetic, caused by the
added indirection. The patch also contains another change to keep the
ProcArray sorted by (PGPROC *) to preserve locality of reference while
traversing the array.
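
To make the layout concrete, here is a minimal sketch of the split
(the field list is abridged, and the exact members in the attached
patch may differ):

    /*
     * Hot, frequently-scanned members split out of PGPROC. Many of
     * these pack into a single cache line, so GetSnapshotData touches
     * far fewer cache lines while walking the ProcArray.
     */
    typedef struct PGPROC_MINIMAL
    {
        TransactionId xid;         /* backend's top-level XID, or
                                    * InvalidTransactionId if none */
        TransactionId xmin;        /* minimum running XID as of the
                                    * backend's snapshot */
        uint8         vacuumFlags; /* PROC_* flags */
    } PGPROC_MINIMAL;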

I did some tests on a 32-core IA HP-UX box and the results are quite
good. With a scale factor of 100 and the -N option of pgbench (updates
on the accounts table only), the numbers look something like this:

Clients   HEAD           PGPROC-Patched   Gain
1          1098.488663    1059.830369     -3.52%
4          3569.481435    3663.898254      2.65%
32        11627.059228   16419.864056     41.22%
48        11044.501244   15825.132582     43.29%
64        10432.206525   15408.50304      47.70%
80        10210.57835    15170.614435     48.58%

The numbers are quite reproducible, with a couple of percentage points
of variance. Even for a single client, I sometimes see no degradation.
Here are some more numbers with the normal pgbench tests (without the
-N option):

Clients   HEAD   PGPROC-Patched   Gain
1          743    771              3.77%
4         1821   2315             27.13%
32        8011   9166             14.42%
48        7282   8959             23.03%
64        6742   8937             32.56%
80        6316   8664             37.18%

It's quite possible that the effect of the patch is more pronounced on
the particular hardware I am testing on. But the approach nevertheless
seems reasonable. It would be very useful if someone else with access
to a large box could test the effect of the patch.
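
For anyone wanting to reproduce this, the runs were along these lines
(illustrative; the exact client counts and run lengths varied per row
above):

    pgbench -i -s 100 bench              # initialize at scale factor 100
    pgbench -N -c 32 -j 32 -T 300 bench  # -N skips tellers/branches updates
    pgbench -c 32 -j 32 -T 300 bench     # normal TPC-B-like run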

BTW, since I played with many versions of the patch, the exact numbers
with this version might be a little different from what I posted
above. I will conduct more tests, especially with larger numbers of
clients, and see if there is any difference in the improvement.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com

Attachment Content-Type Size
pgproc-minimal-v13.patch text/x-patch 37.1 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-11-03 22:43:15
Message-ID: CA+U5nMLJ-=nmrPnZEOb=HnsSmQg0VuQYzG+XDwYn3CmtX+drRA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 3, 2011 at 6:12 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

> When the PGPROC array is allocated, we also allocate another array of
> PGPROC_MINIMAL structures of the same size. While accessing the
> ProcArray, simple pointer arithmetic gets us the corresponding
> PGPROC_MINIMAL structure. The only exception is the dummy PGPROC for
> prepared transactions. A first-cut version of the patch is attached.
> It looks big, but most of the changes are cosmetic, caused by the
> added indirection. The patch also contains another change to keep the
> ProcArray sorted by (PGPROC *) to preserve locality of reference while
> traversing the array.

This is very good.

If you look at your PGPROC_MINIMAL, it's mostly transaction-related
stuff, so I would rename it PGXACT or similar. I'm not sure why you
talk about pointer math; it seems easy enough just to have two arrays
protected by one lock, and have each proc use the same offset in both
arrays.
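
Something along these lines (a sketch; the array and type names are
illustrative):

    PGPROC *proc   = &allProcs[i];   /* cold, rarely-scanned members */
    PGXACT *pgxact = &allPgXact[i];  /* hot members, same offset i,
                                      * protected by the same lock */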

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-11-07 10:53:12
Message-ID: CABOikdPSAycvL8TLbG2rg9hXNmzXitxgMjTJYVZdNqeb2MAKVA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 4, 2011 at 4:13 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

>
> If you look at your PGPROC_MINIMAL, it's mostly transaction-related
> stuff, so I would rename it PGXACT or similar.

Yeah, that looks good too. Though I am not sure that all the fields
are related to transaction state, or whether we would need to add more
fields to the structure in the future; a more general name might help
in that case.

> I'm not sure why you talk about pointer math; it seems easy enough
> just to have two arrays protected by one lock, and have each proc use
> the same offset in both arrays.
>

Right now we store PGPROC pointers in the ProcArray, and the pointer
math gets us the index to look into the other array. But we could
instead just store indexes in the ProcArray to avoid that. A positive
index would mean an offset into the normal PGPROC array, and a
negative index could be used to find the dummy PGPROC for a prepared
transaction.
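
A hypothetical accessor following that convention might look like this
(names are illustrative, not taken from the patch):

    static PGPROC *
    ProcArrayEntry(int idx)
    {
        if (idx >= 0)
            return &allProcs[idx];               /* regular backend */
        else
            return &preparedXactProcs[-idx - 1]; /* dummy PGPROC for a
                                                  * prepared transaction */
    }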

Thanks,
Pavan


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-11-07 11:45:13
Message-ID: 4EB7C4C9.9070309@enterprisedb.com
Lists: pgsql-hackers

On 04.11.2011 00:43, Simon Riggs wrote:
> On Thu, Nov 3, 2011 at 6:12 PM, Pavan Deolasee<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>> When the PGPROC array is allocated, we also allocate another array of
>> PGPROC_MINIMAL structures of the same size. While accessing the
>> ProcArray, simple pointer arithmetic gets us the corresponding
>> PGPROC_MINIMAL structure. The only exception is the dummy PGPROC for
>> prepared transactions. A first-cut version of the patch is attached.
>> It looks big, but most of the changes are cosmetic, caused by the
>> added indirection. The patch also contains another change to keep the
>> ProcArray sorted by (PGPROC *) to preserve locality of reference while
>> traversing the array.
>
> This is very good.
>
> If you look at your PGPROC_MINIMAL, it's mostly transaction-related
> stuff, so I would rename it PGXACT or similar. I'm not sure why you
> talk about pointer math; it seems easy enough just to have two arrays
> protected by one lock, and have each proc use the same offset in both
> arrays.

Agreed, that seems cleaner. The PGPROCs for prepared transactions are
currently allocated separately, so they're not currently in the same
array as all the other PGPROCs, but that could be changed. Here's a WIP
patch for that. I kept the PGPROC_MINIMAL nomenclature for now, but I
agree with Simon's suggestion to rename it.

On 03.11.2011 20:12, Pavan Deolasee wrote:
> It's quite possible that the effect of the patch is more pronounced on
> the particular hardware I am testing on. But the approach nevertheless
> seems reasonable. It would be very useful if someone else with access
> to a large box could test the effect of the patch.

I tested this on an 8-core x64 box, but couldn't see any measurable
difference in pgbench performance. I tried with and without -N and -S,
and with --unlogged-tables, but saw no difference.

While looking at GetSnapshotData(), it occurred to me that when a PGPROC
entry does not have its xid set, i.e. xid == InvalidTransactionId, it's
pointless to check the subxid array for that proc: if a transaction has
no xid, none of its subtransactions can have an xid either. That's a
trivial optimization; see attached. I tested this with "pgbench -S -M
prepared -c 500" on the 8-core box, and it seemed to make a 1-2%
difference (without the other patch). So it's almost in the noise, but
it also doesn't cost us anything in terms of readability or otherwise.
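
In GetSnapshotData()'s per-proc loop, the shortcut amounts to something
like this (a sketch of the idea, not the exact hunk in the attached
patch):

    xid = proc->xid;

    /*
     * If the proc has no top-level XID assigned, it cannot have any
     * subtransaction XIDs either, so skip the subxid array entirely.
     */
    if (!TransactionIdIsValid(xid))
        continue;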

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
pgproc-minimal-heikki-4.patch text/x-diff 51.7 KB
subxids-must-have-parent-1.patch text/x-diff 625 bytes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-11-07 12:28:57
Message-ID: CA+TgmoaSnmJnyuih1e1jD=0mkW0t2GV6j4J-9cAjcgaPPW8kCA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 7, 2011 at 6:45 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> While looking at GetSnapshotData(), it occurred to me that when a PGPROC
> entry does not have its xid set, i.e. xid == InvalidTransactionId, it's
> pointless to check the subxid array for that proc: if a transaction has
> no xid, none of its subtransactions can have an xid either. That's a
> trivial optimization; see attached. I tested this with "pgbench -S -M
> prepared -c 500" on the 8-core box, and it seemed to make a 1-2%
> difference (without the other patch). So it's almost in the noise, but
> it also doesn't cost us anything in terms of readability or otherwise.

Oh, that's a good idea. +1 for doing that now, and then we can keep
working on the rest of it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-11-22 17:35:01
Message-ID: CA+TgmoZvpj1RfLu92MdTtvd44bqO56yT6-yo2_WuHQP+z=UxoQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 7, 2011 at 6:45 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Agreed, that seems cleaner. The PGPROCs for prepared transactions are
> currently allocated separately, so they're not currently in the same
> array as all the other PGPROCs, but that could be changed. Here's a WIP
> patch for that. I kept the PGPROC_MINIMAL nomenclature for now, but I
> agree with Simon's suggestion to rename it.

All right, I did that in the attached version, using Simon's suggested
name. I also fixed up various comments (especially in
InitProcGlobal), fixed the --enable-cassert build (which was busted),
and added code to save/restore PreparedXactProcs in EXEC_BACKEND mode
(which I assume is necessary, though the regression tests failed to
fail without it).

I'm wondering about the changes to how globalxmin is computed in
GetSnapshotData(). That seems like it could hurt performance in the
single-client case, and especially in the case where there is one
active connection and lots of idle connections, and I'm wondering how
thoroughly we've tested that particular bit apart from these other
changes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
pgxact.patch application/octet-stream 54.2 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-11-24 13:30:23
Message-ID: CA+TgmoZ4-T0HM_3UNWjpxgGeXv7eEYzG0vj3D9RWFPzio+AdSA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 22, 2011 at 12:35 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'm wondering about the changes to how globalxmin is computed in
> GetSnapshotData().  That seems like it could hurt performance in the
> single-client case, and especially in the case where there is one
> active connection and lots of idle connections, and I'm wondering how
> thoroughly we've tested that particular bit apart from these other
> changes.

I separated this part out - see attached. pgxact-v2.patch is the same
as what I posted on Tuesday with those changes factored out;
recentglobalxmin.patch applies on top of it, and contains just the
changes related to how RecentGlobalXmin is recomputed. I tried a
pgbench -S test, with five-minute runs, scale factor 100,
shared_buffers = 8GB, on Nate Boley's 32-core box. I tried this on
unpatched master, with just pgxact-v2.patch, and with both patches.
At 1, 8, and 32 clients, pgxact-v2 is faster than master, but pgxact-v2
+ recentglobalxmin is slower than pgxact-v2 alone. When I get up to 80
clients, the recentglobalxmin portion becomes a win.

Just to make sure that I wasn't measuring random noise, I did seven
runs with each configuration rather than my usual three. The table
below shows the lowest, median, and highest tps results for each
tested configuration.

== 1 client ==
master: low 4312.806032 median 4343.971972 high 4395.870212
pgxact-v2: low 4391.273988 median 4408.283875 high 4448.315804
pgxact-v2+recentglobalxmin: low 4328.697157 median 4360.581749 high 4399.763953

== 8 clients ==
master: low 33295.56753 median 33356.682726 high 33639.368929
pgxact-v2: low 33719.004950 median 33927.195872 high 34106.679176
pgxact-v2+recentglobalxmin: low 33565.708794 median 33694.140297 high 33795.072927

== 32 clients ==
master: low 210514.460228 median 213168.532917 high 214169.174032
pgxact-v2: low 216230.082361 median 221678.274469 high 231465.732256
pgxact-v2+recentglobalxmin: low 212202.985015 median 219390.075074 high 223353.421472

== 80 clients ==
master: low 208139.915780 median 209867.352113 high 214868.151835
pgxact-v2: low 217003.716877 median 218360.946541 high 222095.321825
pgxact-v2+recentglobalxmin: low 219390.617005 median 220912.168056 high 221779.348531

I'm going to run some more tests, but my thought is that we should
probably leave the recentglobalxmin changes out for the time being,
pending further study and consideration of other alternatives.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
pgxact-v2.patch application/octet-stream 53.0 KB
recentglobalxmin.patch application/octet-stream 2.2 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-11-24 13:54:31
Message-ID: CA+U5nM+yR=eaemrHAdrwWUPkfMfT=jrcZOsmRNnqr2+b_tYQRg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 24, 2011 at 1:30 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I'm going to run some more tests, but my thought is that we should
> probably leave the recentglobalxmin changes out for the time being,
> pending further study and consideration of other alternatives.

Agreed

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-11-25 04:54:23
Message-ID: CABOikdOisBWjpjUTZ=e-EBO+OWi0yiTkmX=jWoWoMiXNbjvXCg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 24, 2011 at 7:24 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Thu, Nov 24, 2011 at 1:30 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> I'm going to run some more tests, but my thought is that we should
>> probably leave the recentglobalxmin changes out for the time being,
>> pending further study and consideration of other alternatives.
>
> Agreed
>

+1. These are independent patches and should be pursued as such.
BTW, I reviewed pgxact-v2.patch and have no objections to it; it looks
good to go in. Thanks, Robert, for making the necessary changes and
for running the benchmark tests.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-11-25 13:19:08
Message-ID: CA+TgmoZ6opWsMTixeHFNuMRE7oNU5-ehefpAr+Y0uH3KecFgsg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Nov 24, 2011 at 11:54 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> +1. These are independent patches and should be pursued as such.
> BTW, I reviewed pgxact-v2.patch and have no objections to it; it looks
> good to go in. Thanks, Robert, for making the necessary changes and
> for running the benchmark tests.

I ran some write tests with master, pgxact-v2, and
pgxact-v2+recentglobalxmin. shared_buffers = 8GB,
maintenance_work_mem = 1GB, synchronous_commit = off,
checkpoint_segments = 300, checkpoint_timeout = 15min,
checkpoint_completion_target = 0.9, wal_writer_delay = 20ms. Seven
five-minute runs at scale factor 100 for each configuration.

Here's the executive summary: on the read-only test,
recentglobalxmin.patch was only a win at the very highest concurrency
I tested (80 clients); on the read-write test, it starts to show a
benefit at much lower concurrencies (certainly 32 clients, and perhaps
even 8 clients on unlogged tables). However, pgxact-v2.patch seems to
be a win on both the read and write tests and at every concurrency
level, including the single-client case.

== 1 client, unlogged tables ==
master: low 671.861618 median 677.324867 high 765.824313 (but the
second highest was only 679.491822)
pgxact-v2: low 663.901614 median 689.496716 high 696.812065
pgxact-v2+recentglobalxmin: low 665.554342 median 685.401979 high 688.832906

== 8 clients, unlogged tables ==
master: low 4722.011063 median 4758.201239 high 4920.130891
pgxact-v2: low 4684.759859 median 4840.081663 high 4979.036845
pgxact-v2+recentglobalxmin: low 4723.743270 median 4856.513904 high 4997.528399

== 32 clients, unlogged tables ==
master: low 10878.959662 median 10901.523672 high 10934.699151
pgxact-v2: low 17944.914228 median 18060.058996 high 19281.541088
pgxact-v2+recentglobalxmin: low 18894.860512 median 19637.190567 high 19817.089456

== 80 clients, unlogged tables ==
master: low 7872.934292 median 7897.811216 high 7909.410723
pgxact-v2: low 12032.684380 median 12397.316995 high 13279.998414
pgxact-v2+recentglobalxmin: low 16964.227483 median 17801.478747 high 18107.646170

== 1 client, permanent tables ==
master: low 625.502929 median 628.442284 high 677.451660
pgxact-v2: low 636.755782 median 640.083573 high 645.273888
pgxact-v2+recentglobalxmin: low 633.320412 median 636.898945 high 637.886099

== 8 clients, permanent tables ==
master: low 4497.012143 median 4624.844801 high 4849.233268
pgxact-v2: low 4561.914897 median 4625.443111 high 4776.095552
pgxact-v2+recentglobalxmin: low 4469.742226 median 4789.249847 high 4824.033794

== 32 clients, permanent tables ==
master: low 10468.362239 median 10511.425102 high 10531.069684
pgxact-v2: low 12821.732396 median 14500.067726 high 14546.538281
pgxact-v2+recentglobalxmin: low 14907.122746 median 15129.665408 high 15186.743199

== 80 clients, permanent tables ==
master: low 7601.067552 median 7612.898321 high 7631.487355
pgxact-v2: low 11712.895410 median 12004.807309 high 12512.078569
pgxact-v2+recentglobalxmin: low 15186.695057 median 15810.452158 high 16166.272699

I don't see much reason to wait around any further on the core patch
(pgxact-v2.patch), so I'll commit that now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-12-16 04:35:46
Message-ID: CA+TgmoY0Pdy9UK3i1Lm_qC7CbsDGBTE+tV_OJee9PN0nZjY0-Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 7, 2011 at 7:28 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Nov 7, 2011 at 6:45 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> While looking at GetSnapshotData(), it occurred to me that when a
>> PGPROC entry does not have its xid set, i.e. xid ==
>> InvalidTransactionId, it's pointless to check the subxid array for
>> that proc: if a transaction has no xid, none of its subtransactions
>> can have an xid either. That's a trivial optimization; see attached.
>> I tested this with "pgbench -S -M prepared -c 500" on the 8-core box,
>> and it seemed to make a 1-2% difference (without the other patch). So
>> it's almost in the noise, but it also doesn't cost us anything in
>> terms of readability or otherwise.
>
> Oh, that's a good idea.  +1 for doing that now, and then we can keep
> working on the rest of it.

I spent some time looking at (and benchmarking) this idea today. I
rearranged that section of code a little more, into what seemed like
the optimal order to avoid doing more work than necessary, but I
wasn't getting much of a gain out of it even on unlogged tables, and
on permanent tables it was looking like a small loss, though I didn't
bother letting the benchmarks run long enough to nail that down
because it didn't seem to amount to much one way or the other. I then
added one more change that helped quite a lot: I introduced a macro
NormalTransactionIdPrecedes() which works like TransactionIdPrecedes()
but (a) is only guaranteed to work if the arguments are known to be
normal transaction IDs (which it also asserts, for safety) and (b) is
a macro rather than a function call. I found three places to use it
inside GetSnapshotData(), and the results with that change look fairly
promising.
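
For reference, a macro along the lines described could look like this
(a sketch matching the description above; the patch's exact definition
may differ):

    /*
     * Like TransactionIdPrecedes(), but both arguments must be normal
     * XIDs (asserted), so the wraparound-aware comparison reduces to a
     * single signed 32-bit subtraction, with no function-call overhead.
     */
    #define NormalTransactionIdPrecedes(id1, id2) \
        (AssertMacro(TransactionIdIsNormal(id1) && \
                     TransactionIdIsNormal(id2)), \
         (int32) ((id1) - (id2)) < 0)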

Nate Boley's box, m = master, s = with the attached patch, median of
three 5-minute runs at scale factor 100, config same as my other
recent tests:

Permanent Tables

m01 tps = 617.734793 (including connections establishing)
s01 tps = 620.330099 (including connections establishing)
m08 tps = 4589.566969 (including connections establishing)
s08 tps = 4545.942588 (including connections establishing)
m16 tps = 7618.842377 (including connections establishing)
s16 tps = 7596.759619 (including connections establishing)
m24 tps = 11770.534809 (including connections establishing)
s24 tps = 11789.424152 (including connections establishing)
m32 tps = 10776.660575 (including connections establishing)
s32 tps = 10643.361817 (including connections establishing)
m80 tps = 11057.353339 (including connections establishing)
s80 tps = 10598.254713 (including connections establishing)

Unlogged Tables

m01 tps = 668.145495 (including connections establishing)
s01 tps = 676.793337 (including connections establishing)
m08 tps = 4715.214745 (including connections establishing)
s08 tps = 4737.833913 (including connections establishing)
m16 tps = 8110.607784 (including connections establishing)
s16 tps = 8192.013414 (including connections establishing)
m24 tps = 14120.753841 (including connections establishing)
s24 tps = 14302.915040 (including connections establishing)
m32 tps = 17886.032656 (including connections establishing)
s32 tps = 18735.319468 (including connections establishing)
m80 tps = 12711.930739 (including connections establishing)
s80 tps = 17592.715471 (including connections establishing)

Now, you might not initially find those numbers all that encouraging.
The unlogged-table numbers are, of course, quite good, especially at
32 and 80 clients, where the gains are quite marked. But the
permanent-table numbers are at best a wash, and maybe a small loss.
Interestingly enough, some recent benchmarking of the FlexLocks patch
showed a similar (though more pronounced) effect:

http://archives.postgresql.org/pgsql-hackers/2011-12/msg00095.php

Now, both the FlexLocks patch and this patch aim to mitigate
contention problems around ProcArrayLock. But they do it in
completely different ways. When I got a small regression on permanent
tables with the FlexLocks patch, I thought the problem was with the
patch itself, which is believable, since it was tinkering with the
LWLock machinery, a fairly global sort of change that you can well
imagine might break something. But it's hard to imagine how that
could possibly be the case here, especially given the speedups on
unlogged tables, because this patch is dead simple and entirely
isolated to GetSnapshotData(). So I have a new theory: on permanent
tables, *anything* that reduces ProcArrayLock contention causes an
approximately equal increase in WALInsertLock contention (or maybe
some other lock), and in some cases that increase in contention
elsewhere can cost more than the amount we're saving here.

On that theory, I'm inclined to think that's not really a problem.
We'll go nuts if we refuse to commit anything until it shows a
meaningful win on every imaginable workload, and it seems like this
can't really be worse than the status quo; any case where it is must
be some kind of artifact. We're better off getting rid of as much
ProcArrayLock contention as possible, rather than keeping it around
because there are corner cases where it decreases contention on some
other lock.

One small detail I'm noticing on further review is that I've slightly
changed the semantics here:

            if (!TransactionIdIsNormal(xid)
                || NormalTransactionIdPrecedes(xmax, xid))
                    continue;

That really ought to be testing <=, not just <. That doesn't seem
like it should affect correctness, though: at worst, we unnecessarily
include one or more XIDs in the snapshot that will be ignored anyway.
I'll fix that and rerun the tests but I don't think it'll make any
difference.
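
In other words, the intent is to skip any xid >= xmax, which with the
macro could be written as (a sketch):

            if (!TransactionIdIsNormal(xid)
                || !NormalTransactionIdPrecedes(xid, xmax))
                    continue;   /* xid >= xmax is implicitly running;
                                 * no need to store it in the snapshot */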

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
getsnapshotdata-tweaks.patch application/octet-stream 2.7 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-12-16 13:34:52
Message-ID: CA+TgmobxeA5g4j2GnpMq4ZsJCiiciQCLOoeO7kEhAa1ExmHobg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 15, 2011 at 11:35 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> One small detail I'm noticing on further review is that I've slightly
> changed the semantics here:
>
>            if (!TransactionIdIsNormal(xid)
>                || NormalTransactionIdPrecedes(xmax, xid))
>                    continue;
>
> That really ought to be testing <=, not just <.  That doesn't seem
> like it should affect correctness, though: at worst, we unnecessarily
> include one or more XIDs in the snapshot that will be ignored anyway.
> I'll fix that and rerun the tests but I don't think it'll make any
> difference.

New results with the attached, updated patch. As predicted, these are
quite similar to the last batch...

Permanent Tables:

m01 tps = 618.460567 (including connections establishing)
s01 tps = 628.394270 (including connections establishing)
m08 tps = 4558.930585 (including connections establishing)
s08 tps = 4490.285074 (including connections establishing)
m16 tps = 7577.677079 (including connections establishing)
s16 tps = 7582.611380 (including connections establishing)
m24 tps = 11556.680518 (including connections establishing)
s24 tps = 11527.982307 (including connections establishing)
m32 tps = 10807.216084 (including connections establishing)
s32 tps = 10871.625992 (including connections establishing)
m80 tps = 10818.092314 (including connections establishing)
s80 tps = 10866.780660 (including connections establishing)

Unlogged Tables:

m01 tps = 670.328782 (including connections establishing)
s01 tps = 818.666971 (including connections establishing)
m08 tps = 4793.337235 (including connections establishing)
s08 tps = 4888.600945 (including connections establishing)
m16 tps = 8016.092785 (including connections establishing)
s16 tps = 8123.217451 (including connections establishing)
m24 tps = 14082.694567 (including connections establishing)
s24 tps = 14114.466246 (including connections establishing)
m32 tps = 17881.340576 (including connections establishing)
s32 tps = 18291.739818 (including connections establishing)
m80 tps = 12767.535657 (including connections establishing)
s80 tps = 17380.055381 (including connections establishing)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
getsnapshotdata-tweaks-v2.patch application/octet-stream 2.7 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-12-17 01:25:51
Message-ID: 201112170125.pBH1PpC07675@momjian.us
Lists: pgsql-hackers

Robert Haas wrote:
> On that theory, I'm inclined to think that's not really a problem.
> We'll go nuts if we refuse to commit anything until it shows a
> meaningful win on every imaginable workload, and it seems like this
> can't really be worse than the status quo; any case where it is must
> be some kind of artifact. We're better off getting rid of as much
> ProcArrayLock contention as possible, rather than keeping it around
> because there are corner cases where it decreases contention on some
> other lock.

Interesting conclusion, and it makes sense. Seems once this is applied
we will have more places to look for contention improvements.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-12-17 02:54:29
Message-ID: CA+TgmobpyrFFcuD2KWtwn+WqDUSpFeFLABjvdcLztvTiSqWOdA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 16, 2011 at 8:25 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> On that theory, I'm inclined to think that's not really a problem.
>> We'll go nuts if we refuse to commit anything until it shows a
>> meaningful win on every imaginable workload, and it seems like this
>> can't really be worse than the status quo; any case where it is must
>> be some kind of artifact. We're better off getting rid of as much
>> ProcArrayLock contention as possible, rather than keeping it around
>> because there are corner cases where it decreases contention on some
>> other lock.
>
> Interesting conclusion, and it makes sense.  Seems once this is applied
> we will have more places to look for contention improvements.

Yeah. The performance results I posted the other day seem to show
that on some of these tests we're thrashing our CLOG buffers, and the
difference between unlogged tables and permanent tables seems to
indicate pretty clearly that WALInsertLock is a huge problem. I'm
going to look more at the CLOG stuff next week, and also keep poking
at ProcArrayLock, where I think there's still room for further
improvement. I am leaving WALInsertLock to Heikki for now, since (1)
I don't want to collide with what he's working on, (2) he knows more
about it than I do, anyway, and (3) it's a really hard problem and I
don't have any particularly good ideas about how to fix it. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jim Nasby <jim(at)nasby(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-12-17 06:00:30
Message-ID: 85C5CFD4-0395-409A-B372-2473BBE5E83C@nasby.net
Lists: pgsql-hackers

On Dec 16, 2011, at 7:25 PM, Bruce Momjian wrote:
> Robert Haas wrote:
>> On that theory, I'm inclined to think that's not really a problem.
>> We'll go nuts if we refuse to commit anything until it shows a
>> meaningful win on every imaginable workload, and it seems like this
>> can't really be worse than the status quo; any case where it is must
>> be some kind of artifact. We're better off getting rid of as much
>> ProcArrayLock contention as possible, rather than keeping it around
>> because there are corner cases where it decreases contention on some
>> other lock.
>
> Interesting conclusion, and it makes sense. Seems once this is applied
> we will have more places to look for contention improvements.

I also wonder how much this throws some previous performance tests into
suspicion. If it's not uncommon for performance-improvement attempts to
shift a bottleneck to a different part of the system and marginally hurt
performance, then we might be throwing away good performance-improvement
ideas before we should...
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Storing hot members of PGPROC out of the band
Date: 2011-12-17 14:00:39
Message-ID: CA+TgmoZqb=bXhOT+u9WBpc-ncuV2fE6E=ugTe5tGUzPwZhWzyg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 17, 2011 at 1:00 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> I also wonder how much this throws some previous performance tests into suspicion. If it's not uncommon for performance improvement attempts to shift a bottleneck to a different part of the system and marginally hurt performance then we might be throwing away good performance improvement ideas before we should...

I think we are (mostly) OK on this point, at least as far as the work
I've been doing. We've actually had a few previous instances of this
phenomenon - e.g. when I first committed my fastlock patch,
performance actually got worse if you had >40 cores doing read-only
queries, because speeding up the lock manager made it possible for the
spinlock protecting SInvalReadLock to mess things up royally.
Nevertheless, we got it committed - and fixed the SInvalReadLock
problem, too. This one is/was somewhat more subtle, but I'm feeling
pretty good about our chances of making at least some further progress
in time for 9.2.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company