Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature

Lists: pgsql-committerspgsql-hackers
From: Kevin Grittner <kgrittn(at)postgresql(dot)org>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Add the "snapshot too old" feature
Date: 2016-04-08 19:45:45
Message-ID: E1aocLh-0006C7-2W@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Add the "snapshot too old" feature

This feature is controlled by a new old_snapshot_threshold GUC. A
value of -1 disables the feature, and that is the default. The
value of 0 is just intended for testing. Above that it is the
number of minutes a snapshot can reach before pruning and vacuum
are allowed to remove dead tuples which the snapshot would
otherwise protect. The xmin associated with a transaction ID does
still protect dead tuples. A connection which is using an "old"
snapshot does not get an error unless it accesses a page modified
recently enough that it might not be able to produce accurate
results.

This is similar to the Oracle feature, and we use the same SQLSTATE
and error message for compatibility.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/848ef42bb8c7909c9d7baa38178d4a209906e7c1

Modified Files
--------------
contrib/bloom/blscan.c | 3 +-
doc/src/sgml/config.sgml | 50 +++
src/backend/access/brin/brin.c | 19 +-
src/backend/access/brin/brin_revmap.c | 11 +-
src/backend/access/gin/ginbtree.c | 9 +-
src/backend/access/gin/gindatapage.c | 7 +-
src/backend/access/gin/ginget.c | 22 +-
src/backend/access/gin/gininsert.c | 2 +-
src/backend/access/gist/gistget.c | 2 +-
src/backend/access/hash/hash.c | 3 +-
src/backend/access/hash/hashsearch.c | 10 +-
src/backend/access/heap/heapam.c | 31 +-
src/backend/access/heap/pruneheap.c | 11 +-
src/backend/access/nbtree/nbtinsert.c | 7 +-
src/backend/access/nbtree/nbtpage.c | 2 +-
src/backend/access/nbtree/nbtsearch.c | 51 ++-
src/backend/access/spgist/spgscan.c | 2 +-
src/backend/commands/vacuum.c | 3 +-
src/backend/commands/vacuumlazy.c | 3 +-
src/backend/storage/buffer/bufmgr.c | 40 ++
src/backend/storage/ipc/ipci.c | 3 +
src/backend/storage/ipc/procarray.c | 9 +
src/backend/storage/lmgr/lwlocknames.txt | 1 +
src/backend/utils/errcodes.txt | 4 +
src/backend/utils/misc/guc.c | 11 +
src/backend/utils/misc/postgresql.conf.sample | 2 +
src/backend/utils/time/snapmgr.c | 404 +++++++++++++++++++++
src/include/access/brin_revmap.h | 5 +-
src/include/access/gin_private.h | 4 +-
src/include/access/nbtree.h | 7 +-
src/include/storage/bufmgr.h | 19 +-
src/include/utils/rel.h | 1 +
src/include/utils/snapmgr.h | 13 +
src/include/utils/snapshot.h | 4 +
src/test/modules/Makefile | 1 +
src/test/modules/snapshot_too_old/Makefile | 47 +++
.../snapshot_too_old/expected/sto_using_cursor.out | 73 ++++
.../snapshot_too_old/expected/sto_using_select.out | 55 +++
.../snapshot_too_old/specs/sto_using_cursor.spec | 37 ++
.../snapshot_too_old/specs/sto_using_select.spec | 36 ++
src/test/modules/snapshot_too_old/sto.conf | 3 +
41 files changed, 942 insertions(+), 85 deletions(-)


From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Cc: "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add the "snapshot too old" feature
Date: 2016-04-08 21:52:19
Message-ID: CACjxUsP=-PHqPsDLY1t06NOc8g4L=eu_7D=mOMhkMsjCyy3cig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Fri, Apr 8, 2016 at 2:45 PM, Kevin Grittner <kgrittn(at)postgresql(dot)org> wrote:
> Add the "snapshot too old" feature

> src/test/modules/Makefile | 1 +
> src/test/modules/snapshot_too_old/Makefile | 47 +++
> .../snapshot_too_old/expected/sto_using_cursor.out | 73 ++++
> .../snapshot_too_old/expected/sto_using_select.out | 55 +++
> .../snapshot_too_old/specs/sto_using_cursor.spec | 37 ++
> .../snapshot_too_old/specs/sto_using_select.spec | 36 ++
> src/test/modules/snapshot_too_old/sto.conf | 3 +

I see that there are failures on buildfarm Windows machines.
Makefiles are not my strong suit, and I don't have access to a
Windows machine for builds. I'll start digging, but if someone can
help me with that, it would be much appreciated.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)postgresql(dot)org>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Add the "snapshot too old" feature
Date: 2016-04-08 21:53:35
Message-ID: 925.1460152415@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)postgresql(dot)org> writes:
> Add the "snapshot too old" feature

This has broken the MSVC build, evidently because you didn't
connect up the new test module properly.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add the "snapshot too old" feature
Date: 2016-04-08 22:04:55
Message-ID: 5493.1460153095@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)gmail(dot)com> writes:
> I see that there are failures on buildfarm Windows machines.
> Makefiles are not my strong suit, and I don't have access to a
> Windows machine for builds. I'll start digging, but if someone can
> help me with that, it would be much appreciated.

I think you need to add snapshot_too_old to @contrib_excludes in
Mkvcbuild.pm to stop that file from thinking that this module
contains anything to build. Not sure if anything else needs to be
done to get the MSVC script to run the test the module embodies,
but that would at least let it get further than it's getting now.

regards, tom lane


From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>
Subject: Re: pgsql: Add the "snapshot too old" feature
Date: 2016-04-08 22:31:46
Message-ID: CACjxUsNfiDC2SE9VwsGWF3mLoLbCoDXZSWu2W9WgTsBLOQfWzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Fri, Apr 8, 2016 at 5:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Kevin Grittner <kgrittn(at)gmail(dot)com> writes:
>> I see that there are failures on buildfarm Windows machines.
>> Makefiles are not my strong suit, and I don't have access to a
>> Windows machine for builds. I'll start digging, but if someone can
>> help me with that, it would be much appreciated.
>
> I think you need to add snapshot_too_old to @contrib_excludes in
> Mkvcbuild.pm to stop that file from thinking that this module
> contains anything to build. Not sure if anything else needs to be
> done to get the MSVC script to run the test the module embodies,
> but that would at least let it get further than it's getting now.

Thanks! Pushed.

We'll see what happens now....

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature
Date: 2016-04-09 02:28:55
Message-ID: 20160409022855.GA728723@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner wrote:
> Add the "snapshot too old" feature
>
> This feature is controlled by a new old_snapshot_threshold GUC. A
> value of -1 disables the feature, and that is the default. The
> value of 0 is just intended for testing. Above that it is the
> number of minutes a snapshot can reach before pruning and vacuum
> are allowed to remove dead tuples which the snapshot would
> otherwise protect. The xmin associated with a transaction ID does
> still protect dead tuples. A connection which is using an "old"
> snapshot does not get an error unless it accesses a page modified
> recently enough that it might not be able to produce accurate
> results.

I think this formulation of TestForOldSnapshot as returning the Page it
checks is a bit strange; you seem to have done it that way only to be
able to write BufferGetPage in a reasonable manner. I vote for changing
both those macros into inline functions instead, pursuant to
https://www.postgresql.org/message-id/20160409020835.GA727750%40alvherre.pgsql
and have TestForOldSnapshot return void.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)postgresql(dot)org>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Add the "snapshot too old" feature
Date: 2016-04-10 15:06:42
Message-ID: 17585.1460300802@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner <kgrittn(at)postgresql(dot)org> writes:
> Add the "snapshot too old" feature

I am fairly certain that this:

typedef struct OldSnapshotControlData *OldSnapshotControl;

static volatile OldSnapshotControl oldSnapshotControl;

does not do what you intended. That causes the pointer variable
to be marked volatile, not what it points to. You need to write

static volatile OldSnapshotControlData *oldSnapshotControl;

to cause oldSnapshotControl to be understood as a pointer to
something volatile.

I think you could lose the OldSnapshotControl typedef altogether;
it's practically unused, it's confusingly similar to the name of
the pointer variable, and if anyplace did use it (eg to declare
a local-variable copy of oldSnapshotControl) it would be wrong
because of the same misplacement of the volatile property as here.

It's possible that you could instead fix this by putting the volatile
marker inside the typedef, but I'm not enough of a C language lawyer
to be sure that that works in the way you need; and since we don't
rely on such a thing anyplace else, I would be hesitant to start here.

regards, tom lane


From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Kevin Grittner <kgrittn(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>
Cc: pgsql-committers(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature
Date: 2016-04-11 13:20:45
Message-ID: CAPpHfdtMONZFOXSsw1HkrD9Eb4ozF8Q8oCqH4tkpH_girJPPuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin,

This commit makes me very uneasy. I didn't much care about this patch
mainly because I didn't imagine its consequences. Now, I see following:

1) We uglify buffer manager interface a lot. This patch adds 3 more
arguments to BufferGetPage(). It looks weird. Should we try to find less
invasive way for doing this?
2) Did you ever try to examine performance regression? I tried simple read
only test on 4x18 Intel server.
pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data
fits to shared_buffers)

master - 193 740.3 TPS
snapshot too old reverted - 1 065 732.6 TPS

So, for read-only benchmark this patch introduces more than 5 times
regression on big machine.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Kevin Grittner <kgrittn(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature
Date: 2016-04-11 13:27:30
Message-ID: 20160411132730.GA836503@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Alexander Korotkov wrote:
> Kevin,
>
> This commit makes me very uneasy. I didn't much care about this patch
> mainly because I didn't imagine its consequences. Now, I see following:
>
> 1) We uglify buffer manager interface a lot. This patch adds 3 more
> arguments to BufferGetPage(). It looks weird. Should we try to find less
> invasive way for doing this?

Kevin's patch was much less invasive originally. It became invasive in
the course of later review -- there was fear that future code would
"forget" the snapshot check accidentally, which would have disastrous
effects (data becomes invisible without notice).

> 2) Did you ever try to examine performance regression? I tried simple read
> only test on 4x18 Intel server.
> pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data
> fits to shared_buffers)
>
> master - 193 740.3 TPS
> snapshot too old reverted - 1 065 732.6 TPS
>
> So, for read-only benchmark this patch introduces more than 5 times
> regression on big machine.

Wow, this is terrible, but the patch is supposed to have no impact when
the feature is not in use. Maybe there's some simple oversight that can
be fixed.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <kgrittn(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature
Date: 2016-04-11 14:49:00
Message-ID: CAPpHfduQf6N03oaa=NQc3AR7BvrB-Xz2xo8Ss5uEGByy2qPWQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Mon, Apr 11, 2016 at 4:27 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
wrote:

> Alexander Korotkov wrote:
> > Kevin,
> >
> > This commit makes me very uneasy. I didn't much care about this patch
> > mainly because I didn't imagine its consequences. Now, I see following:
> >
> > 1) We uglify buffer manager interface a lot. This patch adds 3 more
> > arguments to BufferGetPage(). It looks weird. Should we try to find
> less
> > invasive way for doing this?
>
> Kevin's patch was much less invasive originally. It became invasive in
> the course of later review -- there was fear that future code would
> "forget" the snapshot check accidentally, which would have disastrous
> effects (data becomes invisible without notice).
>

OK, I will read that thread and try to verify these thoughts.

> > 2) Did you ever try to examine performance regression? I tried simple
> read
> > only test on 4x18 Intel server.
> > pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data
> > fits to shared_buffers)
> >
> > master - 193 740.3 TPS
> > snapshot too old reverted - 1 065 732.6 TPS
> >
> > So, for read-only benchmark this patch introduces more than 5 times
> > regression on big machine.
>
> Wow, this is terrible, but the patch is supposed to have no impact when
> the feature is not in use. Maybe there's some simple oversight that can
> be fixed.

Perf show that 50% of time is spent in s_lock() called from
GetXLogInsertRecPtr() called from GetSnapshotData(). I think at very least
we should at least surround with "if" checking that "snapshot too old" is
enabled.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Kevin Grittner <kgrittn(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: Add the "snapshot too old" feature
Date: 2016-04-11 17:31:09
Message-ID: CACjxUsMwh980686LfmqGOGv58XsTpyUqqHPcsD3NGF5CrnXs+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Mon, Apr 11, 2016 at 8:20 AM, Alexander Korotkov
<a(dot)korotkov(at)postgrespro(dot)ru> wrote:

> 1) We uglify buffer manager interface a lot. This patch adds 3 more
> arguments to BufferGetPage(). It looks weird. Should we try to find less
> invasive way for doing this?

As already pointed out, I originally touched about 450 fewer places
in the code, and did not change the signature of BufferGetPage().
I was torn on the argument that we needed a "forced choice" on
checking the snapshot age built into BufferGetPage() -- it is a bit
annoying, but it might prevent a later bug which could silently
return incorrect data. In the end, I caved to the argument that
the annoyance was worth the improved chances of avoiding such a
bug.

> 2) Did you ever try to examine performance regression?

Yes. Our customer used big machines for extensive performance
testing -- although they used "paced" input to simulate real users,
and saw nothing but gains. On my own i7 box, your test scaled to
100 (so it would fit in memory) yielded this:

unpatched:

number of transactions actually processed: 40534737
latency average = 0.739 ms
latency stddev = 0.359 ms
tps = 134973.430374 (including connections establishing)
tps = 135039.395041 (excluding connections establishing)

with the "snapshot too old" patch:

number of transactions actually processed: 40679654
latency average = 0.735 ms
latency stddev = 0.486 ms
tps = 135463.614244 (including connections establishing)
tps = 135866.160933 (excluding connections establishing)

> I tried simple read
> only test on 4x18 Intel server.
> pgbench -s 1000 -c 140 -j 100 -M prepared -S -T 300 -P 1 postgres (data fits
> to shared_buffers)
>
> master - 193 740.3 TPS
> snapshot too old reverted - 1 065 732.6 TPS
>
> So, for read-only benchmark this patch introduces more than 5 times
> regression on big machine.

I did not schedule a saturation test on a big machine. I guess I
should have done.

I'm confident this can be fixed; your suggestion for a wrapping
"if" is probably sufficient. I am looking at this and the misuse
of "volatile" now.

Are you able to easily test that or should I book time on one (or
more) of our big machines on my end?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Cc: Kevin Grittner <kgrittn(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: Add the "snapshot too old" feature
Date: 2016-04-12 18:57:26
Message-ID: CACjxUsMZtM-qXZCfGovNYX95qxLRGq5sYPNaOPqft7iTxjG1sA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Mon, Apr 11, 2016 at 12:31 PM, Kevin Grittner <kgrittn(at)gmail(dot)com> wrote:
> On Mon, Apr 11, 2016 at 8:20 AM, Alexander Korotkov

>> So, for read-only benchmark this patch introduces more than 5 times
>> regression on big machine.
>
> I did not schedule a saturation test on a big machine. I guess I
> should have done.

I have booked two large NUMA machines for Friday to look for any
remaining performance regressions on high-concurrency loads on such
machines. Anyone with ideas on particular tests that they feel
should be included, please let me know before then.

Thanks!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, Kevin Grittner <kgrittn(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Robert Haas <robert(dot)haas(at)enterprisedb(dot)com>, "pgsql-committers(at)postgresql(dot)org" <pgsql-committers(at)postgresql(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pgsql: Add the "snapshot too old" feature
Date: 2016-04-12 18:59:06
Message-ID: 20160412185906.gx6qosdnag2efhgt@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On 2016-04-12 13:57:26 -0500, Kevin Grittner wrote:
> I have booked two large NUMA machines for Friday to look for any
> remaining performance regressions on high-concurrency loads on such
> machines. Anyone with ideas on particular tests that they feel
> should be included, please let me know before then.

The worst case is probably statements which are utterly trivial to
execute, but do require a snapshot both during planning and
execution. Like e.g. SELECT 1;. That's probably a good thing to get
started with.

Greetings,

Andres Freund


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature
Date: 2016-08-25 19:56:09
Message-ID: 20160825195609.GA273578@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner wrote:
> Add the "snapshot too old" feature

> src/backend/access/gin/ginbtree.c | 9 +-
> src/backend/access/gin/gindatapage.c | 7 +-
> src/backend/access/gin/ginget.c | 22 +-
> src/backend/access/gin/gininsert.c | 2 +-

I'm wondering about the TestForOldSnapshot call in scanPendingInsert().
Why do we apply it to the metapage buffer (line 1717 in master)?
Shouldn't we apply it to the pending-list pages themselves only, if any?
(If there are no pending-list pages, surely the age of the snapshot used
to read the metapage doesn't matter; and if there are, then the age of
the pending-list pages will fail the test.)

FWIW I like the "revert" commit, because it easily shows me in what
places you considered a snapshot-too-old test and decided not to add
one. Bare BufferGetPage calls (the current situation) don't indicate that.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <kgrittn(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature
Date: 2016-08-25 20:19:25
Message-ID: CACjxUsN-EXf4u5=76Co1kto-KG=mOj63=taO7ioo4qQ6jBkwew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Thu, Aug 25, 2016 at 2:56 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Kevin Grittner wrote:
>> Add the "snapshot too old" feature
>
>> src/backend/access/gin/ginbtree.c | 9 +-
>> src/backend/access/gin/gindatapage.c | 7 +-
>> src/backend/access/gin/ginget.c | 22 +-
>> src/backend/access/gin/gininsert.c | 2 +-
>
> I'm wondering about the TestForOldSnapshot call in scanPendingInsert().
> Why do we apply it to the metapage buffer (line 1717 in master)?
> Shouldn't we apply it to the pending-list pages themselves only, if any?
> (If there are no pending-list pages, surely the age of the snapshot used
> to read the metapage doesn't matter; and if there are, then the age of
> the pending-list pages will fail the test.)
>
>
> FWIW I like the "revert" commit, because it easily shows me in what
> places you considered a snapshot-too-old test and decided not to add
> one. Bare BufferGetPage calls (the current situation) don't indicate that.

What about the state after pending-list entries are applied? Would
the change you suggest still run across a page with an appropriate
LSN?

I will go review what I did in the gin AM, but wanted to put that
question out there first...

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <kgrittn(at)postgresql(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature
Date: 2016-08-25 21:38:38
Message-ID: CACjxUsMTNGi_SfeFbxNy+GneyJfA7VZNaTqxdSox2dGnpOJvcw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Thu, Aug 25, 2016 at 2:56 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:

> I'm wondering about the TestForOldSnapshot call in scanPendingInsert().
> Why do we apply it to the metapage buffer (line 1717 in master)?

If there is any chance that GinPageGetMeta(page)->head could have
changed from a valid block number to InvalidBlockNumber or a
different pending-list page due to a vacuum freeing pages from the
pending-list, the metapage must be checked -- there is no other way
to detect that data might have disappeared.

> Shouldn't we apply it to the pending-list pages themselves only, if any?

If the pending-list pages are never freed, we could drop the
metapage test.

> (If there are no pending-list pages, surely the age of the snapshot used
> to read the metapage doesn't matter; and if there are, then the age of
> the pending-list pages will fail the test.)

True as long as no pending-list page is ever removed from the
pending-list. If we take out the test, it might be worth a comment
explaining why it's not needed and that it will be needed if
someone modifies the AM to recycle empty pages from the list for
reuse.

> FWIW I like the "revert" commit, because it easily shows me in what
> places you considered a snapshot-too-old test and decided not to add
> one. Bare BufferGetPage calls (the current situation) don't indicate that.

I'm glad there is some value from having done that little dance. :-)

Thanks for looking this over so closely!

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature
Date: 2016-08-25 21:51:45
Message-ID: 20160825215145.GA290110@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Kevin Grittner wrote:
> On Thu, Aug 25, 2016 at 2:56 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
>
> > I'm wondering about the TestForOldSnapshot call in scanPendingInsert().
> > Why do we apply it to the metapage buffer (line 1717 in master)?
>
> If there is any chance that GinPageGetMeta(page)->head could have
> changed from a valid block number to InvalidBlockNumber or a
> different pending-list page due to a vacuum freeing pages from the
> pending-list, the metapage must be checked -- there is no other way
> to detect that data might have disappeared.

Hmm ... but the disappearing data is moved to regular GIN pages, right?
It doesn't just go away. I suppose that the error would be raised as
soon as we scan a GIN data page that, because of receiving data from the
pending list, has a newer LSN. I don't know GIN in detail but perhaps
it's possible that the data is inserted into data pages in lsn A, then
removed from the pending list in lsn B (where A < B). If the snapshot's
LSN lies in between, a spurious error would be raised.

> > FWIW I like the "revert" commit, because it easily shows me in what
> > places you considered a snapshot-too-old test and decided not to add
> > one. Bare BufferGetPage calls (the current situation) don't indicate that.
>
> I'm glad there is some value from having done that little dance. :-)

I'm sure that was an annoying thing to do, so I agree.

> Thanks for looking this over so closely!

You're welcome. I'm not actually reviewing this patch specifically,
just trying to figure out a different new feature and reading a few AMs
code while at it.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Kevin Grittner <kgrittn(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Add the "snapshot too old" feature
Date: 2016-08-26 12:34:32
Message-ID: CACjxUsPgcxoZYjQPOkj1sSDWwP4Q630=6hY2RL8fz1TVuOY07w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Thu, Aug 25, 2016 at 4:51 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Kevin Grittner wrote:
>> On Thu, Aug 25, 2016 at 2:56 PM, Alvaro Herrera
>> <alvherre(at)2ndquadrant(dot)com> wrote:
>>
>> > I'm wondering about the TestForOldSnapshot call in scanPendingInsert().
>> > Why do we apply it to the metapage buffer (line 1717 in master)?
>>
>> If there is any chance that GinPageGetMeta(page)->head could have
>> changed from a valid block number to InvalidBlockNumber or a
>> different pending-list page due to a vacuum freeing pages from the
>> pending-list, the metapage must be checked -- there is no other way
>> to detect that data might have disappeared.
>
> Hmm ... but the disappearing data is moved to regular GIN pages, right?
> It doesn't just go away. I suppose that the error would be raised as
> soon as we scan a GIN data page that, because of receiving data from the
> pending list, has a newer LSN.

That would cover things as long as data is always moved from the
pending list to a data page before it is vacuumed away. If that is
true, there is no need to check the metapage *or* the pending list
-- but I'm pretty skeptical that this is the case. The real
question is whether pages are ever removed from the pending list.

> I don't know GIN in detail but perhaps
> it's possible that the data is inserted into data pages in lsn A, then
> removed from the pending list in lsn B (where A < B). If the snapshot's
> LSN lies in between, a spurious error would be raised.

The implementation does allow false positives and slightly less
aggressive early cleanup than could be achieved -- in order to
avoid the extra locking that would be needed to achieve higher
precision. Since I expect that the threshold will usually be set
to at least a couple hours, these effects should have minimal
impact.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company