Re: Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records

Lists: pgsql-committerspgsql-hackers
From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Reduce spurious Hot Standby conflicts from never-visible records
Date: 2010-12-10 07:00:42
Message-ID: E1PQwyA-0006L7-EO@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Reduce spurious Hot Standby conflicts from never-visible records.
Hot Standby conflicts only with tuples that were visible at
some point. So ignore tuples from aborted transactions or for
tuples updated/deleted during the inserting transaction when
generating the conflict transaction ids.

Following detailed analysis and test case by Noah Misch.
Original report covered btree delete records, correctly observed
by Heikki Linnakangas that this applies to other cases also.
Fix covers all sources of cleanup records via common code.
Includes additional fix compared to commit on HEAD

Branch
------
REL9_0_STABLE

Details
-------
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a804a23e7af0e075b88e7b03bfd9b0f22d2657b2

Modified Files
--------------
src/backend/access/heap/heapam.c | 31 +++++++++++++++++++++++--------
src/backend/access/heap/pruneheap.c | 1 -
src/backend/access/nbtree/nbtxlog.c | 19 +++++--------------
3 files changed, 28 insertions(+), 23 deletions(-)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-committers(at)postgresql(dot)org
Subject: Re: pgsql: Reduce spurious Hot Standby conflicts from never-visible records
Date: 2010-12-10 19:21:04
Message-ID: 27713.1292008864@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> Reduce spurious Hot Standby conflicts from never-visible records.
> Hot Standby conflicts only with tuples that were visible at
> some point. So ignore tuples from aborted transactions or for
> tuples updated/deleted during the inserting transaction when
> generating the conflict transaction ids.

> Following detailed analysis and test case by Noah Misch.
> Original report covered btree delete records, correctly observed
> by Heikki Linnakangas that this applies to other cases also.
> Fix covers all sources of cleanup records via common code.
> Includes additional fix compared to commit on HEAD

ISTM HeapTupleHeaderAdvanceLatestRemovedXid is still pretty broken,
in that it's examining xmax without having checked that xmax is (a)
valid or (b) a lock rather than a deletion xmax.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records
Date: 2010-12-11 21:03:23
Message-ID: 4D03E71B.9080801@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

(Moving to pgsql-hackers)

On 10.12.2010 20:21, Tom Lane wrote:
> Simon Riggs<simon(at)2ndQuadrant(dot)com> writes:
>> Reduce spurious Hot Standby conflicts from never-visible records.
>> Hot Standby conflicts only with tuples that were visible at
>> some point. So ignore tuples from aborted transactions or for
>> tuples updated/deleted during the inserting transaction when
>> generating the conflict transaction ids.
>
>> Following detailed analysis and test case by Noah Misch.
>> Original report covered btree delete records, correctly observed
>> by Heikki Linnakangas that this applies to other cases also.
>> Fix covers all sources of cleanup records via common code.
>> Includes additional fix compared to commit on HEAD
>
> ISTM HeapTupleHeaderAdvanceLatestRemovedXid is still pretty broken,
> in that it's examining xmax without having checked that xmax is (a)
> valid or (b) a lock rather than a deletion xmax.

In current use, it's only called for tuples that are known to be dead,
so either xmax is a valid deletion, or xmin didn't commit in which case
the function doesn't use xmax for anything. So I think it actually works
as it is.

I agree it doesn't look right, though. At the very least it needs
comments explaining that, but preferably it should do something sane
when faced with a tuple that's not dead after all. Perhaps throw an
error (though that would be bad during recovery), or an Assert, or just
refrain from advancing latestRemovedXid (or advance it, that would be
the conservative stance given the current use).

Also, I'm not totally convinced it's correct when xmin > xmax, despite
Simon's follow-up commit to fix that. Shouldn't it advance
latestRemovedXid to xmin in that case? Or maybe it's ok as it is because
we know that xmax committed after xmin. The impression I get from the
comment above the function now is that it advances latestRemovedXid to
the highest XID present in the tuple, but that's not what it does in the
xmin > xmax case. That comment needs clarification.

While we're at it, perhaps it would be better to move this function to
tqual.c. And I feel that a more natural interface would be something like:

TransactionId
HeapTupleHeaderGetLatestRemovedXid(HeapTupleHeader tuple);

IOW, instead bumping up the passed-in latestRemovedXid value, return the
highest XID on the tuple (if it was dead).

PS. it would be good to set hint bits in that function like in
HeapTupleSatisfies* functions.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records
Date: 2010-12-12 10:15:40
Message-ID: 1292148940.2737.1530.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sat, 2010-12-11 at 22:03 +0100, Heikki Linnakangas wrote:
> (Moving to pgsql-hackers)
>
> On 10.12.2010 20:21, Tom Lane wrote:
> > Simon Riggs<simon(at)2ndQuadrant(dot)com> writes:
> >> Reduce spurious Hot Standby conflicts from never-visible records.
> >> Hot Standby conflicts only with tuples that were visible at
> >> some point. So ignore tuples from aborted transactions or for
> >> tuples updated/deleted during the inserting transaction when
> >> generating the conflict transaction ids.
> >
> >> Following detailed analysis and test case by Noah Misch.
> >> Original report covered btree delete records, correctly observed
> >> by Heikki Linnakangas that this applies to other cases also.
> >> Fix covers all sources of cleanup records via common code.
> >> Includes additional fix compared to commit on HEAD
> >
> > ISTM HeapTupleHeaderAdvanceLatestRemovedXid is still pretty broken,
> > in that it's examining xmax without having checked that xmax is (a)
> > valid or (b) a lock rather than a deletion xmax.
>
> In current use, it's only called for tuples that are known to be dead,
> so either xmax is a valid deletion, or xmin didn't commit in which case
> the function doesn't use xmax for anything. So I think it actually works
> as it is.

Well, I think you're both right.

The function shouldn't be called in places where xmax is the wrong
flavour, but there should be specific safeguards in case of mistake.

> I agree it doesn't look right, though. At the very least it needs
> comments explaining that, but preferably it should do something sane
> when faced with a tuple that's not dead after all. Perhaps throw an
> error (though that would be bad during recovery), or an Assert, or just
> refrain from advancing latestRemovedXid (or advance it, that would be
> the conservative stance given the current use).

Yes

> Also, I'm not totally convinced it's correct when xmin > xmax, despite
> Simon's follow-up commit to fix that. Shouldn't it advance
> latestRemovedXid to xmin in that case? Or maybe it's ok as it is because
> we know that xmax committed after xmin. The impression I get from the
> comment above the function now is that it advances latestRemovedXid to
> the highest XID present in the tuple, but that's not what it does in the
> xmin > xmax case. That comment needs clarification.

Hmmm, my earlier code took xmax only if xmax > xmin. That was wrong;
what I have now is better, but your point is there may be an even better
truth. I'll think on that a little more.

> While we're at it, perhaps it would be better to move this function to
> tqual.c. And I feel that a more natural interface would be something like:
>
> TransactionId
> HeapTupleHeaderGetLatestRemovedXid(HeapTupleHeader tuple);
>
> IOW, instead bumping up the passed-in latestRemovedXid value, return the
> highest XID on the tuple (if it was dead).
>
> PS. it would be good to set hint bits in that function like in
> HeapTupleSatisfies* functions.

I'm not that happy with refactoring inside a release, plus I'm not even
sure if that is the right way.

I suspect the best way would be to do this as a side-effect of
HeapSatisfiesVacuum(), since this processing should only ever be done in
conjunction with that function.

Will respond later today on those thoughts.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records
Date: 2010-12-14 00:28:26
Message-ID: 1292286507.2737.3574.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sun, 2010-12-12 at 10:15 +0000, Simon Riggs wrote:
> > Also, I'm not totally convinced it's correct when xmin > xmax,
> despite
> > Simon's follow-up commit to fix that. Shouldn't it advance
> > latestRemovedXid to xmin in that case? Or maybe it's ok as it is
> because
> > we know that xmax committed after xmin. The impression I get from
> the comment above the function now is that it advances
> latestRemovedXid to
> > the highest XID present in the tuple, but that's not what it does in
> the xmin > xmax case. That comment needs clarification.
>
> Hmmm, my earlier code took xmax only if xmax > xmin. That was wrong;
> what I have now is better, but your point is there may be an even
> better
> truth. I'll think on that a little more.

This has a stranger answer than I was expecting.

HeapTupleSatisfiesVacuum() shows there is no interaction between the
xmin of a tuple and the point at which it is removed, if the xmin
transaction commits.

So the hot standby conflict depends only upon the xmax, meaning that
xmin > xmax is not a problem.

So no immediate change to the code is warranted, on that specific point.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records
Date: 2011-01-04 04:13:04
Message-ID: AANLkTi=D+m36jh9oApnqc6P2U64ibP92s_cDHk_A2vY7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Sun, Dec 12, 2010 at 5:15 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Sat, 2010-12-11 at 22:03 +0100, Heikki Linnakangas wrote:
>> (Moving to pgsql-hackers)
>>
>> On 10.12.2010 20:21, Tom Lane wrote:
>> > Simon Riggs<simon(at)2ndQuadrant(dot)com>  writes:
>> >> Reduce spurious Hot Standby conflicts from never-visible records.
>> >> Hot Standby conflicts only with tuples that were visible at
>> >> some point. So ignore tuples from aborted transactions or for
>> >> tuples updated/deleted during the inserting transaction when
>> >> generating the conflict transaction ids.
>> >
>> >> Following detailed analysis and test case by Noah Misch.
>> >> Original report covered btree delete records, correctly observed
>> >> by Heikki Linnakangas that this applies to other cases also.
>> >> Fix covers all sources of cleanup records via common code.
>> >> Includes additional fix compared to commit on HEAD
>> >
>> > ISTM HeapTupleHeaderAdvanceLatestRemovedXid is still pretty broken,
>> > in that it's examining xmax without having checked that xmax is (a)
>> > valid or (b) a lock rather than a deletion xmax.
>>
>> In current use, it's only called for tuples that are known to be dead,
>> so either xmax is a valid deletion, or xmin didn't commit in which case
>> the function doesn't use xmax for anything. So I think it actually works
>> as it is.
>
> Well, I think you're both right.
>
> The function shouldn't be called in places where xmax is the wrong
> flavour, but there should be specific safeguards in case of mistake.

Should we do something about this? An Assert(), maybe?

>> Also, I'm not totally convinced it's correct when xmin > xmax, despite
>> Simon's follow-up commit to fix that. Shouldn't it advance
>> latestRemovedXid to xmin in that case? Or maybe it's ok as it is because
>> we know that xmax committed after xmin. The impression I get from the
>> comment above the function now is that it advances latestRemovedXid to
>> the highest XID present in the tuple, but that's not what it does in the
>> xmin > xmax case. That comment needs clarification.
>
> Hmmm, my earlier code took xmax only if xmax > xmin. That was wrong;
> what I have now is better, but your point is there may be an even better
> truth. I'll think on that a little more.

I guess the problem case here is something like:

1. T1 begins. T1 writes a tuple A (so it gets an XID).
2. T2 begins. T2 writes a tuple B (so it gets a later XID).
3. T1 takes a new snapshot that can see B and deletes B.
4. T2 commits.
5. T1 commits.

At this point we have a tuple B with XMAX (T1's XID) < XMIN (T2's
XID). Now, on the standby, there can be a transaction TS which takes
a snapshot that can see T2's XID but not T1's XID. While that
transaction is still running, VACUUM (or a HOT prune) comes along and
zaps B, and this record is replayed on the standby, advancing
latestRemovedXID to T1's XID, when in fact we also removed T2's later
XID. This means we MUST kill TS (at least if it tries to read that
block) because otherwise it'll fail to see B and return the wrong
answer. Will we actually kill TS?

GetConflictingVirtualXIDs() looks for transactions where proc->xmin
precedes limitXmin, and if TS has a snapshot that can't see T1's XID
then its proc->xmin might be exactly equal to limitXmin =
latestRemovedXID = T1s XID, causing it to not get killed.

I think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records
Date: 2011-01-05 20:00:45
Message-ID: 1294257645.19612.14477.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Mon, 2011-01-03 at 23:13 -0500, Robert Haas wrote:

> > Hmmm, my earlier code took xmax only if xmax > xmin. That was wrong;
> > what I have now is better, but your point is there may be an even better
> > truth. I'll think on that a little more.

I remember that I thought some more on this and decided that I couldn't
see a problem. I also see I didn't update the list to say that.

> I guess the problem case here is something like:
>
> 1. T1 begins. T1 writes a tuple A (so it gets an XID).
> 2. T2 begins. T2 writes a tuple B (so it gets a later XID).
> 3. T1 takes a new snapshot that can see B and deletes B.
> 4. T2 commits.
> 5. T1 commits.

How is step (3) possible before step (4)?

There are later errors in your example also.

It's possible that that's all wrong because of how my head is feeling,
if so please explain long hand so I don't have to think too much.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records
Date: 2011-01-05 20:06:31
Message-ID: AANLkTinf+pgZ5reD3dvq7CmNMiiq_xpb3hf=jOneORiK@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Jan 5, 2011 at 3:00 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Mon, 2011-01-03 at 23:13 -0500, Robert Haas wrote:
>
>> > Hmmm, my earlier code took xmax only if xmax > xmin. That was wrong;
>> > what I have now is better, but your point is there may be an even better
>> > truth. I'll think on that a little more.
>
> I remember that I thought some more on this and decided that I couldn't
> see a problem. I also see I didn't update the list to say that.
>
>> I guess the problem case here is something like:
>>
>> 1. T1 begins.  T1 writes a tuple A (so it gets an XID).
>> 2. T2 begins.  T2 writes a tuple B (so it gets a later XID).
>> 3. T1 takes a new snapshot that can see B and deletes B.
>> 4. T2 commits.
>> 5. T1 commits.
>
> How is step (3) possible before step (4)?

At read committed isolation level, which is the default, we take a new
snapshot after every command.

> There are later errors in your example also.

Well, point them out and let's discuss.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: [COMMITTERS] pgsql: Reduce spurious Hot Standby conflicts from never-visible records
Date: 2011-01-05 20:08:13
Message-ID: AANLkTimKVVeH3Lu7Ar=mG4e+KzSvuxFeuRpn1UJu84fh@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Jan 5, 2011 at 3:06 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jan 5, 2011 at 3:00 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On Mon, 2011-01-03 at 23:13 -0500, Robert Haas wrote:
>>
>>> > Hmmm, my earlier code took xmax only if xmax > xmin. That was wrong;
>>> > what I have now is better, but your point is there may be an even better
>>> > truth. I'll think on that a little more.
>>
>> I remember that I thought some more on this and decided that I couldn't
>> see a problem. I also see I didn't update the list to say that.
>>
>>> I guess the problem case here is something like:
>>>
>>> 1. T1 begins.  T1 writes a tuple A (so it gets an XID).
>>> 2. T2 begins.  T2 writes a tuple B (so it gets a later XID).
>>> 3. T1 takes a new snapshot that can see B and deletes B.
>>> 4. T2 commits.
>>> 5. T1 commits.
>>
>> How is step (3) possible before step (4)?
>
> At read committed isolation level, which is the default, we take a new
> snapshot after every command.

Oh, I'm a dork. You're saying T2 hasn't committed yet. Let me think
about this some more...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company