Quick Links

Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple

Lists:	pgsql-committerspgsql-hackers

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	pgsql-committers(at)postgresql(dot)org
Subject:	pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-28 14:47:53
Message-ID:	E1dxa6T-00005i-4X@gemulon.postgresql.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Fix freezing of a dead HOT-updated tuple

Vacuum calls page-level HOT prune to remove dead HOT tuples before doing
liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But
concurrent transaction commit/abort may turn DEAD some of the HOT tuples
that survived the prune, before HeapTupleSatisfiesVacuum tests them.
This happens to activate the code that decides to freeze the tuple ...
which resuscitates it, duplicating data.

(This is especially bad if there's any unique constraints, because those
are now internally violated due to the duplicate entries, though you
won't know until you try to REINDEX or dump/restore the table.)

One possible fix would be to simply skip doing anything to the tuple,
and hope that the next HOT prune would remove it. But there is a
problem: if the tuple is older than freeze horizon, this would leave an
unfrozen XID behind, and if no HOT prune happens to clean it up before
the containing pg_clog segment is truncated away, it'd later cause an
error when the XID is looked up.

Fix the problem by having the tuple freezing routines cope with the
situation: don't freeze the tuple (and keep it dead). In the cases that
the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag
so that there is no need to look up the XID in pg_clog later on.

An isolation test is included, authored by Michael Paquier, loosely
based on Daniel Wood's original reproducer. It only tests one
particular scenario, though, not all the possible ways for this problem
to surface; it be good to have a more reliable way to test this more
fully, but it'd require more work.
In message https://postgr.es/m/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql
I outlined another test case (more closely matching Dan Wood's) that
exposed a few more ways for the problem to occur.

Backpatch all the way back to 9.3, where this problem was introduced by
multixact juggling. In branches 9.3 and 9.4, this includes a backpatch
of commit e5ff9fefcd50 (of 9.5 era), since the original is not
correctable without matching the coding pattern in 9.5 up.

Reported-by: Daniel Wood
Diagnosed-by: Daniel Wood
Reviewed-by: Yi Wen Wong, Michaël Paquier
Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/20b655224249e6d2daf7ef0595995228baddb381

Modified Files
--------------
src/backend/access/heap/heapam.c | 57 +++++++++----
src/backend/commands/vacuumlazy.c | 20 ++---
src/test/isolation/expected/freeze-the-dead.out | 101 ++++++++++++++++++++++++
src/test/isolation/isolation_schedule | 1 +
src/test/isolation/specs/freeze-the-dead.spec | 27 +++++++
5 files changed, 179 insertions(+), 27 deletions(-)

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-28 18:34:35
Message-ID:	CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Sep 28, 2017 at 7:47 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Fix freezing of a dead HOT-updated tuple

If I run Dan Wood's test case again, the obvious symptom (spurious
duplicates) goes away. However, the enhanced amcheck, and thus CREATE
INDEX/REINDEX, still isn't happy about this:

postgres=# select bt_index_check('t_pkey', true);
DEBUG: 00000: verifying presence of required tuples in index "t_pkey"
LOCATION: bt_check_every_level, verify_nbtree.c:424
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,6) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597
Time: 3.699 ms

--
Peter Geoghegan

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	pgsql-hackers(at)postgresql(dot)org, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-28 20:07:09
Message-ID:	20170928200709.badpkwtaxafc4von@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Peter Geoghegan wrote:
> On Thu, Sep 28, 2017 at 7:47 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > Fix freezing of a dead HOT-updated tuple
>
> If I run Dan Wood's test case again, the obvious symptom (spurious
> duplicates) goes away. However, the enhanced amcheck, and thus CREATE
> INDEX/REINDEX, still isn't happy about this:
>
> postgres=# select bt_index_check('t_pkey', true);
> DEBUG: 00000: verifying presence of required tuples in index "t_pkey"
> LOCATION: bt_check_every_level, verify_nbtree.c:424
> ERROR: XX000: failed to find parent tuple for heap-only tuple at
> (0,6) in table "t"
> LOCATION: IndexBuildHeapRangeScan, index.c:2597
> Time: 3.699 ms

... Rats, I obviously missed the message where you said that amcheck
detected this problem.

Odd that it's not fixed. I guess there's still some more work to do
here ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	pgsql-hackers(at)postgresql(dot)org, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-28 20:20:30
Message-ID:	20170928202030.devtjtv24x7twfkh@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Alvaro Herrera wrote:

> Odd that it's not fixed. I guess there's still some more work to do
> here ...

Maybe what this means is that we need to do both Dan's initially
proposed patch (or something related to it) apart from the fixes already
pushed. IOW we need to put back some of the "tupkeep" business ...

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-28 21:15:15
Message-ID:	CAH2-Wzmscv3DYDtOLjOSrzv9khMr4LR=UFGeFvOWBMG-U1DpyQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Alvaro Herrera wrote:
>
>> Odd that it's not fixed. I guess there's still some more work to do
>> here ...
>
> Maybe what this means is that we need to do both Dan's initially
> proposed patch (or something related to it) apart from the fixes already
> pushed. IOW we need to put back some of the "tupkeep" business ...

We certainly do still see wrong answers to queries here:

postgres=# select ctid, xmin, xmax, * from t;
ctid | xmin | xmax | id | name | x
-------+-------+------+----+------+---
(0,1) | 21171 | 0 | 1 | 111 | 0
(0,7) | 21177 | 0 | 3 | 333 | 5
(2 rows)

postgres=# select * from t where id = 3;
id | name | x
----+------+---
3 | 333 | 5
(1 row)

postgres=# set enable_seqscan = off;
SET
postgres=# select * from t where id = 3;
id | name | x
----+------+---
(0 rows)

FWIW, I am reminded a little bit of the MultiXact/recovery bug I
reported way back in February of 2014 [1], which also had a HOT
interaction that caused index scans to give wrong answers, despite
more or less structurally sound indexes.

[1] https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com

--
Peter Geoghegan

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-28 21:39:59
Message-ID:	20170928213959.k5jtluoagz5riwx3@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Peter Geoghegan wrote:

> We certainly do still see wrong answers to queries here:
>
> postgres=# select ctid, xmin, xmax, * from t;
> ctid | xmin | xmax | id | name | x
> -------+-------+------+----+------+---
> (0,1) | 21171 | 0 | 1 | 111 | 0
> (0,7) | 21177 | 0 | 3 | 333 | 5
> (2 rows)
>
> postgres=# select * from t where id = 3;
> id | name | x
> ----+------+---
> 3 | 333 | 5
> (1 row)
>
> postgres=# set enable_seqscan = off;
> SET
> postgres=# select * from t where id = 3;
> id | name | x
> ----+------+---
> (0 rows)

Yeah, oops.

> FWIW, I am reminded a little bit of the MultiXact/recovery bug I
> reported way back in February of 2014 [1], which also had a HOT
> interaction that caused index scans to give wrong answers, despite
> more or less structurally sound indexes.
>
> [1] https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com

Thanks for the reference. I didn't remember this problem and it's not
(wasn't) in my list of things to look into. Perhaps these are both the
same bug.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-28 21:47:30
Message-ID:	CAH2-Wzne7UAzyng6zQ63=KYtYZc0J30_V4vgn5jt9FtECs9Lqw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Sep 28, 2017 at 2:39 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>> FWIW, I am reminded a little bit of the MultiXact/recovery bug I
>> reported way back in February of 2014 [1], which also had a HOT
>> interaction that caused index scans to give wrong answers, despite
>> more or less structurally sound indexes.
>>
>> [1] https://www.postgresql.org/message-id/CAM3SWZTMQiCi5PV5OWHb+bYkUcnCk=O67w0cSswPvV7XfUcU5g@mail.gmail.com
>
> Thanks for the reference. I didn't remember this problem and it's not
> (wasn't) in my list of things to look into. Perhaps these are both the
> same bug.

I was reminded of that old bug because initially, at the time, it
looked very much like a corrupt index: sequential scans were fine, but
index scans gave wrong answers. This is what I saw today.

In the end, commit 6bfa88a fixed that old recovery bug by making sure
the recovery routine heap_xlog_lock() did the right thing. In both
cases (Feb 2014 and today), the index wasn't really corrupt -- it just
pointed to the root of a HOT chain when it should point to some child
tuple (or maybe a successor HOT chain).

--
Peter Geoghegan

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-28 22:09:38
Message-ID:	CAH2-WznySyu-yKDGcKxZsQ5asQDK-=b98J2Cvi1eNaSxcP0NiA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Sep 28, 2017 at 2:47 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> In the end, commit 6bfa88a fixed that old recovery bug by making sure
> the recovery routine heap_xlog_lock() did the right thing. In both
> cases (Feb 2014 and today), the index wasn't really corrupt -- it just
> pointed to the root of a HOT chain when it should point to some child
> tuple (or maybe a successor HOT chain).

Just to be clear, obviously I don't mean that the index should have
been magically updated to compensate for that 2014 heap_lock_tuple()
recovery bug (say, via an in-place IndexTuple update during recovery).
Certainly, the index AM was entitled to assume that the heap TID it
pointed to would continue to be the root of the same HOT chain, at
least until the next VACUUM. That was what I meant by "the index
wasn't actually corrupt".

All I meant here is that the index scan and sequential scan would have
been in agreement if something like that *had* happened artificially
(say, when the index tuple was edited with a hex editor). And, that
the actual high level problem was that we failed to treat a
HOT-updated tuple as a HOT-updated tuple, leading to consequences that
were mostly noticeable during index scans. Probably on both occasions
(back in 2014, and now).

--
Peter Geoghegan

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-28 22:20:51
Message-ID:	CA+TgmoZ6r6480qzit4+q3fGmTSy1UHYA8o8RTn46iOxsY6UkwQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Sep 28, 2017 at 5:47 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> In the end, commit 6bfa88a fixed that old recovery bug by making sure
> the recovery routine heap_xlog_lock() did the right thing. In both
> cases (Feb 2014 and today), the index wasn't really corrupt -- it just
> pointed to the root of a HOT chain when it should point to some child
> tuple (or maybe a successor HOT chain).

Unless I'm very confused, it's really not OK to point at a child tuple
rather than the root of the HOT chain.

Pointing to a successor HOT chain would be OK, as long as you point to
the root tuple thereof.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-28 22:24:47
Message-ID:	CAH2-Wzn5U+jE1yuQO9e1o+m=VkO47iEQWx042ra0WyLehtu+Pg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Sep 28, 2017 at 3:20 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Sep 28, 2017 at 5:47 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> In the end, commit 6bfa88a fixed that old recovery bug by making sure
>> the recovery routine heap_xlog_lock() did the right thing. In both
>> cases (Feb 2014 and today), the index wasn't really corrupt -- it just
>> pointed to the root of a HOT chain when it should point to some child
>> tuple (or maybe a successor HOT chain).
>
> Unless I'm very confused, it's really not OK to point at a child tuple
> rather than the root of the HOT chain.

Please see my later clarification.

--
Peter Geoghegan

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-09-29 06:04:12
Message-ID:	CAB7nPqSeFiDcWsrGNbeDx=AxCibRYfiOH4Ag+DQtmhQL=SVRog@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Sep 29, 2017 at 6:39 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Peter Geoghegan wrote:
>
>> We certainly do still see wrong answers to queries here:
>>
>> postgres=# select ctid, xmin, xmax, * from t;
>> ctid | xmin | xmax | id | name | x
>> -------+-------+------+----+------+---
>> (0,1) | 21171 | 0 | 1 | 111 | 0
>> (0,7) | 21177 | 0 | 3 | 333 | 5
>> (2 rows)
>>
>> postgres=# select * from t where id = 3;
>> id | name | x
>> ----+------+---
>> 3 | 333 | 5
>> (1 row)
>>
>> postgres=# set enable_seqscan = off;
>> SET
>> postgres=# select * from t where id = 3;
>> id | name | x
>> ----+------+---
>> (0 rows)
>
> Yeah, oops.

This really looks like a problem at heap-level with the parent
redirection not getting defined (?). Please note that a subsequent
REINDEX fails as well:
=# reindex index t_pkey ;
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597

VACUUM FREEZE also is not getting things right, but a VACUUM FULL does.

Also, dropping the constraint and attempting to recreate it is failing:
=# alter table t drop constraint t_pkey;
ALTER TABLE
=# create index t_pkey on t(id);
ERROR: XX000: failed to find parent tuple for heap-only tuple at
(0,7) in table "t"
LOCATION: IndexBuildHeapRangeScan, index.c:2597

A corrupted page clearly indicates that there are no tuple redirections:
=# select lp, t_ctid, lp_off, t_infomask, t_infomask2 from
heap_page_items(get_raw_page('t', 0));
lp | t_ctid | lp_off | t_infomask | t_infomask2
----+--------+--------+------------+-------------
1 | (0,1) | 8152 | 2818 | 3
2 | null | 0 | null | null
3 | (0,4) | 8112 | 9986 | 49155
4 | (0,5) | 8072 | 9986 | 49155
5 | (0,6) | 8032 | 9986 | 49155
6 | (0,7) | 7992 | 9986 | 49155
7 | (0,7) | 7952 | 11010 | 32771
(7 rows)

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-01 01:25:30
Message-ID:	CAH2-WznUkrH4mxdwXhASb3-i1zAt69AsBWftn-m0-1k-dp13Jw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Maybe what this means is that we need to do both Dan's initially
> proposed patch (or something related to it) apart from the fixes already
> pushed. IOW we need to put back some of the "tupkeep" business ...

I took the time to specifically check if that would fix the problem.
Unfortunately, it did not. We see exactly the same problem, or at
least amcheck/REINDEX produces exactly the same error. I checked both
Dan's original update_freeze.patch, and your revision that retained
some of the "tupkeep" stuff,
0002-Don-t-freeze-recently-dead-HOT-tuples, which you posted on
September 6th.

--
Peter Geoghegan

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-01 08:45:57
Message-ID:	CAB7nPqSYxZd-OKqsfH1VgwiK90YHDQ-JSgGmgGqT3crQ_DserA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Sun, Oct 1, 2017 at 10:25 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>> Maybe what this means is that we need to do both Dan's initially
>> proposed patch (or something related to it) apart from the fixes already
>> pushed. IOW we need to put back some of the "tupkeep" business ...
>
> I took the time to specifically check if that would fix the problem.
> Unfortunately, it did not. We see exactly the same problem, or at
> least amcheck/REINDEX produces exactly the same error. I checked both
> Dan's original update_freeze.patch, and your revision that retained
> some of the "tupkeep" stuff,
> 0002-Don-t-freeze-recently-dead-HOT-tuples, which you posted on
> September 6th.

I did not take the time to dig into that more than two hours, but my
first feeling is that some race condition is going on with the heap
pruning.
--
Michael

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-01 13:31:01
Message-ID:	20171001133101.wax43cgsnujrmql7@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Michael Paquier wrote:
> On Sun, Oct 1, 2017 at 10:25 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > On Thu, Sep 28, 2017 at 1:20 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >> Maybe what this means is that we need to do both Dan's initially
> >> proposed patch (or something related to it) apart from the fixes already
> >> pushed. IOW we need to put back some of the "tupkeep" business ...
> >
> > I took the time to specifically check if that would fix the problem.
> > Unfortunately, it did not. We see exactly the same problem, or at
> > least amcheck/REINDEX produces exactly the same error. I checked both
> > Dan's original update_freeze.patch, and your revision that retained
> > some of the "tupkeep" stuff,
> > 0002-Don-t-freeze-recently-dead-HOT-tuples, which you posted on
> > September 6th.
>
> I did not take the time to dig into that more than two hours, but my
> first feeling is that some race condition is going on with the heap
> pruning.

I'll look into this on Monday. I found out yesterday that the
problematic case is when HTSV returns HEAPTUPLE_DEAD and the HOT tests
return true. I haven't yet figured if it is one of those specifically,
but I suspect it is the IsHotUpdated case.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	pgsql-hackers(at)postgresql(dot)org, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-03 16:48:20
Message-ID:	20171003164820.lmohsuyf7hygafdq@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

So here's my attempt at an explanation for what is going on. At one
point, we have this:

select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+--------+--------+--------+----------+-----------
1 | 1 | 2 | 0 | (0,1) | 902 | 3
2 | 0 | | | | |
3 | 1 | 2 | 19928 | (0,4) | 3142 | c003
4 | 1 | 14662 | 19929 | (0,5) | 3142 | c003
5 | 1 | 14663 | 19931 | (0,6) | 3142 | c003
6 | 1 | 14664 | 19933 | (0,7) | 3142 | c003
7 | 1 | 14665 | 0 | (0,7) | 2902 | 8003
(7 filas)

which shows a HOT-update chain, where the t_xmax are multixacts. Then a
vacuum freeze comes, and because the multixacts are below the freeze
horizon for multixacts, we get this:

select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+--------+--------+--------+----------+-----------
1 | 1 | 2 | 0 | (0,1) | 902 | 3
2 | 0 | | | | |
3 | 1 | 2 | 14662 | (0,4) | 2502 | c003
4 | 1 | 2 | 14663 | (0,5) | 2502 | c003
5 | 1 | 2 | 14664 | (0,6) | 2502 | c003
6 | 1 | 2 | 14665 | (0,7) | 2502 | c003
7 | 1 | 2 | 0 | (0,7) | 2902 | 8003
(7 filas)

where the xmin values have all been frozen, and the xmax values are now
regular Xids. I think the HOT code that walks the chain fails to detect
these as chains, because the xmin values no longer match the xmax
values. I modified the multixact freeze code, so that whenever the
update Xid is below the cutoff Xid, it's set to FrozenTransactionId,
since keeping the other value is invalid anyway (even though we have set
the HEAP_XMAX_COMMITTED flag). But that still doesn't fix the problem;
as far as I can see, vacuum removes the root of the chain, not yet sure
why, and then things are just as corrupted as before.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-03 17:01:58
Message-ID:	CAH2-Wzk5tvOqSaYcuYGEynN7Pp+9=sktdpt79e_bta9jACeEqg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> which shows a HOT-update chain, where the t_xmax are multixacts. Then a
> vacuum freeze comes, and because the multixacts are below the freeze
> horizon for multixacts, we get this:
>
> select lp, lp_flags, t_xmin, t_xmax, t_ctid, to_hex(t_infomask) as infomask,
> to_hex(t_infomask2) as infomask2
> from heap_page_items(get_raw_page('t', 0));
> lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
> ----+----------+--------+--------+--------+----------+-----------
> 1 | 1 | 2 | 0 | (0,1) | 902 | 3
> 2 | 0 | | | | |
> 3 | 1 | 2 | 14662 | (0,4) | 2502 | c003
> 4 | 1 | 2 | 14663 | (0,5) | 2502 | c003
> 5 | 1 | 2 | 14664 | (0,6) | 2502 | c003
> 6 | 1 | 2 | 14665 | (0,7) | 2502 | c003
> 7 | 1 | 2 | 0 | (0,7) | 2902 | 8003
> (7 filas)
>
> where the xmin values have all been frozen, and the xmax values are now
> regular Xids.

I thought that we no longer store FrozenTransactionId (xid 2) as our
"raw" xmin while freezing, and yet that's what we see here. It looks
like pageinspect is looking at raw xmin (it's calling
HeapTupleHeaderGetRawXmin(), not HeapTupleHeaderGetXmin()). What's the
deal with that? Shouldn't the original xmin be preserved for forensic
analysis, following commit 37484ad2?

I guess what you mean is that this is what you see having modified the
code to actually store FrozenTransactionId as xmin once more, in an
effort to fix this?

--
Peter Geoghegan

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-03 23:10:20
Message-ID:	CAH2-WznBLP5s2fvUuwUyNJC6sSRUDkVCMcXjCuL7nTShqi4jig@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Tue, Oct 3, 2017 at 9:48 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> But that still doesn't fix the problem;
> as far as I can see, vacuum removes the root of the chain, not yet sure
> why, and then things are just as corrupted as before.

Are you sure it's not opportunistic pruning? Another thing that I've
noticed with this problem is that the relevant IndexTuple will pretty
quickly vanish, presumably due to LP_DEAD setting (but maybe not
actually due to LP_DEAD setting).

(Studies the problem some more...)

I now think that it actually is a VACUUM problem, specifically a
problem with VACUUM pruning. You see the HOT xmin-to-xmax check
pattern that you mentioned within heap_prune_chain(), which looks like
where the incorrect tuple prune (or possibly, at times, redirect?)
takes place. (I refer to the prune/kill that you mentioned today, that
frustrated your first attempt at a fix -- "I modified the multixact
freeze code...".)

The attached patch "fixes" the problem -- I cannot get amcheck to
complain about corruption with this applied. And, "make check-world"
passes. Hopefully it goes without saying that this isn't actually my
proposed fix. It tells us something that this at least *masks* the
problem, though; it's a start.

FYI, the repro case page contents looks like this with the patch applied:

postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
to_hex(t_infomask) as infomask,
to_hex(t_infomask2) as infomask2
from heap_page_items(get_raw_page('t', 0));
lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
----+----------+---------+--------+--------+----------+-----------
1 | 1 | 1845995 | 0 | (0,1) | b02 | 3
2 | 2 | | | | |
3 | 0 | | | | |
4 | 0 | | | | |
5 | 0 | | | | |
6 | 0 | | | | |
7 | 1 | 1846001 | 0 | (0,7) | 2b02 | 8003
(7 rows)

--
Peter Geoghegan

Attachment	Content-Type	Size
suppress-bad-prune.patch	text/x-patch	662 bytes

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 00:15:44
Message-ID:	CAB7nPqQT6hSvL0Tn64QBNXpABcOF1bsGxW4WN+TRZu0GH5PZmA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

My lookup of the problem converges to the same conclusion. Something
is wrong with the vacuum's pruning. I have spent some time trying to
design a patch, all the solutions I tried have proved to make the
problem harder to show up, but it still showed up, sometimes after
dozens of attempts.

> The attached patch "fixes" the problem -- I cannot get amcheck to
> complain about corruption with this applied. And, "make check-world"
> passes. Hopefully it goes without saying that this isn't actually my
> proposed fix. It tells us something that this at least *masks* the
> problem, though; it's a start.

Yep.

> FYI, the repro case page contents looks like this with the patch applied:
> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
> to_hex(t_infomask) as infomask,
> to_hex(t_infomask2) as infomask2
> from heap_page_items(get_raw_page('t', 0));
> lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
> ----+----------+---------+--------+--------+----------+-----------
> 1 | 1 | 1845995 | 0 | (0,1) | b02 | 3
> 2 | 2 | | | | |
> 3 | 0 | | | | |
> 4 | 0 | | | | |
> 5 | 0 | | | | |
> 6 | 0 | | | | |
> 7 | 1 | 1846001 | 0 | (0,7) | 2b02 | 8003
> (7 rows)

Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
what would look correct to me.

- * Check the tuple XMIN against prior XMAX, if any
- */
- if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
- break;
If you remove this check, you could also remove completely priorXmax.

Actually, I may be missing something, but why is priorXmax updated
even for dead tuples? For example just doing that is also taking care
of the problem:
@@ -558,7 +558,8 @@ heap_prune_chain(Relation relation, Buffer buffer,
OffsetNumber rootoffnum,
Assert(ItemPointerGetBlockNumber(&htup->t_ctid) ==
BufferGetBlockNumber(buffer));
offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
- priorXmax = HeapTupleHeaderGetUpdateXid(htup);
+ if (!tupdead)
+ priorXmax = HeapTupleHeaderGetUpdateXid(htup);
}
--
Michael

From:	"Wood, Dan" <hexpert(at)amazon(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 01:09:53
Message-ID:	10CE2AE7-B1AC-4E81-BF00-E9EF4BA855EE@amazon.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

I’ve just started looking at this again after a few weeks break.

There is a tangled web of issues here. With the community fix we get a corrupted page(invalid redirect ptr from indexed item). The cause of that is:
pruneheap.c:

/*
* Check the tuple XMIN against prior XMAX, if any
*/
if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
break;

chainitems[nchain++] = offnum;

The priorXmax is a multixact key share lock, thus not equal to xmin. This terminates the scan. Unlike the scan termination with “if (!recent_dead) break;” below the switch (...SatisiesVacuum…), this “break” does not put the offnum into the chain even though it is in the chain. If the first not-deleted item isn’t put in the chain then we’ll not call heap_prune_record_redirect().

I do not know what the above ‘if’ test is protecting. Normally the xmin is equal to the priorXmax. Other than protecting against corruption a key share lock can cause this. So, I tried a fix which does the “if” check after adding it to chainitems. This will break whatever real situation this IF was protecting. Tom Lane put this in.

OK: With that hack of a fix the redirect now works correctly. However, we still get the reindex problem with not finding the parent. That problem is caused by:
Pruneheap.c:heap_get_root_tuples()

if (TransactionIdIsValid(priorXmax) &&
!TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
break;

In this case, instead of these not being equal because of a multixact key share lock, it is because XMIN is frozen and FrozenTransactionId doesn’t equal the priorXmax. Thus, we don’t build the entire chain from root to most current row version and this causes the reindex failure.

If we disable this ‘if’ as a hack then we no longer get a problem on the reindex. However, YiWen reported that at the end of an install check out index checking reported corruption in the system catalogues. So we are still looking.

We need to understand why these TXID equal checks exist. Can we differentiate the cases they are protecting against with the two exceptions I’ve found?

FYI, someone should look at the same ”if” test in heapam.c: heap_lock_updated_tuple_rec(). Also, I hope there are no strange issues with concurrent index builds.

Finally, the idea behind the original fix was to simply NOT to do an unsupported freeze on a dead tuple. It had two drawbacks:
1) CLOG truncation. This could have been handled by keeping track of the old unpruned item found and using that to update the table’s/DB’s freeze xid.
2) Not making freeze progress. The only reason the prune would fail should be because of an open TXN. Unless that TXN was so old such that it’s XID was as old as the ?min freeze threshold? then we would make progress. If we were doing TXN’s that old then we’d be having problems anyway.

On 10/3/17, 5:15 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:

On Wed, Oct 4, 2017 at 8:10 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> I now think that it actually is a VACUUM problem, specifically a
> problem with VACUUM pruning. You see the HOT xmin-to-xmax check
> pattern that you mentioned within heap_prune_chain(), which looks like
> where the incorrect tuple prune (or possibly, at times, redirect?)
> takes place. (I refer to the prune/kill that you mentioned today, that
> frustrated your first attempt at a fix -- "I modified the multixact
> freeze code...".)

My lookup of the problem converges to the same conclusion. Something
is wrong with the vacuum's pruning. I have spent some time trying to
design a patch, all the solutions I tried have proved to make the
problem harder to show up, but it still showed up, sometimes after
dozens of attempts.

> The attached patch "fixes" the problem -- I cannot get amcheck to
> complain about corruption with this applied. And, "make check-world"
> passes. Hopefully it goes without saying that this isn't actually my
> proposed fix. It tells us something that this at least *masks* the
> problem, though; it's a start.

Yep.

> FYI, the repro case page contents looks like this with the patch applied:
> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
> to_hex(t_infomask) as infomask,
> to_hex(t_infomask2) as infomask2
> from heap_page_items(get_raw_page('t', 0));
> lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
> ----+----------+---------+--------+--------+----------+-----------
> 1 | 1 | 1845995 | 0 | (0,1) | b02 | 3
> 2 | 2 | | | | |
> 3 | 0 | | | | |
> 4 | 0 | | | | |
> 5 | 0 | | | | |
> 6 | 0 | | | | |
> 7 | 1 | 1846001 | 0 | (0,7) | 2b02 | 8003
> (7 rows)

Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
what would look correct to me.

- * Check the tuple XMIN against prior XMAX, if any
- */
- if (TransactionIdIsValid(priorXmax) &&
- !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
- break;
If you remove this check, you could also remove completely priorXmax.

Actually, I may be missing something, but why is priorXmax updated
even for dead tuples? For example just doing that is also taking care
of the problem:
@@ -558,7 +558,8 @@ heap_prune_chain(Relation relation, Buffer buffer,
OffsetNumber rootoffnum,
Assert(ItemPointerGetBlockNumber(&htup->t_ctid) ==
BufferGetBlockNumber(buffer));
offnum = ItemPointerGetOffsetNumber(&htup->t_ctid);
- priorXmax = HeapTupleHeaderGetUpdateXid(htup);
+ if (!tupdead)
+ priorXmax = HeapTupleHeaderGetUpdateXid(htup);
}
--
Michael

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 01:13:24
Message-ID:	CAH2-Wzn8-zMQyNf07vTp=WW6oVZwTBoNbypjKrygixCn-o-3jg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Tue, Oct 3, 2017 at 5:15 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
>> FYI, the repro case page contents looks like this with the patch applied:
>> postgres=# select lp, lp_flags, t_xmin, t_xmax, t_ctid,
>> to_hex(t_infomask) as infomask,
>> to_hex(t_infomask2) as infomask2
>> from heap_page_items(get_raw_page('t', 0));
>> lp | lp_flags | t_xmin | t_xmax | t_ctid | infomask | infomask2
>> ----+----------+---------+--------+--------+----------+-----------
>> 1 | 1 | 1845995 | 0 | (0,1) | b02 | 3
>> 2 | 2 | | | | |
>> 3 | 0 | | | | |
>> 4 | 0 | | | | |
>> 5 | 0 | | | | |
>> 6 | 0 | | | | |
>> 7 | 1 | 1846001 | 0 | (0,7) | 2b02 | 8003
>> (7 rows)
>
> Is lp_off for tid (0,2) pointing to (0,7)? A hot chain preserved is
> what would look correct to me.

Yes, it is:

postgres=# select * from bt_page_items('foo', 1);
itemoffset | ctid | itemlen | nulls | vars | data
------------+-------+---------+-------+------+-------------------------
1 | (0,1) | 16 | f | f | 01 00 00 00 00 00 00 00
2 | (0,2) | 16 | f | f | 03 00 00 00 00 00 00 00
(2 rows)

I can tell from looking at my hex editor that the 4 bytes of ItemId
that we see for position '(0,2)' in the ItemId array are "07 00 01
00", meaning that '(0,2)' this is a LP_REDIRECT item, repointing us to
'(0,7)'. Everything here looks sane to me, at least at first blush.

> - * Check the tuple XMIN against prior XMAX, if any
> - */
> - if (TransactionIdIsValid(priorXmax) &&
> - !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
> - break;
> If you remove this check, you could also remove completely priorXmax.
>
> Actually, I may be missing something, but why is priorXmax updated
> even for dead tuples? For example just doing that is also taking care
> of the problem:

I'll study what you suggest here some more tomorrow.

--
Peter Geoghegan

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	"Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 01:19:56
Message-ID:	CAH2-WzmtGabEF75yg2nW1HqNyx0Tmq2xrcG40hvJ7-OfPKnzQQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan <hexpert(at)amazon(dot)com> wrote:
> I’ve just started looking at this again after a few weeks break.

> if (TransactionIdIsValid(priorXmax) &&
> !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
> break;

> We need to understand why these TXID equal checks exist. Can we differentiate the cases they are protecting against with the two exceptions I’ve found?

I haven't read your remarks here in full, since I'm about to stop
working for the day, but I will point out that
src/backend/access/heap/README.HOT says a fair amount about this,
under "Abort Cases".

--
Peter Geoghegan

From:	"Wood, Dan" <hexpert(at)amazon(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 01:25:12
Message-ID:	360B0EC6-4ED0-4D68-A392-BD1C3453DEAB@amazon.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

One minor side note… Is it weird for xmin/xmax to go backwards in a hot row chain?

lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax
----+--------+--------+------------+-------------+--------+--------
1 | (0,1) | 8152 | 2818 | 3 | 36957 | 0
2 | | 5 | | | |
3 | | 0 | | | |
4 | | 0 | | | |
5 | (0,6) | 8112 | 9986 | 49155 | 36962 | 36963
6 | (0,7) | 8072 | 9986 | 49155 | 36963 | 36961
7 | (0,7) | 8032 | 11010 | 32771 | 36961 | 0
(7 rows)

On 10/3/17, 6:20 PM, "Peter Geoghegan" <pg(at)bowt(dot)ie> wrote:

On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan <hexpert(at)amazon(dot)com> wrote:
> I’ve just started looking at this again after a few weeks break.

> if (TransactionIdIsValid(priorXmax) &&
> !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
> break;

> We need to understand why these TXID equal checks exist. Can we differentiate the cases they are protecting against with the two exceptions I’ve found?

I haven't read your remarks here in full, since I'm about to stop
working for the day, but I will point out that
src/backend/access/heap/README.HOT says a fair amount about this,
under "Abort Cases".

--
Peter Geoghegan

From:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 03:43:00
Message-ID:	1507088579842.74063@amazon.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:

> if (TransactionIdIsValid(priorXmax) &&
> !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
> break;

So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened

The interesting consequence of changing that is the prune seems to get the entire chain altogether with Dan's repro... I've run it a couple of times and have consistently gotten the following page

lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
----+--------+--------+----------+------------+-------------
1 | (0,1) | 8152 | 1 | 2818 | 3
2 | | 7 | 2 | |
3 | | 0 | 0 | |
4 | | 0 | 0 | |
5 | | 0 | 0 | |
6 | | 0 | 0 | |
7 | (0,7) | 8112 | 1 | 11010 | 32771
(7 rows)

I've made this change to conditions in both heap_prune_chain and heap_get_root_tuples and this seems to cause things to pass my smoke tests.

I'll look into this deeper tomorrow.

Yi Wen
________________________________________
From: Peter Geoghegan <pg(at)bowt(dot)ie>
Sent: Tuesday, October 3, 2017 6:19 PM
To: Wood, Dan
Cc: Michael Paquier; Alvaro Herrera; PostgreSQL Hackers; Wong, Yi Wen
Subject: Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

On Tue, Oct 3, 2017 at 6:09 PM, Wood, Dan <hexpert(at)amazon(dot)com> wrote:
> I’ve just started looking at this again after a few weeks break.

> if (TransactionIdIsValid(priorXmax) &&
> !TransactionIdEquals(priorXmax, HeapTupleHeaderGetXmin(htup)))
> break;

> We need to understand why these TXID equal checks exist. Can we differentiate the cases they are protecting against with the two exceptions I’ve found?

I haven't read your remarks here in full, since I'm about to stop
working for the day, but I will point out that
src/backend/access/heap/README.HOT says a fair amount about this,
under "Abort Cases".

--
Peter Geoghegan

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 09:40:57
Message-ID:	20171004094057.2ogpt36lvuanoe5f@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Peter Geoghegan wrote:

> I thought that we no longer store FrozenTransactionId (xid 2) as our
> "raw" xmin while freezing, and yet that's what we see here.

I'm doing this in 9.3. I can't tell if the new tuple freezing stuff
broke things more thoroughly, but it is already broken in earlier
releases.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	"Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 09:51:59
Message-ID:	20171004095159.wi277fwlmky6ymmz@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Wood, Dan wrote:

> There is a tangled web of issues here. With the community fix we get a corrupted page(invalid redirect ptr from indexed item). The cause of that is:
> pruneheap.c:
>
> /*
> * Check the tuple XMIN against prior XMAX, if any
> */
> if (TransactionIdIsValid(priorXmax) &&
> !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
> break;
>
> chainitems[nchain++] = offnum;
>
> The priorXmax is a multixact key share lock,

Uhh, what? That certainly shouldn't happen -- the priorXmax comes from

priorXmax = HeapTupleHeaderGetUpdateXid(htup);

so only the XID of the update itself should be reported, not a multixact
and certainly not just a tuple lock XID.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	"Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 09:54:29
Message-ID:	20171004095429.lh37mah23vp3ydau@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Wood, Dan wrote:
> One minor side note… Is it weird for xmin/xmax to go backwards in a hot row chain?
>
> lp | t_ctid | lp_off | t_infomask | t_infomask2 | t_xmin | t_xmax
> ----+--------+--------+------------+-------------+--------+--------
> 1 | (0,1) | 8152 | 2818 | 3 | 36957 | 0
> 2 | | 5 | | | |
> 3 | | 0 | | | |
> 4 | | 0 | | | |
> 5 | (0,6) | 8112 | 9986 | 49155 | 36962 | 36963
> 6 | (0,7) | 8072 | 9986 | 49155 | 36963 | 36961
> 7 | (0,7) | 8032 | 11010 | 32771 | 36961 | 0
> (7 rows)

No, it just means transaction A got its XID before transaction B, but B
executed the update first and A updated the tuple second.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 12:06:09
Message-ID:	20171004120609.kuf2wtyu6osdfpxm@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Alvaro Herrera wrote:
> Peter Geoghegan wrote:
>
> > I thought that we no longer store FrozenTransactionId (xid 2) as our
> > "raw" xmin while freezing, and yet that's what we see here.
>
> I'm doing this in 9.3. I can't tell if the new tuple freezing stuff
> broke things more thoroughly, but it is already broken in earlier
> releases.

In fact, I think in 9.3 we should include this patch, to set the Xmin to
FrozenXid. 9.4 and onwards have commit 37484ad2a "Change the way we
mark tuples as frozen" which uses a combination of infomask bits, but in
9.3 I think leaving the unfrozen value in the xmax field is a bad idea
even if we set the HEAP_XMAX_COMMITTED bit.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
fix-multi-freezing.patch	text/plain	522 bytes

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 13:46:05
Message-ID:	20171004134605.o7qorx4f4rvyvgok@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Wong, Yi Wen wrote:
> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
>
> > if (TransactionIdIsValid(priorXmax) &&
> > !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
> > break;
>
> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened

I independently arrived at the same conclusion. Since I was trying with
9.3, the patch differs -- in the old version we must explicitely test
for the FrozenTransactionId value, instead of using GetRawXmin.
Attached is the patch I'm using, and my own oneliner test (pretty much
the same I posted earlier) seems to survive dozens of iterations without
showing any problem in REINDEX.

This patch is incomplete, since I think there are other places that need
to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
Of course, for 9.4 and onwards we need to patch like you described.

This bit in EvalPlanQualFetch caught my attention ... why is it saying
xmin never changes? It does change with freezing.

/*
* If xmin isn't what we're expecting, the slot must have been
* recycled and reused for an unrelated tuple. This implies that
* the latest version of the row was deleted, so we need do
* nothing. (Should be safe to examine xmin without getting
* buffer's content lock, since xmin never changes in an existing
* tuple.)
*/
if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
priorXmax))

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
93-fix.patch	text/plain	1.1 KB

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 15:40:32
Message-ID:	CAH2-WzmpuR3vF6+PW2nRh2+eHPTb-c3tAw+S8iBWPK8pwtnUpg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Tue, Oct 3, 2017 at 8:43 PM, Wong, Yi Wen <yiwong(at)amazon(dot)com> wrote:
> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
>
>> if (TransactionIdIsValid(priorXmax) &&
>> !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> break;
>
> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened

I was thinking along similar lines.

> The interesting consequence of changing that is the prune seems to get the entire chain altogether with Dan's repro... I've run it a couple of times and have consistently gotten the following page
>
> lp | t_ctid | lp_off | lp_flags | t_infomask | t_infomask2
> ----+--------+--------+----------+------------+-------------
> 1 | (0,1) | 8152 | 1 | 2818 | 3
> 2 | | 7 | 2 | |
> 3 | | 0 | 0 | |
> 4 | | 0 | 0 | |
> 5 | | 0 | 0 | |
> 6 | | 0 | 0 | |
> 7 | (0,7) | 8112 | 1 | 11010 | 32771
> (7 rows)

That's also what I see. This is a good thing, I think; that's how we
ought to prune.

--
Peter Geoghegan

From:	"Wood, Dan" <hexpert(at)amazon(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 18:00:41
Message-ID:	CB2B4460-3564-48D9-B014-7BDA6A5B3DB7@amazon.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Sorry. Should have looked at the macro. I sent this too soon. The early “break;” here is likely the xmin frozen reason as I found in the other loop.

On 10/4/17, 2:52 AM, "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

Wood, Dan wrote:

> There is a tangled web of issues here. With the community fix we get a corrupted page(invalid redirect ptr from indexed item). The cause of that is:
> pruneheap.c:
>
> /*
> * Check the tuple XMIN against prior XMAX, if any
> */
> if (TransactionIdIsValid(priorXmax) &&
> !TransactionIdEquals(HeapTupleHeaderGetXmin(htup), priorXmax))
> break;
>
> chainitems[nchain++] = offnum;
>
> The priorXmax is a multixact key share lock,

Uhh, what? That certainly shouldn't happen -- the priorXmax comes from

priorXmax = HeapTupleHeaderGetUpdateXid(htup);

so only the XID of the update itself should be reported, not a multixact
and certainly not just a tuple lock XID.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	"Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 18:50:03
Message-ID:	CAH2-WzmJoJqLjE-uJpYhTvE1=BTikQnB5SrcM9-nGbQRou7ESw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Wed, Oct 4, 2017 at 11:00 AM, Wood, Dan <hexpert(at)amazon(dot)com> wrote:
> The early “break;” here is likely the xmin frozen reason as I found in the other loop.

It looks that way.

Since we're already very defensive when it comes to this xmin/xmax
matching business, and we're defensive while following an update chain
more generally, I wonder if we should be checking
HeapTupleHeaderIsSpeculative() on versions >= 9.5 (versions with ON
CONFLICT DO UPDATE, where t_ctid block number might actually be a
speculative insertion token). Or, at least acknowledging that case in
comments. I remember expressing concern that something like this was
possible at the time that that went in.

We certainly don't want to have heap_abort_speculative() "super
delete" the wrong tuple in the event of item pointer recycling. There
are at least defensive sanity checks that would turn that into an
error within heap_abort_speculative(), so it wouldn't be a disaster if
it was attempted. I don't think that it's possible in practice, and
maybe it's sufficient that routines like heap_get_latest_tid() check
for a sane item offset, which will discriminate against
SpecTokenOffsetNumber, which must be well out of range for ItemIds on
the page. Could be worth a comment.

(I guess that heap_prune_chain() wouldn't need to be changed if we
decide to add such comments, because the speculative tuple ItemId is
going to be skipped over due to being ItemIdIsUsed() before we even
get there.)
--
Peter Geoghegan

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-04 22:50:10
Message-ID:	CAH2-WznGp4_Xhq7sxhoAe5rX-Ut0AcGrR=3+gVvw+WAJ=jm4sQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> > !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> > break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened

As you know, on version 9.4+, as of commit 37484ad2a, we decided that
we are "largely ignoring the value to which it [xmin] is set". The
expectation became that raw xmin is available after freezing, but
mostly for forensic purposes. I think Alvaro should now memorialize
the idea that its value is actually critical in some place
(htup_details.h?).

> I independently arrived at the same conclusion. Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitely test
> for the FrozenTransactionId value, instead of using GetRawXmin.

Obviously you're going to have to be prepared for a raw xmin of
FrozenTransactionId, even on 9.4+, due to pg_upgrade. I can see why it
would be safe (or at least no more dangerous) to rely on
HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
installations that initdb'd on a version after commit 37484ad2a
(version 9.4+). However, I'm not sure why what you propose here would
be safe when even raw xmin happens to be FrozenTransactionId. Are you
sure that that's truly race-free? If it's really true that we only
need to check for FrozenTransactionId on 9.3, why not just do that on
all versions, and never bother with HeapTupleHeaderGetRawXmin()?
("Sheer paranoia" is a valid answer; I just want us to be clear on the
reasoning.)

Obviously any race would have a ridiculously tiny window, but it's not
obvious why this protocol would be completely race-free (in the event
of a FrozenTransactionId raw xmin).

--
Peter Geoghegan

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Wood, Dan" <hexpert(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-05 01:31:43
Message-ID:	CAB7nPqT7ZZ2Z_fVBPEVUKriCBWXTX=EEzy_dJFyVNY8O8OJPFA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Confirmed, the problem goes away with this patch on 9.3.

> This patch is incomplete, since I think there are other places that need
> to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
> Of course, for 9.4 and onwards we need to patch like you described.

I have just done a lookup of the source code, and here is an
exhaustive list of things in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)

> This bit in EvalPlanQualFetch caught my attention ... why is it saying
> xmin never changes? It does change with freezing.
>
> /*
> * If xmin isn't what we're expecting, the slot must have been
> * recycled and reused for an unrelated tuple. This implies that
> * the latest version of the row was deleted, so we need do
> * nothing. (Should be safe to examine xmin without getting
> * buffer's content lock, since xmin never changes in an existing
> * tuple.)
> */
> if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
> priorXmax))

Agreed. That's not good.
--
Michael

From:	"Wood, Dan" <hexpert(at)amazon(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-05 01:39:52
Message-ID:	8ABEB00F-E19E-4178-A00A-DDA99EA73D94@amazon.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Whatever you do make sure to also test 250 clients running lock.sql. Even with the communities fix plus YiWen’s fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

Once nearly all 250 clients have done their updates and everybody is waiting to vacuum which one by one will take a while I usually just “pkill -9 psql”. After that I have many of duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and existing every couple of seconds. Only by killing and restarting PG can I drop the table.

On 10/4/17, 6:31 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:

On Wed, Oct 4, 2017 at 10:46 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Wong, Yi Wen wrote:
>> My interpretation of README.HOT is the check is just to ensure the chain is continuous; in which case the condition should be:
>>
>> > if (TransactionIdIsValid(priorXmax) &&
>> > !TransactionIdEquals(priorXmax, HeapTupleHeaderGetRawXmin(htup)))
>> > break;
>>
>> So the difference is GetRawXmin vs GetXmin, because otherwise we get the FreezeId instead of the Xmin when the transaction happened
>
> I independently arrived at the same conclusion. Since I was trying with
> 9.3, the patch differs -- in the old version we must explicitely test
> for the FrozenTransactionId value, instead of using GetRawXmin.
> Attached is the patch I'm using, and my own oneliner test (pretty much
> the same I posted earlier) seems to survive dozens of iterations without
> showing any problem in REINDEX.

Confirmed, the problem goes away with this patch on 9.3.

> This patch is incomplete, since I think there are other places that need
> to be patched in the same way (EvalPlanQualFetch? heap_get_latest_tid?).
> Of course, for 9.4 and onwards we need to patch like you described.

I have just done a lookup of the source code, and here is an
exhaustive list of things in need of surgery:
- heap_hot_search_buffer
- heap_get_latest_tid
- heap_lock_updated_tuple_rec
- heap_prune_chain
- heap_get_root_tuples
- rewrite_heap_tuple
- EvalPlanQualFetch (twice)

> This bit in EvalPlanQualFetch caught my attention ... why is it saying
> xmin never changes? It does change with freezing.
>
> /*
> * If xmin isn't what we're expecting, the slot must have been
> * recycled and reused for an unrelated tuple. This implies that
> * the latest version of the row was deleted, so we need do
> * nothing. (Should be safe to examine xmin without getting
> * buffer's content lock, since xmin never changes in an existing
> * tuple.)
> */
> if (!TransactionIdEquals(HeapTupleHeaderGetXmin(tuple.t_data),
> priorXmax))

Agreed. That's not good.
--
Michael

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	"Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-05 02:17:09
Message-ID:	CAB7nPqS6X2nQD4phz1amKbng4ppru9PGRd5bqygSHgdknLE9Wg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Interesting. Which version did you test? Only 9.6?

> Once nearly all 250 clients have done their updates and everybody is waiting to vacuum which one by one will take a while I usually just “pkill -9 psql”. After that I have many of duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and existing every couple of seconds. Only by killing and restarting PG can I drop the table.

Yeah, that's more or less what I have been doing. My tests involve
using your initial script with way more sessions triggering lock.sql,
minus the kill-9 portion (good idea actually). I can of course see the
sessions queuing for VACUUM, still I cannot see duplicated rows, even
if I headshot Postgres in the middle of the VACUUM waiting queue. Note
that I have just tested Alvaro's patch on 9.3.
--
Michael

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	"Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-05 09:54:54
Message-ID:	20171005095454.wtpnjlbqdnyqgrad@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Wood, Dan wrote:
> Whatever you do make sure to also test 250 clients running lock.sql. Even with the communities fix plus YiWen’s fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

Good idea. You can achieve a similar effect by adding a filler column,
and reducing fillfactor.

> Once nearly all 250 clients have done their updates and everybody is
> waiting to vacuum which one by one will take a while I usually just
> “pkill -9 psql”. After that I have many of duplicate “id=3” rows.

Odd ...

> On top of that I think we might have a lock leak. After the pkill I
> tried to rerun setup.sql to drop/create the table and it hangs. I see
> an autovacuum process starting and existing every couple of seconds.
> Only by killing and restarting PG can I drop the table.

Please do try to figure this one out. It'd be a separate problem,
worthy of its own thread.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-05 11:35:51
Message-ID:	20171005113551.kjo7kwpdkdxlp6ex@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Peter Geoghegan wrote:

> As you know, on version 9.4+, as of commit 37484ad2a, we decided that
> we are "largely ignoring the value to which it [xmin] is set". The
> expectation became that raw xmin is available after freezing, but
> mostly for forensic purposes. I think Alvaro should now memorialize
> the idea that its value is actually critical in some place
> (htup_details.h?).

I'd love to be certain that we preserve the original value in all
cases.

> On Wed, Oct 4, 2017 at 6:46 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> > I independently arrived at the same conclusion. Since I was trying with
> > 9.3, the patch differs -- in the old version we must explicitely test
> > for the FrozenTransactionId value, instead of using GetRawXmin.
>
> Obviously you're going to have to be prepared for a raw xmin of
> FrozenTransactionId, even on 9.4+, due to pg_upgrade.

Hmm. It would be good to be able to get rid of that case, actually,
because I now think it's less safe (see below).

At any rate, I was thinking in a new routine to encapsulate the logic,

/*
* Check the tuple XMIN against prior XMAX, if any
*/
if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, HeapTupleHeaderGetXmin(htup)))
break;

where ..XmaxMatchesXmin would know about checking for possibly frozen
tuples.

> I can see why it
> would be safe (or at least no more dangerous) to rely on
> HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
> installations that initdb'd on a version after commit 37484ad2a
> (version 9.4+). However, I'm not sure why what you propose here would
> be safe when even raw xmin happens to be FrozenTransactionId. Are you
> sure that that's truly race-free? If it's really true that we only
> need to check for FrozenTransactionId on 9.3, why not just do that on
> all versions, and never bother with HeapTupleHeaderGetRawXmin()?
> ("Sheer paranoia" is a valid answer; I just want us to be clear on the
> reasoning.)

I think the RawXmin is the better mechanism. I'm not absolutely certain
that the windows is completely closed in 9.3; as I understand things, it
is possible for transaction A to prune an aborted heap-only tuple, then
transaction B to insert a frozen tuple in the same location, then
transaction C follows a link to the HOT that was pruned. I think we'd
end up considering that the new frozen tuple is part of the chain, which
is wrong ... In 9.4 we can compare the real xmin.

I hope I am proved wrong about this, because if not, I think we're
looking at an essentially unsolvable problem in 9.3.

> Obviously any race would have a ridiculously tiny window, but it's not
> obvious why this protocol would be completely race-free (in the event
> of a FrozenTransactionId raw xmin).

As far as I understand, this problem only emerges if one part of a HOT
chain reaches the min freeze age while another part of the same chain is
still visible by some running transaction. It is particularly
noticeably in our current test case because we use a min freeze age of
0, with many concurrrent modifying the same page. What this says to me
is that VACUUM FREEZE is mildly dangerous when there's lots of high
concurrent HOT UPDATE activity.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-05 16:24:03
Message-ID:	20171005162402.jahqflf3mekileqm@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

I think this is the patch for 9.3. I ran the test a few hundred times
(with some additional changes such as randomly having an update inside a
savepoint that's randomly aborted, randomly aborting the transaction,
randomly skipping the for key share lock, randomly sleeping at a few
points; and also adding filler columns, reducing fillfactor and using
5, 50, 180, 250, 500 sessions after verifying that it causes the tuples
to stay in the same page or migrate to later pages). The final REINDEX
has never complained again about failing to find the root tuple. I hope
it's good now.

The attached patch needs a few small tweaks, such as improving
commentary in the new function, maybe turn it into a macro (otherwise I
think it could be bad for performance; I'd like a static func but not
sure those are readily available in 9.3), change the XID comparison to
use the appropriate macro rather than ==, and such.

Regarding changes of xmin/xmax comparison, I also checked manually the
spots I thought should be modified and later double-checked against the
list that Michael posted. It's a match, except for rewriteheap.c which
I cannot make heads or tails about. (I think it's rather unfortunate
that it sticks a tuple's Xmax into a field that's called Xmin, but let's
put that aside). Maybe there's a problem here, maybe there isn't.

I'm now going to forward-port this to 9.4.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
0001-Fix-bugs.patch	text/plain	4.8 KB

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-05 17:52:28
Message-ID:	CAH2-Wzm=4mf7wC9hWy-OAzJycSbnUdLmGFhTWkx-N-LFF+pijA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Oct 5, 2017 at 4:35 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> At any rate, I was thinking in a new routine to encapsulate the logic,
>
> /*
> * Check the tuple XMIN against prior XMAX, if any
> */
> if (!HeapTupleUpdateXmaxMatchesXmin(priorXmax, HeapTupleHeaderGetXmin(htup)))
> break;
>
> where ..XmaxMatchesXmin would know about checking for possibly frozen
> tuples.

Makes sense.

>> I can see why it
>> would be safe (or at least no more dangerous) to rely on
>> HeapTupleHeaderGetRawXmin() in the way mentioned here, at least on
>> installations that initdb'd on a version after commit 37484ad2a
>> (version 9.4+). However, I'm not sure why what you propose here would
>> be safe when even raw xmin happens to be FrozenTransactionId. Are you
>> sure that that's truly race-free? If it's really true that we only
>> need to check for FrozenTransactionId on 9.3, why not just do that on
>> all versions, and never bother with HeapTupleHeaderGetRawXmin()?
>> ("Sheer paranoia" is a valid answer; I just want us to be clear on the
>> reasoning.)
>
> I think the RawXmin is the better mechanism. I'm not absolutely certain
> that the windows is completely closed in 9.3; as I understand things, it
> is possible for transaction A to prune an aborted heap-only tuple, then
> transaction B to insert a frozen tuple in the same location, then
> transaction C follows a link to the HOT that was pruned. I think we'd
> end up considering that the new frozen tuple is part of the chain, which
> is wrong ... In 9.4 we can compare the real xmin.

Good point: the race doesn't exist on 9.4+ with pg_upgrade from 9.3,
when raw xmin happens to be FrozenTransactionId, because there can be
no new tuples that have that property. This possible race is strictly
a 9.3 issue. (You have to deal with a raw xmin of FrozenTransactionId
on 9.4+, but there are no race conditions, because affected tuples are
all pre-pg_upgrade tuples -- they were frozen months ago, not
microseconds ago.)

> I hope I am proved wrong about this, because if not, I think we're
> looking at an essentially unsolvable problem in 9.3.

Well, as noted within README.HOT, the xmin/xmax matching did not
appear in earlier versions of the original HOT patch, because it was
thought that holding a pin of the buffer was a sufficient interlock
against concurrent line pointer recycling (pruning must have a buffer
cleanup lock). Clearly it is more robust to match xmin to xmax, but is
there actually any evidence that it was really necessary at all (for
single-page HOT chain traversals) when HOT went in in 2007, or that it
has since become necessary? heap_page_prune()'s caller has to have a
buffer cleanup lock.

I'm certainly not advocating removing the xmin/xmax matching within
heap_prune_chain() on 9.3. However, it may be acceptable to rely on
holding a buffer cleanup lock within heap_prune_chain() on 9.3, just
for the rare/theoretical cases where the FrozenTransactionId raw xmin
ambiguity means that xmin/xmax matching alone might not be enough. As
you say, it seems unsolvable through looking at state on the page
directly on 9.3, so there may be no other way. And, that's still
strictly better than what we have today.

> As far as I understand, this problem only emerges if one part of a HOT
> chain reaches the min freeze age while another part of the same chain is
> still visible by some running transaction. It is particularly
> noticeably in our current test case because we use a min freeze age of
> 0, with many concurrrent modifying the same page. What this says to me
> is that VACUUM FREEZE is mildly dangerous when there's lots of high
> concurrent HOT UPDATE activity.

I'm not sure what you mean here. It is dangerous right now, because
there is a bug, but we can squash the bug.

--
Peter Geoghegan

From:	"Wood, Dan" <hexpert(at)amazon(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-05 17:54:46
Message-ID:	EB598F1C-B8F3-4FB2-98B5-26A68A21BEDB@amazon.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.

I would prefer to focus on either latest 9X or 11dev. Does Alvaro’s patch presume any of the other patch to set COMMITTED in the freeze code?

On 10/4/17, 7:17 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:

On Thu, Oct 5, 2017 at 10:39 AM, Wood, Dan <hexpert(at)amazon(dot)com> wrote:
> Whatever you do make sure to also test 250 clients running lock.sql. Even with the communities fix plus YiWen’s fix I can still get duplicate rows. What works for “in-block” hot chains may not work when spanning blocks.

Interesting. Which version did you test? Only 9.6?

> Once nearly all 250 clients have done their updates and everybody is waiting to vacuum which one by one will take a while I usually just “pkill -9 psql”. After that I have many of duplicate “id=3” rows. On top of that I think we might have a lock leak. After the pkill I tried to rerun setup.sql to drop/create the table and it hangs. I see an autovacuum process starting and existing every couple of seconds. Only by killing and restarting PG can I drop the table.

Yeah, that's more or less what I have been doing. My tests involve
using your initial script with way more sessions triggering lock.sql,
minus the kill-9 portion (good idea actually). I can of course see the
sessions queuing for VACUUM, still I cannot see duplicated rows, even
if I headshot Postgres in the middle of the VACUUM waiting queue. Note
that I have just tested Alvaro's patch on 9.3.
--
Michael

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 07:12:47
Message-ID:	CAB7nPqRrCA=394EQ4YhJ1bEZBy-R0-_ZL7ofqL95WeHZv_GrTw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> I think this is the patch for 9.3. I ran the test a few hundred times
> (with some additional changes such as randomly having an update inside a
> savepoint that's randomly aborted, randomly aborting the transaction,
> randomly skipping the for key share lock, randomly sleeping at a few
> points; and also adding filler columns, reducing fillfactor and using
> 5, 50, 180, 250, 500 sessions after verifying that it causes the tuples
> to stay in the same page or migrate to later pages). The final REINDEX
> has never complained again about failing to find the root tuple. I hope
> it's good now.

I have looked and played with your patch as well for a couple of
hours, and did not notice any failures again. The structure of the
pages looked sane as well, I could see also with pageinspect a correct
HOT chain using the first set of tests provided on this thread.

> The attached patch needs a few small tweaks, such as improving
> commentary in the new function, maybe turn it into a macro (otherwise I
> think it could be bad for performance; I'd like a static func but not
> sure those are readily available in 9.3), change the XID comparison to
> use the appropriate macro rather than ==, and such.

Yeah that would be better.

> Regarding changes of xmin/xmax comparison, I also checked manually the
> spots I thought should be modified and later double-checked against the
> list that Michael posted.

Thanks. I can see see that your patch is filling the holes.

> It's a match, except for rewriteheap.c which
> I cannot make heads or tails about. (I think it's rather unfortunate
> that it sticks a tuple's Xmax into a field that's called Xmin, but let's
> put that aside). Maybe there's a problem here, maybe there isn't.

rewrite_heap_tuple is used only by CLUSTER, and the oldest new tuples
are frozen before being rewritten. So this looks safe to me at the
end.

> I'm now going to forward-port this to 9.4.

+ /*
+ * If the xmax of the old tuple is identical to the xmin of the new one,
+ * it's a match.
+ */
+ if (xmax == xmin)
+ return true;
I would use TransactionIdEquals() here, to remember once you switch
that to a macro.

+/*
+ * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
+ * taking into account that the Xmin might have been frozen.
+ */
[...]
+ /*
+ * We actually don't know if there's a match, but if the previous tuple
+ * was frozen, we cannot really rely on a perfect match.
+ */
--
Michael

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 10:57:18
Message-ID:	20171006105718.nl5ik274wqjsjou6@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Michael Paquier wrote:
> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> + /*
> + * If the xmax of the old tuple is identical to the xmin of the new one,
> + * it's a match.
> + */
> + if (xmax == xmin)
> + return true;
> I would use TransactionIdEquals() here, to remember once you switch
> that to a macro.

I've had second thoughts about the macro thing -- for now I'm keeping it
a function, actually.

> +/*
> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
> + * taking into account that the Xmin might have been frozen.
> + */
> [...]
> + /*
> + * We actually don't know if there's a match, but if the previous tuple
> + * was frozen, we cannot really rely on a perfect match.
> + */

I don't know what you had in mind here, but I tweaked the 9.3 version so
that it now looks like this:

/*
* HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
*
* Given the new version of a tuple after some update, verify whether the
* given Xmax (corresponding to the previous version) matches the tuple's
* Xmin, taking into account that the Xmin might have been frozen after the
* update.
*/
bool
HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
{
TransactionId xmin = HeapTupleHeaderGetXmin(htup);

/*
* If the xmax of the old tuple is identical to the xmin of the new one,
* it's a match.
*/
if (TransactionIdEquals(xmax, xmin))
return true;

/*
* When a tuple is frozen, the original Xmin is lost, but we know it's a
* committed transaction. So unless the Xmax is InvalidXid, we don't
* know for certain that there is a match, but there may be one; and we
* must return true so that a HOT chain that is half-frozen can be walked
* correctly.
*/
if (TransactionIdEquals(xmin, FrozenTransactionId) &&
TransactionIdIsValid(xmax))
return true;

return false;
}

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	"Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 13:18:30
Message-ID:	20171006131830.ktirfceetnnjwigm@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Wood, Dan wrote:
> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
>
> I would prefer to focus on either latest 9X or 11dev.

I tested my patch on 9.4 and 9.5 today and it seems to close the problem
(with the patch, I waited 10x as many iterations as it took for the
problem to occur ~10 times without the patch), but I can reproduce a
problem in 9.6 with my patch installed. There must be something new in
9.6 that is causing the problem to reappear.

> Does Alvaro’s patch presume any of the other patch to set COMMITTED in
> the freeze code?

I don't know what you mean. Here is the 9.6 version of my patch. Note
that HEAP_XMIN_FROZEN (which uses the XMIN_COMMITTED bit as I recall)
was introduced in 9.4.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
0001-Fix-bugs.patch	text/plain	5.7 KB

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 13:28:03
Message-ID:	CAB7nPqQz6kMrcLEM+7VR-hjKK+Yrf0ti0q2Svj6HBNBFD4ASJQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Oct 6, 2017 at 7:57 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Michael Paquier wrote:
>> On Fri, Oct 6, 2017 at 1:24 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>> +/*
>> + * Given a tuple, verify whether the given Xmax matches the tuple's Xmin,
>> + * taking into account that the Xmin might have been frozen.
>> + */
>> [...]
>> + /*
>> + * We actually don't know if there's a match, but if the previous tuple
>> + * was frozen, we cannot really rely on a perfect match.
>> + */
>
> I don't know what you had in mind here,

Impossible to know if I don't actually send the contents :)

> but I tweaked the 9.3 version so that it now looks like this:

I wanted to mention that the comments could be reworked. And forgot to
suggest some.

> /*
> * HeapTupleUpdateXmaxMatchesXmin - verify update chain xmax/xmin lineage
> *
> * Given the new version of a tuple after some update, verify whether the
> * given Xmax (corresponding to the previous version) matches the tuple's
> * Xmin, taking into account that the Xmin might have been frozen after the
> * update.
> */
> bool
> HeapTupleUpdateXmaxMatchesXmin(TransactionId xmax, HeapTupleHeader htup)
> {
> TransactionId xmin = HeapTupleHeaderGetXmin(htup);
>
> /*
> * If the xmax of the old tuple is identical to the xmin of the new one,
> * it's a match.
> */
> if (TransactionIdEquals(xmax, xmin))
> return true;
>
> /*
> * When a tuple is frozen, the original Xmin is lost, but we know it's a
> * committed transaction. So unless the Xmax is InvalidXid, we don't
> * know for certain that there is a match, but there may be one; and we
> * must return true so that a HOT chain that is half-frozen can be walked
> * correctly.
> */
> if (TransactionIdEquals(xmin, FrozenTransactionId) &&
> TransactionIdIsValid(xmax))
> return true;
>
> return false;
> }

Those are clearly improvements.
--
Michael

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 13:36:17
Message-ID:	CAB7nPqS_CrdeHEKHd_biUx_Ua3-HWxu3phbbs4PripvpfZvbwQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Wood, Dan wrote:
>> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
>>
>> I would prefer to focus on either latest 9X or 11dev.
>
> I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> (with the patch, I waited 10x as many iterations as it took for the
> problem to occur ~10 times without the patch), but I can reproduce a
> problem in 9.6 with my patch installed. There must be something new in
> 9.6 that is causing the problem to reappear.

The freeze visibility map has been introduced in 9.6... There could be
interactions on this side.
--
Michael

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 13:57:12
Message-ID:	20171006135712.qqcuhrmrubk57ode@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Michael Paquier wrote:
> On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > Wood, Dan wrote:
> >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
> >>
> >> I would prefer to focus on either latest 9X or 11dev.
> >
> > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > (with the patch, I waited 10x as many iterations as it took for the
> > problem to occur ~10 times without the patch), but I can reproduce a
> > problem in 9.6 with my patch installed. There must be something new in
> > 9.6 that is causing the problem to reappear.
>
> The freeze visibility map has been introduced in 9.6... There could be
> interactions on this side.

Ah, thanks for the tip. I hope the authors of that can do the gruntwork
of researching this problem there, then. I'll go commit what I have
now.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 14:06:38
Message-ID:	CAB7nPqS+EV4k8stkhTe7ofP9Pn-JzksVRaimGC=qSmHjkPVcNQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Oct 6, 2017 at 10:57 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Ah, thanks for the tip. I hope the authors of that can do the gruntwork
> of researching this problem there, then.

I have some stuff using 9.6 extensively, so like Dan I think I'll
chime in anyway. Not before Tuesday though, long weekend in Japan
ahead.

> I'll go commit what I have now.

As far as I saw this set definitely improves the situation.
--
Michael

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 14:11:40
Message-ID:	20171006141140.GB4628@tamriel.snowman.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Robert,

* Alvaro Herrera (alvherre(at)alvh(dot)no-ip(dot)org) wrote:
> Michael Paquier wrote:
> > On Fri, Oct 6, 2017 at 10:18 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > > Wood, Dan wrote:
> > >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
> > >>
> > >> I would prefer to focus on either latest 9X or 11dev.
> > >
> > > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > > (with the patch, I waited 10x as many iterations as it took for the
> > > problem to occur ~10 times without the patch), but I can reproduce a
> > > problem in 9.6 with my patch installed. There must be something new in
> > > 9.6 that is causing the problem to reappear.
> >
> > The freeze visibility map has been introduced in 9.6... There could be
> > interactions on this side.
>
> Ah, thanks for the tip. I hope the authors of that can do the gruntwork
> of researching this problem there, then. I'll go commit what I have
> now.

I don't doubt you're watching this thread too, but just to be 110% sure
that we don't end up with the November releases still having this issue,
I'm adding you to the CC on this thread as the one who did the freeze
visibility map work. Depending on hope here is a bit too squishy for me
when we're talking about corruption issues.

Thanks!

Stephen

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	"Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 14:59:12
Message-ID:	20171006145912.kjqxbsw5esqrmrkv@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

By the way, I still wonder if there's any way for a new tuple to get
inserted in the place where a HOT redirect would be pointing to, and
have it be marked as Frozen, where the old redirect contains a
non-invalid Xmax. I tried to think of a way for that to happen, but
couldn't think of anything.

What I imagine is a sequence like this:

1. insert a tuple
2. HOT-update a tuple
3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
4. start transaction
5. HOT-update the tuple again, creating HOT in lp 3
6. abort transaction (leaving aborted update in lp 3)
7. somehow remove tuple from lp 3, make slot available
8. new transaction comes along, inserts tuple in lp 3
9. somebody freezes tuple in lp3 (???)

Then we follow the HOT chain, see that Xmin=2 in lp3 and conclude that
the tuple is part of the chain because of an xid "match".

Basically from step 7 onwards I don't think this is possible, but maybe
I'm just blind.

Maybe we can forestall the problem by checking whether the Xmax
TransactionIdIsCurrentTransaction || TransactionIdDidCommit (or some
variant thereof). This would be very slow but safer; and in 9.4 and up
we'd only need to do it if the xmin value is actually FrozenXid which
should be rare (only in pages upgraded from 9.3).

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 17:03:51
Message-ID:	CAH2-Wz=-eBQTEAfb_vnzMpYqH0EGEiohcYSmc=ykH_Y9fNywww@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Wood, Dan wrote:
>> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
>>
>> I would prefer to focus on either latest 9X or 11dev.
>
> I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> (with the patch, I waited 10x as many iterations as it took for the
> problem to occur ~10 times without the patch), but I can reproduce a
> problem in 9.6 with my patch installed. There must be something new in
> 9.6 that is causing the problem to reappear.

What problem persists? The original one (or, at least, the original
symptom of pruning HOT chains incorrectly)? If that's what you mean, I
wouldn't be so quick to assume that it's the freeze map.

--
Peter Geoghegan

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 17:49:56
Message-ID:	20171006174956.4kofump6jcmlnjci@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Peter Geoghegan wrote:
> On Fri, Oct 6, 2017 at 6:18 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > Wood, Dan wrote:
> >> Yes, I’ve been testing 9.6. I’ll try Alvaro’s patch today.
> >>
> >> I would prefer to focus on either latest 9X or 11dev.
> >
> > I tested my patch on 9.4 and 9.5 today and it seems to close the problem
> > (with the patch, I waited 10x as many iterations as it took for the
> > problem to occur ~10 times without the patch), but I can reproduce a
> > problem in 9.6 with my patch installed. There must be something new in
> > 9.6 that is causing the problem to reappear.
>
> What problem persists? The original one (or, at least, the original
> symptom of pruning HOT chains incorrectly)? If that's what you mean, I
> wouldn't be so quick to assume that it's the freeze map.

I can tell that, in 9.6, REINDEX still reports the error we saw in
earlier releases, after some of the runs of my reproducer scripts. I'm
unable to reproduce it anymore in 9.3 to 9.5. I can't see the one Dan
originally reported anywhere, either.

I don't know if it's really the freeze map at fault or something else.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 18:07:45
Message-ID:	CAH2-Wzk7paqsOQexUDiukuZMK08r-=MuOrAtntn-8jZ75mo5rQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Oct 6, 2017 at 7:59 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> By the way, I still wonder if there's any way for a new tuple to get
> inserted in the place where a HOT redirect would be pointing to, and
> have it be marked as Frozen, where the old redirect contains a
> non-invalid Xmax. I tried to think of a way for that to happen, but
> couldn't think of anything.
>
> What I imagine is a sequence like this:
>
> 1. insert a tuple
> 2. HOT-update a tuple
> 3. prune the page, making lp 1 be a redirect (lp 2 is the new tuple)
> 4. start transaction
> 5. HOT-update the tuple again, creating HOT in lp 3
> 6. abort transaction (leaving aborted update in lp 3)
> 7. somehow remove tuple from lp 3, make slot available
> 8. new transaction comes along, inserts tuple in lp 3
> 9. somebody freezes tuple in lp3 (???)
>
> Then we follow the HOT chain, see that Xmin=2 in lp3 and conclude that
> the tuple is part of the chain because of an xid "match".
>
> Basically from step 7 onwards I don't think this is possible, but maybe
> I'm just blind.

For the record, I also think that this is impossible, in part because
pruning requires a cleanup buffer lock (and because HOT chains cannot
span pages). I wouldn't say that I am 100% confident about this,
though.

BTW, is this comment block that appears above
heap_prepare_freeze_tuple() now obsolete, following 20b65522 (and
maybe much earlier commits)?

* NB: It is not enough to set hint bits to indicate something is
* committed/invalid -- they might not be set on a standby, or after crash
* recovery. We really need to remove old xids.
*/

We WAL-log setting hint bits during freezing now, iff tuple xmin is
before the Xid cutoff and tuple is a heap-only tuple.

--
Peter Geoghegan

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 18:34:20
Message-ID:	CAH2-WznvU6=x3z7ND_c7Nffx6_t-k8OQWE64H_Da5bvSCwSQFA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Oct 6, 2017 at 10:49 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> I can tell that, in 9.6, REINDEX still reports the error we saw in
> earlier releases, after some of the runs of my reproducer scripts. I'm
> unable to reproduce it anymore in 9.3 to 9.5. I can't see the one Dan
> originally reported anywhere, either.

You mean the enhanced stress-test that varied fillfactor, added filler
columns, and so on [1]? Can you post that to the list, please? I think
that several of us would like to have a reproducible test case.

> I don't know if it's really the freeze map at fault or something else.

Ideally, it would be possible to effectively disable the new freeze
map stuff in a minimal way, for testing purposes. Perhaps the authors
of that patch, CC'd, can suggest a way to do that.

If I had to guess, I'd say that it's just as likely that the issue is
only reproducible on 9.6 because of the enhancements added in that
release that improved buffer pinning (the use of atomic ops to pin
buffers, moving buffer content locks into buffer descriptors, etc). It
was already a bit tricky to get the problem that remained after
20b6552 but before today's a5736bf to reproduce with Dan's script. It
often took me 4 or 5 attempts. (I wonder what it looks like with your
enhanced version of that script -- the one that I just asked about.)

It seems possible that we've merely reduced the window for the race to
the point that it's practically (though not theoretically) impossible
to reproduce the problem on versions < 9.6, though not on 9.6+.
Applying Occam's razor, the problem doesn't seem particularly likely
to be in the freeze map stuff, which isn't actually all that closely
related.

[1] https://postgr.es/m/20171005162402.jahqflf3mekileqm@alvherre.pgsql
--
Peter Geoghegan

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 19:27:26
Message-ID:	CAH2-WzkR0m4ZBrPgvUF2GaLQtwimN6xhXBbGzBc51mNfZeo73w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>> I don't know if it's really the freeze map at fault or something else.
>
> Ideally, it would be possible to effectively disable the new freeze
> map stuff in a minimal way, for testing purposes. Perhaps the authors
> of that patch, CC'd, can suggest a way to do that.

Actually, the simplest thing might be to just use pg_visibility's
pg_check_frozen() to check that the visibility/freeze map accurately
summarizes the all-frozen status of tuples in the heap. If that
doesn't indicate that there is corruption, we can be fairly confident
that the problem is elsewhere. The metadata in the visibility/freeze
map should be accurate when a bit is set to indicate that an entire
heap page is all-frozen (or, separately, all-visible). We can hardly
expect it to have better information that the authoritative source of
truth, the heap itself.

The more I think about it, the more I tend to doubt that the remaining
problems are with the freeze map. If the freeze map was wrong, and
incorrectly said that a page was all-frozen, then surely the outward
symptoms would take a long time to show up, as they always do when we
accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM
that that's the only meaningful way that the freeze map can be wrong
-- it only promises to be accurate when it says that no further
freezing is needed for a page/bit.

--
Peter Geoghegan

From:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 21:09:12
Message-ID:	1507324152711.78083@amazon.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

>On Fri, Oct 6, 2017 at 11:34 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>>> I don't know if it's really the freeze map at fault or something else.
>>
>> Ideally, it would be possible to effectively disable the new freeze
>> map stuff in a minimal way, for testing purposes. Perhaps the authors
> of that patch, CC'd, can suggest a way to do that.

>Actually, the simplest thing might be to just use pg_visibility's
>pg_check_frozen() to check that the visibility/freeze map accurately
>summarizes the all-frozen status of tuples in the heap. If that
>doesn't indicate that there is corruption, we can be fairly confident
>that the problem is elsewhere. The metadata in the visibility/freeze
>map should be accurate when a bit is set to indicate that an entire
>heap page is all-frozen (or, separately, all-visible). We can hardly
>expect it to have better information that the authoritative source of
>truth, the heap itself.

>The more I think about it, the more I tend to doubt that the remaining
>problems are with the freeze map. If the freeze map was wrong, and
>incorrectly said that a page was all-frozen, then surely the outward
>symptoms would take a long time to show up, as they always do when we
>accidentally fail to freeze a tuple before a relfrozenxid cutoff. ISTM
>that that's the only meaningful way that the freeze map can be wrong
>-- it only promises to be accurate when it says that no further
>freezing is needed for a page/bit.

Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6.
None of the all-frozen or all-visible bits are necessarily set in problematic pages.

ERROR: failed to find parent tuple for heap-only tuple at (0,182) in table "accounts"

Even when the bits were set, I haven't found issues with the pg_check_xxx functions in the dozens of times I've run them.

postgres=# select * from pg_visibility('accounts'::regclass);
blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
0 | f | f | f
1 | t | t | t
(2 rows)

postgres=# select pg_check_visible('accounts'::regclass);
pg_check_visible
------------------
(0 rows)

postgres=# select pg_check_frozen('accounts'::regclass);
pg_check_frozen
-----------------
(0 rows)

Yi Wen

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-06 22:15:07
Message-ID:	CAH2-Wzn72rWQ52p=PuZ1KLHQaGD+sGd8DoKLfsPY=xSziVukyg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen <yiwong(at)amazon(dot)com> wrote:
> Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6.
> None of the all-frozen or all-visible bits are necessarily set in problematic pages.

Since this happened yesterday, I assume it was with an unfixed version?

As you must have seen, Alvaro said he has a variant of Dan's original
script that demonstrates that a problem remains, at least on 9.6+,
even with today's fix. I think it's the stress-test that plays with
fillfactor, many clients, etc [1].

I've tried to independently reproduce the problem on the master
branch's current tip, with today's new fix, but cannot break things
despite trying many variations. I cannot reproduce the problem that
Alvaro still sees.

I'll have to wait until Alvaro posts his repro to the list before
commenting further, which I assume he'll post as soon as he can. There
doesn't seem to be much point in not waiting for that.

[1] https://postgr.es/m/20171005162402.jahqflf3mekileqm@alvherre.pgsql
--
Peter Geoghegan

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-07 08:31:06
Message-ID:	20171007083106.crbmf7acrlhdb62k@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Peter Geoghegan wrote:
> On Fri, Oct 6, 2017 at 2:09 PM, Wong, Yi Wen <yiwong(at)amazon(dot)com> wrote:
> > Yesterday, I've been spending time with pg_visibility on the pages when I reproduce the issue in 9.6.
> > None of the all-frozen or all-visible bits are necessarily set in problematic pages.
>
> Since this happened yesterday, I assume it was with an unfixed version?
>
> As you must have seen, Alvaro said he has a variant of Dan's original
> script that demonstrates that a problem remains, at least on 9.6+,
> even with today's fix. I think it's the stress-test that plays with
> fillfactor, many clients, etc [1].

I just execute setup.sql once and then run this shell command,

while :; do
psql -e -P pager=off -f ./repro.sql
for i in `seq 1 5`; do
psql -P pager=off -e --no-psqlrc -f ./lock.sql &
done
wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
psql -P pager=off -e --no-psqlrc -f ./report.sql
echo "done"
done

Note that you need to use pg10's psql because of the \if lines in
lock.sql. For some runs I change the values to compare random() to, and
originally the commented out section in lock.sql was not commented out,
but I'm fairly sure the failures I saw where with this version. Also, I
sometime change the 5 in the `seq` command to higher values (180, 250).

I didn't find the filler column to have any effect, so I took that out.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
lock.sql	text/plain	607 bytes
reindex.sql	text/plain	25 bytes
report.sql	text/plain	481 bytes
repro.sql	text/plain	176 bytes
setup.sql	text/plain	1.7 KB

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-07 23:22:08
Message-ID:	CAH2-Wz=hzf9R0VxMyqcSm9TikeNL+cXVQsciiSjDZOCspabW7w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>> As you must have seen, Alvaro said he has a variant of Dan's original
>> script that demonstrates that a problem remains, at least on 9.6+,
>> even with today's fix. I think it's the stress-test that plays with
>> fillfactor, many clients, etc [1].
>
> I just execute setup.sql once and then run this shell command,
>
> while :; do
> psql -e -P pager=off -f ./repro.sql
> for i in `seq 1 5`; do
> psql -P pager=off -e --no-psqlrc -f ./lock.sql &
> done
> wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
> psql -P pager=off -e --no-psqlrc -f ./report.sql
> echo "done"
> done

I cannot reproduce the problem on my personal machine using this
script/stress-test. I tried to do so on the master branch git tip.
This reinforces the theory that there is some timing sensitivity,
because the remaining race condition is very narrow.

--
Peter Geoghegan

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-07 23:25:24
Message-ID:	20171007232524.sdpi3jk2636fvjge@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Peter Geoghegan wrote:
> On Sat, Oct 7, 2017 at 1:31 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >> As you must have seen, Alvaro said he has a variant of Dan's original
> >> script that demonstrates that a problem remains, at least on 9.6+,
> >> even with today's fix. I think it's the stress-test that plays with
> >> fillfactor, many clients, etc [1].
> >
> > I just execute setup.sql once and then run this shell command,
> >
> > while :; do
> > psql -e -P pager=off -f ./repro.sql
> > for i in `seq 1 5`; do
> > psql -P pager=off -e --no-psqlrc -f ./lock.sql &
> > done
> > wait && psql -P pager=off -e --no-psqlrc -f ./reindex.sql
> > psql -P pager=off -e --no-psqlrc -f ./report.sql
> > echo "done"
> > done
>
> I cannot reproduce the problem on my personal machine using this
> script/stress-test. I tried to do so on the master branch git tip.
> This reinforces the theory that there is some timing sensitivity,
> because the remaining race condition is very narrow.

Hmm, I think I added a random sleep (max. 100ms) right after the
HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
makes the race easier to hit.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-08 17:29:06
Message-ID:	CAH2-Wzmfc7pv-WV7aphJrUaOaEo6n_2EBowPdQrjTiMFWOwWdw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Hmm, I think I added a random sleep (max. 100ms) right after the
> HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
> makes the race easier to hit.

I still cannot reproduce. Perhaps you can be more specific?

--
Peter Geoghegan

From:	"Wood, Dan" <hexpert(at)amazon(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	"Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-09 06:11:16
Message-ID:	835EB743-A57A-49AE-ADA7-A16ACCB737D7@amazon.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

I’m unclear on what is being repro’d in 9.6. Are you getting the duplicate rows problem or just the reindex problem? Are you testing with asserts enabled(I’m not)?

If you are getting the dup rows consider the code in the block in heapam.c that starts with the comment “replace multi by update xid”.
When I repro this I find that MultiXactIdGetUpdateXid() returns 0. There is an updater in the multixact array however the status is MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I assume this is a preliminary status before the following row in the hot chain has it’s multixact set to NoKeyUpdate.

Since a 0 is returned this does precede cutoff_xid and TransactionIdDidCommit(0) will return false. This ends up aborting the multixact on the row even though the real xid is committed. This sets XMAX to 0 and that row becomes visible as one of the dups. Interestingly the real xid of the updater is 122944 and the cutoff_xid is 122945.

I’m still debugging but I start late so I’m passing this incomplete info along now.

On 10/7/17, 4:25 PM, "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-10 01:00:10
Message-ID:	CAB7nPqSrXiFEkv7Ug8jSmAuH-4tF65pPYATb-J1PzB5KAP86DA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Mon, Oct 9, 2017 at 2:29 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Sat, Oct 7, 2017 at 4:25 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>> Hmm, I think I added a random sleep (max. 100ms) right after the
>> HeapTupleSatisfiesVacuum call in vacuumlazy.c (lazy_scan_heap), and that
>> makes the race easier to hit.
>
> I still cannot reproduce. Perhaps you can be more specific?

I have been trying to reproduce things for a total of 4 hours, testing
various variations of the proposed test cases (killed Postgres,
changed fillfactor, manual sleep calls), but I am proving unable to
see a regression as well. I would not think that the OS matters here,
all my attempts were on macos with assertions and debugging enabled.

At least the code is now more stable, which is definitely a good thing.
--
Michael

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	"Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-10 14:14:44
Message-ID:	20171010141444.s5tadnstsednska6@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Wood, Dan wrote:
> I’m unclear on what is being repro’d in 9.6. Are you getting the
> duplicate rows problem or just the reindex problem? Are you testing
> with asserts enabled(I’m not)?

I was seeing just the reindex problem. I don't see any more dups.

But I've tried to reproduce it afresh now, and let it run for a long
time and nothing happened. Maybe I made a mistake last week and
ran an unfixed version. I don't see any more problems now.

> If you are getting the dup rows consider the code in the block in
> heapam.c that starts with the comment “replace multi by update xid”.
>
> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
> There is an updater in the multixact array however the status is
> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
> assume this is a preliminary status before the following row in the
> hot chain has it’s multixact set to NoKeyUpdate.

Yes, the "For" version is the locker version rather than the actual
update. That lock is acquired by EvalPlanQual locking the row just
before doing the update. I think GetUpdateXid has no reason to return
such an Xid, since it's not an update.

> Since a 0 is returned this does precede cutoff_xid and
> TransactionIdDidCommit(0) will return false. This ends up aborting
> the multixact on the row even though the real xid is committed. This
> sets XMAX to 0 and that row becomes visible as one of the dups.
> Interestingly the real xid of the updater is 122944 and the cutoff_xid
> is 122945.

I haven't seen this effect. Please keep us updated if you're able to
verify corruption this way.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-11 02:25:46
Message-ID:	CAB7nPqRCR1DOZoOwAB42MDAgQUqsokAsFdKDdj4ka7BWQusQpg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Okay, so that's one person more going to this trend, making three with
Peter and I.

>> If you are getting the dup rows consider the code in the block in
>> heapam.c that starts with the comment “replace multi by update xid”.
>>
>> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
>> There is an updater in the multixact array however the status is
>> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
>> assume this is a preliminary status before the following row in the
>> hot chain has it’s multixact set to NoKeyUpdate.
>
> Yes, the "For" version is the locker version rather than the actual
> update. That lock is acquired by EvalPlanQual locking the row just
> before doing the update. I think GetUpdateXid has no reason to return
> such an Xid, since it's not an update.
>
>> Since a 0 is returned this does precede cutoff_xid and
>> TransactionIdDidCommit(0) will return false. This ends up aborting
>> the multixact on the row even though the real xid is committed. This
>> sets XMAX to 0 and that row becomes visible as one of the dups.
>> Interestingly the real xid of the updater is 122944 and the cutoff_xid
>> is 122945.
>
> I haven't seen this effect. Please keep us updated if you're able to
> verify corruption this way.

Me neither. It would be nice to not live long with such a sword of Damocles.
--
Michael

From:	"Wood, Dan" <hexpert(at)amazon(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Peter Geoghegan <pg(at)bowt(dot)ie>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-11 02:31:13
Message-ID:	2BD5E9A0-4166-4CD6-8A66-2293AFE533AE@amazon.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

I found one glitch with our merge of the original dup row fix. With that corrected AND Alvaro’s Friday fix things are solid.
No dup’s. No index corruption.

Thanks so much.

On 10/10/17, 7:25 PM, "Michael Paquier" <michael(dot)paquier(at)gmail(dot)com> wrote:

On Tue, Oct 10, 2017 at 11:14 PM, Alvaro Herrera
<alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> I was seeing just the reindex problem. I don't see any more dups.
>
> But I've tried to reproduce it afresh now, and let it run for a long
> time and nothing happened. Maybe I made a mistake last week and
> ran an unfixed version. I don't see any more problems now.

Okay, so that's one person more going to this trend, making three with
Peter and I.

>> If you are getting the dup rows consider the code in the block in
>> heapam.c that starts with the comment “replace multi by update xid”.
>>
>> When I repro this I find that MultiXactIdGetUpdateXid() returns 0.
>> There is an updater in the multixact array however the status is
>> MultiXactStatusForNoKeyUpdate and not MultiXactStatusNoKeyUpdate. I
>> assume this is a preliminary status before the following row in the
>> hot chain has it’s multixact set to NoKeyUpdate.
>
> Yes, the "For" version is the locker version rather than the actual
> update. That lock is acquired by EvalPlanQual locking the row just
> before doing the update. I think GetUpdateXid has no reason to return
> such an Xid, since it's not an update.
>
>> Since a 0 is returned this does precede cutoff_xid and
>> TransactionIdDidCommit(0) will return false. This ends up aborting
>> the multixact on the row even though the real xid is committed. This
>> sets XMAX to 0 and that row becomes visible as one of the dups.
>> Interestingly the real xid of the updater is 122944 and the cutoff_xid
>> is 122945.
>
> I haven't seen this effect. Please keep us updated if you're able to
> verify corruption this way.

Me neither. It would be nice to not live long with such a sword of Damocles.
--
Michael

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	"Wood, Dan" <hexpert(at)amazon(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-10-11 02:35:20
Message-ID:	CAB7nPqSQDqd2h8xppL+2YpCzanObokuAMcP_HJNVn8NOZxkHpw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Wed, Oct 11, 2017 at 11:31 AM, Wood, Dan <hexpert(at)amazon(dot)com> wrote:
> I found one glitch with our merge of the original dup row fix. With that corrected AND Alvaro’s Friday fix things are solid.
> No dup’s. No index corruption.
>
> Thanks so much.

Nice to hear that! You guys seem to be doing extensive testing and
actually report back about it, which is really nice to see.
--
Michael

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 11:20:19
Message-ID:	20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Hi,

On 2017-09-28 14:47:53 +0000, Alvaro Herrera wrote:
> Fix freezing of a dead HOT-updated tuple
>
> Vacuum calls page-level HOT prune to remove dead HOT tuples before doing
> liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But
> concurrent transaction commit/abort may turn DEAD some of the HOT tuples
> that survived the prune, before HeapTupleSatisfiesVacuum tests them.
> This happens to activate the code that decides to freeze the tuple ...
> which resuscitates it, duplicating data.
>
> (This is especially bad if there's any unique constraints, because those
> are now internally violated due to the duplicate entries, though you
> won't know until you try to REINDEX or dump/restore the table.)
>
> One possible fix would be to simply skip doing anything to the tuple,
> and hope that the next HOT prune would remove it. But there is a
> problem: if the tuple is older than freeze horizon, this would leave an
> unfrozen XID behind, and if no HOT prune happens to clean it up before
> the containing pg_clog segment is truncated away, it'd later cause an
> error when the XID is looked up.
>
> Fix the problem by having the tuple freezing routines cope with the
> situation: don't freeze the tuple (and keep it dead). In the cases that
> the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag
> so that there is no need to look up the XID in pg_clog later on.

I think this is the wrong fix - the assumption that ctid chains can be
validated based on the prev-xmax = cur-xmin is fairly ingrained into the
system, and we shouldn't just be breaking it. The need to later
lobotomize the checks, in a5736bf754, is some evidence of that.

I spent some time discussing this with Robert today (with both of us
alternating between feeling the other and ourselves as stupid), and the
conclusion I think is that the problem is on the pruning, rather than
the freezing side.

FWIW, I don't think the explanation in the commit message of how the
problem triggers is actually correct - the testcase you added doesn't
have any transactions concurrently committing / aborting when
crashing. Rather the problem is that the liveliness checks for freezing
is different from the ones for pruning. HTSV considers xmin
RECENTLY_DEAD when there's a multi xmax with at least one alive locker,
whereas pruning thinks it has to do something because there's the member
xid below the cutoff. No concurrent activity is needed to trigger that.

I think the problem is on the pruning, rather than the freezing side. We
can't freeze a tuple if it has an alive predecessor - rather than
weakining this, we should be fixing the pruning to not have the alive
predecessor.

The relevant logic in HTSV is:

if (tuple->t_infomask & HEAP_XMAX_IS_MULTI)
{
TransactionId xmax;

if (MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
{
/* already checked above */
Assert(!HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_infomask));

xmax = HeapTupleGetUpdateXid(tuple);

/* not LOCKED_ONLY, so it has to have an xmax */
Assert(TransactionIdIsValid(xmax));

if (TransactionIdIsInProgress(xmax))
return HEAPTUPLE_DELETE_IN_PROGRESS;
else if (TransactionIdDidCommit(xmax))
/* there are still lockers around -- can't return DEAD here */
return HEAPTUPLE_RECENTLY_DEAD;
/* updating transaction aborted */
return HEAPTUPLE_LIVE;

with the problematic branch being the TransactionIdDidCommit()
case. Rather than unconditionally returning HEAPTUPLE_RECENTLY_DEAD, we
should test the update xid against OldestXmin and return DEAD /
RECENTLY_DEAD according to that.

If the update xmin is actually below the cutoff we can remove the tuple
even if there's live lockers - the lockers will also be present in the
newer version of the tuple. I verified that for me that fixes the
problem. Obviously that'd require some comment work and more careful
diagnosis.

I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
the face of previously corrupted tuple chains and pg_upgraded clusters -
it can lead to tuples being considered related, even though they they're
from entirely independent hot chains. Especially when upgrading 9.3 post
your fix, to current releases.

In short, I think the two commits should be reverted, and replaced with
a fix along what I'm outlining above.

There'll be some trouble for people that upgraded to an unreleased
version, but I don't really see what we could do about that.

I could be entirely wrong - I've been travelling for the last two weeks
and my brain is somewhat more fried than usual.

Regards,

Andres

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 12:25:27
Message-ID:	CA+Tgmob1bvRwMK=V6e_C9RrF1G409MiEG7xsKKrUvsTmm1SsFQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Nov 2, 2017 at 4:50 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
> the face of previously corrupted tuple chains and pg_upgraded clusters -
> it can lead to tuples being considered related, even though they they're
> from entirely independent hot chains. Especially when upgrading 9.3 post
> your fix, to current releases.

I think this is a key point. If the new behavior were merely not
entirely correct, we could perhaps refine it later. But it's not only
not correct - it actually has the potential to create new problems
that didn't exist before those commits. And if we release without
reverting those commits then we can't change our mind later.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 12:49:47
Message-ID:	20171102124947.hshl7znx6ixy62pf@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Andres Freund wrote:

> I spent some time discussing this with Robert today (with both of us
> alternating between feeling the other and ourselves as stupid), and the
> conclusion I think is that the problem is on the pruning, rather than
> the freezing side.

Thanks both for spending some more time on this.

> I think the problem is on the pruning, rather than the freezing side. We
> can't freeze a tuple if it has an alive predecessor - rather than
> weakining this, we should be fixing the pruning to not have the alive
> predecessor.

I gave a look at HTSV back then, but I didn't find what the right tweak
was, but then I only tried changing the return value to DEAD and
DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based
on OldestXmin didn't occur to me ... I was thinking that the fact that
there were live lockers meant that the tuple could not be removed,
obviously failing to notice that the subsequent versions of the tuple
would be good enough.

> If the update xmin is actually below the cutoff we can remove the tuple
> even if there's live lockers - the lockers will also be present in the
> newer version of the tuple. I verified that for me that fixes the
> problem. Obviously that'd require some comment work and more careful
> diagnosis.

Sounds good.

I'll revert those commits then, keeping the test, and then you can
commit your change. OK?

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 13:05:51
Message-ID:	20171102130551.i2w33sqdw3bdvetn@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Hi,

On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote:
> Andres Freund wrote:
> > I think the problem is on the pruning, rather than the freezing side. We
> > can't freeze a tuple if it has an alive predecessor - rather than
> > weakining this, we should be fixing the pruning to not have the alive
> > predecessor.
>
> I gave a look at HTSV back then, but I didn't find what the right tweak
> was, but then I only tried changing the return value to DEAD and
> DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based
> on OldestXmin didn't occur to me ... I was thinking that the fact that
> there were live lockers meant that the tuple could not be removed,
> obviously failing to notice that the subsequent versions of the tuple
> would be good enough.

I'll try to write up a commit based on that idea. I think there's some
comment work needed too, Robert and I were both confused by a few
things.
I'm unfortunately travelling atm - it's evening here, and I'll flying
back to the US all Saturday. I'm fairly sure I'll be able to come up
with a decent patch tomorrow, but I'll need review and testing by
others.

> > If the update xmin is actually below the cutoff we can remove the tuple
> > even if there's live lockers - the lockers will also be present in the
> > newer version of the tuple. I verified that for me that fixes the
> > problem. Obviously that'd require some comment work and more careful
> > diagnosis.
>
> Sounds good.
>
> I'll revert those commits then, keeping the test, and then you can
> commit your change. OK?

Generally that sounds good - but you can't keep the testcase in without
the new fix - the buildfarm would immediately turn red. I guess the best
thing would be to temporarily remove it from the schedule?

Do we care about people upgrading to unreleased versions? We could do
nothing, document it in the release notes, or ???

Greetings,

Andres Freund

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 13:58:42
Message-ID:	20648.1509631122@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Do we care about people upgrading to unreleased versions? We could do
> nothing, document it in the release notes, or ???

Do nothing.

regards, tom lane

From:	Stephen Frost <sfrost(at)snowman(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 14:12:23
Message-ID:	20171102141223.GF4628@tamriel.snowman.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > Do we care about people upgrading to unreleased versions? We could do
> > nothing, document it in the release notes, or ???
>
> Do nothing.

Agreed. Not much we can do there.

Thanks!

Stephen

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 14:55:25
Message-ID:	20171102145525.ptv7z7c4hrdpou5c@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Stephen Frost wrote:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> > > Do we care about people upgrading to unreleased versions? We could do
> > > nothing, document it in the release notes, or ???
> >
> > Do nothing.
>
> Agreed. Not much we can do there.

Pushed the reverts.

I noticed while doing so that REL_10_STABLE contains the bogus commits.
Does that change our opinion regarding what to do for people upgrading
to a version containing the broken commits? I don't think so, because

1) we hope that not many people will trust their data to 10.0
immediately after release
2) the bug is very low probability
3) it doesn't look like we can do a lot about it anyway.

I'll experiment with Andres' proposed fix now.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 16:44:34
Message-ID:	CA+Tgmoa5-sjqMzyKDUjR3NeNiH103WiBY1q8bXMdXA-x8qW2BA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Nov 2, 2017 at 8:25 PM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Pushed the reverts.
>
> I noticed while doing so that REL_10_STABLE contains the bogus commits.
> Does that change our opinion regarding what to do for people upgrading
> to a version containing the broken commits? I don't think so, because
>
> 1) we hope that not many people will trust their data to 10.0
> immediately after release
> 2) the bug is very low probability
> 3) it doesn't look like we can do a lot about it anyway.

Just to be clear, it looks like "Fix freezing of a dead HOT-updated
tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0
was stamped, but "Fix traversal of half-frozen update chains"
(22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is
therefore unreleased at present.

Users of 10.0 who hit the code introduced by
46c35116ae1acc8826705ef2a7b5d9110f9d6e84 will have XIDs stored in the
xmax fields of tuples that predate relfrozenxid. Those tuples will be
hinted-committed. That's not good, but it might not really have much
in the way of consequences. *IF* the next VACUUM doesn't get confused
by the old XID, then it will prune the tuple then and I think we'll be
OK. And I think it won't, because it should just call
HeapTupleSatisfiesVacuum() and that should see that
HEAP_XMAX_COMMITTED is set and not actually try to consult the old
CLOG. If that hint bit can ever get lost - or fail to propagate to a
standby - then we have more trouble, but the fact that it's set by a
logged operation makes me hope that can't happen. Furthermore, that
follow-on VACUUM should indeed arrive in due time, because we will not
have marked the page all-visible -- HeapTupleSatisfiesVacuum() will
NOT have returned HEAPTUPLE_LIVE when called from lazy_scan_heap(),
and therefore we will have set all_visible = false.

The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
I think things get a lot more dangerous. The problem (as Andres
pointed out to me this afternoon) is that it seems possible to end up
with a situation where there should be two HOT chains on a page, and
because of the weakened xmin/xmax checking rules, we end up thinking
that the second one is a continuation of the first one, which will be
all kinds of bad news. That would be particularly likely to happen in
releases from before we invented HEAP_XMIN_FROZEN, when there's no
xmin/xmax matching at all, but could happen on later releases if we
are extraordinarily unlucky (i.e. xmin of the first tuple in the
second chain happens to be the same as the pre-freeze xmax in the old
chain, probably because the same XID was used to update the page in
two consecutive epochs). Fortunately, that commit is (I think) not
released anywhere.

Personally, I think it would be best to push the release out a week.
I think we understand this well enough now that we can fix it
relatively easily, but haste makes bugs, and (I know you're all tired
of hearing me say this) patches that implicate the on-disk format are
scary.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 16:47:26
Message-ID:	CAH2-Wzn42P7Vo7tb9iWJDBDmzmboPXRGeP1AabFQ_9VYyL3isQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Nov 2, 2017 at 4:20 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> I think the problem is on the pruning, rather than the freezing side. We
> can't freeze a tuple if it has an alive predecessor - rather than
> weakining this, we should be fixing the pruning to not have the alive
> predecessor.

Excellent catch.

I didn't even know that that was safe.

> I think a5736bf754c82d8b86674e199e232096c679201d might be dangerous in
> the face of previously corrupted tuple chains and pg_upgraded clusters -
> it can lead to tuples being considered related, even though they they're
> from entirely independent hot chains. Especially when upgrading 9.3 post
> your fix, to current releases.

Frankly, I'm relieved that you got to this. I was highly suspicious of
a5736bf754c82d8b86674e199e232096c679201d, even beyond my specific,
actionable concern about how it failed to handle the
9.3/FrozenTransactionId xmin case as special. As I went into in the
"heap/SLRU verification, relfrozenxid cut-off, and freeze-the-dead
bug" thread, these commits left us with a situation where there didn't
seem to be a reliable way of knowing whether or not it is safe to
interrogate clog for a given heap tuple using a tool like amcheck.
And, it wasn't obvious that you couldn't have a codepath that failed
to account for pre-cutoff non-frozen tuples -- codepaths that call
TransactionIdDidCommit() despite it actually being unsafe.

If I'm not mistaken, your proposed fix restores sanity there.

--
Peter Geoghegan

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 16:56:39
Message-ID:	CAH2-WznXsOym3PkNSGdUCF5JK-4qjDxHN9JMjCfRDeCDsUUJnw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
> I think things get a lot more dangerous. The problem (as Andres
> pointed out to me this afternoon) is that it seems possible to end up
> with a situation where there should be two HOT chains on a page, and
> because of the weakened xmin/xmax checking rules, we end up thinking
> that the second one is a continuation of the first one, which will be
> all kinds of bad news. That would be particularly likely to happen in
> releases from before we invented HEAP_XMIN_FROZEN, when there's no
> xmin/xmax matching at all, but could happen on later releases if we
> are extraordinarily unlucky (i.e. xmin of the first tuple in the
> second chain happens to be the same as the pre-freeze xmax in the old
> chain, probably because the same XID was used to update the page in
> two consecutive epochs). Fortunately, that commit is (I think) not
> released anywhere.

FWIW, if you look at the second commit
(22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize
that it doesn't even treat those two cases differently. It was buggy
even on its own terms. The FrozenTransactionId test used an xmin from
HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin().

--
Peter Geoghegan

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-committers <pgsql-committers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 17:08:10
Message-ID:	4706.1509642490@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Just to be clear, it looks like "Fix freezing of a dead HOT-updated
> tuple" (46c35116ae1acc8826705ef2a7b5d9110f9d6e84) went in before 10.0
> was stamped, but "Fix traversal of half-frozen update chains"
> (22576734b805fb0952f9be841ca8f643694ee868) went in afterwards and is
> therefore unreleased at present.

Thanks for doing this analysis of the actual effects in 10.0.

> Personally, I think it would be best to push the release out a week.

I would only be in favor of that if there were some reason to think that
the bug is worse now than it's been in the four years since 9.3 was
released. Otherwise, we should ship the bug fixes we have on-schedule.
I think it's a very very safe bet that there are other data-loss-causing
bugs in there, so I see no good reason for panicking over this one.

> ... and (I know you're all tired of hearing me say this) patches that
> implicate the on-disk format are scary.

Agreed, but how is that relevant now that the bogus patches are reverted?

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 17:08:44
Message-ID:	CA+TgmoYZteP7-1V2Lbb90=LEz25JFpFkAX=kb2WVDXV2OEddTA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Nov 2, 2017 at 10:26 PM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Thu, Nov 2, 2017 at 9:44 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> The second commit (22576734b805fb0952f9be841ca8f643694ee868) is where
>> I think things get a lot more dangerous. The problem (as Andres
>> pointed out to me this afternoon) is that it seems possible to end up
>> with a situation where there should be two HOT chains on a page, and
>> because of the weakened xmin/xmax checking rules, we end up thinking
>> that the second one is a continuation of the first one, which will be
>> all kinds of bad news. That would be particularly likely to happen in
>> releases from before we invented HEAP_XMIN_FROZEN, when there's no
>> xmin/xmax matching at all, but could happen on later releases if we
>> are extraordinarily unlucky (i.e. xmin of the first tuple in the
>> second chain happens to be the same as the pre-freeze xmax in the old
>> chain, probably because the same XID was used to update the page in
>> two consecutive epochs). Fortunately, that commit is (I think) not
>> released anywhere.
>
> FWIW, if you look at the second commit
> (22576734b805fb0952f9be841ca8f643694ee868) carefully, you'll realize
> that it doesn't even treat those two cases differently. It was buggy
> even on its own terms. The FrozenTransactionId test used an xmin from
> HeapTupleHeaderGetXmin(), not HeapTupleHeaderGetRawXmin().

Oh, wow. You seem to be correct.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-committers <pgsql-committers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 17:16:08
Message-ID:	CA+TgmoZRfPy5y5p1fOftENF6zesRu98fodZ9BKh2kOCgBqO6=Q@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Nov 2, 2017 at 10:38 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Personally, I think it would be best to push the release out a week.
>
> I would only be in favor of that if there were some reason to think that
> the bug is worse now than it's been in the four years since 9.3 was
> released. Otherwise, we should ship the bug fixes we have on-schedule.
> I think it's a very very safe bet that there are other data-loss-causing
> bugs in there, so I see no good reason for panicking over this one.

Well, my thought was that delaying this release for a week would be
better than either (a) doing an extra minor release just to get this
fix out or (b) waiting another three months to release this fix. The
former seems like fairly unnecessary work, and the latter doesn't seem
particularly responsible. Users can't reasonably expect us to fix
data-loss-causing bugs that we don't know about yet, but they can
reasonably expect us to issue fixes promptly for ones that we do know
about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-committers <pgsql-committers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-02 17:28:27
Message-ID:	5536.1509643707@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Well, my thought was that delaying this release for a week would be
> better than either (a) doing an extra minor release just to get this
> fix out or (b) waiting another three months to release this fix. The
> former seems like fairly unnecessary work, and the latter doesn't seem
> particularly responsible. Users can't reasonably expect us to fix
> data-loss-causing bugs that we don't know about yet, but they can
> reasonably expect us to issue fixes promptly for ones that we do know
> about.

Our experience with "hold the release waiting for a fix for bug X"
decisions has been consistently bad. Furthermore, if we can't produce
a patch we trust by Monday, I would much rather that we not do it
in a rushed fashion at all. I think it would be entirely reasonable
to consider making off-cadence releases in, perhaps, a month, once
the dust is entirely settled.

regards, tom lane

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-03 14:53:30
Message-ID:	20171103145330.5ycjoje5s6lfwxps@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-11-02 06:05:51 -0700, Andres Freund wrote:
> Hi,
>
> On 2017-11-02 13:49:47 +0100, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > I think the problem is on the pruning, rather than the freezing side. We
> > > can't freeze a tuple if it has an alive predecessor - rather than
> > > weakining this, we should be fixing the pruning to not have the alive
> > > predecessor.
> >
> > I gave a look at HTSV back then, but I didn't find what the right tweak
> > was, but then I only tried changing the return value to DEAD and
> > DELETE_IN_PROGRESS; the thought of selecting DEAD or RECENTLY_DEAD based
> > on OldestXmin didn't occur to me ... I was thinking that the fact that
> > there were live lockers meant that the tuple could not be removed,
> > obviously failing to notice that the subsequent versions of the tuple
> > would be good enough.
>
> I'll try to write up a commit based on that idea. I think there's some
> comment work needed too, Robert and I were both confused by a few
> things.
> I'm unfortunately travelling atm - it's evening here, and I'll flying
> back to the US all Saturday. I'm fairly sure I'll be able to come up
> with a decent patch tomorrow, but I'll need review and testing by
> others.

Here's that patch. I've stared at this some, and Robert did too. Robert
mentioned that the commit message might need some polish and I'm not
100% sure about the error message texts yet.

I'm not yet convinced that the new elog in vacuumlazy can never trigger
- but I also don't think we want to actually freeze the tuple in that
case.

Staring at the vacuumlazy hunk I think I might have found a related bug:
heap_update_tuple() just copies the old xmax to the new tuple's xmax if
a multixact and still running. It does so without verifying liveliness
of members. Isn't that buggy? Consider what happens if we have three
blocks: 1 has free space, two is being vacuumed and is locked, three is
full and has a tuple that's key share locked by a live tuple and is
updated by a dead xmax from before the xmin horizon. In that case afaict
the multi will be copied from the third page to the first one. Which is
quite bad, because vacuum already processed it, and we'll set
relfrozenxid accordingly. I hope I'm missing something here?

Greetings,

Andres Freund

Attachment	Content-Type	Size
0001-Fix-pruning-of-locked-and-updated-tuples.patch	text/x-diff	7.9 KB

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-03 19:36:59
Message-ID:	20171103193659.GA3404@marmot
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> wrote:
>Here's that patch. I've stared at this some, and Robert did too. Robert
>mentioned that the commit message might need some polish and I'm not
>100% sure about the error message texts yet.

The commit message should probably say that the bug involves the
resurrection of previously dead tuples, which is different to there
being duplicates because a constraint is not enforced because HOT chains
are broken (that's a separate, arguably less serious problem).

>Staring at the vacuumlazy hunk I think I might have found a related bug:
>heap_update_tuple() just copies the old xmax to the new tuple's xmax if
>a multixact and still running. It does so without verifying liveliness
>of members. Isn't that buggy? Consider what happens if we have three
>blocks: 1 has free space, two is being vacuumed and is locked, three is
>full and has a tuple that's key share locked by a live tuple and is
>updated by a dead xmax from before the xmin horizon. In that case afaict
>the multi will be copied from the third page to the first one. Which is
>quite bad, because vacuum already processed it, and we'll set
>relfrozenxid accordingly. I hope I'm missing something here?

Can you be more specific about what you mean here? I think that I
understand where you're going with this, but I'm not sure.

--
Peter Geoghegan

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-03 19:52:04
Message-ID:	20171103195204.omoi4olywmv3nyqk@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Peter Geoghegan wrote:
> Andres Freund <andres(at)anarazel(dot)de> wrote:

> > Staring at the vacuumlazy hunk I think I might have found a related bug:
> > heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> > a multixact and still running. It does so without verifying liveliness
> > of members. Isn't that buggy? Consider what happens if we have three
> > blocks: 1 has free space, two is being vacuumed and is locked, three is
> > full and has a tuple that's key share locked by a live tuple and is
> > updated by a dead xmax from before the xmin horizon. In that case afaict
> > the multi will be copied from the third page to the first one. Which is
> > quite bad, because vacuum already processed it, and we'll set
> > relfrozenxid accordingly. I hope I'm missing something here?
>
> Can you be more specific about what you mean here? I think that I
> understand where you're going with this, but I'm not sure.

He means that the tuple that heap_update moves to page 1 (which will no
longer be processed by vacuum) will contain a multixact that's older
than relminmxid -- because it is copied unchanged by heap_update instead
of properly checking against age limit.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-03 20:21:23
Message-ID:	20171103202123.GA6004@marmot
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>He means that the tuple that heap_update moves to page 1 (which will no
>longer be processed by vacuum) will contain a multixact that's older
>than relminmxid -- because it is copied unchanged by heap_update instead
>of properly checking against age limit.

I see. The problem is more or less with this heap_update() code:

/*
* And also prepare an Xmax value for the new copy of the tuple. If there
* was no xmax previously, or there was one but all lockers are now gone,
* then use InvalidXid; otherwise, get the xmax from the old tuple. (In
* rare cases that might also be InvalidXid and yet not have the
* HEAP_XMAX_INVALID bit set; that's fine.)
*/
if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
(checked_lockers && !locker_remains))
xmax_new_tuple = InvalidTransactionId;
else
xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);

My naive guess is that we have to create a new MultiXactId here in at
least some cases, just like FreezeMultiXactId() sometimes does.

--
Peter Geoghegan

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>,Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>,pgsql-committers(at)postgresql(dot)org,"Wong, Yi Wen" <yiwong(at)amazon(dot)com>,Michael Paquier <michael(dot)paquier(at)gmail(dot)com>,Robert Haas <robertmhaas(at)gmail(dot)com>,Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-03 20:33:45
Message-ID:	698409B0-5E70-4F88-A07B-7460C674874B@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On November 4, 2017 1:22:04 AM GMT+05:30, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>Peter Geoghegan wrote:
>> Andres Freund <andres(at)anarazel(dot)de> wrote:
>
>> > Staring at the vacuumlazy hunk I think I might have found a related
>bug:
>> > heap_update_tuple() just copies the old xmax to the new tuple's
>xmax if
>> > a multixact and still running. It does so without verifying
>liveliness
>> > of members. Isn't that buggy? Consider what happens if we have
>three
>> > blocks: 1 has free space, two is being vacuumed and is locked,
>three is
>> > full and has a tuple that's key share locked by a live tuple and is
>> > updated by a dead xmax from before the xmin horizon. In that case
>afaict
>> > the multi will be copied from the third page to the first one.
>Which is
>> > quite bad, because vacuum already processed it, and we'll set
>> > relfrozenxid accordingly. I hope I'm missing something here?
>>
>> Can you be more specific about what you mean here? I think that I
>> understand where you're going with this, but I'm not sure.
>
>He means that the tuple that heap_update moves to page 1 (which will no
>longer be processed by vacuum) will contain a multixact that's older
>than relminmxid -- because it is copied unchanged by heap_update
>instead
>of properly checking against age limit.

Right. That, or a member xid below relminxid. I think both scenarios are possible.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-04 13:15:00
Message-ID:	20171104131500.w7enyo7olxk7ygca@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-11-03 12:36:59 -0700, Peter Geoghegan wrote:
> Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Here's that patch. I've stared at this some, and Robert did too. Robert
> > mentioned that the commit message might need some polish and I'm not
> > 100% sure about the error message texts yet.
>
> The commit message should probably say that the bug involves the
> resurrection of previously dead tuples, which is different to there
> being duplicates because a constraint is not enforced because HOT chains
> are broken (that's a separate, arguably less serious problem).

The reason for that is that I hadn't yet quite figured out how the bug I
described in the commit message (and the previously committed testcase)
would cause that. I was planning to diagnose / experiment more with this
and write an email if I couldn't come up with an explanation. The
committed test does *not* actually trigger that.

The reason I couldn't quite figure out how the problem triggers is that
the xmax removing branch in FreezeMultiXactId() can only be reached if
the multi is from before the cutoff - which it can't have been for a
single vacuum execution to trigger the bug, because that'd require the
multi to be running to falsely return RECENTLY_DEAD rather than DEAD (by
definition a multi can't be below the cutoff if running).

For the problem to occur I think vacuum has to be executed *twice*: The
first time through HTSV mistakenly returns RECENTLY_DEAD preventing the
tuple from being pruned. That triggers FreezeMultiXactId() to create a
*new* multi with dead members. At this point the database already is in
a bad state. Then in a second vacuum HTSV returns DEAD, but
* Ordinarily, DEAD tuples would have been removed by
* heap_page_prune(), but it's possible that the tuple
* state changed since heap_page_prune() looked. In
* particular an INSERT_IN_PROGRESS tuple could have
* changed to DEAD if the inserter aborted. So this
* cannot be considered an error condition.
*
..
if (HeapTupleIsHotUpdated(&tuple) ||
HeapTupleIsHeapOnly(&tuple))
{
nkeep += 1;

prevents the tuple from being removed. If now the multi xmax is below
the xmin horizon it triggers
/*
* If the xid is older than the cutoff, it has to have aborted,
* otherwise the tuple would have gotten pruned away.
*/
if (TransactionIdPrecedes(xid, cutoff_xid))
{
if (TransactionIdDidCommit(xid))
elog(ERROR, "can't freeze committed xmax");
*flags |= FRM_INVALIDATE_XMAX;
in FreezeMultiXact. Without the new elog, this then causes xmax to be
removed, reviving the tuple.

The current testcase, and I think the descriptions in the relevant
threads, all actually fail to point out the precise way the bug is
triggered. As e.g. evidenced that the freeze-the-dead case appears to
not cause any failures in !assertion builds even if the bug is present.

The good news is that the error checks I added in the patch upthread
prevent all of this from happening, even though I'd not yet understood
the mechanics fully - it's imnsho pretty clear that we need to be more
paranoid in production builds around this. A bunch of users that
triggered largely "invisible" corruption (the first vacuum described
above) will possibly run into one of these elog()s, but that seems far
preferrable to making the corruption a lot worse.

I think unfortunately the check + elog() in the
HeapTupleIsHeapOnly(&tuple))
{
nkeep += 1;

/*
* If this were to happen for a tuple that actually
* needed to be frozen, we'd be in trouble, because
* it'd leave a tuple below the relation's xmin
* horizon alive.
*/
if (heap_tuple_needs_freeze(tuple.t_data, FreezeLimit,
MultiXactCutoff, buf))
{
elog(ERROR, "encountered tuple from before xid cutoff, sleeping");
case needs to go, because it's too coarse - there very well could be
lockers or such that need to be removed and where it's fine to do
so. The checks the actual freezing code ought to be sufficient however.

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-04 17:52:42
Message-ID:	20171104175241.se6efntjsukcxi4b@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
> The current testcase, and I think the descriptions in the relevant
> threads, all actually fail to point out the precise way the bug is
> triggered. As e.g. evidenced that the freeze-the-dead case appears to
> not cause any failures in !assertion builds even if the bug is present.

Trying to write up tests reproducing more of the issues in the area, I
think I might have found a third issue - although I'm not sure how
practically relevant it is:

When FreezeMultiXactId() decides it needs to create a new multi because
the old one is below the cutoff, that attempt can be defeated by the
multixact id caching. If the new multi has exactly the same members the
multixact id cache will just return the existing multi with the same
members. The cache will routinely be primed from the lookup of its
members.

I'm not yet sure how easily this can be hit in practice, because
commonly the multixact horizon should prevent a multi with all its
members living from being below the horizon. I found a situation where
that's not the case with the current bug, but I'm not sif that can
happen otherwise.

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-09 22:24:31
Message-ID:	20171109222431.qwj6dtl5chumc6lt@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-11-04 06:15:00 -0700, Andres Freund wrote:
> The reason for that is that I hadn't yet quite figured out how the bug I
> described in the commit message (and the previously committed testcase)
> would cause that. I was planning to diagnose / experiment more with this
> and write an email if I couldn't come up with an explanation. The
> committed test does *not* actually trigger that.
>
> The reason I couldn't quite figure out how the problem triggers is that
> [ long explanation ]

Attached is a version of the already existing regression test that both
reproduces the broken hot chain (and thus failing index lookups) and
then also the tuple reviving. I don't see any need for letting this run
with arbitrary permutations.

Thanks to whoever allowed isolationtester permutations to go over
multiple lines and allow comments. I was wondering about adding that as
a feature just to discover it's already there ;)

What I'm currently wondering about is how much we need to harden
postgres against such existing corruption. If e.g. the hot chains are
broken somebody might have reindexed thinking the problem is fixed - but
if they then later vacuum everything goes to shit again, with dead rows
reappearing. There's no way we can fix hot chains after the fact, but
preventing dead rows from reapparing seems important. A minimal version
of that is fairly easy - we slap a bunch of if if
!TransactionIdDidCommit() elog(ERROR) at various code paths. But that'll
often trigger clog access errors when the problem occurred - if we want
to do better we need to pass down relfrozenxid/relminmxid to a few
functions. I'm inclined to do so, but it'll make the patch larger...

Comments?

Greetings,

Andres Freund

Attachment	Content-Type	Size
freeze-the-dead.spec	text/plain	2.2 KB

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-10 00:02:17
Message-ID:	CAH2-WzkqA3CwdVBn6me8+_sSzKgEcMRMaLGbkPp_DXScxgXcKw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Nov 9, 2017 at 2:24 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Attached is a version of the already existing regression test that both
> reproduces the broken hot chain (and thus failing index lookups) and
> then also the tuple reviving. I don't see any need for letting this run
> with arbitrary permutations.

I thought that the use of every possible permutation was excessive,
myself. It left us with an isolation test that didn't precisely
describe the behavior that is tested. What you came up with seems far,
far better, especially because of the comments you included. The mail
message-id references seem to add a lot, too.

> What I'm currently wondering about is how much we need to harden
> postgres against such existing corruption. If e.g. the hot chains are
> broken somebody might have reindexed thinking the problem is fixed - but
> if they then later vacuum everything goes to shit again, with dead rows
> reappearing.

I don't follow you here. Why would REINDEXing make the rows that
should be dead disappear again, even for a short period of time? It
might do so for index scans, I suppose, but not for sequential scans.
Are you concerned about a risk of somebody not noticing that
sequential scans are still broken?

Actually, on second thought, I take that back -- I don't think that
REINDEXing will even finish once a HOT chain is broken by the bug.
IndexBuildHeapScan() actually does quite a good job of making sure
that HOT chains are sane, which is how the enhanced amcheck notices
the bug here in practice. (Before this bug was discovered, I would
have expected amcheck to catch problems like it slightly later, during
the Bloom filter probe for that HOT chain...but, in fact, it never
gets there with corruption from this bug in practice, AFAIK.)

--
Peter Geoghegan

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-10 00:17:18
Message-ID:	20171110001718.rpgx4lf6xxi4ge3l@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-11-09 16:02:17 -0800, Peter Geoghegan wrote:
> > What I'm currently wondering about is how much we need to harden
> > postgres against such existing corruption. If e.g. the hot chains are
> > broken somebody might have reindexed thinking the problem is fixed - but
> > if they then later vacuum everything goes to shit again, with dead rows
> > reappearing.
>
> I don't follow you here. Why would REINDEXing make the rows that
> should be dead disappear again, even for a short period of time?

It's not the REINDEX that makes them reappear. It's the second
vacuum. The reindex part was about $user trying to fix the problem...
As you need two vacuums with appropriate cutoffs to hit the "rows
revive" problem, that'll often in practice not happen immediately.

> Actually, on second thought, I take that back -- I don't think that
> REINDEXing will even finish once a HOT chain is broken by the bug.
> IndexBuildHeapScan() actually does quite a good job of making sure
> that HOT chains are sane, which is how the enhanced amcheck notices
> the bug here in practice.

I think that's too optimistic.

Greetings,

Andres Freund

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-10 00:19:30
Message-ID:	CAH2-WzkrdpO-szZjr8tEWrcsv4J3QUkr8ZYJSNRn4s9a_qUbMw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> I don't follow you here. Why would REINDEXing make the rows that
>> should be dead disappear again, even for a short period of time?
>
> It's not the REINDEX that makes them reappear.

Of course. I was just trying to make sense of what you said.

> It's the second
> vacuum. The reindex part was about $user trying to fix the problem...
> As you need two vacuums with appropriate cutoffs to hit the "rows
> revive" problem, that'll often in practice not happen immediately.

This explanation clears things up, though.

--
Peter Geoghegan

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-10 00:45:07
Message-ID:	CAH2-WzkXq0iNHVr8uLntaHk080u=gu40jg8rihFsRKUTLST8vQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Actually, on second thought, I take that back -- I don't think that
>> REINDEXing will even finish once a HOT chain is broken by the bug.
>> IndexBuildHeapScan() actually does quite a good job of making sure
>> that HOT chains are sane, which is how the enhanced amcheck notices
>> the bug here in practice.
>
> I think that's too optimistic.

Why? Because the "find the TID of the root" logic in
IndexBuildHeapScan()/heap_get_root_tuples() won't reliably find the
actual root (it might be some other HOT chain root following TID
recycling by VACUUM)?

Assuming that's what you meant: I would have thought that the
xmin/xmax matching within heap_get_root_tuples() makes the sanity
checking fairly reliable in practice.

--
Peter Geoghegan

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-10 00:53:09
Message-ID:	20171110005309.ectyi5z6fwqjgpfl@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-11-09 16:45:07 -0800, Peter Geoghegan wrote:
> On Thu, Nov 9, 2017 at 4:17 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> Actually, on second thought, I take that back -- I don't think that
> >> REINDEXing will even finish once a HOT chain is broken by the bug.
> >> IndexBuildHeapScan() actually does quite a good job of making sure
> >> that HOT chains are sane, which is how the enhanced amcheck notices
> >> the bug here in practice.
> >
> > I think that's too optimistic.
>
> Why? Because the "find the TID of the root" logic in
> IndexBuildHeapScan()/heap_get_root_tuples() won't reliably find the
> actual root (it might be some other HOT chain root following TID
> recycling by VACUUM)?

Primarily because it's not an anti-corruption tool. I'd be surprised if
there weren't ways to corrupt the page using these corruptions that
aren't detected by it. But even if it were, I don't think there's
enough information to do so in the general case. You very well can end
up with pages where subsequent hot pruning has removed a good bit of the
direct evidence of this bug.

But I'm not really sure why the error detection capabilities of matter
much for the principal point I raised, which is how much work we need to
do to not further worsen the corruption.

Greetings,

Andres Freund

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers <pgsql-committers(at)postgresql(dot)org>, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-10 01:13:40
Message-ID:	CAH2-WzknEHKn8rEQ_tL0USywARTU-b3fAenVBPprZqGzp1Z+4g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Nov 9, 2017 at 4:53 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Primarily because it's not an anti-corruption tool. I'd be surprised if
> there weren't ways to corrupt the page using these corruptions that
> aren't detected by it.

It's very hard to assess the risk of missing something that's actually
detectable with total confidence, but I think that the check is actually
very thorough.

> But even if it were, I don't think there's
> enough information to do so in the general case. You very well can end
> up with pages where subsequent hot pruning has removed a good bit of the
> direct evidence of this bug.

Sure, but maybe those are cases that can't get any worse anyway. So the
question of avoiding making it worse doesn't arise.

> But I'm not really sure why the error detection capabilities of matter
> much for the principal point I raised, which is how much work we need to
> do to not further worsen the corruption.

You're right. Just trying to put the risk in context, and to understand the
extent of the concern that you have.

--
Peter Geoghegan

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-14 03:03:41
Message-ID:	20171114030341.movhteyakqeqx5pm@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Hi,

On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> Here's that patch. I've stared at this some, and Robert did too. Robert
> mentioned that the commit message might need some polish and I'm not
> 100% sure about the error message texts yet.
>
> I'm not yet convinced that the new elog in vacuumlazy can never trigger
> - but I also don't think we want to actually freeze the tuple in that
> case.

I'm fairly sure it could be triggered, therefore I've rewritten that.

I've played around quite some with the attached patch. So far, after
applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
the situation worse for already existing corruption. HOT pruning can
change the exact appearance of existing corruption a bit, but I don't
think it can make the corruption meaningfully worse. It's a bit
annoying and scary to add so many checks to backbranches but it kinda
seems required. The error message texts aren't perfect, but these are
"should never be hit" type elog()s so I'm not too worried about that.

Please review!

One thing I'm wondering about is whether we have any way for users to
diagnose whether they need to dump & restore - I can't really think of
anything actually meaningful. Reindexing will find some instances, but
far from all. VACUUM (disable_page_skipping, freeze) pg_class will
also find a bunch. Not a perfect story.

> Staring at the vacuumlazy hunk I think I might have found a related bug:
> heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> a multixact and still running. It does so without verifying liveliness
> of members. Isn't that buggy? Consider what happens if we have three
> blocks: 1 has free space, two is being vacuumed and is locked, three is
> full and has a tuple that's key share locked by a live tuple and is
> updated by a dead xmax from before the xmin horizon. In that case afaict
> the multi will be copied from the third page to the first one. Which is
> quite bad, because vacuum already processed it, and we'll set
> relfrozenxid accordingly. I hope I'm missing something here?

Trying to write a testcase for that now.

Greetings,

Andres Freund

Attachment	Content-Type	Size
0001-Fix-pruning-of-locked-and-updated-tuples.patch	text/x-diff	11.4 KB
0002-Perform-a-lot-more-sanity-checks-when-freezing-tuple.patch	text/x-diff	10.7 KB

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-14 04:07:01
Message-ID:	20171114040701.xu4ew2slpkl6tmct@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> Hi,
>
> On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> > Here's that patch. I've stared at this some, and Robert did too. Robert
> > mentioned that the commit message might need some polish and I'm not
> > 100% sure about the error message texts yet.
> >
> > I'm not yet convinced that the new elog in vacuumlazy can never trigger
> > - but I also don't think we want to actually freeze the tuple in that
> > case.
>
> I'm fairly sure it could be triggered, therefore I've rewritten that.
>
> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse. It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required. The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.
>
>
> Please review!

Please note:
I'd forgotten to git add the change to isolation_schedule re-activating
the freeze-the-dead regression test.

- Andres

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-14 08:23:18
Message-ID:	20171114082318.35hp62dendwrsxhd@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> > Staring at the vacuumlazy hunk I think I might have found a related bug:
> > heap_update_tuple() just copies the old xmax to the new tuple's xmax if
> > a multixact and still running. It does so without verifying liveliness
> > of members. Isn't that buggy? Consider what happens if we have three
> > blocks: 1 has free space, two is being vacuumed and is locked, three is
> > full and has a tuple that's key share locked by a live tuple and is
> > updated by a dead xmax from before the xmin horizon. In that case afaict
> > the multi will be copied from the third page to the first one. Which is
> > quite bad, because vacuum already processed it, and we'll set
> > relfrozenxid accordingly. I hope I'm missing something here?
>
> Trying to write a testcase for that now.

This indeed happens, but I can't quite figure out a way to write an
isolationtester test for this. The problem is that to have something
reproducible one has to make vacuum block on a cleanup lock, but that
currently doesn't register as waiting for the purpose of
isolationtester's logic.

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-20 19:18:45
Message-ID:	20171120191845.plwi3mregv5xzo7y@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Hi,

Ping? Alvaro, it'd be good to get some input here.

- Andres

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-11-22 01:11:17
Message-ID:	CAB7nPqRBJF=Nx_Whr7wZe+JN4QEQM-ZJbFiLBQus5zb9QErDzw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Tue, Nov 21, 2017 at 4:18 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
>> On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
>> > Here's that patch. I've stared at this some, and Robert did too. Robert
>> > mentioned that the commit message might need some polish and I'm not
>> > 100% sure about the error message texts yet.
>> >
>> > I'm not yet convinced that the new elog in vacuumlazy can never trigger
>> > - but I also don't think we want to actually freeze the tuple in that
>> > case.
>>
>> I'm fairly sure it could be triggered, therefore I've rewritten that.
>>
>> I've played around quite some with the attached patch. So far, after
>> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
>> the situation worse for already existing corruption. HOT pruning can
>> change the exact appearance of existing corruption a bit, but I don't
>> think it can make the corruption meaningfully worse. It's a bit
>> annoying and scary to add so many checks to backbranches but it kinda
>> seems required. The error message texts aren't perfect, but these are
>> "should never be hit" type elog()s so I'm not too worried about that.
>>
>> Please review!
>
> Ping? Alvaro, it'd be good to get some input here.

Note that I will be able to jump on the ship after being released from
commit fest duties. This is likely a multi-day task for testing and
looking at it, and I am not the most knowledgeable human being with
this code.
--
Michael

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-06 08:03:19
Message-ID:	20171206080319.v7of3p45yvapkgla@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-11-20 11:18:45 -0800, Andres Freund wrote:
> Hi,
>
> On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> > Hi,
> >
> > On 2017-11-03 07:53:30 -0700, Andres Freund wrote:
> > > Here's that patch. I've stared at this some, and Robert did too. Robert
> > > mentioned that the commit message might need some polish and I'm not
> > > 100% sure about the error message texts yet.
> > >
> > > I'm not yet convinced that the new elog in vacuumlazy can never trigger
> > > - but I also don't think we want to actually freeze the tuple in that
> > > case.
> >
> > I'm fairly sure it could be triggered, therefore I've rewritten that.
> >
> > I've played around quite some with the attached patch. So far, after
> > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> > the situation worse for already existing corruption. HOT pruning can
> > change the exact appearance of existing corruption a bit, but I don't
> > think it can make the corruption meaningfully worse. It's a bit
> > annoying and scary to add so many checks to backbranches but it kinda
> > seems required. The error message texts aren't perfect, but these are
> > "should never be hit" type elog()s so I'm not too worried about that.
> >
> >
> > Please review!
>
> Ping? Alvaro, it'd be good to get some input here.

Ping. I'm a bit surprised that a bug fixing a significant data
corruption issue has gotten no reviews at all.

Greetings,

Andres Freund

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-06 13:59:30
Message-ID:	CAB7nPqShrg5vXb7r_artWVVHvgHsrFAz6pizcBB_ychxX_JuJw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Wed, Dec 6, 2017 at 5:03 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Ping. I'm a bit surprised that a bug fixing a significant data
> corruption issue has gotten no reviews at all.

Note that I was planning to look at this problem today and tomorrow my
time, getting stuck for CF handling last week and conference this
week. If you think that helps, I'll be happy to help at the extent of
what I can do.
--
Michael

From:	Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-06 16:21:15
Message-ID:	20171206162115.kdsd2fnipyswonir@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

I think you've done a stellar job of identifying what the actual problem
was. I like the new (simpler) coding of that portion of
HeapTupleSatisfiesVacuum.

freeze-the-dead is not listed in isolation_schedule; an easy fix.
I confirm that the test crashes with an assertion failure without the
code fix, and that it doesn't with it.

I think the comparison to OldestXmin should be reversed:

if (!TransactionIdPrecedes(xmax, OldestXmin))
return HEAPTUPLE_RECENTLY_DEAD;

return HEAPTUPLE_DEAD;

This way, an xmax that has exactly the OldestXmin value will return
RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
OldestXmin value itself is supposed to be still possibly visible to
somebody). Also, this way it is consistent with the other comparison to
OldestXmin at the bottom of the function. There is no reason for the
"else" or the extra braces.

Put together, I propose the attached delta for 0001.

Your commit message does a poor job of acknowledging prior work on
diagnosing the problem starting from Dan's initial test case and patch.

I haven't looked at your 0002 yet.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
tweaks.patch	text/plain	2.0 KB

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-06 20:23:55
Message-ID:	20171206202355.byfh6g3bezy4rrcr@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Andres Freund wrote:

> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse. It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required. The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.

Looking at 0002: I agree with the stuff being done here. I think a
couple of these checks could be moved one block outerwards in term of
scope; I don't see any reason why the check should not apply in that
case. I didn't catch any place missing additional checks.

Despite these being "shouldn't happen" conditions, I think we should
turn these up all the way to ereports with an errcode and all, and also
report the XIDs being complained about. No translation required,
though. Other than those changes and minor copy editing a commit
(attached), 0002 looks good to me.

I started thinking it'd be good to report block number whenever anything
happened while scanning the relation. The best way to go about this
seems to be to add an errcontext callback to lazy_scan_heap, so I attach
a WIP untested patch to add that. (I'm not proposing this for
back-patch for now, mostly because I don't have the time/energy to push
for it right now.)

It appears that you got all the places that seem to reasonably need
additional checks.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment	Content-Type	Size
0001-tweaks-for-0002.patch	text/plain	6.5 KB
0002-blkno-rel-context-info-for-lazy_scan_heap.patch	text/plain	3.5 KB

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-07 03:08:38
Message-ID:	CAB7nPqSd6vRpo_0GutiQuEuUh==5V7nyMWn0iKDfpbCXMS1d_w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Dec 7, 2017 at 1:21 AM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> Put together, I propose the attached delta for 0001.

I have been looking at Andres' 0001 and your tweaks here for some time
since yesterday...

I have also performed sanity checks using all the scripts that have
accumulated on my archives for this stuff. This looks solid to me. I
have not seen failures with broken hot chains, REINDEX, etc.

> This way, an xmax that has exactly the OldestXmin value will return
> RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
> OldestXmin value itself is supposed to be still possibly visible to
> somebody). Also, this way it is consistent with the other comparison to
> OldestXmin at the bottom of the function. There is no reason for the
> "else" or the extra braces.

+1. It would be nice to add a comment in the patched portion
mentioning that the new code had better match what is at the bottom of
the function.

+ else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
+ {
+ /*
+ * Not in Progress, Not Committed, so either Aborted or crashed.
+ * Mark the Xmax as invalid.
+ */
+ SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
}

- /*
- * Not in Progress, Not Committed, so either Aborted or crashed.
- * Remove the Xmax.
- */
- SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
return HEAPTUPLE_LIVE;

I would find cleaner if the last "else if" is put into its own
separate if condition, and that for a multixact still running this
refers to an updating transaction aborted so hint bits are not set.

> Your commit message does a poor job of acknowledging prior work on
> diagnosing the problem starting from Dan's initial test case and patch.

(Nit: I have extracted from the test case of Dan an isolation test,
which Andres has reduced to a subset of permutations. Peter G. also
complained about what is visibly the same bug we are discussing here
but without a test case.)
--
Michael

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-07 09:32:51
Message-ID:	CAB7nPqTX1JvSANy7v149x+BZwPSj0D2KrW2z78mGHHGQgqFSdQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> Looking at 0002: I agree with the stuff being done here.

The level of details you are providing with a proper error code is an
improvement over the first version proposed in my opinion.

> I think a
> couple of these checks could be moved one block outerwards in term of
> scope; I don't see any reason why the check should not apply in that
> case. I didn't catch any place missing additional checks.

In FreezeMultiXactId() wouldn't it be better to issue an error as well
for this assertion?
Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));

> Despite these being "shouldn't happen" conditions, I think we should
> turn these up all the way to ereports with an errcode and all, and also
> report the XIDs being complained about. No translation required,
> though. Other than those changes and minor copy editing a commit
> (attached), 0002 looks good to me.

+ if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
+ TransactionIdDidCommit(xid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg("can't freeze committed xmax %u", xid)));
The usual wording used in errmsg is not the "can't" but "cannot".

+ ereport(ERROR,
+ (errcode(ERRCODE_DATA_CORRUPTED),
+ errmsg_internal("uncommitted Xmin %u from
before xid cutoff %u needs to be frozen",
+ xid, cutoff_xid)));
"Xmin" I have never seen, but "xmin" I did.

> I started thinking it'd be good to report block number whenever anything
> happened while scanning the relation. The best way to go about this
> seems to be to add an errcontext callback to lazy_scan_heap, so I attach
> a WIP untested patch to add that. (I'm not proposing this for
> back-patch for now, mostly because I don't have the time/energy to push
> for it right now.)

I would recommend to start a new thread and to add that patch to the
next commit fest as you would get more visibility and input from other
folks on -hackers. It looks like a good idea to me.
--
Michael

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)2ndQuadrant(dot)com>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-07 20:17:06
Message-ID:	20171207201706.fooygpgvlvfeas2t@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Hi,

On 2017-12-06 13:21:15 -0300, Alvaro Herrera wrote:
> I think you've done a stellar job of identifying what the actual problem
> was. I like the new (simpler) coding of that portion of
> HeapTupleSatisfiesVacuum.

Thanks!

> freeze-the-dead is not listed in isolation_schedule; an easy fix.

Yea, I'd sent an update about that, stupidly forgot git amend the
commit...

> I confirm that the test crashes with an assertion failure without the
> code fix, and that it doesn't with it.
>
> I think the comparison to OldestXmin should be reversed:
>
> if (!TransactionIdPrecedes(xmax, OldestXmin))
> return HEAPTUPLE_RECENTLY_DEAD;
>
> return HEAPTUPLE_DEAD;
>
> This way, an xmax that has exactly the OldestXmin value will return
> RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
> OldestXmin value itself is supposed to be still possibly visible to
> somebody).

Yes, I think you're right. That's a bug.

> Your commit message does a poor job of acknowledging prior work on
> diagnosing the problem starting from Dan's initial test case and patch.

Yea, you're right. I was writing it with 14h of jetlag, apparently that
does something to your brain...

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-07 20:22:43
Message-ID:	20171207202243.pmxwjm5wvz5lp6j6@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Hi,

On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote:
> > I've played around quite some with the attached patch. So far, after
> > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> > the situation worse for already existing corruption. HOT pruning can
> > change the exact appearance of existing corruption a bit, but I don't
> > think it can make the corruption meaningfully worse. It's a bit
> > annoying and scary to add so many checks to backbranches but it kinda
> > seems required. The error message texts aren't perfect, but these are
> > "should never be hit" type elog()s so I'm not too worried about that.
>
> Looking at 0002: I agree with the stuff being done here. I think a
> couple of these checks could be moved one block outerwards in term of
> scope; I don't see any reason why the check should not apply in that
> case. I didn't catch any place missing additional checks.

I think I largely put them into the inner blocks because they were
guaranteed to be reached in those case (the horizon has to be before the
cutoff etc), and that way additional branches are avoided.

Hm, I don't really care one way or another. I do see that you used
errmsg() in some places, errmsg_internal() in others. Was that
intentional?

That seems like a good idea. There's some cases where that could
increase log spam noticeably (unitialized blocks), but that seems
acceptable.

> +static void
> +lazy_scan_heap_cb(void *arg)
> +{
> + LazyScanHeapInfo *info = (LazyScanHeapInfo *) arg;
> +
> + if (info->blkno != InvalidBlockNumber)
> + errcontext("while scanning page %u of relation %s",
> + info->blkno, RelationGetRelationName(info->relation));
> + else
> + errcontext("while vacuuming relation %s",
> + RelationGetRelationName(info->relation));
> +}

Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by
replacing scanning with vacuuming?

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-07 20:41:28
Message-ID:	20171207204128.h5yiogvkqzuy675q@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-12-07 12:08:38 +0900, Michael Paquier wrote:
> > Your commit message does a poor job of acknowledging prior work on
> > diagnosing the problem starting from Dan's initial test case and patch.
>
> (Nit: I have extracted from the test case of Dan an isolation test,
> which Andres has reduced to a subset of permutations. Peter G. also
> complained about what is visibly the same bug we are discussing here
> but without a test case.)

The previous versions of the test case didn't actually hit the real
issues, so I don't think that matter much.

Greetings,

Andres Freund

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-07 20:41:56
Message-ID:	20171207204156.hzoladdacxjpz4rz@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Hello,

Andres Freund wrote:

> On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote:
> > > I've played around quite some with the attached patch. So far, after
> > > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> > > the situation worse for already existing corruption. HOT pruning can
> > > change the exact appearance of existing corruption a bit, but I don't
> > > think it can make the corruption meaningfully worse. It's a bit
> > > annoying and scary to add so many checks to backbranches but it kinda
> > > seems required. The error message texts aren't perfect, but these are
> > > "should never be hit" type elog()s so I'm not too worried about that.
> >
> > Looking at 0002: I agree with the stuff being done here. I think a
> > couple of these checks could be moved one block outerwards in term of
> > scope; I don't see any reason why the check should not apply in that
> > case. I didn't catch any place missing additional checks.
>
> I think I largely put them into the inner blocks because they were
> guaranteed to be reached in those case (the horizon has to be before the
> cutoff etc), and that way additional branches are avoided.

Hmm, it should be possible to call vacuum with a very low freeze_min_age
(which sets a very recent relfrozenxid), then shortly thereafter call it
with a large one, no? So it's not really guaranteed ...

> > Despite these being "shouldn't happen" conditions, I think we should
> > turn these up all the way to ereports with an errcode and all, and also
> > report the XIDs being complained about. No translation required,
> > though. Other than those changes and minor copy editing a commit
> > (attached), 0002 looks good to me.
>
> Hm, I don't really care one way or another. I do see that you used
> errmsg() in some places, errmsg_internal() in others. Was that
> intentional?

Eh, no, my intention was to make these all errmsg_internal() to avoid
translation (serves no purpose here). Feel free to update the remaining
ones.

> > I started thinking it'd be good to report block number whenever anything
> > happened while scanning the relation. The best way to go about this
> > seems to be to add an errcontext callback to lazy_scan_heap, so I attach
> > a WIP untested patch to add that. (I'm not proposing this for
> > back-patch for now, mostly because I don't have the time/energy to push
> > for it right now.)
>
> That seems like a good idea. There's some cases where that could
> increase log spam noticeably (unitialized blocks), but that seems
> acceptable.

Yeah, I noticed that and I agree it seems ok.

> > + if (info->blkno != InvalidBlockNumber)
> > + errcontext("while scanning page %u of relation %s",
> > + info->blkno, RelationGetRelationName(info->relation));
> > + else
> > + errcontext("while vacuuming relation %s",
> > + RelationGetRelationName(info->relation));
>
> Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by
> replacing scanning with vacuuming?

Makes sense.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-07 20:46:22
Message-ID:	20171207204622.2gxjhz7zepca5dbc@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-12-07 17:41:56 -0300, Alvaro Herrera wrote:
> > > Looking at 0002: I agree with the stuff being done here. I think a
> > > couple of these checks could be moved one block outerwards in term of
> > > scope; I don't see any reason why the check should not apply in that
> > > case. I didn't catch any place missing additional checks.
> >
> > I think I largely put them into the inner blocks because they were
> > guaranteed to be reached in those case (the horizon has to be before the
> > cutoff etc), and that way additional branches are avoided.
>
> Hmm, it should be possible to call vacuum with a very low freeze_min_age
> (which sets a very recent relfrozenxid), then shortly thereafter call it
> with a large one, no? So it's not really guaranteed ...

Fair point!

- Andres

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-14 22:00:17
Message-ID:	20171214220017.6dax6ne7ru4ggadr@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-12-07 18:32:51 +0900, Michael Paquier wrote:
> On Thu, Dec 7, 2017 at 5:23 AM, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> > Looking at 0002: I agree with the stuff being done here.
>
> The level of details you are providing with a proper error code is an
> improvement over the first version proposed in my opinion.
>
> > I think a
> > couple of these checks could be moved one block outerwards in term of
> > scope; I don't see any reason why the check should not apply in that
> > case. I didn't catch any place missing additional checks.
>
> In FreezeMultiXactId() wouldn't it be better to issue an error as well
> for this assertion?
> Assert(!TransactionIdPrecedes(members[i].xid, cutoff_xid));

I'm not really concerned that much about pure lockers, they don't cause
permanent data corruption...

If you want to go around doing that in some more places we can do so in
master only...

> + if (!(tuple->t_infomask & HEAP_XMAX_LOCK_ONLY) &&
> + TransactionIdDidCommit(xid))
> + ereport(ERROR,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg("can't freeze committed xmax %u", xid)));
> The usual wording used in errmsg is not the "can't" but "cannot".
>
> + ereport(ERROR,
> + (errcode(ERRCODE_DATA_CORRUPTED),
> + errmsg_internal("uncommitted Xmin %u from
> before xid cutoff %u needs to be frozen",
> + xid, cutoff_xid)));
> "Xmin" I have never seen, but "xmin" I did.

Changed...

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-15 01:00:29
Message-ID:	20171215010029.3dxx56vjlymudvwo@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Hi,

On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> index f93c194e182..7d163c91379 100644
> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state,
> * While we have our hands on the tuple, we may as well freeze any
> * eligible xmin or xmax, so that future VACUUM effort can be saved.
> */
> - heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
> + heap_freeze_tuple(new_tuple->t_data,
> + state->rs_old_rel->rd_rel->relfrozenxid,
> + state->rs_old_rel->rd_rel->relminmxid,
> + state->rs_freeze_xid,
> state->rs_cutoff_multi);

Hm. So this requires backpatching the introduction of
RewriteStateData->rs_old_rel into 9.3, which in turn requires a new
argument to begin_heap_rewrite(). It originally was added in the
logical decoding commit (i.e. 9.4).

I'm fine with that, but it could theoretically cause issues for somebody
with an extension that calls begin_heap_rewrite() - which seems unlikely
and I couldn't find any that does so.

Does anybody have a problem with that?

Regards,

Andres

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-15 02:30:59
Message-ID:	20171215023059.oeyindn57oeis5um@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-12-14 17:00:29 -0800, Andres Freund wrote:
> Hi,
>
> On 2017-11-13 19:03:41 -0800, Andres Freund wrote:
> > diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c
> > index f93c194e182..7d163c91379 100644
> > --- a/src/backend/access/heap/rewriteheap.c
> > +++ b/src/backend/access/heap/rewriteheap.c
> > @@ -407,7 +407,10 @@ rewrite_heap_tuple(RewriteState state,
> > * While we have our hands on the tuple, we may as well freeze any
> > * eligible xmin or xmax, so that future VACUUM effort can be saved.
> > */
> > - heap_freeze_tuple(new_tuple->t_data, state->rs_freeze_xid,
> > + heap_freeze_tuple(new_tuple->t_data,
> > + state->rs_old_rel->rd_rel->relfrozenxid,
> > + state->rs_old_rel->rd_rel->relminmxid,
> > + state->rs_freeze_xid,
> > state->rs_cutoff_multi);
>
> Hm. So this requires backpatching the introduction of
> RewriteStateData->rs_old_rel into 9.3, which in turn requires a new
> argument to begin_heap_rewrite(). It originally was added in the
> logical decoding commit (i.e. 9.4).
>
> I'm fine with that, but it could theoretically cause issues for somebody
> with an extension that calls begin_heap_rewrite() - which seems unlikely
> and I couldn't find any that does so.
>
> Does anybody have a problem with that?

Pushed this way. Moved some more relfrozenxid/relminmxid tests outside
of the cutoff changes, polished some error messages.

Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
could have a look at the backported version, just about everything but
v10 had conflicts, some of them not insubstantial.

Greetings,

Andres Freund

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-15 11:25:22
Message-ID:	CAB7nPqRvpkveiDzstkQRdjemLq-MX6V71BTGaEYUKY31FojjMw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Dec 15, 2017 at 11:30 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> could have a look at the backported version, just about everything but
> v10 had conflicts, some of them not insubstantial.

I have gone through the backpatched versions for the fixes in tuple
pruning, running some tests on the way and those look good to me. I
have not taken the time to go through the ones changing the assertions
to ereport() though...
--
Michael

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-15 18:46:05
Message-ID:	CAH2-Wzn=XUWcL5pNUUkHDYXmsXpiB7ArP_U2=sHqv9GG3-5ufg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Thu, Dec 14, 2017 at 6:30 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Pushed this way. Moved some more relfrozenxid/relminmxid tests outside
> of the cutoff changes, polished some error messages.
>
>
> Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> could have a look at the backported version, just about everything but
> v10 had conflicts, some of them not insubstantial.

I have one minor piece of feedback on the upgrading of assertions to
ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
upgrade the raw elog() "can't happen" error within
IndexBuildHeapRangeScan() to be an ereport() with
ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
tuple for heap-only tuple" error, which I think merits being a real
user-visible error, just like the relfrozenxid/relminmxid tests are
now. As you know, the enhanced amcheck will sometimes detect
corruption due to this bug by hitting that error.

It would be nice if we could tighten up the number of errcodes that
can be involved in an error that amcheck detects. I know that elog()
implicitly has an errcode, that could be included in the errcodes to
check when verification occurs in an automated fashion across a large
number of databases. However, this is a pretty esoteric point, and I'd
prefer to just try to limit it to ERRCODE_DATA_CORRUPTION and
ERRCODE_INDEX_CORRUPTED, insofar as that's practical. When we ran
amcheck on the Heroku fleet, back when I worked there, there were all
kinds of non-interesting errors that could occur that needed to be
filtered out. I want to try to make that process somewhat less painful.

--
Peter Geoghegan

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-15 18:57:08
Message-ID:	20171215185708.su365ilv7ddnb6uk@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-12-15 20:25:22 +0900, Michael Paquier wrote:
> On Fri, Dec 15, 2017 at 11:30 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> > could have a look at the backported version, just about everything but
> > v10 had conflicts, some of them not insubstantial.
>
> I have gone through the backpatched versions for the fixes in tuple
> pruning, running some tests on the way and those look good to me.

Thanks.

> I have not taken the time to go through the ones changing the
> assertions to ereport() though...

Those were the ones with a lot of conflicts tho - I'd temporarily broken
freezing for 9.3, but both review and testing caught it...

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-15 18:58:11
Message-ID:	20171215185811.zrc55yv4kvkp7hxw@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-12-15 10:46:05 -0800, Peter Geoghegan wrote:
> On Thu, Dec 14, 2017 at 6:30 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > Pushed this way. Moved some more relfrozenxid/relminmxid tests outside
> > of the cutoff changes, polished some error messages.
> >
> >
> > Alvaro, Michael, Peter, and everyone else I'd greatly appreciate if you
> > could have a look at the backported version, just about everything but
> > v10 had conflicts, some of them not insubstantial.
>
> I have one minor piece of feedback on the upgrading of assertions to
> ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
> upgrade the raw elog() "can't happen" error within
> IndexBuildHeapRangeScan() to be an ereport() with
> ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
> tuple for heap-only tuple" error, which I think merits being a real
> user-visible error, just like the relfrozenxid/relminmxid tests are
> now. As you know, the enhanced amcheck will sometimes detect
> corruption due to this bug by hitting that error.

I'm not opposed to that, it just seems independent from this thread. Not
sure I really want to go around and backpatch such a change, that code
has changed a bit between branches. Happy to do so on master.

Greetings,

Andres Freund

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-15 19:03:37
Message-ID:	CAH2-WznJ3Eg0NYyHuW+FP=WPkNwU3GantQE+FEWVD0imCfbVnw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Dec 15, 2017 at 10:58 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> I have one minor piece of feedback on the upgrading of assertions to
>> ereport()s with ERRCODE_DATA_CORRUPTION: It would be nice if you could
>> upgrade the raw elog() "can't happen" error within
>> IndexBuildHeapRangeScan() to be an ereport() with
>> ERRCODE_DATA_CORRUPTION. I'm referring to the "failed to find parent
>> tuple for heap-only tuple" error, which I think merits being a real
>> user-visible error, just like the relfrozenxid/relminmxid tests are
>> now. As you know, the enhanced amcheck will sometimes detect
>> corruption due to this bug by hitting that error.
>
> I'm not opposed to that, it just seems independent from this thread. Not
> sure I really want to go around and backpatch such a change, that code
> has changed a bit between branches. Happy to do so on master.

The elog(), which was itself upgraded from a simple Assert by commit
d70cf811, appears in exactly the same form in 9.3+. Things did change
there, but they were kept in sync.

--
Peter Geoghegan

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-15 19:15:47
Message-ID:	CAH2-WznSxXq1CdTkb7wSDUdh2QJSgeoK=vTS6c7JpJjD1cKqBw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Fri, Dec 15, 2017 at 11:03 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> The elog(), which was itself upgraded from a simple Assert by commit
> d70cf811, appears in exactly the same form in 9.3+. Things did change
> there, but they were kept in sync.

BTW, if you're going to do it, I would target the similar error within
validate_index_heapscan(), too. That was also added by d70cf811.

--
Peter Geoghegan

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Peter Geoghegan <pg(at)bowt(dot)ie>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, "Wood, Dan" <hexpert(at)amazon(dot)com>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-15 19:19:30
Message-ID:	20171215191930.ukafm3t6z7ahydc6@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On 2017-12-15 11:15:47 -0800, Peter Geoghegan wrote:
> On Fri, Dec 15, 2017 at 11:03 AM, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > The elog(), which was itself upgraded from a simple Assert by commit
> > d70cf811, appears in exactly the same form in 9.3+. Things did change
> > there, but they were kept in sync.
>
> BTW, if you're going to do it, I would target the similar error within
> validate_index_heapscan(), too. That was also added by d70cf811.

Please send a patch for master on a *new* thread.

Greetings,

Andres Freund

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-23 20:26:22
Message-ID:	20171223202622.27apdgfbxknnrxda@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

I noticed that I'm committer for this patch in the commitfest, though I
don't remember setting that. Are you expecting me to commit it? I
thought you'd do it, but if you want me to assume the responsibility I
can do that.

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>,Peter Geoghegan <pg(at)bowt(dot)ie>,pgsql-committers(at)postgresql(dot)org,"Wong, Yi Wen" <yiwong(at)amazon(dot)com>,Michael Paquier <michael(dot)paquier(at)gmail(dot)com>,Robert Haas <robertmhaas(at)gmail(dot)com>,Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-23 20:28:03
Message-ID:	9FA1F62B-E3F2-462A-BD7D-EEBC228330E4@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On December 23, 2017 9:26:22 PM GMT+01:00, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>I noticed that I'm committer for this patch in the commitfest, though I
>don't remember setting that. Are you expecting me to commit it? I
>thought you'd do it, but if you want me to assume the responsibility I
>can do that.

I thought I pushed it to all branches? Do you see anything missing? Didn't know there's a CF entry...

Andres

--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

From:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-23 22:42:45
Message-ID:	20171223224245.ez6v37xo6lbvxctm@alvherre.pgsql
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

Andres Freund wrote:

> On December 23, 2017 9:26:22 PM GMT+01:00, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
> >I noticed that I'm committer for this patch in the commitfest, though I
> >don't remember setting that. Are you expecting me to commit it? I
> >thought you'd do it, but if you want me to assume the responsibility I
> >can do that.
>
> I thought I pushed it to all branches? Do you see anything missing?
> Didn't know there's a CF entry...

Ah, no, looks correct to me in all branches. Updated the CF now too.

Thanks!

--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

From:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "Wood, Dan" <hexpert(at)amazon(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-committers(at)postgresql(dot)org, "Wong, Yi Wen" <yiwong(at)amazon(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple
Date:	2017-12-23 23:48:40
Message-ID:	20171223234840.GA1873@paquier.xyz
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-committers pgsql-hackers

On Sat, Dec 23, 2017 at 05:26:22PM -0300, Alvaro Herrera wrote:
> I noticed that I'm committer for this patch in the commitfest, though I
> don't remember setting that. Are you expecting me to commit it? I
> thought you'd do it, but if you want me to assume the responsibility I
> can do that.

I tend to create CF entries for all patches that we need to track as bug
fixes. No things lost this way. Alvaro, you have been marked as a committer
of this patch as you were the one to work and push the first version that
has finished reverted.
--
Michael