Re: Vacuum/visibility is busted

Lists: pgsql-hackers
From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Vacuum/visibility is busted
Date: 2013-02-07 05:39:18
Message-ID: CAMkU=1ztsEd1XYgVYQoXz4dSzeh7_nVBOWjfb3nECEHSiOa9xA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While stress testing Pavan's 2nd pass vacuum visibility patch, I realized
that vacuum/visibility was busted. But it wasn't his patch that busted it.
As far as I can tell, the bad commit was in the
range 692079e5dcb331..168d3157032879

Since a run takes 12 to 24 hours, it will take a while to refine that
interval.

I was testing using the framework explained here:

http://www.postgresql.org/message-id/CAMkU=1xoA6Fdyoj_4fMLqpicZR1V9GP7cLnXJdHU+iGgqb6WUw@mail.gmail.com

Except that I increased JJ_torn_page to 8000, so that autovacuum has a
chance to run to completion before each crash; and I turned off
archive_mode as it was not relevant and caused annoying noise. As far as I
know, crashing is entirely irrelevant to the current problem, but I just
used and adapted the framework I had at hand.

A tarball of the data directory is available below, for those who would
like to do a forensic inspection. The table jjanes.public.foo is clearly
in violation of its unique index.

https://docs.google.com/file/d/0Bzqrh1SO9FcEbk1lUWgwSk9Od28/edit?usp=sharing

Thanks for any help,

Jeff


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 06:41:35
Message-ID: 28936.1360219295@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> While stress testing Pavan's 2nd pass vacuum visibility patch, I realized
> that vacuum/visibility was busted. But it wasn't his patch that busted it.
> As far as I can tell, the bad commit was in the
> range 692079e5dcb331..168d3157032879

> Since a run takes 12 to 24 hours, it will take a while to refine that
> interval.

Just eyeballing the commit logs, I'd have to guess that
0ac5ad5134f2769c (the foreign-key-locks patch) is the only change in
that range that seems like it might account for a visibility-ish bug.
If you're lacking patience for a bisection run, try that one first.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 08:35:15
Message-ID: CA+U5nM+Aw_RiDA7mnqUCBHoPfLr04p9__fqbSRntUehwv9eybw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7 February 2013 05:39, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:

> While stress testing Pavan's 2nd pass vacuum visibility patch, I realized
> that vacuum/visibility was busted. But it wasn't his patch that busted it.
> As far as I can tell, the bad commit was in the range
> 692079e5dcb331..168d3157032879

Please define "busted" so we can all help track this down.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 08:55:31
Message-ID: CABOikdPr8-29NEta1grOi7=FyVVtc5gA5NLtOX6O6M=gLDsDqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 11:09 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> While stress testing Pavan's 2nd pass vacuum visibility patch, I realized
> that vacuum/visibility was busted. But it wasn't his patch that busted it.
> As far as I can tell, the bad commit was in the range
> 692079e5dcb331..168d3157032879
>
> Since a run takes 12 to 24 hours, it will take a while to refine that
> interval.
>
> I was testing using the framework explained here:
>
> http://www.postgresql.org/message-id/CAMkU=1xoA6Fdyoj_4fMLqpicZR1V9GP7cLnXJdHU+iGgqb6WUw@mail.gmail.com
>
> Except that I increased JJ_torn_page to 8000, so that autovacuum has a
> chance to run to completion before each crash; and I turned off archive_mode
> as it was not relevant and caused annoying noise. As far as I know,
> crashing is entirely irrelevant to the current problem, but I just used and
> adapted the framework I had at hand.
>
> A tarball of the data directory is available below, for those who would
> like to do a forensic inspection. The table jjanes.public.foo is clearly in
> violation of its unique index.

The xmins of all the duplicate tuples look dangerously close to 2^31.
I wonder if XID wrap around has anything to do with it.

Index scans do not return any duplicates and you need to force a seq
scan to see them. Assuming that the page level VM bit might be
corrupted, I tried to REINDEX the table to see if it complains of
unique key violations, but that crashes the server with

TRAP: FailedAssertion("!(((bool) ((root_offsets[offnum - 1] !=
((OffsetNumber) 0)) && (root_offsets[offnum - 1] <= ((OffsetNumber)
(8192 / sizeof(ItemIdData)))))))", File: "index.c", Line: 2482)

Will look more into it, but thought this might be useful for others to
spot the problem.

Thanks,
Pavan

P.S BTW, you would need to connect as user "jjanes" to a database
"jjanes" to see the offending table.

--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 09:44:52
Message-ID: CABOikdPaSrdG9S9zJD-U+wRxHgJ1hgO-m=YAD4sJPwknLxjW9Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 2:25 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

>
> Will look more into it, but thought this might be useful for others to
> spot the problem.
>

And here is some more forensic info about one of the pages having
duplicate tuples.

jjanes=# select *, xmin, xmax, ctid from foo where index IN (select
index from foo group by index having count(*) > 1 ORDER by index)
ORDER by index LIMIT 3;
index | count | xmin | xmax | ctid
-------+-------+------------+------+-----------
219 | 353 | 2100345903 | 0 | (150,98)
219 | 354 | 2100346051 | 0 | (150,101)
219 | 464 | 2101601086 | 0 | (150,126)
(3 rows)

jjanes=# select * from page_header(get_raw_page('foo',150));
lsn | tli | flags | lower | upper | special | pagesize |
version | prune_xid
-------------+-----+-------+-------+-------+---------+----------+---------+-----------
4C/52081968 | 1 | 5 | 1016 | 6304 | 8192 | 8192 |
4 | 0
(1 row)

jjanes=# select * from heap_page_items(get_raw_page('foo',150)) WHERE
lp IN (98, 101, 126);
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 |
t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
-----+--------+----------+--------+------------+--------+----------+-----------+-------------+------------+--------+--------+-------
98 | 7968 | 1 | 32 | 2100345903 | 0 | 0 |
(150,101) | 32770 | 10496 | 24 | |
101 | 7904 | 1 | 32 | 2100346051 | 0 | 0 |
(150,101) | 32770 | 10496 | 24 | |
126 | 7040 | 1 | 32 | 2101601086 | 0 | 0 |
(150,126) | 32770 | 10496 | 24 | |
(3 rows)

So every duplicate tuple has the same flags set:

HEAP_XMAX_INVALID
HEAP_XMIN_COMMITED
HEAP_UPDATED
HEAP_ONLY_TUPLE

The first two duplicates are chained by the ctid chain, but the last
one looks independent. More later.

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 14:58:05
Message-ID: 20130207145805.GB5172@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan Deolasee escribió:
> On Thu, Feb 7, 2013 at 2:25 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
> >
> > Will look more into it, but thought this might be useful for others to
> > spot the problem.
> >
>
> And here is some more forensic info about one of the pages having
> duplicate tuples.
>
> jjanes=# select *, xmin, xmax, ctid from foo where index IN (select
> index from foo group by index having count(*) > 1 ORDER by index)
> ORDER by index LIMIT 3;
> index | count | xmin | xmax | ctid
> -------+-------+------------+------+-----------
> 219 | 353 | 2100345903 | 0 | (150,98)
> 219 | 354 | 2100346051 | 0 | (150,101)
> 219 | 464 | 2101601086 | 0 | (150,126)
> (3 rows)

Hm, if the foreign key patch is to blame, this sounds like these tuples
had a different set of XMAX hint bits and a different Xmax, and they
were clobbered by something like vacuum or page pruning.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 15:05:25
Message-ID: 20130207150525.GC5172@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera escribió:

> Hm, if the foreign key patch is to blame, this sounds like these tuples
> had a different set of XMAX hint bits and a different Xmax, and they
> were clobbered by something like vacuum or page pruning.

Hm, I think heap_freeze_tuple is busted, yes.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 17:18:35
Message-ID: CAMkU=1xYSgri0Mg83yis5vH90n=q5LJPeTnY25NLBDh10RL0oQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 1:44 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> On Thu, Feb 7, 2013 at 2:25 PM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>>
>> Will look more into it, but thought this might be useful for others to
>> spot the problem.
>>
>
> And here is some more forensic info about one of the pages having
> duplicate tuples.
>
> jjanes=# select *, xmin, xmax, ctid from foo where index IN (select
> index from foo group by index having count(*) > 1 ORDER by index)
> ORDER by index LIMIT 3;
> index | count | xmin | xmax | ctid
> -------+-------+------------+------+-----------
> 219 | 353 | 2100345903 | 0 | (150,98)
> 219 | 354 | 2100346051 | 0 | (150,101)
> 219 | 464 | 2101601086 | 0 | (150,126)
> (3 rows)

The one where count=464 should be the correct one to be visible, and
the other two are old tuples that were updated away. (The test driver
increases the count column monotonically for each any given value of
index column.

Cheers,

Jeff


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 17:32:23
Message-ID: CAMkU=1zqb0VTxbfRQqDNy4Zr5X8m0nuTa-CC6EDAO9yitpXUpw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 12:55 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
> On Thu, Feb 7, 2013 at 11:09 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> While stress testing Pavan's 2nd pass vacuum visibility patch, I realized
>> that vacuum/visibility was busted. But it wasn't his patch that busted it.
>> As far as I can tell, the bad commit was in the range
>> 692079e5dcb331..168d3157032879
>>
>> Since a run takes 12 to 24 hours, it will take a while to refine that
>> interval.
>>
>> I was testing using the framework explained here:
>>
>> http://www.postgresql.org/message-id/CAMkU=1xoA6Fdyoj_4fMLqpicZR1V9GP7cLnXJdHU+iGgqb6WUw@mail.gmail.com
>>
>> Except that I increased JJ_torn_page to 8000, so that autovacuum has a
>> chance to run to completion before each crash; and I turned off archive_mode
>> as it was not relevant and caused annoying noise. As far as I know,
>> crashing is entirely irrelevant to the current problem, but I just used and
>> adapted the framework I had at hand.
>>
>> A tarball of the data directory is available below, for those who would
>> like to do a forensic inspection. The table jjanes.public.foo is clearly in
>> violation of its unique index.
>
> The xmins of all the duplicate tuples look dangerously close to 2^31.
> I wonder if XID wrap around has anything to do with it.
>
> Index scans do not return any duplicates and you need to force a seq
> scan to see them. Assuming that the page level VM bit might be
> corrupted, I tried to REINDEX the table to see if it complains of
> unique key violations, but that crashes the server with
>
> TRAP: FailedAssertion("!(((bool) ((root_offsets[offnum - 1] !=
> ((OffsetNumber) 0)) && (root_offsets[offnum - 1] <= ((OffsetNumber)
> (8192 / sizeof(ItemIdData)))))))", File: "index.c", Line: 2482)

I don't see the assertion failure myself. If I do REINDEX INDEX it
gives a duplicate key violation, and if I do REINDEX TABLE or REINDEX
DATABASE I get claimed success.

This is using either current head (ab0f7b6) or 168d315 as binaries to
start up the cluster.

Cheers,

Jeff


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 17:49:15
Message-ID: CAMkU=1w5jQjQCNZdQ2wsogOw5T6_HAAsu6G3JXVFiquB31FUkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 9:32 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Thu, Feb 7, 2013 at 12:55 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>>
>> Index scans do not return any duplicates and you need to force a seq
>> scan to see them. Assuming that the page level VM bit might be
>> corrupted, I tried to REINDEX the table to see if it complains of
>> unique key violations, but that crashes the server with
>>
>> TRAP: FailedAssertion("!(((bool) ((root_offsets[offnum - 1] !=
>> ((OffsetNumber) 0)) && (root_offsets[offnum - 1] <= ((OffsetNumber)
>> (8192 / sizeof(ItemIdData)))))))", File: "index.c", Line: 2482)
>
> I don't see the assertion failure myself. If I do REINDEX INDEX it
> gives a duplicate key violation, and if I do REINDEX TABLE or REINDEX
> DATABASE I get claimed success.

Ah, ignore that claim. Of course one does not get assertion failures
if one neglected to turn them on!

Cheers,

Jeff


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 17:52:11
Message-ID: 20130207175211.GD5172@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes escribió:
> On Thu, Feb 7, 2013 at 12:55 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:

> I don't see the assertion failure myself. If I do REINDEX INDEX it
> gives a duplicate key violation, and if I do REINDEX TABLE or REINDEX
> DATABASE I get claimed success.
>
> This is using either current head (ab0f7b6) or 168d315 as binaries to
> start up the cluster.

Note that the visibility tests are correct: those tuples should all be
visible. The problem is not the binary that's running the cluster now;
the problem is the binary that created the cluster in the first place
(or rather the binary that was running when tuple freezing took place).
That is, assuming my theory about tuple freezing being involved is correct.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 18:09:16
Message-ID: CABOikdP6UDyEp0PDMhR=d0NXnhOHs_qPaSRRxkuQ-V_x3PtLAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 10:48 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Thu, Feb 7, 2013 at 1:44 AM, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:

>>
>> jjanes=# select *, xmin, xmax, ctid from foo where index IN (select
>> index from foo group by index having count(*) > 1 ORDER by index)
>> ORDER by index LIMIT 3;
>> index | count | xmin | xmax | ctid
>> -------+-------+------------+------+-----------
>> 219 | 353 | 2100345903 | 0 | (150,98)
>> 219 | 354 | 2100346051 | 0 | (150,101)
>> 219 | 464 | 2101601086 | 0 | (150,126)
>> (3 rows)
>
> The one where count=464 should be the correct one to be visible, and
> the other two are old tuples that were updated away. (The test driver
> increases the count column monotonically for each any given value of
> index column.
>

Right. I don't have the database handy at this moment, but earlier in
the day I ran some queries against it and found that most of the
duplicates which are not accessible via indexes have xmin very close
to 2100345903. In fact, many of them are from a consecutive range.

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 19:15:46
Message-ID: CAMkU=1wO_Pt=mCYyu9gv_eS2KKm86v8_xEbR31KM5nnY3Oheag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 10:09 AM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
> Right. I don't have the database handy at this moment, but earlier in
> the day I ran some queries against it and found that most of the
> duplicates which are not accessible via indexes have xmin very close
> to 2100345903. In fact, many of them are from a consecutive range.

Does anyone have suggestions on how to hack the system to make it
fast-forward the current transaction id? It would certainly make
testing this kind of thing faster if I could make transaction id
increment by 100 each time a new one is generated. Then wrap-around
could be approached in minutes rather than hours.

Thanks,

Jeff


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 20:01:26
Message-ID: 20130207200126.GA21122@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-02-07 11:15:46 -0800, Jeff Janes wrote:
> On Thu, Feb 7, 2013 at 10:09 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> >
> > Right. I don't have the database handy at this moment, but earlier in
> > the day I ran some queries against it and found that most of the
> > duplicates which are not accessible via indexes have xmin very close
> > to 2100345903. In fact, many of them are from a consecutive range.
>
> Does anyone have suggestions on how to hack the system to make it
> fast-forward the current transaction id? It would certainly make
> testing this kind of thing faster if I could make transaction id
> increment by 100 each time a new one is generated. Then wrap-around
> could be approached in minutes rather than hours.

I had various plpgsql functions to do that, but those still took quite
some time. As I needed it before I just spent some minutes hacking up a
contrib module to do the job.

I doubt it really think it makes sense as a contrib module on its own
though?

postgres=# select * FROM burnxids(500000);select * FROM
burnxids(500000);
burnxids
----------
5380767
(1 row)

Time: 859.807 ms
burnxids
----------
5880767
(1 row)

Time: 717.700 ms

It doesn't really work in a nice way:
if (GetTopTransactionIdIfAny() != InvalidTransactionId)
elog(ERROR, "can't burn xids in a transaction with xid");

for (i = 0; i < nxids; i++)
{
last = GetNewTransactionId(false);
}

/* don't keep xid as assigned */
MyPgXact->xid = InvalidTransactionId;

but it seems to work ;)

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
xidfuncs.patch text/x-patch 2.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 20:11:40
Message-ID: 20956.1360267900@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> Does anyone have suggestions on how to hack the system to make it
> fast-forward the current transaction id?

What I've generally done is to stop the server then use pg_resetxlog
to put the XID counter where I want it. I believe you'll need to
manually create a pg_clog file corresponding to the new XID counter
value, and maybe pg_subtrans too. (Someday pg_resetxlog oughta be
improved to create those files itself, probably.)

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 20:23:42
Message-ID: CA+U5nMLBN4eO7uVQ7p0CeOafA8ekVViAGVGsB5bE=NVaNKxQGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7 February 2013 19:15, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> On Thu, Feb 7, 2013 at 10:09 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>>
>> Right. I don't have the database handy at this moment, but earlier in
>> the day I ran some queries against it and found that most of the
>> duplicates which are not accessible via indexes have xmin very close
>> to 2100345903. In fact, many of them are from a consecutive range.
>
> Does anyone have suggestions on how to hack the system to make it
> fast-forward the current transaction id? It would certainly make
> testing this kind of thing faster if I could make transaction id
> increment by 100 each time a new one is generated. Then wrap-around
> could be approached in minutes rather than hours.

This is a variation of one of the regression tests...

CREATE OR REPLACE FUNCTION burn_xids (n integer)
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
IF n <= 0 THEN RETURN; END IF;
PERFORM burn_xids(n - 1);
RETURN;
EXCEPTION WHEN raise_exception THEN NULL; END;
$$;

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-07 20:32:16
Message-ID: 20130207203216.GE5172@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Janes escribió:
> On Thu, Feb 7, 2013 at 10:09 AM, Pavan Deolasee
> <pavan(dot)deolasee(at)gmail(dot)com> wrote:
> >
> > Right. I don't have the database handy at this moment, but earlier in
> > the day I ran some queries against it and found that most of the
> > duplicates which are not accessible via indexes have xmin very close
> > to 2100345903. In fact, many of them are from a consecutive range.
>
> Does anyone have suggestions on how to hack the system to make it
> fast-forward the current transaction id? It would certainly make
> testing this kind of thing faster if I could make transaction id
> increment by 100 each time a new one is generated. Then wrap-around
> could be approached in minutes rather than hours.

I can reproduce the problem in a few minutes with the attached.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
fastxid.patch text/x-diff 1.3 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-08 04:38:18
Message-ID: 20130208043818.GF5172@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera escribió:
> Alvaro Herrera escribió:
>
> > Hm, if the foreign key patch is to blame, this sounds like these tuples
> > had a different set of XMAX hint bits and a different Xmax, and they
> > were clobbered by something like vacuum or page pruning.
>
> Hm, I think heap_freeze_tuple is busted, yes.

This patch seems to fix the problem for me. Please confirm.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
fix-freeze.patch text/x-diff 1020 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-08 04:47:29
Message-ID: 28450.1360298849@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> xid = HeapTupleHeaderGetRawXmax(tuple);
> ! if (((tuple->t_infomask & HEAP_XMAX_IS_MULTI) &&
> ! MultiXactIdIsValid(xid) &&
> ! MultiXactIdPrecedes(xid, cutoff_multi)) ||
> ! ((!(tuple->t_infomask & HEAP_XMAX_IS_MULTI)) &&
> ! TransactionIdIsNormal(xid) &&
> ! TransactionIdPrecedes(xid, cutoff_xid)))
> {

Would this be clearer as a ternary expression? That is,

if ((tuple->t_infomask & HEAP_XMAX_IS_MULTI) ?
(MultiXactIdIsValid(xid) &&
MultiXactIdPrecedes(xid, cutoff_multi)) :
(TransactionIdIsNormal(xid) &&
TransactionIdPrecedes(xid, cutoff_xid)))

regards, tom lane


From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-08 08:51:44
Message-ID: CABOikdPtpVdv08L4w_HO9+Og1igNxxsFr5scoArQg7bFde6Hzw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 8, 2013 at 10:08 AM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Alvaro Herrera escribió:
>> Alvaro Herrera escribió:
>>
>> > Hm, if the foreign key patch is to blame, this sounds like these tuples
>> > had a different set of XMAX hint bits and a different Xmax, and they
>> > were clobbered by something like vacuum or page pruning.
>>
>> Hm, I think heap_freeze_tuple is busted, yes.
>
> This patch seems to fix the problem for me. Please confirm.
>

I'm trying to reason how this bug explains what we saw. In the test,
we'd left with duplicate tuples. If I just take index 219 in the table
as an example, that tuple had three duplicates. The tuple with CTID
(150, 126) had the index pointer and the rest two were dangling tuples
in the heap. I wonder how the index pointers to those tuples got
removed:

1. May be HOT prune saw those tuples as DEAD and adjusted the HOT
chain by removing those tuples. But then HOT prune would have
reclaimed those tuples as well by setting the lp to UNUSED.

2. Index scan saw the HOT chain as DEAD and hence killed the index
tuple. That looks unlikely because that would require an intermediate
non-HOT update to the tuple. Given that the latest live tuple with the
same index value is in the same block, I seriously doubt there was a
non-HOT update to those tuples.

Also, there are couple of other things to notice.

1. For VACUUM to freeze those tuples as you are suspecting, they
should be seen as LIVE when HeapTupleSatisfiesVacuum is run by VACUUM.
But for them to be removed from the HOT chain, they must be seen as
DEAD to someone else and that must happen before VACUUM is run.

2. Tuple (150, 98) links to (150, 101) and both of them are unwanted
duplicates. Can't reason how we end up in this state.

Jeff mentioned that this thinks this issue could be reproducible
without any crash recovery. Alvaro, I did not try to reproduce the
problem using your patch, but can you please check if you see
duplicates in similar state that we saw in Jeff's case ? Or can
someone explain how we could end up in this state because of
heap_tuple_freeze() freezing a potentially DEAD tuple ?

Thanks,
Pavan
--
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-08 14:57:37
Message-ID: 20130208145736.GA3980@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavan Deolasee escribió:

> I'm trying to reason how this bug explains what we saw. In the test,
> we'd left with duplicate tuples. If I just take index 219 in the table
> as an example, that tuple had three duplicates. The tuple with CTID
> (150, 126) had the index pointer and the rest two were dangling tuples
> in the heap.

Hm, the examples I chased had t_infomask 0x2500, that is there were no
HOT bits set. It's quite possible that there's another bug in here, but
this one is a real one and it explains a very similar problem.

Now, how would your answers change if HeapTupleSatisfiesVacuum returned
RECENTLY_DEAD instead of DEAD? That's what I saw; and when it happened,
vacuum didn't set the "tupgone" flag, and thus passed the tuple to
heap_freeze_tuple; that routine examined the tuple and removed the Xmax
and set the HEAP_XMAX_INVALID bit because of the bogus logic. A tuple
which was supposed to be dead suddenly turned into visible.

I'm not really sure how these bogus conditions make HOT misbehave; I
don't fully understand the page pruning stuff. I think if they see a
chain and in the middle of it one Xmin doesn't match the next tuple's
Xmax, it considers the whole thing to not be a chain at all. So maybe
that explains the other effects?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-08 15:55:03
Message-ID: 20130208155503.GC3980@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane escribió:
> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> > xid = HeapTupleHeaderGetRawXmax(tuple);
> > ! if (((tuple->t_infomask & HEAP_XMAX_IS_MULTI) &&
> > ! MultiXactIdIsValid(xid) &&
> > ! MultiXactIdPrecedes(xid, cutoff_multi)) ||
> > ! ((!(tuple->t_infomask & HEAP_XMAX_IS_MULTI)) &&
> > ! TransactionIdIsNormal(xid) &&
> > ! TransactionIdPrecedes(xid, cutoff_xid)))
> > {
>
> Would this be clearer as a ternary expression? That is,
>
> if ((tuple->t_infomask & HEAP_XMAX_IS_MULTI) ?
> (MultiXactIdIsValid(xid) &&
> MultiXactIdPrecedes(xid, cutoff_multi)) :
> (TransactionIdIsNormal(xid) &&
> TransactionIdPrecedes(xid, cutoff_xid)))

Ah, yes, by far. Thanks, I pushed that way.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-02-08 17:08:52
Message-ID: CAMkU=1wJnGgkswCHMdykFYScEO_8NNb+if4LCgiW+Jtv+UpVEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 8:38 PM, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> Alvaro Herrera escribió:
>> Alvaro Herrera escribió:
>>
>> > Hm, if the foreign key patch is to blame, this sounds like these tuples
>> > had a different set of XMAX hint bits and a different Xmax, and they
>> > were clobbered by something like vacuum or page pruning.
>>
>> Hm, I think heap_freeze_tuple is busted, yes.
>
> This patch seems to fix the problem for me. Please confirm.

Applied to 0ac5ad5134f276, it has survived 4 wrap-arounds so far
without problem. I will let it go for a while longer, but I think we
can assume it is fixed.

Cheers,

Jeff


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-06-19 22:01:44
Message-ID: CAMkU=1xoCH4qP0F2Mgg0uvPev4ZQif7B+rRXouwKqNV2w2TZOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 7, 2013 at 12:01 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:

> On 2013-02-07 11:15:46 -0800, Jeff Janes wrote:
> >
> > Does anyone have suggestions on how to hack the system to make it
> > fast-forward the current transaction id? It would certainly make
> > testing this kind of thing faster if I could make transaction id
> > increment by 100 each time a new one is generated. Then wrap-around
> > could be approached in minutes rather than hours.
>
> I had various plpgsql functions to do that, but those still took quite
> some time. As I needed it before I just spent some minutes hacking up a
> contrib module to do the job.
>

Hi Andres,

Your patch needs the file "xidfuncs--1.0.sql", but does not include it.

I could probably guess what needs to be in that file, but do you still have
a copy of it?

>
> I doubt it really think it makes sense as a contrib module on its own
> though?
>

Maybe PGXN?

Cheers,

Jeff


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Vacuum/visibility is busted
Date: 2013-06-20 14:42:52
Message-ID: 20130620144252.GC16659@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-06-19 15:01:44 -0700, Jeff Janes wrote:
> On Thu, Feb 7, 2013 at 12:01 PM, Andres Freund <andres(at)2ndquadrant(dot)com>wrote:
>
> > On 2013-02-07 11:15:46 -0800, Jeff Janes wrote:
> > >
> > > Does anyone have suggestions on how to hack the system to make it
> > > fast-forward the current transaction id? It would certainly make
> > > testing this kind of thing faster if I could make transaction id
> > > increment by 100 each time a new one is generated. Then wrap-around
> > > could be approached in minutes rather than hours.
> >
> > I had various plpgsql functions to do that, but those still took quite
> > some time. As I needed it before I just spent some minutes hacking up a
> > contrib module to do the job.
> >
>
> Hi Andres,
>
> Your patch needs the file "xidfuncs--1.0.sql", but does not include it.
>
> I could probably guess what needs to be in that file, but do you still have
> a copy of it?

Hm. Sorry. Not sure how that happened. The commit in my local branch for
that seems to have it ;). Attached.

> > I doubt it really think it makes sense as a contrib module on its own
> > though?

> Maybe PGXN?

Hm. As of now I am not yet that convinced of PGXN for C containing
extensions.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-add-xidfuncs.patch text/x-patch 3.5 KB