BUG #6291: Xid epoch is not updated properly

Lists: pgsql-bugs
From: "Daniel Farina" <daniel(at)heroku(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6291: Xid epoch is not updated properly
Date: 2011-11-13 22:15:42
Message-ID: 201111132215.pADMFgvH080387@wwwmaster.postgresql.org


The following bug has been logged online:

Bug reference: 6291
Logged by: Daniel Farina
Email address: daniel(at)heroku(dot)com
PostgreSQL version: 9.0.5
Operating system: Ubuntu 10.04
Description: Xid epoch is not updated properly
Details:

We have on hand a database that makes heavy use of the txid_snapshot family
of functions, and it recently passed its 2^32 transaction mark.
Unfortunately, upon wraparound the xid epoch appears not to have been
incremented, remaining at 0. However, pg_controldata does properly reflect
that the counter has passed 2^32, and so far the database otherwise appears
to function normally. Here's a snippet:

Latest checkpoint's NextXID: 0/2131670
Latest checkpoint's NextOID: 1416740
Latest checkpoint's NextMultiXactId: 1119
Latest checkpoint's NextMultiOffset: 3115
Latest checkpoint's oldestXID: 4131117606
Latest checkpoint's oldestXID's DB: 16385
Latest checkpoint's oldestActiveXID: 0
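pg_controldata prints NextXID as `epoch/xid`. The following rough sketch shows how the two parts relate to the 64-bit values the txid functions are documented to return. It assumes the epoch occupies the high 32 bits and the 32-bit xid the low 32 bits. `parse_next_xid` and `to_txid64` are hypothetical helpers, not PostgreSQL functions.

```python
# Hypothetical helpers: split pg_controldata's "epoch/xid" NextXID field and
# combine the parts into one 64-bit value, ASSUMING the epoch occupies the
# high 32 bits and the 32-bit xid the low 32 bits.

XID_SPACE = 2 ** 32  # size of the 32-bit xid space


def parse_next_xid(field: str) -> tuple[int, int]:
    """Parse a value such as '0/2131670' into (epoch, xid)."""
    epoch_str, xid_str = field.strip().split("/")
    epoch, xid = int(epoch_str), int(xid_str)
    if not (0 <= epoch < XID_SPACE and 0 <= xid < XID_SPACE):
        raise ValueError(f"out-of-range epoch/xid: {field!r}")
    return epoch, xid


def to_txid64(epoch: int, xid: int) -> int:
    """Combine an epoch and a 32-bit xid into one 64-bit value."""
    return (epoch << 32) | xid


epoch, xid = parse_next_xid("0/2131670")
print(to_txid64(epoch, xid))  # 2131670: epoch 0 adds no 2**32 offset
print(to_txid64(1, xid))      # 4297098966: what epoch 1 would give
```

With the reported `0/2131670`, the 64-bit value carries no epoch offset at all, even though the report says the counter has already wrapped.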

As a result, the following documentation at
"http://www.postgresql.org/docs/9.0/static/functions-info.html" is
dangerously misleading:

"The internal transaction ID type (xid) is 32 bits wide and wraps around
every 4 billion transactions. However, these functions export a 64-bit
format that is extended with an "epoch" counter so it will not wrap around
during the life of an installation. The data type used by these functions,
txid_snapshot, stores information about transaction ID visibility at a
particular moment in time. Its components are described in Table 9-53."
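The documentation says the 64-bit format is "extended with an 'epoch' counter". The following simplified model of that idea is a sketch, not PostgreSQL's actual implementation. It rests on two assumptions. First, the epoch advances whenever the 32-bit counter is observed to be numerically smaller than the value recorded at the previous observation. Second, an older xid (for example one taken from a snapshot) belongs to the previous epoch if it is numerically above the latest next-xid. The class `EpochTracker` is hypothetical.

```python
# A simplified model (hypothetical names, not PostgreSQL's actual code) of
# extending a wrapping 32-bit transaction counter with an epoch counter.

XID_SPACE = 2 ** 32


class EpochTracker:
    def __init__(self, epoch: int = 0, last_xid: int = 0) -> None:
        self.epoch = epoch        # number of 32-bit wraparounds seen so far
        self.last_xid = last_xid  # 32-bit next-xid at the previous observation

    def observe(self, next_xid: int) -> int:
        """Record a new 32-bit next-xid and return it as a 64-bit value.

        The counter only moves forward, so a value numerically smaller than
        the previous one means it wrapped past 2**32. Assumes at most one
        wraparound happens between observations.
        """
        if not 0 <= next_xid < XID_SPACE:
            raise ValueError("xid out of 32-bit range")
        if next_xid < self.last_xid:
            self.epoch += 1
        self.last_xid = next_xid
        return (self.epoch << 32) | next_xid

    def extend(self, xid: int) -> int:
        """Widen an older 32-bit xid (e.g. from a snapshot) to 64 bits.

        Assumes xid is less than one wraparound behind last_xid: if it is
        numerically greater, it must come from the previous epoch.
        """
        epoch = self.epoch
        if xid > self.last_xid:
            epoch -= 1
        if epoch < 0:
            raise ValueError("xid would precede epoch 0")
        return (epoch << 32) | xid


tracker = EpochTracker(epoch=0, last_xid=4_131_117_606)
print(tracker.observe(2_131_670))     # 4297098966: epoch advanced to 1
print(tracker.extend(4_131_117_606))  # 4131117606: previous epoch (0)
```

Under this model, an epoch of 0 alongside a small next-xid after the counter has passed 2^32 is exactly what would result if the wraparound branch in `observe` were skipped.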


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Daniel Farina" <daniel(at)heroku(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6291: Xid epoch is not updated properly
Date: 2011-11-13 23:27:11
Message-ID: 28793.1321226831@sss.pgh.pa.us

"Daniel Farina" <daniel(at)heroku(dot)com> writes:
> We have on hand a database that makes heavy use of the txid_snapshot family
> of functions, and it recently passed its 2^32 transaction mark.
> Unfortunately, upon wraparound the xid epoch appears not to have been
> incremented, remaining at 0.

I failed to reproduce this here, and a look at the code responsible for
xid epoch maintenance reveals no obvious way that it could have been
bypassed. So there's some fairly critical piece of context that you're
not telling us ...

regards, tom lane


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6291: Xid epoch is not updated properly
Date: 2011-11-14 06:30:09
Message-ID: CAAZKuFb3FaO=emaz6OGb0_32bdzzvb7SkKEJF4jxN_f-Ft7+5g@mail.gmail.com

On Sun, Nov 13, 2011 at 3:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Daniel Farina" <daniel(at)heroku(dot)com> writes:
>> We have on hand a database that makes heavy use of the txid_snapshot family
>> of functions, and it recently passed its 2^32 transaction mark.
>> Unfortunately, upon wraparound the xid epoch appears not to have been
>> incremented, remaining at 0.
>
> I failed to reproduce this here, and a look at the code responsible for
> xid epoch maintenance reveals no obvious way that it could have been
> bypassed.  So there's some fairly critical piece of context that you're
> not telling us ...

Hmm, the database has nothing particularly special about it; I also
reviewed the epoch code and don't see any simple oversight. On the
other hand, I should still have the WAL that replays past the epoch
wrap, so I can instrument some telltale bit of code; if you have any
suggestions for diagnostics, I'd like to hear them.

Also, do you see anything strange about the pg_controldata output as-is?
In particular, the oldestXID of 4131117606, just under 2**32, seems in
line with expectations, yet calling txid_current et al. exposes a number
in the low millions, as reflected in NextXID (0/2131670).
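The reasoning here is that an oldestXID just under 2**32 alongside a NextXID of 0/2131670 only makes sense if the 32-bit counter wrapped in between. In that case the epoch should be at least 1. Below is a hypothetical plausibility check on those pg_controldata values, using modulo-2**32 arithmetic for the xid age. It assumes oldestXID is less than one full wraparound behind NextXID. `xid_age` and `epoch_looks_stale` are illustrative names, not PostgreSQL functions.

```python
# Hypothetical plausibility check on pg_controldata values.

XID_SPACE = 2 ** 32


def xid_age(oldest_xid: int, next_xid: int) -> int:
    """Distance from oldest_xid forward to next_xid, modulo 2**32."""
    return (next_xid - oldest_xid) % XID_SPACE


def epoch_looks_stale(next_epoch: int, next_xid: int, oldest_xid: int) -> bool:
    """True if the counter must have wrapped since oldest_xid, yet epoch is 0.

    Assumes oldest_xid is less than one full wraparound behind next_xid.
    If oldest_xid is numerically greater than next_xid, the counter passed
    2**32 in between, so next_xid's epoch should be at least 1.
    """
    wrapped_since_oldest = oldest_xid > next_xid
    return wrapped_since_oldest and next_epoch == 0


# Values from the pg_controldata snippet in the original report.
print(xid_age(4131117606, 2131670))               # 165981360
print(epoch_looks_stale(0, 2131670, 4131117606))  # True
```

The modular age of about 166 million transactions is plausible for the oldest xid. The epoch of 0, however, is inconsistent with the wraparound that the same two numbers imply.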

--
fdr