Re: Could not finish anti-wraparound VACUUM when stop limit is reached

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Could not finish anti-wraparound VACUUM when stop limit is reached
Date: 2014-06-03 21:45:32
Message-ID: CAMkU=1wNKH_NoSZHp5wuV7pzyjhZCvMF-yrfNZ0ivV00dyWt4A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, May 25, 2014 at 8:40 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> While debugging the B-tree bug that Jeff Janes reported (
> http://www.postgresql.org/message-id/CAMkU=1y=VwF07Ay+Cpqk_
> 7FpiHRctmssV9y99SBGhitkXPbf8g(at)mail(dot)gmail(dot)com), a new issue came up:
>
> If you reach the xidStopLimit, and try to run VACUUM, it fails with error:
>
> jjanes=# vacuum;
> ERROR: database is not accepting commands to avoid wraparound data loss
> in database "jjanes"
> HINT: Stop the postmaster and vacuum that database in single-user mode.
> You might also need to commit or roll back old prepared transactions.
>
> The backtrace looks like this:
>
> #0 errstart (elevel=20, filename=0x9590a0 "varsup.c", lineno=120,
> funcname=0x9593f0 <__func__.10455> "GetNewTransactionId", domain=0x0)
> at elog.c:249
> #1 0x00000000004f7c14 in GetNewTransactionId (isSubXact=0 '\000') at
> varsup.c:115
> #2 0x00000000004f86db in AssignTransactionId (s=0xd62900
> <TopTransactionStateData>)
> at xact.c:510
> #3 0x00000000004f84a4 in GetCurrentTransactionId () at xact.c:382
> #4 0x000000000062dc1c in vac_truncate_clog (frozenXID=1493663893,
> minMulti=1)
> at vacuum.c:909
> #5 0x000000000062dc06 in vac_update_datfrozenxid () at vacuum.c:888
> #6 0x000000000062cdf6 in vacuum (vacstmt=0x29e05e0, relid=0, do_toast=1
> '\001',
> bstrategy=0x2a5cc38, for_wraparound=0 '\000', isTopLevel=1 '\001') at
> vacuum.c:294
> #7 0x00000000007a3c55 in standard_ProcessUtility (parsetree=0x29e05e0,
> queryString=0x29dfbf8 "vacuum ;", context=PROCESS_UTILITY_TOPLEVEL,
> params=0x0,
> dest=0x29e0988, completionTag=0x7fff9411a490 "") at utility.c:645
>
> So, vac_truncate_clog() tries to get a new transaction ID, which fails
> because we've already reached the stop-limit. vac_truncate_clog() doesn't
> really need a new XID to be assigned, though, it only uses it to compare
> against datfrozenxid to see if wrap-around has already happened, so it
> could use ReadNewTransactionId() instead.
>

Like the attached patch, or does it need to be more complicated than that?

>
>
> Jeff's database seems to have wrapped around already, because after fixing
> the above, I get this:
>
> jjanes=# vacuum;
> WARNING: some databases have not been vacuumed in over 2 billion
> transactions
> DETAIL: You might have already suffered transaction-wraparound data loss.
> VACUUM
>

> We do not truncate clog when wraparound has already happened, so we never
> get past that point. Jeff advanced XID counter aggressively with some
> custom C code, so hitting the actual wrap-around is a case of "don't do
> that". Still, the case is quite peculiar: pg_controldata says that nextXid
> is 4/1593661139. The oldest datfrozenxid is equal to that, 1593661139. So
> ISTM he managed to not just wrap around, but execute 2 billion more
> transactions after the wraparound and reach datfrozenxid again. I'm not
> sure how that happened.

I applied the attached patch to current git head (0ad1a816320a2b539a5) and
use it to start up a copy of the indicated data directory in ordinary (not
single user) mode after deleting postmaster.opts. Autovacuum starts,
eventually completes, and then the database becomes usable again.

Perhap while investigating the matter you did something that pushed it over
the edge. Could you try it again from a fresh untarring of the data
directory and see if you can reproduce my results?

Thanks,

Jeff

Attachment Content-Type Size
ReadNew.patch application/octet-stream 670 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-06-03 21:53:07 Re: idle_in_transaction_timeout
Previous Message Josh Berkus 2014-06-03 21:22:58 Re: idle_in_transaction_timeout