Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Sergey Konoplev <gray(dot)ru(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>, Максим Панченко <Panchenko(at)gw(dot)tander(dot)ru>, Толстенко Илья <tolstenko_iv(at)gw(dot)tander(dot)ru>
Subject: Re: Completely broken replica after PANIC: WAL contains references to invalid pages
Date: 2013-04-02 10:10:12
Message-ID: 20130402101012.GB2415@alap2.anarazel.de
Lists: pgsql-bugs

On 2013-04-01 08:49:16 +0100, Simon Riggs wrote:
> On 30 March 2013 17:21, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> > So if the xid is later than latestObservedXid we extend subtrans one by
> > one. So far so good. But we initialize it in
> > ProcArrayApplyRecoveryInfo() when consistency is initially reached:
> >     latestObservedXid = running->nextXid;
> >     TransactionIdRetreat(latestObservedXid);
> > Before that subtrans has initially been started up with:
> >     if (wasShutdown)
> >         oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> >     else
> >         oldestActiveXID = checkPoint.oldestActiveXid;
> >     ...
> >     StartupSUBTRANS(oldestActiveXID);
> >
> > That means it's only initialized up to checkPoint.oldestActiveXid. As it
> > can take some time till we reach consistency it seems rather plausible
> > that there now will be a gap in initialized pages. From
> > checkPoint.oldestActiveXid to running->nextXid if there are pages
> > in between.
>
> That was an old bug.
>
> StartupSUBTRANS() now explicitly fills that gap. Are you saying it
> does that incorrectly? How?

Well, no. I think StartupSUBTRANS does this correctly, but there's a gap
between the call to Startup* and the first call to ExtendSUBTRANS. The
latter is only called *after* we have reached STANDBY_INITIALIZED via
ProcArrayApplyRecoveryInfo(). The problem is that we StartupSUBTRANS up to
checkPoint.oldestActiveXid, while we start to ExtendSUBTRANS from
running->nextXid - 1. There can very well be a gap in between.
The window isn't terribly big, but if you use subtransactions as heavily
as Sergey seems to, it doesn't seem unlikely that you would hit it.
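
To make the suspected window concrete, here is a rough toy model of it. This
is not PostgreSQL source: every toy_* name is hypothetical, the page size of
2048 xids assumes 8kB blocks and 4-byte xids, xid wraparound is ignored, and
the start-up step is modelled as described above (only the page holding
checkPoint.oldestActiveXid gets initialized). It also assumes that extension
zeroes a page only when it sees the first xid of that page.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy model only; all names here are hypothetical, not PostgreSQL APIs. */
#define TOY_XACTS_PER_PAGE 2048u   /* assumed: 8192-byte block / 4-byte xid */
#define TOY_NPAGES         16u     /* callers must keep xids below 16 * 2048 */

typedef struct ToySubtrans
{
    bool page_initialized[TOY_NPAGES];
} ToySubtrans;

static uint32_t
toy_page_of(uint32_t xid)
{
    return xid / TOY_XACTS_PER_PAGE;
}

/* Start-up step: only the page holding oldest_active_xid is initialized. */
void
toy_startup(ToySubtrans *s, uint32_t oldest_active_xid)
{
    memset(s, 0, sizeof(*s));
    s->page_initialized[toy_page_of(oldest_active_xid)] = true;
}

/* Extension step: a page is zeroed only when its first xid is seen. */
void
toy_extend(ToySubtrans *s, uint32_t xid)
{
    if (xid % TOY_XACTS_PER_PAGE == 0)
        s->page_initialized[toy_page_of(xid)] = true;
}

/* Replay sees xid: extend one by one, starting after *latest_observed_xid. */
void
toy_record_xid(ToySubtrans *s, uint32_t *latest_observed_xid, uint32_t xid)
{
    while (*latest_observed_xid < xid)
    {
        (*latest_observed_xid)++;
        toy_extend(s, *latest_observed_xid);
    }
}

/* Count uninitialized pages between the pages of from_xid and to_xid. */
uint32_t
toy_count_gap_pages(const ToySubtrans *s, uint32_t from_xid, uint32_t to_xid)
{
    uint32_t n = 0;

    for (uint32_t p = toy_page_of(from_xid); p <= toy_page_of(to_xid); p++)
        if (!s->page_initialized[p])
            n++;
    return n;
}
```

With oldestActiveXid = 1000 and running->nextXid = 5000 in this model, the
pages for xids 2048..6143 are never touched by either step, so a later
lookup in that range would hit an uninitialized page.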

Let me come up with a testcase and patch.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
