Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Christophe Pettus <xof(at)thebuild(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Date: 2013-11-20 14:53:20
Message-ID: 20131120145320.GF25406@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2013-11-20 06:21:13 -0800, Kevin Grittner wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > On 2013-11-20 05:59:58 -0800, Kevin Grittner wrote:
>
> >> I don't understand where that would make sense; especially since
> >> I thought that a database FREEZE followed by a checkpoint
> >> releases old clog space anyway.
> >
> > It only releases them up to the (cluster wide) xmin horizon. So
> > if there are older snapshots or prepared xacts around...
>
> So as long as there are no open transactions or prepared
> transactions on the master which started before the release with
> the fix is applied, VACUUM FREEZE would be guaranteed to work?
> Since I don't see how a non-prepared transaction would be running
> from before a minor release upgrade, that just means we have to
> make sure there are no prepared transactions from before the
> upgrade?

That's not a bad point. So the way to fix it would be:

1) Restart the standby to the new minor release, wait for catchup
2) Restart the primary (fast or smart) to the new minor release
3) Acquire enough new xids to make sure we cross a clog page (?)
4) Jot down a new xid: SELECT txid_current()::bigint % (1::bigint<<33-1)
5) vacuumdb -z -a
6) Ensure that there are no prepared xacts older than 3) around
SELECT *
FROM pg_prepared_xacts
ORDER BY age(transaction) DESC LIMIT 1;
7) Ensure the xmin horizon is above the one from: 3:
SELECT datname, datfrozenxid
FROM pg_database
WHERE datname != 'template0'
ORDER BY age(datfrozenxid) DESC LIMIT 1;
8) Get the current lsn: SELECT pg_current_xlog_location();
9) verify on each standby that SELECT pg_last_xlog_receive_location() is
past 7)
10) be happy

I am not sure how we can easily compute that 6) and 7) are past 3) in
the presence of xid wraparounds.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-11-20 14:53:53 Re: Autoconf 2.69 update
Previous Message Tom Lane 2013-11-20 14:36:26 Re: Easily reading debug_print_plan