Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Christophe Pettus <xof(at)thebuild(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Date: 2013-11-19 18:40:04
Message-ID: 20131119184004.GD7240@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-11-19 10:32:10 -0800, Christophe Pettus wrote:
>
> On Nov 19, 2013, at 10:29 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
>
> > It's pretty unlikely that any automated testing would have caught this,
> > the required conditions are too unlikely for that.
>
> I would expect that "promote secondary while primary is under heavy
> load" is a clear-cut test case.

That's not sufficient though. It's e.g. very hard to reproduce the issue
using the standard pgbench workload (not enough xids generated, too many
hint bits).

Note that the bug isn't caused by promotion; the problem occurs during
the initial startup of a Hot-Standby standby. If the bug wasn't hit
there, it won't be a problem at promotion.

> What concerns me more is that we don't seem to have a framework to put
> in a regression test on the bug you just found (and thank you for
> finding it so quickly!).

Agreed. But regarding it as a bad situation isn't fixing it,
unfortunately.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
