Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Data corruption issues using streaming replication on 9.0.14/9.2.5/9.3.1
Date: 2013-11-20 23:52:22
Message-ID: 528D4B36.1010804@agliodbs.com
Lists: pgsql-hackers

Andres,

> Every time the server in HS mode allows connections ("consistent recovery state
> reached at ..." and "database system is ready to accept read only
> connections" in the log), the bug can be triggered. If there weren't too
> many transactions at that point, the problem won't occur until the
> standby is restarted.

Oh, so this doesn't just happen when the base backup is first taken;
*any* time the standby is restarted, it can happen. (!!!)
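
To see how often a given standby has passed through that window, one option is to look for the two log messages quoted above. What follows is a minimal sketch, not a corruption check: it only lists the points at which the standby opened for read-only connections. The function name `find_hs_openings` and the marker labels are hypothetical, and the sketch assumes the messages appear verbatim in a plain-text server log, whatever the configured line prefix is.

```python
# Message fragments as quoted in the thread. The rest of each log line
# (timestamp, "LOG:" prefix, LSN) depends on the logging configuration
# and is deliberately not parsed here.
MARKERS = {
    "consistent": "consistent recovery state reached at",
    "ready": "database system is ready to accept read only connections",
}


def find_hs_openings(lines):
    """Return (line_number, marker_label) for every line containing a marker.

    `lines` is any iterable of strings, for example an open log file.
    Line numbers are 1-based.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for label, text in MARKERS.items():
            if text in line:
                hits.append((lineno, label))
    return hits
```

Passing an open log file to `find_hs_openings` gives one "ready" entry per standby start that reached hot standby, which is a rough count of the occasions on which the bug could have been triggered. It says nothing about whether it actually was.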

>> If someone is doing PITR based on a snapshot taken with pg_basebackup,
>> that will only trip this corruption bug if the user has hot_standby=on
>> in their config *while restoring*? Or is it critical if they have
>> hot_standby=on while backing up?
>
> hot_standby=on only has an effect while starting up with a recovery.conf
> present. So, if you have an old base backup around and all WAL files,
> you can start from that.
>
> Does that answer your questions?

Yeah, thanks.

If you have any ideas for how we'd write code to scan for this kind of
corruption, please post them.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
