Re: WAL replay bugs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WAL replay bugs
Date: 2014-06-17 16:40:35
Message-ID: CA+Tgmoa72AEAgNE=_J_edMkUojjA79pS-aj7-azpp27eoeLsRw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 2, 2014 at 8:55 AM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Wed, Apr 23, 2014 at 9:43 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> And here is the tool itself. It consists of two parts:
>>
>> 1. Modifications to the backend to write the page images
>> 2. A post-processing tool to compare the logged images between master and
>> standby.
> Having that into Postgres at the disposition of developers would be
> great, and I believe that it would greatly reduce the occurrence of
> bugs caused by WAL replay during recovery. So, with the permission of
> the author, I have been looking at this facility for a cleaner
> integration into Postgres.

I'm not sure if this is reasonably possible, but one thing that would
make this tool a whole lot easier to use would be if you could make
all the magic happen in a single server. For example, suppose you had
a background process that somehow got access to the pre- and
post-images for every buffer change, and the associated WAL record, and
tried applying the WAL record to the pre-image to see whether it got
the corresponding post-image. Then you could run 'make check' or so
and afterwards do something like psql -c 'SELECT * FROM
wal_replay_problems()' and hopefully get no rows back.

Don't get me wrong, having this tool at all sounds great. But I think
to really get the full benefit out of it we need to be able to run it
in the buildfarm, so that if people break stuff it gets noticed
quickly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
