Re: Patch for fail-back without fresh backup

From: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fail-back without fresh backup
Date: 2013-06-28 05:10:54
Message-ID: CAD21AoBh4xJvdkd7RkgqipTxX8XOG=t-f+mnZLseKcJYHs5+rg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 26, 2013 at 1:40 PM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Tuesday, June 25, 2013 10:23 AM Amit Langote wrote:
>> Hi,
>>
>> >
>> >> So our proposal on this problem is that we must ensure that master should
>> >> not make any file system level changes without confirming that the
>> >> corresponding WAL record is replicated to the standby.
>> >
>> > How will you take care of extra WAL on old master during recovery. If it
>> > plays the WAL which has not reached new-master, it can be a problem.
>> >
>>
>> I am trying to understand how there would be extra WAL on old master
>> that it would replay and cause inconsistency. Consider how I am
>> picturing it and correct me if I am wrong.
>>
>> 1) Master crashes. So a failback standby becomes new master forking the
>> WAL.
>> 2) Old master is restarted as a standby (now with this patch, without
>> a new base backup).
>> 3) It would try to replay all the WAL it has available and later
>> connect to the new master also following the timeline switch (the
>> switch might happen using archived WAL and timeline history file OR
>> the new switch-over-streaming-replication-connection as of 9.3,
>> right?)
>>
>> * in (3), when the new standby/old master is replaying WAL, from where
>> is it picking the WAL?
> Yes, this is the point which can lead to inconsistency, new standby/old master
> will replay WAL after the last successful checkpoint, for which he get info from
> control file. It is picking WAL from the location where it was logged when it was active (pg_xlog).
>
>> Does it first replay all the WAL in pg_xlog
>> before archive? Should we make it check for a timeline history file in
>> archive before it starts replaying any WAL?
>
> I have really not thought what is best solution for problem.
>
>> * And, would the new master, before forking the WAL, replay all the
>> WAL that is necessary to come to state (of data directory) that the
>> old master was just before it crashed?
>
> I don't think new master has any correlation with old master's data directory,
> Rather it will replay the WAL it has received/flushed before start acting as master.

When the old master fails over, any WAL it holds that is ahead of the
new master might be broken data, so if a user wants to take a dump from
the old master, the dump may fail.

This is just an idea: we could add a new recovery.conf parameter, for
example 'follow_master_force'. It would accept 'on' and 'off', and would
be effective only when standby_mode is set to on.
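
The intended semantics of the parameter can be sketched as follows. This
is only an illustration, not PostgreSQL code: 'follow_master_force' is the
proposed (not existing) parameter, the helper names are hypothetical, and
the parser handles only simple name = 'value' lines, which is much less
than the real recovery.conf parser accepts.

```python
def parse_recovery_conf(text):
    """Parse simple `name = 'value'` lines; '#' starts a comment.

    Simplified on purpose: quoted values containing '#' or '=' are
    not handled the way the real recovery.conf parser would.
    """
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'")
    return settings


def follow_master_force_active(settings):
    """The proposed flag only takes effect when standby_mode is also on."""
    standby = settings.get("standby_mode", "off") == "on"
    force = settings.get("follow_master_force", "off") == "on"
    return standby and force
```

With both parameters set to 'on' the function returns True; with
follow_master_force alone it returns False, because the flag is ignored
outside standby mode.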

If both 'follow_master_force' and 'standby_mode' are set to 'on':
1. When the standby server starts and begins recovery, it skips
applying the WAL in pg_xlog and requests WAL from the master server,
starting at the standby's latest checkpoint LSN.
2. The master server receives the standby's latest checkpoint LSN and
compares it with the LSN of its own latest checkpoint. If the two LSNs
match, the master sends WAL starting from that checkpoint LSN. If not,
the master informs the standby that the request failed.
3. The standby forks the WAL and continuously applies the WAL sent from
the master.
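
The three steps above can be sketched roughly as below. This is a toy
model under stated assumptions, not a patch: the function names are
hypothetical, WAL is modeled as a list of (LSN, record) pairs, and the
timeline fork in step 3 is not modeled. The only real detail used is the
textual LSN format, two hexadecimal numbers separated by a slash.

```python
def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' in hex (e.g. '0/3000028') to an int."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def master_reply(master_checkpoint_lsn, standby_checkpoint_lsn):
    """Step 2: the master compares the two latest-checkpoint LSNs."""
    if parse_lsn(master_checkpoint_lsn) == parse_lsn(standby_checkpoint_lsn):
        return {"ok": True, "start_lsn": standby_checkpoint_lsn}
    return {"ok": False, "start_lsn": None}


def standby_start(standby_checkpoint_lsn, local_wal,
                  master_checkpoint_lsn, master_wal):
    """Steps 1 and 3, seen from the standby (hypothetical helper).

    local_wal stands for the contents of pg_xlog; it is accepted only to
    make explicit that it is never replayed in this mode.
    """
    # Step 1: skip local WAL and ask the master, starting at our checkpoint.
    reply = master_reply(master_checkpoint_lsn, standby_checkpoint_lsn)
    if not reply["ok"]:
        raise RuntimeError("master rejected request: checkpoint LSNs differ")
    start = parse_lsn(reply["start_lsn"])
    # Step 3: apply only what the master sends at or after the start LSN.
    return [rec for lsn, rec in master_wal if parse_lsn(lsn) >= start]
```

In this model a mismatch between the two checkpoint LSNs raises an error
on the standby, which corresponds to the master reporting failure in
step 2.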

With this approach, a user who wants to take a dump from the old master
sets both follow_master_force and standby_mode to 'off', and takes the
dump after that server has started. On the other hand, a user who wants
to force replication to start sets both parameters to 'on'.

Please give me feedback.

Regards,

-------
Sawada Masahiko
