Re: Clean switchover

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Clean switchover
Date: 2013-06-12 11:38:37
Message-ID: CABUevEyT6xXtumW9mw9Gr7bu43K1+LLZK-jhi=YsWMGYRSLkaA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jun 12, 2013 at 6:41 AM, Amit Kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
> On Wednesday, June 12, 2013 4:23 AM Fujii Masao wrote:
>> Hi,
>>
>> In streaming replication, when we shut down the master, walsender tries
>> to send all the outstanding WAL records including the shutdown
>> checkpoint record to the standby, and then to exit. This basically
>> means that all the WAL records are fully synced between two servers
>> after the clean shutdown of the master. So, after promoting the standby
>> to new master, we can restart the stopped master as new standby without
>> the need for a fresh backup from new master.
>>
>> But there is one problem: though walsender tries to send all the
>> outstanding WAL records, it doesn't wait for them to be replicated to
>> the standby. IOW, walsender closes the replication connection as soon
>> as it sends WAL records.
>> Then, before receiving all the WAL records, walreceiver can detect the
>> closure of connection and exit. We cannot guarantee that there is no
>> missing WAL in the standby after clean shutdown of the master. In this
>> case, backup from new master is required when restarting the stopped
>> master as new standby. I have experienced this case several times,
>> especially when enabling WAL archiving.
>>
>> The attached patch fixes this problem. It just changes walsender so
>> that it waits for all the outstanding WAL records to be replicated to
>> the standby before closing the replication connection.
>>
>> You may be concerned about the case where the standby gets stuck and
>> the walsender keeps waiting for a reply from that standby. In this case,
>> wal_sender_timeout detects the inactive standby and walsender exits.
>> So even in that case, the shutdown can complete.
>
> Do you think it can impact time to complete shutdown?
> After completing shutdown, user will promote standby to master, so if there
> is delay in shutdown, it can cause delay in switchover.

I'd expect a controlled switchover to happen without data loss. Yes,
this could make it take a bit longer, but it guarantees you don't
lose data. ISTM that if you don't care about the potential data loss,
you can just use a faster shutdown method (e.g. immediate).
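
To make the trade-off concrete, here is a rough toy model of the behaviour
described in the quoted text: walsender keeps the connection open until the
standby has confirmed everything that was sent, and gives up only after a
period with no reply. This is not PostgreSQL code and not the patch itself.
The Standby class, the "tick" time unit, and the choice to compare against
a reported flush position are all assumptions made for illustration;
timeout_ticks merely stands in for wal_sender_timeout.

```python
from dataclasses import dataclass


@dataclass
class Standby:
    """Hypothetical standby: tracks the last WAL position it has confirmed."""
    flush_lsn: int       # last position reported back to the sender
    ack_per_tick: int    # WAL confirmed per tick; 0 models a stuck standby

    def tick(self, sent_lsn: int) -> bool:
        """Advance one time unit; return True if a reply was sent."""
        self.flush_lsn = min(sent_lsn, self.flush_lsn + self.ack_per_tick)
        return self.ack_per_tick > 0


def shutdown_walsender(sent_lsn: int, standby: Standby, timeout_ticks: int):
    """Wait until the standby confirms sent_lsn, or until it has been
    silent for timeout_ticks in a row. Returns (outcome, ticks_waited)."""
    idle = 0
    ticks = 0
    while standby.flush_lsn < sent_lsn:
        replied = standby.tick(sent_lsn)
        ticks += 1
        idle = 0 if replied else idle + 1
        if idle >= timeout_ticks:
            # Stuck standby: stop waiting so the shutdown can still finish.
            return ("timeout", ticks)
    # Everything sent has been confirmed; safe to close the connection.
    return ("synced", ticks)


healthy = shutdown_walsender(100, Standby(flush_lsn=90, ack_per_tick=5), 3)
stuck = shutdown_walsender(100, Standby(flush_lsn=90, ack_per_tick=0), 3)
```

In this model the extra shutdown time is bounded: a healthy standby only
adds the time it takes to confirm the remaining WAL, and a stuck one adds
at most the timeout.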

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
