Clean switchover

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Clean switchover
Date: 2013-06-11 22:53:29
Message-ID: CAHGQGwHLjEROTMtSWJd=xg_VFwRe3oJWnTYsyBDUbRYa6rr0DQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

In streaming replication, when we shut down the master, walsender tries to
send all the outstanding WAL records, including the shutdown checkpoint
record, to the standby, and then exits. This basically means that all the
WAL records are fully synced between the two servers after a clean shutdown
of the master. So, after promoting the standby to the new master, we can
restart the stopped master as a new standby without the need for a fresh
backup from the new master.

But there is one problem: though walsender tries to send all the outstanding
WAL records, it doesn't wait for them to be replicated to the standby. IOW,
walsender closes the replication connection as soon as it has sent the WAL
records. Then, before receiving all the WAL records, walreceiver can detect
the closure of the connection and exit. So we cannot guarantee that no WAL
is missing on the standby after a clean shutdown of the master. When WAL is
missing, a fresh backup from the new master is required to restart the
stopped master as a new standby. I have experienced this case several times,
especially when WAL archiving is enabled.

The attached patch fixes this problem. It just changes walsender so that it
waits for all the outstanding WAL records to be replicated to the standby
before closing the replication connection.
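
As a rough illustration of the difference (this is not the attached patch,
just a small standalone model), the two exit conditions can be sketched as
below. The type WalPos, the struct SenderState and all function names are
hypothetical, and I assume here that "replicated" means the standby has
reported the position back as flushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a WAL position (byte offset in the WAL stream). */
typedef uint64_t WalPos;

/* Hypothetical per-connection state; field names are illustrative only. */
typedef struct SenderState
{
    WalPos sent;     /* end of the WAL sent to the standby so far */
    WalPos flushed;  /* latest flush position reported by the standby */
} SenderState;

/* Record a reply from the standby. Reported positions never move backwards. */
static void
apply_standby_reply(SenderState *s, WalPos reported_flush)
{
    assert(reported_flush <= s->sent);  /* cannot flush WAL that was never sent */
    if (reported_flush > s->flushed)
        s->flushed = reported_flush;
}

/* Behaviour described above: done as soon as everything has been sent. */
static bool
can_close_after_send(const SenderState *s, WalPos wal_end)
{
    return s->sent == wal_end;
}

/* Proposed behaviour: done only once the standby has confirmed all of it. */
static bool
can_close_after_replication(const SenderState *s, WalPos wal_end)
{
    return s->sent == wal_end && s->flushed == wal_end;
}
```

With sent = 100 and no reply yet, the first condition already allows the
connection to be closed, while the second one holds only after the standby
has reported position 100.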

You may be concerned about the case where the standby gets stuck and
walsender keeps waiting for a reply from that standby. In this case,
wal_sender_timeout detects the inactive standby and walsender exits.
So even in that case, the shutdown can complete.
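
The interaction between the wait and the timeout can be modelled roughly as
follows. This is only a simulation under stated assumptions, not walsender
code: time is counted in abstract ticks, replies[t] is the flush position
the standby reports at tick t (0 meaning "no reply"), and the names
wait_for_standby, WaitResult and timeout_ticks are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { WAIT_CAUGHT_UP, WAIT_TIMED_OUT } WaitResult;

/*
 * Hypothetical simulation of the shutdown wait.
 * Any reply resets the inactivity counter; timeout_ticks consecutive
 * ticks without a reply end the wait, standing in for wal_sender_timeout.
 */
static WaitResult
wait_for_standby(const uint64_t *replies, int nticks,
                 uint64_t wal_end, int timeout_ticks)
{
    uint64_t flushed = 0;
    int      idle = 0;

    assert(wal_end > 0 && timeout_ticks > 0);

    for (int t = 0; t < nticks; t++)
    {
        if (replies[t] != 0)
        {
            idle = 0;                       /* standby is alive */
            if (replies[t] > flushed)
                flushed = replies[t];
            if (flushed >= wal_end)
                return WAIT_CAUGHT_UP;      /* safe to close the connection */
        }
        else if (++idle >= timeout_ticks)
            return WAIT_TIMED_OUT;          /* give up on a stuck standby */
    }
    return WAIT_TIMED_OUT;  /* simulated input exhausted; treat as unresponsive */
}
```

Either way the loop terminates, so the shutdown of the master is never
blocked indefinitely by a single standby.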

Thoughts?

Regards,

--
Fujii Masao

Attachment Content-Type Size
switchover_v1.patch application/octet-stream 2.8 KB
