Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Timeout for asynchronous replication Re: Timeout and wait-forever in sync rep
Date: 2010-12-20 08:17:27
Message-ID: AANLkTin=8a9oS7D3dg6v3wB0Sgz21W-x-8zLfgy5Vt0f@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 7, 2010 at 12:20 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Yeah.  If we rely on the TCP send buffer filling up, then the amount
> of time the master takes to notice a dead standby is going to be hard
> for the user to predict.  I think the standby ought to send some sort
> of heartbeat and the master should declare the standby dead if it
> doesn't see a heartbeat soon enough.  Maybe the heartbeat could even
> include the receive/fsync/replay LSNs, so that sync rep can use the
> same machinery but with more aggressive policies about when they must
> be sent.

OK. How about keepalive-like parameters and behaviors?

replication_keepalives_idle
replication_keepalives_interval
replication_keepalives_count

The master sends a keepalive packet if replication_keepalives_idle
has elapsed since it received the last ACK packet (which includes the
receive/fsync/replay LSNs) from the standby. OTOH, the standby sends
an ACK packet back to the master as soon as it receives the keepalive
packet.

If the master does not receive an ACK packet within
replication_keepalives_interval, it repeats sending the keepalive
packet and waiting for the ACK up to replication_keepalives_count - 1
more times. If no ACK packet has arrived by then, the master
considers the standby dead.
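
To make the timing concrete, here is a minimal sketch of the master-side
logic described above. It is only an illustration under stated
assumptions, not proposed backend code: the class and function names
(KeepaliveSettings, StandbyMonitor, worst_case_detection_time) are
hypothetical, times are plain seconds, and the caller is assumed to
invoke tick() periodically and on_ack() whenever an ACK arrives.

```python
from dataclasses import dataclass


@dataclass
class KeepaliveSettings:
    # Hypothetical container for the three proposed parameters (seconds / count).
    idle: float      # replication_keepalives_idle
    interval: float  # replication_keepalives_interval
    count: int       # replication_keepalives_count


def worst_case_detection_time(s: KeepaliveSettings) -> float:
    # Idle period before the first keepalive, then `count` keepalives,
    # each given `interval` seconds to be answered.
    return s.idle + s.interval * s.count


class StandbyMonitor:
    """Tracks one standby from the master's point of view (illustrative only)."""

    def __init__(self, settings: KeepaliveSettings, now: float = 0.0):
        self.s = settings
        self.last_ack = now      # time the last ACK was received
        self.last_probe = None   # time the last keepalive was sent
        self.probes_sent = 0     # keepalives sent since the last ACK
        self.dead = False

    def on_ack(self, now: float) -> None:
        # Any ACK from the standby resets the keepalive cycle.
        self.last_ack = now
        self.last_probe = None
        self.probes_sent = 0

    def tick(self, now: float):
        """Return 'send_keepalive', 'standby_dead', or None."""
        if self.dead:
            return None
        if self.probes_sent == 0:
            # No keepalive outstanding: wait for the idle period to elapse.
            if now - self.last_ack >= self.s.idle:
                self.probes_sent = 1
                self.last_probe = now
                return "send_keepalive"
            return None
        if now - self.last_probe >= self.s.interval:
            # The outstanding keepalive went unanswered for a full interval.
            if self.probes_sent >= self.s.count:
                self.dead = True
                return "standby_dead"
            self.probes_sent += 1
            self.last_probe = now
            return "send_keepalive"
        return None
```

With illustrative values idle=10, interval=2 and count=3, a standby
that stops responding right after its last ACK would be declared dead
once idle + interval * count seconds have passed, which is easy for
the user to predict.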

One obvious advantage over my original proposal is that the master
can notice the death of the standby even when there are no WAL
records to send. One disadvantage is that the standby needs to send
some packets even in asynchronous replication.

Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
