Timeout and wait-forever in sync rep

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Timeout and wait-forever in sync rep
Date: 2010-10-15 12:41:42
Message-ID: AANLkTikP0dGiOzr6zh0v-VthZ+Dwbt3kh3vEKQsZ0Xon@mail.gmail.com
Lists: pgsql-hackers

Hi,

As a result of the discussion, I think we need the following two
parameters to handle the case where the standby goes down.

* replication_timeout
This is the maximum time to wait for the ACK from the standby. If this
timeout expires, the master closes the replication connection and
disconnects the standby. This parameter is used only so that the master
can detect a standby crash or a network outage.

We already have keepalive parameters for that purpose. But they cannot
detect the disconnection in some cases. So replication_timeout needs
to be introduced for sync rep.
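
The timeout check described above could be sketched roughly as follows.
This is only an illustration in Python, not PostgreSQL code: the
StandbyConnection class, its fields, and its methods are hypothetical
names, and the clock is passed in explicitly to keep the example
self-contained.

```python
from dataclasses import dataclass


@dataclass
class StandbyConnection:
    """Hypothetical model of one replication connection on the master."""

    last_ack_time: float  # time of the last ACK from the standby, in seconds
    connected: bool = True

    def record_ack(self, now: float) -> None:
        """Remember when the standby last acknowledged WAL."""
        self.last_ack_time = now

    def check_timeout(self, now: float, replication_timeout: float) -> bool:
        """Close the connection if no ACK arrived within the timeout.

        Returns True if this call disconnected the standby.
        """
        if self.connected and now - self.last_ack_time > replication_timeout:
            self.connected = False
            return True
        return False


conn = StandbyConnection(last_ack_time=0.0)
conn.record_ack(now=5.0)
# 15 seconds since the last ACK: still within a 30-second timeout.
still_ok = conn.check_timeout(now=20.0, replication_timeout=30.0)
# 35 seconds since the last ACK: the master gives up on the standby.
timed_out = conn.check_timeout(now=40.0, replication_timeout=30.0)
```

Unlike keepalive, this check depends only on whether ACKs keep arriving,
so it does not rely on the network stack noticing the disconnection.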

* allow_standalone_master
This specifies whether we allow the master to process transactions
alone when there is no connected and sync'd standby.

If this is false, all transactions on the master are blocked until a
sync'd standby appears. Of course, this happens not only when
replication_timeout expires but also when we start the master alone
at the initial setup, when the master detects the disconnection by
using keepalive parameters, and when the standby is shut down normally.
People who want 'wait-forever' will disable this parameter to reduce
the risk of data loss.

OTOH, if this is true, the absence of a sync'd standby doesn't prevent
the master from processing transactions alone. People who want high
availability, even at an increased risk of data loss, will enable
this parameter.
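
The two settings of allow_standalone_master could be summarized by a
small decision function like the one below. It is a sketch in Python
under my own assumptions: CommitAction and commit_action are
hypothetical names used only for illustration, and the real logic would
live in the master's commit path.

```python
from enum import Enum


class CommitAction(Enum):
    """Hypothetical outcomes for a transaction commit on the master."""

    WAIT_FOR_STANDBY_ACK = "wait for the ACK from a sync'd standby"
    PROCEED_ALONE = "commit without waiting for any standby"
    BLOCK_UNTIL_STANDBY = "block until a sync'd standby appears"


def commit_action(synced_standbys: int,
                  allow_standalone_master: bool) -> CommitAction:
    """Decide what a commit does, given the current replication state."""
    if synced_standbys > 0:
        # Normal sync rep: a standby is connected and caught up.
        return CommitAction.WAIT_FOR_STANDBY_ACK
    if allow_standalone_master:
        # High availability is preferred over durability on the standby.
        return CommitAction.PROCEED_ALONE
    # 'wait-forever': no commits until a sync'd standby is available.
    return CommitAction.BLOCK_UNTIL_STANDBY
```

Note that the reason the standby is absent (timeout, keepalive, normal
shutdown, or initial setup) does not appear in the function; only the
number of connected and sync'd standbys matters.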

The timeout does not conflict with 'wait-forever'. Even if you choose
'wait-forever' (i.e., you set allow_standalone_master to false), the
master should detect the standby crash as soon as possible by using the
timeout. For example, imagine that max_wal_senders is set to one and
the master cannot detect the standby crash because there is no
timeout. In this case, even if you start a new standby, it will not be
able to connect to the master since there is no free walsender slot.
As a result, the master actually waits forever.
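
The max_wal_senders example can be played through with a toy model.
This is a simplified Python simulation, not how walsender slots are
actually managed: the Master class, its methods, and the standby names
are all hypothetical, and a crashed standby is modeled simply as one
that stops sending ACKs.

```python
from typing import Dict, Optional


class Master:
    """Toy model of a master with a fixed number of walsender slots."""

    def __init__(self, max_wal_senders: int,
                 replication_timeout: Optional[float] = None) -> None:
        self.max_wal_senders = max_wal_senders
        self.replication_timeout = replication_timeout  # None: no timeout
        self.slots: Dict[str, float] = {}  # standby name -> last ACK time

    def reap_dead_standbys(self, now: float) -> None:
        """Free the slots of standbys whose ACKs are overdue."""
        if self.replication_timeout is None:
            return  # without a timeout the crash is never noticed
        for name, last_ack in list(self.slots.items()):
            if now - last_ack > self.replication_timeout:
                del self.slots[name]

    def try_connect(self, name: str, now: float) -> bool:
        """Accept a standby only if a walsender slot is free."""
        self.reap_dead_standbys(now)
        if len(self.slots) >= self.max_wal_senders:
            return False
        self.slots[name] = now
        return True


def replacement_can_connect(replication_timeout: Optional[float]) -> bool:
    master = Master(max_wal_senders=1,
                    replication_timeout=replication_timeout)
    master.try_connect("old_standby", now=0.0)
    # old_standby crashes silently here and sends no further ACKs.
    return master.try_connect("new_standby", now=100.0)


without_timeout = replacement_can_connect(None)  # slot is never freed
with_timeout = replacement_can_connect(30.0)     # slot is freed after 30 s
```

Without the timeout the only slot stays occupied by the dead standby,
so the replacement is refused; with the timeout the slot is released
and the new standby can connect.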

Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
