Synchronous replication

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Synchronous replication
Date: 2010-07-14 06:50:13
Message-ID: AANLkTilgyL3Y1jkDVHX02433COq7JLmqicsqmOsbuyA1@mail.gmail.com
Lists: pgsql-hackers

Hi,

The attached patch provides the core of the synchronous replication
feature, based on streaming replication. I have added this patch to
CF 2010-07.

The code is also available in my git repository:
git://git.postgresql.org/git/users/fujii/postgres.git
branch: synchrep

Synchronization levels
----------------------
The patch provides a replication_mode parameter in recovery.conf. It
specifies the replication mode, which controls how long a transaction
commit on the master server waits for replication before the command
returns a "success" indication to the client. Valid modes are:

1. async
doesn't make transaction commit wait for replication, i.e.,
asynchronous replication. This mode is already supported in
9.0.

2. recv
makes transaction commit wait until the standby has received WAL
records.

3. fsync
makes transaction commit wait until the standby has received and
flushed WAL records to disk.

4. replay
makes transaction commit wait until the standby has replayed WAL
records after receiving and flushing them to disk.

You can choose the synchronization level per standby.

Quorum commit
-------------
In previous discussions about synchronous replication, some people
wanted a quorum commit feature. This feature is also included in
Zoltan's synchronous replication patch, so I decided to implement it.

The patch provides a quorum parameter in postgresql.conf, which
specifies how many standby servers a transaction commit waits for
WAL records to be replicated to before the command returns a
"success" indication to the client. The default value is zero, in
which case transaction commit never waits for replication, regardless
of replication_mode. Likewise, transaction commit never waits for
replication to an asynchronous standby (i.e., one whose
replication_mode is set to async), regardless of this parameter. If
quorum is greater than the number of synchronous standbys, transaction
commit returns "success" once the ACK has arrived from all of the
synchronous standbys.

Currently the quorum parameter is defined as PGC_USERSET, so you can
have some transactions replicate synchronously and others
asynchronously.
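
To make the quorum rule above concrete, here is a minimal sketch in
Python. It is not code from the patch: the function names, the dict of
standby names, and the set of acknowledged standbys are all hypothetical,
and the case of a non-zero quorum with no synchronous standby at all
(treated here as "nothing to wait for") is my assumption.

```python
def effective_quorum(quorum: int, sync_standbys: int) -> int:
    """Number of ACKs a commit waits for (illustrative only)."""
    if quorum <= 0:
        return 0  # quorum = 0: never wait, regardless of replication_mode
    # A quorum larger than the number of synchronous standbys is capped,
    # so the commit waits for all of them.
    return min(quorum, sync_standbys)


def commit_can_return(quorum, standby_modes, acked):
    """Decide whether a commit may report "success".

    standby_modes: dict mapping a (hypothetical) standby name to its
                   replication_mode ('async', 'recv', 'fsync', 'replay').
    acked:         set of standby names whose ACK already covers the
                   COMMIT record.
    """
    # Asynchronous standbys are never waited for, so they never count.
    sync = {name for name, mode in standby_modes.items() if mode != "async"}
    needed = effective_quorum(quorum, len(sync))
    return len(acked & sync) >= needed
```

For example, with two synchronous standbys and one asynchronous standby,
quorum = 1 is satisfied by an ACK from either synchronous standby, while
quorum = 5 behaves like "wait for both".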

Protocol
--------
I extended the handshake message "START_REPLICATION" so that it
includes the replication_mode read from recovery.conf. If 'async' is
passed, the master knows that it doesn't need to wait for the ACK
from the standby.

I added an XLogRecPtr message, which walreceiver sends to walsender as
the ACK indicating that replication has completed. If
replication_mode = 'async', this message is never sent. The XLogRecPtr
message always includes the current receive location if the mode is
'recv', the current flush location if the mode is 'fsync', and the
current replay location if the mode is 'replay'.

Then, if the location in the ACK is greater than or equal to the
location of the COMMIT record, the transaction breaks out of the
wait loop and returns "success" to the client.
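
The ACK handling described above can be sketched as follows. This is an
illustration in Python rather than the patch's C code: the function names
are hypothetical, and the WAL location is modelled as an
(xlogid, xrecoff) pair so that ordinary tuple comparison gives the
ordering.

```python
from typing import NamedTuple, Optional


class XLogRecPtr(NamedTuple):
    """WAL location modelled as (xlogid, xrecoff); compares field by field."""
    xlogid: int
    xrecoff: int


def ack_location(mode: str,
                 received: XLogRecPtr,
                 flushed: XLogRecPtr,
                 replayed: XLogRecPtr) -> Optional[XLogRecPtr]:
    """Standby side: pick the location to put in the ACK for this mode."""
    if mode == "async":
        return None  # no ACK is ever sent in async mode
    return {"recv": received, "fsync": flushed, "replay": replayed}[mode]


def commit_acknowledged(ack: Optional[XLogRecPtr],
                        commit_lsn: XLogRecPtr) -> bool:
    """Master side: the wait ends once the ACK reaches the COMMIT record."""
    return ack is not None and ack >= commit_lsn
```

With a standby that has received up to (0, 500) but flushed only up to
(0, 300), a COMMIT record at (0, 400) is acknowledged under 'recv' but
not yet under 'fsync'.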

TODO
----
The patch has no features for improving the performance of synchronous
replication. I admit that the performance overhead on the master is
currently terrible. We need to address the following TODO items in a
subsequent CF.

* Change the poll loop in the walsender
* Change the poll loop in the backend
* Change the poll loop in the startup process
* Change the poll loop in the walreceiver
* Perform the WAL write and replication concurrently
* Send WAL from not only disk but also WAL buffers

For the case where a network outage happens or the standby fails, we
should expose the maximum time to wait for replication as a parameter.
Furthermore, you might want to specify the reaction to the timeout.
These are also not in the patch, so we need to address them in a
subsequent CF, too.

In synchronous replication, it's important to check whether the standby
is in sync with the master. But such a monitoring feature is also not
in the patch. That's a TODO.

It would be difficult to commit the whole synchronous replication
feature at one time. I'm planning to develop it in stages.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Attachment: synch_rep_0714.patch (application/octet-stream, 50.8 KB)
