Review of Synchronous Replication patches

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Review of Synchronous Replication patches
Date: 2010-07-24 06:56:18
Message-ID: 4C4A8E92.7080607@gmail.com
Lists: pgsql-hackers

Hello list,

My apologies if this email arrives more than once; I've been
having trouble posting to the -hackers list and am resending this
review as a new thread instead of a reply to the existing thread at
http://archives.postgresql.org/pgsql-hackers/2010-07/msg01072.php, which
also has relevant links to both patches and the discussion.

Here follows a severely time-boxed review of both synchronous replication
patches. Expect this review to be incomplete, though I believe the
general remarks to be valid. Off-list I've received word from Zoltan
that work on a new patch is planned. It would be great if ideas from
both patches could be merged into one.

regards,
Yeb Havinga

Patch A: Zoltán Böszörményi
Patch B: Fujii Masao

* Is the patch in context diff format?
A: Yes
B: Yes

* Does it apply cleanly to the current CVS HEAD?
A: No
B: Yes

* Does it include reasonable tests, necessary doc patches, etc?
A: Doc patches, and a contrib module for debugging purposes.
B: Doc patches.

Neither patch's documentation is final yet; see e.g.
* 25.2 - The section starting with "It should be noted that the log
shipping is asynchronous, i.e.," should be updated.
* 25.2.5 - Ditto for "Streaming replication is asynchronous, so there..."

Testing any replication setup is not possible with the current
regression tool suite, and adding that is beyond the scope of any
synchronous replication patch. Perhaps the work of Markus Wanner with
dtester (that adds a make dcheck) can be assimilated for this purpose.

Both patches add synchronous replication to the current single-master
streaming replication features in 9.0, which means that there is now a
mechanism by which a commit on the master does not finish until 'some or
more of the replicas are in sync'.
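As a toy illustration of that mechanism (plain Python, not PostgreSQL code; the class and all names are invented for this sketch): a committing backend blocks until enough standbys have acknowledged.

```python
import threading

class SyncCommitGate:
    """Toy model of synchronous replication: a commit on the master
    blocks until enough standbys have acknowledged the transaction."""

    def __init__(self, required_acks):
        self.required = required_acks
        self.acks = 0
        self.cond = threading.Condition()

    def standby_ack(self):
        # Called when a standby reports the transaction as synced.
        with self.cond:
            self.acks += 1
            self.cond.notify_all()

    def commit(self, timeout=None):
        # Returns True once enough acks have arrived, False on timeout.
        with self.cond:
            return self.cond.wait_for(lambda: self.acks >= self.required,
                                      timeout=timeout)
```

With one ack outstanding, `commit()` keeps waiting; once the second standby acknowledges, it returns.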

* Does the patch actually implement that?
A: Yes
B: No; if no standbys are connected, commits do not wait. (If I
understand the code in WaitXLogSend correctly)

* Do we want that?
For database users who never want to lose a single committed record,
synchronous replication is a relatively easy setup for achieving a high
level of 'security' (where secure means: less chance of losing data).
The ease is relative to current solutions (the whole range of
mirrored disks, controllers, backups, log shipping etc).

* Do we already have it?
No

* Does it follow SQL spec, or the community-agreed behavior?
A: Unknown, though the choices in GUC parameters suggest the patch
was made to support actual use cases.
B: Design of patch B has been discussed on the mailing list as well as
the wiki (for links I refer to my previous mail)

* Are there dangers?
Besides the usual errors causing transactions to fail, in my opinion the
single largest danger would be that the master thinks a standby is in
sync when in fact it isn't.

* Have all the bases been covered?
Patch A seems to cover two-phase commit where patch B does not. (I'm
time constrained and currently do not have a replicated cluster with
patch B anymore, so I cannot test).

# Does the feature work as advertised?
Patch A: yes
Patch B: yes

I ran a few tests with failures and had some recoverable problems,
which I'm unsure are related to the sync rep patches or to streaming
replication in general. Work in the direction of a replication
regression test would be useful.

# Are there any assertion failures or crashes?
No

A note: reading the source code of both patches makes it clear that these
are patches from experienced PostgreSQL programmers. The source code
reads just like 'normal' high-quality PostgreSQL source.

Differences:
Patch A synchronizes by sending back the xids that have been received
and written to disk. When the master receives the xids, the respective
backends that have those xids active are unblocked and signalled to
continue. This means some locking of the procarray. The libpq protocol
was adapted so a client (the walreceiver) can report back the xids.
Patch B synchronizes by waiting to send WAL data until the sync
replicas have sent back LSN pointers into the WAL files they have
synced to, in one of three ways. The libpq protocol was adapted so the
walreceiver can report back WAL file positions.
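The difference can be sketched as follows (simplified Python with invented function names, not code from either patch):

```python
def release_by_xid(waiting_xids, acked_xids):
    """Patch A style: the standby reports individual XIDs, and each
    waiting backend is released only when its own XID is acked."""
    return [xid for xid in waiting_xids if xid in acked_xids]

def release_by_lsn(waiting_lsns, acked_lsn):
    """Patch B style: the standby reports a single WAL position; every
    backend whose commit record lies at or before it is released."""
    return [lsn for lsn in waiting_lsns if lsn <= acked_lsn]
```

A single LSN acknowledgement can release several waiting backends at once, which is part of why the LSN-based approach needs less per-backend bookkeeping.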

Perhaps the weakness of both patches is that they are not one. In my
opinion, both patches have their strengths. It would be great if these
strengths could be unified in a single patch.

patch A strengths:
* design of the GUC parameters - the meaning of
min_sync_replication_clients. As I understand it, it is not possible
with patch B to define a minimum number of synchronous standby servers a
transaction must be replicated to before the commit succeeds. Perhaps
the name could be changed (quorum_min_sync_standbys?), but in my opinion
the definitive sync replication patch needs a parameter with this number
and exact meaning. The boolean 'synchronous_slave' GUC is also a good
one in my opinion.
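For concreteness, a master-side configuration under such a parameter might look like this (a hypothetical sketch; the parameter name is from the patch under review, the value and comment are mine):

```
# postgresql.conf on the master (hypothetical sketch, patch A's GUC)
min_sync_replication_clients = 2   # a commit waits until at least two
                                   # synchronous standbys have confirmed it
```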

patch B strengths:
* having the walsender synchronize by waiting for the acked LSN, instead
of signalling the respective backends with a specific xid active, seems
like a simpler and therefore better solution.
* the three different ways of choosing when to sync (recv, flush and
replay), and also that this is configurable per standby.

In patch B there's probably some work to do in WaitXLogSend(); after a
long day I'm having trouble understanding its meaning, but somehow I
believe that a more intuitive set of GUC parameters can be found,
accompanied by clear code in this function. The way patch A synchronizes
and counts synchronous standbys for a commit is clearer.

end of review.


From: zb(at)cybertec(dot)at
To: "Yeb Havinga" <yebhavinga(at)gmail(dot)com>
Cc: "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>, hs(at)cybertec(dot)at
Subject: Re: Review of Synchronous Replication patches
Date: 2010-07-24 07:40:17
Message-ID: 45f2f88591b7bc77ba03839929736061.squirrel@internal.cybertec.at
Lists: pgsql-hackers

Hi,

> ... Off-list I've received word from Zoltan
> that work on a new patch is planned. It would be great if ideas from
> both patches could be merged into one.

...

> * Does it follow SQL spec, or the community-agreed behavior?
> A: Unknown, though the choices in GUC parameters suggest the patch
> was made to support actual use cases.
> B: Design of patch B has been discussed on the mailing list as well as
> the wiki (for links I refer to my previous mail)

> * Have all the bases been covered?
> Patch A seems to cover two-phase commit where patch B does not. (I'm
> time constrained and currently do not have a replicated cluster with
> patch B anymore, so I cannot test).

...

> Differences:
> Patch A synchronizes by sending back the xids that have been received
> and written to disk. When the master receives the xids, the respective
> backends that have those xids active are unblocked and signalled to
> continue. This means some locking of the procarray. The libpq protocol
> was adapted so a client (the walreceiver) can report back the xids.
> Patch B synchronizes by waiting to send WAL data until the sync
> replicas have sent back LSN pointers into the WAL files they have
> synced to, in one of three ways. The libpq protocol was adapted so the
> walreceiver can report back WAL file positions.
>
> Perhaps the weakness of both patches is that they are not one. In my
> opinion, both patches have their strengths. It would be great if these
> strengths could be unified in a single patch.
>
> patch A strengths:
> * design of the GUC parameters - the meaning of
> min_sync_replication_clients. As I understand it, it is not possible
> with patch B to define a minimum number of synchronous standby servers a
> transaction must be replicated to before the commit succeeds. Perhaps
> the name could be changed (quorum_min_sync_standbys?), but in my opinion
> the definitive sync replication patch needs a parameter with this number
> and exact meaning. The boolean 'synchronous_slave' GUC is also a good
> one in my opinion.
>
> patch B strengths:
> * having the walsender synchronize by waiting for the acked LSN, instead
> of signalling the respective backends with a specific xid active, seems
> like a simpler and therefore better solution.
> * the three different ways of choosing when to sync (recv, flush and
> replay), and also that this is configurable per standby.
>
> In patch B there's probably some work to do in WaitXLogSend(); after a
> long day I'm having trouble understanding its meaning, but somehow I
> believe that a more intuitive set of GUC parameters can be found,
> accompanied by clear code in this function. The way patch A synchronizes
> and counts synchronous standbys for a commit is clearer.

This week I am on vacation, but I discussed your review a little with
Hans-Jürgen Schönig, and given the consensus that sending
back LSNs is a better solution, I will withdraw my patch from the
commitfest page.

Instead, I will post a patch that unifies my configuration choices with
Fujii's patch. Do you have suggestions for better worded GUCs?
"slave" seems to be phased out by "standby" for political correctness, so
"synchronous_standby" instead of "synchronous_slave". You mentioned
"min_sync_replication_clients" -> "quorum_min_sync_standbys". What else?

You also noticed that my patch addressed 2PC, maybe I will have to add
this part to Fujii's patch, too. Note: I haven't yet read his patch,
maybe working with LSNs instead of XIDs makes this work automatically,
I don't know. We should definitely test.

Best regards,
Zoltán Böszörményi


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: zb(at)cybertec(dot)at
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, hs(at)cybertec(dot)at
Subject: Re: Review of Synchronous Replication patches
Date: 2010-07-24 12:17:51
Message-ID: 4C4AD9EF.7070204@gmail.com
Lists: pgsql-hackers

Hello Zoltán,

Thanks for your reply!
> Instead, I will post a patch that unifies my configuration choices with
> Fujii's patch.
Please know that Fujii's patch is also a work in progress. I didn't
mention in my review the previously discussed items, most importantly
the change to the polling loops (see e.g.
http://archives.postgresql.org/pgsql-hackers/2010-07/msg00757.php).
> Do you have suggestions for better worded GUCs?
> "slave" seems to be phased out by "standby" for political correctness, so
> "synchronous_standby" instead of "synchronous_slave". You mentioned
> "min_sync_replication_clients" -> "quorum_min_sync_standbys". What else?
>
The 'quorum_min_sync_standbys' came from a reply to the design of
Fujii's patch on
http://archives.postgresql.org/pgsql-hackers/2010-07/msg01167.php.
However, after thinking about it a bit more, I still fail to see the use
case for a maximum quorum number. Having a 'max quorum' also seems to
contradict common meanings of 'quorum', which is in itself a minimum: a
least required number, a lower limit.

Also having 'sync' in the name is useful, IMHO, so people know it's not
about async servers. So then the name would become quorum_sync_standbys
or synchronous_standby_quorum.

Though I think I wouldn't use 'strict_sync_replication' (explained in
http://archives.postgresql.org/pgsql-hackers/2010-04/msg01516.php), I can
imagine others wanting this feature, so as not to have the master halted
by a sync standby failure. If that's what somebody never wants, then the
way this parameter solves it is elegant: only wait if they are connected.

In recovery.conf, a boolean to discern between sync and async servers
(like in your patch) is IMHO better than mixing 'sync or async' with the
replication modes. Together with the replication modes, this could then
become
synchronous_standby (boolean)
synchronous_mode (recv,fsync,replay)
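Under that proposal, a standby's recovery.conf might then contain something like this (a hypothetical sketch using the parameter names from this discussion, not committed PostgreSQL settings):

```
# recovery.conf on a synchronous standby (hypothetical names)
synchronous_standby = on
synchronous_mode = 'recv'   # or 'fsync' / 'replay'
```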

> You also noticed that my patch addressed 2PC, maybe I will have to add
> this part to Fujii's patch, too. Note: I haven't yet read his patch,
> maybe working with LSNs instead of XIDs makes this work automatically,
> I don't know.
Yes, and I think we definitely need automated replicated cluster tests.
A large part of reviewing went into virtual machine setup and cluster
setup. I'm not sure whether a full test suite that includes node failures
could or should be included in the core regression tests, but anything
that automates setup, a few transactions and failures would benefit
everyone working on and testing replication.

regards,
Yeb Havinga


From: zb(at)cybertec(dot)at
To: "Yeb Havinga" <yebhavinga(at)gmail(dot)com>
Cc: zb(at)cybertec(dot)at, "PostgreSQL-development Hackers" <pgsql-hackers(at)postgresql(dot)org>, hs(at)cybertec(dot)at
Subject: Re: Review of Synchronous Replication patches
Date: 2010-07-24 13:06:20
Message-ID: 62086e5fc0b8e2355f9f5d055e36d76f.squirrel@internal.cybertec.at
Lists: pgsql-hackers

> Hello Zoltán,
>
> Thanks for your reply!
>> Instead, I will post a patch that unifies my configuration choices with
>> Fujii's patch.
> Please know that Fujii's patch is also a work in progress.

Yes, I know that. But working from Fujii's last public patch
or from his GIT tree makes my patch easier to merge.

> I didn't
> mention in my review the previously discussed items, most importantly
> the change to the polling loops (see e.g.
> http://archives.postgresql.org/pgsql-hackers/2010-07/msg00757.php).

>> Do you have suggestions for better worded GUCs?
>> "slave" seems to be phased out by "standby" for political correctness,
>> so
>> "synchronous_standby" instead of "synchronous_slave". You mentioned
>> "min_sync_replication_clients" -> "quorum_min_sync_standbys". What else?
>>
> The 'quorum_min_sync_standbys' came from a reply to the design of
> Fujii's patch on
> http://archives.postgresql.org/pgsql-hackers/2010-07/msg01167.php.
> However, after thinking about it a bit more, I still fail to see the use
> case for a maximum quorum number. Having a 'max quorum' also seems to
> contradict common meanings of 'quorum', which is in itself a minimum: a
> least required number, a lower limit.
>
> Also having 'sync' in the name is useful, IMHO, so people know it's not
> about async servers. So then the name would become quorum_sync_standbys
> or synchronous_standby_quorum.

Ok.

> Though I think I wouldn't use 'strict_sync_replication' (explained in
> http://archives.postgresql.org/pgsql-hackers/2010-04/msg01516.php), I can
> imagine others wanting this feature, so as not to have the master halted
> by a sync standby failure. If that's what somebody never wants, then the
> way this parameter solves it is elegant: only wait if they are connected.

Thanks, I was thinking about possible use cases and different preferences
of people.

> In recovery.conf, a boolean to discern between sync and async servers
> (like in your patch) is IMHO better than mixing 'sync or async' with the
> replication modes. Together with the replication modes, this could then
> become
> synchronous_standby (boolean)
> synchronous_mode (recv,fsync,replay)

Ok.

>> You also noticed that my patch addressed 2PC, maybe I will have to add
>> this part to Fujii's patch, too. Note: I haven't yet read his patch,
>> maybe working with LSNs instead of XIDs makes this work automatically,
>> I don't know.
> Yes, and I think we definitely need automated replicated cluster tests.
> A large part of reviewing went into virtual machine setup and cluster
> setup. I'm not sure whether a full test suite that includes node failures
> could or should be included in the core regression tests, but anything
> that automates setup, a few transactions and failures would benefit
> everyone working on and testing replication.
>
> regards,
> Yeb Havinga


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: zb(at)cybertec(dot)at
Cc: Yeb Havinga <yebhavinga(at)gmail(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, hs(at)cybertec(dot)at
Subject: Re: Review of Synchronous Replication patches
Date: 2010-07-26 07:21:15
Message-ID: AANLkTim=tCQ8bwF5RE8QWknj5rXUc=81gayj8Zg_NgXn@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jul 24, 2010 at 4:40 PM, <zb(at)cybertec(dot)at> wrote:
> Instead, I will post a patch that unifies my configuration choices with
> Fujii's patch. Do you have suggestions for better worded GUCs?
> "slave" seems to be phased out by "standby" for political correctness, so
> "synchronous_standby" instead of "synchronous_slave". You mentioned
> "min_sync_replication_clients" -> "quorum_min_sync_standbys". What else?

I think that the meaning of my "quorum" parameter is the same as
that of your "min_sync_replication_clients". Right?

I'm planning to add new parameter specifying the behavior of quorum
commit when the number of connected synchronous standbys becomes
less than "quorum".

1. Ignore quorum. If the ACKs from all connected standbys have
arrived, transaction commit is successful even if the number
of standbys is less than quorum. If there is no connected
standby, transaction commit is always successful without
regard to quorum.

2. Observe quorum. Transaction commit waits until the number of
connected standbys becomes greater than or equal to quorum.

http://archives.postgresql.org/pgsql-hackers/2010-07/msg01327.php

Is the meaning of this parameter the same as that of your
"strict_sync_replication"?
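The two behaviours above could be sketched like this (plain Python with an invented function name; a model of the policy only, not patch code):

```python
def quorum_commit_ok(acked, connected, quorum, policy):
    """Decide whether a commit may finish.

    "ignore":  if all connected standbys have acked, commit succeeds
               even when fewer than quorum standbys are connected;
               with no standby connected, commit always succeeds.
    "observe": commit waits until at least quorum standbys are
               connected and have acked.
    """
    if policy == "ignore":
        return acked >= min(quorum, connected)
    elif policy == "observe":
        return connected >= quorum and acked >= quorum
    raise ValueError(policy)
```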

> You also noticed that my patch addressed 2PC, maybe I will have to add
> this part to Fujii's patch, too. Note: I haven't yet read his patch,
> maybe working with LSNs instead of XIDs makes this work automatically,
> I don't know. We should definitely test.

Yeah, transaction commit seems to have to wait for replication not
only in RecordTransactionCommit() but also in EndPrepare(),
RecordTransactionCommitPrepared() and RecordTransactionAbortPrepared().

I'll fix that.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center