Re: Issues with Quorum Commit

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Issues with Quorum Commit
Date:	2010-10-06 08:01:55
Message-ID:	AANLkTimkkrCrtj5LD6m9LyP8taiQUzx4+U9SU+OTYahj@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Oct 6, 2010 at 10:52 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> I'm not sure I entirely understand. I was concerned about the case of a
> standby server being allowed to lag behind the rest by a large number of
> WAL records. That can't happen in the "wait for all servers to apply"
> case, because the system would become unavailable rather than allow a
> significant difference in the amount of WAL applied.
>
> I'm not saying that an unavailable system is good, but I don't see how
> my particular complaint applies to the "wait for all servers to apply"
> case.
>
> The case I was worried about is:
> * 1 master and 2 standby
> * The rule is "wait for at least one standby to apply the WAL"
>
> In your notation, I believe that's M -> { S1, S2 }
>
> In that case, if S1 is just a little faster than S2, then S2 might
> build up a significant queue of unapplied WAL. Then, when S1 goes down,
> there's no way for the slower one to acknowledge a new transaction
> without playing through all of the unapplied WAL.
>
> Intuitively, the administrator would think that he was getting both HA
> and redundancy, but in reality the availability is no better than if
> there were only two servers (M -> S1), except that it might be faster to
> replay the WAL than to set up a new standby (but that's not guaranteed).

Agreed. This is similar to my previous complaint.
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00946.php

This problem would happen even if we fix the quorum to 1 as Josh proposes.
To avoid this, the master must wait for ACK from all the connected
synchronous standbys.

I think this is especially likely to happen when we choose the 'apply'
replication level, because at that level a synchronous standby can easily
lag as a result of conflicts between recovery and read-only queries.
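
To make the scenario concrete, here is a toy simulation of M -> { S1, S2 }.
It is only a sketch, not PostgreSQL code: WAL positions are plain integers,
each standby applies a fixed number of records per tick, and the names
backlog_after, catch_up_ticks, load and the rates are all made up for
illustration. It compares "wait for at least one standby to apply" with
"wait for all connected standbys to apply".

```python
def backlog_after(rates, wait_for_all, ticks, load=10):
    """Return each standby's unapplied WAL (in records) after `ticks` ticks.

    Assumptions of this toy model (not taken from PostgreSQL):
    - `rates` maps a standby name to the records it can apply per tick;
    - the master may have at most `load` unacknowledged records in flight,
      as if `load` sessions were each waiting for their commit to be acked.
    """
    applied = {name: 0 for name in rates}
    master_lsn = 0
    for _ in range(ticks):
        positions = applied.values()
        # The ack point is the slowest standby when waiting for all of
        # them, and the fastest one when a single ACK is enough.
        acked = min(positions) if wait_for_all else max(positions)
        master_lsn = min(master_lsn + load, acked + load)
        for name, rate in rates.items():
            applied[name] = min(master_lsn, applied[name] + rate)
    return {name: master_lsn - pos for name, pos in applied.items()}


def catch_up_ticks(backlog, rate):
    """Ticks a standby needs to replay `backlog` records (at least one)."""
    return max(1, -(-backlog // rate))


rates = {"S1": 10, "S2": 8}  # S1 is only a little faster than S2

lag_any = backlog_after(rates, wait_for_all=False, ticks=100)
lag_all = backlog_after(rates, wait_for_all=True, ticks=100)

print("wait for one:", lag_any)  # {'S1': 0, 'S2': 200}
print("wait for all:", lag_all)  # {'S1': 0, 'S2': 2}

# If S1 goes down now, S2 must replay its backlog before it can
# acknowledge a new transaction.
print(catch_up_ticks(lag_any["S2"], rates["S2"]))  # 25
print(catch_up_ticks(lag_all["S2"], rates["S2"]))  # 1
```

In this model the backlog on S2 grows without bound under the "wait for
one" rule, while under "wait for all" it stays bounded by the master's
in-flight window, at the cost of the master running at S2's speed.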

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

In response to

Re: Issues with Quorum Commit at 2010-10-06 01:52:10 from Jeff Davis
