From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Jeff Davis <pgsql(at)j-davis(dot)com> |
Cc: | Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Issues with Quorum Commit |
Date: | 2010-10-06 08:01:55 |
Message-ID: | AANLkTimkkrCrtj5LD6m9LyP8taiQUzx4+U9SU+OTYahj@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Oct 6, 2010 at 10:52 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> I'm not sure I entirely understand. I was concerned about the case of a
> standby server being allowed to lag behind the rest by a large number of
> WAL records. That can't happen in the "wait for all servers to apply"
> case, because the system would become unavailable rather than allow a
> significant difference in the amount of WAL applied.
>
> I'm not saying that an unavailable system is good, but I don't see how
> my particular complaint applies to the "wait for all servers to apply"
> case.
>
> The case I was worried about is:
> * 1 master and 2 standby
> * The rule is "wait for at least one standby to apply the WAL"
>
> In your notation, I believe that's M -> { S1, S2 }
>
> In that case, if one S1 is just a little faster than S2, then S2 might
> build up a significant queue of unapplied WAL. Then, when S1 goes down,
> there's no way for the slower one to acknowledge a new transaction
> without playing through all of the unapplied WAL.
>
> Intuitively, the administrator would think that he was getting both HA
> and redundancy, but in reality the availability is no better than if
> there were only two servers (M -> S1), except that it might be faster to
> replay the WAL then to set up a new standby (but that's not guaranteed).
Agreed. This is similar to my previous complaint.
http://archives.postgresql.org/pgsql-hackers/2010-09/msg00946.php
This problem would happen even if we fix the quorum to 1 as Josh propose.
To avoid this, the master must wait for ACK from all the connected
synchronous standbys.
I think that this is likely to happen especially when we choose 'apply'
replication level. Because that level can easily lag a synchronous
standby because of the conflict between recovery and read-only query.
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Eisentraut | 2010-10-06 08:05:21 | Re: host name support in pg_hba.conf |
Previous Message | KaiGai Kohei | 2010-10-06 08:01:05 | Re: leaky views, yet again |