Re: Issues with Quorum Commit

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Markus Wanner <markus(at)bluegap(dot)ch>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Issues with Quorum Commit
Date: 2010-10-07 10:07:48
Message-ID: 4CAD9BF4.90706@enterprisedb.com
Lists: pgsql-hackers

On 07.10.2010 12:52, Dimitri Fontaine wrote:
> Markus Wanner <markus(at)bluegap(dot)ch> writes:
>>> I'm just saying that this should be an option, not the only choice.
>>
>> I'm sorry, I just don't see the use case for a mode that drops
>> guarantees when they are most needed. People who don't need those
>> guarantees should definitely go for async replication instead.
>
> We're still talking about freezing the master and all the applications
> when the first standby still has to do a base backup and catch-up to
> where the master currently is, right?

Either that, or you configure your system for asynchronous replication
first, and flip the switch to synchronous only after the standby has
caught up. Setting up the first standby happens only once when you
initially set up the system, or if you're recovering from a catastrophic
loss of the standby.
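
To make the "async first, flip to sync once caught up" procedure concrete,
here is a minimal sketch of the decision a setup script might make. The
function name, the byte positions and the max_lag_bytes threshold are all
hypothetical, chosen for illustration only; none of this is an existing
PostgreSQL interface.

```python
def choose_replication_mode(master_pos: int, standby_pos: int,
                            max_lag_bytes: int = 0) -> str:
    """Return "sync" once the standby has caught up, otherwise "async".

    master_pos and standby_pos are hypothetical write-ahead log positions
    expressed as plain byte counts.
    """
    if standby_pos > master_pos:
        raise ValueError("standby cannot be ahead of the master")
    lag = master_pos - standby_pos
    # Only flip the switch when the standby is within the allowed lag.
    return "sync" if lag <= max_lag_bytes else "async"
```

A script would call this repeatedly while the base backup and catch-up are
in progress, and change the master's configuration only once it returns
"sync".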

>> What does a synchronous replication mode that falls back to async upon
>> failure give you, except for a severe degradation in performance during
>> normal operation? Why not use async right away in such a case?
>
> It's all about the standard case you're building, sync rep, and how to
> manage errors. In most cases I want flexibility. Alert says standby is
> down, you lost your durability requirements, so now I'm building a new
> standby. Does it mean my applications are all off and the master
> refusing to work?

Yes. That's why you want to have at least two standbys if you care about
availability. Or if durability isn't that important to you after all,
use asynchronous replication.

Of course, if in the heat of the moment the admin is willing to forge
ahead without the standby, he can temporarily change the configuration
in the master. If you want the standby to be rebuilt automatically, you
can even incorporate that configuration change in the scripts too. The
important point is that you or your scripts are in control, and you know
at all times whether you can trust the standby or not. If the master
makes such decisions automatically, you don't know if the standby is
trustworthy (i.e. guaranteed up-to-date) or not.
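
The distinction can be modelled with a small sketch. It assumes a
hypothetical Master class with a require_standby_ack setting; these names
are invented for illustration and do not correspond to any real
configuration parameter. The point it tries to show is that the master
never drops the guarantee by itself: only an explicit call by the admin or
a script does, and that call is also what records the standby as no longer
trustworthy.

```python
from dataclasses import dataclass


@dataclass
class Master:
    # Hypothetical setting: True means commits must wait for the standby.
    require_standby_ack: bool = True
    # True only while no commit has been allowed without a standby ack.
    standby_trustworthy: bool = True

    def commit(self, standby_acked: bool) -> str:
        if standby_acked:
            return "committed"
        if self.require_standby_ack:
            # No automatic fallback: the transaction keeps waiting.
            return "waiting"
        # The admin explicitly allowed this, so the standby is now stale.
        self.standby_trustworthy = False
        return "committed"

    def admin_disable_sync(self) -> None:
        """Explicit decision taken by the admin or a rebuild script."""
        self.require_standby_ack = False
        self.standby_trustworthy = False
```

With this model, commit(False) returns "waiting" until admin_disable_sync()
has been called, so whoever made that call always knows the state of the
standby.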

>>> so opening a
>>> superuser connection to act on the currently waiting transaction is
>>> still possible (pass/fail, but fail is what at this point? shutdown to
>>> wait some more offline?).
>>
>> Not sure I'm following here. The admin will be busy re-establishing
>> (connections to) standbies, killing transactions on the master doesn't
>> help anything - whether or not the master waits forever.
>
> The idea here would be to be able to manually ACK a transaction that's
> waiting forever, because you know it won't have an answer and you'd
> prefer the application to just continue. But I see that's not a valid
> use case for you.

I don't see anything wrong with having tools for admins to deal with the
unexpected. I'm not sure overriding individual transactions is very
useful, though. More likely you'll want to take the whole server offline,
or change the config to allow all transactions to continue without the
synchronous standby.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
