Re: Configuring synchronous replication

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Heikki Linnakangas <heikki(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Configuring synchronous replication
Date: 2010-09-17 09:49:29
Message-ID: 1284716969.1733.3699.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Fri, 2010-09-17 at 11:09 +0300, Heikki Linnakangas wrote:
> (changed subject again.)
>
> On 17/09/10 10:06, Simon Riggs wrote:
> > I don't think we can determine how far to implement without considering
> > both approaches in detail. With regard to your points below, I don't
> > think any of those points could be committed first.
>
> Yeah, I think we need to decide on the desired feature set first, before
> we dig deeper into the the patches. The design and implementation will
> fall out of that.

Well, we've discussed these things many times and talking hasn't got us
very far on its own. We need measurements and neutral assessments.

The patches are simple and we have time.

This isn't just about UI, there are significant and important
differences between the proposals in terms of the capability and control
they offer.

I propose we develop both patches further and performance test them.
Many of the features I have proposed are performance related and people
need to be able to see what is important, and what is not. But not
through mere discussion, we need numbers to show which things matter and
which things don't. And those need to be derived objectively.

> * Support multiple standbys with various synchronization levels.
>
> * What happens if a synchronous standby isn't connected at the moment?
> Return immediately vs. wait forever.
>
> * Per-transaction control. Some transactions are important, others are not.
>
> * Quorum commit. Wait until n standbys acknowledge. n=1 and n=all
> servers can be seen as important special cases of this.
>
> * async, recv, fsync and replay levels of synchronization.

That's a reasonable starting list of points, there may be others.

> So what should the user interface be like? Given the 1st and 2nd
> requirement, we need standby registration. If some standbys are
> important and others are not, the master needs to distinguish between
> them to be able to determine that a transaction is safely delivered to
> the important standbys.

My patch provides those two requirements without standby registration,
so we very clearly don't "need" standby registration.

The question is do we want standby registration on master and if so,
why?

> For per-transaction control, ISTM it would be enough to have a simple
> user-settable GUC like synchronous_commit. Let's call it
> "synchronous_replication_commit" for now.

If you wish to change the name of the GUC away from the one I have
proposed, fine. Please note that aspect isn't important to me and I will
happily concede all such points to the majority view.

> For non-critical transactions,
> you can turn it off. That's very simple for developers to understand and
> use. I don't think we need more fine-grained control than that at
> transaction level, in all the use cases I can think of you have a stream
> of important transactions, mixed with non-important ones like log
> messages that you want to finish fast in a best-effort fashion.

Sounds like we're getting somewhere. See below.

> I'm
> actually tempted to tie that to the existing synchronous_commit GUC, the
> use case seems exactly the same.

http://archives.postgresql.org/pgsql-hackers/2008-07/msg01001.php
Check the date!

I think that particular point is going to confuse us. It will draw much
bike shedding and won't help us decide between patches. It's a nicety
that can be left to a time after we have the core feature committed.

> OTOH, if we do want fine-grained per-transaction control, a simple
> boolean or even an enum GUC doesn't really cut it. For truly
> fine-grained control you want to be able to specify exceptions like
> "wait until this is replayed in slave named 'reporting'" or 'don't wait
> for acknowledgment from slave named 'uk-server'". With standby
> registration, we can invent a syntax for specifying overriding rules in
> the transaction. Something like SET replication_exceptions =
> 'reporting=replay, uk-server=async'.
>
> For the control between async/recv/fsync/replay, I like to think in
> terms of
> a) asynchronous vs synchronous
> b) if it's synchronous, how synchronous is it? recv, fsync or replay?
>
> I think it makes most sense to set sync vs. async in the master, and the
> level of synchronicity in the slave. Although I have sympathy for the
> argument that it's simpler if you configure it all from the master side
> as well.

I have catered for such requests by suggesting a plugin that allows you
to implement that complexity without overburdening the core code.

This strikes me as an "ad absurdum" argument. Since the above
over-complexity would doubtless be seen as insane by Tom et al, it
attempts to persuade that we don't need recv, fsync and apply either.

Fujii has long talked about 4 levels of service also. Why change? I had
thought that part was pretty much agreed between all of us.

Without performance tests to demonstrate "why", these do sound hard to
understand. But we should note that DRBD offers recv ("B") and fsync
("C") as separate options. And Oracle implements all 3 of recv, fsync
and apply. Neither of them describe those options so simply and easily
as the way we are proposing with a 4 valued enum (with async as the
fourth option).

If we have only one option for sync_rep = 'on' which of recv | fsync |
apply would it implement? You don't mention that. Which do you choose?
For what reason do you make that restriction? The code doesn't get any
simpler, in my patch at least, from my perspective it would be a
restriction without benefit.

I no longer seek to persuade by words alone. The existence of my patch
means that I think that only measurements and tests will show why I have
been saying these things. We need performance tests. I'm not ready for
them today, but will be very soon. I suspect you aren't either since
from earlier discussions you didn't appear to have much about overall
throughput, only about response times for single transactions. I'm happy
to be proved wrong there.

> Putting all of that together. I think Fujii-san's standby.conf is pretty
> close.

> What it needs is the additional GUC for transaction-level control.

The difference between the patches is not a simple matter of a GUC.

My proposal allows a single standby to provide efficient replies to
multiple requested durability levels all at the same time. With
efficient use of network resources. ISTM that because the other patch
cannot provide that you'd like to persuade us that we don't need that,
ever. You won't sell me on that point, cos I can see lots of uses for
it.

Another use case for you:

* customer orders are important, but we want lots of them, so we use
recv mode for those.

* pricing data hardly ever changes, but when it does we need it to be
applied across the cluster so we don't get read mismatches, so those
rare transactions use apply mode.

If you don't want multiple modes at once, you don't need to use that
feature. But there is no reason to prevent people having the choice,
when a design exists that can provide it.

(A separate and later point, is that I would one day like to annotate
specific tables and functions with different modes, so a sysadmin can
point out which data is important at table level - which is what MySQL
provides by allowing choice of storage engine for particular tables.
Nobody cares about the specific engine, they care about the durability
implications of those choices. This isn't part of the current proposal,
just a later statement of direction.)

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Dimitri Fontaine 2010-09-17 10:00:42 Re: Configuring synchronous replication
Previous Message Heikki Linnakangas 2010-09-17 09:30:19 Re: Configuring synchronous replication

Browse pgsql-hackers by date

  From Date Subject
Next Message Fujii Masao 2010-09-17 09:52:24 trigger failover with signal
Previous Message Heikki Linnakangas 2010-09-17 09:30:19 Re: Configuring synchronous replication