Re: Configuring synchronous replication

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Fetter <david(at)fetter(dot)org>, Heikki Linnakangas <heikki(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Configuring synchronous replication
Date: 2010-09-17 10:41:19
Message-ID: 4C9345CF.8000708@enterprisedb.com
Lists: pgsql-committers pgsql-hackers

On 17/09/10 12:49, Simon Riggs wrote:
> This isn't just about UI, there are significant and important
> differences between the proposals in terms of the capability and control
> they offer.

Sure. The point of focusing on the UI is that the UI demonstrates what
capability and control a proposal offers.

>> So what should the user interface be like? Given the 1st and 2nd
>> requirement, we need standby registration. If some standbys are
>> important and others are not, the master needs to distinguish between
>> them to be able to determine that a transaction is safely delivered to
>> the important standbys.
>
> My patch provides those two requirements without standby registration,
> so we very clearly don't "need" standby registration.

It's still not clear to me how you would configure things like "wait for
ack from the reporting slave, but not from other slaves" or "wait until
replayed in the server on the west coast" in your proposal. Maybe it's
possible, but it doesn't seem very intuitive, requiring careful
configuration in both the master and the slaves.
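
With registration, that kind of policy could be expressed directly in
the master. Just to illustrate the idea (the syntax and option names
below are invented for the sake of example, not taken from Fujii-san's
actual patch):

# standby.conf in the master
reporting:  synchronization_level = recv   # wait for ack from this one
westcoast:  synchronization_level = apply  # wait until replayed here
testserver: synchronization_level = async  # never wait for this one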

In your proposal, you also need to be careful not to connect e.g. a test
slave with "synchronous_replication_service = apply" to the master, or
it may shadow a real production slave, acknowledging transactions that
have not yet been received by the real slave. It's certainly possible to
screw up with standby registration too, but then you control the
master's behavior directly in the master, instead of distributing it
across all the slaves.

> The question is do we want standby registration on master and if so,
> why?

Well, aside from how to configure synchronous replication, standby
registration would help with retaining the right amount of WAL in the
master. wal_keep_segments doesn't guarantee that enough WAL is retained,
and OTOH when all standbys are connected and caught up, you retain much
more than is required.
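
For example, today the best you can do is a static guess in
postgresql.conf (the value here is just an example):

wal_keep_segments = 128   # ~2 GB at 16 MB per segment, kept whether needed or not

With registration, the master could instead retain WAL back to exactly
the oldest location that some registered standby still needs, and
recycle the rest.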

Giving names to slaves also allows you to view their status in the
master in a more intuitive format. Something like:

postgres=# SELECT * FROM pg_slave_status ;
    name    | connected |  received  |   fsyncd   |  applied
------------+-----------+------------+------------+------------
 reporting  | t         | 0/26000020 | 0/26000020 | 0/25550020
 ha-standby | t         | 0/26000020 | 0/26000020 | 0/26000020
 testserver | f         |            | 0/15000020 |
(3 rows)

>> For the control between async/recv/fsync/replay, I like to think in
>> terms of
>> a) asynchronous vs synchronous
>> b) if it's synchronous, how synchronous is it? recv, fsync or replay?
>>
>> I think it makes most sense to set sync vs. async in the master, and the
>> level of synchronicity in the slave. Although I have sympathy for the
>> argument that it's simpler if you configure it all from the master side
>> as well.
>
> I have catered for such requests by suggesting a plugin that allows you
> to implement that complexity without overburdening the core code.

Well, plugins are certainly one possibility, but then we need to design
the plugin API. I've been thinking along the lines of a proxy, which
could implement whatever logic you want to decide when to send the
acknowledgment. Either way, if we push features that people want out to
a proxy or plugin, we need to make sure that the proxy/plugin has all
the necessary information available.

> This strikes me as an "ad absurdum" argument. Since the above
> over-complexity would doubtless be seen as insane by Tom et al, it
> attempts to persuade that we don't need recv, fsync and apply either.
>
> Fujii has long talked about 4 levels of service also. Why change? I had
> thought that part was pretty much agreed between all of us.

Now you lost me. I agree that we need 4 levels of service (at least
ultimately, not necessarily in the first phase).

> Without performance tests to demonstrate "why", these do sound hard to
> understand. But we should note that DRBD offers recv ("B") and fsync
> ("C") as separate options. And Oracle implements all 3 of recv, fsync
> and apply. Neither of them describe those options so simply and easily
> as the way we are proposing with a 4 valued enum (with async as the
> fourth option).
>
> If we have only one option for sync_rep = 'on' which of recv | fsync |
> apply would it implement? You don't mention that. Which do you choose?

You would choose between recv, fsync and apply in the slave, with a GUC.
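
To make that concrete, I have something along these lines in mind (the
GUC names here are only placeholders, nothing is settled):

# in the master, settable per transaction:
synchronous_replication = on     # on | off

# in each slave:
synchronization_level = fsync    # recv | fsync | apply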

> I no longer seek to persuade by words alone. The existence of my patch
> means that I think that only measurements and tests will show why I have
> been saying these things. We need performance tests.

I don't expect any meaningful differences in terms of performance
between any of the discussed options. The big question right now is what
features we provide and how they're configured. Performance will depend
primarily on the mode you use, and secondarily on the implementation of
the mode. It would be completely premature to do performance testing yet
IMHO.

>> Putting all of that together. I think Fujii-san's standby.conf is pretty
>> close.
>
>> What it needs is the additional GUC for transaction-level control.
>
> The difference between the patches is not a simple matter of a GUC.
>
> My proposal allows a single standby to provide efficient replies to
> multiple requested durability levels all at the same time. With
> efficient use of network resources. ISTM that because the other patch
> cannot provide that you'd like to persuade us that we don't need that,
> ever. You won't sell me on that point, cos I can see lots of uses for
> it.

Simon, how the replies are sent is an implementation detail I haven't
given much thought yet. The reason we delved into that discussion
earlier was that you seemed to contradict yourself with the claims that
you don't need to send more than one reply per transaction, and that the
standby doesn't need to know the synchronization level. Other than the
curiosity about that contradiction, it doesn't seem like a very
interesting detail to me right now. It's not a question that drives the
rest of the design, but the other way round.

But FWIW, something like your proposal of sending 3 XLogRecPtrs in each
reply seems like a good approach. I'm not sure about using walwriter. I
can see that it helps with getting the 'recv' and 'replay'
acknowledgments out faster, but I still have the scars from starting
bgwriter during recovery.
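
To be concrete about the reply itself, I picture roughly this kind of
struct (just a sketch; the field names are mine, not from your patch):

#include "access/xlogdefs.h"    /* XLogRecPtr */

typedef struct StandbyReply
{
    XLogRecPtr  received;   /* latest WAL position received from master */
    XLogRecPtr  fsynced;    /* latest WAL position fsynced to disk */
    XLogRecPtr  applied;    /* latest WAL position replayed */
} StandbyReply;

A single message like that acknowledges all three levels at once,
whichever mix of durability levels the in-flight transactions have
asked for.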

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
