Re: Standalone synchronous master

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 18:06:24
Message-ID: 52CEE520.8010703@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert,

> I think the problem here is that we tend to have a limited view of
> "the right way to use synch rep". If I have 5 nodes, and I set 1
> synchronous and the other 3 asynchronous, I've set up a "known
> successor" in the event that the leader fails. In this scenario
> though, if the "successor" fails, you actually probably want to keep
> accepting writes; since you weren't using synchronous for durability
> but for operational simplicity. I suspect there are probably other
> scenarios where users are willing to trade latency for improved and/or
> directed durability but not at the extent of availability, don't you?

That's a workaround for a completely different limitation though; the
inability to designate a specific async replica as "first". That is, if
there were some way to do so, you would be using that rather than sync
rep. Extending the capabilities of that workaround is not something I
would gladly do until I had exhausted other options.

The other problem is that *many* users think they can get improved
availability, consistency AND durability on two nodes somehow, and to
heck with the CAP theorem (certain companies are happy to foster this
illusion). Having a simple, easily-accessable auto-degrade without
treading degrade as a major monitoring event will feed this
self-deception. I know I already have to explain the difference between
"synchronous" and "simultaneous" to practically every one of my clients
for whom I set up replication.

Realistically, degrade shouldn't be something that happens inside a
single PostgreSQL node, either the master or the replica. It should be
controlled by some external controller which is capable of deciding on
degrade or not based on a more complex set of circumstances (e.g. "Is
the replica actually down or just slow?"). Certainly this is the case
with Cassandra, VoltDB, Riak, and the other "serious" multinode databases.

> This isn't to say there isn't a lot of confusion around the issue.
> Designing, implementing, and configuring different guarantees in the
> presence of node failures is a non-trivial problem. Still, I'd prefer
> to see Postgres head in the direction of providing more options in
> this area rather than drawing a firm line at being a CP-oriented
> system.

I'm not categorically opposed to having any form of auto-degrade at all;
what I'm opposed to is a patch which adds auto-degrade **without adding
any additional monitoring or management infrastructure at all**.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2014-01-09 18:16:02 Re: [PATCH] Relocation of tablespaces in pg_basebackup
Previous Message Alvaro Herrera 2014-01-09 18:05:06 Re: Turning off HOT/Cleanup sometimes