Re: Patch for fail-back without fresh backup

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fail-back without fresh backup
Date: 2013-06-27 18:22:43
Message-ID: CA+TgmoZauJ+VNsjwHZOVtCCOtHJT08zNzu-CL5+y-Ky2Z05PFQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 17, 2013 at 7:48 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> I am told that one of the very popular setups for DR is to have one
>> local sync standby and one async (possibly cascaded from the local sync). Since
>> this new feature is more useful for DR because taking a fresh backup on a
>> slower link is even more challenging, IMHO we should support such setups.
>
> ...which still doesn't make sense to me. Let's look at that in detail.
>
> Take 3 servers, A, B, C with A and B being linked by sync rep, and C
> being a safety standby at a distance.
>
> Either A or B is master, except in disaster. So if A is master, then B
> would be the failover target. If A fails, then you want to failover to
> B. Once B is the target, you want to failback to A as the master. C
> needs to follow the new master, whichever it is.
>
> Suppose you set up sync rep between A and B and this new mode between A and
> C. When B becomes the master, you need to failback from B to A, but
> you can't because the new mode applied between A and C only, so you
> have to failback from C to A. So having the new mode not match with
> sync rep means you are forcing people to failback using the slow link
> in the common case.

It's true that such a configuration doesn't really make sense in this
scenario, but I still think sync rep and this new mode are separate
properties. You could certainly want
synchronous replication without this new property, if you like the
data-loss guarantees that sync rep provides but don't care about
failback. You could also want this new property without synchronous
replication, if you don't need the data-loss guarantees that sync rep
provides but you do care about fast failback. I admit it seems
unlikely that you would use both features but not target them at the
same machines, although it could happen: perhaps you have a sync
standby and an async standby and want this new property with respect
to both of them.

In my admittedly limited experience, the use case for a lot of this
technology is in the cloud. The general strategy seems to be: at the
first sign of trouble, kill the offending instance and fail over.
This can result in failing over pretty frequently, and needing it to
be fast. There may be no real hardware problem; indeed, the failover
may be precipitated by network conditions or overload of the physical
host backing the virtual machine or any number of other nonphysical
problems. I can see this being useful in that environment, even for
async standbys. People can apparently tolerate a brief interruption
while their primary gets killed off and connections are re-established
with the new master, but they need the failover to be fast. The
problem with the status quo is that, even if the first failover is
fast, the second one isn't, because it has to wait for the original
master to be rebuilt.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
