Re: Patch for fail-back without fresh backup

From: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fail-back without fresh backup
Date: 2013-06-24 13:47:00
Message-ID: CAD21AoCY2_bQVPzJeY7S77amncCBXfJ+1gpHgGDbULKLAv0t+Q@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 17 June 2013 09:03, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>
>> I agree. We should probably find a better name for this. Any suggestions ?
>
> err, I already made one...
>
>>> But that's not the whole story. I can see some utility in a patch that
>>> makes all WAL transfer synchronous, rather than just commits. Some
>>> name like synchronous_transfer might be appropriate. e.g.
>>> synchronous_transfer = all | commit (default).
>
>> Since commits are more foreground in nature and this feature
>> does not require us to wait during common foreground activities, we want a
>> configuration where master can wait for synchronous transfers at other than
>> commits. Maybe we can solve that by having more granular control over the said
>> parameter ?
>>
>>>
>>> The idea of another slew of parameters that are very similar to
>>> synchronous replication but yet somehow different seems weird. I can't
>>> see a reason why we'd want a second lot of parameters. Why not just
>>> use the existing ones for sync rep? (I'm surprised the Parameter
>>> Police haven't visited you in the night...) Sure, we might want to
>>> expand the design for how we specify multi-node sync rep, but that is
>>> a different patch.
>>
>>
>> How would we then distinguish between synchronous and the new kind of
>> standby ?
>
> That's not the point. The point is "Why would we have a new kind of
> standby?" and therefore why do we need new parameters?
>
>> I am told, one of the very popular setups for DR is to have one
>> local sync standby and one async (may be cascaded by the local sync). Since
>> this new feature is more useful for DR because taking a fresh backup on a
>> slower link is even more challenging, IMHO we should support such setups.
>
> ...which still doesn't make sense to me. Let's look at that in detail.
>
> Take 3 servers, A, B, C with A and B being linked by sync rep, and C
> being safety standby at a distance.
>
> Either A or B is master, except in disaster. So if A is master, then B
> would be the failover target. If A fails, then you want to failover to
> B. Once B is the target, you want to failback to A as the master. C
> needs to follow the new master, whichever it is.
>
> Suppose you set up sync rep between A and B and this new mode between
> A and C. When B becomes the master, you need to failback from B to A,
> but you can't because the new mode applied between A and C only, so
> you have to failback from C to A. So having the new mode not match
> with sync rep means you are forcing people to failback using the slow
> link in the common case.
>
> You might observe that having the two modes match causes problems if A
> and B fail, so you are forced to go to C as master and then eventually
> failback to A or B across a slow link. That case is less common and
> could be solved by extending sync transfer to more/multi nodes.
>
> It definitely doesn't make sense to have sync rep on anything other
> than a subset of sync transfer. So while it may be sensible in the
> future to make sync transfer a superset of sync rep nodes, it makes
> sense to make them the same config for now.
>
When two servers are in synchronous replication, they are in the same
location in many cases (e.g., the same server room), so taking a full
backup and sending it to the old master is not an issue.
This proposal is meant for the situation where the servers are in
remote locations and the main site goes down, for example because of a
power failure or a natural disaster.
As you said, we can control replication for files (e.g., CLOG,
pg_control, etc.) by adding a synchronous_transfer option.
But if we add only this parameter, we can handle only the following 2 cases.

1. a synchronous standby that is also the failback-safe standby
2. an asynchronous standby that is also the failback-safe standby

In the above cases, adding a new parameter might be meaningless. But I
think we should handle not only cases 1 and 2 but also the following
cases 3 and 4 for DR.

3. a synchronous standby, plus a different asynchronous standby that is
   the failback-safe standby
4. an asynchronous standby, plus a different asynchronous standby that is
   the failback-safe standby

To handle cases 3 and 4, we need to set the parameter for each standby,
so we need to add a new parameter.
If we can set up replication this way, replication would be more useful
for users on a slow link.
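
A minimal sketch of the four cases, only to illustrate why cases 3 and 4
call for a per-standby setting. The names Standby, replication,
failback_safe and needs_per_standby_setting are hypothetical and are not
part of the patch; the model is one reading of the cases, not the
patch's design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Standby:
    """Hypothetical description of one standby (not from the patch)."""
    name: str
    replication: str      # "sync" or "async"
    failback_safe: bool   # True if this standby is the failback-safe one


# One topology per case from the list above (assumed reading of the cases).
CASES: Dict[int, List[Standby]] = {
    1: [Standby("s1", "sync", True)],
    2: [Standby("s1", "async", True)],
    3: [Standby("s1", "sync", False), Standby("s2", "async", True)],
    4: [Standby("s1", "async", False), Standby("s2", "async", True)],
}


def needs_per_standby_setting(standbys: List[Standby]) -> bool:
    """True when the standbys do not all share the same failback_safe value,
    so a single server-wide setting cannot describe the topology."""
    return len({s.failback_safe for s in standbys}) > 1


if __name__ == "__main__":
    for case, standbys in CASES.items():
        print(case, needs_per_standby_setting(standbys))
```

Under this model only cases 3 and 4 mix standbys with different
failback-safe behavior, which is what a single global parameter cannot
express.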

One idea for improving the parameters is to extend the ini file so that
parameters can be set for each standby. For example:

--------------------
[Server]
standby_name = 'slave1'
synchronous_transfer = commit
wal_sender_timeout = 30
[Server]
standby_name = 'slave2'
synchronous_transfer = all
wal_sender_timeout = 50
-------------------

There have been discussions about such an ini file in the past. With
it, we could set each parameter for each standby.
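
As an illustration only, here is a small sketch of how such a file could
be read into per-standby settings. The file format is just the proposal
above, not an existing PostgreSQL format, and the function names
parse_standby_sections and settings_by_standby are hypothetical. A
hand-written parser is used because Python's configparser rejects
repeated section names such as [Server] in its default strict mode.

```python
from typing import Dict, List

# The example from the proposal above, used as sample input.
SAMPLE = """\
[Server]
standby_name = 'slave1'
synchronous_transfer = commit
wal_sender_timeout = 30
[Server]
standby_name = 'slave2'
synchronous_transfer = all
wal_sender_timeout = 50
"""


def parse_standby_sections(text: str) -> List[Dict[str, str]]:
    """Return one dict of raw settings per [Server] block, in file order."""
    sections: List[Dict[str, str]] = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line == "[Server]":
            sections.append({})  # start a new per-standby block
            continue
        if "=" not in line or not sections:
            raise ValueError(f"line {lineno}: unexpected content: {raw!r}")
        key, value = line.split("=", 1)
        # Strip whitespace and optional single quotes around the value.
        sections[-1][key.strip()] = value.strip().strip("'")
    return sections


def settings_by_standby(text: str) -> Dict[str, Dict[str, str]]:
    """Index the parsed blocks by standby_name; values stay as strings."""
    result: Dict[str, Dict[str, str]] = {}
    for section in parse_standby_sections(text):
        name = section.get("standby_name")
        if name is None:
            raise ValueError("a [Server] block has no standby_name")
        if name in result:
            raise ValueError(f"duplicate standby_name: {name}")
        result[name] = {k: v for k, v in section.items() if k != "standby_name"}
    return result


if __name__ == "__main__":
    for name, settings in settings_by_standby(SAMPLE).items():
        print(name, settings)
```

Keying the result by standby_name lets a sender look up the settings for
the standby it is serving; values are kept as strings here, and unit or
type handling is left out of the sketch.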

Please give me feedback.

Regards,
-------
Sawada Masahiko
