Re: Patch for fail-back without fresh backup

From: Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>, Samrat Revagade <revagade(dot)samrat(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch for fail-back without fresh backup
Date: 2013-07-07 07:27:37
Message-ID: CAD21AoD8qFcOjN6skX8J1C2hfF_a0-MW2Ja9n-HoLdWrupztfg@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jul 7, 2013 at 4:19 PM, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com> wrote:
> On Mon, Jun 17, 2013 at 8:48 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> On 17 June 2013 09:03, Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com> wrote:
>>
>>> I agree. We should probably find a better name for this. Any suggestions ?
>>
>> err, I already made one...
>>
>>>> But that's not the whole story. I can see some utility in a patch that
>>>> makes all WAL transfer synchronous, rather than just commits. Some
>>>> name like synchronous_transfer might be appropriate. e.g.
>>>> synchronous_transfer = all | commit (default).
>>
>>> Since commits are more foreground in nature and this feature
>>> does not require us to wait during common foreground activities, we want a
>>> configuration where master can wait for synchronous transfers at other than
> >>> commits. Maybe we can solve that by having more granular control over the said
> >>> parameter ?
>>>
>>>>
>>>> The idea of another slew of parameters that are very similar to
>>>> synchronous replication but yet somehow different seems weird. I can't
>>>> see a reason why we'd want a second lot of parameters. Why not just
>>>> use the existing ones for sync rep? (I'm surprised the Parameter
>>>> Police haven't visited you in the night...) Sure, we might want to
>>>> expand the design for how we specify multi-node sync rep, but that is
>>>> a different patch.
>>>
>>>
>>> How would we then distinguish between synchronous and the new kind of
>>> standby ?
>>
>> That's not the point. The point is "Why would we have a new kind of
>> standby?" and therefore why do we need new parameters?
>>
>>> I am told, one of the very popular setups for DR is to have one
>>> local sync standby and one async (may be cascaded by the local sync). Since
>>> this new feature is more useful for DR because taking a fresh backup on a
>>> slower link is even more challenging, IMHO we should support such setups.
>>
> >> ...which still doesn't make sense to me. Let's look at that in detail.
>>
>> Take 3 servers, A, B, C with A and B being linked by sync rep, and C
>> being safety standby at a distance.
>>
>> Either A or B is master, except in disaster. So if A is master, then B
>> would be the failover target. If A fails, then you want to failover to
>> B. Once B is the target, you want to failback to A as the master. C
>> needs to follow the new master, whichever it is.
>>
> >> Suppose you set up sync rep between A and B and this new mode between A and
> >> C. When B becomes the master, you need to failback from B to A, but
>> you can't because the new mode applied between A and C only, so you
>> have to failback from C to A. So having the new mode not match with
>> sync rep means you are forcing people to failback using the slow link
>> in the common case.
>>
>> You might observe that having the two modes match causes problems if A
>> and B fail, so you are forced to go to C as master and then eventually
>> failback to A or B across a slow link. That case is less common and
>> could be solved by extending sync transfer to more/multi nodes.
>>
>> It definitely doesn't make sense to have sync rep on anything other
>> than a subset of sync transfer. So while it may be sensible in the
>> future to make sync transfer a superset of sync rep nodes, it makes
>> sense to make them the same config for now.
> I have updated the patch.
>
> The patch supports the following two cases:
> 1. A SYNC standby that is also the failback-safe standby
> 2. An ASYNC standby that is also the failback-safe standby
>
> Changes in this version:
>
> 1. Renamed the parameter
> The 'failback_safe_standby_names' parameter from the first patch has been
> dropped, and 'failback_safe_mode' has been renamed to
> 'synchronous_transfer'.
> This parameter accepts 'all', 'data_flush' and 'commit'.
>
> - 'commit'
> The master waits on commit for the corresponding WAL to be flushed
> to disk on the standby server,
> but it does not wait for replicated data pages.
>
> - 'data_flush'
> The master waits for a data page (e.g. CLOG, pg_control) to be
> replicated before flushing it to the master's own disk.
> When this parameter is set to 'data_flush', the value of
> 'synchronous_commit' is ignored, even if the user has set it.
>
> - 'all'
> The master waits for both replicated WAL and replicated data pages.
>
> 2. Moved the SyncRepWaitForLSN() call into XLogFlush()
> SyncRepWaitForLSN() is now called from within XLogFlush(),
> and the arguments of XLogFlush() have changed accordingly.
>
> The setup cases, and the parameters each one needs, are as follows.
>
> - SYNC server that is also the failback-safe standby server (case 1)
> synchronous_transfer = all
> synchronous_commit = remote_write/on
> synchronous_standby_names = <ServerName>
>
> - ASYNC server that is also the failback-safe standby server (case 2)
> synchronous_transfer = data_flush
> (the synchronous_commit value is ignored)
>
> - default SYNC replication
> synchronous_transfer = commit
> synchronous_commit = on
> synchronous_standby_names = <ServerName>
>
> - default ASYNC replication
> synchronous_transfer = commit
>
> ToDo
> 1. Currently this patch applies one synchronous transfer mode to all
> servers, so we can't set a different mode for each server.
> The patch needs to be improved to support the following cases:
> - SYNC standby plus a separate ASYNC failback-safe standby
> - ASYNC standby plus a separate ASYNC failback-safe standby
>
> 2. We have not measured performance yet; that still needs to be done.
>
> Please give me your feedback.
>
> Regards,
>
> -------
> Sawada Masahiko

I'm sorry. I forgot to attach the patch.
Please see the attached file.
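
To make the three 'synchronous_transfer' modes quoted above easier to compare,
here is a minimal Python sketch of when the master waits for the standby. It is
only an illustrative model of the description in this mail, not code from the
patch: the names WAIT_MATRIX, master_waits and the two event labels are
hypothetical, and the commit wait is assumed to still depend on the usual
sync rep settings (synchronous_commit, synchronous_standby_names).

```python
# Illustrative model only; none of these names exist in the patch.
# Rows are values of synchronous_transfer, columns are the two kinds of
# event described in the mail.
WAIT_MATRIX = {
    # waits on commit WAL flush only (the existing sync rep behaviour)
    "commit": {"commit": True, "data_page_flush": False},
    # waits before flushing data pages; synchronous_commit is ignored
    "data_flush": {"commit": False, "data_page_flush": True},
    # waits in both situations
    "all": {"commit": True, "data_page_flush": True},
}


def master_waits(synchronous_transfer, event):
    """Return True if the mode lets the master wait for the standby on `event`.

    For the "commit" event this only says the mode permits the wait; whether
    a wait actually happens is assumed to depend on the ordinary sync rep
    parameters as well.
    """
    if synchronous_transfer not in WAIT_MATRIX:
        raise ValueError("unknown synchronous_transfer: %r" % (synchronous_transfer,))
    events = WAIT_MATRIX[synchronous_transfer]
    if event not in events:
        raise ValueError("unknown event: %r" % (event,))
    return events[event]


if __name__ == "__main__":
    for mode in ("commit", "data_flush", "all"):
        print(mode, WAIT_MATRIX[mode])
```

Read against the setup cases above, case 1 corresponds to the 'all' row and
case 2 to the 'data_flush' row.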

Regards,

-------
Sawada Masahiko

Attachment: failback_safe_standby_v2.patch (application/octet-stream, 26.9 KB)
