Re: Synchronous Standalone Master Redoux

From: Jose Ildefonso Camargo Tolosa <ildefonso(dot)camargo(at)gmail(dot)com>
To: Amit kapila <amit(dot)kapila(at)huawei(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Hampus Wessman <hampus(at)hampuswessman(dot)se>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronous Standalone Master Redoux
Date: 2012-07-14 14:12:09
Message-ID: CAETJ_S_GReZ05SCy=dzAGN5+KAQ5gGmS5q-v2D7fU0_PkGJmtg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jul 14, 2012 at 12:42 AM, Amit kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
>> From: Jose Ildefonso Camargo Tolosa [ildefonso(dot)camargo(at)gmail(dot)com]
>> Sent: Saturday, July 14, 2012 9:36 AM
>> On Fri, Jul 13, 2012 at 11:12 PM, Amit kapila <amit(dot)kapila(at)huawei(dot)com> wrote:
>> From: pgsql-hackers-owner(at)postgresql(dot)org [pgsql-hackers-owner(at)postgresql(dot)org] on behalf of Jose Ildefonso Camargo Tolosa [ildefonso(dot)camargo(at)gmail(dot)com]
>> Sent: Saturday, July 14, 2012 6:08 AM
>> On Fri, Jul 13, 2012 at 10:22 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> On Fri, Jul 13, 2012 at 09:12:56AM +0200, Hampus Wessman wrote:
>>>
>>>>> So how about this for a Postgres TODO:
>>>>>
>>>>> Add configuration variable to allow Postgres to disable synchronous
>>>>> replication after a specified timeout, and add variable to alert
>>>>> administrators of the change.
>>
>>>> I agree we need a TODO for this, but... I think timeout-only is not
>>>> the best choice. There should be a maximum timeout (as a last
>>>> resort: the maximum time we are willing to wait for the standby; this
>>>> has to have the option of "forever"), but PostgreSQL certainly has
>>>> to detect the *complete* disconnection of the standby (or of all standbys
>>>> in synchronous_standby_names). If it detects that no standbys are
>>>> eligible to be the sync standby AND the option to fall back to async is
>>>> enabled, it will go into standalone mode (as if
>>>> synchronous_standby_names were empty); otherwise (if the option is
>>>> disabled) it will just continue to wait forever (the "last resort"
>>>> timeout is ignored if the fallback option is disabled). I would
>>>> call these "soft_synchronous_standby" and
>>>> "soft_synchronous_standby_timeout" (in seconds, 0=forever; a sane
>>>> value would be ~5 seconds) or something like that (I'm quite bad at
>>>> picking names :( ).
>>
>> > After it has gone to standalone mode, if the standby comes back, will the master be able to return to sync mode with it?
>
>> That's the idea, yes. After the standby comes back, the master would
>> act as if the sync standby had connected for the first time: first going
>> through "catchup" mode, and "once the lag between standby and
>> primary reaches zero (...) we move to real-time streaming state"
>> (from the 9.1 docs). At that point, normal sync behavior is restored.
>
> Idea-wise it looks okay, but are you sure the current code/design can handle it the way you are suggesting?
> I am not sure it can work, because it might be the case that, due to network instability, the master has gone into standalone mode,
> and now, once the standby is able to communicate again, it might expect to get more data rather than go into catchup mode.
> I believe someone who is an expert in this area of the code could comment here to make it more concrete.
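The fallback rule proposed in the quoted messages can be sketched as a small decision function. This is a minimal, hypothetical Python sketch, not PostgreSQL code: `soft_synchronous_standby` and `soft_synchronous_standby_timeout` are only the names floated in this thread, and the function, enum and argument names are invented for illustration. It shows that losing every eligible standby triggers standalone mode immediately, while the timeout is only a last resort and is ignored when the fallback is disabled.

```python
from enum import Enum


class Action(Enum):
    KEEP_WAITING = "keep waiting for a sync standby"
    GO_STANDALONE = "release waiters; act as if synchronous_standby_names were empty"


def decide(soft_sync_enabled: bool,
           eligible_standbys: int,
           waited_s: float,
           timeout_s: float) -> Action:
    """Hypothetical sketch of the proposed soft_synchronous_standby rule.

    soft_sync_enabled: the proposed (hypothetical) fallback switch.
    eligible_standbys: connected standbys listed in synchronous_standby_names.
    waited_s: how long the current commit has been waiting, in seconds.
    timeout_s: the proposed last-resort timeout; 0 means wait forever.
    """
    if not soft_sync_enabled:
        # Fallback disabled: keep today's behaviour and wait forever;
        # the last-resort timeout is ignored entirely.
        return Action.KEEP_WAITING
    if eligible_standbys == 0:
        # Complete disconnection of every candidate sync standby.
        return Action.GO_STANDALONE
    if timeout_s > 0 and waited_s >= timeout_s:
        # Last resort: a standby is connected but has not confirmed in time.
        return Action.GO_STANDALONE
    return Action.KEEP_WAITING
```

For example, `decide(True, 0, 0.1, 5.0)` goes standalone at once because no standby is eligible, while `decide(True, 1, 6.0, 0)` keeps waiting because a timeout of 0 means there is no last-resort limit.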

Well, I'd need to dive into the code, but as far as I know, it is the
master that decides to enter "catchup" mode, and the standby just takes
care of sending feedback to the master. Also, the code already has to
handle this situation, because currently, if the master goes away because
it crashed or because of network issues, the standby doesn't really know
why, and will reconnect to the master and do whatever it needs to do to
get in sync with it again (be it trying to reconnect several times while
the master is restarting, or simply reconnecting to a waiting master and
requesting the pending WAL segments). There has to be code in place to
handle those issues, because it already works. I'm trying to find a
solution that is as non-intrusive as possible, with the least amount of
code added, reusing the current logic and actions with small alterations
so that performance doesn't suffer.
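The point that the master alone decides when a reconnecting standby is in catchup, with the standby only reporting its progress, can be illustrated with a toy state model. This is a hypothetical Python sketch, not PostgreSQL's walsender code: WAL positions are plain integers, and the class, method and state names are invented, only loosely echoing the "catchup" and "streaming" states from the 9.1 docs.

```python
from dataclasses import dataclass


@dataclass
class MasterState:
    """Toy model of the master's view of a single sync standby.

    insert_lsn: latest WAL position written on the master (plain int).
    mode: "sync" or "standalone" (hypothetical names).
    standby_state: "disconnected", "catchup" or "streaming".
    """
    insert_lsn: int = 0
    mode: str = "sync"
    standby_state: str = "disconnected"

    def standby_lost(self) -> None:
        # Complete disconnection with the fallback enabled: go standalone,
        # so commits stop waiting for the standby.
        self.standby_state = "disconnected"
        self.mode = "standalone"

    def standby_connected(self) -> None:
        # The master, not the standby, puts a reconnecting standby into
        # catchup, exactly as it would on a first connection.
        self.standby_state = "catchup"

    def feedback(self, standby_flush_lsn: int) -> None:
        # The standby only reports how far it has flushed WAL; the master
        # compares that with its own position to decide the transition.
        if self.standby_state == "catchup" and standby_flush_lsn >= self.insert_lsn:
            # Lag reached zero: move to streaming and restore sync behaviour.
            self.standby_state = "streaming"
            self.mode = "sync"
```

A short run: after `standby_lost()` the master is `standalone` and keeps advancing `insert_lsn`; after `standby_connected()` the standby sits in `catchup` until a `feedback()` call reports a flush position at or beyond `insert_lsn`, at which point the model returns to `sync`. Because the return to sync is driven only by feedback the standby would send anyway, the sketch needs no new standby-side behaviour, which is the non-intrusive property the reply is aiming for.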

>
> With Regards,
> Amit Kapila.

--
Ildefonso Camargo
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579
