Re: Standalone synchronous master

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 19:59:43
Message-ID: 52D2F42F.1070306@agliodbs.com
Lists: pgsql-hackers

All,

I'm leading this off with a review of the features offered by the actual
patch submitted. My general discussion of the issues of Sync Degrade,
which justifies my specific suggestions below, follows that. Rajeev,
please be aware that other hackers may have different opinions than me
on what needs to change about the patch, so you should collect all
opinions before changing code.

=======================

> Add a new parameter :

> synchronous_standalone_master = on | off

I think this is a TERRIBLE name for any such parameter. What does
"synchronous standalone" even mean? A better name for the parameter
would be "auto_degrade_sync_replication" or "synchronous_timeout_action
= error | degrade", or something similar. It would be even better for
this to be a mode of synchronous_commit, except that synchronous_commit
is heavily overloaded already.
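For illustration only: a two-valued setting along the lines of the suggested "synchronous_timeout_action = error | degrade" names the behavior on timeout in its values, which "on | off" does not. No such parameter exists in PostgreSQL. The Python sketch below models only parsing the value and the resulting decision. The enum, both function names, and the exact meaning given to "error" are hypothetical.

```python
from enum import Enum


class SyncTimeoutAction(Enum):
    """Values of the hypothetical synchronous_timeout_action setting."""
    ERROR = "error"
    DEGRADE = "degrade"


def parse_sync_timeout_action(value: str) -> SyncTimeoutAction:
    """Parse a setting value case-insensitively; reject anything else."""
    try:
        return SyncTimeoutAction(value.strip().lower())
    except ValueError:
        allowed = ", ".join(a.value for a in SyncTimeoutAction)
        raise ValueError(
            f"invalid value for synchronous_timeout_action: {value!r} "
            f"(expected one of: {allowed})"
        ) from None


def on_sync_wait_timeout(action: SyncTimeoutAction) -> str:
    """Describe what the master does when a synchronous commit wait times out."""
    if action is SyncTimeoutAction.ERROR:
        return "stay synchronous; report the failed wait to the client"
    return "switch to degraded mode; let waiting commits return"
```

With this design, someone reading postgresql.conf can tell from the value alone what happens on timeout, and a boolean-looking value such as "on" is rejected.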

Some issues raised by this log excerpt:

LOG: standby "tx0113" is now the synchronous standby with priority 1
LOG: waiting for standby synchronization
<-- standby wal receiver on the standby is killed (SIGKILL)
LOG: unexpected EOF on standby connection
LOG: not waiting for standby synchronization
<-- restart standby so that it connects again
LOG: standby "tx0113" is now the synchronous standby with priority 1
LOG: waiting for standby synchronization
<-- standby wal receiver is first stopped (SIGSTOP) to make sure

The "not waiting for standby synchronization" message should be logged
at a level stronger than LOG. I'd like ERROR.

Second, you have the master resuming sync rep when the standby
reconnects. How do you determine when it's safe to do that? You're
making the assumption that you have a failing sync standby instead of
one which simply can't keep up with the master, or a flaky network
connection (see discussion below).
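To make the flaky-connection concern concrete, here is a toy simulation of a naive policy: degrade after a fixed number of consecutive ticks without an acknowledgement from the standby, and resume synchronous mode as soon as an acknowledgement arrives again. This is a hypothetical Python model, not how the patch is implemented. The tick granularity, the timeout_ticks parameter, and the function name are assumptions.

```python
def count_mode_switches(ack_ok: list[bool], timeout_ticks: int) -> int:
    """Count sync <-> degraded transitions under a naive degrade/resume rule.

    ack_ok[i] is True if the standby acknowledged WAL during tick i.
    """
    degraded = False
    missed = 0      # consecutive ticks without an acknowledgement
    switches = 0
    for ok in ack_ok:
        if ok:
            missed = 0
            if degraded:
                degraded = False   # naive rule: resume on the first ack
                switches += 1
        else:
            missed += 1
            if not degraded and missed >= timeout_ticks:
                degraded = True    # stop waiting; commits return unconfirmed
                switches += 1
    return switches
```

In this model a link that drops out slightly longer than the timeout, over and over, produces one degrade/resume cycle per outage. Each cycle is a window in which commits were acknowledged with no standby copy, even though both servers stayed up.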

> a. Master_to_standalone_cmd: To be executed before master
> switches to standalone mode.
>
> b. Master_to_sync_cmd: To be executed before master switches from
> sync mode to standalone mode.

I'm not at all clear what the difference between these two commands is.
When would one be executed, and when would the other be executed? Also,
renaming ...

Missing features:

a) we should at least send committing clients a WARNING if they have
committed a synchronous transaction and we are in degraded mode.

I know others have dismissed this idea as too "talky", but from my
perspective, the agreement with the client for each synchronous commit
is being violated, so each and every synchronous commit should report
failure to sync. Also, having a warning on every commit would make it
easier to troubleshoot degraded mode for users who have ignored the
other warnings we give them.

b) pg_stat_replication needs to show degraded mode in some way, or we
need pg_sync_rep_degraded(), or (ideally) both.
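Both missing features can hang off a single shared "degraded" flag. The Python sketch below models that idea: every synchronous commit made while degraded gets a WARNING, and a status call exposes the flag for monitoring. The class, its attributes, and the warning text are hypothetical. pg_sync_rep_degraded() is only the name proposed above, not an existing PostgreSQL function.

```python
class SyncRepState:
    """Hypothetical master-side state; none of these names exist in PostgreSQL."""

    def __init__(self) -> None:
        self.degraded = False         # set when sync rep has been auto-degraded
        self.unconfirmed_commits = 0  # synchronous commits acknowledged while degraded

    def commit(self, synchronous: bool) -> list[str]:
        """Return the messages a committing client would receive."""
        if synchronous and self.degraded:
            self.unconfirmed_commits += 1
            return ["WARNING: transaction committed locally, but no synchronous "
                    "standby confirmed it (replication is degraded)"]
        return []

    def sync_rep_degraded(self) -> bool:
        """Stand-in for the proposed pg_sync_rep_degraded() function."""
        return self.degraded
```

Asynchronous commits get no warning, because their contract never promised a standby copy. The same flag could also be surfaced as a column or state in pg_stat_replication.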

I'm also wondering if we need a more sophisticated approach to
wal_sender_timeout to go with all this.

=======================

On 01/11/2014 08:33 PM, Bruce Momjian wrote:
> On Sat, Jan 11, 2014 at 07:18:02PM -0800, Josh Berkus wrote:
>> In other words, if we're going to have auto-degrade, the most
>> intelligent place for it is in
>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>> place. Anything we do *inside* Postgres is going to have a really,
>> really hard time determining when to degrade.
>
> Well, one goal I was considering is that if a commit is hung waiting for
> slave sync confirmation, and the timeout happens, then the mode is
> changed to degraded and the commit returns success. I am not sure how
> you would do that in an external tool, meaning there is going to be
> period where commits fail, unless you think there is a way that when the
> external tool changes the mode to degrade that all hung commits
> complete. That would be nice.

Realistically, though, that's pretty unavoidable. Any technique which
waits a reasonable interval to determine that the replica isn't going to
respond is liable to go beyond the application's timeout threshold
anyway. There are undoubtedly exceptions to that, but it will be the
case a lot of the time -- how many applications are willing to wait
*minutes* for a COMMIT?

I also don't see any way to allow the hung transactions to commit
without allowing the walsender to make a decision on degrading. As I've
outlined elsewhere (and below), the walsender just doesn't have enough
information to make a good decision.

On 01/11/2014 08:52 PM, Amit Kapila wrote:
> It is better than async mode in a way such that in async mode it never
> waits for commits to be written to standby, but in this new mode it will
> do so unless it is not possible (all sync standby's goes down).
> Can't we use existing wal_sender_timeout, or even if user expects a
> different timeout because for this new mode, he expects master to wait
> more before it start operating like standalone sync master, we can provide
> a new parameter.

One of the reasons that there's so much disagreement about this feature
is that most of the folks strongly in favor of auto-degrade are thinking
*only* of the case that the standby is completely down. There are many
other reasons for a sync transaction to hang, and the walsender has
absolutely no way of knowing which is the case. For example:

* Transient network issues
* Standby can't keep up with master
* Postgres bug
* Storage/IO issues (think EBS)
* Standby is restarting

You don't want to handle all of those issues the same way as far as sync
rep is concerned. For example, if the standby is restarting, you
probably want to wait instead of degrading.
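The underlying problem is that several root causes produce the same signal on the master. The Python sketch below uses a deliberately simplified model. It assumes the walsender sees only two things: the replication connection closes (as in the "unexpected EOF on standby connection" log line above), or the connection stays open with no acknowledgement before the timeout. The mapping of causes to signals is an assumption; a network fault or a Postgres bug can show up as either.

```python
from enum import Enum


class Observation(Enum):
    """The only signals the walsender gets, in this simplified model."""
    CONNECTION_LOST = "replication connection closed"
    NO_ACK = "connection open, no acknowledgement before timeout"


# Assumed mapping from root cause to what the master can observe.
OBSERVED = {
    "standby is down": Observation.CONNECTION_LOST,
    "standby is restarting": Observation.CONNECTION_LOST,
    "transient network issue": Observation.NO_ACK,
    "standby can't keep up": Observation.NO_ACK,
    "storage/IO stall": Observation.NO_ACK,
}

# Preferred handling for the two cases contrasted above.
DESIRED = {
    "standby is down": "degrade",
    "standby is restarting": "wait",
}


def ambiguous_observations(observed: dict, desired: dict) -> set:
    """Return observations shared by causes that call for different actions.

    A policy computed from the observation alone must pick one action for
    each of these, so it is wrong for at least one underlying cause.
    """
    actions_by_obs: dict = {}
    for cause, action in desired.items():
        actions_by_obs.setdefault(observed[cause], set()).add(action)
    return {obs for obs, actions in actions_by_obs.items() if len(actions) > 1}
```

Even with only two causes given a preferred action, "down" and "restarting" collapse onto the same observation. No walsender-level rule based on that observation can handle both correctly.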

There's also the issue that this patch, and necessarily any
walsender-level auto-degrade, has IMHO no safe way to resume sync
replication. This means that any user who has a network or storage blip
once a day (again, think AWS) would be constantly in degraded mode, even
though both the master and the replica are up and running -- and it will
come as a complete surprise to them when they lose the master and
discover that they've lost data.

This is why, as I've said, any auto-degrade patch needs to treat
auto-degrade as a major event, and alert users in all ways reasonable.
See my concrete proposals at the beginning of this email for what I mean.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
