Re: Standalone synchronous master

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 21:04:17
Message-ID: 52D30351.2040401@agliodbs.com
Lists: pgsql-hackers

On 01/12/2014 12:35 PM, Stephen Frost wrote:
> * Josh Berkus (josh(at)agliodbs(dot)com) wrote:
>> You don't want to handle all of those issues the same way as far as sync
>> rep is concerned. For example, if the standby is restarting, you
>> probably want to wait instead of degrading.
>
> *What*?! Certainly not in any kind of OLTP-type system; a system
> restart can easily take minutes. Clearly, you want to resume once the
> standby is back up, which I feel like the people against an auto-degrade
> mode are missing, but holding up a commit until the standby finishes
> rebooting isn't practical.

Well, then that becomes a reason to want better/more configurability.
In the couple of sync rep sites I admin, I *would* want to wait.
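A minimal Python sketch of the configurability being asked for here, assuming a single hypothetical per-server timeout (`degrade_after_s` is an invented name, not an existing PostgreSQL setting). Leaving it unset reproduces today's behavior, where a synchronous commit simply keeps waiting for the standby; setting it lets an OLTP site give up after a few seconds instead of stalling through a standby reboot.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SyncRepPolicy:
    # Hypothetical knob: seconds a commit may wait for an unreachable
    # standby before the master degrades to standalone mode.
    # None means "wait forever", which is how synchronous replication
    # behaves without any auto-degrade feature.
    degrade_after_s: Optional[float] = None

    def should_degrade(self, standby_unreachable_s: float) -> bool:
        """Return True if a waiting commit should give up on the standby."""
        if self.degrade_after_s is None:
            return False  # e.g. keep waiting through a standby restart
        return standby_unreachable_s >= self.degrade_after_s


# A site that prefers to wait vs. an OLTP site that cannot stall for minutes.
wait_site = SyncRepPolicy()
oltp_site = SyncRepPolicy(degrade_after_s=5.0)
print(wait_site.should_degrade(120.0))  # False
print(oltp_site.should_degrade(120.0))  # True
```

The point of making it a per-site value rather than a fixed behavior is that both positions in this exchange are legitimate for different workloads.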

>> There's also the issue that this patch, and necessarily any
>> walsender-level auto-degrade, has IMHO no safe way to resume sync
>> replication. This means that any user who has a network or storage blip
>> once a day (again, think AWS) would be constantly in degraded mode, even
>> though both the master and the replica are up and running -- and it will
>> come as a complete surprise to them when they lose the master and
>> discover that they've lost data.
>
> I don't follow this logic at all- why is there no safe way to resume?
> You wait til the slave is caught up fully and then go back to sync mode.
> If that turns out to be an extended problem then an alarm needs to be
> raised, of course.

So, if you have auto-resume, how do you handle the "flaky network" case?
And how would an alarm be raised?
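One possible answer to both questions, sketched in Python under loud assumptions: the class, its parameters and the alarm callback are all hypothetical, and WAL positions are modelled as plain integers. Resuming follows Stephen's rule (the standby's flushed position has caught up to the master's), but only after the standby has stayed caught up for a hold-down period, so a flaky link cannot bounce the master in and out of sync mode. Repeated degrades within a sliding window trigger an alarm through a caller-supplied callback.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional


class AutoDegradeMonitor:
    """Hypothetical master-side auto-degrade/auto-resume state machine."""

    def __init__(self, hold_down_s: float, max_degrades: int, window_s: float,
                 alarm: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hold_down_s = hold_down_s    # how long the standby must stay caught up
        self.max_degrades = max_degrades  # degrades per window that trigger the alarm
        self.window_s = window_s          # sliding window for counting degrades
        self.alarm = alarm
        self.clock = clock
        self.synchronous = True
        self.caught_up_since: Optional[float] = None
        self.degrade_times: Deque[float] = deque()

    def degrade(self) -> None:
        """Call when the standby has been unreachable past the wait timeout."""
        self.caught_up_since = None
        if not self.synchronous:
            return  # already degraded; not a new event
        now = self.clock()
        self.synchronous = False
        self.degrade_times.append(now)
        # Forget degrade events that have fallen out of the sliding window.
        while self.degrade_times and now - self.degrade_times[0] > self.window_s:
            self.degrade_times.popleft()
        if len(self.degrade_times) >= self.max_degrades:
            self.alarm(f"{len(self.degrade_times)} sync rep degrades "
                       f"within {self.window_s:.0f}s; link looks flaky")

    def observe(self, master_lsn: int, standby_flush_lsn: int) -> None:
        """Feed periodic replication status; may switch back to sync mode."""
        if self.synchronous:
            return
        if standby_flush_lsn < master_lsn:
            self.caught_up_since = None  # fell behind again: restart hold-down
            return
        now = self.clock()
        if self.caught_up_since is None:
            self.caught_up_since = now
        if now - self.caught_up_since >= self.hold_down_s:
            self.synchronous = True
            self.caught_up_since = None
```

The hold-down and window values are policy choices, not recommendations. On a busy master the standby may rarely match a moving WAL position exactly, so a real design would probably need a lag tolerance or a brief pause in commits; this sketch ignores that.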

On 01/12/2014 12:51 PM, Kevin Grittner wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> I know others have dismissed this idea as too "talky", but from my
>> perspective, the agreement with the client for each synchronous
>> commit is being violated, so each and every synchronous commit
>> should report failure to sync. Also, having a warning on every
>> commit would make it easier to troubleshoot degraded mode for users
>> who have ignored the other warnings we give them.
>
> I agree that every synchronous commit on a master which is configured
> for synchronous replication which returns without persisting the work
> of the transaction on both the (local) primary and a synchronous
> replica should issue a WARNING. That said, the API for some
> connectors (like JDBC) puts the burden on the application or its
> framework to check for warnings each time and do something reasonable
> if found; I fear that a Venn diagram of those shops which would use
> this new feature and those shops that don't rigorously look for and
> reasonably deal with warnings would have significant overlap.

Oh, no question. However, having such a WARNING would help with
interactive troubleshooting once a problem has been identified, and
that's my main reason for wanting it.
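Kevin's concern is that the burden falls on the client. A minimal Python sketch of that client-side check, assuming the driver hands back server messages as libpq-style strings that begin with a severity such as "WARNING:" (psycopg2, for instance, collects them in `connection.notices`). The exception class and function are hypothetical, and since no degraded-mode WARNING text exists yet, the sketch treats any WARNING received during the commit as a failure to sync.

```python
from typing import Iterable, List


class SyncRepNotConfirmed(Exception):
    """Hypothetical error: COMMIT returned with a WARNING attached."""


def check_commit_notices(notices: Iterable[str]) -> None:
    """Raise if any server message received during COMMIT was a WARNING."""
    warnings: List[str] = [n.strip() for n in notices
                           if n.lstrip().startswith("WARNING:")]
    if warnings:
        # Surface every warning so the DBA can see what the server said.
        raise SyncRepNotConfirmed("; ".join(warnings))
```

With psycopg2 one could clear `conn.notices` just before `conn.commit()` and pass its contents to `check_commit_notices` afterwards. Whether to retry, page someone or merely log is left to the application, which is exactly the discipline Kevin suspects many shops lack.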

Imagine the case where you have auto-degrade and a flaky network. The
user would experience problems as performance problems; that is, some
commits take minutes on-again, off-again. They wouldn't necessarily
even LOOK at the sync rep settings. So the next step is to try walking
through a sample transaction on the command line, and then the
DBA/consultant gets WARNING messages, which give an idea of where the real
problem lies.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
