Standalone synchronous master

Lists: pgsql-hackers
From: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Standalone synchronous master
Date: 2013-11-13 13:09:19
Message-ID: BF2827DCCE55594C8D7A8F7FFD3AB7713DD9A622@SZXEML508-MBX.china.huawei.com

This patch implements the following TODO item:

Add a new "eager" synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period
This would require some type of command to be executed to alert administrators of this change.
http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php

This patch's implementation follows the same lines as the one given in the earlier thread.
Some of the additional important changes are:

1. Added two GUC variables that take commands from the user to be executed:

a. Master_to_standalone_cmd: To be executed before master switches to standalone mode.

b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode.

2. The master switches mode only if the corresponding command executes successfully.

3. The replication timeout is used to decide whether the synchronous standby has gone down, i.e. the master will switch from sync mode to standalone mode only after wal_sender_timeout expires.
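
A rough sketch of the switching rules in items 1-3, in Python for illustration only. All class and parameter names are hypothetical and not taken from the patch; the two callables stand in for running the user-supplied commands and return an exit status, and item b is read as the switch from standalone back to sync mode, as the thread later confirms.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class MasterModeSketch:
    """Hypothetical model of the switching rules in items 1-3 (not patch code)."""
    wal_sender_timeout: float               # seconds without a standby reply
    run_standalone_cmd: Callable[[], int]   # stand-in for Master_to_standalone_cmd
    run_sync_cmd: Callable[[], int]         # stand-in for Master_to_sync_cmd
    standalone: bool = False

    def on_tick(self, now: float, last_standby_reply: float,
                standby_connected: bool) -> None:
        if not self.standalone:
            # Item 3: declare the sync standby gone only after wal_sender_timeout.
            if now - last_standby_reply >= self.wal_sender_timeout:
                # Item 2: switch only if the user's command succeeded (exit 0).
                if self.run_standalone_cmd() == 0:
                    self.standalone = True
        elif standby_connected:
            # Item b, read as the switch back from standalone to sync mode.
            if self.run_sync_cmd() == 0:
                self.standalone = False
```

A failing command (non-zero exit status) leaves the master in its current mode, which is what item 2 asks for.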

Please provide your opinion or any other expectation out of this patch.

I will add it to the November CommitFest.

Thanks and Regards,
Kumar Rajeev Rastogi

Attachment Content-Type Size
replication_new_modeV1.patch application/octet-stream 14.2 KB

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 04:55:12
Message-ID: CAA4eK1K8CwnsH59ofm+spErd7Tthi=uZ_RSgw+VTkO9yRtMqBA@mail.gmail.com

On Wed, Nov 13, 2013 at 6:39 PM, Rajeev rastogi
<rajeev(dot)rastogi(at)huawei(dot)com> wrote:

> Add a new "eager" synchronous mode that starts out synchronous but reverts
> to asynchronous after a failure timeout period
>
> This would require some type of command to be executed to alert
> administrators of this change.
>
> http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php
> This patch implementation is in the same line as it was given in the earlier
> thread.
>
> Some Of the additional important changes are:
>
> 1. Have added two GUC variable to take commands from user to be
> executed
>
> a. Master_to_standalone_cmd: To be executed before master switches to
> standalone mode.
>
> b. Master_to_sync_cmd: To be executed before master switches from sync
> mode to standalone mode.

In the descriptions of both switches (a & b), you say that it will
switch to standalone mode; I think in point 1b you mean the other way
around (switch from standalone to sync mode).

Instead of getting commands, why can't we just log such actions? I think
adding 3 new GUC variables for this functionality is a bit much.

Also, what will happen when it switches to standalone mode in case there
are some async standbys already connected to it before it goes to
standalone mode? If it continues to send data to them, then I think
naming it 'enable_standalone_master' is not good.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 06:57:31
Message-ID: BF2827DCCE55594C8D7A8F7FFD3AB7713DDB6749@SZXEML508-MBX.china.huawei.com

On 8th Jan, 2014, Amit Kapila Wrote
>
> > Add a new "eager" synchronous mode that starts out synchronous but
> > reverts to asynchronous after a failure timeout period
> >
> > This would require some type of command to be executed to alert
> > administrators of this change.
> >
> > http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php
> > This patch implementation is in the same line as it was given in the
> > earlier thread.
> >
> > Some Of the additional important changes are:
> >
> > 1. Have added two GUC variable to take commands from user to be
> > executed
> >
> > a. Master_to_standalone_cmd: To be executed before master switches to
> > standalone mode.
> >
> > b. Master_to_sync_cmd: To be executed before master switches from sync
> > mode to standalone mode.
>
> In description of both switches (a & b), you are telling that it will
> switch to standalone mode, I think by your point 1b. you mean to say
> other way (switch from standalone to sync mode).

Yes, you are right. It's a typo.

> Instead of getting commands, why can't we just log such actions? I think
> adding 3 new guc variables for this functionality seems to be bit high.

Actually, both the earlier discussion and the TODO item say that some kind
of command should be executed to alert the administrator.
http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php

In my current patch, I have kept the LOG message along with the command.

> Also what will happen when it switches to standalone mode incase there
> are some async standby's already connected to it before going to
> standalone mode, if it continues to send data then I think naming it as
> 'enable_standalone_master' is not good.

Yes, we can change the name to something more appropriate; some options are:
1. enable_async_master
2. sync_standalone_master
3. enable_nowait_master
4. enable_nowait_resp_master

Please suggest which of the above names you prefer, or any other.

Thanks and Regards,
Kumar Rajeev Rastogi


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 09:07:48
Message-ID: 52CD1564.6060504@vmware.com

On 11/13/2013 03:09 PM, Rajeev rastogi wrote:
> This patch implements the following TODO item:
>
> Add a new "eager" synchronous mode that starts out synchronous but reverts to asynchronous after a failure timeout period
> This would require some type of command to be executed to alert administrators of this change.
> http://archives.postgresql.org/pgsql-hackers/2011-12/msg01224.php
>
> This patch implementation is in the same line as it was given in the earlier thread.
> Some Of the additional important changes are:
>
> 1. Have added two GUC variable to take commands from user to be executed
>
> a. Master_to_standalone_cmd: To be executed before master switches to standalone mode.
>
> b. Master_to_sync_cmd: To be executed before master switches from sync mode to standalone mode.
>
> 2. Master mode switch will happen only if the corresponding command executed successfully.
>
> 3. Taken care of replication timeout to decide whether synchronous standby has gone down. i.e. only after expiry of
>
> wal_sender_timeout, the master will switch from sync mode to standalone mode.
>
> Please provide your opinion or any other expectation out of this patch.

I'm going to say right off the bat that I think the whole notion to
automatically disable synchronous replication when the standby goes down
is completely bonkers. If you don't need the strong guarantee that your
transaction is safe in at least two servers before it's acknowledged to
the client, there's no point enabling synchronous replication in the
first place. If you do need it, then you shouldn't fall back to a
degraded mode, at least not automatically. It's an idea that keeps
coming back, but I have not heard a convincing argument why it makes
sense. It's been discussed many times before, most recently in that
thread you linked to.

Now that I got that out of the way, I concur that some sort of hooks or
commands that fire when a standby goes down or comes back up makes
sense, for monitoring purposes. I don't much like this particular
design. If you just want to write a log entry when all the standbys are
disconnected, running a shell command seems like an awkward interface.
It's OK for raising an alarm, but there are many other situations where
you might want to raise alarms, so I'd rather have us implement some
sort of a generic trap system, instead of adding this one particular
extra config option. What do people usually use to monitor replication?

There are two things we're trying to solve here: raising an alarm when
something interesting happens, and changing the configuration to
temporarily disable synchronous replication. What would be a good API to
disable synchronous replication? Editing the config file and SIGHUPing
is not very nice. There's been talk of an ALTER command to change the
config, but I'm not sure that's a very good API either. Perhaps expose
the sync_master_in_standalone_mode variable you have in your patch to
new SQL-callable functions. Something like:

pg_disable_synchronous_replication()
pg_enable_synchronous_replication()

I'm not sure where that state would be stored. Should it persist across
restarts? And you probably should get some sort of warnings in the log
when synchronous replication is disabled.
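
One way to picture the proposed SQL-callable pair is as a single flag plus a log warning on every change. The Python sketch below assumes exactly that; SyncRepSwitch and the logger name are hypothetical, and keeping the flag only in memory (so a restart comes back with synchronous replication enabled) is just one possible answer to the persistence question, not a decided design.

```python
import logging

log = logging.getLogger("syncrep_sketch")


class SyncRepSwitch:
    """Hypothetical in-memory state behind the proposed
    pg_disable_synchronous_replication() / pg_enable_synchronous_replication().
    Not persisted: a restart starts again with sync rep enabled."""

    def __init__(self) -> None:
        self.enabled = True

    def disable(self) -> bool:
        """Disable sync rep; return True only if the state actually changed."""
        if not self.enabled:
            return False
        self.enabled = False
        log.warning("synchronous replication disabled: commits no longer wait for a standby")
        return True

    def enable(self) -> bool:
        """Re-enable sync rep; return True only if the state actually changed."""
        if self.enabled:
            return False
        self.enabled = True
        log.warning("synchronous replication re-enabled")
        return True
```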

In summary, more work is required to design a good
user/admin/programming interface. Let's hear a solid proposal for that,
before writing patches.

BTW, calling an external command with system(), while holding
SyncRepLock in exclusive-mode, seems like a bad idea. For starters,
holding a lock will prevent a new WAL sender from starting up and
becoming a synchronous standby, and the external command might take a
long time to return.
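
The usual remedy is to run the slow external command with no lock held, then take the lock only for the short state change, re-checking the condition under the lock. A minimal Python sketch of that pattern, assuming a threading.Lock as a stand-in for SyncRepLock and a dict as a stand-in for shared memory (both illustrative, not PostgreSQL code):

```python
import subprocess
import sys
import threading
from typing import Callable, Sequence

# Illustrative stand-ins: a Python lock for SyncRepLock and a dict for the
# shared "standalone" flag that would really live in shared memory.
sync_rep_lock = threading.Lock()
shared_state = {"standalone": False}


def switch_to_standalone(cmd: Sequence[str],
                         standby_still_down: Callable[[], bool]) -> bool:
    # Run the external command with no lock held: it may take arbitrarily
    # long, and WAL senders must still be able to take the lock meanwhile.
    if subprocess.run(list(cmd)).returncode != 0:
        return False
    with sync_rep_lock:
        # Hold the lock only for the cheap state change, and re-check first:
        # a standby may have reconnected while the command was running.
        if not standby_still_down():
            return False
        shared_state["standalone"] = True
    return True


if __name__ == "__main__":
    ok = switch_to_standalone([sys.executable, "-c", "pass"], lambda: True)
    print(ok, shared_state["standalone"])  # prints: True True
```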

- Heikki


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 09:48:57
Message-ID: 20140108094857.GM14280@awork2.anarazel.de

On 2014-01-08 11:07:48 +0200, Heikki Linnakangas wrote:
> I'm going to say right off the bat that I think the whole notion to
> automatically disable synchronous replication when the standby goes down is
> completely bonkers. If you don't need the strong guarantee that your
> transaction is safe in at least two servers before it's acknowledged to the
> client, there's no point enabling synchronous replication in the first
> place.

I think that's likely caused by the misconception that synchronous
replication is synchronous in apply, not just in remote write/fsync. I have
now seen several sites that assumed that, and set up sync rep precisely so
they could query the standbys instead of the primary once the commit had
finished.
If that assumption were true, supporting a timeout that way would
possibly be helpful, but it is not at the moment...
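
To make the distinction concrete, here is a simplified Python model of what a COMMIT waits for under each synchronous_commit setting, assuming a synchronous standby is configured via synchronous_standby_names. The enum and helper are hypothetical; note that remote_apply was only added later, in PostgreSQL 9.6, so at the time of this thread no setting waited for apply.

```python
from enum import IntEnum


class WaitsFor(IntEnum):
    """Point a COMMIT waits for, in increasing order (simplified model)."""
    NOTHING = 0        # not even the local WAL flush (asynchronous commit)
    LOCAL_FLUSH = 1
    STANDBY_WRITE = 2  # standby has written the WAL, not necessarily fsynced it
    STANDBY_FLUSH = 3  # standby has fsynced the WAL
    STANDBY_APPLY = 4  # standby has replayed the WAL, so queries there see it


# Assumes a synchronous standby is configured; remote_apply is 9.6 and later.
SYNCHRONOUS_COMMIT = {
    "off": WaitsFor.NOTHING,
    "local": WaitsFor.LOCAL_FLUSH,
    "remote_write": WaitsFor.STANDBY_WRITE,
    "on": WaitsFor.STANDBY_FLUSH,
    "remote_apply": WaitsFor.STANDBY_APPLY,
}


def read_your_writes_on_standby(setting: str) -> bool:
    """Can a standby query issued after COMMIT returns rely on seeing the commit?"""
    return SYNCHRONOUS_COMMIT[setting] >= WaitsFor.STANDBY_APPLY
```

Under this model the default "on" is durable on two servers but gives no visibility guarantee on the standby, which is exactly the gap described above.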

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 17:39:23
Message-ID: CA+U5nMLpUcqsL0oyjnd68W0e6o00r8vQ=DNk4H8s4zeOFhbnwQ@mail.gmail.com

On 8 January 2014 09:07, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:

> I'm going to say right off the bat that I think the whole notion to
> automatically disable synchronous replication when the standby goes down is
> completely bonkers.

Agreed

We had this discussion across 3 months and we don't want it again.
This should not have been added as a TODO item.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 20:27:51
Message-ID: 20140108202751.GA6869@momjian.us

On Wed, Jan 8, 2014 at 05:39:23PM +0000, Simon Riggs wrote:
> On 8 January 2014 09:07, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>
> > I'm going to say right off the bat that I think the whole notion to
> > automatically disable synchronous replication when the standby goes down is
> > completely bonkers.
>
> Agreed
>
> We had this discussion across 3 months and we don't want it again.
> This should not have been added as a TODO item.

I am glad Heikki and Simon agree, but I don't. ;-)

The way that I understand it is that you might want durability, but
might not want to sacrifice availability. Phrased that way, it makes
sense, and notifying the administrator seems the appropriate action.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 20:39:37
Message-ID: 13F87FDF-236D-4A06-BD40-E583ADA4CD6E@cybertec.at


On Jan 8, 2014, at 9:27 PM, Bruce Momjian wrote:

> On Wed, Jan 8, 2014 at 05:39:23PM +0000, Simon Riggs wrote:
>> On 8 January 2014 09:07, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>>
>>> I'm going to say right off the bat that I think the whole notion to
>>> automatically disable synchronous replication when the standby goes down is
>>> completely bonkers.
>>
>> Agreed
>>
>> We had this discussion across 3 months and we don't want it again.
>> This should not have been added as a TODO item.
>
> I am glad Heikki and Simon agree, but I don't. ;-)
>
> The way that I understand it is that you might want durability, but
> might not want to sacrifice availability. Phrased that way, it makes
> sense, and notifying the administrator seems the appropriate action.
>

Technically and conceptually I agree with Andres and Simon, but from daily experience I would say that we should make it configurable.
Some people have had nasty experiences when their systems stopped working.

+1 for a GUC to control this one.

many thanks,

hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 20:46:51
Message-ID: 52CDB93B.5060506@vmware.com

On 01/08/2014 10:27 PM, Bruce Momjian wrote:
> On Wed, Jan 8, 2014 at 05:39:23PM +0000, Simon Riggs wrote:
>> On 8 January 2014 09:07, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
>>
>>> I'm going to say right off the bat that I think the whole notion to
>>> automatically disable synchronous replication when the standby goes down is
>>> completely bonkers.
>>
>> Agreed
>>
>> We had this discussion across 3 months and we don't want it again.
>> This should not have been added as a TODO item.
>
> I am glad Heikki and Simon agree, but I don't. ;-)
>
> The way that I understand it is that you might want durability, but
> might not want to sacrifice availability. Phrased that way, it makes
> sense, and notifying the administrator seems the appropriate action.

They want to have the cake and eat it too. But they're not actually
getting that. What they actually get is extra latency when things work,
with no gain in durability.

- Heikki


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 21:08:10
Message-ID: 20140108210810.GB6869@momjian.us

On Wed, Jan 8, 2014 at 10:46:51PM +0200, Heikki Linnakangas wrote:
> On 01/08/2014 10:27 PM, Bruce Momjian wrote:
> >On Wed, Jan 8, 2014 at 05:39:23PM +0000, Simon Riggs wrote:
> >>On 8 January 2014 09:07, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> wrote:
> >>
> >>>I'm going to say right off the bat that I think the whole notion to
> >>>automatically disable synchronous replication when the standby goes down is
> >>>completely bonkers.
> >>
> >>Agreed
> >>
> >>We had this discussion across 3 months and we don't want it again.
> >>This should not have been added as a TODO item.
> >
> >I am glad Heikki and Simon agree, but I don't. ;-)
> >
> >The way that I understand it is that you might want durability, but
> >might not want to sacrifice availability. Phrased that way, it makes
> >sense, and notifying the administrator seems the appropriate action.
>
> They want to have the cake and eat it too. But they're not actually
> getting that. What they actually get is extra latency when things
> work, with no gain in durability.

They are getting guaranteed durability until they get a notification ---
that seems valuable. When they get the notification, they can
reevaluate if they want that tradeoff.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 21:34:08
Message-ID: 1389216848.30306.YahooMailNeo@web122303.mail.ne1.yahoo.com

Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Heikki Linnakangas wrote:

>> They want to have the cake and eat it too. But they're not
>> actually getting that. What they actually get is extra latency
>> when things work, with no gain in durability.
>
> They are getting guaranteed durability until they get a
> notification --- that seems valuable.  When they get the
> notification, they can reevaluate if they want that tradeoff.

My first reaction to this has been that if you want synchronous
replication without having the system wait if the synchronous
target goes down, you should configure an alternate target.  With
the requested change we can no longer state that when a COMMIT
returns with an indication of success that the data has been
persisted to multiple clusters.  We would be moving to a situation
where the difference between synchronous and asynchronous is subtle -- either way
the data may or may not be on a second cluster by the time the
committer is notified of success.  We wait up to some threshold
time to try to make the success indication indicate that, but then
return success even if the guarantee has not been provided, without
any way for the committer to know the difference.
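
A toy Python model of the point above: under a hypothetical auto-degrade timeout, the reply the client receives is the same whether or not the standby confirmed. The function and its parameters are illustrative assumptions, not a proposed API.

```python
from typing import Optional, Tuple


def commit_with_auto_degrade(standby_ack_at: Optional[float],
                             timeout: float) -> Tuple[str, bool]:
    """Hypothetical model of a commit under an auto-degrade timeout.

    standby_ack_at: seconds until the standby confirms, or None if it never does.
    Returns (what the client is told, whether the data is on a second cluster).
    """
    replicated = standby_ack_at is not None and standby_ack_at <= timeout
    # Either the standby confirmed in time, or the timeout fired and the
    # master degraded; in both cases the client gets the same success reply.
    return "COMMIT", replicated
```

The first element never varies, so nothing the committer sees distinguishes a replicated commit from a degraded one.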

On the other hand, we keep getting people saying they want the
database to make the promise of synchronous replication, and tell
applications that it has been successful even when it hasn't been,
as long as there's a line in the server log to record the lie.  Or,
more likely, to record the boundaries of time blocks where it has
been a lie.  This appears to be requested because other products
behave that way.

I'm torn on whether we should cave to popular demand on this; but
if we do, we sure need to be very clear in the documentation about
what a successful return from a commit request means.  Sooner or
later, Murphy's Law being what it is, if we do this someone will
lose the primary and blame us because the synchronous replica is
missing gobs of transactions that were successfully committed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 21:37:59
Message-ID: 20140108213759.GP14280@awork2.anarazel.de

On 2014-01-08 13:34:08 -0800, Kevin Grittner wrote:
> On the other hand, we keep getting people saying they want the
> database to make the promise of synchronous replication, and tell
> applications that it has been successful even when it hasn't been,
> as long as there's a line in the server log to record the lie.

Most people having such a position I've talked to have held that
position because they thought synchronous replication would mean that
apply (and thus visibility) would also be synchronous. Is that
different from your experience?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 21:40:56
Message-ID: 17515.1389217256@sss.pgh.pa.us

Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
> I'm torn on whether we should cave to popular demand on this; but
> if we do, we sure need to be very clear in the documentation about
> what a successful return from a commit request means. Sooner or
> later, Murphy's Law being what it is, if we do this someone will
> lose the primary and blame us because the synchronous replica is
> missing gobs of transactions that were successfully committed.

I'm for not caving. I think people who are asking for this don't
actually understand what they'd be getting.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Standalone synchronous master
Date: 2014-01-08 21:44:20
Message-ID: 52CDC6B4.1060805@agliodbs.com

On 01/08/2014 12:27 PM, Bruce Momjian wrote:
> I am glad Heikki and Simon agree, but I don't. ;-)
>
> The way that I understand it is that you might want durability, but
> might not want to sacrifice availability. Phrased that way, it makes
> sense, and notifying the administrator seems the appropriate action.

I think there's a valid argument for wanting things the other way, but I
don't find it persuasive. In general, people who want
auto-degrade for sync rep either:

a) don't understand what sync rep actually does (lots of folks confuse
synchronous with simultaneous), or

b) want more infrastructure than we actually have around managing sync
replicas

Now, the folks who want (b) have a legitimate need, and I'll point out
that we always planned to have more features around sync rep, it's just
that we never actually worked on any. For example, "quorum sync" was
extensively discussed and originally projected for 9.2, but certain
hackers changed jobs and interests.

If we just did the minimal change, that is, added an "auto-degrade" GUC
and an alert to the logs each time the master server went into degraded
mode, as Heikki says we'd be loading a big foot-gun for a bunch of
ill-informed DBAs. People who want that are really much better off with
async rep in the first place.

If we really want auto-degrading sync rep, then we'd (at a minimum) need
a way to determine *from the replica* whether or not it was in degraded
mode when the master died. What good do messages to the master log do
you if the master no longer exists?

Mind you, being able to determine on the replica whether it was
synchronous or not when it lost communication with the master would be a
great feature to have for sync rep groups as well, and would make them
practical (right now, they're pretty useless). However, I seriously
doubt that someone is going to code that up in the next 5 days.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 21:44:31
Message-ID: 52CDC6BF.80605@commandprompt.com


On 01/08/2014 01:34 PM, Kevin Grittner wrote:

> I'm torn on whether we should cave to popular demand on this; but
> if we do, we sure need to be very clear in the documentation about
> what a successful return from a commit request means. Sooner or
> later, Murphy's Law being what it is, if we do this someone will
> lose the primary and blame us because the synchronous replica is
> missing gobs of transactions that were successfully committed.

I am trying to follow this thread and perhaps I am just being dense but
it seems to me that:

If you are running synchronous replication, as long as the target
(subscriber) is up, synchronous replication operates as it should. That
is that the origin will wait for a notification from the subscriber that
the write has been successful before continuing.

However, if the subscriber is down, the origin should NEVER wait. That
is just silly behavior and makes synchronous replication pretty much
useless. Machines go down, that is the nature of things. Yes, we should
log and log loudly if the subscriber is down:

ERROR: target xyz is non-communicative: switching to async replication.

We should then retain WAL up to wal_keep_segments.

When the subscriber comes back up, it will then replicate in async mode
until the two are back in sync and then switch (perhaps by hand) to sync
mode. This of course assumes that we have a valid database on the
subscriber and we have not overrun wal_keep_segments.
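
The "not overrun wal_keep_segments" condition is simple arithmetic: the master retains at least wal_keep_segments segments of 16 MiB each (the default segment size), so a standby that falls further behind than that cannot catch up from the master's retained WAL alone. A Python sketch with hypothetical helper names; note that PostgreSQL 13 later replaced wal_keep_segments with wal_keep_size.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default WAL segment size: 16 MiB


def retained_wal_bytes(wal_keep_segments: int,
                       segment_size: int = WAL_SEGMENT_SIZE) -> int:
    """Minimum amount of past WAL kept for standbys because of wal_keep_segments."""
    return wal_keep_segments * segment_size


def standby_can_catch_up(lag_bytes: int, wal_keep_segments: int) -> bool:
    # Simplification: treats wal_keep_segments as the only thing keeping old
    # WAL around (ignores WAL archiving and, in later releases, replication slots).
    return lag_bytes <= retained_wal_bytes(wal_keep_segments)
```

For example, wal_keep_segments = 64 keeps at least 64 x 16 MiB = 1 GiB (1073741824 bytes) of WAL.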

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
a rose in the deeps of my heart. - W.B. Yeats


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 21:49:10
Message-ID: 52CDC7D6.3020306@vmware.com

On 01/08/2014 11:37 PM, Andres Freund wrote:
> On 2014-01-08 13:34:08 -0800, Kevin Grittner wrote:
>> On the other hand, we keep getting people saying they want the
>> database to make the promise of synchronous replication, and tell
>> applications that it has been successful even when it hasn't been,
>> as long as there's a line in the server log to record the lie.
>
> Most people having such a position I've talked to have held that
> position because they thought synchronous replication would mean that
> apply (and thus visibility) would also be synchronous.

And I totally agree that it would be a useful mode if apply was
synchronous. You could then build a master-standby pair where it's
guaranteed that when you commit a transaction in the master, it's
thereafter always seen as committed in the standby too. In that usage,
if the link between the two is broken, you could set up timeouts, e.g. so
that the standby stops accepting new queries after 20 seconds, and then
the master proceeds without the standby after 25 seconds. Then the
guarantee would hold.
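
The ordering of the two timeouts is what makes the guarantee work. A minimal Python check, assuming a synchronous-apply mode (which PostgreSQL did not have at the time) and a hypothetical detection_skew parameter for how much later the standby may notice the broken link than the master:

```python
def stale_read_possible(standby_stop_after: float,
                        master_proceed_after: float,
                        detection_skew: float = 0.0) -> bool:
    """Hypothetical check of the timeout ordering in a synchronous-apply pair.

    Times are seconds after the master notices the broken link. The standby
    keeps answering queries until detection_skew + standby_stop_after; from
    master_proceed_after on, the master acknowledges commits the standby has
    not applied. A stale read needs such a commit to be acknowledged while
    the standby is still answering queries.
    """
    return master_proceed_after < detection_skew + standby_stop_after
```

With the 20 and 25 second values above, the 5 second margin tolerates the standby noticing the failure up to 5 seconds later than the master.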

I don't know if the people asking for the fallback mode are thinking
that synchronous replication means synchronous apply, or if they're
trying to have the cake and eat it too wrt. durability and availability.

Synchronous apply would be cool.

- Heikki


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Standalone synchronous master
Date: 2014-01-08 21:49:58
Message-ID: 17702.1389217798@sss.pgh.pa.us

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> If we really want auto-degrading sync rep, then we'd (at a minimum) need
> a way to determine *from the replica* whether or not it was in degraded
> mode when the master died. What good do messages to the master log do
> you if the master no longer exists?

How would it be possible for a replica to know whether the master had
committed more transactions while communication was lost, if the master
dies without ever restoring communication? It sounds like pie in the
sky from here ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 21:55:35
Message-ID: 17815.1389218135@sss.pgh.pa.us

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> However, if the subscriber is down, the origin should NEVER wait. That
> is just silly behavior and makes synchronous replication pretty much
> useless. Machines go down, that is the nature of things. Yes, we should
> log and log loudly if the subscriber is down:

> ERROR: target xyz is non-communicative: switching to async replication.

> We then should store the wal logs up to wal_keep_segments.

> When the subscriber comes back up, it will then replicate in async mode
> until the two are back in sync and then switch (perhaps by hand) to sync
> mode. This of course assumes that we have a valid database on the
> subscriber and we have not overrun wal_keep_segments.

It sounds to me like you are describing the existing behavior of async
mode, with the possible exception of exactly what shows up in the
postmaster log.

Sync mode is about providing a guarantee that the data exists on more than
one server *before* we tell the client it's committed. If you don't need
that guarantee, you shouldn't be using sync mode. If you do need it,
it's not clear to me why you'd suddenly not need it the moment the going
actually gets tough.

regards, tom lane


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 21:57:20
Message-ID: 1389218240.48449.YahooMailNeo@web122303.mail.ne1.yahoo.com
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-01-08 13:34:08 -0800, Kevin Grittner wrote:
>
>> On the other hand, we keep getting people saying they want the
>> database to make the promise of synchronous replication, and
>> tell applications that it has been successful even when it
>> hasn't been, as long as there's a line in the server log to
>> record the lie.
>
> Most people having such a position I've talked to have held that
> position because they thought synchronous replication would mean
> that apply (and thus visibility) would also be synchronous. Is
> that different from your experience?

I haven't pursued it that far because we don't have
maybe-synchronous mode yet and seem unlikely to ever support it.
I'm not sure why that use-case is any better than any other.  You
still would never really know whether the data read is current.  If
we were to implement this, the supposedly synchronous replica could
be out-of-date by any arbitrary amount of time (from milliseconds
to months).  (Consider what could happen if the replication
connection authorizations got messed up while application
connections to the replica were fine.)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 22:23:34
Message-ID: 52CDCFE6.40105@commandprompt.com
Lists: pgsql-hackers


On 01/08/2014 01:55 PM, Tom Lane wrote:

> Sync mode is about providing a guarantee that the data exists on more than
> one server *before* we tell the client it's committed. If you don't need
> that guarantee, you shouldn't be using sync mode. If you do need it,
> it's not clear to me why you'd suddenly not need it the moment the going
> actually gets tough.

As I understand it what is being suggested is that if a subscriber or
target goes down, then the master will just sit there and wait. When I
read that, I read that the master will no longer process write
transactions. If I am wrong in that understanding then cool. If I am not
then that is a serious problem with a production scenario. There is an
expectation that a master will continue to function if the target is
down, synchronous or not.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
a rose in the deeps of my heart. - W.B. Yeats


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 22:34:56
Message-ID: 20140108223456.GQ14280@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-08 14:23:34 -0800, Joshua D. Drake wrote:
>
> On 01/08/2014 01:55 PM, Tom Lane wrote:
>
> >Sync mode is about providing a guarantee that the data exists on more than
> >one server *before* we tell the client it's committed. If you don't need
> >that guarantee, you shouldn't be using sync mode. If you do need it,
> >it's not clear to me why you'd suddenly not need it the moment the going
> >actually gets tough.
>
> As I understand it what is being suggested is that if a subscriber or target
> goes down, then the master will just sit there and wait. When I read that, I
> read that the master will no longer process write transactions. If I am
> wrong in that understanding then cool. If I am not then that is a serious
> problem with a production scenario. There is an expectation that a master
> will continue to function if the target is down, synchronous or not.

I don't think you've understood synchronous replication. There wouldn't
be *any* benefit to using it if it worked the way you wish since there
wouldn't be any additional guarantees. A single reconnect of the
streaming rep connection, without any permanent outage, would
potentially lead to data loss if the primary crashed in the wrong
moment.
So you'd buy no guarantees with a noticeable loss in performance.

Just use async mode if you want things to work like that.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 22:42:37
Message-ID: 52CDD45D.2000100@commandprompt.com
Lists: pgsql-hackers


On 01/08/2014 02:34 PM, Andres Freund wrote:

> I don't think you've understood synchronous replication. There wouldn't
> be *any* benefit to using it if it worked the way you wish since there
> wouldn't be any additional guarantees. A single reconnect of the
> streaming rep connection, without any permanent outage, would
> potentially lead to data loss if the primary crashed in the wrong
> moment.
> So you'd buy no guarantees with a noticeable loss in performance.
>
> Just use async mode if you want things to work like that.

Well no. That isn't what I am saying. Consider the following scenario:

db0->db1 in synchronous mode

The idea is that we know that data on db0 is not written until we know
for a fact that db1 also has that data. That is great and a guarantee of
data integrity between the two nodes.

If we have the following:

db0->db1:down

Using the model (as I understand it) that is being discussed we have
increased our failure rate because the moment db1:down we also lose db0.
The node db0 may be up but if it isn't going to process transactions it
is useless. I can tell you that I have exactly 0 customers that would
want that model because a single node failure would cause a double node
failure.

All the other stuff with wal_keep_segments is just idea throwing. I
don't care about that at this point. What I care about specifically is
that a single node failure regardless of replication mode should not be
able to (automatically) stop the operation of the master node.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary
act.", George Orwell


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 22:46:03
Message-ID: 20140108224603.GR14280@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote:
>
> On 01/08/2014 02:34 PM, Andres Freund wrote:
>
> >I don't think you've understood synchronous replication. There wouldn't
> >be *any* benefit to using it if it worked the way you wish since there
> >wouldn't be any additional guarantees. A single reconnect of the
> >streaming rep connection, without any permanent outage, would
> >potentially lead to data loss if the primary crashed in the wrong
> >moment.
> >So you'd buy no guarantees with a noticeable loss in performance.
> >
> >Just use async mode if you want things to work like that.
>
> Well no. That isn't what I am saying. Consider the following scenario:
>
> db0->db1 in synchronous mode
>
> The idea is that we know that data on db0 is not written until we know for a
> fact that db1 also has that data. That is great and a guarantee of data
> integrity between the two nodes.

That guarantee is never there. The only thing guaranteed is that the
client isn't notified of the commit until db1 has received the data.
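To make this distinction concrete, here is a toy Python model, assuming a single standby and ignoring timing and crashes; Node and commit are hypothetical names, not PostgreSQL internals. The primary writes every record locally before deciding whether to acknowledge, so in synchronous mode a down standby withholds the acknowledgement but not the local write.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A toy server: `wal` lists the transaction ids it has durably stored."""
    name: str
    up: bool = True
    wal: list = field(default_factory=list)


def commit(primary: Node, standby: Node, xid: int, synchronous: bool) -> bool:
    """Return True if the client would be told the transaction committed.

    The primary always writes the record locally first. In synchronous
    mode the acknowledgement waits for the standby; a real backend would
    keep waiting indefinitely, which this model reports as False.
    """
    primary.wal.append(xid)              # local flush happens in every mode
    if standby.up:
        standby.wal.append(xid)          # standby receives the record (no lag modelled)
        return True
    # Standby unreachable: async acknowledges anyway, sync never does.
    return not synchronous
```

If db1 goes down and xid 2 is committed synchronously, commit returns False, yet xid 2 is already in db0's WAL: "the client was not told" is not the same as "the data is not on db0".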

> If we have the following:
>
> db0->db1:down
>
> Using the model (as I understand it) that is being discussed we have
> increased our failure rate because the moment db1:down we also lose db0. The
> node db0 may be up but if it isn't going to process transactions it is
> useless. I can tell you that I have exactly 0 customers that would want that
> model because a single node failure would cause a double node failure.

That's why you should configure a second standby as another (candidate)
synchronous replica, also listed in synchronous_standby_names.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 22:49:59
Message-ID: 18869.1389221399@sss.pgh.pa.us
Lists: pgsql-hackers

"Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
> On 01/08/2014 01:55 PM, Tom Lane wrote:
>> Sync mode is about providing a guarantee that the data exists on more than
>> one server *before* we tell the client it's committed. If you don't need
>> that guarantee, you shouldn't be using sync mode. If you do need it,
>> it's not clear to me why you'd suddenly not need it the moment the going
>> actually gets tough.

> As I understand it what is being suggested is that if a subscriber or
> target goes down, then the master will just sit there and wait. When I
> read that, I read that the master will no longer process write
> transactions. If I am wrong in that understanding then cool. If I am not
> then that is a serious problem with a production scenario. There is an
> expectation that a master will continue to function if the target is
> down, synchronous or not.

Then you don't understand the point of sync mode, and you shouldn't be
using it. The point is *exactly* to refuse to commit transactions unless
we can guarantee the data's been replicated.

There might be other interpretations of "synchronous replication" in which
it makes sense to continue accepting transactions whether or not there are
any up-to-date replicas; but in the meaning Postgres ascribes to the term,
it does not make sense. You should just use async mode if that behavior
is what you want.

Possibly we need to rename "synchronous replication", or document it
better. And I don't have any objection in principle to developing
additional replication modes that offer different sets of guarantees and
performance tradeoffs. But for the synchronous mode that we've got, the
proposed switch is insane, and asking for it merely proves that you don't
understand the difference between async and sync modes.

regards, tom lane


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 22:52:07
Message-ID: 52CDD697.7000800@commandprompt.com
Lists: pgsql-hackers


On 01/08/2014 02:46 PM, Andres Freund wrote:

>> db0->db1 in synchronous mode
>>
>> The idea is that we know that data on db0 is not written until we know for a
>> fact that db1 also has that data. That is great and a guarantee of data
>> integrity between the two nodes.
>
> That guarantee is never there. The only thing guaranteed is that the
> client isn't notified of the commit until db1 has received the data.

Well ugh on that.. but that is for another reply.

>
> That's why you should configure a second standby as another (candidate)
> synchronous replica, also listed in synchronous_standby_names.

I don't have a response to this that does not involve a great deal of
sarcasm.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary
act.", George Orwell


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 22:54:09
Message-ID: 18968.1389221649@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote:
>> Using the model (as I understand it) that is being discussed we have
>> increased our failure rate because the moment db1:down we also lose db0. The
>> node db0 may be up but if it isn't going to process transactions it is
>> useless. I can tell you that I have exactly 0 customers that would want that
>> model because a single node failure would cause a double node failure.

> That's why you should configure a second standby as another (candidate)
> synchronous replica, also listed in synchronous_standby_names.

Right. If you want to tolerate one node failure, *and* have a guarantee
that committed data is on at least two nodes, you need at least three
nodes. Simple arithmetic. If you only have two nodes, you only get to
have one of those properties.
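The arithmetic, sketched in Python: if a cluster of n nodes must survive f node failures and still place each commit on k distinct nodes, then the n - f survivors must number at least k, so n >= k + f. The function names are illustrative only.

```python
def min_nodes(copies: int, tolerated_failures: int) -> int:
    """Smallest cluster that keeps accepting commits after losing
    `tolerated_failures` nodes while still putting each commit on
    `copies` distinct nodes: the n - f survivors must number at least k."""
    if copies < 1 or tolerated_failures < 0:
        raise ValueError("need copies >= 1 and tolerated_failures >= 0")
    return copies + tolerated_failures


def achievable(nodes: int) -> dict:
    """Which of the two properties discussed above a cluster can offer."""
    return {
        "two_copies": nodes >= min_nodes(2, 0),
        "survive_one_failure": nodes >= min_nodes(1, 1),
        "both": nodes >= min_nodes(2, 1),
    }
```

With two nodes, achievable(2) reports each property as possible on its own but "both" as False; three nodes are the minimum for both.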

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 22:56:37
Message-ID: 20140108225637.GH2686@tamriel.snowman.net
Lists: pgsql-hackers

* Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> That's why you should configure a second standby as another (candidate)
> synchronous replica, also listed in synchronous_standby_names.

Perhaps we should stress in the docs that this is, in fact, the *only*
reasonable mode in which to run with sync rep on? Where there are
multiple replicas, because otherwise Drake is correct that you'll just
end up having both nodes go offline if the slave fails.

Thanks,

Stephen


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 22:58:16
Message-ID: 52CDD808.5090302@commandprompt.com
Lists: pgsql-hackers


On 01/08/2014 02:49 PM, Tom Lane wrote:

> Then you don't understand the point of sync mode, and you shouldn't be
> using it. The point is *exactly* to refuse to commit transactions unless
> we can guarantee the data's been replicated.

I understand exactly that and I don't disagree, except in the case where
it is going to bring down the master (see my further reply). I now
remember arguing about this a few years ago when we started down the
sync path.

Anyway, perhaps this is just something of a knob that can be turned. We
don't have to continue the argument. Thank you for considering what I
was saying.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary
act.", George Orwell


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 22:58:51
Message-ID: 20140108225851.GS14280@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-08 17:56:37 -0500, Stephen Frost wrote:
> * Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> > That's why you should configure a second standby as another (candidate)
> > synchronous replica, also listed in synchronous_standby_names.
>
> Perhaps we should stress in the docs that this is, in fact, the *only*
> reasonable mode in which to run with sync rep on? Where there are
> multiple replicas, because otherwise Drake is correct that you'll just
> end up having both nodes go offline if the slave fails.

Which, as it happens, is actually documented.

http://www.postgresql.org/docs/devel/static/warm-standby.html#SYNCHRONOUS-REPLICATION
25.2.7.3. Planning for High Availability

"Commits made when synchronous_commit is set to on or remote_write will
wait until the synchronous standby responds. The response may never
occur if the last, or only, standby should crash.

The best solution for avoiding data loss is to ensure you don't lose
your last remaining synchronous standby. This can be achieved by naming
multiple potential synchronous standbys using
synchronous_standby_names. The first named standby will be used as the
synchronous standby. Standbys listed after this will take over the role
of synchronous standby if the first one should fail."
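A simplified Python sketch of the priority rule quoted above, assuming a plain comma-separated list of names. It ignores details the real server handles, such as the '*' wildcard, quoted names, and whether a connected standby is actually streaming. The function name is hypothetical.

```python
from typing import Optional, Set


def current_sync_standby(standby_names: str, connected: Set[str]) -> Optional[str]:
    """Return the standby currently acting as the synchronous standby.

    Names earlier in the list have priority; a later name takes over only
    while every earlier one is disconnected. None means there is no one
    to acknowledge, so synchronous commits wait.
    """
    for name in (part.strip() for part in standby_names.split(",")):
        if name and name in connected:
            return name
    return None
```

For example, with standby_names "b, c", standby b is used while it is connected, c takes over when only c is connected, and with neither connected the result is None, which is the case where the master stops acknowledging commits.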

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 23:00:41
Message-ID: 52CDD899.5030209@agliodbs.com
Lists: pgsql-hackers

On 01/08/2014 01:49 PM, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> If we really want auto-degrading sync rep, then we'd (at a minimum) need
>> a way to determine *from the replica* whether or not it was in degraded
>> mode when the master died. What good do messages to the master log do
>> you if the master no longer exists?
>
> How would it be possible for a replica to know whether the master had
> committed more transactions while communication was lost, if the master
> dies without ever restoring communication? It sounds like pie in the
> sky from here ...

Oh, right. Because the main reason for a sync replica degrading is that
it's down. In which case it isn't going to record anything. This would
still be useful for sync rep candidates, though, and I'll document why
below. But first, lemme demolish the case for auto-degrade.

So here's the case that we can't possibly solve for auto-degrade.
Anyone who wants auto-degrade needs to come up with a solution for this
case as a first requirement:

1. A data center network/power event starts.

2. The sync replica goes down.

3. A short time later, the master goes down.

4. Data center power is restored.

5. The master is fried and is a permanent loss. The replica is ok, though.

Question: how does the DBA know whether data has been lost or not?

With current sync rep, it's easy: no data was lost, because the master
stopped accepting writes once the replica went down. If we support
auto-degrade, though, there's no way to know; the replica doesn't have
that information, and anything which was on the master is permanently
lost. And the point several people have made is: if you can live with
indeterminacy, then you're better off with async rep in the first place.
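A toy Python replay of the five-step scenario above. It assumes one commit attempt per tick and no replication lag; the replica dies at tick replica_down_at and the master at master_dies_at. The function is hypothetical and returns the commits a client was told had succeeded but which existed only on the destroyed master.

```python
def lost_acknowledged_commits(mode: str, replica_down_at: int, master_dies_at: int) -> list:
    """Ticks whose commit the client saw succeed but that never reached the replica.

    One commit is attempted per tick. "strict" waits for the replica
    forever once it is gone; "auto_degrade" commits on the master alone.
    """
    if mode not in ("strict", "auto_degrade"):
        raise ValueError(f"unknown mode: {mode!r}")
    on_replica, acknowledged = [], []
    for tick in range(master_dies_at):
        if tick < replica_down_at:
            on_replica.append(tick)     # replica confirms, client is told "committed"
            acknowledged.append(tick)
        elif mode == "auto_degrade":
            acknowledged.append(tick)   # stored only on the master
        else:
            break                       # strict: this commit hangs, nothing more is acknowledged
    return [tick for tick in acknowledged if tick not in on_replica]
```

Strict mode loses nothing, while auto-degrade loses every commit made after the replica died. In both runs the surviving replica holds exactly the same records, so nothing on it tells the DBA which case occurred.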

Now, what we COULD definitely use is a single-command way of degrading
the master when the sync replica is down. Something like "ALTER SYSTEM
DEGRADE SYNC". Right now you have to push a change to the conf file and
reload, and there's no way to salvage the transaction which triggered
the sync failure. This would be a nice 9.5 feature.
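"ALTER SYSTEM DEGRADE SYNC" is proposed syntax and does not exist. Assuming PostgreSQL 9.4 or later, where ALTER SYSTEM is available, the manual equivalent is to clear synchronous_standby_names and reload the configuration. The Python helper below (a hypothetical name) only builds those SQL strings; it does not connect to a server.

```python
def set_sync_standbys_sql(standby_names: str) -> list:
    """SQL to change synchronous_standby_names and apply it without a restart.

    An empty string stops commits from waiting on any standby; passing the
    previous list restores synchronous replication.
    """
    quoted = standby_names.replace("'", "''")   # standard SQL literal quoting
    return [
        f"ALTER SYSTEM SET synchronous_standby_names = '{quoted}'",
        "SELECT pg_reload_conf()",
    ]
```

A DBA degrading by hand would run set_sync_standbys_sql(""), then set_sync_standbys_sql with the old list once the replica has caught up again.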

HOWEVER, we've already kind of set up an indeterminate situation with
allowing sync rep groups and candidate sync rep servers. Consider this:

1. Master server A is configured with sync replica B and candidate sync
replica C

2. A rolling power/network failure event occurs, which causes B and C to
go down sometime before A, and all of them to go down before the
application does.

3. On restore, only C is restorable; both A and B are a total loss.

Again, we have no way to know whether or not C was in sync replication
when it went down. If C went down before B, then we've lost data; if B
went down before C, we haven't. But we can't find out. *This* is where
it would be useful to have C log whenever it went into (or out of)
synchronous mode.
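The A/B/C scenario can be replayed with a small Python model, under stated assumptions: A commits once per tick with standbys listed as "B, C", each commit is acknowledged while at least one of them is up, and C receives everything while it is up (optimistic, since a potential standby streams asynchronously and may lag). The function name is hypothetical.

```python
def c_has_every_ack(b_down_at: int, c_down_at: int, a_down_at: int) -> bool:
    """Return True if C, the only survivor, holds every acknowledged commit."""
    on_c, acknowledged = set(), set()
    for tick in range(a_down_at):
        b_up, c_up = tick < b_down_at, tick < c_down_at
        if not (b_up or c_up):
            break                     # no sync standby left: A stops acknowledging
        acknowledged.add(tick)        # B (or C, once B is gone) confirmed it
        if c_up:
            on_c.add(tick)            # C streams whether it is sync or only a candidate
    return acknowledged <= on_c
```

If B fails first, C becomes the synchronous standby and has everything; if C fails first, A keeps committing to B alone and those commits are gone. C's own data does not record which of these happened.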

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 23:04:38
Message-ID: 20140108230438.GT14280@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-08 14:52:07 -0800, Joshua D. Drake wrote:
> On 01/08/2014 02:46 PM, Andres Freund wrote:
> >>The idea is that we know that data on db0 is not written until we know for a
> >>fact that db1 also has that data. That is great and a guarantee of data
> >>integrity between the two nodes.
> >
> >That guarantee is never there. The only thing guaranteed is that the
> >client isn't notified of the commit until db1 has received the data.
>
> Well ugh on that.. but that is for another reply.

You do realize that locally you have the same guarantees? If the client
didn't receive a reply to a COMMIT you won't know whether the tx
committed or not. If that's not sufficient you need to use 2pc and a
transaction manager.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 23:05:47
Message-ID: 20140108230547.GI2686@tamriel.snowman.net
Lists: pgsql-hackers

* Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> On 2014-01-08 17:56:37 -0500, Stephen Frost wrote:
> > * Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> > > That's why you should configure a second standby as another (candidate)
> > > synchronous replica, also listed in synchronous_standby_names.
> >
> > Perhaps we should stress in the docs that this is, in fact, the *only*
> > reasonable mode in which to run with sync rep on? Where there are
> > multiple replicas, because otherwise Drake is correct that you'll just
> > end up having both nodes go offline if the slave fails.
>
> Which, as it happens, is actually documented.

I'm aware, my point was simply that we should state, up-front in
25.2.7.3 *and* where we document synchronous_standby_names, that it
requires at least three servers to be involved to be a workable
solution.

Perhaps we should even log a warning if only one value is found in
synchronous_standby_names...
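The suggested warning could look something like this hypothetical check, written in Python purely for illustration; no such check exists in the server. It treats '*' as matching any standby, so it does not warn for that value.

```python
from typing import List


def sync_standby_warnings(standby_names: str) -> List[str]:
    """Hypothetical sanity check for synchronous_standby_names."""
    names = [part.strip() for part in standby_names.split(",") if part.strip()]
    if len(names) == 1 and names[0] != "*":   # "*" matches any standby
        return [
            f"synchronous_standby_names lists only {names[0]!r}; "
            "if that standby fails, synchronous commits will wait until it "
            "returns or the setting is changed"
        ]
    return []
```

An empty setting means synchronous replication is off, so it produces no warning either.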

Thanks,

Stephen


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 23:15:21
Message-ID: 52CDDC09.50301@agliodbs.com
Lists: pgsql-hackers

Stephen,

> I'm aware, my point was simply that we should state, up-front in
> 25.2.7.3 *and* where we document synchronous_standby_names, that it
> requires at least three servers to be involved to be a workable
> solution.

It's a workable solution with 2 servers. That's a "low-availability,
high-integrity" solution; the user has chosen to double their risk of
not accepting writes against never losing a write. That's a perfectly
valid configuration, and I believe that NTT runs several applications
this way.

In fact, that can already be looked at as a kind of "auto-degrade" mode:
if there aren't two nodes, then the database goes read-only.

Might I also point out that transactions are synchronous or not
individually? The sensible configuration is for only the important
writes being synchronous -- in which case auto-degrade makes even less
sense.
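Per-transaction control is done with SET LOCAL synchronous_commit inside the transaction. A Python sketch that only builds the SQL (the helper name is hypothetical), assuming the values accepted around 9.3: on, remote_write, local and off, of which only on and remote_write wait for a synchronous standby.

```python
# Values accepted by synchronous_commit in the 9.3 era; only "on" and
# "remote_write" make the commit wait for a synchronous standby.
SYNC_COMMIT_LEVELS = {"on", "remote_write", "local", "off"}


def transaction_sql(statements: list, synchronous_commit: str = "on") -> list:
    """Wrap `statements` in a transaction with its own synchronous_commit level."""
    if synchronous_commit not in SYNC_COMMIT_LEVELS:
        raise ValueError(f"unknown synchronous_commit level: {synchronous_commit!r}")
    return [
        "BEGIN",
        f"SET LOCAL synchronous_commit TO {synchronous_commit}",
        *statements,
        "COMMIT",
    ]
```

An application could send its important writes with the default "on" and low-value writes with "local" or "off", so only the former wait for the standby.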

I really think that demand for auto-degrade is coming from users who
don't know what sync rep is for in the first place. The fact that other
vendors are offering auto-degrade as a feature instead of the ginormous
foot-gun it is adds to the confusion, but we can't help that.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 23:17:40
Message-ID: 19484.1389223060@sss.pgh.pa.us
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> I'm aware, my point was simply that we should state, up-front in
> 25.2.7.3 *and* where we document synchronous_standby_names, that it
> requires at least three servers to be involved to be a workable
> solution.

It only requires that if your requirements include both redundant
data storage and tolerating single-node failure. Now admittedly,
most people who want replication want it so they can have failure
tolerance, but I don't think it's insane to say that you want to
stop accepting writes if either node of a 2-node server drops out.
If you can only afford two nodes, and you need guaranteed redundancy
for business reasons, then that's where you end up.

Or in short, I'm against throwing warnings for this kind of setup.
I do agree that we need some doc improvements, since this is
evidently not clear enough yet.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 23:18:26
Message-ID: 20140108231826.GJ2686@tamriel.snowman.net
Lists: pgsql-hackers

Josh,

* Josh Berkus (josh(at)agliodbs(dot)com) wrote:
> > I'm aware, my point was simply that we should state, up-front in
> > 25.2.7.3 *and* where we document synchronous_standby_names, that it
> > requires at least three servers to be involved to be a workable
> > solution.
>
> It's a workable solution with 2 servers. That's a "low-availability,
> high-integrity" solution; the user has chosen to double their risk of
> not accepting writes against never losing a write. That's a perfectly
> valid configuration, and I believe that NTT runs several applications
> this way.

I really don't agree with that when the standby going offline can take
out the master. Note that I didn't say we shouldn't allow it, but I
don't think we should accept that it's a real-world solution.

> I really think that demand for auto-degrade is coming from users who
> don't know what sync rep is for in the first place. The fact that other
> vendors are offering auto-degrade as a feature instead of the ginormous
> foot-gun it is adds to the confusion, but we can't help that.

Do you really feel that a WARNING and increasing the docs to point
out that three systems are necessary, particularly under the 'high
availability' documentation and options, is a bad idea? I fail to see
how that does anything but clarify the use-case for our users.

Thanks,

Stephen


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 23:24:34
Message-ID: 52CDDE32.1050104@agliodbs.com
Lists: pgsql-hackers

On 01/08/2014 03:18 PM, Stephen Frost wrote:
> Do you really feel that a WARNING and increasing the docs to point
> out that three systems are necessary, particularly under the 'high
> availability' documentation and options, is a bad idea? I fail to see
> how that does anything but clarify the use-case for our users.

I think the warning is dumb, and that the suggested documentation change
is insufficient. If we're going to clarify things, then we need to have
a full-on several-page doc showing several examples of different sync
rep configurations and explaining their tradeoffs (including the
different sync modes and per-transaction sync). Anything short of that
is just going to muddy the waters further.

Mind you, someone needs to take a machete to the HA section of the docs
anyway.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 23:27:21
Message-ID: 19681.1389223641@sss.pgh.pa.us
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> HOWEVER, we've already kind of set up an indeterminate situation with
> allowing sync rep groups and candidate sync rep servers. Consider this:

> 1. Master server A is configured with sync replica B and candidate sync
> replica C

> 2. A rolling power/network failure event occurs, which causes B and C to
> go down sometime before A, and all of them to go down before the
> application does.

> 3. On restore, only C is restorable; both A and B are a total loss.

> Again, we have no way to know whether or not C was in sync replication
> when it went down. If C went down before B, then we've lost data; if B
> went down before C, we haven't. But we can't find out. *This* is where
> it would be useful to have C log whenever it went into (or out of)
> synchronous mode.

Good point, but C can't solve this for you just by logging. If C was the
first to go down, it has no way to know whether A and B committed more
transactions before dying; and it's unlikely to have logged its own crash,
either.

More fundamentally, if you want to survive the failure of M out of N
nodes, you need a sync configuration that guarantees data is on at least
M+1 nodes before reporting commit. The above example doesn't meet that,
so it's not surprising that you're screwed.

What we lack, and should work on, is a way for sync mode to have M larger
than one. AFAICS, right now we'll report commit as soon as there's one
up-to-date replica, and some high-reliability cases are going to want
more.
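Tom's condition, together with the availability cost Josh raises in reply, can be written as two small predicates. This is a minimal sketch, not PostgreSQL code: the function names are hypothetical, "copies" is assumed to count the master's own WAL flush plus each synchronous standby acknowledgement, and the availability check assumes any surviving node can be promoted to master.

```python
def survives_failures(copies_at_commit: int, failed_nodes: int) -> bool:
    """Durability: every acknowledged commit still exists on some node after
    `failed_nodes` arbitrary simultaneous losses. Worst case, the failures
    hit the nodes holding copies first, so we need one copy more than that."""
    return copies_at_commit >= failed_nodes + 1


def writes_available(total_nodes: int, copies_at_commit: int, failed_nodes: int) -> bool:
    """Availability: new commits can still be acknowledged.
    Assumption: a surviving node can take over as master and every
    surviving node is eligible to hold one of the required copies."""
    return total_nodes - failed_nodes >= copies_at_commit


if __name__ == "__main__":
    # Master plus one synchronous standby: 2 copies before commit is reported.
    for nodes in (2, 3):
        print(nodes, "nodes, 1 failure:",
              "durable" if survives_failures(2, 1) else "possible loss",
              "/ writes", "continue" if writes_available(nodes, 2, 1) else "block")
```

With two copies per commit, one failure loses no acknowledged data either way, but on two nodes it blocks writes, while on three nodes (one extra candidate standby) writes continue. That is the arithmetic behind the three-server recommendation discussed in this thread.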

regards, tom lane


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 23:51:00
Message-ID: CAMkU=1wDnisbrPCZjcSLuvt2=6_8UVfrFFc1rk5hHt27i6esPQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 8, 2014 at 2:23 PM, Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:

>
> On 01/08/2014 01:55 PM, Tom Lane wrote:
>
>> Sync mode is about providing a guarantee that the data exists on more than
>> one server *before* we tell the client it's committed. If you don't need
>> that guarantee, you shouldn't be using sync mode. If you do need it,
>> it's not clear to me why you'd suddenly not need it the moment the going
>> actually gets tough.
>>
>
> As I understand it what is being suggested is that if a subscriber or
> target goes down, then the master will just sit there and wait. When I read
> that, I read that the master will no longer process write transactions. If
> I am wrong in that understanding then cool. If I am not then that is a
> serious problem with a production scenario. There is an expectation that a
> master will continue to function if the target is down, synchronous or not.
>

My expectation is that the master stops writing checks when it finds it can
no longer cash them.

Cheers,

Jeff


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-08 23:56:06
Message-ID: 52CDE596.3040409@agliodbs.com
Lists: pgsql-hackers

On 01/08/2014 03:27 PM, Tom Lane wrote:
> Good point, but C can't solve this for you just by logging. If C was the
> first to go down, it has no way to know whether A and B committed more
> transactions before dying; and it's unlikely to have logged its own crash,
> either.

Sure. But if we *knew* that C was not in synchronous mode when it went
down, then we'd expect some data loss. As you point out, though, the
converse is not true; even if C was in sync mode, we don't know that
there's been no data loss, since B could come back up as a sync replica
before going down again.

> What we lack, and should work on, is a way for sync mode to have M larger
> than one. AFAICS, right now we'll report commit as soon as there's one
> up-to-date replica, and some high-reliability cases are going to want
> more.

Yeah, we talked about having this when sync rep originally went in. It
involves a LOT more bookkeeping on the master though, which is why nobody
has been willing to attempt it -- and why we went with the
single-replica solution in the first place. Especially since most
people who want "quorum sync" really want MM replication anyway.

"Sync N times" is really just a guarantee against data loss as long as
you lose N-1 servers or fewer. And it becomes an even
lower-availability solution if you don't have at least N+1 replicas.
For that reason, I'd like to see some realistic actual user demand
before we take the idea seriously.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 00:03:55
Message-ID: CAMkU=1zd=jwUmu+wD9gPum3mJD6_27us7AfM+xPF7AN1t1Q6EA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 8, 2014 at 2:56 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:

> * Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> > That's why you should configure a second standby as another (candidate)
> > synchronous replica, also listed in synchronous_standby_names.
>
> Perhaps we should stress in the docs that this is, in fact, the *only*
> reasonable mode in which to run with sync rep on?

I don't think it is the only reasonable way to run it. Most of the time
that the master can't communicate with rep1, it is because of a network
problem. So, the master probably can't talk to rep2 either, and adding the
second one doesn't really get you all that much in terms of availability.

Cheers,

Jeff


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 00:05:58
Message-ID: 28592.1389225958@sss.pgh.pa.us
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> On 01/08/2014 03:27 PM, Tom Lane wrote:
>> What we lack, and should work on, is a way for sync mode to have M larger
>> than one. AFAICS, right now we'll report commit as soon as there's one
>> up-to-date replica, and some high-reliability cases are going to want
>> more.

> "Sync N times" is really just a guarantee against data loss as long as
> you lose N-1 servers or fewer. And it becomes an even
> lower-availability solution if you don't have at least N+1 replicas.
> For that reason, I'd like to see some realistic actual user demand
> before we take the idea seriously.

Sure. I wasn't volunteering to implement it, just saying that what
we've got now is not designed to guarantee data survival across failure
of more than one server. Changing things around the margins isn't
going to improve such scenarios very much.

It struck me after re-reading your example scenario that the most
likely way to figure out what you had left would be to see if some
additional system (think Nagios monitor, or monitors) had records
of when the various database servers went down. This might be
what you were getting at when you said "logging", but the key point
is it has to be logging done on an external server that could survive
failure of the database server. postmaster.log ain't gonna do it.

regards, tom lane


From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 01:01:21
Message-ID: 52CDF4E1.8000604@nasby.net
Lists: pgsql-hackers

On 1/8/14, 6:05 PM, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> On 01/08/2014 03:27 PM, Tom Lane wrote:
>>> What we lack, and should work on, is a way for sync mode to have M larger
>>> than one. AFAICS, right now we'll report commit as soon as there's one
>>> up-to-date replica, and some high-reliability cases are going to want
>>> more.
>> "Sync N times" is really just a guarantee against data loss as long as
>> you lose N-1 servers or fewer. And it becomes an even
>> lower-availability solution if you don't have at least N+1 replicas.
>> For that reason, I'd like to see some realistic actual user demand
>> before we take the idea seriously.
> Sure. I wasn't volunteering to implement it, just saying that what
> we've got now is not designed to guarantee data survival across failure
> of more than one server. Changing things around the margins isn't
> going to improve such scenarios very much.
>
> It struck me after re-reading your example scenario that the most
> likely way to figure out what you had left would be to see if some
> additional system (think Nagios monitor, or monitors) had records
> of when the various database servers went down. This might be
> what you were getting at when you said "logging", but the key point
> is it has to be logging done on an external server that could survive
> failure of the database server. postmaster.log ain't gonna do it.

Yeah, and I think that the logging command that was suggested allows for that *if configured correctly*.

Automatic degradation to async is useful for protecting you against all modes of a single failure: Master fails, you've got the replica. Replica fails, you've got the master.

But fit hits the shan as soon as you get a double failure, and that double failure can be very subtle. Josh's case is not subtle: You lost power AND the master died. You KNOW you have two failures.

But what happens if there's a network blip that's not large enough to notice (but large enough to degrade your replication) and the master dies? Now you have no clue if you've lost data.

Compare this to async: if the master goes down (one failure), you have zero clue if you lost data or not. At least with auto-degradation you know you have to have 2 failures to suffer data loss.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Robert Treat <rob(at)xzilla(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 04:09:01
Message-ID: CABV9wwMcOQbb6YY-dF5qJ=B8di3xrco-bkrMvwj7A8o+jvqv3Q@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 8, 2014 at 6:15 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Stephen,
>
>
>> I'm aware, my point was simply that we should state, up-front in
>> 25.2.7.3 *and* where we document synchronous_standby_names, that it
>> requires at least three servers to be involved to be a workable
>> solution.
>
> It's a workable solution with 2 servers. That's a "low-availability,
> high-integrity" solution; the user has chosen to double their risk of
> not accepting writes against never losing a write. That's a perfectly
> valid configuration, and I believe that NTT runs several applications
> this way.
>
> In fact, that can already be looked at as a kind of "auto-degrade" mode:
> if there aren't two nodes, then the database goes read-only.
>
> Might I also point out that transactions are synchronous or not
> individually? The sensible configuration is for only the important
> writes being synchronous -- in which case auto-degrade makes even less
> sense.
>
> I really think that demand for auto-degrade is coming from users who
> don't know what sync rep is for in the first place. The fact that other
> vendors are offering auto-degrade as a feature instead of the ginormous
> foot-gun it is adds to the confusion, but we can't help that.
>

I think the problem here is that we tend to have a limited view of
"the right way to use synch rep". If I have 5 nodes, and I set 1
synchronous and the other 3 asynchronous, I've set up a "known
successor" in the event that the leader fails. In this scenario
though, if the "successor" fails, you actually probably want to keep
accepting writes; since you weren't using synchronous for durability
but for operational simplicity. I suspect there are probably other
scenarios where users are willing to trade latency for improved and/or
directed durability but not at the expense of availability, don't you?

In fact there are entire systems that provide that type of thing. I
feel like it's worth mentioning that there's a nice primer on tunable
consistency in the Riak docs; strongly recommended.
http://docs.basho.com/riak/1.1.0/tutorials/fast-track/Tunable-CAP-Controls-in-Riak/.
I'm not entirely sure how well it maps into our problem space, but it
at least gives you a sane working model to think about. If you were
trying to explain the Postgres case, async is like the N value (I want
the data to end up on this many nodes eventually) and sync is like the
W value (it must be written to this many nodes, or it should fail). Of
course, we only offer R = 1, W = 1 or 2, and N = all. And it's
worse than that, because we have golden nodes.
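Robert's mapping can be made concrete with a toy acknowledgement check. This is a simplified model, not the server's implementation: the function name and the "sync"/"async" role labels are hypothetical, and the master's own flush is assumed to count as one of the W copies. Only standbys eligible for synchronous replication (the "golden nodes") can help reach W; asynchronous standbys only add to the eventual N.

```python
from typing import List, Tuple


def write_acknowledged(master_up: bool, standbys: List[Tuple[str, bool]], w: int) -> bool:
    """Return True if a commit can be acknowledged under a Riak-style W rule.

    `standbys` holds (role, is_up) pairs, where role is the hypothetical
    label "sync" (listed in synchronous_standby_names) or "async".
    """
    if not master_up:
        return False  # no master, no commits at all
    # Assumption: the master's local WAL flush counts as the first copy.
    copies = 1
    # Only sync-eligible standbys that are up contribute; async ones never do.
    copies += sum(1 for role, up in standbys if role == "sync" and up)
    return copies >= w
```

With w=1 (plain async) the master commits even with every standby down. With w=2 (sync rep as it stands) and the only sync-eligible standby down, commits block even when several async standbys are healthy, which is exactly the situation the proposed auto-degrade mode is meant to escape.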

This isn't to say there isn't a lot of confusion around the issue.
Designing, implementing, and configuring different guarantees in the
presence of node failures is a non-trivial problem. Still, I'd prefer
to see Postgres head in the direction of providing more options in
this area rather than drawing a firm line at being a CP-oriented
system.

Robert Treat
play: xzilla.net
work: omniti.com


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Andres Freund" <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Kevin Grittner" <kgrittn(at)ymail(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>, "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Rajeev rastogi" <rajeev(dot)rastogi(at)huawei(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 12:57:42
Message-ID: 7C5B6172D080441F8FD8171800713398@maumau
Lists: pgsql-hackers

From: "Andres Freund" <andres(at)2ndquadrant(dot)com>
> On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote:
>> If we have the following:
>>
>> db0->db1:down
>>
>> Using the model (as I understand it) that is being discussed we have
>> increased our failure rate because the moment db1:down we also lose db0.
>> The node db0 may be up but if it isn't going to process transactions it is
>> useless. I can tell you that I have exactly 0 customers that would want
>> that model because a single node failure would cause a double node failure.
>
> That's why you should configure a second standby as another (candidate)
> synchronous replica, also listed in synchronous_standby_names.

Let me ask a (probably) stupid question. How is the sync rep different from
RAID-1?

When I first saw sync rep, I expected that it would provide the same
guarantees as RAID-1 in terms of durability (data is always mirrored on two
servers) and availability (if one server goes down, another server continues
full service).

The cost is reasonable with RAID-1. Sync rep requires a much higher cost to
get both durability and availability --- three servers.

Am I expecting too much?

Regards
MauMau


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 14:26:07
Message-ID: 52CEB17F.4040807@2ndQuadrant.com
Lists: pgsql-hackers

On 01/09/2014 05:09 AM, Robert Treat wrote:
> On Wed, Jan 8, 2014 at 6:15 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Stephen,
>>
>>
>>> I'm aware, my point was simply that we should state, up-front in
>>> 25.2.7.3 *and* where we document synchronous_standby_names, that it
>>> requires at least three servers to be involved to be a workable
>>> solution.
>> It's a workable solution with 2 servers. That's a "low-availability,
>> high-integrity" solution; the user has chosen to double their risk of
>> not accepting writes against never losing a write. That's a perfectly
>> valid configuration, and I believe that NTT runs several applications
>> this way.
>>
>> In fact, that can already be looked at as a kind of "auto-degrade" mode:
>> if there aren't two nodes, then the database goes read-only.
>>
>> Might I also point out that transactions are synchronous or not
>> individually? The sensible configuration is for only the important
>> writes being synchronous -- in which case auto-degrade makes even less
>> sense.
>>
>> I really think that demand for auto-degrade is coming from users who
>> don't know what sync rep is for in the first place. The fact that other
>> vendors are offering auto-degrade as a feature instead of the ginormous
>> foot-gun it is adds to the confusion, but we can't help that.
>>
> I think the problem here is that we tend to have a limited view of
> "the right way to use synch rep". If I have 5 nodes, and I set 1
> synchronous and the other 3 asynchronous, I've set up a "known
> successor" in the event that the leader fails.
But there is no guarantee that the synchronous replica actually
is ahead of async ones.

> In this scenario
> though, if the "successor" fails, you actually probably want to keep
> accepting writes; since you weren't using synchronous for durability
> but for operational simplicity. I suspect there are probably other
> scenarios where users are willing to trade latency for improved and/or
> directed durability but not at the expense of availability, don't you?
>
Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 14:30:07
Message-ID: 52CEB26F.7030503@2ndQuadrant.com
Lists: pgsql-hackers

On 01/09/2014 12:05 AM, Stephen Frost wrote:
> * Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
>> On 2014-01-08 17:56:37 -0500, Stephen Frost wrote:
>>> * Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
>>>> That's why you should configure a second standby as another (candidate)
>>>> synchronous replica, also listed in synchronous_standby_names.
>>> Perhaps we should stress in the docs that this is, in fact, the *only*
>>> reasonable mode in which to run with sync rep on? Where there are
>>> multiple replicas, because otherwise Drake is correct that you'll just
>>> end up having both nodes go offline if the slave fails.
>> Which, as it happens, is actually documented.
> I'm aware, my point was simply that we should state, up-front in
> 25.2.7.3 *and* where we document synchronous_standby_names, that it
> requires at least three servers to be involved to be a workable
> solution.
>
> Perhaps we should even log a warning if only one value is found in
> synchronous_standby_names...
You can have only one name in synchronous_standby_names and
still have multiple slaves connecting with that name.
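As we understand the 9.x behaviour, the synchronous standby is chosen by priority: the connected standby whose application_name appears earliest in synchronous_standby_names is synchronous, `*` matches any name, and the others are candidates. The Python below is an illustrative simplification of that rule, not the server's C code; the function name and the tie-breaking (first matching entry in the connected list wins) are assumptions. It covers both the candidate-standby setup Andres describes and the case of several standbys sharing one name.

```python
from typing import List, Optional


def pick_sync_standby(standby_names: str, connected: List[str]) -> Optional[str]:
    """Return the application_name that would act as the synchronous standby,
    or None if no listed standby is connected.

    Simplified model (assumption): names compare case-insensitively, '*'
    matches any standby, an earlier list position means higher priority,
    and ties go to whichever matching standby appears first in `connected`.
    """
    priorities = [n.strip().lower() for n in standby_names.split(",") if n.strip()]
    best_name: Optional[str] = None
    best_rank: Optional[int] = None
    for name in connected:
        for rank, pattern in enumerate(priorities):
            if pattern == "*" or pattern == name.lower():
                if best_rank is None or rank < best_rank:
                    best_name, best_rank = name, rank
                break  # a standby's priority is its first matching entry
    return best_name
```

With `"rep1, rep2"`, rep1 is synchronous while it is connected and rep2 takes over as soon as rep1 drops out; with a single name shared by two standbys, one of them is still picked, so a second standby keeps commits flowing without a second entry in the list.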

Also, I can attest that I have had clients who want exactly that - a system
stop until admin intervention in case of a designated sync standby failing.

And they actually run more than one standby, they just want to make
sure that sync rep to 2nd data center always happens.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Standalone synchronous master
Date: 2014-01-09 14:33:06
Message-ID: 52CEB322.1050904@2ndQuadrant.com
Lists: pgsql-hackers

On 01/09/2014 01:57 PM, MauMau wrote:
> From: "Andres Freund" <andres(at)2ndquadrant(dot)com>
>> On 2014-01-08 14:42:37 -0800, Joshua D. Drake wrote:
>>> If we have the following:
>>>
>>> db0->db1:down
>>>
>>> Using the model (as I understand it) that is being discussed we have
>>> increased our failure rate because the moment db1:down we also lose db0.
>>> The node db0 may be up but if it isn't going to process transactions it is
>>> useless. I can tell you that I have exactly 0 customers that would want
>>> that model because a single node failure would cause a double node failure.
>>
>> That's why you should configure a second standby as another (candidate)
>> synchronous replica, also listed in synchronous_standby_names.
>
> Let me ask a (probably) stupid question. How is the sync rep
> different from RAID-1?
>
> When I first saw sync rep, I expected that it would provide the same
> guarantees as RAID-1 in terms of durability (data is always mirrored
> on two servers) and availability (if one server goes down, another
> server continues full service).
What you describe is closest to async rep.

Sync rep makes sure that data is always replicated before confirming to
writer.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 14:36:22
Message-ID: 52CEB3E6.9020905@2ndQuadrant.com
Lists: pgsql-hackers

On 01/08/2014 11:49 PM, Tom Lane wrote:
> "Joshua D. Drake" <jd(at)commandprompt(dot)com> writes:
>> On 01/08/2014 01:55 PM, Tom Lane wrote:
>>> Sync mode is about providing a guarantee that the data exists on more than
>>> one server *before* we tell the client it's committed. If you don't need
>>> that guarantee, you shouldn't be using sync mode. If you do need it,
>>> it's not clear to me why you'd suddenly not need it the moment the going
>>> actually gets tough.
>> As I understand it what is being suggested is that if a subscriber or
>> target goes down, then the master will just sit there and wait. When I
>> read that, I read that the master will no longer process write
>> transactions. If I am wrong in that understanding then cool. If I am not
>> then that is a serious problem with a production scenario. There is an
>> expectation that a master will continue to function if the target is
>> down, synchronous or not.
> Then you don't understand the point of sync mode, and you shouldn't be
> using it. The point is *exactly* to refuse to commit transactions unless
> we can guarantee the data's been replicated.
For a single-host scenario this would be similar to asking for
a mode which turns fsync=off in case of disk failure :)

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 15:01:10
Message-ID: 52CEB9B6.7030200@2ndQuadrant.com
Lists: pgsql-hackers

On 01/09/2014 02:01 AM, Jim Nasby wrote:
> On 1/8/14, 6:05 PM, Tom Lane wrote:
>> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>>> On 01/08/2014 03:27 PM, Tom Lane wrote:
>>>> What we lack, and should work on, is a way for sync mode to have M larger
>>>> than one. AFAICS, right now we'll report commit as soon as there's one
>>>> up-to-date replica, and some high-reliability cases are going to want
>>>> more.
>>> "Sync N times" is really just a guarantee against data loss as long as
>>> you lose N-1 servers or fewer. And it becomes an even
>>> lower-availability solution if you don't have at least N+1 replicas.
>>> For that reason, I'd like to see some realistic actual user demand
>>> before we take the idea seriously.
>> Sure. I wasn't volunteering to implement it, just saying that what
>> we've got now is not designed to guarantee data survival across failure
>> of more than one server. Changing things around the margins isn't
>> going to improve such scenarios very much.
>>
>> It struck me after re-reading your example scenario that the most
>> likely way to figure out what you had left would be to see if some
>> additional system (think Nagios monitor, or monitors) had records
>> of when the various database servers went down. This might be
>> what you were getting at when you said "logging", but the key point
>> is it has to be logging done on an external server that could survive
>> failure of the database server. postmaster.log ain't gonna do it.
>
> Yeah, and I think that the logging command that was suggested allows
> for that *if configured correctly*.
*But* for relying on this, we would also need to make logging
*synchronous*, which would probably not go down well with many people, as
it makes things even more fragile from an availability viewpoint (and
slower as well).

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Hannu Krosing" <hannu(at)2ndQuadrant(dot)com>, "Andres Freund" <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Kevin Grittner" <kgrittn(at)ymail(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>, "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Rajeev rastogi" <rajeev(dot)rastogi(at)huawei(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 15:15:34
Message-ID: CBA80AC741504640A9466018BB8EBC60@maumau
Lists: pgsql-hackers

From: "Hannu Krosing" <hannu(at)2ndQuadrant(dot)com>
> On 01/09/2014 01:57 PM, MauMau wrote:
>> Let me ask a (probably) stupid question. How is the sync rep
>> different from RAID-1?
>>
>> When I first saw sync rep, I expected that it would provide the same
>> guarantees as RAID-1 in terms of durability (data is always mirrored
>> on two servers) and availability (if one server goes down, another
>> server continues full service).
> What you describe is most like A-sync rep.
>
> Sync rep makes sure that data is always replicated before confirming to
> writer.

Really? RAID-1 is a-sync?

Regards
MauMau


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Standalone synchronous master
Date: 2014-01-09 15:55:22
Message-ID: 52CEC66A.1060204@2ndQuadrant.com
Lists: pgsql-hackers

On 01/09/2014 04:15 PM, MauMau wrote:
> From: "Hannu Krosing" <hannu(at)2ndQuadrant(dot)com>
>> On 01/09/2014 01:57 PM, MauMau wrote:
>>> Let me ask a (probably) stupid question. How is the sync rep
>>> different from RAID-1?
>>>
>>> When I first saw sync rep, I expected that it would provide the same
>>> guarantees as RAID-1 in terms of durability (data is always mirrored
>>> on two servers) and availability (if one server goes down, another
>>> server continues full service).
>> What you describe is most like A-sync rep.
>>
>> Sync rep makes sure that data is always replicated before confirming to
>> writer.
>
> Really? RAID-1 is a-sync?
Not exactly, as there is no "master", just a controller writing to two
equal disks.

But having a "degraded" mode makes it more like async: it continues even
with a single disk and syncs later if and when the 2nd disk comes back.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Standalone synchronous master
Date: 2014-01-09 17:15:37
Message-ID: 20140109171537.GA4873@momjian.us
Lists: pgsql-hackers

On Thu, Jan 9, 2014 at 04:55:22PM +0100, Hannu Krosing wrote:
> On 01/09/2014 04:15 PM, MauMau wrote:
> > From: "Hannu Krosing" <hannu(at)2ndQuadrant(dot)com>
> >> On 01/09/2014 01:57 PM, MauMau wrote:
> >>> Let me ask a (probably) stupid question. How is the sync rep
> >>> different from RAID-1?
> >>>
> >>> When I first saw sync rep, I expected that it would provide the same
> >>> guarantees as RAID-1 in terms of durability (data is always mirrored
> >>> on two servers) and availability (if one server goes down, another
> >>> server continues full service).
> >> What you describe is most like A-sync rep.
> >>
> >> Sync rep makes sure that data is always replicated before confirming to
> >> writer.
> >
> > Really? RAID-1 is a-sync?
> Not exactly, as there is no "master" just controller writing to two
> equal disks.
>
> But having a "degraded" mode makes it
> more like async - it continues even with single disk and syncs later if
> and when the 2nd disk comes back.

I think RAID-1 is a very good comparison because it is successful
technology and has similar issues.

RAID-1 is like Postgres synchronous_standby_names mode in the sense that
the RAID-1 controller will not return success until writes have happened
on both mirrors, but it is unlike synchronous_standby_names in that it
will degrade and continue writes even when it can't write to both
mirrors. What is being discussed is to allow the RAID-1 behavior in
Postgres.

One issue that came up in discussions is the insufficiency of writing a
degrade notice in a server log file because the log file isn't durable
from server failures, meaning you don't know if a fail-over to the slave
lost commits. The degrade message has to be stored durably against a
server failure, e.g. on a pager, probably using a command like we do for
archive_command, and has to return success before the server continues
in degrade mode. I assume degraded RAID-1 controllers inform
administrators in the same way.

I think RAID-1 controllers operate successfully with this behavior
because they are seen as durable and authoritative in reporting the
status of mirrors, while with Postgres, there is no central authority
that can report the degrade status of master/slaves.

Another concern with degrade mode is that once Postgres enters degrade
mode, how does it get back to synchronous_standby_names mode? We could
have each commit wait for the timeout before continuing, but that is
going to make degrade mode unusably slow. Would there be an admin
command? With a timeout to force degrade mode, a temporary network
outage could cause degrade mode, while our current behavior would
recover synchronous_standby_names mode once the network was repaired.
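The degrade/restore cycle described above can be sketched as a small state machine. This is a hypothetical Python model, not code from the patch or from PostgreSQL: `DegradeController`, `notify_degrade` and `notify_restore` are invented names, and the notifiers stand in for an archive_command-style external command that must report success before the mode changes.

```python
import enum
from typing import Callable


class Mode(enum.Enum):
    SYNC = "sync"          # commits wait for the synchronous standby
    DEGRADED = "degraded"  # commits return after the local WAL flush only


class DegradeController:
    """Hypothetical model of auto-degrade; not taken from the patch."""

    def __init__(self, timeout_s: float,
                 notify_degrade: Callable[[], bool],
                 notify_restore: Callable[[], bool]) -> None:
        self.timeout_s = timeout_s
        # Like archive_command: each notifier returns True only on success.
        self.notify_degrade = notify_degrade
        self.notify_restore = notify_restore
        self.mode = Mode.SYNC

    def on_standby_silent(self, silent_for_s: float) -> Mode:
        # Degrade only after the timeout AND only once the durable notice
        # (pager, external monitor) has been accepted.
        if (self.mode is Mode.SYNC and silent_for_s >= self.timeout_s
                and self.notify_degrade()):
            self.mode = Mode.DEGRADED
        return self.mode

    def on_standby_caught_up(self) -> Mode:
        # Return to synchronous commits once the standby has replayed the
        # WAL it missed, again gated on a successful notification.
        if self.mode is Mode.DEGRADED and self.notify_restore():
            self.mode = Mode.SYNC
        return self.mode


ctl = DegradeController(10.0, notify_degrade=lambda: False,
                        notify_restore=lambda: True)
mode = ctl.on_standby_silent(30.0)  # stays Mode.SYNC: notice not delivered
```

Note that the sketch leaves the call to `on_standby_caught_up()` to the caller (an admin command or a replication check), which is exactly the open question of how to get back to synchronous_standby_names mode.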

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 17:36:47
Message-ID: CAMkU=1yqF6KY9B2xCO9uLtG8dZAxn2y=oud7UsyCvQ+ia1_bQQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 8, 2014 at 3:00 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> On 01/08/2014 01:49 PM, Tom Lane wrote:
> > Josh Berkus <josh(at)agliodbs(dot)com> writes:
> >> If we really want auto-degrading sync rep, then we'd (at a minimum) need
> >> a way to determine *from the replica* whether or not it was in degraded
> >> mode when the master died. What good do messages to the master log do
> >> you if the master no longer exists?
> >
> > How would it be possible for a replica to know whether the master had
> > committed more transactions while communication was lost, if the master
> > dies without ever restoring communication? It sounds like pie in the
> > sky from here ...
>
> Oh, right. Because the main reason for a sync replica degrading is that
> it's down. In which case it isn't going to record anything. This would
> still be useful for sync rep candidates, though, and I'll document why
> below. But first, lemme demolish the case for auto-degrade.
>
> So here's the case that we can't possibly solve for auto-degrade.
> Anyone who wants auto-degrade needs to come up with a solution for this
> case as a first requirement:
>

It seems like the only deterministically useful thing to do is to send a
NOTICE to the *client* that the commit has succeeded, but in degraded mode,
so keep your receipts and have your lawyer's number handy. Whether anyone
is willing to add code to the client to process that message is doubtful,
as well as whether the client will even ever receive it if we are in the
middle of a major disruption.

But I think there is a good probabilistic justification for an
auto-degrade mode. (And really, what else is there? There are never any
real guarantees of anything. Maybe none of your replicas ever come back
up. Maybe none of your customers do, either.)

>
> 1. A data center network/power event starts.
>
> 2. The sync replica goes down.
>
> 3. A short time later, the master goes down.
>
> 4. Data center power is restored.
>
> 5. The master is fried and is a permanent loss. The replica is ok, though.
>
> Question: how does the DBA know whether data has been lost or not?
>

What if he had a way of knowing that some data *has* been lost? What can
he do about it? What is the value in knowing it was lost after the fact,
but without the ability to do anything about it?

But let's say that instead of a permanent loss, the master can be brought
back up in a few days after replacing a few components, or in a few weeks
after sending the drives out to clean-room data recovery specialists.
Writing has already failed over to the replica, because you couldn't wait
that long to bring things back up.

Once you get your old master back, you can see if transactions have been
lost, and if they have been, you can dump the tables out to a human-readable
format, use PITR to restore a copy of the replica to the point just before
the failover (although I'm not really sure exactly how to identify that
point) and dump that out, then use 'diff' tools to figure out what changes
to the database were lost, consult with the application specialists to
figure out what the application was doing that led to those changes (if
that is not obvious) and business operations people to figure out how to
apply the analogous changes to the top of the database, and the customer
service VP or someone to figure out how to retroactively fix transactions
that were done after the failover which would have been done differently had
the lost transactions not been lost. Or instead of all that, you could look
at the recovered data and learn that in fact nothing had been lost, so
nothing further needs to be done.

If you were running in async replication mode on a busy server, there is a
virtual certainty that some transactions have been lost. If you were
running in sync mode with possibility of auto-degrade, it is far from
certain. That depends on how long the power event lasted, compared to how
long you had the timeout set to.

Or rather than a data-center-wide power spike, what if your master just
"done fell over" with no drama to the rest of the neighborhood? Inspection
after the fail-over to the replica shows the RAID controller card failed.
There is no reason to think that a RAID controller, in the process of
failing, would have caused the replication to kick into degraded mode. You
know from the surviving logs that the master spent 60 seconds total in
degraded mode over the last 3 months, so there is a 99.999% chance no
confirmed transactions were lost. To be conservative, let's drop it to
99.99% because maybe some unknown mechanism did allow a failing RAID
controller to blip the network card without leaving any evidence behind.
That's a lot better than the chances of lost transactions while in async
replication mode, which could be 99.9% in the other direction.
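The 99.999% estimate is simple arithmetic: if a failure is equally likely at any moment, the chance that it struck while the master was degraded is the degraded time divided by the observation window. A short Python check, taking "3 months" as 90 days (an assumption; the function name is illustrative):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def chance_not_degraded(degraded_s: float, window_days: float) -> float:
    """Fraction of the observation window spent outside degraded mode.

    Assumes the failure is equally likely at any instant of the window.
    """
    return 1.0 - degraded_s / (window_days * SECONDS_PER_DAY)


# 60 seconds in degraded mode over ~3 months, taken here as 90 days.
p = chance_not_degraded(60, 90)
print(f"{p:.6%}")  # 99.999228%
```

The result agrees with the quoted figure; the whole argument rests on the uniform-failure assumption, which a correlated event such as a data-center-wide power problem would break.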

Cheers,

Jeff


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 17:48:11
Message-ID: 20140109174811.GB4873@momjian.us
Lists: pgsql-hackers

On Thu, Jan 9, 2014 at 09:36:47AM -0800, Jeff Janes wrote:
> Oh, right. Because the main reason for a sync replica degrading is that
> it's down. In which case it isn't going to record anything. This would
> still be useful for sync rep candidates, though, and I'll document why
> below. But first, lemme demolish the case for auto-degrade.
>
> So here's the case that we can't possibly solve for auto-degrade.
> Anyone who wants auto-degrade needs to come up with a solution for this
> case as a first requirement:
>
>
> It seems like the only deterministically useful thing to do is to send a NOTICE
> to the *client* that the commit has succeeded, but in degraded mode, so keep
> your receipts and have your lawyer's number handy. Whether anyone is willing
> to add code to the client to process that message is doubtful, as well as
> whether the client will even ever receive it if we are in the middle of a major
> disruption.

I don't think clients are the right place for notification. Clients
running on a single server could have fsync=off set by the admin or
lying drives and never know it. I can't imagine a client only willing to
run if synchronous_standby_names is set.

The synchronous slave is something the administrator has set up and is
responsible for, so the administrator should be notified.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 18:06:24
Message-ID: 52CEE520.8010703@agliodbs.com
Lists: pgsql-hackers

Robert,

> I think the problem here is that we tend to have a limited view of
> "the right way to use synch rep". If I have 5 nodes, and I set 1
> synchronous and the other 3 asynchronous, I've set up a "known
> successor" in the event that the leader fails. In this scenario
> though, if the "successor" fails, you actually probably want to keep
> accepting writes; since you weren't using synchronous for durability
> but for operational simplicity. I suspect there are probably other
> scenarios where users are willing to trade latency for improved and/or
> directed durability but not at the extent of availability, don't you?

That's a workaround for a completely different limitation though; the
inability to designate a specific async replica as "first". That is, if
there were some way to do so, you would be using that rather than sync
rep. Extending the capabilities of that workaround is not something I
would gladly do until I had exhausted other options.

The other problem is that *many* users think they can get improved
availability, consistency AND durability on two nodes somehow, and to
heck with the CAP theorem (certain companies are happy to foster this
illusion). Having a simple, easily accessible auto-degrade without
treating degrade as a major monitoring event will feed this
self-deception. I know I already have to explain the difference between
"synchronous" and "simultaneous" to practically every one of my clients
for whom I set up replication.

Realistically, degrade shouldn't be something that happens inside a
single PostgreSQL node, either the master or the replica. It should be
controlled by some external controller which is capable of deciding on
degrade or not based on a more complex set of circumstances (e.g. "Is
the replica actually down or just slow?"). Certainly this is the case
with Cassandra, VoltDB, Riak, and the other "serious" multinode databases.

> This isn't to say there isn't a lot of confusion around the issue.
> Designing, implementing, and configuring different guarantees in the
> presence of node failures is a non-trivial problem. Still, I'd prefer
> to see Postgres head in the direction of providing more options in
> this area rather than drawing a firm line at being a CP-oriented
> system.

I'm not categorically opposed to having any form of auto-degrade at all;
what I'm opposed to is a patch which adds auto-degrade **without adding
any additional monitoring or management infrastructure at all**.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 18:23:44
Message-ID: CA+U5nM+e30eHjUU4DjbaytP_Hp51P=X6A+ifPkLpQFEAW34z7w@mail.gmail.com
Lists: pgsql-hackers

On 8 January 2014 21:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>> I'm torn on whether we should cave to popular demand on this; but
>> if we do, we sure need to be very clear in the documentation about
>> what a successful return from a commit request means. Sooner or
>> later, Murphy's Law being what it is, if we do this someone will
>> lose the primary and blame us because the synchronous replica is
>> missing gobs of transactions that were successfully committed.
>
> I'm for not caving. I think people who are asking for this don't
> actually understand what they'd be getting.

Agreed.

Just to be clear, I made this mistake initially. Now I realise Heikki
was right and if you think about it long enough, you will too. If you
still disagree, think hard, read the archives until you do.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Jim Nasby <jim(at)nasby(dot)net>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-09 21:33:36
Message-ID: 52CF15B0.70201@nasby.net
Lists: pgsql-hackers

On 1/9/14, 9:01 AM, Hannu Krosing wrote:
>> Yeah, and I think that the logging command that was suggested allows
>> for that *if configured correctly*.
> *But* for relying on this, we would also need to make logging
> *synchronous*,
> which would probably not go down well with many people, as it makes things
> even more fragile from availability viewpoint (and slower as well).

Not really... you only care about monitoring performance when the standby has gone AWOL *and* you haven't sent a notification yet. Once you've notified once, you're done.

So in this case the master won't go down unless you have a double fault: standby goes down AND you can't get to your monitoring.
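The "notify once" rule means only the first notification of an outage sits on the commit path. A hypothetical Python sketch; `DegradeNotifier` and its `send` callable are invented for illustration and stand for whatever external command an administrator would configure:

```python
from typing import Callable


class DegradeNotifier:
    """Hypothetical: block degraded commits only until one notice succeeds."""

    def __init__(self, send: Callable[[], bool]) -> None:
        self.send = send        # stand-in for an admin-configured command
        self.notified = False   # has this outage been reported yet?

    def may_commit_degraded(self) -> bool:
        # Only the first call per outage waits on the external command;
        # after a success, later commits skip it entirely.
        if not self.notified:
            self.notified = self.send()
        return self.notified

    def standby_returned(self) -> None:
        # A new outage will need a new notification.
        self.notified = False
```

If `send` keeps failing, `may_commit_degraded()` keeps returning False: that is the double-fault case, where the standby is gone and monitoring is unreachable, so commits block.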
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 04:51:42
Message-ID: CAA4eK1Ke2_dcX1KY==uW7BaL2_tsWOPWypyeMej2kDHQsxRzOg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 9, 2014 at 10:45 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> I think RAID-1 is a very good comparison because it is successful
> technology and has similar issues.
>
> RAID-1 is like Postgres synchronous_standby_names mode in the sense that
> the RAID-1 controller will not return success until writes have happened
> on both mirrors, but it is unlike synchronous_standby_names in that it
> will degrade and continue writes even when it can't write to both
> mirrors. What is being discussed is to allow the RAID-1 behavior in
> Postgres.
>
> One issue that came up in discussions is the insufficiency of writing a
> degrade notice in a server log file because the log file isn't durable
> from server failures, meaning you don't know if a fail-over to the slave
> lost commits. The degrade message has to be stored durably against a
> server failure, e.g. on a pager, probably using a command like we do for
> archive_command, and has to return success before the server continues
> in degrade mode. I assume degraded RAID-1 controllers inform
> administrators in the same way.

Here I think that if the user is aware from the beginning that this is the
behaviour, then maybe the importance of the message is not very high.
What I want to say is that we could provide a UI in such a way that the
user decides, during setup of the server, which behavior he requires.

For example, we could provide a new parameter
available_synchronous_standby_names along with the current parameter
and ask the user to use this new parameter if he wishes to commit
transactions synchronously on another server when that server is
available, and to operate as a standalone sync master when it is not.

> I think RAID-1 controllers operate successfully with this behavior
> because they are seen as durable and authoritative in reporting the
> status of mirrors, while with Postgres, there is no central authority
> that can report that degrade status of master/slaves.
>
> Another concern with degrade mode is that once Postgres enters degrade
> mode, how does it get back to synchronous_standby_names mode?

It will get back to the mode where a commit waits for the transaction to
reach the other server before completing, once the whole WAL gap has been
resolved. I think that in the new mode it will operate as if
synchronous_standby_names were not set.
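One way to read this proposal is as a per-standby eligibility flag: a listed standby is waited for only while it is connected and after it has closed the WAL gap, and an empty wait list means the master commits as a standalone. A Python sketch under those assumptions; available_synchronous_standby_names is only the parameter proposed above, and `Standby`, `update_eligibility` and `commit_waits_for` are hypothetical names:

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class Standby:
    name: str
    connected: bool = False
    sync_eligible: bool = False  # True once the WAL gap has been closed


def update_eligibility(s: Standby, standby_flush_lsn: int,
                       master_flush_lsn: int) -> None:
    """Hypothetical bookkeeping run on each standby status report.

    A connected standby that later falls behind stays eligible; it drops
    out only when it is considered disconnected (e.g. after a timeout).
    """
    if not s.connected:
        s.sync_eligible = False      # lost standby: stop waiting for it
    elif standby_flush_lsn >= master_flush_lsn:
        s.sync_eligible = True       # gap resolved: resume synchronous commits


def commit_waits_for(standbys: Iterable[Standby], listed: Set[str]) -> List[str]:
    """Standbys a commit must wait for; empty means commit standalone."""
    return [s.name for s in standbys if s.name in listed and s.sync_eligible]
```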

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 06:47:58
Message-ID: CAB7nPqRwv7GOq+3mrvO7Gbb2gt9WWEJRdfKjtGvtynR8U1Vb3g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 10, 2014 at 3:23 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On 8 January 2014 21:40, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Kevin Grittner <kgrittn(at)ymail(dot)com> writes:
>>> I'm torn on whether we should cave to popular demand on this; but
>>> if we do, we sure need to be very clear in the documentation about
>>> what a successful return from a commit request means. Sooner or
>>> later, Murphy's Law being what it is, if we do this someone will
>>> lose the primary and blame us because the synchronous replica is
>>> missing gobs of transactions that were successfully committed.
>>
>> I'm for not caving. I think people who are asking for this don't
>> actually understand what they'd be getting.
>
> Agreed.
>
>
> Just to be clear, I made this mistake initially. Now I realise Heikki
> was right and if you think about it long enough, you will too. If you
> still disagree, think hard, read the archives until you do.
+1. I see far more potential in having an N-sync solution from the
usability viewpoint, and consistency with the existing mechanisms in
place. A synchronous apply mode would be nice as well.
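An N-sync (quorum) rule is easy to state: a commit is confirmed only when at least N standbys have flushed WAL past the commit's LSN. A minimal Python sketch, with LSNs modelled as plain integers and a hypothetical function name:

```python
from typing import Iterable


def commit_confirmed(commit_lsn: int, standby_flush_lsns: Iterable[int],
                     n_required: int) -> bool:
    """True once at least n_required standbys have flushed past commit_lsn."""
    acks = sum(1 for lsn in standby_flush_lsns if lsn >= commit_lsn)
    return acks >= n_required


print(commit_confirmed(500, [600, 450, 510], 2))  # True: two standbys at or past 500
print(commit_confirmed(500, [600, 450], 2))       # False: only one has caught up
```

The second call also shows the availability cost raised earlier in the thread: with N = 2 and only two standbys, one lagging or lost standby is enough to stall commits.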
--
Michael


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 15:47:25
Message-ID: 20140110154725.GD4873@momjian.us
Lists: pgsql-hackers

On Fri, Jan 10, 2014 at 10:21:42AM +0530, Amit Kapila wrote:
> On Thu, Jan 9, 2014 at 10:45 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >
> > I think RAID-1 is a very good comparison because it is successful
> > technology and has similar issues.
> >
> > RAID-1 is like Postgres synchronous_standby_names mode in the sense that
> > the RAID-1 controller will not return success until writes have happened
> > on both mirrors, but it is unlike synchronous_standby_names in that it
> > will degrade and continue writes even when it can't write to both
> > mirrors. What is being discussed is to allow the RAID-1 behavior in
> > Postgres.
> >
> > One issue that came up in discussions is the insufficiency of writing a
> > degrade notice in a server log file because the log file isn't durable
> > from server failures, meaning you don't know if a fail-over to the slave
> > lost commits. The degrade message has to be stored durably against a
> > server failure, e.g. on a pager, probably using a command like we do for
> > archive_command, and has to return success before the server continues
> > in degrade mode. I assume degraded RAID-1 controllers inform
> > administrators in the same way.
>
> Here I think if user is aware from beginning that this is the behaviour,
> then may be the importance of message is not very high.
> What I want to say is that if we provide a UI in such a way that user
> decides during setup of server the behavior that is required by him.
>
> For example, if we provide a new parameter
> available_synchronous_standby_names along with current parameter
> and ask user to use this new parameter, if he wishes to synchronously
> commit transactions on another server when it is available, else it will
> operate as a standalone sync master.

I know there was a desire to remove this TODO item, but I think we have
brought up enough new issues that we can keep it to see if we can come
up with a solution. I have added a link to this discussion on the TODO
item.

I think we will need at least four new GUC variables:

* timeout control for degraded mode
* command to run during switch to degraded mode
* command to run during switch from degraded mode
* read-only variable to report degraded mode
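The four settings listed above could be modelled roughly as follows. The field names and the default timeout are hypothetical (no such GUCs exist); the Python class only illustrates the split between three admin-set values and one read-only status:

```python
from dataclasses import dataclass, field


@dataclass
class DegradeSettings:
    """Hypothetical settings mirroring the four proposed GUCs."""

    degrade_timeout_s: float = 10.0   # timeout control for degraded mode
    to_degraded_command: str = ""     # run during switch to degraded mode
    from_degraded_command: str = ""   # run during switch from degraded mode
    _degraded: bool = field(default=False, init=False, repr=False)

    def __post_init__(self) -> None:
        if self.degrade_timeout_s <= 0:
            raise ValueError("degrade timeout must be positive")

    @property
    def degraded_mode(self) -> bool:
        # Read-only status: no setter, so assignment raises AttributeError.
        return self._degraded
```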

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 16:09:17
Message-ID: CA+U5nMJCCGEczyD4RpFExyeemEZgGfEs1tfnXqwtkwUHGa5M_Q@mail.gmail.com
Lists: pgsql-hackers

On 10 January 2014 15:47, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> I know there was a desire to remove this TODO item, but I think we have
> brought up enough new issues that we can keep it to see if we can come
> up with a solution.

Can you summarise what you think the new issues are? All I see is some
further rehashing of old discussions.

There is already a solution to the "problem" because the docs are
already very clear that you need multiple standbys to achieve commit
guarantees AND high availability. RTFM is usually used as some form of
put down, but that is what needs to happen here.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 17:15:10
Message-ID: 52D02A9E.2050704@2ndQuadrant.com
Lists: pgsql-hackers

On 01/10/2014 05:09 PM, Simon Riggs wrote:
> On 10 January 2014 15:47, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
>> I know there was a desire to remove this TODO item, but I think we have
>> brought up enough new issues that we can keep it to see if we can come
>> up with a solution.
> Can you summarise what you think the new issues are? All I see is some
> further rehashing of old discussions.
>
> There is already a solution to the "problem" because the docs are
> already very clear that you need multiple standbys to achieve commit
> guarantees AND high availability. RTFM is usually used as some form of
> put-down, but that is what needs to happen here.

If we want the guarantees that often come up in "sync rep"
discussions - namely, that you can assume your change has been applied
on the standby when commit returns - then we could implement this by
returning the LSN from commit at the protocol level and having an option in
queries on the standby to wait for this LSN (again passed on the wire below
the level of the query) to be applied.

This can mostly be hidden in drivers and would need very little effort
from the end user. Basically, you tell the driver that one connection
is bound as "the slave" of another, and the driver can manage using the
right LSNs. That is, the last LSN received from the master is always
attached to queries on slaves.
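A toy Python model of this proposal follows. Everything in it is hypothetical: `Primary`, `Standby` and `BoundDriver` are illustrative stand-ins rather than a real driver or wire-protocol API, and LSNs are plain integers. It shows only the ordering rule: the driver remembers the last LSN returned by a commit on the master and allows a query on the bound standby only once the standby has replayed at least that far.

```python
from dataclasses import dataclass


@dataclass
class Primary:
    """Hypothetical stand-in for the master; each commit advances its WAL position."""
    lsn: int = 0

    def commit(self) -> int:
        self.lsn += 1
        # Under the proposal, the commit LSN is returned to the client at protocol level.
        return self.lsn


@dataclass
class Standby:
    """Hypothetical stand-in for a standby that replays WAL on its own schedule."""
    replayed_lsn: int = 0

    def replay_up_to(self, lsn: int) -> None:
        self.replayed_lsn = max(self.replayed_lsn, lsn)


class BoundDriver:
    """Hypothetical driver binding a standby connection as "the slave" of a primary one."""

    def __init__(self, primary: Primary, standby: Standby) -> None:
        self.primary = primary
        self.standby = standby
        self.last_commit_lsn = 0  # last LSN received from the master

    def commit(self) -> int:
        self.last_commit_lsn = self.primary.commit()
        return self.last_commit_lsn

    def standby_read_allowed(self) -> bool:
        # A standby query carries last_commit_lsn and may only run once replay
        # has reached it; a real driver would wait here instead of returning False.
        return self.standby.replayed_lsn >= self.last_commit_lsn


driver = BoundDriver(Primary(), Standby())
lsn = driver.commit()
print(driver.standby_read_allowed())  # False: LSN 1 not yet replayed
driver.standby.replay_up_to(lsn)
print(driver.standby_read_allowed())  # True
```

The check is deliberately non-blocking to keep the sketch small; the point is that read-your-writes only needs the client to carry one LSN, with no change to how commits themselves are replicated.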

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 18:59:23
Message-ID: 52D0430B.7000000@commandprompt.com
Lists: pgsql-hackers


On 01/10/2014 07:47 AM, Bruce Momjian wrote:

> I know there was a desire to remove this TODO item, but I think we have
> brought up enough new issues that we can keep it to see if we can come
> up with a solution. I have added a link to this discussion on the TODO
> item.
>
> I think we will need at least four new GUC variables:
>
> * timeout control for degraded mode
> * command to run during switch to degraded mode
> * command to run during switch from degraded mode
> * read-only variable to report degraded mode
>
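The four knobs listed above can be sketched as a small state machine. This is a hypothetical Python model only: `DegradeConfig`, `DegradingMaster` and their fields are illustrative names, not PostgreSQL settings, and time is passed in explicitly. As in the patch that started this thread, the master only switches mode when the corresponding command succeeds.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DegradeConfig:
    # Hypothetical analogues of the first three settings; none exist in PostgreSQL.
    degrade_timeout: float                 # seconds without a sync standby before degrading
    to_degraded_cmd: Callable[[], bool]    # run on switch to degraded mode; True means success
    from_degraded_cmd: Callable[[], bool]  # run on switch back from degraded mode


class DegradingMaster:
    def __init__(self, cfg: DegradeConfig) -> None:
        self.cfg = cfg
        self.degraded = False  # the read-only status the fourth setting would report
        self.standby_lost_at: Optional[float] = None

    def tick(self, now: float, standby_connected: bool) -> None:
        if standby_connected:
            self.standby_lost_at = None
            # Return to sync mode only if the "from degraded" command succeeds.
            if self.degraded and self.cfg.from_degraded_cmd():
                self.degraded = False
            return
        if self.standby_lost_at is None:
            self.standby_lost_at = now
        timed_out = now - self.standby_lost_at >= self.cfg.degrade_timeout
        # Degrade only after the timeout, and only if the "to degraded" command succeeds.
        if not self.degraded and timed_out and self.cfg.to_degraded_cmd():
            self.degraded = True

    def commit_waits_for_standby(self) -> bool:
        return not self.degraded


m = DegradingMaster(DegradeConfig(10.0, lambda: True, lambda: True))
m.tick(0.0, standby_connected=False)   # standby lost; commits would block
m.tick(5.0, standby_connected=False)   # still within the timeout
print(m.degraded)                      # False
m.tick(10.0, standby_connected=False)
print(m.degraded)                      # True: commits no longer wait
m.tick(12.0, standby_connected=True)
print(m.degraded)                      # False
```

Between losing the standby and the timeout expiring, commits still wait, which is exactly the window Andres later argues can be hit by short network hiccups.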

I know I am the one that instigated all of this, so I want to be very
clear on what I expect and what I am confident my customers would expect.

If a synchronous slave goes down, the master continues to operate. That
is all. I don't care if it is configurable (I would be fine with that).
I don't care if it is not automatic (e.g., the slave goes down and we have
to tell the master to continue).

I have read through this thread more than once, and I have also gone
back to the docs. I understand why we do it the way we do it. I also
understand that, in terms of business requirements, it's wrong for 99%
of CMD's customers. At least in the sense of providing continuity of service.

Sincerely,

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary
act.", George Orwell


From: Jim Nasby <jim(at)nasby(dot)net>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 21:34:58
Message-ID: 52D06782.8010006@nasby.net
Lists: pgsql-hackers

On 1/10/14, 12:59 PM, Joshua D. Drake wrote:
> I know I am the one that instigated all of this, so I want to be very clear on what I expect and what I am confident my customers would expect.
>
> If a synchronous slave goes down, the master continues to operate. That is all. I don't care if it is configurable (I would be fine with that). I don't care if it is not automatic (e.g., the slave goes down and we have to tell the master to continue).
>
> I have read through this thread more than once, and I have also gone back to the docs. I understand why we do it the way we do it. I also understand that, in terms of business requirements, it's wrong for 99% of CMD's customers. At least in the sense of providing continuity of service.

+1

I understand that this is a degradation of full-on sync rep. But there is definite value added with sync-rep that can automatically (or at least easily) degrade to async; it protects you from single failures. I fully understand that it will not protect you from a double failure. That's OK in many cases.
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 21:49:34
Message-ID: 20140110214934.GC28544@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
>
> On 01/10/2014 07:47 AM, Bruce Momjian wrote:
>
> >I know there was a desire to remove this TODO item, but I think we have
> >brought up enough new issues that we can keep it to see if we can come
> >up with a solution. I have added a link to this discussion on the TODO
> >item.
> >
> >I think we will need at least four new GUC variables:
> >
> >* timeout control for degraded mode
> >* command to run during switch to degraded mode
> >* command to run during switch from degraded mode
> >* read-only variable to report degraded mode
> >
>
> I know I am the one that instigated all of this, so I want to be very clear
> on what I expect and what I am confident my customers would expect.
>
> If a synchronous slave goes down, the master continues to operate. That is
> all. I don't care if it is configurable (I would be fine with that). I don't
> care if it is not automatic (e.g., the slave goes down and we have to tell
> the master to continue).

Would you please explain, as precisely as possible, what the advantages of
using a synchronous standby would be in such a scenario?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:02:08
Message-ID: 20140110220208.GX2686@tamriel.snowman.net
Lists: pgsql-hackers

* Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
> > If a synchronous slave goes down, the master continues to operate. That is
> > all. I don't care if it is configurable (I would be fine with that). I don't
> > care if it is not automatic (e.g., the slave goes down and we have to tell
> > the master to continue).
>
> Would you please explain, as precisely as possible, what the advantages of
> using a synchronous standby would be in such a scenario?

In a degraded/failure state, things continue to *work*. In a
non-degraded/failure state, you're able to handle a system failure and
know that you didn't lose any transactions.

Tom's point is correct that you will fail the "have two copies of
everything" requirement in this mode, but that could certainly be acceptable in the
case where there is a system failure. As pointed out by someone
previously, that's how RAID-1 works (which I imagine quite a few of us
use).

I've been thinking about this a fair bit and I've come to like the RAID1
analogy. Stinks that we can't keep things going (automatically) if
either side fails, but perhaps we will one day...

Thanks,

Stephen


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:15:59
Message-ID: 20140110221559.GD28544@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-10 17:02:08 -0500, Stephen Frost wrote:
> * Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
> > > If a synchronous slave goes down, the master continues to operate. That is
> > > all. I don't care if it is configurable (I would be fine with that). I don't
> > > care if it is not automatic (e.g., the slave goes down and we have to tell
> > > the master to continue).
> >
> > Would you please explain, as precisely as possible, what the advantages of
> > using a synchronous standby would be in such a scenario?
>
> In a degraded/failure state, things continue to *work*. In a
> non-degraded/failure state, you're able to handle a system failure and
> know that you didn't lose any transactions.

How do you know that you didn't lose any transactions? Trivial network
hiccups, a restart of the standby, or I/O overload on the standby can all
cause very short interruptions in the walsender connection, leading
to degradation.

> As pointed out by someone
> previously, that's how RAID-1 works (which I imagine quite a few of us
> use).

I don't think that argument makes much sense. RAID-1 isn't safe
as-is. It's only safe if you use some sort of journaling or similar
on top. If you issued a write during a crash, you will normally just get
either the version from before or the version after the last write back,
depending on the state of the individual disks and which disk is treated
as authoritative by the RAID software.

And even if you disregard that, there's not much outside influence that
can lead to losing the connection to a disk drive inside a RAID, other than
an actually broken drive. Any network connection is normally kept *outside*
the level at which you build RAIDs.

Greetings,

Andres Freund



From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:28:55
Message-ID: CAOuzzgr5762f1g_14R3R-+jx6rsVvT095hzXZxyMmLAic8H_Ng@mail.gmail.com
Lists: pgsql-hackers

Andres,

On Friday, January 10, 2014, Andres Freund wrote:

> On 2014-01-10 17:02:08 -0500, Stephen Frost wrote:
> > * Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> > > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
> > > > If a synchronous slave goes down, the master continues to operate. That is
> > > > all. I don't care if it is configurable (I would be fine with that). I don't
> > > > care if it is not automatic (e.g., the slave goes down and we have to tell
> > > > the master to continue).
> > >
> > > Would you please explain, as precisely as possible, what the advantages of
> > > using a synchronous standby would be in such a scenario?
> >
> > In a degraded/failure state, things continue to *work*. In a
> > non-degraded/failure state, you're able to handle a system failure and
> > know that you didn't lose any transactions.
>
> How do you know that you didn't lose any transactions? Trivial network
> hiccups, a restart of the standby, or I/O overload on the standby can all
> cause very short interruptions in the walsender connection, leading
> to degradation.

You know that you haven't *lost* any by virtue of the master still being
up. The case you describe is a double-failure scenario: the link between
the master and slave has to go away AND the master must accept a
transaction and then fail independently.

> > As pointed out by someone
> > previously, that's how RAID-1 works (which I imagine quite a few of us
> > use).
>
> I don't think that argument makes much sense. RAID-1 isn't safe
> as-is. It's only safe if you use some sort of journaling or similar
> on top. If you issued a write during a crash, you will normally just get
> either the version from before or the version after the last write back,
> depending on the state of the individual disks and which disk is treated
> as authoritative by the RAID software.

Uh, then you need a decent RAID controller, and we're talking about after a
transaction commit/sync.

> And even if you disregard that, there's not much outside influence that
> can lead to losing the connection to a disk drive inside a RAID, other than
> an actually broken drive. Any network connection is normally kept *outside*
> the level at which you build RAIDs.

This is a fair point, and perhaps we should have the timeout or jitter GUC
that was proposed elsewhere, but the notion that this configuration is
completely unreasonable is not accurate, and therefore having it would be a
benefit overall.

Thanks,

Stephen


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:29:58
Message-ID: 52D07466.6070005@commandprompt.com
Lists: pgsql-hackers


On 01/10/2014 01:49 PM, Andres Freund wrote:
>>
>> I know I am the one that instigated all of this, so I want to be very clear
>> on what I expect and what I am confident my customers would expect.
>>
>> If a synchronous slave goes down, the master continues to operate. That is
>> all. I don't care if it is configurable (I would be fine with that). I don't
>> care if it is not automatic (e.g., the slave goes down and we have to tell
>> the master to continue).
>
> Would you please explain, as precisely as possible, what the advantages of
> using a synchronous standby would be in such a scenario?

Current behavior:

db01->sync->db02

Transactions are happening. Everything is happy. Website is up. Orders
are being made.

db02 goes down. It doesn't matter why. It is down. Because it is down,
and because we are using sync replication, db01 is for all intents and
purposes also down. We have just lost continuity of service: we can no
longer accept orders, we can no longer allow people to log into the
website, and we can no longer service accounts.

In short, we are out of business.

Proposed behavior:

db01->sync->db02

Transactions are happening. Everything is happy. Website is up. Orders
are being made.

db02 goes down. It doesn't matter why. It is down. db01 continues to
accept orders, allow people to log into the website and we can still
service accounts. The continuity of service continues.

Yes, there are all kinds of things that need to be considered when that
happens, but that isn't the point. The point is that PostgreSQL continues its
uptime guarantee and allows the business to continue to function as if
nothing had happened.

For many businesses, and I dare say the majority, this is enough. They
know that if the slave goes down they can continue to operate. They know
if the master goes down they can fail over. They know that while both
are up they are using sync rep (with various caveats). They are happy.
They like that it is simple and just works. They continue to use PostgreSQL.

Sincerely,

JD



From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:33:55
Message-ID: 20140110223355.GA13568@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
> db02 goes down. It doesn't matter why. It is down. db01 continues to accept
> orders, allow people to log into the website and we can still service
> accounts. The continuity of service continues.

The question is why that configuration is advantageous over an async
configuration. Why, with those requirements, are you using a synchronous
standby at all?

Greetings,

Andres Freund



From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:41:18
Message-ID: 52D0770E.2010405@gmail.com
Lists: pgsql-hackers

On 01/10/2014 02:33 PM, Andres Freund wrote:
> On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
>> db02 goes down. It doesn't matter why. It is down. db01 continues to accept
>> orders, allow people to log into the website and we can still service
>> accounts. The continuity of service continues.
>
> The question is why that configuration is advantageous over an async
> configuration. Why, with those requirements, are you using a synchronous
> standby at all?

+1

>
> Greetings,
>
> Andres Freund
>

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:44:28
Message-ID: 52D077CC.6090308@commandprompt.com
Lists: pgsql-hackers


On 01/10/2014 02:33 PM, Andres Freund wrote:
>
> On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
>> db02 goes down. It doesn't matter why. It is down. db01 continues to accept
>> orders, allow people to log into the website and we can still service
>> accounts. The continuity of service continues.
>
> The question is why that configuration is advantageous over an async
> configuration. Why, with those requirements, are you using a synchronous
> standby at all?

If the master goes down, I can fail over knowing that as many of my
transactions as possible have been replicated.

JD



From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:45:22
Message-ID: 20140110224522.GB13568@awork2.anarazel.de
Lists: pgsql-hackers

Hi,

On 2014-01-10 17:28:55 -0500, Stephen Frost wrote:
> > How do you know that you didn't lose any transactions? Trivial network
> > hiccups, a restart of the standby, or I/O overload on the standby can all
> > cause very short interruptions in the walsender connection, leading
> > to degradation.

> You know that you haven't *lost* any by virtue of the master still being
> up. The case you describe is a double-failure scenario: the link between
> the master and slave has to go away AND the master must accept a
> transaction and then fail independently.

Unfortunately, network outages do correlate with other system
faults. What you're really wishing for is the "I'd like the world to be
friendly to me" mode.
Even if you only have disk problems, quite often when your disks die you
can continue to write (especially with a BBU), but uncached reads
fail. So the walsender connection errors out because a read failed, and
you're degrading into async mode. *Because* your primary is about to die.

> > > As pointed out by someone
> > > previously, that's how RAID-1 works (which I imagine quite a few of us
> > > use).
> >
> > I don't think that argument makes much sense. RAID-1 isn't safe
> > as-is. It's only safe if you use some sort of journaling or similar
> > on top. If you issued a write during a crash, you will normally just get
> > either the version from before or the version after the last write back,
> > depending on the state of the individual disks and which disk is treated
> > as authoritative by the RAID software.

> Uh, then you need a decent RAID controller, and we're talking about after a
> transaction commit/sync.

Yes, if you have a BBU, that memory is authoritative in most cases. But
in that case the argument of having two disks is pretty much pointless:
the SPOF suddenly became the battery + RAM.

Greetings,

Andres Freund



From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:47:13
Message-ID: CAMkU=1xAtZPYRjwr4qtw7bCVDmghXvOhUfgGgKtPrxHBFcVabQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 10, 2014 at 2:33 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
> > db02 goes down. It doesn't matter why. It is down. db01 continues to accept
> > orders, allow people to log into the website and we can still service
> > accounts. The continuity of service continues.
>
> The question is why that configuration is advantageous over an async
> configuration.

Because it is orders of magnitude less likely to lose transactions that
were reported to have been committed. A permanent failure of the master is
almost guaranteed to lose transactions with async. With auto-degrade, a
permanent failure of the master only loses reported-committed transactions
if it co-occurs with a temporary failure of the replica or the network,
lasting longer than the timeout period.
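Jeff's comparison can be written down as a toy Python model. The function name, its parameters and its assumptions (continuous commit traffic, failures independent of each other, and a reconnecting standby that catches up instantly) are illustrative only; the independence assumption is precisely what Andres disputes elsewhere in this thread.

```python
from typing import Optional, Tuple


def loses_acked_commits(
    mode: str,
    fail_at: float,
    lag: float = 0.0,
    outage: Optional[Tuple[float, float]] = None,
    timeout: float = 0.0,
) -> bool:
    """Does a permanent master failure at `fail_at` lose reported-committed transactions?

    mode: "sync", "async", or "degrade" (the proposed auto-degrade behaviour).
    Toy assumptions: commits arrive continuously, failures are independent,
    and a standby that reconnects catches up instantly.
    """
    if mode == "sync":
        # Every acknowledgement waited for the standby; during an outage commits
        # simply block, so nothing acknowledged can be missing.
        return False
    if mode == "async":
        # Commits acknowledged in the last `lag` seconds never reached the standby.
        return lag > 0
    if mode == "degrade":
        if outage is None:
            return False
        start, end = outage
        # The master acknowledges commits without the standby only between
        # start + timeout and end, so only a failure inside that window loses any.
        return start + timeout < fail_at < end
    raise ValueError(f"unknown mode: {mode}")
```

Under these assumptions async loses acknowledged commits whenever there is any replication lag at all, while auto-degrade loses them only if the master dies during a standby or network outage that has already outlasted the timeout.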

> Why, with those requirements, are you using a synchronous
> standby at all?
>

They aren't using a synchronous standby; they are using an asynchronous standby
because we fail to provide the choice they prefer, which is a compromise
between the two.

Cheers,

Jeff


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:47:40
Message-ID: 20140110224740.GC13568@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-10 14:44:28 -0800, Joshua D. Drake wrote:
>
> On 01/10/2014 02:33 PM, Andres Freund wrote:
> >
> >On 2014-01-10 14:29:58 -0800, Joshua D. Drake wrote:
> >>db02 goes down. It doesn't matter why. It is down. db01 continues to accept
> >>orders, allow people to log into the website and we can still service
> >>accounts. The continuity of service continues.
> >
> >The question is why that configuration is advantageous over an async
> >configuration. Why, with those requirements, are you using a synchronous
> >standby at all?
>
> If the master goes down, I can fail over knowing that as many of my
> transactions as possible have been replicated.

It's not like async replication mode delays sending data to the standby
in any way.

Really, the commits themselves are sent to the standby server at exactly the
same speed independent of sync/async. The only thing that's delayed is
the *notification* of the client that sent the commit. Not the commit
itself.
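Andres's point can be illustrated with a toy timeline for a single commit. This Python sketch is a simplification under stated assumptions: `commit_timeline` is a hypothetical helper, the master's own WAL flush is ignored, the network is symmetric, and the standby is assumed to reply once the record is flushed.

```python
def commit_timeline(mode: str, t_commit: float, rtt: float, standby_flush: float):
    """Return (wal_sent, durable_on_standby, client_ack) times for one commit.

    Toy model: the master's local flush is ignored, the walsender ships the
    commit record as soon as it is written, and the network is symmetric.
    """
    wal_sent = t_commit  # identical in both modes
    durable_on_standby = t_commit + rtt / 2 + standby_flush  # identical in both modes
    if mode == "async":
        client_ack = t_commit  # the client is told at once
    elif mode == "sync":
        client_ack = durable_on_standby + rtt / 2  # wait for the standby's reply
    else:
        raise ValueError(f"unknown mode: {mode}")
    return wal_sent, durable_on_standby, client_ack


print(commit_timeline("async", 0.0, 2.0, 1.0))  # (0.0, 2.0, 0.0)
print(commit_timeline("sync", 0.0, 2.0, 1.0))   # (0.0, 2.0, 3.0)
```

The first two fields are the same in both modes; only the time at which the client is told about the commit differs.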

Greetings,

Andres Freund



From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:57:02
Message-ID: CAOuzzgr8AxUQgshF-g9DdEkYOx3+yjn-gP_W8S28nJf_eKZZ4g@mail.gmail.com
Lists: pgsql-hackers

Greetings,

On Friday, January 10, 2014, Andres Freund wrote:

> Hi,
>
> On 2014-01-10 17:28:55 -0500, Stephen Frost wrote:
> > > How do you know that you didn't lose any transactions? Trivial network
> > > hiccups, a restart of the standby, or I/O overload on the standby can all
> > > cause very short interruptions in the walsender connection, leading
> > > to degradation.
>
> > You know that you haven't *lost* any by virtue of the master still being
> > up. The case you describe is a double-failure scenario: the link between
> > the master and slave has to go away AND the master must accept a
> > transaction and then fail independently.
>
> Unfortunately, network outages do correlate with other system
> faults. What you're really wishing for is the "I'd like the world to be
> friendly to me" mode.
> Even if you only have disk problems, quite often when your disks die you
> can continue to write (especially with a BBU), but uncached reads
> fail. So the walsender connection errors out because a read failed, and
> you're degrading into async mode. *Because* your primary is about to die.

That can happen, sure, but I don't agree that the cases of a single drive
with a BBU, or of two drives in a RAID-1 dying at the same time, are
reasonable arguments against this option. Not to mention that, today, if
the master has an issue then we're SOL anyway. Also, if the network fails,
then likely there aren't any new transactions happening.

> > > > As pointed out by someone
> > > > previously, that's how RAID-1 works (which I imagine quite a few of us
> > > > use).
> > >
> > > I don't think that argument makes much sense. Raid-1 isn't safe
> > > as-is. It's only safe if you use some sort of journaling or similar
> > > on top. If you issued a write during a crash you normally will just get
> > > either the version from before or the version after the last write back,
> > > depending on the state on the individual disks and which disk is treated
> > > as authoritative by the raid software.
>
> > Uh, you need a decent raid controller then and we're talking about after a
> > transaction commit/sync.
>
> Yes, if you have a BBU that memory is authoritative in most cases. But
> in that case the argument of having two disks is pretty much pointless,
> the SPOF suddenly became the battery + ram.
>

If that is a concern then use multiple controllers. Certainly not unheard
of; look at SANs...

Thanks,

Stephen


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:59:06
Message-ID: 52D07B3A.5010504@commandprompt.com
Lists: pgsql-hackers


On 01/10/2014 02:47 PM, Andres Freund wrote:

> Really, the commits themselves are sent to the server at exactly the
> same speed independent of sync/async. The only thing that's delayed is
> the *notification* of the client that sent the commit. Not the commit
> itself.

Which is irrelevant to the point that if the standby goes down, we are
now out of business.

Any continuous replication should not be a SPOF. The current behavior
guarantees that a two node sync cluster is a SPOF. The proposed behavior
removes that.

Sincerely,

Joshua D. Drake

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary
act.", George Orwell


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 22:59:55
Message-ID: 52D07B6B.4040303@commandprompt.com
Lists: pgsql-hackers


On 01/10/2014 02:57 PM, Stephen Frost wrote:

> Yes, if you have a BBU that memory is authoritative in most cases. But
> in that case the argument of having two disks is pretty much pointless,
> the SPOF suddenly became the battery + ram.
>
>
> If that is a concern then use multiple controllers. Certainly not
> unheard of; look at SANs...
>

And in PostgreSQL we obviously have the option of having a third or
fourth standby but that isn't the problem we are trying to solve.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary
act.", George Orwell


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 23:09:44
Message-ID: 52D07DB8.5030508@2ndQuadrant.com
Lists: pgsql-hackers

On 01/10/2014 11:59 PM, Joshua D. Drake wrote:
>
> On 01/10/2014 02:57 PM, Stephen Frost wrote:
>
>> Yes, if you have a BBU that memory is authoritative in most cases. But
>> in that case the argument of having two disks is pretty much pointless,
>> the SPOF suddenly became the battery + ram.
>>
>>
>> If that is a concern then use multiple controllers. Certainly not
>> unheard of; look at SANs...
>>
>
> And in PostgreSQL we obviously have the option of having a third or
> fourth standby but that isn't the problem we are trying to solve.

The problem you are trying to solve is a controller with enough
Battery Backed Cache RAM to cache the entire database but with
write-through mode.

And you want it to degrade to write-back in case of disk failure so that
you can continue while the disk is broken.

People here are telling you that this would not be safe; use at least
RAID-1 if you want availability.

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 23:17:34
Message-ID: 52D07F8E.4020501@agliodbs.com
Lists: pgsql-hackers

On 01/10/2014 02:59 PM, Joshua D. Drake wrote:
>
> On 01/10/2014 02:47 PM, Andres Freund wrote:
>
>> Really, the commits themselves are sent to the server at exactly the
>> same speed independent of sync/async. The only thing that's delayed is
>> the *notification* of the client that sent the commit. Not the commit
>> itself.
>
> Which is irrelevant to the point that if the standby goes down, we are
> now out of business.
>
> Any continuous replication should not be a SPOF. The current behavior
> guarantees that a two node sync cluster is a SPOF. The proposed behavior
> removes that.

Again, if that's your goal, then use async replication.

I really don't understand the use-case here.

The purpose of sync rep is to know determinatively whether or not you
have lost data when disaster strikes. If knowing for certain isn't
important to you, then use async.

BTW, people are using RAID1 as an analogy to 2-node sync replication.
That's a very bad analogy, because in RAID1 you have a *single*
controller which is capable of determining if the disks are in a failed
state or not, and this is all happening on a single node where things
like network outages aren't a consideration. It's really not the same
situation at all.

Also, frankly, I absolutely can't count the number of times I've had to
rescue a customer or family member who had RAID1 but wasn't monitoring
syslog, and so one of their disks had been down for months without them
knowing it. Heck, I've done this myself.

So ... the Filesystem geeks have already been through this. Filesystem
clustering started out with systems like DRBD, which includes an
auto-degrade option. However, DRBD with auto-degrade is widely
considered untrustworthy and is a significant portion of why DRBD isn't
trusted today.

From here, clustered filesystems went in two directions: RHCS added
layers of monitoring and management to make auto-degrade a safer option
than it is with DRBD (and still not the default option). Scalable
clustered filesystems added N(M) quorum commit in order to support more
than 2 nodes. Either of these courses is reasonable for us to pursue.
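
A minimal sketch of the N(M) quorum-commit idea, assuming a commit is acknowledged once any M of N standbys report having flushed its WAL position. quorum_reached and the integer LSNs are hypothetical simplifications for illustration, not a PostgreSQL API.

```python
from typing import Dict


def quorum_reached(flushed_lsn: Dict[str, int], commit_lsn: int, m: int) -> bool:
    """Return True once at least m standbys report WAL flushed past commit_lsn."""
    confirmed = sum(1 for lsn in flushed_lsn.values() if lsn >= commit_lsn)
    return confirmed >= m


# N = 3 standbys, quorum M = 2; WAL positions are simplified to plain integers.
standbys = {"s1": 120, "s2": 95, "s3": 130}
print(quorum_reached(standbys, commit_lsn=100, m=2))  # s1 and s3 have it: True

standbys["s3"] = 90  # s3 falls behind or disconnects
print(quorum_reached(standbys, commit_lsn=100, m=2))  # only s1 has it: False
```

With N = 3 and M = 2, losing any single standby does not block commits, which is the property a two-node sync pair lacks.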

What's a bad idea is adding an auto-degrade option without any tools to
manage and monitor it, which is what this patch does by my reading. If
I'm wrong, then someone can point it out to me.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 23:27:10
Message-ID: 52D081CE.4060406@agliodbs.com
Lists: pgsql-hackers

On 01/10/2014 01:49 PM, Andres Freund wrote:
> On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
>>
>> On 01/10/2014 07:47 AM, Bruce Momjian wrote:
>>
>>> I know there was a desire to remove this TODO item, but I think we have
>>> brought up enough new issues that we can keep it to see if we can come
>>> up with a solution. I have added a link to this discussion on the TODO
>>> item.
>>>
>>> I think we will need at least four new GUC variables:
>>>
>>> * timeout control for degraded mode
>>> * command to run during switch to degraded mode
>>> * command to run during switch from degraded mode
>>> * read-only variable to report degraded mode

I would argue that we don't need the first. We just want a command to
switch synchronous/degraded, and a variable (or function) to report on
degraded mode. If we have those things, then it becomes completely
possible for an external monitoring framework, capable of answering
questions like "is the replica down or just slow?", to control
degrading.

Oh, wait! We DO have such a command. It's called ALTER SYSTEM SET!
Recently committed. So this is really a solvable issue if one is
willing to use an external utility.
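
A minimal sketch of the external-controller approach described above, assuming the monitor can measure how long the standby has been silent (how that is measured is left open here). DegradeMonitor and its parameters are hypothetical; only ALTER SYSTEM SET, synchronous_standby_names and pg_reload_conf() are real PostgreSQL features, and the sketch only builds the SQL strings rather than connecting to a server.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DegradeMonitor:
    """Hypothetical external controller that drops or restores sync rep."""

    standby_name: str        # synchronous_standby_names when healthy (plain name, no quotes)
    timeout_s: float         # how long the standby may stay silent before degrading
    degraded: bool = False   # externally tracked "are we degraded?" state

    def step(self, seconds_since_reply: Optional[float]) -> List[str]:
        """Return the SQL to issue for one observation; [] means no change.

        seconds_since_reply is None when no standby connection exists at all.
        """
        silent = seconds_since_reply is None or seconds_since_reply > self.timeout_s
        if silent and not self.degraded:
            self.degraded = True
            # An empty synchronous_standby_names means commits no longer wait for a standby.
            return ["ALTER SYSTEM SET synchronous_standby_names = ''",
                    "SELECT pg_reload_conf()"]
        if not silent and self.degraded:
            self.degraded = False
            return [f"ALTER SYSTEM SET synchronous_standby_names = '{self.standby_name}'",
                    "SELECT pg_reload_conf()"]
        return []


mon = DegradeMonitor(standby_name="standby1", timeout_s=30.0)
for observation in (5.0, None, 45.0, 2.0):
    print(observation, mon.step(observation))
```

A real tool would also need to check that the standby has caught up before restoring sync mode, and to alert an administrator on every switch; those are the management and monitoring pieces that a bare auto-degrade option would be missing.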

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-10 23:38:08
Message-ID: 52D08460.9050807@commandprompt.com
Lists: pgsql-hackers


On 01/10/2014 03:17 PM, Josh Berkus wrote:

>> Any continuous replication should not be a SPOF. The current behavior
>> guarantees that a two node sync cluster is a SPOF. The proposed behavior
>> removes that.
>
> Again, if that's your goal, then use async replication.

I think I have gone about this the wrong way. Async does not meet the
technical or business requirements that I have. Sync does, except that it
increases the possibility of an outage. That is the requirement I am
trying to address.

>
> The purpose of sync rep is to know determinatively whether or not you
> have lost data when disaster strikes. If knowing for certain isn't
> important to you, then use async.

PostgreSQL Sync replication increases the possibility of an outage. That
is incorrect behavior.

I want sync because on the chance that the master goes down, I have as
much data as possible to fail over to. However, I can't use sync because
it increases the possibility that my business will not be able to
function on the chance that the standby goes down.

>
> What's a bad idea is adding an auto-degrade option without any tools to
> manage and monitor it, which is what this patch does by my reading. If

This we absolutely agree on.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary
act.", George Orwell


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 00:19:03
Message-ID: 52D08DF7.8000904@gmail.com
Lists: pgsql-hackers

On 01/10/2014 03:38 PM, Joshua D. Drake wrote:
>
> On 01/10/2014 03:17 PM, Josh Berkus wrote:
>
>>> Any continuous replication should not be a SPOF. The current behavior
>>> guarantees that a two node sync cluster is a SPOF. The proposed behavior
>>> removes that.
>>
>> Again, if that's your goal, then use async replication.
>
> I think I have gone about this the wrong way. Async does not meet the
> technical or business requirements that I have. Sync does, except that it
> increases the possibility of an outage. That is the requirement I am
> trying to address.
>
>>
>> The purpose of sync rep is to know determinatively whether or not you
>> have lost data when disaster strikes. If knowing for certain isn't
>> important to you, then use async.
>
> PostgreSQL Sync replication increases the possibility of an outage. That
> is incorrect behavior.
>
> I want sync because on the chance that the master goes down, I have as
> much data as possible to fail over to. However, I can't use sync because
> it increases the possibility that my business will not be able to
> function on the chance that the standby goes down.
>
>>
>> What's a bad idea is adding an auto-degrade option without any tools to
>> manage and monitor it, which is what this patch does by my reading. If
>
> This we absolutely agree on.

As I see it the state of replication in Postgres is as follows.

1) Async. Runs at the speed of the master as it does not have to wait on
the standby to signal a successful commit. There is some degree of
offset between master and standby(s) due to latency.

2) Sync. Runs at the speed of the standby + latency between master and
standby. This is counterbalanced by knowledge that the master and
standby are in the same state. As Josh Berkus pointed out there is a
loophole in this when multiple standbys are involved.

The topic under discussion is an intermediate mode between 1 and 2.
There seems to be a consensus that this is not unreasonable.

The issue seems to be how to achieve this with ideas falling into
roughly two camps.

A) Change the existing sync mode to allow the master and standby to fall
out of sync should a standby fall over.

B) Create a new mode that does this without changing the existing sync mode.

My two cents would be to implement B. Sync to me is a contract that
master and standby are in sync at any point in time. Anything else
should be called something else. Then it is up to the documentation to
clearly point out the benefits/pitfalls. If you want to implement
something as important as replication without reading the docs then the
results are on you.
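
A minimal sketch of when the client gets its commit acknowledgement in each of the modes listed above, under the simplifying assumption that "declared down" means some timeout has already expired. Mode and may_acknowledge are illustrative names, not PostgreSQL internals.

```python
from enum import Enum


class Mode(Enum):
    ASYNC = "async"          # option 1: acknowledge after the local flush only
    SYNC = "sync"            # option 2: acknowledge only after the standby confirms
    DEGRADING = "degrading"  # option B (hypothetical): sync until the standby is declared down


def may_acknowledge(mode: Mode, standby_confirmed: bool, standby_declared_down: bool) -> bool:
    """Return True if the client may be told that its commit succeeded.

    Per Andres's point earlier in the thread, the commit is already flushed
    on the master in every mode; only the client's acknowledgement waits.
    """
    if mode is Mode.ASYNC:
        return True
    if mode is Mode.SYNC:
        return standby_confirmed
    return standby_confirmed or standby_declared_down


for mode in Mode:
    print(mode.value, may_acknowledge(mode, standby_confirmed=False, standby_declared_down=True))
```

SYNC and DEGRADING differ only in the final clause: once the standby is declared down, acknowledged commits may exist only on the master, so the "in sync at any point in time" contract no longer holds.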

>
> JD
>
>

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 00:25:05
Message-ID: 20140111002505.GA2686@tamriel.snowman.net
Lists: pgsql-hackers

Adrian,

* Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
> A) Change the existing sync mode to allow the master and standby
> to fall out of sync should a standby fall over.

I'm not sure that anyone is arguing for this..

> B) Create a new mode that does this without changing the existing sync mode.
>
> My two cents would be to implement B. Sync to me is a contract that
> master and standby are in sync at any point in time. Anything else
> should be called something else. Then it is up to the documentation
> to clearly point out the benefits/pitfalls. If you want to implement
> something as important as replication without reading the docs then
> the results are on you.

The issue is that there are folks who are arguing, essentially, that
"B" is worthless, wrong, and no one should want it and therefore we
shouldn't have it.

Thanks,

Stephen


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 00:35:09
Message-ID: 52D091BD.70604@gmail.com
Lists: pgsql-hackers

On 01/10/2014 04:25 PM, Stephen Frost wrote:
> Adrian,
>
>
> * Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
>> A) Change the existing sync mode to allow the master and standby
>> to fall out of sync should a standby fall over.
>
> I'm not sure that anyone is arguing for this..

Looks like here, unless I am really missing the point:

http://www.postgresql.org/message-id/52D07466.6070005@commandprompt.com

"Proposed behavior:

db01->sync->db02

Transactions are happening. Everything is happy. Website is up. Orders
are being made.

db02 goes down. It doesn't matter why. It is down. db01 continues to
accept orders, allow people to log into the website and we can still
service accounts. The continuity of service continues.

Yes, there are all kinds of things that need to be considered when that
happens, that isn't the point. The point is, PostgreSQL continues its
uptime guarantee and allows the business to continue to function as (if)
nothing has happened.

For many and I dare say the majority of businesses, this is enough. They
know that if the slave goes down they can continue to operate. They know
if the master goes down they can fail over. They know that while both
are up they are using sync rep (with various caveats). They are happy.
They like that it is simple and just works. They continue to use
PostgreSQL. "

>
>> B) Create a new mode that does this without changing the existing sync mode.
>>
>> My two cents would be to implement B. Sync to me is a contract that
>> master and standby are in sync at any point in time. Anything else
>> should be called something else. Then it is up to the documentation
>> to clearly point out the benefits/pitfalls. If you want to implement
>> something as important as replication without reading the docs then
>> the results are on you.
>
> The issue is that there are folks who are arguing, essentially, that
> "B" is worthless, wrong, and no one should want it and therefore we
> shouldn't have it.

Well you will not please everyone, just displease the least.

>
> Thanks,
>
> Stephen
>

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 00:38:01
Message-ID: 20140111003801.GD2686@tamriel.snowman.net
Lists: pgsql-hackers

Adrian,

* Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
> On 01/10/2014 04:25 PM, Stephen Frost wrote:
> >* Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
> >>A) Change the existing sync mode to allow the master and standby
> >>to fall out of sync should a standby fall over.
> >
> >I'm not sure that anyone is arguing for this..
>
> Looks like here, unless I am really missing the point:

Elsewhere in the thread, JD agreed that having it as an independent
option was fine.

> Well you will not please everyone, just displease the least.

Well, sure, but we do generally try to reach consensus. :)

Thanks,

Stephen


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 00:48:31
Message-ID: 52D094DF.6060604@commandprompt.com
Lists: pgsql-hackers


On 01/10/2014 04:38 PM, Stephen Frost wrote:
> Adrian,
>
> * Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
>> On 01/10/2014 04:25 PM, Stephen Frost wrote:
>>> * Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
>>>> A) Change the existing sync mode to allow the master and standby
>>>> to fall out of sync should a standby fall over.
>>>
>>> I'm not sure that anyone is arguing for this..
>>
>> Looks like here, unless I am really missing the point:
>
> Elsewhere in the thread, JD agreed that having it as an independent
> option was fine.

Yes. I am fine with an independent option.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary
act.", George Orwell


From: Jim Nasby <jim(at)nasby(dot)net>
To: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 00:51:41
Message-ID: 52D0959D.5020003@nasby.net
Lists: pgsql-hackers

On 1/10/14, 6:19 PM, Adrian Klaver wrote:
> 1) Async. Runs at the speed of the master as it does not have to wait on the standby to signal a successful commit. There is some degree of offset between master and standby(s) due to latency.
>
> 2) Sync. Runs at the speed of the standby + latency between master and standby. This is counterbalanced by knowledge that the master and standby are in the same state. As Josh Berkus pointed out there is a loophole in this when multiple standbys are involved.
>
> The topic under discussion is an intermediate mode between 1 and 2. There seems to be a consensus that this is not unreasonable.

That's not what's actually under debate; allow me to restate as option 3:

3) Sync. Everything you said, plus: "If for ANY reason the master can not talk to the slave it becomes read-only."

That's the current state.

What many people want is something along the lines of what you said in 2: The slave ALWAYS has everything the master does (at least on disk) unless the connection between master and slave fails.

The reason people want this is it protects you against a *single* fault. If just the master blows up, you have a 100% reliable slave. If the connection (or the slave itself) blows up, the master is still working.

I agree that there's a non-obvious gotcha here: in the case of a master failure you might also have experienced a connection failure, and without some kind of 3rd party involved you have no way to know that.
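
A simplified sketch of the single-fault argument, assuming one master and one standby, and that "link fails first" means the link dropped and any degrade timeout expired before the master died. The mode names and may_lose_acked_commit are illustrative only.

```python
def may_lose_acked_commit(mode: str, link_fails_first: bool, master_fails: bool) -> bool:
    """Could a commit already acknowledged to a client be missing on the
    standby we fail over to? Simplified two-node model."""
    if not master_fails:
        return False  # no failover needed; the master still has every commit
    if mode == "async":
        return True   # acknowledgements never waited for the standby
    if mode == "sync":
        return False  # with the link down, commits wait instead of being acknowledged
    if mode == "degrade":
        return link_fails_first  # after degrading, the master acknowledges commits alone
    raise ValueError(f"unknown mode: {mode}")


scenarios = {
    "master fails": (False, True),
    "link fails": (True, False),
    "link, then master fails": (True, True),
}
for mode in ("async", "sync", "degrade"):
    for label, (link_first, master) in scenarios.items():
        print(f"{mode:8} {label:24} {may_lose_acked_commit(mode, link_first, master)}")
```

Under these assumptions, degrade survives either single fault but not link-then-master, which is the gotcha just described; and, as Andres points out earlier in the thread, those two faults can be correlated. The model also says nothing about availability: in sync mode the "link fails" case is exactly the outage being objected to.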

We should make best efforts to make that gotcha as clear to users as we can. But just because some users will blindly ignore that doesn't mean we flat-out shouldn't support those that will understand the gotcha and accept its limitations.

BTW, if ALTER SYSTEM SET actually does make it possible to implement automated failover without directly adding it to Postgres then I think a good compromise would be to have an external project that does just that and have the docs reference that project and explain why we haven't built it in.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 01:06:58
Message-ID: 52D09932.50601@gmail.com
Lists: pgsql-hackers

On 01/10/2014 04:48 PM, Joshua D. Drake wrote:
>
> On 01/10/2014 04:38 PM, Stephen Frost wrote:
>> Adrian,
>>
>> * Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
>>> On 01/10/2014 04:25 PM, Stephen Frost wrote:
>>>> * Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
>>>>> A) Change the existing sync mode to allow the master and standby
>>>>> to fall out of sync should a standby fall over.
>>>>
>>>> I'm not sure that anyone is arguing for this..
>>>
>>> Looks like here, unless I am really missing the point:
>>
>> Elsewhere in the thread, JD agreed that having it as an independent
>> option was fine.
>
> Yes. I am fine with an independent option.

I missed that. What confused me and seems to be generally confusing is
the overloading of the term sync:

"Proposed behavior:

db01->sync->db02 "

In my mind if that is an independent option it should have different
name. I propose Schrödinger:)

>
> JD
>
>
>

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 01:21:44
Message-ID: 1389403304.12505.2.camel@vanquo.pezone.net
Lists: pgsql-hackers

On Wed, 2014-01-08 at 17:56 -0500, Stephen Frost wrote:
> * Andres Freund (andres(at)2ndquadrant(dot)com) wrote:
> > That's why you should configure a second standby as another (candidate)
> > synchronous replica, also listed in synchronous_standby_names.
>
> Perhaps we should stress in the docs that this is, in fact, the *only*
> reasonable mode in which to run with sync rep on? Where there are
> multiple replicas, because otherwise Drake is correct that you'll just
> end up having both nodes go offline if the slave fails.

It's not unreasonable to run with only two if the writers are consuming
from a reliable message queue (or another system that maintains its own
reliable persistence). Then you can just continue processing messages
after you have repaired your replication pair.
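
A minimal sketch of the message-queue pattern described above, assuming the queue retains each message until the consumer confirms it was durably written, and that rows are keyed by a message id so re-applying a message is harmless. The queue and database are plain in-memory stand-ins; nothing about a specific message broker is implied.

```python
from typing import Dict, List, Tuple


def replay(queue: List[Tuple[int, str]], db: Dict[int, str], from_offset: int) -> int:
    """Apply queued messages from from_offset onward; return the next offset to read.

    Rows are keyed by message id, so applying a message twice is harmless and
    replaying from an older offset after a failover is safe.
    """
    for msg_id, payload in queue[from_offset:]:
        db[msg_id] = payload  # idempotent upsert keyed by message id
    return len(queue)


queue = [(1, "order-a"), (2, "order-b"), (3, "order-c"), (4, "order-d")]
surviving_db = {1: "order-a"}  # the node we kept had only the first message
next_offset = replay(queue, surviving_db, from_offset=1)
print(next_offset, surviving_db)
```

Under these assumptions the queue, not the database pair, is the durable record of unprocessed work, so an outage of the replication pair delays processing rather than losing acknowledged messages.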


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 02:25:26
Message-ID: 20140111022526.GG15692@momjian.us
Lists: pgsql-hackers

On Fri, Jan 10, 2014 at 03:17:34PM -0800, Josh Berkus wrote:
> The purpose of sync rep is to know determinatively whether or not you
> have lost data when disaster strikes. If knowing for certain isn't
> important to you, then use async.
>
> BTW, people are using RAID1 as an analogy to 2-node sync replication.
> That's a very bad analogy, because in RAID1 you have a *single*
> controller which is capable of determining if the disks are in a failed
> state or not, and this is all happening on a single node where things
> like network outages aren't a consideration. It's really not the same
> situation at all.
>
> Also, frankly, I absolutely can't count the number of times I've had to
> rescue a customer or family member who had RAID1 but wasn't monitoring
> syslog, and so one of their disks had been down for months without them
> knowing it. Heck, I've done this myself.
>
> So ... the Filesystem geeks have already been through this. Filesystem
> clustering started out with systems like DRBD, which includes an
> auto-degrade option. However, DRBD with auto-degrade is widely
> considered untrustworthy and is a significant portion of why DRBD isn't
> trusted today.
>
> From here, clustered filesystems went in two directions: RHCS added
> layers of monitoring and management to make auto-degrade a safer option
> than it is with DRBD (and still not the default option). Scalable
> clustered filesystems added N(M) quorum commit in order to support more
> than 2 nodes. Either of these courses is reasonable for us to pursue.
>
> What's a bad idea is adding an auto-degrade option without any tools to
> manage and monitor it, which is what this patch does by my reading. If
> I'm wrong, then someone can point it out to me.

Yes, my big take-away from the discussion is that informing the admin in
a durable way is a requirement for this degraded mode. You are right
that many ignore RAID degradation warnings, but with the warnings
heeded, degraded functionality can be useful.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 02:27:37
Message-ID: 20140111022737.GH15692@momjian.us
Lists: pgsql-hackers

On Fri, Jan 10, 2014 at 03:27:10PM -0800, Josh Berkus wrote:
> On 01/10/2014 01:49 PM, Andres Freund wrote:
> > On 2014-01-10 10:59:23 -0800, Joshua D. Drake wrote:
> >>
> >> On 01/10/2014 07:47 AM, Bruce Momjian wrote:
> >>
> >>> I know there was a desire to remove this TODO item, but I think we have
> >>> brought up enough new issues that we can keep it to see if we can come
> >>> up with a solution. I have added a link to this discussion on the TODO
> >>> item.
> >>>
> >>> I think we will need at least four new GUC variables:
> >>>
> >>> * timeout control for degraded mode
> >>> * command to run during switch to degraded mode
> >>> * command to run during switch from degraded mode
> >>> * read-only variable to report degraded mode
>
> I would argue that we don't need the first. We just want a command to
> switch synchronous/degraded, and a variable (or function) to report on
> degraded mode. If we have those things, then it becomes completely
> possible to have an external monitoring framework, which is capable of
> answering questions like "is the replica down or just slow?", to control
> degradation.
>
> Oh, wait! We DO have such a command. It's called ALTER SYSTEM SET!
> Recently committed. So this is really a solvable issue if one is
> willing to use an external utility.

How would that work? Would it be a tool in contrib? There already is a
timeout, so if a tool checked more frequently than the timeout, it
should work. The durable notification of the admin would happen in the
tool, right?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 07:59:23
Message-ID: CAA4eK1J9H6K=A4uk0Qz1M1gCSK8E9O2DSSdzy5mLYSz6yfxd6g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 10, 2014 at 9:17 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Fri, Jan 10, 2014 at 10:21:42AM +0530, Amit Kapila wrote:
>> Here I think if the user is aware from the beginning that this is the
>> behaviour, then maybe the message is not very important.
>> What I want to say is that we could provide a UI in which the user
>> decides, while setting up the server, which behavior he requires.
>>
>> For example, we could provide a new parameter,
>> available_synchronous_standby_names, alongside the current parameter
>> and ask users to use it if they wish to synchronously
>> commit transactions on another server whenever it is available; otherwise
>> the master will operate as a standalone sync master.
>
> I know there was a desire to remove this TODO item, but I think we have
> brought up enough new issues that we can keep it to see if we can come
> up with a solution.

I am not saying any such thing; rather, I am suggesting another way
to provide this new mode.

> I have added a link to this discussion on the TODO
> item.
>
> I think we will need at least four new GUC variables:
>
> * timeout control for degraded mode
> * command to run during switch to degraded mode
> * command to run during switch from degraded mode
> * read-only variable to report degraded mode

Okay, this is one way of providing this new mode; others could be:

a.
Have just one GUC, sync_standalone_mode = true|false, and make
it a PGC_POSTMASTER parameter, so that the user is only
allowed to set this mode at startup. Even if we don't want it as a
postmaster parameter, we can tell users that they can
change this parameter only before the server gets into this situation.
I understand that without an alarm or some other mechanism, it is difficult
for the user to know when to change it, but I think in that case he should
set it before server startup.

b.
Along the same lines, instead of a boolean parameter, provide a parameter
similar to the current one, such as available_synchronous_standby_names,
whose setting would follow what I said in point a. The benefit of this
compared to 'a' is that it looks more like what we currently have.

I think if we try to solve this problem by letting the user
change it at runtime, or when the problem actually occurs, it can
make the UI more complex and make it difficult for us to provide a way for
the user to be alerted to such a situation. We can keep our options open,
so that if in future we find a reasonable way, we can
provide the user a mechanism for changing this at runtime, but I don't
think that stops us from providing a way for the user to get the
benefit of this mode through a startup-time parameter.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 16:11:15
Message-ID: 20140111161115.GJ15692@momjian.us
Lists: pgsql-hackers

On Sat, Jan 11, 2014 at 01:29:23PM +0530, Amit Kapila wrote:
> Okay, this is one way of providing this new mode; others could be:
>
> a.
> Have just one GUC, sync_standalone_mode = true|false, and make
> it a PGC_POSTMASTER parameter, so that the user is only
> allowed to set this mode at startup. Even if we don't want it as a
> postmaster parameter, we can tell users that they can
> change this parameter only before the server gets into this situation.
> I understand that without an alarm or some other mechanism, it is difficult
> for the user to know when to change it, but I think in that case he should
> set it before server startup.
>
> b.
> Along the same lines, instead of a boolean parameter, provide a parameter
> similar to the current one, such as available_synchronous_standby_names,
> whose setting would follow what I said in point a. The benefit of this
> compared to 'a' is that it looks more like what we currently have.
>
> I think if we try to solve this problem by letting the user
> change it at runtime, or when the problem actually occurs, it can
> make the UI more complex and make it difficult for us to provide a way for
> the user to be alerted to such a situation. We can keep our options open,
> so that if in future we find a reasonable way, we can
> provide the user a mechanism for changing this at runtime, but I don't
> think that stops us from providing a way for the user to get the
> benefit of this mode through a startup-time parameter.

I am not sure how this would work. Right now we wait for one of the
synchronous_standby_names servers to verify the writes. We need some
way of telling the system how long to wait before continuing in degraded
mode. Without a timeout and admin notification, it doesn't seem much
better than our async mode, which is what many people were complaining
about.
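The requirement above (a bounded wait plus admin notification) can be made concrete with a minimal Python sketch. Everything in it is hypothetical: `SyncCommitWaiter`, `degrade_timeout_s` and `notify_admin` are illustrative names rather than PostgreSQL APIs, and a `threading.Event` stands in for the standby's acknowledgement. It shows a commit that waits at most `degrade_timeout_s` for the ack and, on timeout, switches to degraded mode and calls the notification hook (where a durable alert would be raised) exactly once. Switching back to sync mode is not modeled.

```python
import threading
from typing import Callable


class SyncCommitWaiter:
    """Hypothetical model of a commit waiting for a synchronous standby's ack."""

    def __init__(self, degrade_timeout_s: float,
                 notify_admin: Callable[[str], None]) -> None:
        self.degrade_timeout_s = degrade_timeout_s
        self.notify_admin = notify_admin  # where a durable alert would be raised
        self.degraded = False

    def wait_for_ack(self, ack: threading.Event) -> bool:
        """Return True if the standby confirmed in time, False if degraded."""
        if self.degraded:
            return False  # already degraded: commit without waiting
        if ack.wait(timeout=self.degrade_timeout_s):
            return True
        # Timed out: switch to degraded mode and tell the admin exactly once.
        self.degraded = True
        self.notify_admin(
            f"no standby ack within {self.degrade_timeout_s}s; "
            "continuing in degraded mode")
        return False
```

Without the notification hook, this degenerates into the silent async fallback that Bruce says is not much better than async mode.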

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Joshua D(dot) Drake <jd(at)commandprompt(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 17:28:31
Message-ID: D8DE9B06-5D4C-4837-A79F-17A79BD0E472@phlo.org
Lists: pgsql-hackers

On Jan11, 2014, at 01:48 , Joshua D. Drake <jd(at)commandprompt(dot)com> wrote:
> On 01/10/2014 04:38 PM, Stephen Frost wrote:
>> Adrian,
>>
>> * Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
>>> On 01/10/2014 04:25 PM, Stephen Frost wrote:
>>>> * Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
>>>>> A) Change the existing sync mode to allow the master and standby
>>>>> to fall out of sync should a standby fall over.
>>>>
>>>> I'm not sure that anyone is arguing for this..
>>>
>>> Looks like here, unless I am really missing the point:
>>
>> Elsewhere in the thread, JD agreed that having it as an independent
>> option was fine.
>
> Yes. I am fine with an independent option.

Hm, I was about to suggest that you can set statement_timeout before
doing COMMIT to limit the amount of time you want to wait for the
standby to respond. Interestingly, however, that doesn't seem to work,
which is weird, since AFAICS statement_timeout simply generates a
query cancel request after the timeout has elapsed, and cancelling
the COMMIT with Ctrl-C in psql *does* work.

I'm quite probably missing something, but what?

best regards,
Florian Pflug


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 17:51:32
Message-ID: 14882.1389462692@sss.pgh.pa.us
Lists: pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> writes:
> Hm, I was about to suggest that you can set statement_timeout before
> doing COMMIT to limit the amount of time you want to wait for the
> standby to respond. Interestingly, however, that doesn't seem to work,
> which is weird, since AFAICS statement_timeout simply generates a
> query cancel request after the timeout has elapsed, and cancelling
> the COMMIT with Ctrl-C in psql *does* work.

> I'm quite probably missing something, but what?

finish_xact_command() disables statement timeout before committing.

Not sure about the pros and cons of doing that later in the sequence.
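Tom's explanation can be illustrated with a toy timeline model. This is a simplification, not how the backend is structured: a COMMIT is assumed to consist of an "execute" phase followed by the synchronous-replication wait, and statement_timeout is assumed to fire once the elapsed time since statement start passes the limit while the timer is armed. `Phase` and `first_cancel` are hypothetical names. Because the timer is disarmed before committing, the long sync wait is never cancelled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Phase:
    name: str
    duration_s: float   # time spent in this phase
    timer_armed: bool   # is statement_timeout armed during this phase?


def first_cancel(phases: List[Phase], statement_timeout_s: float) -> Optional[str]:
    """Name of the phase in which statement_timeout would fire, or None.

    Elapsed time is counted from the start of the statement; the timeout
    only fires while the timer is armed.
    """
    elapsed = 0.0
    for phase in phases:
        elapsed += phase.duration_s
        if phase.timer_armed and elapsed > statement_timeout_s:
            return phase.name
    return None


# Ordering Tom describes: timer disarmed before the commit and its sync wait.
current = [Phase("execute", 0.1, True), Phase("sync_rep_wait", 60.0, False)]
# What Florian expected: timer still armed during the sync wait.
expected = [Phase("execute", 0.1, True), Phase("sync_rep_wait", 60.0, True)]
print(first_cancel(current, 5.0))   # None
print(first_cancel(expected, 5.0))  # sync_rep_wait
```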

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 17:53:28
Message-ID: 20140111175328.GD13568@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-11 18:28:31 +0100, Florian Pflug wrote:
> Hm, I was about to suggest that you can set statement_timeout before
> doing COMMIT to limit the amount of time you want to wait for the
> standby to respond. Interestingly, however, that doesn't seem to work,
> which is weird, since AFAICS statement_timeout simply generates a
> query cancel request after the timeout has elapsed, and cancelling
> the COMMIT with Ctrl-C in psql *does* work.

I think that'd be a pretty bad API since you won't know whether the
commit failed or succeeded but replication timed out. There very well
might have been long-running constraint triggers or such taking a long
time.
So it really would need a separate GUC.
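Andres's objection can be sketched by comparing what a client could learn under two hypothetical designs: a single statement-wide timeout that (unlike today, per Tom's note on finish_xact_command()) stays armed through the sync wait, and a dedicated timeout that covers only the wait for the standby. All names are illustrative, not existing PostgreSQL settings, and the timing model is deliberately simplified.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    ABORTED = "aborted"        # failed before the local commit; nothing persisted
    REPLICATED = "replicated"  # committed locally and confirmed by the standby
    LOCAL_ONLY = "local_only"  # committed locally; standby confirmation timed out


def statement_timeout_view(precommit_s: float, ack_s: Optional[float],
                           timeout_s: float) -> str:
    """Hypothetical: one timeout covering the whole COMMIT, sync wait included.

    precommit_s: time in deferred/constraint triggers before the local commit
    ack_s: seconds from local commit until the standby acks (None = never)
    """
    if precommit_s > timeout_s:
        return "timed out"  # really: aborted, nothing committed
    if ack_s is None or precommit_s + ack_s > timeout_s:
        return "timed out"  # really: committed locally, not confirmed
    return "ok"


def dedicated_timeout_view(precommit_failed: bool, ack_s: Optional[float],
                           sync_wait_timeout_s: float) -> Outcome:
    """Hypothetical: a separate timeout that applies only to the sync wait."""
    if precommit_failed:
        return Outcome.ABORTED
    if ack_s is not None and ack_s <= sync_wait_timeout_s:
        return Outcome.REPLICATED
    return Outcome.LOCAL_ONLY
```

With the statement-wide timeout, a slow constraint trigger (nothing committed) and a dead standby (committed locally) both surface as "timed out"; the dedicated timeout keeps the outcomes distinct.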

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Stephen Frost <sfrost(at)snowman(dot)net>, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 21:29:02
Message-ID: 52D1B79E.8060509@catalyst.net.nz
Lists: pgsql-hackers

On 11/01/14 13:25, Stephen Frost wrote:
> Adrian,
>
>
> * Adrian Klaver (adrian(dot)klaver(at)gmail(dot)com) wrote:
>> A) Change the existing sync mode to allow the master and standby
>> to fall out of sync should a standby fall over.
>
> I'm not sure that anyone is arguing for this..
>
>> B) Create a new mode that does this without changing the existing sync mode.
>>
>> My two cents would be to implement B. Sync to me is a contract that
>> master and standby are in sync at any point in time. Anything else
>> should be called something else. Then it is up to the documentation
>> to clearly point out the benefits/pitfalls. If you want to implement
>> something as important as replication without reading the docs then
>> the results are on you.
>
> The issue is that there are folks who are arguing, essentially, that
> "B" is worthless, wrong, and no one should want it and therefore we
> shouldn't have it.
>

We have some people who clearly do want it (and seemed to have provided
sensible arguments about why it might be worthwhile), and the others who
say they should not.

My 2c is:

The current behavior in CAP theorem speak is 'Cap' - i.e. focused on
consistency at the expense of availability. A reasonable thing to want.

The other behavior being asked for is 'cAp' - i.e. focused on
availability. Also a reasonable configuration to want. Now the desire to
use sync rather than async is to achieve as much consistency as
possible, which is also reasonable.

I think an option to control whether we operate 'Cap' or 'cAp'
(defaulting to the current 'Cap' I guess) is probably the best solution.
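The option suggested above could reduce, at its decision point, to something like the following minimal sketch. `Priority`, `DEFAULT_PRIORITY` and `on_missing_ack` are hypothetical names; the sketch only captures the choice a committing backend faces while a standby ack is overdue, defaulting to the current 'Cap' behaviour. A real 'cAp' mode would likely need to differ in more places than this one branch.

```python
from enum import Enum


class Priority(Enum):
    CONSISTENCY = "Cap"   # current sync rep: never return a commit without an ack
    AVAILABILITY = "cAp"  # proposed: stop waiting after a timeout, run standalone


DEFAULT_PRIORITY = Priority.CONSISTENCY  # keep today's behaviour unless asked


def on_missing_ack(priority: Priority, waited_s: float, timeout_s: float) -> str:
    """Decide what a committing backend does while its standby ack is overdue."""
    if priority is Priority.AVAILABILITY and waited_s >= timeout_s:
        return "degrade"
    return "wait"
```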

Regards

Mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-11 21:59:51
Message-ID: 13657.1389477591@sss.pgh.pa.us
Lists: pgsql-hackers

Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes [slightly rearranged]
> My 2c is:

> The current behavior in CAP theorem speak is 'Cap' - i.e. focused on
> consistency at the expense of availability. A reasonable thing to want.

> The other behavior being asked for is 'cAp' - i.e. focused on
> availability. Also a reasonable configuration to want.

> I think an option to control whether we operate 'Cap' or 'cAp'
> (defaulting to the current 'Cap' I guess) is probably the best solution.

The above is all perfectly reasonable. The argument that's not been made
to my satisfaction is that the proposed patch is a good implementation of
'cAp'-optimized behavior. In particular,

> ... Now the desire to
> use sync rather than async is to achieve as much consistency as
> possible, which is also reasonable.

I don't think that the existing sync mode is designed to do that, and
simply lobotomizing it as proposed doesn't get you there. I think we
need a replication mode that's been designed *from the ground up*
with cAp priorities in mind. There may end up being only a few actual
differences in behavior --- but I fear that some of those differences
will be crucial.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 03:18:02
Message-ID: 52D2096A.2090001@agliodbs.com
Lists: pgsql-hackers

On 01/10/2014 06:27 PM, Bruce Momjian wrote:
> How would that work? Would it be a tool in contrib? There already is a
> timeout, so if a tool checked more frequently than the timeout, it
> should work. The durable notification of the admin would happen in the
> tool, right?

Well, you know what tool *I'm* planning to use.

Thing is, when we talk about auto-degrade, we need to determine things
like "Is the replica down, or is this just a network blip?" and take
action according to the user's desired configuration. This is not
something, realistically, that we can do on a single request. Whereas
it would be fairly simple for an external monitoring utility to do:

1. decide replica is offline for the duration (several poll attempts
have failed)

2. Send ALTER SYSTEM SET to the master and change/disable the
synchronous_standby_names.

Such a tool would *also* be capable of detecting when the synchronous
replica was back up and operating, and switch back to sync mode,
something we simply can't do inside Postgres. And it would be a lot
easier to configure an external tool with monitoring system integration
so that it can alert the DBA to degradation in a way which the DBA was
liable to actually see (which is NOT the Postgres log).

In other words, if we're going to have auto-degrade, the most
intelligent place for it is in
RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
place. Anything we do *inside* Postgres is going to have a really,
really hard time determining when to degrade.
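The two steps above, plus the switch back to sync mode, could be implemented in an external tool roughly as follows. This is a minimal sketch under stated assumptions: `DegradeMonitor`, its thresholds, and the polling loop that feeds it are hypothetical, and the tool is assumed to send the returned statements to the master (and raise its own alert) through whatever client and monitoring integration it already has. `ALTER SYSTEM SET` and `pg_reload_conf()` are real; setting `synchronous_standby_names` to an empty string disables synchronous replication. Requiring several consecutive results in the same direction is what separates "replica down" from "network blip".

```python
from typing import List


class DegradeMonitor:
    """Hypothetical external tool: degrade only after several consecutive
    failed polls; restore sync mode after several consecutive good ones."""

    def __init__(self, standby_names: str,
                 fail_threshold: int = 3, ok_threshold: int = 3) -> None:
        self.standby_names = standby_names  # value to restore when healthy
        self.fail_threshold = fail_threshold
        self.ok_threshold = ok_threshold
        self.fails = 0
        self.oks = 0
        self.degraded = False

    def observe(self, replica_ok: bool) -> List[str]:
        """Feed one poll result; return SQL to send to the master (maybe none)."""
        if replica_ok:
            self.fails = 0
            self.oks += 1
            if self.degraded and self.oks >= self.ok_threshold:
                self.degraded = False
                return [self._alter_system(self.standby_names),
                        "SELECT pg_reload_conf()"]
        else:
            self.oks = 0
            self.fails += 1
            if not self.degraded and self.fails >= self.fail_threshold:
                self.degraded = True
                return [self._alter_system(""), "SELECT pg_reload_conf()"]
        return []

    @staticmethod
    def _alter_system(value: str) -> str:
        escaped = value.replace("'", "''")  # quote as an SQL string literal
        return f"ALTER SYSTEM SET synchronous_standby_names = '{escaped}'"
```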

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 04:03:34
Message-ID: CAA4eK1K4X=J2Xju6nRHHGXaT8_ND975OKH+eSVK5MDQHCj5dFw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 12, 2014 at 8:48 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 01/10/2014 06:27 PM, Bruce Momjian wrote:
>> How would that work? Would it be a tool in contrib? There already is a
>> timeout, so if a tool checked more frequently than the timeout, it
>> should work. The durable notification of the admin would happen in the
>> tool, right?
>
> Well, you know what tool *I'm* planning to use.
>
> Thing is, when we talk about auto-degrade, we need to determine things
> like "Is the replica down, or is this just a network blip?" and take
> action according to the user's desired configuration. This is not
> something, realistically, that we can do on a single request. Whereas
> it would be fairly simple for an external monitoring utility to do:
>
> 1. decide replica is offline for the duration (several poll attempts
> have failed)
>
> 2. Send ALTER SYSTEM SET to the master and change/disable the
> synchronous_standby_names.

Will that be possible with the current mechanism, given that presently the
master will not accept any new commands when the sync replica is not available?
Or does something else need to be done along with the
above 2 points to make it possible?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 04:33:04
Message-ID: 20140112043304.GO28089@momjian.us
Lists: pgsql-hackers

On Sat, Jan 11, 2014 at 07:18:02PM -0800, Josh Berkus wrote:
> In other words, if we're going to have auto-degrade, the most
> intelligent place for it is in
> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
> place. Anything we do *inside* Postgres is going to have a really,
> really hard time determining when to degrade.

Well, one goal I was considering is that if a commit is hung waiting for
slave sync confirmation, and the timeout happens, then the mode is
changed to degraded and the commit returns success. I am not sure how
you would do that in an external tool, meaning there is going to be a
period where commits fail, unless you think there is a way that when the
external tool changes the mode to degrade that all hung commits
complete. That would be nice.
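The behaviour hoped for above, where every hung commit completes as soon as the mode flips to degraded, can be modeled with a condition variable, as in this minimal Python sketch. `SyncRepQueue` and its methods are hypothetical, and LSNs are plain integers; the sketch makes no claim about how the real server wakes its waiters. Each waiting commit re-checks its predicate whenever it is notified, so a single `set_degraded()` call releases all of them at once.

```python
import threading


class SyncRepQueue:
    """Hypothetical model: committing sessions wait here for a standby ack."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._acked_lsn = 0     # highest WAL position the standby confirmed
        self._degraded = False

    def wait_for(self, lsn: int, timeout: float) -> str:
        """Block until lsn is acked or the master degrades (or timeout)."""
        with self._cond:
            done = self._cond.wait_for(
                lambda: self._degraded or self._acked_lsn >= lsn,
                timeout=timeout)
            if not done:
                return "still waiting"
            return "replicated" if self._acked_lsn >= lsn else "degraded"

    def ack(self, lsn: int) -> None:
        with self._cond:
            self._acked_lsn = max(self._acked_lsn, lsn)
            self._cond.notify_all()

    def set_degraded(self) -> None:
        # Flip the mode and wake every hung commit so it can return.
        with self._cond:
            self._degraded = True
            self._cond.notify_all()
```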

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ Everyone has their own god. +


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 04:52:33
Message-ID: CAA4eK1+9DLR17L2BUHKC7eYCfz4s1oa5_+OfvpTK63q_6RKTzg@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 11, 2014 at 9:41 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Sat, Jan 11, 2014 at 01:29:23PM +0530, Amit Kapila wrote:
>> Okay, this is one way of providing this new mode; others could be:
>>
>> a.
>> Have just one GUC, sync_standalone_mode = true|false, and make
>> it a PGC_POSTMASTER parameter, so that the user is only
>> allowed to set this mode at startup. Even if we don't want it as a
>> postmaster parameter, we can tell users that they can
>> change this parameter only before the server gets into this situation.
>> I understand that without an alarm or some other mechanism, it is difficult
>> for the user to know when to change it, but I think in that case he should
>> set it before server startup.
>>
>> b.
>> Along the same lines, instead of a boolean parameter, provide a parameter
>> similar to the current one, such as available_synchronous_standby_names,
>> whose setting would follow what I said in point a. The benefit of this
>> compared to 'a' is that it looks more like what we currently have.
>>
>> I think if we try to solve this problem by letting the user
>> change it at runtime, or when the problem actually occurs, it can
>> make the UI more complex and make it difficult for us to provide a way for
>> the user to be alerted to such a situation. We can keep our options open,
>> so that if in future we find a reasonable way, we can
>> provide the user a mechanism for changing this at runtime, but I don't
>> think that stops us from providing a way for the user to get the
>> benefit of this mode through a startup-time parameter.
>
> I am not sure how this would work. Right now we wait for one of the
> synchronous_standby_names servers to verify the writes. We need some
> way of telling the system how long to wait before continuing in degraded
> mode. Without a timeout and admin notification, it doesn't seem much
> better than our async mode, which is what many people were complaining
> about.

It is better than async mode in that async mode never
waits for commits to be written to the standby, whereas this new mode will
do so unless that is not possible (all sync standbys have gone down).
Can't we use the existing wal_sender_timeout? Or, if the user wants a
different timeout for this new mode because he expects the master to wait
longer before it starts operating as a standalone sync master, we can provide
a new parameter.

With this, the new mode is defined as one that provides maximum
availability.

We can define the behavior in this new mode as:
a. It will operate like the current synchronous master as long as one of the
standbys listed in available_synchronous_standby_names is available.
b. If none is available, it will start operating like the current async
master, which means that if any async standby is configured,
it will start sending WAL to that standby asynchronously; if none
is configured, it will operate as a standalone master.
c. We can even provide a new (non-persistent) parameter, replication_mode,
which tells the user that the master has switched
its mode; this can also be made available through a view. The value of the
parameter is updated when the server switches to the new mode.
d. When one of the standbys listed in
available_synchronous_standby_names comes back and is able to resolve
all of the WAL difference, the master will switch back to sync mode, where it
will wait for that standby before a commit finishes. After the switch, it will
update the replication_mode parameter.

Now I think that, with the above definition and behavior, the master can
switch to the new mode and can provide this information via a view if
the user wants it.

In the above behaviour, the tricky part would be point 'd', where it has
to switch back to sync mode when one of the sync standbys becomes
available, but I think we can work out a design for that if you are
positive about the definition and behaviour described by the 4 points
above.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 15:59:21
Message-ID: DB8832A2-0EE2-4785-B60B-2147023D92EE@phlo.org
Lists: pgsql-hackers

On Jan 11, 2014, at 18:53, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-01-11 18:28:31 +0100, Florian Pflug wrote:
>> Hm, I was about to suggest that you can set statement_timeout before
>> doing COMMIT to limit the amount of time you want to wait for the
>> standby to respond. Interestingly, however, that doesn't seem to work,
>> which is weird, since AFAICS statement_timeout simply generates a
>> query cancel request after the timeout has elapsed, and cancelling
>> the COMMIT with Ctrl-C in psql *does* work.
>
> I think that'd be a pretty bad API since you won't know whether the
> commit failed or succeeded but replication timed out. There very well
> might have been long-running constraint triggers or such taking a long
> time.

You could still distinguish these cases because the COMMIT would succeed
with a WARNING if the timeout elapses while waiting for the standby, just
as it does for query cancellations already.

I'm not saying that this is a great API, though - I brought it up only
because accepting cancellation requests but ignoring timeouts seems
a bit inconsistent to me.

best regards,
Florian Pflug


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 19:59:43
Message-ID: 52D2F42F.1070306@agliodbs.com
Lists: pgsql-hackers

All,

I'm leading this off with a review of the features offered by the actual
patch submitted. My general discussion of the issues of Sync Degrade,
which justifies my specific suggestions below, follows that. Rajeev,
please be aware that other hackers may have different opinions than me
on what needs to change about the patch, so you should collect all
opinions before changing code.

=======================

> Add a new parameter :

> synchronous_standalone_master = on | off

I think this is a TERRIBLE name for any such parameter. What does
"synchronous standalone" even mean? A better name for the parameter
would be "auto_degrade_sync_replication" or "synchronous_timeout_action
= error | degrade", or something similar. It would be even better for
this to be a mode of synchronous_commit, except that synchronous_commit
is heavily overloaded already.
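As an illustration of the suggested "synchronous_timeout_action = error | degrade" setting, here is a minimal C sketch of parsing such an enum value. The option table, the enum and the function names are hypothetical, it does not use PostgreSQL's real GUC infrastructure, and the reading of "error" (do not degrade, report the timeout instead) is an assumption since the exact semantics were not specified.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical values for the proposed synchronous_timeout_action setting. */
typedef enum { SYNC_TIMEOUT_ERROR, SYNC_TIMEOUT_DEGRADE } SyncTimeoutAction;

static const struct {
    const char        *name;
    SyncTimeoutAction  value;
} sync_timeout_options[] = {
    {"error",   SYNC_TIMEOUT_ERROR},   /* assumed: do not degrade, report the timeout */
    {"degrade", SYNC_TIMEOUT_DEGRADE}, /* fall back to async after the timeout */
};

/* Case-insensitive string equality. */
static bool eq_ignore_case(const char *a, const char *b)
{
    for (; *a && *b; a++, b++)
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
    return *a == *b;
}

/* Returns true and sets *out when value names a known option. */
static bool parse_sync_timeout_action(const char *value, SyncTimeoutAction *out)
{
    assert(value != NULL && out != NULL);
    for (size_t i = 0; i < sizeof sync_timeout_options / sizeof sync_timeout_options[0]; i++) {
        if (eq_ignore_case(value, sync_timeout_options[i].name)) {
            *out = sync_timeout_options[i].value;
            return true;
        }
    }
    return false;
}
```

Unlike an on/off boolean, an enum leaves room for further actions later without renaming the parameter again.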

Some issues raised by this log script:

LOG: standby "tx0113" is now the synchronous standby with priority 1
LOG: waiting for standby synchronization
<-- standby wal receiver on the standby is killed (SIGKILL)
LOG: unexpected EOF on standby connection
LOG: not waiting for standby synchronization
<-- restart standby so that it connects again
LOG: standby "tx0113" is now the synchronous standby with priority 1
LOG: waiting for standby synchronization
<-- standby wal receiver is first stopped (SIGSTOP) to make sure

The "not waiting for standby synchronization" message should be marked
something stronger than LOG. I'd like ERROR.

Second, you have the master resuming sync rep when the standby
reconnects. How do you determine when it's safe to do that? You're
making the assumption that you have a failing sync standby instead of
one which simply can't keep up with the master, or a flaky network
connection (see discussion below).

> a. Master_to_standalone_cmd: To be executed before master switches to
> standalone mode.
>
> b. Master_to_sync_cmd: To be executed before master switches from sync
> mode to standalone mode.

I'm not at all clear what the difference between these two commands is.
When would one be executed, and when would the other be executed? Also,
renaming ...

Missing features:

a) we should at least send committing clients a WARNING if they have
committed a synchronous transaction and we are in degraded mode.

I know others have dismissed this idea as too "talky", but from my
perspective, the agreement with the client for each synchronous commit
is being violated, so each and every synchronous commit should report
failure to sync. Also, having a warning on every commit would make it
easier to troubleshoot degraded mode for users who have ignored the
other warnings we give them.
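A minimal C sketch of the per-commit decision proposed in (a): the enum, the function and the warning text are all hypothetical, and inside PostgreSQL the message would be raised with ereport(WARNING, ...) rather than formatted into a caller's buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

typedef enum {
    COMMIT_WAIT_FOR_STANDBY,    /* normal sync rep: block until the standby confirms */
    COMMIT_RETURN_OK,           /* asynchronous commit: nothing to wait for or report */
    COMMIT_RETURN_WITH_WARNING  /* degraded: succeed, but tell the client sync failed */
} CommitAction;

/*
 * Decide what to do after the commit record has been flushed locally.
 * On COMMIT_RETURN_WITH_WARNING the client-facing text is written to buf.
 */
static CommitAction commit_action(bool wants_sync, bool degraded,
                                  char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';
    if (!wants_sync)
        return COMMIT_RETURN_OK;
    if (!degraded)
        return COMMIT_WAIT_FOR_STANDBY;
    snprintf(buf, buflen,
             "WARNING: transaction committed locally but was not confirmed "
             "by a synchronous standby (replication is degraded)");
    return COMMIT_RETURN_WITH_WARNING;
}
```

Only transactions that asked for a synchronous commit get the warning, which matches the argument that it is exactly their guarantee that is being broken.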

b) pg_stat_replication needs to show degraded mode in some way, or we
need pg_sync_rep_degraded(), or (ideally) both.

I'm also wondering if we need a more sophisticated approach to
wal_sender_timeout to go with all this.

=======================

On 01/11/2014 08:33 PM, Bruce Momjian wrote:
> On Sat, Jan 11, 2014 at 07:18:02PM -0800, Josh Berkus wrote:
>> In other words, if we're going to have auto-degrade, the most
>> intelligent place for it is in
>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>> place. Anything we do *inside* Postgres is going to have a really,
>> really hard time determining when to degrade.
>
> Well, one goal I was considering is that if a commit is hung waiting for
> slave sync confirmation, and the timeout happens, then the mode is
> changed to degraded and the commit returns success. I am not sure how
> you would do that in an external tool, meaning there is going to be
> a period where commits fail, unless you think there is a way that when the
> external tool changes the mode to degrade that all hung commits
> complete. That would be nice.

Realistically, though, that's pretty unavoidable. Any technique which
waits a reasonable interval to determine that the replica isn't going to
respond is liable to go beyond the application's timeout threshold
anyway. There are undoubtedly exceptions to that, but it will be the
case a lot of the time -- how many applications are willing to wait
*minutes* for a COMMIT?

I also don't see any way to allow the hung transactions to commit
without allowing the walsender to make a decision on degrading. As I've
outlined elsewhere (and below), the walsender just doesn't have enough
information to make a good decision.

On 01/11/2014 08:52 PM, Amit Kapila wrote:
> It is better than async mode in that async mode never waits for commits
> to be written to the standby, whereas this new mode will wait unless
> that is impossible (all sync standbys are down).
> Can't we use the existing wal_sender_timeout? Or, if users expect a
> different timeout for this new mode because they want the master to
> wait longer before it starts operating as a standalone master, we can
> provide a new parameter.

One of the reasons that there's so much disagreement about this feature
is that most of the folks strongly in favor of auto-degrade are thinking
*only* of the case that the standby is completely down. There are many
other reasons for a sync transaction to hang, and the walsender has
absolutely no way of knowing which is the case. For example:

* Transient network issues
* Standby can't keep up with master
* Postgres bug
* Storage/IO issues (think EBS)
* Standby is restarting

You don't want to handle all of those issues the same way as far as sync
rep is concerned. For example, if the standby is restarting, you
probably want to wait instead of degrading.

There's also the issue that this patch, and necessarily any
walsender-level auto-degrade, has IMHO no safe way to resume sync
replication. This means that any user who has a network or storage blip
once a day (again, think AWS) would be constantly in degraded mode, even
though both the master and the replica are up and running -- and it will
come as a complete surprise to them when they lose the master and
discover that they've lost data.

This is why, as I've said, any auto-degrade patch needs to treat
auto-degrade as a major event, and alert users in all ways reasonable.
See my concrete proposals at the beginning of this email for what I mean.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 20:35:12
Message-ID: 20140112203512.GL2686@tamriel.snowman.net
Lists: pgsql-hackers

* Josh Berkus (josh(at)agliodbs(dot)com) wrote:
> On 01/11/2014 08:52 PM, Amit Kapila wrote:
> > It is better than async mode in that async mode never waits for commits
> > to be written to the standby, whereas this new mode will wait unless
> > that is impossible (all sync standbys are down).
> > Can't we use the existing wal_sender_timeout? Or, if users expect a
> > different timeout for this new mode because they want the master to
> > wait longer before it starts operating as a standalone master, we can
> > provide a new parameter.
>
> One of the reasons that there's so much disagreement about this feature
> is that most of the folks strongly in favor of auto-degrade are thinking
> *only* of the case that the standby is completely down. There are many
> other reasons for a sync transaction to hang, and the walsender has
> absolutely no way of knowing which is the case. For example:

Uhh, yea, no, I'm pretty sure those in favor of auto-degrade are very
specifically thinking of cases like "Standby is restarting", which is
not a reason for the master to fall over.

> * Transient network issues
> * Standby can't keep up with master
> * Postgres bug
> * Storage/IO issues (think EBS)
> * Standby is restarting
>
> You don't want to handle all of those issues the same way as far as sync
> rep is concerned. For example, if the standby is restarting, you
> probably want to wait instead of degrading.

*What*?! Certainly not in any kind of OLTP-type system; a system
restart can easily take minutes. Clearly, you want to resume once the
standby is back up, which I feel like the people against an auto-degrade
mode are missing, but holding up a commit until the standby finishes
rebooting isn't practical.

> There's also the issue that this patch, and necessarily any
> walsender-level auto-degrade, has IMHO no safe way to resume sync
> replication. This means that any user who has a network or storage blip
> once a day (again, think AWS) would be constantly in degraded mode, even
> though both the master and the replica are up and running -- and it will
> come as a complete surprise to them when they lose the master and
> discover that they've lost data.

I don't follow this logic at all- why is there no safe way to resume?
You wait til the slave is caught up fully and then go back to sync mode.
If that turns out to be an extended problem then an alarm needs to be
raised, of course.
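A minimal C sketch of "wait until the slave is caught up fully, then go back to sync mode", extended with a minimum stable interval as one possible answer to the flaky-network concern. The struct, the function and min_stable_ms are hypothetical. The strict LSN comparison is also an assumption: under continuous write load the standby may rarely equal the master exactly, so a real design would likely need a tolerance or a target LSN captured at one point in time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;  /* WAL position, modelled as a 64-bit integer */

typedef struct {
    int64_t caught_up_since_ms; /* -1 while the standby is behind or disconnected */
    int64_t min_stable_ms;      /* hypothetical knob: how long it must stay caught up */
} ResumeTracker;

/*
 * Returns true once the standby has flushed everything the master has flushed
 * and has stayed in that state for at least min_stable_ms. Any disconnect or
 * lag in between restarts the clock, so a link that flaps more often than
 * min_stable_ms never resumes sync.
 */
static bool ready_to_resume_sync(ResumeTracker *t, int64_t now_ms, bool connected,
                                 XLogRecPtr standby_flush, XLogRecPtr master_flush)
{
    assert(t->min_stable_ms >= 0);
    if (!connected || standby_flush < master_flush) {
        t->caught_up_since_ms = -1;
        return false;
    }
    if (t->caught_up_since_ms < 0)
        t->caught_up_since_ms = now_ms;
    return now_ms - t->caught_up_since_ms >= t->min_stable_ms;
}
```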

Thanks,

Stephen


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 20:51:38
Message-ID: 1389559898.38507.YahooMailNeo@web122303.mail.ne1.yahoo.com
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> wrote:

>> Add a new parameter :

>
>> synchronous_standalone_master = on | off
>
> I think this is a TERRIBLE name for any such parameter.  What does
> "synchronous standalone" even mean?  A better name for the parameter
> would be "auto_degrade_sync_replication" or
> "synchronous_timeout_action
> = error | degrade", or something similar.  It would be even better for
> this to be a mode of synchronous_commit, except that synchronous_commit
> is heavily overloaded already.

+1

> a) we should at least send committing clients a WARNING if they have
> committed a synchronous transaction and we are in degraded mode.
>
> I know others have dismissed this idea as too "talky", but from my
> perspective, the agreement with the client for each synchronous commit
> is being violated, so each and every synchronous commit should report
> failure to sync.  Also, having a warning on every commit would make it
> easier to troubleshoot degraded mode for users who have ignored the
> other warnings we give them.

I agree that every synchronous commit on a master which is configured for synchronous replication which returns without persisting the work of the transaction on both the (local) primary and a synchronous replica should issue a WARNING.  That said, the API for some connectors (like JDBC) puts the burden on the application or its framework to check for warnings each time and do something reasonable if found; I fear that a Venn diagram of those shops which would use this new feature and those shops that don't rigorously look for and reasonably deal with warnings would have significant overlap.

> b) pg_stat_replication needs to show degraded mode in some way, or we
> need pg_sync_rep_degraded(), or (ideally) both.

+1

Since this new feature, where enabled, would cause synchronous replication to provide no guarantees beyond what asynchronous replication does[1], but would tend to cause people to have an *expectation* that they have some additional protection, I think proper documentation will be a big challenge.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1]  If I understand correctly, this is what the feature is intended to provide:
- A transaction successfully committed on the primary is guaranteed to be visible on the replica?  No, in all modes.
- A transaction successfully committed on the primary is guaranteed *not* to be visible on the replica?  No, in all modes.
- The work of a transaction which has not returned from a commit request may be visible on the primary and/or the standby?  Yes, in all modes.
- A failure of the primary is guaranteed not to lose successfully committed transactions when failing over to the replica?  Yes for sync rep without this feature, no for async or when this feature is used.  If things are going well up to the moment of primary failure, the feature improves the odds (versus async) that successfully committed transactions will not be lost, or may reduce the number of successfully committed transactions lost.
- A failure of the replica allows transactions on the primary to continue?  Read-only for sync rep without this feature if the last sync standby has failed; read-only for some interval and then read-write with this feature, or if there is still another working sync rep target; all transactions without interruption with async.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 21:04:17
Message-ID: 52D30351.2040401@agliodbs.com
Lists: pgsql-hackers

On 01/12/2014 12:35 PM, Stephen Frost wrote:
> * Josh Berkus (josh(at)agliodbs(dot)com) wrote:
>> You don't want to handle all of those issues the same way as far as sync
>> rep is concerned. For example, if the standby is restarting, you
>> probably want to wait instead of degrading.
>
> *What*?! Certainly not in any kind of OLTP-type system; a system
> restart can easily take minutes. Clearly, you want to resume once the
> standby is back up, which I feel like the people against an auto-degrade
> mode are missing, but holding up a commit until the standby finishes
> rebooting isn't practical.

Well, then that becomes a reason to want better/more configurability.
In the couple of sync rep sites I admin, I *would* want to wait.

>> There's also the issue that this patch, and necessarily any
>> walsender-level auto-degrade, has IMHO no safe way to resume sync
>> replication. This means that any use who has a network or storage blip
>> once a day (again, think AWS) would be constantly in degraded mode, even
>> though both the master and the replica are up and running -- and it will
>> come as a complete surprise to them when the lose the master and
>> discover that they've lost data.
>
> I don't follow this logic at all- why is there no safe way to resume?
> You wait til the slave is caught up fully and then go back to sync mode.
> If that turns out to be an extended problem then an alarm needs to be
> raised, of course.

So, if you have auto-resume, how do you handle the "flaky network" case?
And how would an alarm be raised?

On 01/12/2014 12:51 PM, Kevin Grittner wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> I know others have dismissed this idea as too "talky", but from my
>> perspective, the agreement with the client for each synchronous
>> commit is being violated, so each and every synchronous commit
>> should report failure to sync. Also, having a warning on every
>> commit would make it easier to troubleshoot degraded mode for users
>> who have ignored the other warnings we give them.
>
> I agree that every synchronous commit on a master which is configured
> for synchronous replication which returns without persisting the work
> of the transaction on both the (local) primary and a synchronous
> replica should issue a WARNING. That said, the API for some
> connectors (like JDBC) puts the burden on the application or its
> framework to check for warnings each time and do something reasonable
> if found; I fear that a Venn diagram of those shops which would use
> this new feature and those shops that don't rigorously look for and
> reasonably deal with warnings would have significant overlap.

Oh, no question. However, having such a WARNING would help with
interactive troubleshooting once a problem has been identified, and
that's my main reason for wanting it.

Imagine the case where you have auto-degrade and a flaky network. The
user would experience problems as performance problems; that is, some
commits take minutes on-again, off-again. They wouldn't necessarily
even LOOK at the sync rep settings. So next step is to try walking
through a sample transaction on the command line, and then the
DBA/consultant gets WARNING messages, which gives an idea where the real
problem lies.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-12 21:18:29
Message-ID: 20140112211829.GM2686@tamriel.snowman.net
Lists: pgsql-hackers

* Josh Berkus (josh(at)agliodbs(dot)com) wrote:
> Well, then that becomes a reason to want better/more configurability.

I agree with this- the challenge is figuring out what those options
should be and how we should document them.

> In the couple of sync rep sites I admin, I *would* want to wait.

That's certainly an interesting data point. One of the specific
use-cases that I'm thinking of is to auto-degrade on a graceful shutdown
of the slave for upgrades and/or maintenance. Perhaps we don't need
*auto* degrade in that case, but then an actual failure of the slave
will also bring down the master.

> > I don't follow this logic at all- why is there no safe way to resume?
> > You wait til the slave is caught up fully and then go back to sync mode.
> > If that turns out to be an extended problem then an alarm needs to be
> > raised, of course.
>
> So, if you have auto-resume, how do you handle the "flaky network" case?
> And how would an alarm be raised?

Ideally, every time there is an auto-degrade, messages are written to log
files which are monitored, and notices about it are sent to admins, who,
upon getting repeated such emails, would realize there's a problem and
work to fix it.
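The "repeated notices" idea can be made mechanical. The C sketch below is a hypothetical illustration, not anything proposed in the patch: it remembers recent auto-degrade timestamps in a small ring buffer and reports when at least threshold degrades fall within window_ms, which is one way to tell a flaky link from a one-off outage. How the resulting alarm is delivered (log line, email) is out of scope.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_EVENTS 16

/* Hypothetical history of recent auto-degrade events. */
typedef struct {
    int64_t times_ms[MAX_EVENTS];  /* ring buffer of recent degrade times */
    int     count;                 /* number of valid entries (<= MAX_EVENTS) */
    int     next;                  /* slot for the next write */
} DegradeHistory;

/*
 * Record a degrade at now_ms. Returns true if at least threshold degrades,
 * including this one, happened within the last window_ms.
 */
static bool record_degrade(DegradeHistory *h, int64_t now_ms,
                           int64_t window_ms, int threshold)
{
    assert(threshold > 0 && threshold <= MAX_EVENTS);
    h->times_ms[h->next] = now_ms;
    h->next = (h->next + 1) % MAX_EVENTS;
    if (h->count < MAX_EVENTS)
        h->count++;

    int recent = 0;
    for (int i = 0; i < h->count; i++)
        if (now_ms - h->times_ms[i] <= window_ms)
            recent++;
    return recent >= threshold;
}
```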

> On 01/12/2014 12:51 PM, Kevin Grittner wrote:
> > Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> >> I know others have dismissed this idea as too "talky", but from my
> >> perspective, the agreement with the client for each synchronous
> >> commit is being violated, so each and every synchronous commit
> >> should report failure to sync. Also, having a warning on every
> >> commit would make it easier to troubleshoot degraded mode for users
> >> who have ignored the other warnings we give them.
> >
> > I agree that every synchronous commit on a master which is configured
> > for synchronous replication which returns without persisting the work
> > of the transaction on both the (local) primary and a synchronous
> > replica should issue a WARNING. That said, the API for some
> > connectors (like JDBC) puts the burden on the application or its
> > framework to check for warnings each time and do something reasonable
> > if found; I fear that a Venn diagram of those shops which would use
> > this new feature and those shops that don't rigorously look for and
> > reasonably deal with warnings would have significant overlap.
>
> Oh, no question. However, having such a WARNING would help with
> interactive troubleshooting once a problem has been identified, and
> that's my main reason for wanting it.

I'm in the camp of this being too 'talky'.

> Imagine the case where you have auto-degrade and a flaky network. The
> user would experience problems as performance problems; that is, some
> commits take minutes on-again, off-again. They wouldn't necessarily
> even LOOK at the sync rep settings. So next step is to try walking
> through a sample transaction on the command line, and then the
> DBA/consultant gets WARNING messages, which gives an idea where the real
> problem lies.

Or they look in the logs which hopefully say that their slave keeps
getting disconnected...

Thanks,

Stephen


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 06:28:20
Message-ID: CAA4eK1J9u1xHcf1=b2Ekj89i1r=hNf+2tcmNEBwTA+KvD5Y5=Q@mail.gmail.com
Lists: pgsql-hackers

> On 01/11/2014 08:52 PM, Amit Kapila wrote:
>> It is better than async mode in that async mode never waits for commits
>> to be written to the standby, whereas this new mode will wait unless
>> that is impossible (all sync standbys are down).
>> Can't we use the existing wal_sender_timeout? Or, if users expect a
>> different timeout for this new mode because they want the master to
>> wait longer before it starts operating as a standalone master, we can
>> provide a new parameter.
>
> One of the reasons that there's so much disagreement about this feature
> is that most of the folks strongly in favor of auto-degrade are thinking
> *only* of the case that the standby is completely down. There are many
> other reasons for a sync transaction to hang, and the walsender has
> absolutely no way of knowing which is the case. For example:
>
> * Transient network issues
> * Standby can't keep up with master
> * Postgres bug
> * Storage/IO issues (think EBS)
> * Standby is restarting
>
> You don't want to handle all of those issues the same way as far as sync
> rep is concerned. For example, if the standby is restarting, you
> probably want to wait instead of degrading.

I think it might be difficult to differentiate the cases, except maybe
by having a separate timeout for this mode, so that the server can wait
longer when running in this mode. OTOH, why can't we define this new
mode so that it behaves the same in all cases? Basically, whenever the
sync standby is not available (network issue or machine down), the
master will behave as in async mode.
Here I think the important point would be to gracefully allow resuming
sync replication when the standby tries to reconnect (we can allow it
to reconnect if it can resolve all WAL differences).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Rajeev rastogi" <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 07:04:50
Message-ID: BF2827DCCE55594C8D7A8F7FFD3AB7713DDB8FB0@SZXEML508-MBX.china.huawei.com
Lists: pgsql-hackers


On 13th January 2014, Josh Berkus wrote:

> I'm leading this off with a review of the features offered by the
> actual patch submitted. My general discussion of the issues of Sync
> Degrade, which justifies my specific suggestions below, follows that.
> Rajeev, please be aware that other hackers may have different opinions
> than me on what needs to change about the patch, so you should collect
> all opinions before changing code.

Thanks for reviewing and providing the first level of comments. Surely
we'll collect all feedback to improve this patch.

>
> > Add a new parameter :
>
> > synchronous_standalone_master = on | off
>
> I think this is a TERRIBLE name for any such parameter. What does
> "synchronous standalone" even mean? A better name for the parameter
> would be "auto_degrade_sync_replication" or "synchronous_timeout_action
> = error | degrade", or something similar. It would be even better for
> this to be a mode of synchronous_commit, except that synchronous_commit
> is heavily overloaded already.

Yes, we can change this parameter name. Some of the suggestions for degrading the mode are:
1. Auto-degrade using some sort of configuration parameter, as done in the current patch.
2. Expose the configuration variable through new SQL-callable functions, as suggested by Heikki.
3. Use ALTER SYSTEM SET, as suggested by others.

> Some issues raised by this log script:
>
> LOG: standby "tx0113" is now the synchronous standby with priority 1
> LOG: waiting for standby synchronization
> <-- standby wal receiver on the standby is killed (SIGKILL)
> LOG: unexpected EOF on standby connection
> LOG: not waiting for standby synchronization
> <-- restart standby so that it connects again
> LOG: standby "tx0113" is now the synchronous standby with priority 1
> LOG: waiting for standby synchronization
> <-- standby wal receiver is first stopped (SIGSTOP) to make sure
>
> The "not waiting for standby synchronization" message should be marked
> something stronger than LOG. I'd like ERROR.

Yes, we can change this to ERROR.

> Second, you have the master resuming sync rep when the standby
> reconnects. How do you determine when it's safe to do that? You're
> making the assumption that you have a failing sync standby instead of
> one which simply can't keep up with the master, or a flaky network
> connection (see discussion below).

Yes, this can be further improved so that the master is upgraded to
synchronous mode, by one of the methods discussed above, only once we have
made sure that the synchronous standby has caught up with the master node
(this may require a better design).

> > a. Master_to_standalone_cmd: To be executed before master switches to
> > standalone mode.
> >
> > b. Master_to_sync_cmd: To be executed before master switches from sync
> > mode to standalone mode.
>
> I'm not at all clear what the difference between these two commands is.
> When would one be executed, and when would the other be executed? Also,
> renaming ...

There is a typo in the explanation above. The two commands are meant to be:
a. Master_to_standalone_cmd: To be executed when the master degrades from sync mode to standalone mode.

b. Master_to_sync_cmd: To be executed before the master is upgraded (restored) from standalone mode back to sync mode.

Both commands exist to alert the DBA, as required by the TODO item.
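The rule from the original proposal (the mode switches only if the corresponding command succeeds) can be sketched as two small C helpers. This is a hypothetical illustration, not the patch's code; the enum and function names are made up:

```c
#include <stddef.h>

/* The two master modes discussed in this thread. */
typedef enum { MODE_SYNC, MODE_STANDALONE } master_mode_t;

/*
 * Pick the admin command that guards a transition to `target`:
 * Master_to_standalone_cmd when degrading, Master_to_sync_cmd when
 * restoring. Returns NULL if no transition is needed.
 */
static const char *
switch_cmd_for(master_mode_t current, master_mode_t target,
               const char *to_standalone_cmd, const char *to_sync_cmd)
{
    if (current == target)
        return NULL;
    return (target == MODE_STANDALONE) ? to_standalone_cmd : to_sync_cmd;
}

/*
 * Apply the transition only if the guarding command exited with status 0;
 * on failure the master stays in its current mode.
 */
static master_mode_t
apply_switch(master_mode_t current, master_mode_t target, int cmd_exit_status)
{
    return (cmd_exit_status == 0) ? target : current;
}
```

The caller would run the returned command between the two calls and pass its exit status. If it uses system(), the raw return value must first be decoded with WEXITSTATUS.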

However, per Heikki's suggestion, we should not use this mechanism to alert the DBA;
rather, we should have some sort of generic trap system, instead of adding an
extra config option specifically for this feature.
That looks like a better idea, so we can discuss it further to come up with a proper
design.

> Missing features:
>
> a) we should at least send committing clients a WARNING if they have
> committed a synchronous transaction and we are in degraded mode.

Yes, that is a great idea.

> One of the reasons that there's so much disagreement about this feature
> is that most of the folks strongly in favor of auto-degrade are thinking
> *only* of the case that the standby is completely down. There are many
> other reasons for a sync transaction to hang, and the walsender has
> absolutely no way of knowing which is the case. For example:
>
> * Transient network issues
> * Standby can't keep up with master
> * Postgres bug
> * Storage/IO issues (think EBS)
> * Standby is restarting
>
> You don't want to handle all of those issues the same way as far as
> sync rep is concerned. For example, if the standby is restarting, you
> probably want to wait instead of degrading.

I think that if we provide external SQL-callable functions for degrading, as Heikki
suggested, instead of auto-degrade, then users can handle at least some of the
above scenarios, if not all, based on their own experience and observation.

Thanks and Regards,
Kumar Rajeev Rastogi


From: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "Rajeev rastogi" <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 13:12:43
Message-ID: BF2827DCCE55594C8D7A8F7FFD3AB7713DDB908E@SZXEML508-MBX.china.huawei.com
Lists: pgsql-hackers


> On Sun, Jan 12, Amit Kapila wrote:
> >> How would that work? Would it be a tool in contrib? There already
> >> is a timeout, so if a tool checked more frequently than the timeout,
> >> it should work. The durable notification of the admin would happen
> >> in the tool, right?
> >
> > Well, you know what tool *I'm* planning to use.
> >
> > Thing is, when we talk about auto-degrade, we need to determine things
> > like "Is the replica down or is this just a network blip"? and take
> > action according to the user's desired configuration. This is not
> > something, realistically, that we can do on a single request. Whereas
> > it would be fairly simple for an external monitoring utility to do:
> >
> > 1. decide replica is offline for the duration (several poll attempts
> > have failed)
> >
> > 2. Send ALTER SYSTEM SET to the master and change/disable the
> > synch_replicas.
>
> Will it be possible with the current mechanism, given that the master
> presently will not accept any new command when the sync replica is not
> available? Or is there something else that needs to be done along with
> the above 2 points to make it possible?

Since no WAL is written for the ALTER SYSTEM SET command, the master
should be able to handle this command even when the sync replica is
not available.

Thanks and Regards,
Kumar Rajeev Rastogi


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 15:12:27
Message-ID: 0F82AD6E-7C4B-4BA6-B762-D42704A8D9C6@phlo.org
Lists: pgsql-hackers

On Jan12, 2014, at 04:18 , Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Thing is, when we talk about auto-degrade, we need to determine things
> like "Is the replica down or is this just a network blip"? and take
> action according to the user's desired configuration. This is not
> something, realistically, that we can do on a single request. Whereas
> it would be fairly simple for an external monitoring utility to do:
>
> 1. decide replica is offline for the duration (several poll attempts
> have failed)
>
> 2. Send ALTER SYSTEM SET to the master and change/disable the
> synch_replicas.
>
> In other words, if we're going to have auto-degrade, the most
> intelligent place for it is in
> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
> place. Anything we do *inside* Postgres is going to have a really,
> really hard time determining when to degrade.

+1

This is also how 2PC works, btw - the database provides the building
blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
to deal with issues that require a whole-cluster perspective.
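The external-monitor loop described in the quoted text can be sketched in C under stated assumptions: a hypothetical monitor counts consecutive failed health checks of the standby and signals a degrade only after a configured number of them, so a single network blip never triggers it. How the standby is polled, and the degrade itself (e.g. an ALTER SYSTEM SET sent to the master), are left to the caller; all names are illustrative:

```c
#include <stdbool.h>

/* Hypothetical monitor state. */
typedef struct {
    int  failed_polls;  /* consecutive failed polls seen so far */
    int  threshold;     /* failures required before degrading */
    bool degraded;      /* has the monitor already degraded the master? */
} standby_monitor_t;

/*
 * Feed one poll result. Returns true exactly once: on the poll where the
 * failure count first reaches the threshold, i.e. when the caller should
 * degrade the master. A successful poll resets the count.
 */
static bool
monitor_poll(standby_monitor_t *m, bool standby_ok)
{
    if (standby_ok) {
        m->failed_polls = 0;
        return false;
    }
    m->failed_polls++;
    if (!m->degraded && m->failed_polls >= m->threshold) {
        m->degraded = true;
        return true;
    }
    return false;
}
```

Re-arming the monitor after sync rep is restored is deliberately left out of this sketch.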

best regards,
Florian Pflug


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 18:12:55
Message-ID: 52D42CA7.90302@2ndQuadrant.com
Lists: pgsql-hackers

On 01/13/2014 04:12 PM, Florian Pflug wrote:
> On Jan12, 2014, at 04:18 , Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Thing is, when we talk about auto-degrade, we need to determine things
>> like "Is the replica down or is this just a network blip"? and take
>> action according to the user's desired configuration. This is not
>> something, realistically, that we can do on a single request. Whereas
>> it would be fairly simple for an external monitoring utility to do:
>>
>> 1. decide replica is offline for the duration (several poll attempts
>> have failed)
>>
>> 2. Send ALTER SYSTEM SET to the master and change/disable the
>> synch_replicas.
>>
>> In other words, if we're going to have auto-degrade, the most
>> intelligent place for it is in
>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>> place. Anything we do *inside* Postgres is going to have a really,
>> really hard time determining when to degrade.
> +1
>
> This is also how 2PC works, btw - the database provides the building
> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
> to deal with issues that require a whole-cluster perspective.
>

++1

I like Simon's idea to have a pg_xxx function for switching between
replication modes, which should be enough to support a monitor
daemon doing the switching.

Maybe we could have a 'syncrep_taking_too_long_command' GUC
which could be used to alert such a monitoring daemon, so it can
immediately check whether to

a) switch master to async rep or standalone mode (in case of sync slave
becoming unavailable)

or

b) failover to slave (in the almost equally likely case that it was the
master which became disconnected from the world and the slave is available)

or

c) do something else depending on circumstances/policy :)

NB! Note that in case b) 'syncrep_taking_too_long_command' will
very likely also not reach the monitor daemon, so it cannot rely on
this as the main trigger!
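One way a monitor could map options a), b) and c) onto its own reachability probes, as a hypothetical C sketch. Treating "both reachable" as "keep waiting" and "neither reachable" as the policy-dependent case c) are assumptions, not part of the proposal. Because the notification may never arrive in case b), the inputs are meant to come from the monitor's own checks:

```c
#include <stdbool.h>

typedef enum {
    ACTION_NONE,            /* both reachable: likely a blip or slow standby, keep waiting */
    ACTION_DEGRADE_MASTER,  /* a) standby unreachable: switch master to async/standalone */
    ACTION_FAILOVER,        /* b) master unreachable, standby reachable: fail over */
    ACTION_ESCALATE         /* c) neither reachable: defer to site policy */
} monitor_action_t;

static monitor_action_t
choose_action(bool master_reachable, bool standby_reachable)
{
    if (master_reachable && standby_reachable)
        return ACTION_NONE;
    if (master_reachable)
        return ACTION_DEGRADE_MASTER;
    if (standby_reachable)
        return ACTION_FAILOVER;
    return ACTION_ESCALATE;
}
```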

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 18:21:47
Message-ID: 52D42EBB.1020103@commandprompt.com
Lists: pgsql-hackers


On 01/13/2014 10:12 AM, Hannu Krosing wrote:
>>> In other words, if we're going to have auto-degrade, the most
>>> intelligent place for it is in
>>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>>> place. Anything we do *inside* Postgres is going to have a really,
>>> really hard time determining when to degrade.
>> +1
>>
>> This is also how 2PC works, btw - the database provides the building
>> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
>> to deal with issues that require a whole-cluster perspective.
>>
>
> ++1

+1

>
> I like Simon's idea to have a pg_xxx function for switching between
> replication modes, which should be enough to support a monitor
> daemon doing the switching.
>
> Maybe we could have a 'syncrep_taking_too_long_command' GUC
> which could be used to alert such a monitoring daemon, so it can
> immediately check whether to
>

I would think that would be a column in pg_stat_replication. Basically
last_ack or something like that.

> a) switch master to async rep or standalone mode (in case of sync slave
> becoming unavailable)

Yep.

>
> or
>
> b) failover to slave (in the almost equally likely case that it was the
> master which became disconnected from the world and the slave is available)
>
> or

I think this should be left to external tools.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary
act.", George Orwell


From: Jim Nasby <jim(at)nasby(dot)net>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 21:14:21
Message-ID: 52D4572D.2040802@nasby.net
Lists: pgsql-hackers

On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
>
> On 01/13/2014 10:12 AM, Hannu Krosing wrote:
>>>> In other words, if we're going to have auto-degrade, the most
>>>> intelligent place for it is in
>>>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>>>> place. Anything we do *inside* Postgres is going to have a really,
>>>> really hard time determining when to degrade.
>>> +1
>>>
>>> This is also how 2PC works, btw - the database provides the building
>>> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
>>> to deal with issues that require a whole-cluster perspective.
>>>
>>
>> ++1
>
> +1

Josh, what do you think of the upthread idea of being able to recover in-progress transactions that are waiting when we turn off sync rep? I'm thinking that would be a very good feature to have... and it's not something you can easily do externally.
--
Jim C. Nasby, Data Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 21:18:21
Message-ID: 20140113211821.GB5838@awork2.anarazel.de
Lists: pgsql-hackers

On 2014-01-13 15:14:21 -0600, Jim Nasby wrote:
> On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
> >
> >On 01/13/2014 10:12 AM, Hannu Krosing wrote:
> >>>>In other words, if we're going to have auto-degrade, the most
> >>>>intelligent place for it is in
> >>>>RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
> >>>>place. Anything we do *inside* Postgres is going to have a really,
> >>>>really hard time determining when to degrade.
> >>>+1
> >>>
> >>>This is also how 2PC works, btw - the database provides the building
> >>>blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
> >>>to deal with issues that require a whole-cluster perspective.
> >>>
> >>
> >>++1
> >
> >+1
>
> Josh, what do you think of the upthread idea of being able to recover in-progress transactions that are waiting when we turn off sync rep? I'm thinking that would be a very good feature to have... and it's not something you can easily do externally.

I think it'd be a fairly simple patch to re-check the state of syncrep
config in SyncRepWaitForLsn(). Alternatively you can just write code to
iterate over the procarray and set Proc->syncRepState to
SYNC_REP_WAIT_CANCELLED or such.
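A simplified C sketch of the second alternative. The structures are hypothetical stand-ins for the proc array; PostgreSQL's real PGPROC and wait-state constants differ, and SYNC_REP_WAIT_CANCELLED is the placeholder name from the sentence above, not an existing constant:

```c
#include <stddef.h>

/* Hypothetical, simplified sync-rep wait states. */
typedef enum {
    SYNC_REP_NOT_WAITING,
    SYNC_REP_WAITING,
    SYNC_REP_WAIT_CANCELLED
} sync_rep_state_t;

/* Hypothetical, simplified proc array entry. */
typedef struct {
    int              pid;
    sync_rep_state_t syncRepState;
} fake_proc_t;

/*
 * Release every backend currently waiting for a sync standby ack.
 * In the server this would also require holding SyncRepLock and setting
 * each released backend's latch so it wakes up; both are omitted here.
 * Returns the number of backends released.
 */
static int
release_sync_rep_waiters(fake_proc_t *procs, size_t nprocs)
{
    int released = 0;

    for (size_t i = 0; i < nprocs; i++) {
        if (procs[i].syncRepState == SYNC_REP_WAITING) {
            procs[i].syncRepState = SYNC_REP_WAIT_CANCELLED;
            released++;
        }
    }
    return released;
}
```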

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 21:30:07
Message-ID: 52D45ADF.4030907@commandprompt.com
Lists: pgsql-hackers


On 01/13/2014 01:14 PM, Jim Nasby wrote:
>
> On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
>>
>> On 01/13/2014 10:12 AM, Hannu Krosing wrote:
>>>>> In other words, if we're going to have auto-degrade, the most
>>>>> intelligent place for it is in
>>>>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>>>>> place. Anything we do *inside* Postgres is going to have a really,
>>>>> really hard time determining when to degrade.
>>>> +1
>>>>
>>>> This is also how 2PC works, btw - the database provides the building
>>>> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
>>>> to deal with issues that require a whole-cluster perspective.
>>>>
>>>
>>> ++1
>>
>> +1
>
> Josh, what do you think of the upthread idea of being able to recover
> in-progress transactions that are waiting when we turn off sync rep? I'm
> thinking that would be a very good feature to have... and it's not
> something you can easily do externally.

I think it is extremely valuable, else we have lost those transactions
which is exactly what we don't want.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
"In a time of universal deceit - telling the truth is a revolutionary
act.", George Orwell


From: Florian Pflug <fgp(at)phlo(dot)org>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Jim Nasby <jim(at)nasby(dot)net>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-13 23:19:41
Message-ID: 854E115B-6049-4FE8-816D-6A485B8CAFAB@phlo.org
Lists: pgsql-hackers

On Jan13, 2014, at 22:30 , "Joshua D. Drake" <jd(at)commandprompt(dot)com> wrote:
> On 01/13/2014 01:14 PM, Jim Nasby wrote:
>>
>> On 1/13/14, 12:21 PM, Joshua D. Drake wrote:
>>>
>>> On 01/13/2014 10:12 AM, Hannu Krosing wrote:
>>>>>> In other words, if we're going to have auto-degrade, the most
>>>>>> intelligent place for it is in
>>>>>> RepMgr/HandyRep/OmniPITR/pgPoolII/whatever. It's also the *easiest*
>>>>>> place. Anything we do *inside* Postgres is going to have a really,
>>>>>> really hard time determining when to degrade.
>>>>> +1
>>>>>
>>>>> This is also how 2PC works, btw - the database provides the building
>>>>> blocks, i.e. PREPARE and COMMIT, and leaves it to a transaction manager
>>>>> to deal with issues that require a whole-cluster perspective.
>>>>>
>>>>
>>>> ++1
>>>
>>> +1
>>
>> Josh, what do you think of the upthread idea of being able to recover
>> in-progress transactions that are waiting when we turn off sync rep? I'm
>> thinking that would be a very good feature to have... and it's not
>> something you can easily do externally.
>
> I think it is extremely valuable, else we have lost those transactions which
> is exactly what we don't want.

We *have* to "recover" waiting transaction upon switching off sync rep.

A transaction that waits for a sync standby to respond has already committed
locally (i.e., updated the clog), it just hasn't updated the proc array yet,
and thus is still seen as in-progress by the rest of the system. But rolling
back the transaction is nevertheless *impossible* at that point (except by
PITR, hence the quotes around "recover"). So the only alternative to
"recovering" them, i.e. have them abort their waiting, is to let them linger
indefinitely, still holding their locks, preventing xmin from advancing, etc,
until either the client disconnects or the server is restarted.

best regards,
Florian Pflug


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-24 20:47:47
Message-ID: 52E2D173.8010700@vmware.com
Lists: pgsql-hackers

ISTM the consensus is that we need better monitoring/administration
interfaces so that people can script the behavior they want in external
tools. Also, a new synchronous apply replication mode would be handy,
but that'd be a whole different patch. We don't have a patch on the
table that we could consider committing any time soon, so I'm going to
mark this as rejected in the commitfest app.

- Heikki


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-24 21:29:45
Message-ID: 52E2DB49.50100@agliodbs.com
Lists: pgsql-hackers

On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
> ISTM the consensus is that we need better monitoring/administration
> interfaces so that people can script the behavior they want in external
> tools. Also, a new synchronous apply replication mode would be handy,
> but that'd be a whole different patch. We don't have a patch on the
> table that we could consider committing any time soon, so I'm going to
> mark this as rejected in the commitfest app.

I don't feel that "we'll never do auto-degrade" is determinative;
several hackers were for auto-degrade, and they have a good use-case
argument. However, we do have consensus that we need more scaffolding
than this patch supplies in order to make auto-degrade *safe*.

I encourage the submitter to resubmit an improved version of this patch
(one with more monitorability) for 9.5 CF1. That'll give us a whole
dev cycle to argue about it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-24 22:25:19
Message-ID: 21E3EBC7-0A3D-471C-8309-6F33D59048B7@phlo.org
Lists: pgsql-hackers

On Jan24, 2014, at 22:29 , Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
>> ISTM the consensus is that we need better monitoring/administration
>> interfaces so that people can script the behavior they want in external
>> tools. Also, a new synchronous apply replication mode would be handy,
>> but that'd be a whole different patch. We don't have a patch on the
>> table that we could consider committing any time soon, so I'm going to
>> mark this as rejected in the commitfest app.
>
> I don't feel that "we'll never do auto-degrade" is determinative;
> several hackers were for auto-degrade, and they have a good use-case
> argument. However, we do have consensus that we need more scaffolding
> than this patch supplies in order to make auto-degrade *safe*.
>
> I encourage the submitter to resubmit an improved version of this patch
> (one with more monitorability) for 9.5 CF1. That'll give us a whole
> dev cycle to argue about it.

There seemed to be at least some support for having a way to manually
degrade from sync rep to async rep via something like

ALTER SYSTEM SET synchronous_commit='local';

Doing that seems unlikely to meet much resistance on grounds of principle,
so it seems to me that working on that would be the best way forward for
the submitter. I don't know how hard it would be to pull this off,
though.

best regards,
Florian Pflug


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-26 12:42:28
Message-ID: 52E502B4.7090402@2ndQuadrant.com
Lists: pgsql-hackers

On 01/24/2014 10:29 PM, Josh Berkus wrote:
> On 01/24/2014 12:47 PM, Heikki Linnakangas wrote:
>> ISTM the consensus is that we need better monitoring/administration
>> interfaces so that people can script the behavior they want in external
>> tools. Also, a new synchronous apply replication mode would be handy,
>> but that'd be a whole different patch. We don't have a patch on the
>> table that we could consider committing any time soon, so I'm going to
>> mark this as rejected in the commitfest app.
> I don't feel that "we'll never do auto-degrade" is determinative;
> several hackers were for auto-degrade, and they have a good use-case
> argument.
Auto-degrade may make sense together with synchronous apply
mentioned by Heikki.

I do not see much use for synchronous-(noapply)-if-you-can mode,
though it may make some sense in some scenarios if sync failure
is accompanied by loud screaming ("hey DBA, we are writing checks
with no money in the bank, do something fast!")

Perhaps some kind of sync-with-timeout mode, where timing out
results in a "weak error" (something between the current
warning and error) being returned to the client, and/or causes an
external command to be run, which could then be used to flood the
admins' mailbox :)
> However, we do have consensus that we need more scaffolding
> than this patch supplies in order to make auto-degrade *safe*.
>
> I encourage the submitter to resubmit an improved version of this patch
> (one with more monitorability) for 9.5 CF1. That'll give us a whole
> dev cycle to argue about it.
>

Cheers

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ


From: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-27 03:56:23
Message-ID: BF2827DCCE55594C8D7A8F7FFD3AB7713DDBCB8B@SZXEML508-MBX.china.huawei.com
Lists: pgsql-hackers

On 01/25/2014, Josh Berkus wrote:
> > ISTM the consensus is that we need better monitoring/administration
> > interfaces so that people can script the behavior they want in
> > external tools. Also, a new synchronous apply replication mode would
> > be handy, but that'd be a whole different patch. We don't have a
> > patch on the table that we could consider committing any time soon,
> > so I'm going to mark this as rejected in the commitfest app.
>
> I don't feel that "we'll never do auto-degrade" is determinative;
> several hackers were for auto-degrade, and they have a good use-case
> argument. However, we do have consensus that we need more scaffolding
> than this patch supplies in order to make auto-degrade *safe*.
>
> I encourage the submitter to resubmit an improved version of this
> patch (one with more monitorability) for 9.5 CF1. That'll give us a
> whole dev cycle to argue about it.

I shall rework this patch. Below is a summary of all the discussions,
which will be used as input for improving the patch:

1. Method of degrading the synchronous mode:
a. Expose the configuration variable through a new SQL-callable function.
b. Use ALTER SYSTEM SET.
c. Auto-degrade using some sort of configuration parameter, as done in the current patch.
d. Or maybe a combination of the above, which the DBA can use depending on their use case.

We can discuss further to decide on one of these approaches.

2. Synchronous mode should be upgraded/restored after at least one synchronous standby comes up and has caught up with the master.

3. Better monitoring/administration interfaces, which would be even better if made into a generic trap system.

I shall propose a better approach for this.

4. Send committing clients a WARNING if they have committed a synchronous transaction and we are in degraded mode.

5. Please add more if I am missing something.
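Item 4 could look roughly like the following hypothetical C helper on the commit path, deciding whether a committing client needs a WARNING. The function name and message text are illustrative only; in the server the message would be raised with ereport(WARNING, ...) rather than returned as a string:

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns the warning to send to a client whose transaction asked for
 * synchronous commit while the master is in degraded (standalone) mode,
 * i.e. when the commit is durable only locally. Returns NULL otherwise.
 */
static const char *
commit_warning(bool sync_commit_requested, bool master_degraded)
{
    if (sync_commit_requested && master_degraded)
        return "transaction committed locally only; "
               "synchronous standby unavailable (degraded mode)";
    return NULL;
}
```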

Thanks and Regards,
Kumar Rajeev Rastogi


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-27 14:47:11
Message-ID: CA+TgmoZc+dgKs=UZfRzmjvC=LjkKF1Dx9TwhWkEAuB4cHmxhew@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 26, 2014 at 10:56 PM, Rajeev rastogi
<rajeev(dot)rastogi(at)huawei(dot)com> wrote:
> On 01/25/2014, Josh Berkus wrote:
>> > ISTM the consensus is that we need better monitoring/administration
>> > interfaces so that people can script the behavior they want in
>> > external tools. Also, a new synchronous apply replication mode would
>> > be handy, but that'd be a whole different patch. We don't have a patch
>> > on the table that we could consider committing any time soon, so I'm
>> > going to mark this as rejected in the commitfest app.
>>
>> I don't feel that "we'll never do auto-degrade" is determinative;
>> several hackers were for auto-degrade, and they have a good use-case
>> argument. However, we do have consensus that we need more scaffolding
>> than this patch supplies in order to make auto-degrade *safe*.
>>
>> I encourage the submitter to resubmit an improved version of this
>> patch (one with more monitorability) for 9.5 CF1. That'll give us a
>> whole dev cycle to argue about it.
>
> I shall rework this patch to improve it. Below is a summary of all the
> discussions, which will be used as input for improving the patch:
>
> 1. Methods of degrading the synchronous mode:
> a. Expose the configuration variable through new SQL-callable functions.
> b. Use ALTER SYSTEM SET.
> c. Auto-degrade using some sort of configuration parameter, as done in the current patch.
> d. Or maybe a combination of the above, which the DBA can choose from depending on their use case.
>
> We can discuss further to decide on one of these approaches.
>
> 2. Synchronous mode should be upgraded/restored after at least one synchronous standby comes up and has caught up with the master.
>
> 3. Better monitoring/administration interfaces, which would be even better if made into a generic trap system.
>
> I shall propose a better approach for this.
>
> 4. Send committing clients a WARNING if they have committed a synchronous transaction while we are in degraded mode.
>
> 5. Please add more if I am missing something.

All of those things have been mentioned, but I'm not sure we have
consensus on which of them we actually want to do, or how. Figuring
that out seems like the next step.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, MauMau <maumau307(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Standalone synchronous master
Date: 2014-01-27 18:51:11
Message-ID: 52E6AA9F.1060006@agliodbs.com
Lists: pgsql-hackers

On 01/26/2014 07:56 PM, Rajeev rastogi wrote:
> I shall rework this patch to improve it. Below is a summary of all the
> discussions, which will be used as input for improving the patch:
>
> 1. Methods of degrading the synchronous mode:
> a. Expose the configuration variable through new SQL-callable functions.
> b. Use ALTER SYSTEM SET.
> c. Auto-degrade using some sort of configuration parameter, as done in the current patch.
> d. Or maybe a combination of the above, which the DBA can choose from depending on their use case.
>
> We can discuss further to decide on one of these approaches.
>
> 2. Synchronous mode should be upgraded/restored after at least one synchronous standby comes up and has caught up with the master.
>
> 3. Better monitoring/administration interfaces, which would be even better if made into a generic trap system.
>
> I shall propose a better approach for this.
>
> 4. Send committing clients a WARNING if they have committed a synchronous transaction while we are in degraded mode.
>
> 5. Please add more if I am missing something.

I think we actually need two degrade modes:

A. degrade once: if the sync standby connection is ever lost, degrade
and do not resync.

B. reconnect: if the sync standby catches up again, return it to sync
status.

The reason you'd want "degrade once" is to avoid the "flaky network"
issue where you're constantly degrading then reattaching the sync
standby, resulting in horrible performance.

If we did offer "degrade once" though, we'd need some easy way to
determine that the master was in a state of permanent degrade, and a
command to make it resync.

Discuss?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com