Timeout and wait-forever in sync rep

Lists:	pgsql-hackers

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Timeout and wait-forever in sync rep
Date:	2010-10-15 12:41:42
Message-ID:	AANLkTikP0dGiOzr6zh0v-VthZ+Dwbt3kh3vEKQsZ0Xon@mail.gmail.com

Hi,

As a result of the discussion, I think that we need the following two
parameters for the case where the standby goes down.

* replication_timeout
This is the maximum time to wait for the ACK from the standby. If this
timeout expires, the master closes the replication connection and
disconnects the standby. This parameter is used only to let the master
detect a standby crash or a network outage.

We already have keepalive parameters for that purpose. But they cannot
detect the disconnection in some cases. So replication_timeout needs
to be introduced for sync rep.

* allow_standalone_master
This specifies whether we allow the master to process transactions
alone when there is no connected and sync'd standby.

If this is false, all transactions on the master are blocked until a
sync'd standby appears. Of course, this happens not only when
replication_timeout expires but also when we start the master alone
at the initial setup, when the master detects the disconnection by
using keepalive parameters, and when the standby is shut down normally.
People who want 'wait-forever' will disable this parameter to reduce
the risk of data loss.

OTOH, if this is true, the absence of a sync'd standby doesn't prevent
the master from processing transactions alone. People who want high
availability, even at an increased risk of data loss, will enable
this parameter.
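
The interaction of the two parameters can be sketched in a few lines of
Python. This is only an illustration of the behaviour described above, not
PostgreSQL code: the names MasterState, standby_timed_out and
transactions_blocked are invented for the sketch, and only the two
parameter names come from the proposal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MasterState:
    # The two proposed parameters (names taken from the proposal).
    replication_timeout: float      # seconds to wait for an ACK
    allow_standalone_master: bool   # may the master run without a standby?
    # Hypothetical bookkeeping: time of the last ACK from the sync'd
    # standby, or None when no sync'd standby is connected.
    last_ack_time: Optional[float] = None


def standby_timed_out(state: MasterState, now: float) -> bool:
    """True if a connected standby has been silent longer than the timeout."""
    if state.last_ack_time is None:
        return False
    return now - state.last_ack_time > state.replication_timeout


def transactions_blocked(state: MasterState, now: float) -> bool:
    """True if transactions must wait because no sync'd standby is available."""
    if standby_timed_out(state, now):
        # The master closes the connection and disconnects the standby.
        state.last_ack_time = None
    if state.last_ack_time is not None:
        return False  # a sync'd standby is connected
    return not state.allow_standalone_master
```

With allow_standalone_master set to false, the sketch blocks as soon as the
standby is disconnected for any reason, which is the 'wait-forever' choice;
with it set to true, the master keeps processing transactions alone.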

The timeout does not conflict with 'wait-forever'. Even if you choose
'wait-forever' (i.e., you set allow_standalone_master to false), the master
should detect the standby crash as soon as possible by using the
timeout. For example, imagine that max_wal_senders is set to one and
the master cannot detect the standby crash because there is no
timeout. In this case, even if you start a new standby, it will not be
able to connect to the master since there is no free walsender slot.
As a result, the master actually waits forever.
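
The max_wal_senders scenario can be played through with a toy model. The
Master class below and its methods are hypothetical and exist only for this
sketch; real walsender slot handling is more involved. A timeout of None
stands for "no replication_timeout".

```python
class Master:
    """Toy model of walsender slots; not PostgreSQL code."""

    def __init__(self, max_wal_senders, replication_timeout=None):
        self.max_wal_senders = max_wal_senders
        # None models the absence of a timeout: a dead standby is never noticed.
        self.replication_timeout = replication_timeout
        self.slots = {}  # standby name -> time of its last ACK

    def reap_dead_standbys(self, now):
        """Free the slots of standbys that have exceeded the timeout."""
        if self.replication_timeout is None:
            return
        for name, last_ack in list(self.slots.items()):
            if now - last_ack > self.replication_timeout:
                del self.slots[name]

    def connect(self, name, now):
        """Try to attach a standby; return True on success."""
        self.reap_dead_standbys(now)
        if len(self.slots) >= self.max_wal_senders:
            return False  # no free walsender slot
        self.slots[name] = now
        return True


# The old standby connects at t=0 and then crashes without being detected.
no_timeout = Master(max_wal_senders=1)
no_timeout.connect("old", now=0)
print(no_timeout.connect("new", now=600))    # False: the slot is still held

with_timeout = Master(max_wal_senders=1, replication_timeout=30)
with_timeout.connect("old", now=0)
print(with_timeout.connect("new", now=600))  # True: the stale slot was freed
```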

Thoughts?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-10-15 15:43:11
Message-ID:	1287157391.1725.1582.camel@ebony

On Fri, 2010-10-15 at 21:41 +0900, Fujii Masao wrote:

> As the result of the discussion, I think that we need the following two
> parameters for the case where the standby goes down.

> * replication_timeout
> This is the maximum time to wait for the ACK from the standby. If this
> timeout expires, the master closes the replication connection and
> disconnects the standby. This parameter is just used for the master
> to detect the standby crash or the network outage.
>
> We already have keepalive parameters for that purpose.

Yes, I had thought we would just use the keepalives...

> But they cannot
> detect the disconnection in some cases. So replication_timeout needs
> to be introduced for sync rep.

When exactly don't the keepalives work?

> * allow_standalone_master
> This specifies whether we allow the master to process transactions
> alone when there is no connected and sync'd standby.
>
> If this is false, all the transactions on the master are blocked until
> sync'd standby has appeared. Of course, this happens not only when
> replication_timeout expires but also when we start the master alone
> at the initial setup, when the master detects the disconnection by
> using keepalive parameters, and when the standby is shut down normally.
> People who want 'wait-forever' will disable this parameter to reduce
> the risk of data loss.
>
> OTOH, if this is true, the absence of sync'd standby doesn't prevent
> the master from processing transactions alone. People who want high
> availability even though the risk of data loss increases will enable
> this parameter.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-10-15 16:51:45
Message-ID:	4CB886A1.4060608@kaltenbrunner.cc

On 10/15/2010 05:43 PM, Simon Riggs wrote:
> On Fri, 2010-10-15 at 21:41 +0900, Fujii Masao wrote:
>
>> As the result of the discussion, I think that we need the following two
>> parameters for the case where the standby goes down.
>
>> * replication_timeout
>> This is the maximum time to wait for the ACK from the standby. If this
>> timeout expires, the master closes the replication connection and
>> disconnects the standby. This parameter is just used for the master
>> to detect the standby crash or the network outage.
>>
>> We already have keepalive parameters for that purpose.
>
> Yes, I had thought we would just use the keepalives...
>
>> But they cannot
>> detect the disconnection in some cases. So replication_timeout needs
>> to be introduced for sync rep.
>
> When exactly don't the keepalives work?

Well, TCP-level keepalives are not terribly portable (or can only be
partially controlled from the app), and on some platforms they have lower
limits that are in the minutes, which is too long for a lot of use cases.
The keepalive usage we have in 9.0 is mostly for removing an annoyance
on some major platforms, but depending on them for a major feature like
timeouts in sync rep is probably not a good idea.
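
The portability point can be illustrated with a small Python sketch that
turns keepalives on and then probes for the per-socket tunables. The helper
name enable_keepalive is made up for this example; which of the TCP_KEEP*
constants exist, and which values the kernel accepts, depends on the
platform, and that variation is exactly the problem.

```python
import socket


def enable_keepalive(sock, idle, interval, count):
    """Turn on TCP keepalives and set whichever tunables this platform exposes.

    Returns the names of the options that could actually be set.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = ["SO_KEEPALIVE"]
    # The per-socket tunables are platform-specific, so probe for each one.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        option = getattr(socket, name, None)
        if option is None:
            continue  # constant not defined on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
        except OSError:
            continue  # defined, but rejected by the kernel
        applied.append(name)
    return applied


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(enable_keepalive(sock, idle=10, interval=5, count=2))
sock.close()
```

The returned list differs between operating systems, so an application that
relies on these settings for a hard deadline cannot assume they took effect.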

Stefan

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-10-15 18:24:48
Message-ID:	1287167088.1725.1594.camel@ebony

On Fri, 2010-10-15 at 18:51 +0200, Stefan Kaltenbrunner wrote:
> >
> > When exactly don't the keepalives work?
>
> well tcp level keepalives are not terribly portable (or can only be
> partially controlled from the app) and on some platforms have lower
> limits that are in the minutes which is too long for a lot of usecases.
> The keepalive usage we have in 9.0 is mostly for removing an annoyance
> on some major platforms but depending on them for a major feature like
> timeouts in sync rep is probably not a good idea.

If we need it, then I'm glad. It's easy to understand and easy to
program too.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-10-16 13:02:38
Message-ID:	AANLkTik1_eRB5RsozTuKd4=uS8uZ=fg+yh-mrv76epNM@mail.gmail.com

On Fri, Oct 15, 2010 at 8:41 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> As the result of the discussion, I think that we need the following two
> parameters for the case where the standby goes down.
>
> * replication_timeout
> This is the maximum time to wait for the ACK from the standby. If this
> timeout expires, the master closes the replication connection and
> disconnects the standby. This parameter is just used for the master
> to detect the standby crash or the network outage.
>
> We already have keepalive parameters for that purpose. But they cannot
> detect the disconnection in some cases. So replication_timeout needs
> to be introduced for sync rep.

Good design, +1.

I'm not wild about the name, but otherwise this seems well-designed.

> The timeout doesn't conflict with 'wait-forever'. Even if you choose
> 'wait-forever' (i.e., you set allow_standalone_master to false), the master
> should detect the standby crash as soon as possible by using the
> timeout. For example, imagine that max_wal_senders is set to one and
> the master cannot detect the standby crash because of absence of the
> timeout. In this case, even if you start new standby, it will not be
> able to connect to the master since there is no free walsender slot.
> As the result, the master actually waits forever.

Good point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-10-18 07:03:50
Message-ID:	AANLkTing6ncNwRZdNjvgH0exxHB5=C+E-h7JrBxdcrTv@mail.gmail.com

On Sat, Oct 16, 2010 at 12:43 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> But they cannot
>> detect the disconnection in some cases. So replication_timeout needs
>> to be introduced for sync rep.
>
> When exactly don't the keepalives work?

The keepalives don't work, at least on Linux, when the connection is
terminated after sending a packet and before receiving the TCP-level ACK.
You can confirm this by unplugging the LAN cable from the client machine
while running pgbench on it. In this case, even if you specify
tcp_keepalives_*, backends would not be able to detect the disconnection.
But note that this doesn't always happen; it depends on the timing.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-10-18 16:06:30
Message-ID:	AANLkTinbLxmwqdSFQVyamm_OzRYm-_-10Yna7-zRasF0@mail.gmail.com

On Mon, Oct 18, 2010 at 12:03 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> The keepalives don't work at least on linux when the connection is terminated
> after sending a packet and before receiving TCP-level ACK. You can confirm
> this by unplugging the LAN cable from a client server while running pgbench
> on a client.

What do you mean by "don't work"? In this case no additional packets
would be needed since the regular ack would serve the same purpose.
How long did you wait to test whether it would work? It takes quite a
while before the connection would time out.

--
greg

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Greg Stark <gsstark(at)mit(dot)edu>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-10-19 05:24:14
Message-ID:	AANLkTinnxTkpv3QBFOKTu-tY-pY3Q70nVvhsCbQ_FJ8a@mail.gmail.com

On Tue, Oct 19, 2010 at 1:06 AM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Mon, Oct 18, 2010 at 12:03 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> The keepalives don't work at least on linux when the connection is terminated
>> after sending a packet and before receiving TCP-level ACK. You can confirm
>> this by unplugging the LAN cable from a client server while running pgbench
>> on a client.
>
> What do you mean by "don't work"?

I mean, for example, that the server cannot detect the disconnection for
more than 60 seconds even if the user configures the keepalive as follows.

tcp_keepalives_idle = 10
tcp_keepalives_interval = 5
tcp_keepalives_count = 2

> In this case no additional packets
> would be needed since the regular ack would serve the same purpose.
> How long did you wait to test whether it would work? It takes quite a
> while before the connection would time out.

Yep. In the case where the keepalive doesn't work, the TCP retransmission
timeout usually makes the server detect the disconnection. The detection
time depends on the kernel parameters tcp_retries1 and tcp_retries2. AFAIR,
it's several minutes by default.
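
A rough calculation shows how far apart the two detection times are. The
keepalive figure uses the three settings quoted above. The retransmission
figure assumes a Linux-style backoff that starts at 200 ms, doubles on each
retry, is capped at 120 s, and uses tcp_retries2 = 15; those numbers are
assumptions for this sketch, not values stated in this thread, and the real
starting timeout depends on the measured round-trip time.

```python
def keepalive_detection_seconds(idle, interval, count):
    """Worst-case time for keepalives to declare an idle connection dead."""
    return idle + interval * count


def retransmit_detection_ms(retries, rto_min_ms=200, rto_max_ms=120_000):
    """Total wait over the first send plus `retries` retransmissions,
    with the timeout doubling each time up to rto_max_ms."""
    return sum(min(rto_min_ms * 2 ** i, rto_max_ms) for i in range(retries + 1))


# tcp_keepalives_idle = 10, tcp_keepalives_interval = 5, tcp_keepalives_count = 2
print(keepalive_detection_seconds(10, 5, 2))   # 20

# Assumed tcp_retries2 = 15
print(retransmit_detection_ms(15) / 1000)      # 924.6
```

Under these assumptions the keepalive settings promise detection in about
20 seconds, while a connection with unacknowledged data in flight can take
roughly 15 minutes to fail.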

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-10-19 05:56:32
Message-ID:	AANLkTim87K06KgPBcjDp6JyKXQNOvndwubCPqARS8Yhw@mail.gmail.com

On Mon, Oct 18, 2010 at 10:24 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> I mean, for example, that the server cannot detect the disconnection for
> more than 60 seconds even if the user configures the keepalive as follows.
>
> tcp_keepalives_idle = 10
> tcp_keepalives_interval = 5
> tcp_keepalives_count = 2

Yeah, TCP is not going to detect a broken connection that quickly.

I think there's a fundamental impedance mismatch between the
application needs here and the design goals of TCP.

TCP is designed to work if at all possible and only generate an error
if it's unavoidable. Keepalives were controversial when they were
proposed, but for their original purpose -- ensuring that long-lived
servers didn't leak connections indefinitely -- they work.
The point of them was to cover the remaining cases where there was no
data in flight and therefore no way to ever detect that the connection
was dead.

TCP is only going to detect a connection as dead if it has exceeded
all the engineering limits of the network. Until then it's still
possible it'll come back, and having the network layer generate an
error when the connection might still be functioning would be
bad.

--
greg

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-10-21 22:33:35
Message-ID:	201010212233.o9LMXZK07541@momjian.us

Fujii Masao wrote:
> Hi,
>
> As the result of the discussion, I think that we need the following two
> parameters for the case where the standby goes down.

Can we have a parameter that calls an operating system command when a
standby is declared dead, to notify the administrator?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-10-22 04:46:39
Message-ID:	AANLkTik30veV=N1R=Gshq1-QQdDpRBa5D02g+keF4S7N@mail.gmail.com

On Fri, Oct 22, 2010 at 7:33 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Fujii Masao wrote:
>> Hi,
>>
>> As the result of the discussion, I think that we need the following two
>> parameters for the case where the standby goes down.
>
> Can we have a parameter that calls an operating system command when a
> standby is declared dead, to notify the administrator?

For me, that command would be useful to STONITH the standby when the
master detects the disconnection. I agree with adding that parameter.
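
As a sketch of what such a hook might look like, the Python below runs an
administrator-supplied command when a standby is declared dead. Everything
here is hypothetical: the function name, the %s placeholder convention for
the standby name, and the example command are invented for illustration and
do not correspond to any existing PostgreSQL setting.

```python
import shlex
import subprocess
import sys


def run_standby_dead_command(command_template, standby_name, timeout=10):
    """Run an administrator-supplied command for a standby declared dead.

    Every `%s` in the template is replaced by the standby name (a
    convention invented for this sketch). Returns (exit code, stdout).
    """
    argv = [arg.replace("%s", standby_name)
            for arg in shlex.split(command_template)]
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout)
    return result.returncode, result.stdout.strip()


# Stand-in for a real notification or fencing script: just print a message.
template = shlex.quote(sys.executable) + ' -c "print(\'standby dead: %s\')"'
print(run_standby_dead_command(template, "standby1"))
```

A real notification or fencing command would replace the stand-in, and the
master would need a policy for what to do when the command itself fails or
times out.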

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-12-06 06:42:52
Message-ID:	AANLkTimM1m+k62OU_etqjbJjdc2MG65wzP3RrU=0bC0m@mail.gmail.com

On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> The timeout doesn't conflict with 'wait-forever'. Even if you choose
> 'wait-forever' (i.e., you set allow_standalone_master to false), the master
> should detect the standby crash as soon as possible by using the
> timeout. For example, imagine that max_wal_senders is set to one and
> the master cannot detect the standby crash because of absence of the
> timeout. In this case, even if you start new standby, it will not be
> able to connect to the master since there is no free walsender slot.
> As the result, the master actually waits forever.

It occurred to me that the timeout would be required even for
asynchronous streaming replication. So, how about implementing the
replication timeout feature before synchronous replication itself?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Timeout and wait-forever in sync rep
Date:	2010-12-06 07:50:40
Message-ID:	4CFC95D0.2040506@enterprisedb.com

On 06.12.2010 07:42, Fujii Masao wrote:
> On Fri, Oct 15, 2010 at 9:41 PM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>> The timeout doesn't conflict with 'wait-forever'. Even if you choose
>> 'wait-forever' (i.e., you set allow_standalone_master to false), the master
>> should detect the standby crash as soon as possible by using the
>> timeout. For example, imagine that max_wal_senders is set to one and
>> the master cannot detect the standby crash because of absence of the
>> timeout. In this case, even if you start new standby, it will not be
>> able to connect to the master since there is no free walsender slot.
>> As the result, the master actually waits forever.
>
> It occurred to me that the timeout would be required even for
> asynchronous streaming replication. So, how about implementing the
> replication timeout feature before synchronous replication itself?

Sounds good to me. The more pieces we can nibble off the main patch the
better.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com