Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running

Lists: pgsql-hackers
From: "MauMau" <maumau307(at)gmail(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-04 13:59:15
Message-ID: 369698E947874884A77849D8FE3680C2@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello,

My customer reported a strange connection hang problem. He and I couldn't
reproduce it. I haven't been able to understand the cause, but I can think
of one hypothesis. Could you give me your opinions on whether my hypothesis
is correct, and a direction on how to fix the problem? I'm willing to
submit a patch if necessary.

[Problem]
The customer is using synchronous streaming replication with PostgreSQL
9.2.8. The cluster consists of two nodes.

He performed archive recovery test like this:

1. Take a base backup. At that time, some notable settings in
postgresql.conf are:
synchronous_standby_names = 'node2'
autovacuum = on
# synchronous_commit is commented out, so it's on by default

2. Some update operations. I don't know what.

3. Shutdown the primary and promote the standby.

4. Shutdown the new primary.

5. Perform archive recovery. That is, restore the base backup, create
recovery.conf, and do pg_ctl start.

6. Immediately after the archive recovery is complete, connect to the
database server and perform some queries to check user data.

The steps 5 and 6 are done in some recovery script.

However, the connection attempt in step 6 got stuck for 12 hours, and the
test was canceled. The stack trace was:

#0 0x0000003f4badf258 in poll () from /lib64/libc.so.6
#1 0x0000000000619b94 in WaitLatchOrSocket ()
#2 0x0000000000640c4c in SyncRepWaitForLSN ()
#3 0x0000000000491c18 in RecordTransactionCommit ()
#4 0x0000000000491d98 in CommitTransaction ()
#5 0x0000000000493135 in CommitTransactionCommand ()
#6 0x000000000074938a in InitPostgres ()
#7 0x000000000066ddd7 in PostgresMain ()
#8 0x0000000000627d81 in PostmasterMain ()
#9 0x00000000005c4803 in main ()

The connection attempt is waiting for a reply from the standby. This is
strange, because we didn't anticipate that the connection establishment (and
subsequent SELECT queries) would update something and write some WAL. The
doc says:

http://www.postgresql.org/docs/current/static/warm-standby.html#SYNCHRONOUS-REPLICATION

"When requesting synchronous replication, each commit of a write transaction
will wait until confirmation is received that the commit has been written to
the transaction log on disk of both the primary and standby server.
...
Read only transactions and transaction rollbacks need not wait for replies
from standby servers. Subtransaction commits do not wait for responses from
standby servers, only top-level commits."

[Hypothesis]
Why does the connection processing emit WAL?

Probably, it did page-at-a-time vacuum during access to pg_database and
pg_authid for client authentication. src/backend/access/heap/README.HOT
describes:

"Effectively, space reclamation happens during tuple retrieval when the
page is nearly full (<10% free) and a buffer cleanup lock can be
acquired. This means that UPDATE, DELETE, and SELECT can trigger space
reclamation, but often not during INSERT ... VALUES because it does
not retrieve a row."

But the customer could not reproduce the problem when he performed the same
archive recovery from the same base backup again. Why? I guess the
autovacuum daemon vacuumed the system catalogs before he attempted to
connect to the database.

Is this correct?

[How to fix]
Of course, adding "-o '-c synchronous_commit=local'" or "-o '-c
synchronous_standby_names='" to pg_ctl start in the recovery script would
prevent the problem.

But isn't there anything to fix in PostgreSQL? I think the doc needs
improvement so that users won't misunderstand that only write transactions
would block at commit.

Do you think something else should be done? I guess pg_basebackup,
pg_isready, and PQping() called in pg_ctl -w start/restart would block
likewise, and I'm afraid users don't anticipate it. pg_upgrade appears to
set synchronous_commit to local when starting the database server.

Regards
MauMau


From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-06 06:36:53
Message-ID: CAA4eK1KggUFPuH38GZ4HLU5DQk4p9HLKkwukTGf-p2ixAY7zPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 4, 2014 at 7:29 PM, MauMau <maumau307(at)gmail(dot)com> wrote:
>
> Hello,
>
> "When requesting synchronous replication, each commit of a write
transaction will wait until confirmation is received that the commit has
been written to the transaction log on disk of both the primary and standby
server.
> ...
> Read only transactions and transaction rollbacks need not wait for
replies from standby servers. Subtransaction commits do not wait for
responses from standby servers, only top-level commits."
>
>
> [Hypothesis]
> Why does the connection processing emit WAL?
>
> Probably, it did page-at-a-time vacuum during access to pg_database and
pg_authid for client authentication. src/backend/access/heap/README.HOT
describes:

I agree with your analysis that it can happen during connection
attempt.

> But the customer could not reproduce the problem when he performed the
same archive recovery from the same base backup again. Why? I guess the
autovacuum daemon vacuumed the system catalogs before he attempted to
connect to the database.
>
> Is this correct?

One way to confirm could be to perform the archive recovery by
disabling autovacuum.

>
> [How to fix]
> Of course, adding "-o '-c synchronous_commit=local'" or "-o '-c
synchronous_standby_names='" to pg_ctl start in the recovery script would
prevent the problem.
>
> But isn't there anything to fix in PostgreSQL? I think the doc needs
improvement so that users won't misunderstand that only write transactions
would block at commit.

I also think at the very least we should update docs even if we
don't have any solution for this case.

Another thing which I am wondering about is can't the same happen
even for Read Only transaction (incase someone does Select which
prunes the page).

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


From: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-07 04:20:12
Message-ID: BF2827DCCE55594C8D7A8F7FFD3AB7713DE14EA0@SZXEML508-MBX.china.huawei.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 04 July 2014 19:29, MauMau Wrote:

> [How to fix]
> Of course, adding "-o '-c synchronous_commit=local'" or "-o '-c
> synchronous_standby_names='" to pg_ctl start in the recovery script
> would prevent the problem.
>
> But isn't there anything to fix in PostgreSQL? I think the doc needs
> improvement so that users won't misunderstand that only write
> transactions would block at commit.

As of now there is no solution for this in PostgreSQL but I had submitted a patch "Standalone synchronous master" in
9.4 2014-01 CommitFest, which was rejected because of some issues. This patch was meant to degrade the synchronous
level of master, if all synchronous standbys are down.
I plan to resubmit this with better design sometime in 9.5.

Thanks and Regards,
Kumar Rajeev Rastogi


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-07 05:14:21
Message-ID: 20140707051421.GD17261@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-07-07 04:20:12 +0000, Rajeev rastogi wrote:
>
> On 04 July 2014 19:29, MauMau Wrote:
>
> > [How to fix]
> > Of course, adding "-o '-c synchronous_commit=local'" or "-o '-c
> > synchronous_standby_names='" to pg_ctl start in the recovery script
> > would prevent the problem.
> >
> > But isn't there anything to fix in PostgreSQL? I think the doc needs
> > improvement so that users won't misunderstand that only write
> > transactions would block at commit.
>
> As of now there is no solution for this in PostgreSQL but I had submitted a patch "Standalone synchronous master" in
> 9.4 2014-01 CommitFest, which was rejected because of some issues. This patch was meant to degrade the synchronous
> level of master, if all synchronous standbys are down.
> I plan to resubmit this with better design sometime in 9.5.

That seems to be more less orthogonal to the issue at hand. The problem
here is that a readonly command lead to a wait. And even worse it was a
command the user had no influence over.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-07 07:14:48
Message-ID: 20140707071448.GB29124@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2014-07-04 22:59:15 +0900, MauMau wrote:
> My customer reported a strange connection hang problem. He and I couldn't
> reproduce it. I haven't been able to understand the cause, but I can think
> of one hypothesis. Could you give me your opinions on whether my hypothesis
> is correct, and a direction on how to fix the problem? I'm willing to
> submit a patch if necessary.

> The connection attempt is waiting for a reply from the standby. This is
> strange, because we didn't anticipate that the connection establishment (and
> subsequent SELECT queries) would update something and write some WAL. The
> doc says:
>
> http://www.postgresql.org/docs/current/static/warm-standby.html#SYNCHRONOUS-REPLICATION
>
> "When requesting synchronous replication, each commit of a write transaction
> will wait until confirmation is received that the commit has been written to
> the transaction log on disk of both the primary and standby server.
> ...
> Read only transactions and transaction rollbacks need not wait for replies
> from standby servers. Subtransaction commits do not wait for responses from
> standby servers, only top-level commits."
>
>
> [Hypothesis]
> Why does the connection processing emit WAL?
>
> Probably, it did page-at-a-time vacuum during access to pg_database and
> pg_authid for client authentication. src/backend/access/heap/README.HOT
> describes:

> [How to fix]
> Of course, adding "-o '-c synchronous_commit=local'" or "-o '-c
> synchronous_standby_names='" to pg_ctl start in the recovery script would
> prevent the problem.

> But isn't there anything to fix in PostgreSQL? I think the doc needs
> improvement so that users won't misunderstand that only write transactions
> would block at commit.

I think we should rework RecordTransactionCommit() to only wait for the
standby if `markXidCommitted' and not if `wrote_xlog'. There really
isn't a reason to make a readonly transaction's commit wait just because
it did some hot pruning.

Greetings,

Andres Freund


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-07 11:10:22
Message-ID: CAHGQGwHP6-5aM3YotAYfuFB8vSvQNT6OpXC1bvUw3RJjyKBfLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jul 7, 2014 at 4:14 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> On 2014-07-04 22:59:15 +0900, MauMau wrote:
>> My customer reported a strange connection hang problem. He and I couldn't
>> reproduce it. I haven't been able to understand the cause, but I can think
>> of one hypothesis. Could you give me your opinions on whether my hypothesis
>> is correct, and a direction on how to fix the problem? I'm willing to
>> submit a patch if necessary.
>
>> The connection attempt is waiting for a reply from the standby. This is
>> strange, because we didn't anticipate that the connection establishment (and
>> subsequent SELECT queries) would update something and write some WAL. The
>> doc says:
>>
>> http://www.postgresql.org/docs/current/static/warm-standby.html#SYNCHRONOUS-REPLICATION
>>
>> "When requesting synchronous replication, each commit of a write transaction
>> will wait until confirmation is received that the commit has been written to
>> the transaction log on disk of both the primary and standby server.
>> ...
>> Read only transactions and transaction rollbacks need not wait for replies
>> from standby servers. Subtransaction commits do not wait for responses from
>> standby servers, only top-level commits."
>>
>>
>> [Hypothesis]
>> Why does the connection processing emit WAL?
>>
>> Probably, it did page-at-a-time vacuum during access to pg_database and
>> pg_authid for client authentication. src/backend/access/heap/README.HOT
>> describes:
>
>> [How to fix]
>> Of course, adding "-o '-c synchronous_commit=local'" or "-o '-c
>> synchronous_standby_names='" to pg_ctl start in the recovery script would
>> prevent the problem.
>
>> But isn't there anything to fix in PostgreSQL? I think the doc needs
>> improvement so that users won't misunderstand that only write transactions
>> would block at commit.
>
> I think we should rework RecordTransactionCommit() to only wait for the
> standby if `markXidCommitted' and not if `wrote_xlog'. There really
> isn't a reason to make a readonly transaction's commit wait just because
> it did some hot pruning.

Sounds good direction. One question is: Can RecordTransactionCommit() avoid
waiting for not only replication but also local WAL flush safely in
such read-only
transaction case?

Regards,

--
Fujii Masao


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-07 13:57:20
Message-ID: 20972.1404741440@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> I think we should rework RecordTransactionCommit() to only wait for the
> standby if `markXidCommitted' and not if `wrote_xlog'. There really
> isn't a reason to make a readonly transaction's commit wait just because
> it did some hot pruning.

Well, see the comment that explains why the logic is like this now:

* If we didn't create XLOG entries, we're done here; otherwise we
* should flush those entries the same as a commit record. (An
* example of a possible record that wouldn't cause an XID to be
* assigned is a sequence advance record due to nextval() --- we want
* to flush that to disk before reporting commit.)

I agree that HOT pruning isn't a reason to make a commit wait, but
nextval() is.

We could perhaps add more flags that would keep track of which sorts of
xlog entries justify a wait at commit, but TBH I'm skeptical of the entire
proposition. Having synchronous replication on with no live slave *will*
result in arbitrary hangs, and the argument that this particular case
should be exempt seems a bit thin to me. The sooner the user realizes
he's got a problem, the better. If read-only transactions don't show a
problem, the user might not realize he's got one until he starts to wonder
why autovac/autoanalyze aren't working.

I think a more useful line of thought would be to see if we can't complain
more loudly when we have no synchronous standby. Perhaps a "WARNING:
waiting forever for lack of a synchronous standby" could be emitted when
a transaction starts to wait.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: MauMau <maumau307(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-07 15:51:13
Message-ID: 20140707155113.GB1136@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-07-07 09:57:20 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > I think we should rework RecordTransactionCommit() to only wait for the
> > standby if `markXidCommitted' and not if `wrote_xlog'. There really
> > isn't a reason to make a readonly transaction's commit wait just because
> > it did some hot pruning.
>
> Well, see the comment that explains why the logic is like this now:
>
> * If we didn't create XLOG entries, we're done here; otherwise we
> * should flush those entries the same as a commit record. (An
> * example of a possible record that wouldn't cause an XID to be
> * assigned is a sequence advance record due to nextval() --- we want
> * to flush that to disk before reporting commit.)

I think we should 'simply' make sequences assign a toplevel xid - then
we can get rid of that special case in RecordTransactionCommit(). And I
think the performance benefit of not having to wait on XLogFlush() for
readonly xacts due to hot prunes far outweighs the decrease due to the
xid assignment/commit record. I don't think that nextval()s are called
overly much without a later xid assigning statement.

> I agree that HOT pruning isn't a reason to make a commit wait, but
> nextval() is.

Agreed.

> We could perhaps add more flags that would keep track of which sorts of
> xlog entries justify a wait at commit, but TBH I'm skeptical of the entire
> proposition. Having synchronous replication on with no live slave *will*
> result in arbitrary hangs, and the argument that this particular case
> should be exempt seems a bit thin to me. The sooner the user realizes
> he's got a problem, the better. If read-only transactions don't show a
> problem, the user might not realize he's got one until he starts to wonder
> why autovac/autoanalyze aren't working.

Well, the user might just want to log in to diagnose the problem. If he
can't even login to see pg_stat_replication it's a pretty screwed up
situation.

> I think a more useful line of thought would be to see if we can't complain
> more loudly when we have no synchronous standby. Perhaps a "WARNING:
> waiting forever for lack of a synchronous standby" could be emitted when
> a transaction starts to wait.

In the OP's case the session wasn't even started - so proper feedback
isn't that easy...
We could special case that by forcing s_c=off until the session started properly.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-07 16:06:14
Message-ID: 24441.1404749174@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2014-07-07 09:57:20 -0400, Tom Lane wrote:
>> Well, see the comment that explains why the logic is like this now:

> I think we should 'simply' make sequences assign a toplevel xid - then
> we can get rid of that special case in RecordTransactionCommit(). And I
> think the performance benefit of not having to wait on XLogFlush() for
> readonly xacts due to hot prunes far outweighs the decrease due to the
> xid assignment/commit record. I don't think that nextval()s are called
> overly much without a later xid assigning statement.

Yeah, that could well be true. I'm not sure if there are any other cases
where we have non-xid-assigning operations that are considered part of
what has to be flushed before reporting commit; if there are not, I'd
be okay with changing nextval() this way.

>> I think a more useful line of thought would be to see if we can't complain
>> more loudly when we have no synchronous standby. Perhaps a "WARNING:
>> waiting forever for lack of a synchronous standby" could be emitted when
>> a transaction starts to wait.

> In the OP's case the session wasn't even started - so proper feedback
> isn't that easy...

Perhaps I'm wrong, but I think a WARNING emitted here would be seen in
psql even though we're still in InitPostgres. If it isn't, we have a
problem there anyhow, IMO.

> We could special case that by forcing s_c=off until the session started properly.

Ugh.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, MauMau <maumau307(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-07 16:25:51
Message-ID: CAHGQGwF1nQK_Jzph-6rHJ3zTYj_eGazfYPs2pdZj0V6YV4Hx-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 8, 2014 at 1:06 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On 2014-07-07 09:57:20 -0400, Tom Lane wrote:
>>> Well, see the comment that explains why the logic is like this now:
>
>> I think we should 'simply' make sequences assign a toplevel xid - then
>> we can get rid of that special case in RecordTransactionCommit(). And I
>> think the performance benefit of not having to wait on XLogFlush() for
>> readonly xacts due to hot prunes far outweighs the decrease due to the
>> xid assignment/commit record. I don't think that nextval()s are called
>> overly much without a later xid assigning statement.
>
> Yeah, that could well be true. I'm not sure if there are any other cases
> where we have non-xid-assigning operations that are considered part of
> what has to be flushed before reporting commit;

Maybe pg_switch_xlog().

> if there are not, I'd
> be okay with changing nextval() this way.

+1

Regards,

--
Fujii Masao


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: MauMau <maumau307(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-07 17:23:00
Message-ID: 20140707172300.GC1136@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-07-07 12:06:14 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2014-07-07 09:57:20 -0400, Tom Lane wrote:
> >> Well, see the comment that explains why the logic is like this now:
>
> > I think we should 'simply' make sequences assign a toplevel xid - then
> > we can get rid of that special case in RecordTransactionCommit(). And I
> > think the performance benefit of not having to wait on XLogFlush() for
> > readonly xacts due to hot prunes far outweighs the decrease due to the
> > xid assignment/commit record. I don't think that nextval()s are called
> > overly much without a later xid assigning statement.
>
> Yeah, that could well be true. I'm not sure if there are any other cases
> where we have non-xid-assigning operations that are considered part of
> what has to be flushed before reporting commit; if there are not, I'd
> be okay with changing nextval() this way.

I'm not aware of any adhoc, but I think to actually change it someone
would have to iterate over all wal record types to make sure.

> >> I think a more useful line of thought would be to see if we can't complain
> >> more loudly when we have no synchronous standby. Perhaps a "WARNING:
> >> waiting forever for lack of a synchronous standby" could be emitted when
> >> a transaction starts to wait.
>
> > In the OP's case the session wasn't even started - so proper feedback
> > isn't that easy...
>
> Perhaps I'm wrong, but I think a WARNING emitted here would be seen in
> psql even though we're still in InitPostgres.

Yes, it is visible.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
Cc: "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-08 10:43:43
Message-ID: C5696CDEBEEF4B349ADFAD6EBD4659FF@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>
> On Fri, Jul 4, 2014 at 7:29 PM, MauMau <maumau307(at)gmail(dot)com> wrote:
>> [Hypothesis]
>> Why does the connection processing emit WAL?
>>
>> Probably, it did page-at-a-time vacuum during access to pg_database and
> pg_authid for client authentication. src/backend/access/heap/README.HOT
> describes:
>
> I agree with your analysis that it can happen during connection
> attempt.

Thank you. I'm relieved the cause seems correct.

>> But the customer could not reproduce the problem when he performed the
> same archive recovery from the same base backup again. Why? I guess the
> autovacuum daemon vacuumed the system catalogs before he attempted to
> connect to the database.
>>
> One way to confirm could be to perform the archive recovery by
> disabling autovacuum.

Yeah, I thought of that too. Unfortunately, the customer deleted the the
base backup for testing.

> Another thing which I am wondering about is can't the same happen
> even for Read Only transaction (incase someone does Select which
> prunes the page).

I'm afraid about that, too.

Regards
MauMau


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Rajeev rastogi" <rajeev(dot)rastogi(at)huawei(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-08 10:50:28
Message-ID: 33150253DC2844C6A2EEEA4F4A430173@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: "Rajeev rastogi" <rajeev(dot)rastogi(at)huawei(dot)com>
As of now there is no solution for this in PostgreSQL but I had submitted a
patch "Standalone synchronous master" in
9.4 2014-01 CommitFest, which was rejected because of some issues. This
patch was meant to degrade the synchronous
level of master, if all synchronous standbys are down.
I plan to resubmit this with better design sometime in 9.5.

Although I only read some mails of that thread, I'm sure your proposal is
what many people would appreciate. Your new operation mode is equivalent to
the maximum availability mode of Oracle Data Guard, isn't it? I'm looking
forward to it. Good luck.

==================================================
Maximum availability
This protection mode provides the highest level of data protection that is
possible without compromising the availability of a primary database.
Transactions do not commit until all redo data needed to recover those
transactions has been written to the online redo log and to at least one
standby database. If the primary database cannot write its redo stream to at
least one standby database, it effectively switches to maximum performance
mode to preserve primary database availability and operates in that mode
until it is again able to write its redo stream to a standby database.

This protection mode ensures zero data loss except in the case of certain
double faults, such as failure of a primary database after failure of the
standby database.

Maximum performance
This is the default protection mode. It provides the highest level of data
protection that is possible without affecting the performance of a primary
database. This is accomplished by allowing transactions to commit as soon as
all redo data generated by those transactions has been written to the online
log. Redo data is also written to one or more standby databases, but this is
done asynchronously with respect to transaction commitment, so primary
database performance is unaffected by delays in writing redo data to the
standby database(s).

This protection mode offers slightly less data protection than maximum
availability mode and has minimal impact on primary database performance.
==================================================

Regards
MauMau


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andres Freund" <andres(at)2ndquadrant(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-08 11:03:22
Message-ID: 42B0E1C07CB24F4889ED8A345AF8EFC2@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> problem, the user might not realize he's got one until he starts to wonder
> why autovac/autoanalyze aren't working.

In autovacuum.c, autovacuum workers avoid waiting for the standby by doing:

/*
* Force synchronous replication off to allow regular maintenance even if
* we are waiting for standbys to connect. This is important to ensure we
* aren't blocked from performing anti-wraparound tasks.
*/
if (synchronous_commit > SYNCHRONOUS_COMMIT_LOCAL_FLUSH)
SetConfigOption("synchronous_commit", "local",
PGC_SUSET, PGC_S_OVERRIDE);

Regards
MauMau


From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andres Freund" <andres(at)2ndquadrant(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-08 11:18:14
Message-ID: 8BA1979216A04843A4F60B8CD4ACA8AB@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On 2014-07-07 09:57:20 -0400, Tom Lane wrote:
>>> Well, see the comment that explains why the logic is like this now:
>
>> I think we should 'simply' make sequences assign a toplevel xid - then
>> we can get rid of that special case in RecordTransactionCommit(). And I
>> think the performance benefit of not having to wait on XLogFlush() for
>> readonly xacts due to hot prunes far outweighs the decrease due to the
>> xid assignment/commit record. I don't think that nextval()s are called
>> overly much without a later xid assigning statement.
>
> Yeah, that could well be true. I'm not sure if there are any other cases
> where we have non-xid-assigning operations that are considered part of
> what has to be flushed before reporting commit; if there are not, I'd
> be okay with changing nextval() this way.

Thank you all for letting me know your thoughts. I understood and agree
that read-only transactions, including the connection establishment one,
should not wait for the standby nor the XLOG flush at commit, and the
current documentation/specification should not be changed.

I'll consider how to fix this problem, learning the code, then I'll ask for
review. I'd like to submit the patch for next CF if possible.

From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
> Sounds good direction. One question is: Can RecordTransactionCommit()
> avoid
> waiting for not only replication but also local WAL flush safely in
> such read-only
> transaction case?

I'd appreciate any opinion on this, too.

Regards
MauMau


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2015-02-26 14:53:54
Message-ID: 20150226145354.GA2384@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


FWIW a fix for this has been posted to all active branches:

Author: Andres Freund <andres(at)anarazel(dot)de>
Branch: master [fd6a3f3ad] 2015-02-26 12:50:07 +0100
Branch: REL9_4_STABLE [d72115112] 2015-02-26 12:50:07 +0100
Branch: REL9_3_STABLE [abce8dc7d] 2015-02-26 12:50:07 +0100
Branch: REL9_2_STABLE [d67076529] 2015-02-26 12:50:07 +0100
Branch: REL9_1_STABLE [5c8dabecd] 2015-02-26 12:50:08 +0100
Branch: REL9_0_STABLE [82e0d6eb5] 2015-02-26 12:50:08 +0100

Reconsider when to wait for WAL flushes/syncrep during commit.

Up to now RecordTransactionCommit() waited for WAL to be flushed (if
synchronous_commit != off) and to be synchronously replicated (if
enabled), even if a transaction did not have a xid assigned. The primary
reason for that is that sequence's nextval() did not assign a xid, but
are worthwhile to wait for on commit.

This can be problematic because sometimes read only transactions do
write WAL, e.g. HOT page prune records. That then could lead to read only
transactions having to wait during commit. Not something people expect
in a read only transaction.

This lead to such strange symptoms as backends being seemingly stuck
during connection establishment when all synchronous replicas are
down. Especially annoying when said stuck connection is the standby
trying to reconnect to allow syncrep again...

This behavior also is involved in a rather complicated <= 9.4 bug where
the transaction started by catchup interrupt processing waited for
syncrep using latches, but didn't get the wakeup because it was already
running inside the same overloaded signal handler. Fix the issue here
doesn't properly solve that issue, merely papers over the problems. In
9.5 catchup interrupts aren't processed out of signal handlers anymore.

To fix all this, make nextval() acquire a top level xid, and only wait for
transaction commit if a transaction both acquired a xid and emitted WAL
records. If only a xid has been assigned we don't uselessly want to
wait just because of writes to temporary/unlogged tables; if only WAL
has been written we don't want to wait just because of HOT prunes.

The xid assignment in nextval() is unlikely to cause overhead in
real-world workloads. For one it only happens SEQ_LOG_VALS/32 values
anyway, for another only usage of nextval() without using the result in
an insert or similar is affected.

Discussion: 20150223165359(dot)GF30784(at)awork2(dot)anarazel(dot)de,
369698E947874884A77849D8FE3680C2(at)maumau,
5CF4ABBA67674088B3941894E22A0D25(at)maumau

Per complaint from maumau and Thom Brown

Backpatch all the way back; 9.0 doesn't have syncrep, but it seems
better to be consistent behavior across all maintained branches.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services