Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC: bug fix?] Connection attempt block forever when the synchronous standby is not running
Date: 2014-07-07 11:10:22
Message-ID: CAHGQGwHP6-5aM3YotAYfuFB8vSvQNT6OpXC1bvUw3RJjyKBfLg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 7, 2014 at 4:14 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> On 2014-07-04 22:59:15 +0900, MauMau wrote:
>> My customer reported a strange connection hang problem. He and I couldn't
>> reproduce it. I haven't been able to understand the cause, but I can think
>> of one hypothesis. Could you give me your opinions on whether my hypothesis
>> is correct, and a direction on how to fix the problem? I'm willing to
>> submit a patch if necessary.
>
>> The connection attempt is waiting for a reply from the standby. This is
>> strange, because we didn't anticipate that the connection establishment (and
>> subsequent SELECT queries) would update something and write some WAL. The
>> doc says:
>>
>> http://www.postgresql.org/docs/current/static/warm-standby.html#SYNCHRONOUS-REPLICATION
>>
>> "When requesting synchronous replication, each commit of a write transaction
>> will wait until confirmation is received that the commit has been written to
>> the transaction log on disk of both the primary and standby server.
>> ...
>> Read only transactions and transaction rollbacks need not wait for replies
>> from standby servers. Subtransaction commits do not wait for responses from
>> standby servers, only top-level commits."
>>
>>
>> [Hypothesis]
>> Why does the connection processing emit WAL?
>>
>> Probably, it did page-at-a-time vacuum during access to pg_database and
>> pg_authid for client authentication. src/backend/access/heap/README.HOT
>> describes:
>
>> [How to fix]
>> Of course, adding "-o '-c synchronous_commit=local'" or "-o '-c
>> synchronous_standby_names='" to pg_ctl start in the recovery script would
>> prevent the problem.
>
>> But isn't there anything to fix in PostgreSQL? I think the doc needs
>> improvement so that users won't misunderstand that only write transactions
>> would block at commit.
>
> I think we should rework RecordTransactionCommit() to only wait for the
> standby if `markXidCommitted' and not if `wrote_xlog'. There really
> isn't a reason to make a readonly transaction's commit wait just because
> it did some hot pruning.

Sounds like a good direction. One question: can RecordTransactionCommit()
safely skip not only the wait for replication but also the local WAL flush
in such a read-only transaction case?
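
To make the proposal concrete, here is a minimal standalone sketch of the
decision being discussed. It is a simplified model, not the actual
PostgreSQL source: the CommitState struct and both function names are
hypothetical, and only the two flags named above (markXidCommitted and
wrote_xlog) are taken from this thread. It assumes the current code waits
for the standby whenever the transaction wrote any WAL.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the state seen at commit time. */
typedef struct CommitState
{
    bool markXidCommitted;  /* transaction has an XID to mark as committed */
    bool wrote_xlog;        /* transaction emitted WAL, e.g. from HOT pruning */
} CommitState;

/*
 * Assumed current behavior: any WAL written by the transaction makes the
 * commit wait for the synchronous standby, even for a read-only transaction.
 */
static bool
waits_for_standby_current(CommitState s)
{
    return s.wrote_xlog;
}

/*
 * Proposed behavior: wait only when an XID is being marked committed, so a
 * read-only transaction that merely pruned a page does not block.
 */
static bool
waits_for_standby_proposed(CommitState s)
{
    return s.markXidCommitted;
}
```

For a read-only transaction that did some hot pruning (markXidCommitted
false, wrote_xlog true), the first function returns true and the second
returns false. The sketch says nothing about the local WAL flush, which is
the part my question above is about.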

Regards,

--
Fujii Masao
