Synchronous replication: sleeping

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Synchronous replication: sleeping
Date: 2008-12-08 11:12:39
Message-ID: 493D0127.9040604@enterprisedb.com

In walsender, in the main loop that waits for backend requests to send
WAL, there's this comment:

> + /*
> + * Nap for the configured time or until a request arrives.
> + *
> + * On some platforms, signals won't interrupt the sleep. To ensure we
> + * respond reasonably promptly when someone signals us, break down the
> + * sleep into 1-second increments, and check for interrupts after each
> + * nap.
> + */

That's apparently copy-pasted from bgwriter. It's fine for bgwriter,
where a prompt response is not important, but it seems pretty awful for
synchronous replication. On such platforms, that would introduce a delay
of 500ms on average at every commit. I'm not sure if the comment is
actually accurate, though. bgwriter uses pg_usleep(), while this loop
uses pq_wait, which uses secure_poll().
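A minimal sketch of the nap pattern that comment describes, assuming POSIX nanosleep() and a hypothetical flag wakeup_requested set by a signal handler (neither nap_in_chunks nor the flag comes from bgwriter or the patch). If a signal does not cut the sleep short, a request arriving at a random moment inside a 1000 ms slice waits 1000 / 2 = 500 ms on average before the flag is looked at again, which is where the per-commit figure above comes from.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>

/* Hypothetical flag; a signal handler would set it in a real process. */
static volatile sig_atomic_t wakeup_requested = 0;

/*
 * Sleep for up to total_ms in slices of at most chunk_ms, checking the
 * flag before each slice. Returns the number of slices slept. If a signal
 * does not cut nanosleep() short, the flag is only noticed at the next
 * slice boundary, i.e. after chunk_ms / 2 on average for a request that
 * arrives at a random moment.
 */
static int
nap_in_chunks(long total_ms, long chunk_ms)
{
    long remaining = total_ms;
    int slices = 0;

    assert(chunk_ms > 0);
    while (remaining > 0)
    {
        long this_ms = remaining > chunk_ms ? chunk_ms : remaining;
        struct timespec ts;

        if (wakeup_requested)
            break;

        ts.tv_sec = this_ms / 1000;
        ts.tv_nsec = (this_ms % 1000) * 1000000L;
        /* An early EINTR return still counts the whole slice as used. */
        (void) nanosleep(&ts, NULL);
        slices++;
        remaining -= this_ms;
    }
    return slices;
}
```

With chunk_ms = 1000 this mirrors the 1-second increments in the comment; shrinking the slice lowers the worst-case delay but wakes the process more often.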

There's also a small race condition in that loop:

> + while (remaining > 0)
> + {
> + int waitres;
> +
> + if (got_SIGHUP || shutdown_requested || replication_requested)
> + break;
> +
> + /*
> + * Check whether the data from standby can be read.
> + */
> + waitres = pq_wait(true, false,
> + remaining > 1000 ? 1000 : remaining);
> +
> ...

If a signal is received just before the pq_wait call, after checking
replication_requested, pq_wait won't be interrupted and will wait up to
a second before responding to it.
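One standard way to close this kind of check-then-sleep race, shown here as a swapped-in technique rather than what the patch does, is to keep the signal blocked while the flag is tested and let pselect() unblock it atomically only for the duration of the wait. This sketch assumes SIGUSR1 as the request signal and a hypothetical flag replication_requested_flag standing in for replication_requested; note that pselect() was not available, or not truly atomic, on every platform of this era.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical flag standing in for replication_requested. */
static volatile sig_atomic_t replication_requested_flag = 0;

static void
request_handler(int signo)
{
    (void) signo;
    replication_requested_flag = 1;
}

/*
 * Install the handler and keep SIGUSR1 blocked in normal code.
 * *wait_mask receives the mask to use while sleeping (SIGUSR1 unblocked).
 */
static int
install_request_handler(sigset_t *wait_mask)
{
    struct sigaction sa;
    sigset_t block;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, wait_mask) != 0)
        return -1;
    sigdelset(wait_mask, SIGUSR1);
    return 0;
}

/*
 * Returns 1 if a request is pending, 0 otherwise (timeout or readable fd),
 * -1 on error. The flag test cannot race with the signal: SIGUSR1 stays
 * blocked until pselect() atomically installs wait_mask, so a signal sent
 * in between is delivered as soon as the wait starts and pselect()
 * returns EINTR instead of sleeping.
 */
static int
wait_for_request(int fd, long timeout_ms, const sigset_t *wait_mask)
{
    fd_set rfds;
    struct timespec ts;

    if (replication_requested_flag)
        return 1;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

    if (pselect(fd + 1, &rfds, NULL, NULL, &ts, wait_mask) < 0 && errno != EINTR)
        return -1;
    return replication_requested_flag ? 1 : 0;
}
```

A signal raised after the flag test but before pselect() stays pending instead of being missed, so the wait returns immediately rather than after up to a second.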

BTW, on what platforms does a signal not interrupt the sleep?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronous replication: sleeping
Date: 2008-12-08 11:42:07
Message-ID: 20081208114207.GE31566@svana.org

On Mon, Dec 08, 2008 at 01:12:39PM +0200, Heikki Linnakangas wrote:
> If a signal is received just before the pq_wait call, after checking
> replication_requested, pq_wait won't be interrupted and will wait up to
> a second before responding to it.
>
> BTW, on what platforms does a signal not interrupt the sleep?

In theory, none. SIGALRM is not installed with SA_RESTART, so any system
call should be interrupted. This applies to POSIX systems, though; I'm not
sure about Windows.
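A small POSIX demonstration of that claim, assuming a Linux-like system: with a SIGALRM handler installed without SA_RESTART, a select() blocked on an idle pipe returns -1 with errno set to EINTR when the alarm fires. The names select_with_alarm and got_alarm are hypothetical, used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical flag, set by the SIGALRM handler below. */
static volatile sig_atomic_t got_alarm = 0;

static void
alarm_handler(int signo)
{
    (void) signo;
    got_alarm = 1;
}

/*
 * Install a SIGALRM handler with the given sa_flags, arm a 1-second alarm,
 * and block in select() on fd for up to timeout_sec seconds. Returns
 * select()'s result and stores errno (or 0 on success/timeout) in *err.
 */
static int
select_with_alarm(int fd, int sa_flags, int timeout_sec, int *err)
{
    struct sigaction sa;
    struct timeval tv;
    fd_set rfds;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = alarm_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = sa_flags;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -2;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    alarm(1);
    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    *err = rc < 0 ? errno : 0;
    alarm(0);               /* cancel the alarm if select() ended first */
    return rc;
}
```

Called as select_with_alarm(fd, 0, 5, &err) on a pipe with no writer activity, it returns after about one second rather than five.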

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronous replication: sleeping
Date: 2008-12-08 13:36:27
Message-ID: 23532.1228743387@sss.pgh.pa.us

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Mon, Dec 08, 2008 at 01:12:39PM +0200, Heikki Linnakangas wrote:
>> BTW, on what platforms does a signal not interrupt the sleep?

> In theory, none.

In practice, they exist. In particular I can demonstrate the issue
on HPUX 10.20. I also dispute your claim that the behavior is
forbidden by standards. For example, the Single Unix Spec
http://www.opengroup.org/onlinepubs/007908799/xsh/select.html
saith

If SA_RESTART has been set for the interrupting signal, it is
implementation-dependent whether select() restarts or returns with
[EINTR].

and since we set SA_RESTART for most everything, we are exposed to the
implementation dependency.
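For comparison, Linux's signal(7) documents that select() always fails with EINTR after a handler runs, regardless of SA_RESTART, while other systems may restart it. One way to stop depending on that choice (a swapped-in technique, not something proposed in this thread) is the self-pipe trick: the handler writes a byte to a non-blocking pipe that is part of the select() set, so the wait wakes up whether or not select() is restarted, and a signal that lands just before select() leaves its byte behind instead of being lost. All names below are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical self-pipe: [0] is polled by the waiter, [1] written by the handler. */
static int selfpipe[2] = {-1, -1};

static void
wakeup_handler(int signo)
{
    int save_errno = errno;

    (void) signo;
    /* write() is async-signal-safe; if the pipe is full a wakeup is already pending. */
    if (write(selfpipe[1], "x", 1) < 0)
    {
        /* nothing useful to do inside a handler */
    }
    errno = save_errno;
}

/* Create the non-blocking self-pipe and install the handler for signo. */
static int
init_selfpipe(int signo)
{
    struct sigaction sa;
    int i;

    if (pipe(selfpipe) != 0)
        return -1;
    for (i = 0; i < 2; i++)
    {
        int fl = fcntl(selfpipe[i], F_GETFL);

        if (fl < 0 || fcntl(selfpipe[i], F_SETFL, fl | O_NONBLOCK) < 0)
            return -1;
    }
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = wakeup_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* whether select() restarts no longer matters */
    return sigaction(signo, &sa, NULL);
}

/* Returns 1 if woken by the signal, 2 if fd is readable, 0 otherwise, -1 on error. */
static int
wait_fd_or_signal(int fd, long timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd = fd > selfpipe[0] ? fd : selfpipe[0];
    int rc;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    FD_SET(selfpipe[0], &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0 && errno != EINTR)
        return -1;
    /* On EINTR the fd_set contents are unspecified, so just try draining the pipe. */
    if (rc < 0 || FD_ISSET(selfpipe[0], &rfds))
    {
        char buf[64];
        int woke = 0;

        while (read(selfpipe[0], buf, sizeof buf) > 0)
            woke = 1;
        if (woke)
            return 1;
    }
    if (rc > 0 && FD_ISSET(fd, &rfds))
        return 2;
    return 0;
}
```

Because the wakeup is carried by the pipe rather than by the interruption itself, the same code behaves identically on platforms that restart select() and on those that return EINTR.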

I complained about this previously, but nothing came of it:
http://archives.postgresql.org/pgsql-hackers/2007-07/msg00003.php

regards, tom lane


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Synchronous replication: sleeping
Date: 2008-12-09 02:39:29
Message-ID: 3f0b79eb0812081839s285449dfibebd94cc5380fdfb@mail.gmail.com

Hi,

On Mon, Dec 8, 2008 at 10:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
>> On Mon, Dec 08, 2008 at 01:12:39PM +0200, Heikki Linnakangas wrote:
>>> BTW, on what platforms does a signal not interrupt the sleep?
>
>> In theory, none.
>
> In practice, they exist. In particular I can demonstrate the issue
> on HPUX 10.20. I also dispute your claim that the behavior is
> forbidden by standards. For example, the Single Unix Spec
> http://www.opengroup.org/onlinepubs/007908799/xsh/select.html
> saith
>
> If SA_RESTART has been set for the interrupting signal, it is
> implementation-dependent whether select() restarts or returns with
> [EINTR].
>
> and since we set SA_RESTART for most everything, we are exposed to the
> implementation dependency.
>
> I complained about this previously, but nothing came of it:
> http://archives.postgresql.org/pgsql-hackers/2007-07/msg00003.php

Umm... it's a difficult problem. Would it be OK to remove SA_RESTART from only
the signals that walsender uses, and add EINTR handling to every system call
that walsender makes? Some of those calls already handle EINTR; for example,
pq_recvbuf retries when recv() fails with EINTR.
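The per-call EINTR handling proposed here would look roughly like the retry loop below, in the spirit of what pq_recvbuf does for recv(). This is a minimal sketch: recv_retry is a hypothetical helper, not an existing PostgreSQL function.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: recv() that retries when a signal handler installed
 * without SA_RESTART interrupts the call before any data was transferred.
 * Every other outcome, including EAGAIN on a non-blocking socket, is
 * returned to the caller unchanged.
 */
static ssize_t
recv_retry(int fd, void *buf, size_t len, int flags)
{
    ssize_t n;

    do
        n = recv(fd, buf, len, flags);
    while (n < 0 && errno == EINTR);
    return n;
}
```

In a walsender-style loop the EINTR branch is also the natural place to re-check flags such as got_SIGHUP or shutdown_requested before retrying; this sketch only retries.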

Does anyone have a better idea?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center