Re: Latch implementation that wakes on postmaster death on both win32 and Unix

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Latch implementation that wakes on postmaster death on both win32 and Unix
Date: 2011-07-05 00:11:17
Message-ID: CAEYLb_UEOQr43P3VMv9nJ3ZEZNmQJ5NEcjg7PtaExcQLzr+jFg@mail.gmail.com
Lists: pgsql-hackers

On 4 July 2011 22:42, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> If we do expect such an event, we should close the hole instead of asserting.
> If we don't, then what's the point of the assert?

You can say the same thing about any assertion. I'm not going to
attempt to close the hole because I don't believe that there is one. I
would be happy to see your "read() from the pipe after select()" test
asserted though.
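
To make that concrete, the asserted check could look roughly like the
sketch below. It assumes a pipe whose write end is held open only by the
postmaster, so the read end turns readable with EOF once the postmaster
exits. The helper names are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: read one byte from the pipe and report whether we
 * hit EOF. EOF (read() returning 0) means every write end is closed,
 * i.e. the process that held it is gone.
 */
static bool
pipe_reports_eof(int read_fd)
{
    char    c;
    ssize_t rc;

    do
    {
        rc = read(read_fd, &c, 1);
    } while (rc < 0 && errno == EINTR);

    return rc == 0;
}

/*
 * Hypothetical helper: wait up to timeout_ms for the pipe to become
 * readable. Returns true only if select() reported it readable; in that
 * case we assert that the readability really is EOF.
 */
static bool
wait_for_writer_death(int read_fd, long timeout_ms)
{
    fd_set         rfds;
    struct timeval tv;
    int            rc;

    FD_ZERO(&rfds);
    FD_SET(read_fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(read_fd + 1, &rfds, NULL, NULL, &tv);
    if (rc <= 0)
        return false;       /* timeout, or an error such as EINTR */

    /* The test under discussion: readable must mean EOF, nothing else. */
    assert(pipe_reports_eof(read_fd));
    return true;
}
```

Since nothing is ever written to such a pipe, any readability other than
EOF would trip the assertion.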

> BTW, do we currently retry the select() on EINTR (meaning a signal has
> arrived)? If we don't, that'd be an additional source of spurious returns
> from select.

Why might it be? WaitLatch() is currently documented to potentially
have its timeout invalidated by the process receiving a signal, which
is the exact opposite problem. We do account for this within the
archiver calling code though, and I remark upon it in a comment there.
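
As an illustration of accounting for this on the calling side, here is a
minimal sketch that keeps its own deadline and retries select() after
EINTR with only the remaining time. It is not the archiver code and not
how WaitLatch() is implemented; the function names are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <time.h>

/* Current monotonic time in milliseconds. */
static long long
now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long) ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}

/*
 * Hypothetical helper: wait until fd is readable or timeout_ms has
 * elapsed in total, however many signals interrupt the wait.
 * Returns 1 if readable, 0 on timeout, -1 on an error other than EINTR.
 */
static int
wait_readable_until_deadline(int fd, long timeout_ms)
{
    long long deadline;

    assert(timeout_ms >= 0);
    deadline = now_ms() + timeout_ms;

    for (;;)
    {
        long long      remaining = deadline - now_ms();
        fd_set         rfds;
        struct timeval tv;
        int            rc;

        if (remaining <= 0)
            return 0;       /* overall deadline has passed */

        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = (time_t) (remaining / 1000);
        tv.tv_usec = (suseconds_t) ((remaining % 1000) * 1000);

        rc = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (rc > 0)
            return 1;
        if (rc == 0)
            return 0;
        if (errno != EINTR)
            return -1;
        /* EINTR: a signal arrived; loop and wait for the time left. */
    }
}
```

Recomputing the remaining time from a fixed deadline means a signal can
neither cut the wait short nor restart the full timeout.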

> I'm not sure that there is currently a guarantee that PostmasterIsAlive
> will return false immediately after select() indicates postmaster
> death. If e.g. the postmaster's parent is still running (which happens
> for example if you launch postgres via daemontools), the re-parenting of
> backends to init might not happen until the postmaster zombie has been
> vanquished by its parent's call of waitpid(). It's not entirely
> inconceivable for getppid() to then return the (dead) postmaster's pid
> until that waitpid() call has occurred.

Yes, this did occur to me. It's hard to reason about exactly what
happens here, and it's probably impossible to have the behaviour
guaranteed across platforms, however unlikely a stale getppid() result
seems. I'd like to hear what Heikki has to say about asserting or
otherwise verifying postmaster death when a wake-up apparently
indicates it.
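
For reference, a getppid()-based liveness check of the kind being
discussed might look like the sketch below. The function name is
hypothetical and this is not the actual PostmasterIsAlive() code; it
only shows why the concern matters for an assertion.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical check: the parent is presumed alive as long as getppid()
 * still returns the pid recorded at startup. If a platform keeps
 * reporting the old pid until the dead parent has been reaped, this
 * returns true for a while after the parent has exited, so asserting on
 * it right after a death wake-up could fail spuriously.
 */
static bool
parent_appears_alive(pid_t original_parent)
{
    return getppid() == original_parent;
}
```

A true result here only says that re-parenting has not been observed
yet, which is weaker than saying the parent is still running.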

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
