Re: Latch implementation that wakes on postmaster death on both win32 and Unix

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Latch implementation that wakes on postmaster death on both win32 and Unix
Date: 2011-07-08 10:58:53
Message-ID: 7867C59B-E22A-4C25-8B5E-65AE5ECAF4C9@phlo.org
Lists: pgsql-hackers

On Jul 8, 2011, at 11:57, Peter Geoghegan wrote:
> On 7 July 2011 19:15, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I'm not concerned about the possibility of spurious extra cycles of
>>> auxiliary process event loops - should I be?
>>
>> A tight loop would be bad, but an occasional spurious wake-up seems harmless.
>
> We should also assert !PostmasterIsAlive() from within the latch code
> after waking due to apparent Postmaster death. The reason that I don't
> want to follow Florian's suggestion to check it in production is that
> I don't know what to do if the postmaster turns out to be alive. Why
> is it more reasonable to try again than to just return?

I'd say return, but don't indicate postmaster death in the return value
if PostmasterIsAlive() returns true. Or don't call PostmasterIsAlive() in
WaitLatch(), and return indicating postmaster death whenever select()
says so, and put the burden of re-checking on the callers.

I agree that retrying isn't all that reasonable.
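
To make the two alternatives concrete, here is a minimal sketch in C. It is
not the actual WaitLatch() interface: the flag names, the function names and
the alive_check_fn callback (a stand-in for PostmasterIsAlive()) are all
hypothetical, and raw_events stands for whatever select() reported.

```c
#include <stdbool.h>

/* Hypothetical result flags; NOT PostgreSQL's actual WaitLatch() interface. */
#define WAKE_LATCH_SET        (1 << 0)
#define WAKE_POSTMASTER_DEATH (1 << 1)

/* Hypothetical stand-in for PostmasterIsAlive(). */
typedef bool (*alive_check_fn) (void);

/*
 * Option 1: re-check inside the wait function. Postmaster death is reported
 * only if the liveness check confirms it; otherwise the flag is dropped and
 * the function still returns (no retry).
 */
int
wait_result_rechecked(int raw_events, alive_check_fn postmaster_is_alive)
{
    if ((raw_events & WAKE_POSTMASTER_DEATH) && postmaster_is_alive())
        raw_events &= ~WAKE_POSTMASTER_DEATH;
    return raw_events;
}

/*
 * Option 2: report whatever select() said and leave re-checking to callers.
 */
int
wait_result_raw(int raw_events)
{
    return raw_events;
}

/* Caller-side re-check that option 2 would require. */
bool
caller_should_exit(int events, alive_check_fn postmaster_is_alive)
{
    return (events & WAKE_POSTMASTER_DEATH) && !postmaster_is_alive();
}

/* Stubs for trying the sketch out. */
bool stub_alive(void) { return true; }
bool stub_dead(void)  { return false; }
```

With option 1 a spurious report simply results in a return value without the
death flag; with option 2 every caller needs something like
caller_should_exit() before acting on the flag.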

> If the
> spurious wake-up thing was a problem that we could actually reproduce,
> then maybe I'd have an opinion on it. As it stands, our entire basis
> for thinking this may be a problem is the sentence "There may be other
> circumstances in which a file descriptor is spuriously reported as
> ready". That seems rather flimsy.

Flimsy or not, it pretty clearly warns us not to depend on there being
no spurious wake-ups. Whether or not we know how to actually produce
them is IMHO largely irrelevant - what matters is whether the guarantees
given by select() match the expectations of our code. Which, according to
the cited passage, they currently don't.
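
As an illustration of code that does not depend on that guarantee, the
sketch below treats select() readiness as a hint only and confirms it with a
non-blocking read(). It assumes a pipe whose write end is held by the process
being watched and to which nobody normally writes; that setup and the helper
names set_nonblocking() and writer_is_gone() are assumptions made for the
example, not something taken from the patch.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: put fd into non-blocking mode; returns 0 on success. */
int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/*
 * Hypothetical helper: returns true only if the write end of the pipe is
 * confirmed closed. read_fd must already be non-blocking, so a spurious
 * readiness report makes read() fail with -1 instead of blocking.
 * Note that a pending byte, if any, is consumed by the check.
 */
bool
writer_is_gone(int read_fd)
{
    fd_set         rfds;
    struct timeval tv = {0, 0};     /* poll, do not wait */
    char           c;

    FD_ZERO(&rfds);
    FD_SET(read_fd, &rfds);

    if (select(read_fd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return false;               /* nothing reported, or select() failed */
    if (!FD_ISSET(read_fd, &rfds))
        return false;

    /*
     * select() says "ready": only EOF (read() returning 0) proves that all
     * writers are gone. Data or an error means the writer was not confirmed
     * dead.
     */
    return read(read_fd, &c, 1) == 0;
}
```

The design choice here is that the cost of the extra read() is paid only
after select() has already reported the descriptor, so the common path is
unchanged.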

> Anyone that still has any misgivings about this will probably feel
> better once the assertion is never reported to fail on any of the
> diverse systems that PostgreSQL will be tested on in advance of the
> 9.2 release.

I'm not so convinced that WaitLatch() will get exercised much on
assert-enabled builds. But I might very well be wrong there...

best regards,
Florian Pflug
