Re: 9.4 HEAD: select() failed in postmaster

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 9.4 HEAD: select() failed in postmaster
Date: 2013-09-12 01:04:06
Message-ID: 20130912010406.GE12028@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Noah Misch wrote:
> On Tue, Sep 10, 2013 at 05:18:21PM -0700, Jeff Janes wrote:

> > I think the problem is here, where there should be a Max rather than a Min:
> >
> > commit 82233ce7ea42d6ba519aaec63008aff49da6c7af
> > Author: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
> > Date: Fri Jun 28 17:20:53 2013 -0400
> >
> > Send SIGKILL to children if they don't die quickly in immediate shutdown
> >
> > ...
> >
> > + /* remaining time, but at least 1 second */
> > + timeout->tv_sec = Min(SIGKILL_CHILDREN_AFTER_SECS -
> > + (time(NULL) - AbortStartTime), 1);
>
> Agreed; good catch.

Yeah, thanks. It should be a Max(). The current coding presumably makes
it use one second most of the time, instead of whatever the remaining
time is ... until the abort time is past, at which point the remaining
time goes negative and select() fails as reported.
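To make the arithmetic concrete, here is a small standalone C sketch (not the postmaster code itself) comparing the two clamps. The Min()/Max() macros mirror the ternary definitions in PostgreSQL's c.h, SIGKILL_CHILDREN_AFTER_SECS = 5 is an assumed value, and the parameter elapsed stands in for time(NULL) - AbortStartTime.

```c
#include <assert.h>
#include <time.h>

/* Ternary definitions, as in PostgreSQL's c.h (redefined here to build standalone). */
#define Max(x, y) ((x) > (y) ? (x) : (y))
#define Min(x, y) ((x) < (y) ? (x) : (y))

/* Assumed value for this sketch; check postmaster.c for the real constant. */
#define SIGKILL_CHILDREN_AFTER_SECS 5

/* Timeout as computed by the committed code: Min(remaining, 1). */
static time_t
timeout_with_min(time_t elapsed)
{
	/* Yields 1 while time remains, and a negative value once it has run out. */
	return Min(SIGKILL_CHILDREN_AFTER_SECS - elapsed, 1);
}

/* Timeout with the intended Max(remaining, 1). */
static time_t
timeout_with_max(time_t elapsed)
{
	time_t		t = Max(SIGKILL_CHILDREN_AFTER_SECS - elapsed, 1);

	assert(t >= 1);				/* never zero, never negative */
	return t;
}
```

With elapsed = 0 the Min() version yields 1 rather than the 5 seconds actually remaining, and with elapsed = 7 it yields -2, which is the value that would end up in tv_sec.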

It might very well be that I used Max() there initially and changed to
Min() at the last minute before commit in a moment of brain fade.

> > But I don't understand the logic behind this anyway. Why sleep at least 1
> > second? If time is up, it is up, why not use zero as the minimum?
>
> Offhand, clamping to zero does make more sense to me. It looks like Alvaro
> added that bit in his pre-commit edits. Alvaro?

Sadly, I don't have the development branch for this feature anymore, so I
have to go from memory. IIRC my thinking was that if I made select()
terminate immediately (timeout 0), the time arithmetic in ServerLoop()
might lead us to decide not to send SIGKILL at that point, causing one
more iteration of that loop. Thinking about it again, that argument
doesn't seem to hold much water; but the time variables being in integer
seconds is what led me to add it.

I will change it to Max(..., 0).
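A minimal sketch of the corrected computation, assuming the same Max() macro as in c.h and an assumed SIGKILL_CHILDREN_AFTER_SECS of 5. set_abort_timeout() and wait_for_timeout() are hypothetical helpers for illustration, not postmaster functions. Once the deadline has passed, the timeout clamps to zero and select() returns 0 right away instead of failing on a negative tv_sec.

```c
#define _POSIX_C_SOURCE 200112L

#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <time.h>

/* Ternary definition, as in PostgreSQL's c.h (redefined here to build standalone). */
#define Max(x, y) ((x) > (y) ? (x) : (y))

/* Assumed value for this sketch; check postmaster.c for the real constant. */
#define SIGKILL_CHILDREN_AFTER_SECS 5

/*
 * Hypothetical helper: fill *timeout with the whole seconds remaining
 * before SIGKILL should be sent, clamped at zero so that select() never
 * sees a negative tv_sec.
 */
static void
set_abort_timeout(struct timeval *timeout, time_t abort_start, time_t now)
{
	timeout->tv_sec = Max(SIGKILL_CHILDREN_AFTER_SECS - (now - abort_start), 0);
	timeout->tv_usec = 0;
	assert(timeout->tv_sec >= 0);
}

/*
 * Hypothetical helper: wait on no descriptors for the computed timeout and
 * return select()'s result (0 on timeout, -1 on error).
 */
static int
wait_for_timeout(time_t abort_start, time_t now)
{
	struct timeval tv;

	set_abort_timeout(&tv, abort_start, now);
	return select(0, NULL, NULL, NULL, &tv);
}
```

For example, with abort_start = 100 and now = 102 the timeout is 3 seconds; with now = 110 it is 0, and wait_for_timeout() returns 0 immediately so the caller can go on to send SIGKILL.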

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services