Re: bgworker crashed or not?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Antonin Houska <antonin(dot)houska(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Subject: Re: bgworker crashed or not?
Date: 2014-05-07 20:32:24
Message-ID: CA+TgmoZifBwcv22Gu+Zj-pmZmZG6e-orWuL5SP0_SgHgdHPJ6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 6, 2014 at 8:25 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> What I'm inclined to do is change the logic so that:
>>
>> (1) After a crash-and-restart sequence, zero rw->rw_crashed_at, so
>> that anything which is still registered gets restarted immediately.
>
> Yes, that's quite obvious change which I missed completely :).

I've committed the portion of your patch which does this, with pretty
extensive changes. I moved the function which resets the crash times
to bgworker.c, renamed it, and made it just reset all of the crash
times unconditionally; there didn't seem to be any point in skipping
the irrelevant ones, so it seemed best to keep things simple. I also
moved the call from reaper() where you had it to
PostmasterStateMachine(), because the old placement did not work for
me when I carried out the following steps:

1. Kill a background worker with a 60-second restart time using
SIGTERM, so that it does exit(1).
2. Before it restarts, kill a regular backend with SIGQUIT.

Let me know if you think I've got it wrong.

>> (2) If a shmem-connected backend fails to release the deadman switch
>> or exits with an exit code other than 0 or 1, we crash-and-restart. A
>> non-shmem-connected backend never causes a crash-and-restart.
>
> +1

I did this as a separate commit,
e2ce9aa27bf20eff2d991d0267a15ea5f7024cd7, just moving the check for
ReleasePostmasterChildSlot inside the if statement. Then I realized
that was bogus, so I just pushed
eee6cf1f337aa488a20e9111df446cdad770e645 to fix that. Hopefully it's
OK now.

>> (3) When a background worker exits without triggering a
>> crash-and-restart, an exit code of precisely 0 causes the worker to be
>> unregistered; any other exit code has no special effect, so
>> bgw_restart_time controls.
>
> +1

This isn't done yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2014-05-07 20:36:43 Re: proposal: Set effective_cache_size to greater of .conf value, shared_buffers
Previous Message Heikki Linnakangas 2014-05-07 20:18:30 pgsql: Clean up jsonb code.