Bug in walsender when calling out to do_pg_stop_backup (and others?)

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Bug in walsender when calling out to do_pg_stop_backup (and others?)
Date: 2011-10-05 13:30:28
Message-ID: CABUevExv-XVEytoo38sRHcjq+HgaWTFrO0UkFvSr+Qe_8yt9cw@mail.gmail.com
Lists: pgsql-hackers

When walsender calls out to do_pg_stop_backup() (during base backups),
it is not possible to terminate the process with a SIGTERM - it
requires a SIGKILL. This can leave unkillable backends, for example if
archive_mode is on and archive_command is failing (or not set). A
similar thing would happen in other cases where walsender calls out to
something that blocks (do_pg_start_backup(), for example), but the
stop case is the easy one to provoke.

ISTM one way to fix it is the attached patch, which has walsender set
the "global" flags saying that we have received SIGTERM, which in turn
causes the CHECK_FOR_INTERRUPTS() calls in those routines to properly
exit the process. AFAICT it works fine. Any holes in this approach?
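
To make the idea concrete, here is a rough standalone sketch of the
mechanism. It is not the attached patch and not backend code. The flag
names are borrowed from the backend for readability, but everything
here (the handler, check_for_interrupts(), blocking_wait() and the demo
function) is a hypothetical stand-in, and the check returns to the
caller where the real CHECK_FOR_INTERRUPTS() would end the process.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Stand-ins for the backend's "global" interrupt flags (assumption:
 * simplified to plain sig_atomic_t variables for this simulation). */
static volatile sig_atomic_t InterruptPending = 0;
static volatile sig_atomic_t ProcDiePending = 0;

/* Stand-in for walsender's own private shutdown flag. */
static volatile sig_atomic_t walsender_shutdown_requested = 0;

/* Hypothetical SIGTERM handler: besides the private flag, it also
 * raises the global flags so that generic interrupt checks see it. */
static void sigterm_handler(int signo)
{
    (void) signo;
    walsender_shutdown_requested = 1;
    InterruptPending = 1;
    ProcDiePending = 1;
}

/* Simplified interrupt check: reports whether the process should die.
 * The real macro would exit instead of returning a value. */
static bool check_for_interrupts(void)
{
    return InterruptPending && ProcDiePending;
}

/* Hypothetical blocking routine, e.g. a loop waiting for archiving to
 * finish. It only looks at the global flags, never at walsender's
 * private one. Returns the iteration at which it was interrupted, or
 * -1 if it ran to completion. */
static int blocking_wait(int max_iterations, int signal_at)
{
    for (int i = 0; i < max_iterations; i++)
    {
        if (check_for_interrupts())
            return i;
        if (i == signal_at)
            raise(SIGTERM);     /* simulate an external SIGTERM */
    }
    return -1;
}

/* Installs the handler and runs the wait; SIGTERM arrives during
 * iteration 3 and is noticed by the check at the top of iteration 4. */
int demo_sigterm_interrupts_wait(void)
{
    void (*prev)(int) = signal(SIGTERM, sigterm_handler);
    assert(prev != SIG_ERR);
    return blocking_wait(10, 3);
}
```

If the handler set only walsender_shutdown_requested, blocking_wait()
would never notice the signal and would run all of its iterations,
which is the simulated analogue of the unkillable process described
above.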

Second, I wonder if we should add a SIGINT handler as well, that would
make it possible to send a cancel signal. Given that the end result
would be the same (at least if we want to keep with the "walsender is
simple" path), I'm not sure it's necessary - but it would at least
help those doing pg_cancel_backend()... thoughts?

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Attachment Content-Type Size
walsender_sigterm.patch text/x-patch 511 bytes
