Re: Immediate shutdown and system(3)

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Immediate shutdown and system(3)
Date: 2009-02-27 09:52:43
Message-ID: 49A7B7EB.3080103@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

We're using SIGQUIT to signal immediate shutdown request. Upon receiving
SIGQUIT, postmaster in turn kills all the child processes with SIGQUIT
and exits.

This is a problem when child processes use system(3) to call other
programs. We use system(3) in two places: to execute archive_command and
restore_command. Fujii Masao identified this with pg_standby back in
November:

http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d3cb@mail.gmail.com
and recently discussed here
http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa3a1@mail.gmail.com

I'm starting a new thread to bring this to attention of those who
haven't been following the hot standby stuff. pg_standby has a
particular problem because it traps SIGQUIT to mean "end recovery,
promote standby to master", which it shouldn't do IMHO. But ignoring
that for a moment, the problem is generic.

SIGQUIT by default dumps core. That's not what we want to happen on
immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit
immediately instead, but external commands will dump core. system(3)
ignores SIGQUIT, so we can't trap it in the parent process; it is always
relayed to the child.

There's a few options on how to fix that:

1. Implement a custom version of system(3) using fork+exec that let's us
trap SIGQUIT and send e.g SIGTERM or SIGINT to the child instead. It
might be a bit tricky to get this right in a portable way; Windows would
certainly need a completely separate implementation.

2. Use a signal other than SIGQUIT for immediate shutdown of child
processes. We can't change the signal sent to postmaster for
backwards-compatibility reasons, but the signal sent by postmaster to
child processes we could change. We've already used all signals in
normal backends, but perhaps we could rearrange them.

3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child
processes that use system(3): the archiver process and the startup
process. Neither of them use SIGINT currently. SIGINT is ignored by
system(3), like SIGQUIT, but the default action is to terminate the
process rather than core dump. Unfortunately pg_standby traps SIGINT too
to mean "promote to master", but we could change it to use SIGUSR1
instead for that purpose. If someone has a script that uses "killall
-INT pg_standby" to promote a standby server to master, it would need to
be changed. Looking at the manual page of pg_standby, however, it seems
that the kill-method of triggering a promotion isn't documented, so with
a notice in release notes we could do that.

I'm leaning towards option 3, but I wonder if anyone sees a better solution.

This is all for CVS HEAD. In back-branches, I think we should just
remove the signal handler for SIGQUIT from pg_standby and leave it at
that. If you perform an immediate shutdown, you can get a core dump from
archive_command or restore_command, but that's a minor inconvenience.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Greg Stark <stark(at)enterprisedb(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-02-27 10:49:11
Message-ID: 4136ffa0902270249h73ab7880v476811954e59f928@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 27, 2009 at 9:52 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
> 2. Use a signal other than SIGQUIT for immediate shutdown of child
> processes. We can't change the signal sent to postmaster for
> backwards-compatibility reasons, but the signal sent by postmaster to child
> processes we could change. We've already used all signals in normal
> backends, but perhaps we could rearrange them.

This isn't the first time we've run into the problem that we've run
out of signals. I think we need to multiplex all our event signals
onto a single signal and use some other mechanism to indicate the type
of message.

Perhaps we do need two signals though, so subprocesses don't need to
connect to shared memory to distinguish "exit now" from other events.
SIGINT for "exit now" and USR1 for every postgres-internal signal
using shared memory to determine the meaning sounds like the most
logical arrangement to me.

Do we really need a "promote to master" message at all? Is pg_standby
responsible for this or could the master write out the configuration
changes necessary itself?

--
greg


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Stark <stark(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-02-27 12:19:26
Message-ID: 49A7DA4E.7090004@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> This isn't the first time we've run into the problem that we've run
> out of signals. I think we need to multiplex all our event signals
> onto a single signal and use some other mechanism to indicate the type
> of message.

Yeah. A patch to do that was discussed a while ago, as Fujii's
synchronous replication patch bumped into that as well. I don't feel
like changing the signaling so dramatically right now, however.

> Do we really need a "promote to master" message at all? Is pg_standby
> responsible for this or could the master write out the configuration
> changes necessary itself?

The way pg_standby works is that it keeps waiting for new WAL files to
arrive, until it's told to stop and return a non-zero exit code.
Non-zero exit code from restore_command basically means "file not
found", making the startup process to end recovery and start up the
database. There's two ways to tell pg_standby to stop: create a trigger
file with a particular name, or signal it with SIGINT or SIGQUIT.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Stark <stark(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-02-27 15:44:01
Message-ID: 19192.1235749441@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Greg Stark wrote:
>> This isn't the first time we've run into the problem that we've run
>> out of signals. I think we need to multiplex all our event signals
>> onto a single signal and use some other mechanism to indicate the type
>> of message.

> Yeah. A patch to do that was discussed a while ago, as Fujii's
> synchronous replication patch bumped into that as well. I don't feel
> like changing the signaling so dramatically right now, however.

It's not really a feasible answer anyway for auxiliary processes that
have no need to be connected to shared memory.

regards, tom lane


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-02 06:08:57
Message-ID: 3f0b79eb0903012208g6d427cear275df57524d14e17@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> We're using SIGQUIT to signal immediate shutdown request. Upon receiving
> SIGQUIT, postmaster in turn kills all the child processes with SIGQUIT and
> exits.
>
> This is a problem when child processes use system(3) to call other programs.
> We use system(3) in two places: to execute archive_command and
> restore_command. Fujii Masao identified this with pg_standby back in
> November:
>
> http://archives.postgresql.org/message-id/3f0b79eb0811280156s78a3730en73aca49b6e95d3cb@mail.gmail.com
> and recently discussed here
> http://archives.postgresql.org/message-id/3f0b79eb0902260919l2675aaafq10e5b2d49ebfa3a1@mail.gmail.com
>
> I'm starting a new thread to bring this to attention of those who haven't
> been following the hot standby stuff. pg_standby has a particular problem
> because it traps SIGQUIT to mean "end recovery, promote standby to master",
> which it shouldn't do IMHO. But ignoring that for a moment, the problem is
> generic.
>
> SIGQUIT by default dumps core. That's not what we want to happen on
> immediate shutdown. All PostgreSQL processes trap SIGQUIT to exit
> immediately instead, but external commands will dump core. system(3) ignores
> SIGQUIT, so we can't trap it in the parent process; it is always relayed to
> the child.
>
> There's a few options on how to fix that:
>
> 1. Implement a custom version of system(3) using fork+exec that let's us
> trap SIGQUIT and send e.g SIGTERM or SIGINT to the child instead. It might
> be a bit tricky to get this right in a portable way; Windows would certainly
> need a completely separate implementation.
>
> 2. Use a signal other than SIGQUIT for immediate shutdown of child
> processes. We can't change the signal sent to postmaster for
> backwards-compatibility reasons, but the signal sent by postmaster to child
> processes we could change. We've already used all signals in normal
> backends, but perhaps we could rearrange them.
>
> 3. Use SIGINT instead of SIGQUIT for immediate shutdown of the two child
> processes that use system(3): the archiver process and the startup process.
> Neither of them use SIGINT currently. SIGINT is ignored by system(3), like
> SIGQUIT, but the default action is to terminate the process rather than core
> dump. Unfortunately pg_standby traps SIGINT too to mean "promote to master",
> but we could change it to use SIGUSR1 instead for that purpose. If someone
> has a script that uses "killall -INT pg_standby" to promote a standby server
> to master, it would need to be changed. Looking at the manual page of
> pg_standby, however, it seems that the kill-method of triggering a promotion
> isn't documented, so with a notice in release notes we could do that.
>
> I'm leaning towards option 3, but I wonder if anyone sees a better solution.

4. Use the shared memory to tell the startup process about the shutdown state.
When a shutdown signal arrives, postmaster sets the corresponding shutdown
state to the shared memory before signaling to the child processes. The startup
process check the shutdown state whenever executing system(), and determine
how to exit according to that state. This solution doesn't change any existing
behavior of pg_standby. What is your opinion?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-02 07:59:44
Message-ID: 49AB91F0.2010304@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao wrote:
> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> I'm leaning towards option 3, but I wonder if anyone sees a better solution.
>
> 4. Use the shared memory to tell the startup process about the shutdown state.
> When a shutdown signal arrives, postmaster sets the corresponding shutdown
> state to the shared memory before signaling to the child processes. The startup
> process check the shutdown state whenever executing system(), and determine
> how to exit according to that state. This solution doesn't change any existing
> behavior of pg_standby. What is your opinion?

That would only solve the problem for pg_standby. Other programs you
might use as a restore_command or archive_command like "cp" or "rsync"
would still core dump on the SIGQUIT.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-03 03:43:37
Message-ID: 20090303123550.8F21.52131E4D@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> 1. Implement a custom version of system(3) using fork+exec that let's us
> trap SIGQUIT and send e.g SIGTERM or SIGINT to the child instead. It
> might be a bit tricky to get this right in a portable way; Windows would
> certainly need a completely separate implementation.

I think the custom system() approach is the most ideal plan for us because
it could open the door for faster recovery; If there were an asynchronous
version of system(), startup process could parallelly execute both
restoring archived wal files and redoing operations in them.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-03 16:16:05
Message-ID: 49AD57C5.8090904@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 2.03.09 08:59, Heikki Linnakangas napsal(a):
> Fujii Masao wrote:
>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>> solution.
>>
>> 4. Use the shared memory to tell the startup process about the
>> shutdown state.
>> When a shutdown signal arrives, postmaster sets the corresponding
>> shutdown
>> state to the shared memory before signaling to the child processes.
>> The startup
>> process check the shutdown state whenever executing system(), and
>> determine
>> how to exit according to that state. This solution doesn't change any
>> existing
>> behavior of pg_standby. What is your opinion?
>
> That would only solve the problem for pg_standby. Other programs you
> might use as a restore_command or archive_command like "cp" or "rsync"
> would still core dump on the SIGQUIT.
>

I think that we could have two methods. Extended method will use share
memory to say what child should do and standard which send appropriate
signal to child. For example pg_ctl could use extended communication to
better postmaster controlling.

Zdenek


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-03 18:59:16
Message-ID: 49AD7E04.90304@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala wrote:
> Dne 2.03.09 08:59, Heikki Linnakangas napsal(a):
>> Fujii Masao wrote:
>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>>> solution.
>>>
>>> 4. Use the shared memory to tell the startup process about the
>>> shutdown state.
>>> When a shutdown signal arrives, postmaster sets the corresponding
>>> shutdown
>>> state to the shared memory before signaling to the child processes.
>>> The startup
>>> process check the shutdown state whenever executing system(), and
>>> determine
>>> how to exit according to that state. This solution doesn't change any
>>> existing
>>> behavior of pg_standby. What is your opinion?
>>
>> That would only solve the problem for pg_standby. Other programs you
>> might use as a restore_command or archive_command like "cp" or "rsync"
>> would still core dump on the SIGQUIT.
>>
>
> I think that we could have two methods. Extended method will use share
> memory to say what child should do and standard which send appropriate
> signal to child. For example pg_ctl could use extended communication to
> better postmaster controlling.

The problem isn't in the signaling between external tools like pg_ctl
and postmaster, but the signaling between postmaster and the child
processes.

Signal multiplexing would help by releasing some signals, but to kill a
child process that can be executing an external command with system(3),
we'd still want to use a signal that does the right thing for external
commands, per usual Unix semantics. Also, the archiver process currently
detaches itself from shared memory at start, so using shared memory
doesn't seem like an improvement.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-04 02:06:43
Message-ID: 3f0b79eb0903031806y2804e597g6d03bfb4b8ca68c2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Fujii Masao wrote:
>>
>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>
>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>> solution.
>>
>> 4. Use the shared memory to tell the startup process about the shutdown
>> state.
>> When a shutdown signal arrives, postmaster sets the corresponding shutdown
>> state to the shared memory before signaling to the child processes. The
>> startup
>> process check the shutdown state whenever executing system(), and
>> determine
>> how to exit according to that state. This solution doesn't change any
>> existing
>> behavior of pg_standby. What is your opinion?
>
> That would only solve the problem for pg_standby. Other programs you might
> use as a restore_command or archive_command like "cp" or "rsync" would still
> core dump on the SIGQUIT.

Right. I've just understood your intention. I also agree with option 3 if nobody
complains about lack of backward compatibility of pg_standby. If no, how about
using SIGUSR2 instead of SIGINT for immediate shutdown of only the archiver
and the startup process. SIGUSR2 by default terminates the process.
The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
reassign that function to another signal (SIGINT is suitable, I think).
This solution doesn't need signal multiplexing. Thought?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-04 09:14:49
Message-ID: 49AE4689.3040105@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Per discussion, here's a patch for pg_standby in REL8_3_STABLE. The
signal handling is changed so that SIGQUIT no longer triggers failover,
but immediately kills pg_standby, triggering FATAL death of the startup
process too. That's what you want with immediate shutdown.

SIGUSR1 is now accepted as a signal to trigger failover. SIGINT is still
accepted too, but that should be considered deprecated since we're
likely to use SIGINT for immediate shutdown (for startup process) in 8.4.

We should document the use of signals to trigger failover in the
manual... Any volunteers?

This should be noted in the release notes:

If you are using pg_standby, and if you are using signals (e.g "killall
-SIGINT pg_standby") to trigger failover, change your scripts to use
SIGUSR1 instead of SIGQUIT or SIGINT. SIGQUIT no longer triggers
failover, but aborts the recovery and shuts down the standby database.
SIGINT is still accepted as failover trigger, but should be considered
as deprecated and will also be changed to trigger immediate shutdown in
a future release.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
pg_standby-immediate-shutdown-83stable-1.patch text/x-diff 1.7 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-04 11:01:34
Message-ID: 49AE5F8E.9010403@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fujii Masao wrote:
> Hi,
>
> On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Fujii Masao wrote:
>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>>> solution.
>>> 4. Use the shared memory to tell the startup process about the shutdown
>>> state.
>>> When a shutdown signal arrives, postmaster sets the corresponding shutdown
>>> state to the shared memory before signaling to the child processes. The
>>> startup
>>> process check the shutdown state whenever executing system(), and
>>> determine
>>> how to exit according to that state. This solution doesn't change any
>>> existing
>>> behavior of pg_standby. What is your opinion?
>> That would only solve the problem for pg_standby. Other programs you might
>> use as a restore_command or archive_command like "cp" or "rsync" would still
>> core dump on the SIGQUIT.
>
> Right. I've just understood your intention. I also agree with option 3 if nobody
> complains about lack of backward compatibility of pg_standby. If no, how about
> using SIGUSR2 instead of SIGINT for immediate shutdown of only the archiver
> and the startup process. SIGUSR2 by default terminates the process.
> The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
> reassign that function to another signal (SIGINT is suitable, I think).
> This solution doesn't need signal multiplexing. Thought?

Hmm, the startup/archiver process would then in turn need to kill the
external command with SIGINT. I guess that would work.

There's a problem with my idea of just using SIGINT instead of SIGQUIT.
Some (arguably bad-behaving) programs trap SIGINT and exit() with a
return code. The startup process won't recognize that as "killed by
signal", and we're back to same problem we have with pg_standby that the
startup process doesn't die but continues with the startup. Notably
rsync seems to behave like that.

BTW, searching the archive, I found this long thread about this same issue:

http://archives.postgresql.org/pgsql-hackers/2006-11/msg00406.php

The idea of SIGUSR2 was mentioned there as well, as well as the idea of
reimplementing system(3). The conclusion of that thread was the usage of
setsid() and process groups, to ensure that the SIGQUIT is delivered to
the archive/recovery_command.

I'm starting to feel that this is getting too complicated. Maybe we
should just fix pg_standby to not trap SIGQUIT, and live with the core
dumps...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-18 19:33:19
Message-ID: 49C14C7F.1060306@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Ok, I've committed a minimal patch to pg_standby in CVS HEAD and
REL8_3_STABLE to not interpret SIGQUIT as a signal for failover. I added
a signal handler for SIGUSR1 to trigger failover; that should be
considered the preferred signal for that, even though SIGINT still works
too.

SIGQUIT is trapped to just die immediately, but without core dumping. As
we still use SIGQUIT for immediate shutdown, any other archive_command
or restore_command will still receive SIGQUIT on immediate shutdown, and
by default dump core. Let's just live with that for now..

This should be mentioned in release notes, as any script that might be
using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT
instead. Where should I make a note of that so that we don't forget?

Heikki Linnakangas wrote:
> Fujii Masao wrote:
>> Hi,
>>
>> On Mon, Mar 2, 2009 at 4:59 PM, Heikki Linnakangas
>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> Fujii Masao wrote:
>>>> On Fri, Feb 27, 2009 at 6:52 PM, Heikki Linnakangas
>>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>>> I'm leaning towards option 3, but I wonder if anyone sees a better
>>>>> solution.
>>>> 4. Use the shared memory to tell the startup process about the shutdown
>>>> state.
>>>> When a shutdown signal arrives, postmaster sets the corresponding
>>>> shutdown
>>>> state to the shared memory before signaling to the child processes. The
>>>> startup
>>>> process check the shutdown state whenever executing system(), and
>>>> determine
>>>> how to exit according to that state. This solution doesn't change any
>>>> existing
>>>> behavior of pg_standby. What is your opinion?
>>> That would only solve the problem for pg_standby. Other programs you
>>> might
>>> use as a restore_command or archive_command like "cp" or "rsync"
>>> would still
>>> core dump on the SIGQUIT.
>>
>> Right. I've just understood your intention. I also agree with option 3
>> if nobody
>> complains about lack of backward compatibility of pg_standby. If no,
>> how about
>> using SIGUSR2 instead of SIGINT for immediate shutdown of only the
>> archiver
>> and the startup process. SIGUSR2 by default terminates the process.
>> The archiver already uses SIGUSR2 for pgarch_waken_stop, so we need to
>> reassign that function to another signal (SIGINT is suitable, I think).
>> This solution doesn't need signal multiplexing. Thought?
>
> Hmm, the startup/archiver process would then in turn need to kill the
> external command with SIGINT. I guess that would work.
>
> There's a problem with my idea of just using SIGINT instead of SIGQUIT.
> Some (arguably bad-behaving) programs trap SIGINT and exit() with a
> return code. The startup process won't recognize that as "killed by
> signal", and we're back to same problem we have with pg_standby that the
> startup process doesn't die but continues with the startup. Notably
> rsync seems to behave like that.
>
> BTW, searching the archive, I found this long thread about this same issue:
>
> http://archives.postgresql.org/pgsql-hackers/2006-11/msg00406.php
>
> The idea of SIGUSR2 was mentioned there as well, as well as the idea of
> reimplementing system(3). The conclusion of that thread was the usage of
> setsid() and process groups, to ensure that the SIGQUIT is delivered to
> the archive/recovery_command.
>
> I'm starting to feel that this is getting too complicated. Maybe we
> should just fix pg_standby to not trap SIGQUIT, and live with the core
> dumps...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-18 20:04:11
Message-ID: 49C153BB.3000508@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Ok, I've committed a minimal patch to pg_standby in CVS HEAD and
> REL8_3_STABLE to not interpret SIGQUIT as a signal for failover. I
> added a signal handler for SIGUSR1 to trigger failover; that should be
> considered the preferred signal for that, even though SIGINT still
> works too.
>
> SIGQUIT is trapped to just die immediately, but without core dumping.
> As we still use SIGQUIT for immediate shutdown, any other
> archive_command or restore_command will still receive SIGQUIT on
> immediate shutdown, and by default dump core. Let's just live with
> that for now..
>
> This should be mentioned in release notes, as any script that might be
> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or
> SIGINT instead. Where should I make a note of that so that we don't
> forget?
>
>

Unless I'm missing it the use of signals to trigger failover is not
documented AT ALL. So why anyone would expect such behaviour is
something of a mystery.

Perhaps doing that would be even more important than release notes.

cheers

andrew


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-18 20:16:51
Message-ID: 49C156B3.2000504@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Heikki Linnakangas wrote:
>> This should be mentioned in release notes, as any script that might be
>> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or
>> SIGINT instead. Where should I make a note of that so that we don't
>> forget?
>
> Unless I'm missing it the use of signals to trigger failover is not
> documented AT ALL. So why anyone would expect such behaviour is
> something of a mystery.

Well, some people do read source code. If it was more widely known, I
would hesitate more to change it, though.

> Perhaps doing that would be even more important than release notes.

Agreed it should be documented.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-18 20:40:49
Message-ID: 200903182040.n2IKen516881@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> This should be mentioned in release notes, as any script that might be
> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT
> instead. Where should I make a note of that so that we don't forget?

The CVS commit message.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-18 21:00:58
Message-ID: 603c8f070903181400o45298c51n280822749a44b494@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Heikki Linnakangas wrote:
>> This should be mentioned in release notes, as any script that might be
>> using SIGQUIT at the moment needs to be changed to use SIGUSR1 or SIGINT
>> instead. Where should I make a note of that so that we don't forget?
>
> The CVS commit message.

Is there some reason we don't just put it in the release notes as
*part* of the commit? Someone can always go back and edit it later.
It seems like that would be easier and less error-prone than grepping
the CVS commit logs for "release notes"...

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-18 21:12:58
Message-ID: 15785.1237410778@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> The CVS commit message.

> Is there some reason we don't just put it in the release notes as
> *part* of the commit? Someone can always go back and edit it later.

That was suggested before, and I think we actually tried it for a few
months. It didn't work.

Putting an item in the release notes *properly* is a whole lot more
work than putting a short bit of text in the CVS log (especially for
committers whose first language isn't English). It would also
create a lot more merge-collision issues for unrelated patches.

It's less trouble overall to do the editing, organizing, and SGML-ifying
of all the release notes at once. Also you end up with a better
product, assuming that whoever is doing the notes puts in reasonable
editorial effort.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: Immediate shutdown and system(3)
Date: 2009-03-18 21:31:26
Message-ID: 603c8f070903181431w4ee1ed14ubec078682fc30dff@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Mar 18, 2009 at 5:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Mar 18, 2009 at 4:40 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>> The CVS commit message.
>
>> Is there some reason we don't just put it in the release notes as
>> *part* of the commit?  Someone can always go back and edit it later.
>
> That was suggested before, and I think we actually tried it for a few
> months.  It didn't work.
>
> Putting an item in the release notes *properly* is a whole lot more
> work than putting a short bit of text in the CVS log (especially for
> committers whose first language isn't English).  It would also
> create a lot more merge-collision issues for unrelated patches.

Yeah, I wouldn't ask people to include it in the patches they post.
That would be a pain, and people would probably tend (with the best of
intentions) to inflate the relative importance of their own work. I
was thinking that the committer could make a quick entry at the time
they actually committed the patch, so that the step you describe below
could start with something other than an email box.

> It's less trouble overall to do the editing, organizing, and SGML-ifying
> of all the release notes at once.  Also you end up with a better
> product, assuming that whoever is doing the notes puts in reasonable
> editorial effort.

If it works for the people who are doing it, good enough.

...Robert