Immediate shutdown during recovery

Lists: pgsql-hackers
From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Immediate shutdown during recovery
Date: 2008-11-28 09:56:18
Message-ID: 3f0b79eb0811280156s78a3730en73aca49b6e95d3cb@mail.gmail.com
Lists: pgsql-hackers

Hi,

The immediate shutdown (pg_ctl -m i stop) might not be able to
kill the startup process during archive recovery. This is because
the startup process runs restore_command via system(), which ignores
SIGQUIT in the caller while the command executes. As a result, the
startup process alone might survive the immediate shutdown and
continue replaying WAL until recovery ends. Is this desirable
behavior? It seems odd to me.

To prevent this, I think the startup process should periodically
check whether the postmaster is still alive. The archiver process,
which also calls system() to execute archive_command, already
takes this approach.

What is your opinion?

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate shutdown during recovery
Date: 2008-11-28 10:53:58
Message-ID: 3f0b79eb0811280253o274d6d5bsa1b1654c3e8ca1ae@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Fri, Nov 28, 2008 at 6:56 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> Hi,
>
> The immediate shutdown (pg_ctl -m i stop) might not be able to
> kill the startup process during archive recovery. This is because
> the startup process runs restore_command via system(), which ignores
> SIGQUIT in the caller while the command executes. As a result, the
> startup process alone might survive the immediate shutdown and
> continue replaying WAL until recovery ends. Is this desirable
> behavior? It seems odd to me.

RestoreArchivedFile() contains the following code as a safeguard
against restore_command being terminated by a signal. However, the
safeguard might not work if restore_command installs its own SIGQUIT
handler, as pg_standby does.

> signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
>
> ereport(signaled ? FATAL : DEBUG2,
>         (errmsg("could not restore file \"%s\" from archive: return code %d",
>                 xlogfname, rc)));

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate shutdown during recovery
Date: 2008-11-28 15:40:13
Message-ID: 1227886813.20796.162.camel@hp_dx2400_1
Lists: pgsql-hackers


On Fri, 2008-11-28 at 19:53 +0900, Fujii Masao wrote:
> Hi,
>
> On Fri, Nov 28, 2008 at 6:56 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> > Hi,
> >
> > The immediate shutdown (pg_ctl -m i stop) might not be able to
> > kill the startup process during archive recovery. This is because
> > the startup process runs restore_command via system(), which ignores
> > SIGQUIT in the caller while the command executes. As a result, the
> > startup process alone might survive the immediate shutdown and
> > continue replaying WAL until recovery ends. Is this desirable
> > behavior? It seems odd to me.
>
> RestoreArchivedFile() contains the following code as a safeguard
> against restore_command being terminated by a signal. However, the
> safeguard might not work if restore_command installs its own SIGQUIT
> handler, as pg_standby does.
>
> > signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
> >
> > ereport(signaled ? FATAL : DEBUG2,
> >         (errmsg("could not restore file \"%s\" from archive: return code %d",
> >                 xlogfname, rc)));

Agree there is an existing problem.

Suggest we fix it after the main patches are committed.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Immediate shutdown during recovery
Date: 2008-11-29 01:33:37
Message-ID: 3f0b79eb0811281733k600f0ed6xaa08e3badd23773f@mail.gmail.com
Lists: pgsql-hackers

Hello,

On Sat, Nov 29, 2008 at 12:40 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> On Fri, 2008-11-28 at 19:53 +0900, Fujii Masao wrote:
>> Hi,
>>
>> On Fri, Nov 28, 2008 at 6:56 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> > Hi,
>> >
>> > The immediate shutdown (pg_ctl -m i stop) might not be able to
>> > kill the startup process during archive recovery. This is because
>> > the startup process runs restore_command via system(), which ignores
>> > SIGQUIT in the caller while the command executes. As a result, the
>> > startup process alone might survive the immediate shutdown and
>> > continue replaying WAL until recovery ends. Is this desirable
>> > behavior? It seems odd to me.
>>
>> RestoreArchivedFile() contains the following code as a safeguard
>> against restore_command being terminated by a signal. However, the
>> safeguard might not work if restore_command installs its own SIGQUIT
>> handler, as pg_standby does.
>>
>> > signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
>> >
>> > ereport(signaled ? FATAL : DEBUG2,
>> >         (errmsg("could not restore file \"%s\" from archive: return code %d",
>> >                 xlogfname, rc)));
>
> Agree there is an existing problem.
>
> Suggest we fix it after the main patches are committed.

OK, thanks.

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center