Re: Stefan's bug (was: max_standby_delay considered harmful)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Florian Pflug <fgp(at)phlo(dot)org>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: Stefan's bug (was: max_standby_delay considered harmful)
Date: 2010-05-17 13:20:11
Message-ID: AANLkTilS3LVTYsaLlp8sX7MNf1wCa_QM_nC93vR_cyZP@mail.gmail.com
Lists: pgsql-hackers

On Mon, May 17, 2010 at 7:44 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Mon, May 17, 2010 at 8:02 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> (1)
>>> Smart or fast shutdown requested in PM_STARTUP state always removes
>>> the backup_label file if it exists. But it might be still required
>>> for subsequent recovery. I changed your patch so that additionally
>>> the postmaster skips deleting the backup_label in that case.
>>
>> Can you explain in a little more detail how this can cause a problem?
>> I'm not very familiar with how the backup label is used.
>>
>> Also, why is this different in PM_STARTUP than in PM_RECOVERY?
>> PM_RECOVERY doesn't guarantee that we've reached consistency.
>
> Before the startup process sends the PMSIGNAL_RECOVERY_STARTED signal
> (i.e., when the postmaster is in PM_STARTUP state), it reads the
> backup_label file to know the recovery starting WAL location, saves
> that information in the pg_control file, and renames the file "backup_label"
> to "backup_label.old".
>
> If the backup_label file is removed before pg_control is updated,
> subsequent recovery cannot get the right recovery starting location.
> This is the problem that I'm concerned about.
>
> The smart shutdown during recovery and the fast shutdown might call
> CancelBackup() and remove the backup_label file. So if shutdown is
> requested in PM_STARTUP state, the problem would happen.

OK, I think I understand now. But the SIGTERM sent by the postmaster
doesn't kill the recovery process unconditionally. It will invoke
StartupProcShutdownHandler(), which will set shutdown_requested =
true. That gets checked by RestoreArchivedFile() and
HandleStartupProcInterrupts(), and I think that neither of those can
get invoked until after the control file has been updated. Do you see
a way it can happen?
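
To make the argument concrete, here is a minimal standalone C sketch of the
deferred-shutdown pattern I'm describing. It is not the actual PostgreSQL
source: startup_shutdown_handler(), handle_interrupts(), run_startup() and
the two boolean flags are hypothetical stand-ins, and the sketch simply
assumes what I'm claiming above, namely that no interrupt check runs before
the control file has been updated.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not PostgreSQL's real code. */
volatile sig_atomic_t shutdown_requested = 0;
bool control_file_updated = false;
bool backup_label_renamed = false;

/* SIGTERM handler: only records the request, it never exits by itself. */
void
startup_shutdown_handler(int signo)
{
    (void) signo;
    shutdown_requested = 1;
}

/* The only place where a pending shutdown request is acted upon. */
bool
handle_interrupts(void)
{
    return shutdown_requested != 0;
}

/*
 * Simulated startup sequence.  Returns true if it stopped early because a
 * shutdown was requested, false if it ran to completion.
 */
bool
run_startup(bool sigterm_arrives_early)
{
    signal(SIGTERM, startup_shutdown_handler);

    /* Simulate the postmaster's SIGTERM arriving before any work is done. */
    if (sigterm_arrives_early)
        raise(SIGTERM);

    /*
     * Step 1: read backup_label, save the recovery start location in the
     * control file, then rename backup_label.  Assumed: no interrupt check
     * happens anywhere inside this step.
     */
    control_file_updated = true;
    backup_label_renamed = true;

    /* Step 2: first interrupt check, reached only after step 1 is done. */
    assert(control_file_updated);
    if (handle_interrupts())
        return true;

    /* Step 3: WAL replay would go here. */
    return false;
}
```

If that assumption holds, calling run_startup(true) stops early but still
leaves control_file_updated set, which is the property that matters for the
next recovery. Note that the sketch only covers what the startup process
does on its own; it says nothing about the postmaster removing the file
itself.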

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
