Re: Skip checkpoint on promoting from streaming replication

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Skip checkpoint on promoting from streaming replication
Date: 2013-01-24 16:52:09
Message-ID: 510166B9.6080705@vmware.com
Lists: pgsql-hackers

On 24.01.2013 18:24, Simon Riggs wrote:
> On 6 January 2013 21:58, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
>> I've been torn between the need to remove the checkpoint for speed and
>> being worried about the implications of doing so.
>>
>> We promote in multiple use cases. When we end a PITR, or are
>> performing a switchover, it doesn't really matter how long the
>> shutdown checkpoint takes, so I'm inclined to leave it there in those
>> cases. For failover, we need fast promotion.
>>
>> So my thinking is to make pg_ctl promote -m fast
>> be the way to initiate a fast failover that skips the shutdown checkpoint.
>>
>> That way all existing applications work the same as before, while new
>> users that explicitly choose to do so will gain from the new option.
>
> Here's a patch to skip checkpoint when we do
>
> pg_ctl promote -m fast
>
> We keep the end of recovery checkpoint in all other cases.

Hmm, there seems to be no way to do a "fast" promotion with a trigger file.

I'm a bit confused why there needs to be a special mode for this. Can't we
just always do the "fast" promotion? I agree that there's no urgency
when you're doing PITR, but it shouldn't do any harm either. Or perhaps
always do "fast" promotion when starting up from standby mode, and
"slow" otherwise.

Are we comfortable enough with this to skip the checkpoint after crash
recovery?

I may be missing something, but it looks like after a "fast" promotion,
you don't request a new checkpoint. So it can take quite a while for the
next checkpoint to be triggered by checkpoint_timeout or checkpoint_segments. That
shouldn't be a problem, but I feel that it'd be prudent to request a new
checkpoint immediately (not necessarily an "immediate" checkpoint, though).
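
To make the two suggestions above concrete, here is a minimal standalone
sketch in C. Every name in it (PromotionMode, PromotionPlan, the CKPT_*
flags, both functions) is hypothetical and made up for illustration; none
of it is taken from the patch or from xlog.c. It assumes "fast" is chosen
whenever we come out of standby mode, and that a fast promotion queues a
normal, throttled checkpoint right away instead of waiting for the timer.

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical, for illustration only. */
typedef enum { PROMOTE_SLOW, PROMOTE_FAST } PromotionMode;

/* Hypothetical checkpoint request flags. */
#define CKPT_NONE      0x00   /* no checkpoint requested */
#define CKPT_REQUESTED 0x01   /* ask the checkpointer to start one now */
#define CKPT_IMMEDIATE 0x02   /* run it at full speed, without throttling */

typedef struct
{
    bool checkpoint_before_writes;  /* wait for a checkpoint before promoting? */
    bool write_end_of_recovery;     /* write an end-of-recovery record instead? */
    int  followup_checkpoint;       /* checkpoint request made after promotion */
} PromotionPlan;

/* "Fast" when coming out of standby mode, "slow" otherwise (e.g. PITR). */
PromotionMode
choose_promotion_mode(bool started_in_standby_mode)
{
    return started_in_standby_mode ? PROMOTE_FAST : PROMOTE_SLOW;
}

/* Decide what to do at end of recovery for the given mode. */
PromotionPlan
plan_promotion(PromotionMode mode)
{
    /* Default is the "slow" path: checkpoint first, nothing queued after. */
    PromotionPlan plan = { true, false, CKPT_NONE };

    assert(mode == PROMOTE_SLOW || mode == PROMOTE_FAST);
    if (mode == PROMOTE_FAST)
    {
        plan.checkpoint_before_writes = false;
        plan.write_end_of_recovery = true;
        /* Requested right away, but deliberately without CKPT_IMMEDIATE. */
        plan.followup_checkpoint = CKPT_REQUESTED;
    }
    return plan;
}
```

The point of leaving CKPT_IMMEDIATE unset is that the checkpoint starts
promptly but is still spread out as usual, so it doesn't cause an I/O
spike on the freshly promoted server.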

> The only thing left from Kyotaro's patch is a single line of code -
> the call to ReadCheckpointRecord() that checks to see if the WAL
> records for the last two restartpoints are on disk, which was an
> important line of code.

Why's that important, just for paranoia? If the last two restartpoints
have disappeared, something's seriously wrong, and you will be in
trouble e.g. if you crash at that point. Do we need to be extra paranoid
when doing a "fast" promotion?

> Patch implements a new record type XLOG_END_OF_RECOVERY that behaves
> on replay like a shutdown checkpoint record. I put this back in from
> my patch because I believe it's important that we have a clear place
> where the WAL history changes timelineId. WAL format change bump
> required.

Agreed, such a WAL record is essential.

At replay, an end-of-recovery record should be a signal to the hot
standby mechanism that there are no transactions running in the master
at that point, same as a shutdown checkpoint.
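
As a rough sketch of that replay rule, assuming a hypothetical RecordKind
enum and helper function (these are illustrative names, not the real
record types or redo code):

```c
#include <stdbool.h>

/* Hypothetical classification of WAL records seen during replay. */
typedef enum
{
    REC_ONLINE_CHECKPOINT,
    REC_SHUTDOWN_CHECKPOINT,
    REC_END_OF_RECOVERY,
    REC_OTHER
} RecordKind;

/*
 * Returns true if replaying this record tells hot standby that no
 * transactions were running in the master at that point.
 */
bool
record_means_no_running_xacts(RecordKind kind)
{
    switch (kind)
    {
        case REC_SHUTDOWN_CHECKPOINT:
        case REC_END_OF_RECOVERY:
            return true;
        default:
            /* An online checkpoint says nothing about running xacts. */
            return false;
    }
}
```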

- Heikki
