Re: Fast promotion failure

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>, "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>
Cc: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <masao(dot)fujii(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fast promotion failure
Date: 2013-05-13 09:41:34
Message-ID: 007201ce4fbe$0fa8bbe0$2efa33a0$@kapila@huawei.com
Lists: pgsql-hackers

On Monday, May 13, 2013 1:13 PM Heikki Linnakangas wrote:
> On 13.05.2013 06:07, Amit Kapila wrote:
> > On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote:
> >> Heikki said in the first message in this thread that he suspected
> >> the cause of the failure he had seen to be a wrong TLI on which
> >> the checkpointer runs. Nevertheless, the patch you suggested to me
> >> looks like it fixes it. Moreover, (one of?) the failures from the same
> >> cause looks fixed with the patch.
> >
> > There were 2 problems:
> > 1. There was some issue in the walsender logic due to which, after
> > promotion, it hits an assertion or error in some cases.
> > 2. During fast promotion, the checkpoint gets created with the wrong TLI.
> >
> > He has provided 2 different patches:
> > fix-standby-promotion-assert-fail-2.patch and
> > fast-promotion-quick-fix.patch.
> > Of the 2, he has already committed fix-standby-promotion-assert-fail-2.patch
> > (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa66f4975c99e52984f7ee81b47d137b5b4751)
>
> That's correct.
>
> >> Is the point of this discussion that the patch may leave out some
> >> glitch in the timing of timeline-related changes, and Heikki saw a
> >> symptom of that?
> >
> > AFAIU, the committed patch leaves a gap in the overall scenario, namely
> > the fast promotion issue.
>
> Right, the fast promotion issue is still there.
>
> Just to get us all on the same page again: Does anyone see a problem
> with a fresh git checkout, with the fast-promotion-quick-fix.patch
> applied?
> (http://www.postgresql.org/message-id/51894942.4080500@vmware.com). If
> you do, please speak up. As far as I know, the already-committed patch,
> together with fast-promotion-quick-fix.patch, should fix all known
> issues (*).
>
> I haven't committed a fix for the issue I reported in this thread,
> because I'm not 100% sure what the right fix for it would be.
> fast-promotion-quick-fix.patch seems to do the trick, but at least the
> comments need to be updated, and I'm not sure if there are some related
> corner cases that it doesn't handle. Simon?

The patch calls InitXLOGAccess() twice for the end-of-recovery checkpoint,
which is unnecessary. It doesn't matter for performance, but calling
LocalSetXLogInsertAllowed() and calling InitXLOGAccess() serve almost the
same purpose here, or am I missing something?
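To illustrate why the second call is redundant, here is a toy C model. It is hypothetical and is not PostgreSQL source: the lower-cased names only loosely mirror InitXLOGAccess(), LocalSetXLogInsertAllowed() and the shared XLogCtl timeline. It assumes, purely for illustration, that the end-of-recovery checkpoint path reaches InitXLOGAccess() once through LocalSetXLogInsertAllowed() and once more directly; the actual patch may be structured differently.

```c
#include <assert.h>

/* Toy model only, NOT PostgreSQL source. */
typedef unsigned int TimeLineID;

/* Hypothetical stand-in for the timeline kept in shared memory (XLogCtl). */
TimeLineID shared_timeline_id = 1;

/* Process-local state: each process keeps its own copies. */
TimeLineID this_timeline_id = 0;
int local_xlog_insert_allowed = -1;  /* -1: decide from recovery state */
int init_calls = 0;                  /* counts init_xlog_access() calls */

/* Copies shared state into process-local variables. Idempotent: a second
 * call copies the same value again and changes nothing. */
void
init_xlog_access(void)
{
    this_timeline_id = shared_timeline_id;
    init_calls++;
}

/* Assumption for the sketch: allowing WAL insertion also initializes the
 * local XLOG state, as a switch out of recovery would. */
void
local_set_xlog_insert_allowed(void)
{
    assert(local_xlog_insert_allowed == -1);
    local_xlog_insert_allowed = 1;
    init_xlog_access();
}

/* Assumed end-of-recovery checkpoint path; returns the TLI that the
 * checkpoint record would carry. */
TimeLineID
end_of_recovery_checkpoint(void)
{
    local_set_xlog_insert_allowed();
    init_xlog_access();              /* redundant second call */
    TimeLineID tli = this_timeline_id;
    local_xlog_insert_allowed = -1;  /* back to normal recovery-state checks */
    return tli;
}
```

Removing the explicit second init_xlog_access() call leaves the returned TLI unchanged, which is the point: the extra call is harmless but not needed.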

One more thing: I think that after fast promotion, CreateCheckPoint() should
either set the timeline or throw an error before it reaches the check you
mentioned in your initial mail:

    if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
        elog(ERROR, "can't create a checkpoint during recovery");

Shouldn't the timeline be set in the above check (RecoveryInProgress()), or
when RecoveryInProgress() is called before CreateCheckPoint()?
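A toy C model of that suggestion follows. It is hypothetical and is not PostgreSQL source: recovery_in_progress(), create_checkpoint() and the xlog_ctl struct are lower-cased stand-ins for RecoveryInProgress(), CreateCheckPoint() and the shared XLogCtl state, the flag value is chosen for the sketch, and the elog(ERROR) is modelled as a return value of 0. It assumes the process-local timeline is refreshed at the moment the process first notices that recovery has ended, so a later checkpoint carries the new TLI instead of a stale one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy model only, NOT PostgreSQL source. */
typedef unsigned int TimeLineID;

#define CHECKPOINT_END_OF_RECOVERY 0x0002  /* value chosen for the sketch */

/* Hypothetical stand-in for the shared recovery state. */
struct shared_xlog_state
{
    bool        recovery_in_progress;
    TimeLineID  timeline;
};

struct shared_xlog_state xlog_ctl = {true, 1};

/* Process-local copies. */
bool        local_recovery_in_progress = true;
TimeLineID  this_timeline_id = 1;

/* Re-reads shared state while we still think recovery is running; on the
 * transition out of recovery, refreshes the local timeline ("set timeline"). */
bool
recovery_in_progress(void)
{
    if (!local_recovery_in_progress)
        return false;
    local_recovery_in_progress = xlog_ctl.recovery_in_progress;
    if (!local_recovery_in_progress)
        this_timeline_id = xlog_ctl.timeline;
    return local_recovery_in_progress;
}

/* Returns the TLI the checkpoint record would carry, or 0 where the real
 * code would elog(ERROR). */
TimeLineID
create_checkpoint(int flags)
{
    if (recovery_in_progress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
    {
        fprintf(stderr, "can't create a checkpoint during recovery\n");
        return 0;
    }
    assert(this_timeline_id != 0);  /* a checkpoint needs a valid TLI */
    return this_timeline_id;
}
```

Because the refresh happens inside the recovery-state check itself, the first call made after promotion already sees the new timeline, rather than relying on every caller to remember to update it.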

With Regards,
Amit Kapila.
