Re: Fast promotion failure

From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
To: amit(dot)kapila(at)huawei(dot)com
Cc: hlinnakangas(at)vmware(dot)com, simon(at)2ndquadrant(dot)com, masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fast promotion failure
Date: 2013-05-16 06:02:42
Message-ID: 20130516.150242.153333292.horiguchi.kyotaro@lab.ntt.co.jp
Lists: pgsql-hackers

Hello,

> > >> Is the point of this discussion that the patch may leave out some
> > >> glitch in the timing of timeline-related changes, and Heikki saw an
> > >> egress of that?
> > >
> > > AFAIU, the committed patch has some gap in overall scenario which is
> > > the fast promotion issue.
> >
> > Right, the fast promotion issue is still there.
> >
> > Just to get us all on the same page again: Does anyone see a problem
> > with a fresh git checkout, with the fast-promotion-quick-fix.patch
> > applied?
> > (http://www.postgresql.org/message-id/51894942.4080500@vmware.com). If
> > you do, please speak up. As far as I know, the already-committed patch,
> > together with fast-promotion-quick-fix.patch, should fix all known
> > issues (*).

The shared XLogCtl->ThisTimeLineID is written and read without
spinlock protection, unlike some other XLogCtl members. Could this
break memory coherency between different processors? If that is
the cause, it is quite understandable that I cannot reproduce the
problem. I did not see the issue even without
fast-promotion-quick-fix.patch.
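
To make the concern concrete, here is a minimal sketch of what
spinlock-protected access to such a field could look like. It is not the
PostgreSQL code: SharedCtl, spin_acquire(), spin_release(),
set_shared_timeline() and get_shared_timeline() are hypothetical names,
and the lock is a toy built on C11 atomic_flag rather than the real
spinlock implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared control struct (not XLogCtlData). */
typedef struct
{
    atomic_flag info_lck;        /* toy spinlock guarding the field below */
    uint32_t    ThisTimeLineID;  /* shared timeline ID */
} SharedCtl;

static SharedCtl shared_ctl = { ATOMIC_FLAG_INIT, 1 };

/* Spin until the flag is ours; acquire ordering fences later reads. */
static void spin_acquire(atomic_flag *lck)
{
    while (atomic_flag_test_and_set_explicit(lck, memory_order_acquire))
        ;  /* busy-wait */
}

/* Release ordering makes writes done under the lock visible to the next holder. */
static void spin_release(atomic_flag *lck)
{
    atomic_flag_clear_explicit(lck, memory_order_release);
}

/* Writer side: update the shared timeline ID while holding the lock. */
void set_shared_timeline(uint32_t tli)
{
    spin_acquire(&shared_ctl.info_lck);
    /* Sketch-only sanity check: the timeline ID is assumed never to decrease. */
    assert(tli >= shared_ctl.ThisTimeLineID);
    shared_ctl.ThisTimeLineID = tli;
    spin_release(&shared_ctl.info_lck);
}

/* Reader side: copy the value out while holding the lock. */
uint32_t get_shared_timeline(void)
{
    uint32_t tli;

    spin_acquire(&shared_ctl.info_lck);
    tli = shared_ctl.ThisTimeLineID;
    spin_release(&shared_ctl.info_lck);
    return tli;
}
```

With this pattern a reader that takes the lock after the writer released
it is guaranteed to see the new value; an unlocked read has no such
guarantee by itself, which is the case I am asking about.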

> The patch provided will un-necessarily call InitXLOGAccess() 2 times for End
> of recovery checkpoint, it doesn't matter w.r.t performance but actually the
> purpose will be almost same for calling LocalSetXLogInsertAllowed() and
> InitXLOGAccess(), or am I missing something.
>
> One more thing, I think after fast promotion, either it should set timeline
> or give error in CreateCheckPoint() function before it reaches the check
> mentioned by you in your initial mail.
> if (RecoveryInProgress() && (flags & CHECKPOINT_END_OF_RECOVERY) == 0)
> elog(ERROR, "can't create a checkpoint during recovery");
> Shouldn't it set timeline in above check (RecoveryInProgress()) or when
> RecoveryInProgress() is called before CreateCheckPoint()?

As for the checkpointer, it calls RecoveryInProgress() much
earlier than that, in the wait loop in CheckpointerMain, where it
decides whether to perform a checkpoint or a restartpoint. So
InitXLOGAccess() has already been done by the time a checkpoint is
chosen there for the first time. And before that, ThisTimeLineID in
the startup process is incremented and copied to
XLogCtl->ThisTimeLineID before xlogctl->SharedRecoveryInProgress
becomes false in StartupXLOG(). I don't think it is possible for the
checkpointer to run on an older timeline ID, on condition that all
processes see exactly the same memory image.
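
The ordering argument above can be modelled roughly as follows. This is
a simplified sketch under my own assumptions, not the actual xlog.c
code: all names (shared_tli, startup_finish_recovery(),
checkpointer_recovery_in_progress(), checkpointer_timeline()) are
hypothetical, and C11 release/acquire atomics stand in for whatever
guarantees the "same memory image" condition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Shared state (hypothetical model of the two shared fields). */
static _Atomic uint32_t shared_tli = 1;
static atomic_bool      shared_recovery_in_progress = true;

/* Process-local state of the modelled startup process and checkpointer. */
static uint32_t startup_tli = 1;
static bool     local_recovery_in_progress = true;
static uint32_t local_tli = 0;   /* 0 means "not initialized yet" in this sketch */

/* Startup side: publish the new timeline first, then clear the flag. */
void startup_finish_recovery(void)
{
    startup_tli++;
    atomic_store_explicit(&shared_tli, startup_tli, memory_order_relaxed);
    /* Release: the timeline store above is visible before the flag flips. */
    atomic_store_explicit(&shared_recovery_in_progress, false,
                          memory_order_release);
}

/* Checkpointer side: on the first "recovery ended" result, copy the timeline. */
bool checkpointer_recovery_in_progress(void)
{
    if (!local_recovery_in_progress)
        return false;            /* already saw the end of recovery */

    local_recovery_in_progress =
        atomic_load_explicit(&shared_recovery_in_progress,
                             memory_order_acquire);
    if (!local_recovery_in_progress)
        local_tli = atomic_load_explicit(&shared_tli, memory_order_relaxed);

    return local_recovery_in_progress;
}

/* Timeline a checkpoint would use; only valid once recovery has ended. */
uint32_t checkpointer_timeline(void)
{
    assert(!local_recovery_in_progress);
    return local_tli;
}
```

In this model, a checkpointer that observes the flag as false always
copies the incremented timeline ID, because the store to the flag is
ordered after the store to the timeline. Whether the real code has an
equivalent guarantee for the unlocked ThisTimeLineID access is exactly
the open question.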

regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center
