Re: Fast promotion failure

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: 'Kyotaro HORIGUCHI' <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, masao(dot)fujii(at)gmail(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fast promotion failure
Date: 2013-05-13 07:43:20
Message-ID: 51909998.3010202@vmware.com
Lists: pgsql-hackers

On 13.05.2013 06:07, Amit Kapila wrote:
> On Monday, May 13, 2013 5:54 AM Kyotaro HORIGUCHI wrote:
>> Heikki said in the first message in this thread that he suspected
>> the cause of the failure he had seen to be the wrong TLI on which
>> the checkpointer runs. Nevertheless, the patch you suggested to me
>> seems to fix it. Moreover, (one of?) the failures from the same
>> cause seems to be fixed by the patch.
>
> There were 2 problems:
> 1. There was an issue in the walsender logic due to which, after promotion,
> it hits an assertion or error in some cases.
> 2. During fast promotion, the checkpoint gets created with the wrong TLI.
>
> He has provided 2 different patches
> fix-standby-promotion-assert-fail-2.patch and
> fast-promotion-quick-fix.patch.
> Of the two, he has already committed fix-standby-promotion-assert-fail-2.patch
> (http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2ffa66f4975c99e52984f7ee81b47d137b5b4751)

That's correct.

>> Is the point of this discussion that the patch may leave out some
>> glitch in the timing of timeline-related changes, and Heikki saw an
>> instance of that?
>
> AFAIU, the committed patch leaves a gap in the overall scenario, namely the
> fast promotion issue.

Right, the fast promotion issue is still there.

Just to get us all on the same page again: Does anyone see a problem
with a fresh git checkout, with the fast-promotion-quick-fix.patch
applied?
(http://www.postgresql.org/message-id/51894942.4080500@vmware.com). If
you do, please speak up. As far as I know, the already-committed patch,
together with fast-promotion-quick-fix.patch, should fix all known
issues (*).

I haven't committed a fix for the issue I reported in this thread,
because I'm not 100% sure what the right fix for it would be.
fast-promotion-quick-fix.patch seems to do the trick, but at least the
comments need to be updated, and I'm not sure whether there are related
corner cases that it doesn't handle. Simon?

(*) Well, almost. This one is still pending:
http://www.postgresql.org/message-id/CAB7nPqRhuCuuD012GCB_tAAFrixx2WioN_zfXQcvLuRab8DN2g@mail.gmail.com

- Heikki
