Re: [BUG?] lag of minRecoveryPont in archive recovery

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Kyotaro HORIGUCHI'" <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG?] lag of minRecoveryPont in archive recovery
Date: 2012-12-06 11:39:16
Message-ID: 00da01cdd3a6$52ba37e0$f82ea7a0$@kapila@huawei.com
Lists: pgsql-hackers

On Thursday, December 06, 2012 9:35 AM Kyotaro HORIGUCHI wrote:
> Hello, I have a problem with PostgreSQL 9.2 with Pacemaker.
>
> The HA standby sometimes fails to start under normal operation.
>
> Testing with a bare replication pair showed that the standby fails
> startup recovery under the operation sequence shown below. 9.3dev is
> affected too, but 9.1 does not have this problem. The problem became
> apparent through the invalid-page check of xlog, but 9.1 potentially
> has the same glitch.
>
> After investigation, the lag of minRecoveryPoint behind EndRecPtr in the
> redo loop seems to be the cause. The lag brings about repeated redoing
> of unrepeatable xlog sequences such as XLOG_HEAP2_VISIBLE ->
> SMGR_TRUNCATE on the same page. So I applied to smgr_redo the same
> remedy that xact_redo_commit_internal uses. While doing this, I noticed
> that CheckRecoveryConsistency() in the redo apply loop should come after
> redoing the record, so I moved it.

I think moving CheckRecoveryConsistency() to after the redo of the record
in the apply loop might cause a problem.
Currently it is called before recoveryStopsHere(), and that call is what
can allow connections on a hot standby. With the change, if recovery
pauses or stops for some reason because of recoveryStopsHere(),
connections might not be allowed, because CheckRecoveryConsistency() is
never called on that path.
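
A minimal sketch of the ordering concern, written as a toy model and not
taken from the PostgreSQL source. All names here (ToyRecovery,
toy_check_consistency, toy_stops_here, toy_replay) are hypothetical
stand-ins, and the model assumes that record i simply ends at position i
and that consistency is judged against the end of the record just read.
It only illustrates that a check placed after the redo step is skipped on
the iteration where the stop test fires.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for recovery state; not PostgreSQL's real structures. */
typedef struct
{
    int  min_recovery_point;   /* position at which data is consistent */
    int  stop_at;              /* hypothetical recovery target position */
    bool connections_allowed;  /* set once consistency has been reached */
} ToyRecovery;

/* Stand-in for CheckRecoveryConsistency(): allow connections once reached. */
static void
toy_check_consistency(ToyRecovery *r, int end_rec_ptr)
{
    if (!r->connections_allowed && end_rec_ptr >= r->min_recovery_point)
        r->connections_allowed = true;
}

/* Stand-in for recoveryStopsHere(): true when the target is reached. */
static bool
toy_stops_here(const ToyRecovery *r, int end_rec_ptr)
{
    return end_rec_ptr >= r->stop_at;
}

/*
 * Replay records 1..n_records, where record i ends at position i.
 * check_before_stop_test = true  models the existing call order,
 * check_before_stop_test = false models the check moved after redo.
 * Returns whether connections were allowed when the loop ended.
 */
static bool
toy_replay(int n_records, int min_recovery_point, int stop_at,
           bool check_before_stop_test)
{
    ToyRecovery r = { min_recovery_point, stop_at, false };

    assert(n_records >= 0);

    for (int end_rec_ptr = 1; end_rec_ptr <= n_records; end_rec_ptr++)
    {
        if (check_before_stop_test)
            toy_check_consistency(&r, end_rec_ptr);

        if (toy_stops_here(&r, end_rec_ptr))
            break;              /* recovery pauses or stops; no redo */

        /* the redo of the record would happen here */

        if (!check_before_stop_test)
            toy_check_consistency(&r, end_rec_ptr);
    }
    return r.connections_allowed;
}
```

In this toy model, toy_replay(5, 3, 3, true) returns true while
toy_replay(5, 3, 3, false) returns false: when the stop position coincides
with the consistency position, only the existing order runs the check
before the loop exits. Whether the same case arises in the real redo loop
depends on details not covered here.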

With Regards,
Amit Kapila.
