Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Re: Slave enters in recovery and promotes when WAL stream with master is cut + delay master/slave
Date: 2013-01-18 09:20:57
Message-ID: 50F913F9.10400@vmware.com
Lists: pgsql-hackers

On 18.01.2013 02:35, Andres Freund wrote:
> On 2013-01-18 08:24:31 +0900, Michael Paquier wrote:
>> On Fri, Jan 18, 2013 at 3:05 AM, Fujii Masao<masao(dot)fujii(at)gmail(dot)com> wrote:
>>
>>> I encountered the problem that the timeline switch is not performed
>>> as expected.
>>> I set up one master, one standby and one cascade standby. All the servers
>>> share the archive directory. restore_command is specified in the
>>> recovery.conf in those two standbys.
>>>
>>> I shut down the master, and then promoted the standby. In this case, the
>>> cascade standby should switch to the new timeline and replication should
>>> be successfully restarted. But the timeline was never changed, and the
>>> following log messages kept being output.
>>>
>>> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
>>> sby2 LOG: replication terminated by primary server
>>> sby2 DETAIL: End of WAL reached on timeline 1
>>> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
>>> sby2 LOG: replication terminated by primary server
>>> sby2 DETAIL: End of WAL reached on timeline 1
>>> sby2 LOG: restarted WAL streaming at 0/3000000 on timeline 1
>>> sby2 LOG: replication terminated by primary server
>>> sby2 DETAIL: End of WAL reached on timeline 1
>>>
>> I am seeing similar issues with master at 88228e6.
>> This is easily reproducible by setting up 2 slaves under a master, then
>> killing the master. Promote slave 1 and reconnect slave 2 to slave 1, then
>> you will notice that the timeline jump is not done.
>>
>> I don't know if Masao tried to make the slave that reconnects to the
>> promoted slave synchronous, but in this case slave2 gets stuck in the
>> "potential" state. That is due to the timeline not having changed on
>> slave2, but better to let you know...
>
> Ok, I know what's causing this now. Rather ugly.
>
> Whenever accessing a page in a segment we haven't accessed before we
> read the first page to do an extra bit of validation as the first page
> in a segment contains more information.
>
> Suppose timeline 1 ends at 0/6087088. xlog.c notices that WAL ends
> there, wants to read the new timeline, and requests record
> 0/06087088. xlogreader wants to do its validation and goes back to the
> first page in the segment, which triggers xlog.c to re-request timeline 1
> to be transferred.

Hmm, so it's the same issue I thought I fixed yesterday. My patch only
fixed it for the case that the timeline switch is in the first page of
the segment. When it's not, you still get two calls for one WAL record:
the first for the first page in the segment, to validate it, and the
second for the page that actually contains the record. The first call
leads XLogPageRead to think it needs to read from the old timeline.
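
To make the two calls concrete, here is a minimal standalone sketch, not
PostgreSQL code. It assumes the default 8 kB WAL page size and 16 MB
segment size, treats XLogRecPtr as a plain 64-bit integer, and uses
hypothetical helper names (page_start, segment_start, pages_requested)
to list which page addresses a reader would ask for when it reaches a
record in a segment it has not opened yet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's WAL position type (assumption: flat 64-bit). */
typedef uint64_t XLogRecPtr;

#define WAL_PAGE_SIZE    UINT64_C(8192)      /* assumed default page size */
#define WAL_SEGMENT_SIZE UINT64_C(16777216)  /* assumed default 16 MB segment */

/* Address of the first byte of the page containing ptr. */
static XLogRecPtr
page_start(XLogRecPtr ptr)
{
    return ptr - (ptr % WAL_PAGE_SIZE);
}

/* Address of the first byte of the segment containing ptr. */
static XLogRecPtr
segment_start(XLogRecPtr ptr)
{
    return ptr - (ptr % WAL_SEGMENT_SIZE);
}

/*
 * Hypothetical model of the reader: store in requests[] the page addresses
 * it would ask the page-read callback for, in order, and return how many.
 * A segment that was not opened before gets its first page validated first.
 */
static int
pages_requested(XLogRecPtr rec, int segment_already_open,
                XLogRecPtr requests[2])
{
    int         n = 0;
    XLogRecPtr  seg = segment_start(rec);
    XLogRecPtr  page = page_start(rec);

    assert(requests != NULL);

    if (!segment_already_open && page != seg)
        requests[n++] = seg;    /* extra validation read of the first page */
    requests[n++] = page;       /* page that actually holds the record */
    return n;
}
```

For the record at 0/6087088 in a segment not read before, this model
yields a request for 0/6000000 followed by one for 0/6086000. A callback
that only sees the first address has no way to tell that the caller is
really after a record at the timeline switch point.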

We didn't have this problem before the xlogreader refactoring because
XLogPageRead() was always called with the RecPtr of the record, even
when we actually read the segment header from the file first. We'll have
to somehow get that same information, the RecPtr of the record we're
actually interested in, to XLogPageRead(). We could add a new argument
to the callback for that, or we could keep xlogreader.c as it is and
pass it through from ReadRecord to XLogPageRead() in the private struct.

An explicit argument to the callback is probably best. That's
straightforward, and it might be useful for the callback to know the
actual WAL position that xlogreader.c is interested in anyway. See attached.
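
As a rough illustration of the idea (this is not the attached patch, and
every name below is hypothetical), the sketch passes both the page
address and the position of the record of interest to a page-read
callback. It assumes a single switch from one timeline to the next, with
the record at the switch point belonging to the new timeline, and it
returns only the timeline ID the callback would read from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL types (assumptions for this sketch). */
typedef uint64_t XLogRecPtr;
typedef uint32_t TimeLineID;

/* Hypothetical private state: one timeline switch at switch_point. */
typedef struct
{
    TimeLineID  old_tli;
    TimeLineID  new_tli;
    XLogRecPtr  switch_point;   /* first position on new_tli */
} ReaderPrivate;

/* Hypothetical callback shape with the extra record-pointer argument. */
typedef TimeLineID (*page_read_cb) (const ReaderPrivate *priv,
                                    XLogRecPtr target_page_ptr,
                                    XLogRecPtr target_rec_ptr);

/* Timeline that owns the given WAL position. */
static TimeLineID
tli_for(const ReaderPrivate *priv, XLogRecPtr ptr)
{
    assert(priv != NULL);
    return (ptr >= priv->switch_point) ? priv->new_tli : priv->old_tli;
}

/* Problematic variant: decides from the page address alone. */
static TimeLineID
read_page_by_page_ptr(const ReaderPrivate *priv,
                      XLogRecPtr target_page_ptr,
                      XLogRecPtr target_rec_ptr)
{
    (void) target_rec_ptr;      /* ignored, as if the argument did not exist */
    return tli_for(priv, target_page_ptr);
}

/* Variant using the new argument: decides from the record of interest. */
static TimeLineID
read_page_by_rec_ptr(const ReaderPrivate *priv,
                     XLogRecPtr target_page_ptr,
                     XLogRecPtr target_rec_ptr)
{
    (void) target_page_ptr;     /* still says which page to fetch */
    return tli_for(priv, target_rec_ptr);
}
```

With a switch point of 0/6087088, the validation read of page 0/6000000
makes the first variant pick timeline 1, while the second variant picks
timeline 2 for the same page because it knows the record being sought is
at 0/6087088.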

- Heikki

Attachment Content-Type Size
choose-correct-timeline-in-xlogpageread.patch text/x-diff 6.6 KB
