Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

From:	Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To:	Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Date:	2014-10-22 11:53:53
Message-ID:	54479AD1.2000902@vmware.com
Lists:	pgsql-hackers

On 10/20/2014 09:26 AM, Michael Paquier wrote:
> On Fri, Oct 17, 2014 at 10:37 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>>
>> On Fri, Oct 17, 2014 at 9:23 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>
>>> In this case, the patch seems to make the restartpoint recycle even WAL files
>>> which have .ready files and will have to be archived later. Thought?
>>
>> The real problem currently is that, during recovery, a segment file can
>> end up not marked as .done when the streaming connection is abruptly cut
>> while this segment is being switched. It is then marked as .ready in
>> archive_status and simply left in pg_xlog, because it will be neither
>> recycled nor removed. I have not been able to look much at this code
>> these days, so I am not sure how invasive it would be in back-branches,
>> but perhaps we should try to improve the code so that when a segment file
>> is switched and the connection is cut, we guarantee that this file is
>> completed and marked as .done.
>
> I have spent more time on that, with a bit more of underground work...
> First, the problem can be reproduced most of the time by running this
> simple command:
> psql -c 'select pg_switch_xlog()'; pg_ctl restart -m immediate
>
> This will force a segment file switch and restart the master in
> crash recovery. The effect is to immediately cut the WAL stream
> on the slave, symbolized by a FATAL in libpqrcv_receive where rawlen == 0.
> For example, let's imagine that the stream fails when switching from
> 000000010000000000000003 to the next segment. Then the last
> XLogRecPtr in XLogWalRcvProcessMsg for dataStart is, for
> example, 0/3100000, and walrcv->latestWalEnd is 0/4000000. When the stream
> restarts it will begin once again from 0/4000000, ignoring that
> 000000010000000000000003 should be marked as .done, ultimately marking
> it in .ready state when old segment files are recycled or removed.
> There is nothing that can really be done to enforce the creation of a
> .done file before the FATAL of libpqrcv_receive because we cannot
> predict the stream failure.
>
> Now, we can do better than what we have now by looking at the WAL start
> position used when starting streaming in the WAL receiver, and enforcing
> .done if the start position is the last one of the previous segment.
> Hence, in the case of the start position 0/4000000 that was found
> previously, the file that will be enforced to .done is
> 000000010000000000000003. I have written the attached patch, which
> implements this idea and fixes the problem. Now let's see if you guys
> see any flaws in this simple logic, which uses a sniper gun instead of
> a bazooka as in the previous patches sent.
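
To make the quoted proposal concrete, here is a rough sketch of the
boundary check it describes. This is not the attached patch, only an
illustration. It assumes the default 16MB segment size and the
contiguous segment naming used since 9.3 (9.2 skips the *FF segments),
and the function names are made up for the example.

```python
# Illustrative sketch only, not the attached patch.
# Assumption: default 16MB WAL segments and post-9.3 contiguous naming.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def parse_lsn(text):
    """Convert an LSN written as 'hi/lo' (hex) into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def segment_file_name(timeline, segno):
    """Build the 24-character WAL segment file name."""
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def segment_to_force_done(timeline, start_lsn):
    """Hypothetical helper: if streaming starts exactly on a segment
    boundary, return the name of the previous segment (the one that
    would be forced to .done); otherwise return None."""
    lsn = parse_lsn(start_lsn)
    if lsn % WAL_SEGMENT_SIZE != 0:
        return None  # start position is in the middle of a segment
    segno = lsn // WAL_SEGMENT_SIZE
    if segno == 0:
        return None  # no previous segment exists
    return segment_file_name(timeline, segno - 1)
```

With the positions from the example, a start position of 0/4000000 on
timeline 1 maps to 000000010000000000000003, while 0/3100000 falls in
the middle of a segment and yields nothing.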

Hmm. This will still miss the .done file if you don't re-establish the
streaming replication connection after the restart. For example, if you
shut down the master, and promote the standby server.

I think we should take a more wholesale approach to this. We should
enforce the rule that the server only ever archives WAL files belonging
to the same timeline that the server generates. IOW, the server only
archives the WAL that it has generated.
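
One possible reading of that rule, as a rough sketch rather than an
actual implementation: compare the timeline encoded in the first eight
hex digits of the segment file name with the timeline the server itself
writes WAL on. The function names and the own_timeline parameter are
hypothetical, and treating a server that has not generated any WAL as
own_timeline = None is an assumption made for the example.

```python
def timeline_of(wal_file_name):
    """Extract the timeline ID from the first 8 hex digits of a WAL
    segment file name."""
    return int(wal_file_name[:8], 16)


def should_archive(wal_file_name, own_timeline):
    """Hypothetical check: archive a segment only if it belongs to the
    timeline this server generates WAL on. own_timeline is None for a
    server that has not generated WAL itself (assumption)."""
    if own_timeline is None:
        return False
    return timeline_of(wal_file_name) == own_timeline
```

Under this reading, a standby promoted onto timeline 2 would leave
000000010000000000000003 alone, since that segment was generated by the
old master on timeline 1.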

- Heikki
