Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Date: 2014-10-08 13:59:25
Message-ID: CAHGQGwF16xuGr=vBG0RGZOzR-PZDerkWf2OcxHAFrdTqiP41XQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Oct 8, 2014 at 6:54 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> On 10/08/2014 10:44 AM, Michael Paquier wrote:
>>
>> On Fri, Sep 19, 2014 at 1:07 AM, Jehan-Guillaume de Rorthais <
>> jgdr(at)dalibo(dot)com> wrote:
>>
>>> We kept the WAL files and log files for further analysis. How can we help
>>> regarding this issue?
>>>
>>
>> Commit c2f79ba has added an assumption that the WAL receiver should always
>> enforce the creation of .done files when WAL files are done being streamed
>> (XLogWalRcvWrite and WalReceiverMain) or archived
>> (KeepFileRestoredFromArchive). Then, using this assumption, 1bd42cd has
>> slightly changed RemoveOldXlogFiles, removing a check on whether the node is
>> in recovery. Now, based on the information given here, yes, it happens that
>> there are still cases where .done file creation is not correctly done,
>> leading to those extra files. Even by looking at the code, I am not
>> directly seeing any code paths where an extra call to XLogArchiveForceDone
>> would be needed on the WAL receiver side but... Something like the patch
>> attached (which is clearly a band-aid) may help, though, as it would allow
>> files to be removed even if they are not marked as .done on a node in
>> recovery. And this is consistent with the pre-1bd42cd behavior.
>
>
>
> There are two mysteries here:
>
> 1. Where do the FF files come from? In 9.2, FF-segments are not supposed to
> be created, ever.
>
> Since this only happens with streaming replication, the FF segments are
> probably being created by walreceiver. XLogWalRcvWrite is the function that
> opens the file. I don't see anything obviously wrong there. XLogWalRcvWrite
> opens the file corresponding to the start position in the message received from
> the master. There is no check that the start position is valid, though; if
> the master sends a start position in the FF segment, walreceiver will
> merrily write it. So the problem could be in the walsender side. However, I
> don't see anything wrong there either.
>
> I think we should add a check in walreceiver, to throw an error if the
> master sends an invalid WAL pointer, pointing to an FF segment.
>
>
> 2. Why are the .done files sometimes not being created?
>
> I may have an explanation for that. Walreceiver creates a .done file when it
> closes an old segment and opens a new one. However, it does this only when
> it's about to start writing to the new segment, and still has the old
> segment open. If you stream the FE segment fully, but drop the replication
> connection at exactly that point, the .done file is not created. That might
> sound unlikely, but it's actually pretty easy to trigger. Just do "select
> pg_switch_xlog()" in the master, followed by "pg_ctl stop -m i" and a
> restart.
>
> The creation of the .done files seems quite unreliable anyway. If only a
> portion of a segment is streamed, we don't write a .done file for it, so we
> still have the original problem that we will try to archive the segment
> after failover, even though the master might already have archived it.
>
> I looked again at the thread where this was discussed:
> http://www.postgresql.org/message-id/flat/CAHGQGwHVYqbX=A+zo+AvFbVHLGoypO9G_QDKbabeXgXBVGd05g(at)mail(dot)gmail(dot)com(dot)
> I believe the idea was that the server that generates a WAL segment is
> always responsible for archiving it. A standby should never attempt to
> archive a WAL segment that was restored from the archive, or streamed from
> the master.
>
> In that thread, it was not discussed what should happen to WAL files that an
> admin manually copies into pg_xlog of the standby. Should the standby
> archive them? I don't think so - the admin should copy them manually to the
> archive too, if he wants them archived. It's a good and simple rule that the
> server that generates the WAL, archives the WAL.
>
> Instead of creating any .done files during recovery, we could scan pg_xlog
> at promotion, and create a .done file for every WAL segment that's present
> at that point. That would be more robust. And then apply your patch, to
> recycle old segments during archive recovery, ignoring .done files.
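
To make the check proposed in point 1 above concrete, here is a minimal
sketch in Python rather than the server's C. It assumes the pre-9.3 WAL
layout with the default 16 MB segment size, where each logical log file holds
255 segments (00 to FE) and the FF segment is skipped. The function names are
hypothetical; this is not the walreceiver code, only the arithmetic such a
check would rely on.

```python
# Toy model of the pre-9.3 WAL addressing scheme (assumption: default 16 MB
# segments). Names below are illustrative, not PostgreSQL identifiers.
XLOG_SEG_SIZE = 16 * 1024 * 1024
# 0xFFFFFFFF // 16 MB == 255, so valid segment numbers are 0x00..0xFE.
XLOG_SEGS_PER_FILE = 0xFFFFFFFF // XLOG_SEG_SIZE


def segment_of(xrecoff):
    """Segment number within a logical log file for a byte offset."""
    return xrecoff // XLOG_SEG_SIZE


def is_valid_start(xlogid, xrecoff):
    """True unless the position falls into the never-used FF segment."""
    return segment_of(xrecoff) < XLOG_SEGS_PER_FILE


def wal_file_name(tli, xlogid, xrecoff):
    """24-hex-digit segment file name: timeline, log id, segment."""
    return "%08X%08X%08X" % (tli, xlogid, segment_of(xrecoff))
```

A receiver applying this rule would reject a start position such as
(0x2A, 0xFF000000) instead of opening a file ending in 000000FF.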

What happens if a user shuts down the standby, removes recovery.conf, and
starts the server as the master? In this case, no WAL files have .done status
files, so the server will create .ready files and archive all of them. This is
probably problematic. So even if we adopt your idea, ISTM that it's better to
create a .done file whenever a WAL file is fully streamed or restored.

Regards,

--
Fujii Masao
