Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG: *FF WALs under 9.2 (WAS: .ready files appearing on slaves)
Date: 2014-10-28 13:48:10
Message-ID: 544F9E9A.9020808@vmware.com
Lists: pgsql-hackers

On 10/27/2014 06:12 PM, Heikki Linnakangas wrote:
> On 10/27/2014 02:12 PM, Fujii Masao wrote:
>> On Fri, Oct 24, 2014 at 10:05 PM, Heikki Linnakangas
>> <hlinnakangas(at)vmware(dot)com> wrote:
>>> On 10/23/2014 11:09 AM, Heikki Linnakangas wrote:
>>>>
>>>> At least for master, we should consider changing the way the archiving
>>>> works so that we only archive WAL that was generated in the same server.
>>>> I.e. we should never try to archive WAL files belonging to another
>>>> timeline.
>>>>
>>>> I just remembered that we discussed a different problem related to this
>>>> some time ago, at
>>>>
>>>> http://www.postgresql.org/message-id/20131212.110002.204892575.horiguchi.kyotaro@lab.ntt.co.jp.
>>>> The conclusion of that was that at promotion, we should not archive the
>>>> last, partial, segment from the old timeline.
>>>
>>>
>>> So, this is what I came up with for master. Does anyone see a problem with
>>> it?
>>
>> What about the problem that I raised upthread? This is, the patch
>> prevents the last, partial, WAL file of the old timeline from being archived.
>> So we can never PITR the database to the point that the last, partial WAL
>> file has.
> A partial WAL file is never archived in the master server to begin with,
> so if it's ever used in archive recovery, the administrator must have
> performed some manual action to copy the partial WAL file from the
> original server. When he does that, he can also copy it manually to the
> archive, or whatever he wants to do with it.
>
> Note that the same applies to any complete, but not-yet archived WAL
> files. But we've never had any mechanism in place to archive those in
> the new instance, after PITR.

Actually, I'll take back what I said above. I had misunderstood the
current behavior. Currently, a server *does* archive any files that you
copy manually to pg_xlog, after PITR has finished. Eventually. We don't
create a .ready file for them until they're old enough to be recycled.
We do create a .ready file for the last, partial, segment, but it's
pretty weird to do it just for that, and not any other, complete,
segments that might've been copied to pg_xlog. So what happens is that
the last partial segment gets archived immediately after promotion, but
any older segments will linger unarchived until much later.
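
To see which segments are in that lingering state, something like the following rough sketch could be used. It assumes the usual layout where the status files live in pg_xlog/archive_status and are named <segment>.ready or <segment>.done; the function name and the directory argument are made up for illustration and are not part of PostgreSQL.

```python
import os
import re

# WAL segment file names are 24 hex digits: 8 for the timeline ID,
# 16 for the position of the segment.
SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def unmarked_segments(xlog_dir):
    """Return WAL segments in xlog_dir that have neither a .ready nor a
    .done file in archive_status, i.e. that the archiver does not know
    about yet.

    Hypothetical helper for illustration only.
    """
    status_dir = os.path.join(xlog_dir, "archive_status")
    status = set(os.listdir(status_dir)) if os.path.isdir(status_dir) else set()

    result = []
    for name in sorted(os.listdir(xlog_dir)):
        if not SEGMENT_RE.match(name):
            continue  # skip .history files, archive_status itself, etc.
        if name + ".ready" in status or name + ".done" in status:
            continue  # already queued for archiving, or archived
        result.append(name)
    return result
```

Note that the segment currently being written also has no status file, so the output would need to be filtered by timeline ID (the first 8 digits) to pick out the old-timeline segments that were copied in by hand.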

The special treatment of the last partial segment still makes no sense.
If we want the segments from the old timeline to be archived after PITR,
we should archive them all immediately after end of recovery, not just
the partial one.

Now, the bigger question is whether we want the server after PITR to be
responsible for archiving the segments from the old timeline at all. If
we do, then we should remove the special treatment of the last, partial
segment, and create the .ready files for all the complete segments too.
And actually, I think we should *not* archive the partial segment. We
don't normally archive partial segments, and all the WAL required to
restore the server to the new timeline is copied to the file with the new
TLI. If the old timeline is still live, i.e. there's a server somewhere
still writing new WAL on the old timeline, the partial segment will
clash with a complete segment that the other server will archive later.
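
The clash follows from how segment files are named: the name encodes only the timeline ID and the position, not whether the file is complete. A small sketch of that, assuming the usual 24-hex-digit file names (the helper names and the example segment are made up):

```python
def parse_segment_name(name):
    """Split a WAL segment file name into (timeline ID, position).

    Assumes the usual 24-hex-digit name: 8 digits of timeline ID
    followed by 16 digits locating the segment.
    """
    if len(name) != 24:
        raise ValueError("not a WAL segment name: %r" % name)
    return int(name[:8], 16), name[8:].upper()


def names_at_switch(last_old_segment, new_tli):
    """Return (old_name, new_name) for the segment where the timeline
    switch happened.

    old_name is what the partial segment would be called in the archive;
    a server still running on the old timeline would later archive its
    complete segment under exactly the same name. new_name is the copy
    on the new timeline.
    """
    old_tli, position = parse_segment_name(last_old_segment)
    return "%08X%s" % (old_tli, position), "%08X%s" % (new_tli, position)
```

For example, names_at_switch("00000001000000000000002A", 2) gives "00000001000000000000002A" for the old timeline and "00000002000000000000002A" for the new one. Only the second name is unique to the promoted server.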

Yet another consideration is that we currently don't archive files
streamed from the master. If we think that the standby server is
responsible for archiving old segments after recovery, why is it not
responsible for archiving the streamed segments? It's because in most
cases, the master will archive the file, and we don't want two servers
to archive the same file, but there is actually no guarantee of that. It
might well be that the archiver runs a little bit behind in the master,
and after a crash the archive will miss some of the segments required.
That's not good either.

I'm not sure what to do here. The current behavior is inconsistent, and
there are some nasty gotchas that would be nice to fix. I think
someone needs to sit down and write a high-level design of how this all
should work.

- Heikki
