Re: pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_basebackup -x/X doesn't play well with archive_mode & wal_keep_segments
Date: 2014-12-05 07:18:02
Message-ID: CAHGQGwFNBywzAf1CxQmWyAL2ap-9WxK76XqtX+qHhpBPNJON_w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 5, 2014 at 9:28 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> We've recently observed a case where, after a promotion, a postgres
> server suddenly started to archive a large amount of old WAL.
>
> After some digging the problem is this:
>
> pg_basebackup -X creates files in pg_xlog/ without creating the
> corresponding .done file. Note that walreceiver *does* create them. The
> standby in this case, just like the master, had a significant
> wal_keep_segments. RemoveOldXlogFiles() then, during recovery restart
> points, calls XLogArchiveCheckDone() which in turn does:
>     /* Retry creation of the .ready file */
>     XLogArchiveNotify(xlog);
>     return false;
> if there's neither a .done nor a .ready file present and archive_mode is
> enabled. These segments then aren't removed because there's a .ready
> present and they're never archived as long as the node is a standby
> because we don't do archiving on standbys.
> Once the node is promoted archiver will be started and suddenly archive
> all these files - which might be months old.
>
> An additional, at first strange, nice detail is that a lot of the
> .ready files had nearly the same timestamps. Turns out that's due to
> wal_keep_segments. Initially RemoveOldXlogFiles() doesn't process the
> files because they're newer than allowed due to wal_keep_segments. Then
> every checkpoint a couple segments would be old enough to reach
> XLogArchiveCheckDone() which then'd create the .ready marker... But not
> all at once :)
>
>
> So I think we just need to make pg_basebackup create the .ready
> files.

s/.ready/.done? If yes, +1.

> Given that the walreceiver and restore_command already
> unconditionally do XLogArchiveForceDone() I think we'd follow the
> established precedent. Arguably it could make sense to archive files
> again on the standby after a promotion as they aren't guaranteed to have
> been on the then primary. But we don't have any infrastructure anyway
> for that and walsender doesn't do so, so it doesn't seem to make any
> sense to do that for pg_basebackup.
>
> Independent from this bug, there's also some debatable behaviour about
> what happens if a node with a high wal_keep_segments turns on
> archive_mode. Suddenly all those old files are archived... I think it
> might be a good idea to simply always create .done files when
> archive_mode is disabled while a wal segment is finished.

+1
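
A rough way to picture that second proposal, again as a toy model in C rather than real server code. Marker, marker_on_finish_current, marker_on_finish_proposed and archived_once_enabled are all hypothetical names, and the "current" behaviour is simplified to "no marker is written when archive_mode is off".

```c
#include <stdbool.h>

/* Hypothetical marker states for a finished WAL segment. */
typedef enum Marker
{
    MARKER_NONE,   /* no status file at all */
    MARKER_READY,  /* .ready: waiting to be archived */
    MARKER_DONE    /* .done: nothing left to archive */
} Marker;

/* Simplified current behaviour: a marker is written only when archiving is on. */
Marker
marker_on_finish_current(bool archive_mode)
{
    return archive_mode ? MARKER_READY : MARKER_NONE;
}

/* Proposed behaviour: always write .done when archive_mode is disabled. */
Marker
marker_on_finish_proposed(bool archive_mode)
{
    return archive_mode ? MARKER_READY : MARKER_DONE;
}

/*
 * Would a retained segment with this marker end up archived after
 * archive_mode is turned on later? In this model only .done prevents it,
 * since a segment without any marker eventually gets a .ready file.
 */
bool
archived_once_enabled(Marker m)
{
    return m != MARKER_DONE;
}
```

With the current behaviour, segments kept around by a high wal_keep_segments carry no marker and are all archived once archive_mode is switched on. With the proposed behaviour they carry .done and are skipped.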

Regards,

--
Fujii Masao
