Re: [9.3 bug] disk space in pg_xlog increases during archive recovery

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: MauMau <maumau307(at)gmail(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [9.3 bug] disk space in pg_xlog increases during archive recovery
Date: 2014-01-21 21:37:43
Message-ID: 52DEE8A7.9020404@vmware.com
Lists: pgsql-hackers

On 01/21/2014 07:31 PM, Fujii Masao wrote:
> On Fri, Dec 20, 2013 at 9:21 PM, MauMau <maumau307(at)gmail(dot)com> wrote:
>> From: "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>
>>
>>> ! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
>>>
>>> Even when standby_mode is not enabled, we can use cascade replication and
>>> it needs the accumulated WAL files. So I think that
>>> AllowCascadeReplication()
>>> should be added into this condition.
>>>
>>> ! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
>>> ! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
>>> !
>>> ! if (restoredFromArchive)
>>>
>>> Don't we need to check !StandbyModeRequested and
>>> !AllowCascadeReplication()
>>> here?
>>
>> Oh, you are correct. Okay, done.
>
> Thanks! The patch looks good to me. Attached is the updated version of
> the patch. I added the comments.

Sorry for reacting so slowly, but I'm not sure I like this patch. It's a
quite useful property that all the WAL files that are needed for
recovery are copied into pg_xlog, even when restoring from archive, even
when not doing cascading replication. It guarantees that you can restart
the standby, even if the connection to the archive is lost for some
reason. I intentionally changed the behavior for archive recovery too,
when it was introduced for cascading replication. Also, I think it's
good that the behavior does not depend on whether cascading replication
is enabled - it's a quite subtle difference.

So, IMHO this is not a bug, it's a feature.

To solve the original problem of running out of disk space in archive
recovery, I wonder if we should perform restartpoints more aggressively.
We intentionally don't trigger restartpoints by checkpoint_segments, only
checkpoint_timeout, but I wonder if there should be an option for that.
MauMau, did you try simply reducing checkpoint_timeout, while doing
recovery?
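
For illustration, reducing checkpoint_timeout on the server doing archive
recovery might look like the fragment below. The value is an assumption
chosen only as an example (30s is the lowest setting allowed, and the default
is 5min); it is not a tested recommendation.

```
# postgresql.conf on the server performing archive recovery
# Illustrative value only: try restartpoints more often than the 5min default.
checkpoint_timeout = 30s
```

Note that a restartpoint can only be performed at a checkpoint record found in
the WAL being replayed, so a lower timeout helps only to the extent that the
master wrote checkpoints often enough.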

- Heikki
