Re: Redesigning checkpoint_segments

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Redesigning checkpoint_segments
Date: 2013-06-05 18:35:32
Message-ID: 51AF84F4.4000504@vmware.com
Lists: pgsql-hackers

On 05.06.2013 21:16, Fujii Masao wrote:
> On Wed, Jun 5, 2013 at 9:16 PM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> I propose that we do something similar, but not exactly the same. Let's have
>> a setting, max_wal_size, to control the max. disk space reserved for WAL.
>> Once that's reached (or you get close enough, so that there are still some
>> segments left to consume while the checkpoint runs), a checkpoint is
>> triggered.
>
> What if max_wal_size is reached while the checkpoint is running? We should
> change the checkpoint from spread mode to fast mode?

The checkpoint spreading code already tracks if the checkpoint is "on
schedule", and it takes into account both checkpoint_timeout and
checkpoint_segments. I.e., if you consume segments faster than expected,
the checkpoint will speed up as well. Once checkpoint_segments is
reached, the checkpoint will complete ASAP, with no delays to spread it out.
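
A rough sketch of that "on schedule" test, written in Python for brevity rather than as the actual C code in the checkpointer. The function and argument names are made up for illustration; only checkpoint_timeout and checkpoint_segments are real settings, and the sketch ignores checkpoint_completion_target for simplicity.

```python
def is_on_schedule(progress, elapsed_secs, segs_consumed,
                   checkpoint_timeout, checkpoint_segments):
    """Return True if the checkpoint may keep sleeping between writes.

    progress is the fraction (0.0 to 1.0) of the checkpoint's work done
    so far. All names are illustrative, not PostgreSQL identifiers.
    """
    # Fraction of the time budget used since the checkpoint started.
    time_fraction = elapsed_secs / checkpoint_timeout
    # Fraction of the WAL budget used since the checkpoint started.
    wal_fraction = segs_consumed / checkpoint_segments
    # Being behind on either measure means: stop delaying, catch up.
    return progress >= time_fraction and progress >= wal_fraction
```

With checkpoint_segments = 3, a checkpoint that is half done after one segment is on schedule, but after two segments it is behind and stops sleeping. Once segs_consumed reaches checkpoint_segments, wal_fraction is 1.0, so an unfinished checkpoint can no longer be on schedule and runs flat out.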

This would still work the same with max_wal_size. A new checkpoint would
be started well before reaching max_wal_size, so that it has enough time
to complete. If the checkpoint "falls behind", it will hurry up until
it's back on schedule. If max_wal_size is reached anyway, it will
complete ASAP.
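
To illustrate "well before reaching max_wal_size", here is a minimal sketch in Python. How much headroom to leave is not decided anywhere in this thread, so headroom_fraction is purely a hypothetical knob, as is the function name; the 16 MB figure is the default WAL segment size.

```python
SEGMENT_MB = 16  # default WAL segment size


def checkpoint_trigger_segments(max_wal_size_mb, headroom_fraction=0.5):
    """Segments of WAL after which an xlog-based checkpoint would start.

    headroom_fraction is the share of max_wal_size left free for WAL
    written while the checkpoint itself is running (an assumption made
    for this sketch, not part of the proposal).
    """
    max_segments = max_wal_size_mb // SEGMENT_MB
    trigger = int(max_segments * (1.0 - headroom_fraction))
    # Always allow at least one segment between checkpoints.
    return max(trigger, 1)
```

For example, with max_wal_size at 1024 MB and half of it kept as headroom, a checkpoint would start after 32 segments, leaving another 32 to consume while it runs.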

> Or, if max_wal_size
> is hard limit, we should keep the allocation of new WAL file waiting until
> the checkpoint has finished and removed some old WAL files?

I was not thinking of making it a hard limit. It would be just like
checkpoint_segments from that point of view - if a checkpoint takes a
long time, max_wal_size might still be exceeded.

>> In this proposal, the number of segments preallocated is controlled
>> separately from max_wal_size, so that you can set max_wal_size high, without
>> actually consuming that much space in normal operation. It's just a
>> backstop, to avoid completely filling the disk, if there's a sudden burst of
>> activity. The number of segments preallocated is auto-tuned, based on the
>> number of segments used in previous checkpoint cycles.
>
> How is wal_keep_segments handled in your approach?

Hmm, haven't thought about that. I think a better unit to set
wal_keep_segments in would also be MB, not segments. Perhaps
max_wal_size should include WAL retained for wal_keep_segments, leaving
less room for checkpoints. I.e., when you set wal_keep_segments
higher, an xlog-based checkpoint would be triggered earlier, because the
old segments kept for replication would leave less room for new
segments. And setting wal_keep_segments higher than max_wal_size would
be an error.
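
As a minimal sketch of that accounting, in Python and assuming both settings are expressed in MB (the function name is hypothetical, and this is only one way the idea could be read):

```python
def room_for_new_wal_mb(max_wal_size_mb, wal_keep_mb):
    """WAL budget left for new segments once retained WAL is counted.

    wal_keep_mb stands for wal_keep_segments expressed in MB, as
    suggested above; the name is made up for this sketch.
    """
    if wal_keep_mb > max_wal_size_mb:
        # Retaining more WAL than the total budget makes no sense.
        raise ValueError("wal_keep_segments cannot exceed max_wal_size")
    return max_wal_size_mb - wal_keep_mb
```

So with max_wal_size at 1024 MB and 256 MB retained for replication, only 768 MB would be left for new segments, and the xlog-based checkpoint trigger would move earlier accordingly.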

- Heikki
