Re: Redesigning checkpoint_segments

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Kevin Grittner <kgrittn(at)ymail(dot)com>
Cc: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Redesigning checkpoint_segments
Date: 2013-06-06 13:21:33
Message-ID: 51B08CDD.8030900@vmware.com
Lists: pgsql-hackers

On 06.06.2013 15:31, Kevin Grittner wrote:
> Heikki Linnakangas<hlinnakangas(at)vmware(dot)com> wrote:
>> On 05.06.2013 22:18, Kevin Grittner wrote:
>>> Heikki Linnakangas<hlinnakangas(at)vmware(dot)com> wrote:
>>>
>>>> I was not thinking of making it a hard limit. It would be just
>>>> like checkpoint_segments from that point of view - if a
>>>> checkpoint takes a long time, max_wal_size might still be
>>>> exceeded.
>>>
>>> Then I suggest we not use exactly that name. I feel quite sure we
>>> would get complaints from people if something labeled as "max" was
>>> exceeded -- especially if they set that to the actual size of a
>>> filesystem dedicated to WAL files.
>>
>> You're probably right. Any suggestions for a better name?
>> wal_size_soft_limit?
>
> After reading later posts on the thread, I would be inclined to
> support making it a hard limit and adapting the behavior to match.

Well, that's a lot more difficult to implement. And even if we have a
hard limit, I think many people would still want to have a soft limit
that would trigger a checkpoint, but would not stop WAL writes from
happening. So what would we call that?

I'd love to see a hard limit too, but I see that as an orthogonal feature.

How about calling the (soft) limit "checkpoint_wal_size"? That goes well
together with checkpoint_timeout, meaning that a checkpoint will be
triggered if you're about to exceed the given size.
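
Just to make the triggering rule concrete, here's a rough sketch (the names
are made up for illustration, not actual backend code); the timeout-based
trigger would keep working as it does today, and this check would simply
come in addition to it:

    #include <stdbool.h>
    #include <stdint.h>

    /*
     * Sketch only: request a checkpoint once the WAL written since the
     * start of the current checkpoint cycle reaches the (soft) limit.
     */
    static bool
    wal_size_checkpoint_needed(uint64_t bytes_written_this_cycle,
                               uint64_t checkpoint_wal_size_bytes)
    {
        return bytes_written_this_cycle >= checkpoint_wal_size_bytes;
    }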

> I'm also concerned about the "spin up" from idle to high activity.
> Perhaps a "min" should also be present, to mitigate repeated short
> checkpoint cycles for "bursty" environments?

With my proposal, you wouldn't get repeated short checkpoint cycles with
bursts. The checkpoint interval would be controlled by checkpoint_timeout
and checkpoint_wal_size. If there is a lot of activity, checkpoints will
happen more frequently, as checkpoint_wal_size is reached sooner. But that
depends only on the activity in the current checkpoint cycle, not on
previous ones, so it makes no difference whether the load is continuously
high or bursty.
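
For example (numbers purely illustrative): with checkpoint_timeout = 5min
and checkpoint_wal_size = 1GB, an idle system would checkpoint every five
minutes, while a burst writing around 10GB of WAL in those five minutes
would trigger a checkpoint roughly every 30 seconds for as long as the
burst lasts, and the spacing falls straight back to five minutes once it
ends.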

The history would matter for the calculation of how many segments to
preallocate/recycle, however. Under the proposal, that would be
calculated separately from checkpoint_wal_size, and for that we'd use
some kind of a moving average of how many segments were used in previous
cycles. A min setting might be useful for that. We could also try to
make WAL file creation cheaper, e.g. by using posix_fallocate(), as was
proposed in another thread, and doing it in bgwriter or walwriter. That
would make it less important to get the estimate right, from a
performance point of view, although you'd still want to get it right to
avoid running out of disk space (having the segments preallocated
ensures that they are available when needed).
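
To sketch what I mean by the estimate (illustrative only; the names and the
0.25 weight are made up):

    #include <math.h>

    /* Smoothed number of WAL segments used per checkpoint cycle. */
    static double segments_avg = 0.0;

    /*
     * Sketch only: update the moving average at the end of each checkpoint
     * cycle and return how many segments to keep preallocated/recycled,
     * never going below the configured minimum.
     */
    static int
    update_segment_estimate(int segments_used_this_cycle, int min_segments)
    {
        const double weight = 0.25;    /* weight given to the latest cycle */

        segments_avg = (1.0 - weight) * segments_avg +
                       weight * segments_used_this_cycle;

        return (int) fmax(ceil(segments_avg), (double) min_segments);
    }

The preallocation itself could then happen ahead of time in bgwriter or
walwriter, reserving the full segment (16MB by default) with the standard
posix_fallocate(fd, 0, len) call instead of writing out zeros.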

- Heikki
