Re: Redesigning checkpoint_segments

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Venkata Balaji N <nag1010(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Redesigning checkpoint_segments
Date: 2015-02-03 21:18:34
Message-ID: 54D13B2A.6040808@agliodbs.com
Lists: pgsql-hackers

On 02/03/2015 07:50 AM, Robert Haas wrote:
> On Tue, Feb 3, 2015 at 10:44 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>> That's the whole point of this patch. "max_checkpoint_segments = 10", or
>> "max_checkpoint_segments = 160 MB", means that the system will begin a
>> checkpoint so that when the checkpoint completes, and it truncates away or
>> recycles old WAL, the total size of pg_xlog is 160 MB.
>>
>> That's different from our current checkpoint_segments setting. With
>> checkpoint_segments, the documented formula for calculating the disk usage
>> is (2 + checkpoint_completion_target) * checkpoint_segments * 16 MB. That's
>> a lot less intuitive to set.
>
> Hmm, that's different from what I was thinking. We probably shouldn't
> call that max_checkpoint_segments, then. I got confused and thought
> you were just trying to decouple the number of segments that it takes
> to trigger a checkpoint from the number we keep preallocated.

Wait, what? Because the new setting is an actual soft maximum, we
*shouldn't* call it a maximum? Or are you saying something else?
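
For concreteness, here's the arithmetic being compared, assuming the default
checkpoint_completion_target of 0.5 (a sketch of the documented formula, not
output from the patch):

    -- Today: documented pg_xlog size ceiling with checkpoint_segments = 10
    SELECT (2 + 0.5) * 10 * 16;  -- 400 (MB)
    -- With the patch, the setting itself is the soft ceiling:
    --   max_checkpoint_segments = '160MB'  => pg_xlog ~160 MB after checkpoint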

On 02/03/2015 04:25 AM, Heikki Linnakangas wrote:
> On 02/02/2015 04:21 PM, Andres Freund wrote:
>> I think we need to increase checkpoint_timeout too - that's actually
>> just as important for the default experience from my pov. 5 minutes
>> often just unnecessarily generates FPWs en masse.

I have yet to see any serious benchmarking on checkpoint_timeout. It
does seem that for some workloads on some machines a longer timeout is
better, but I've also seen workloads where a longer timeout decreases
throughput or increases I/O. So absent some hard numbers, I'd be opposed to
changing the default.
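
(If someone wants to gather those numbers: one crude sketch, using the
9.4-era function names, is to sample the WAL insert position before and
after a fixed-length pgbench run at each checkpoint_timeout value and diff
the two. The LSNs below are placeholders.)

    -- Run once before and once after the benchmark interval:
    SELECT pg_current_xlog_location();
    -- WAL bytes generated over the interval:
    SELECT pg_xlog_location_diff('1/A0000000', '0/F0000000');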

>>
>>> I'll open the bidding at 1600MB (aka 100).
>>
>> Fine with me.
>
> I wouldn't object to raising it a little bit, but that's way too high.
> It's entirely possible to have a small database that generates a lot of
> WAL. A table that has only a few rows, but is updated very very
> frequently, for example. And checkpointing such a database is quick too,
> so frequent checkpoints are not a problem. You don't want to end up with
> 1.5 GB of WAL on a 100 MB database.

I suggest 192MB instead (12 segments). That almost doubles our current
real default without requiring an amount of disk space that might surprise
some users.
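
The arithmetic, using the documented formula and today's defaults
(checkpoint_segments = 3, checkpoint_completion_target = 0.5):

    SELECT 12 * 16;              -- proposed ceiling: 192 (MB)
    SELECT (2 + 0.5) * 3 * 16;   -- current default ceiling: 120 (MB)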

In practice, checkpoint_segments is impossible to tune correctly in an
automated way. So let's be conservative.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
