Re: Redesigning checkpoint_segments

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Venkata Balaji N <nag1010(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Redesigning checkpoint_segments
Date: 2015-05-26 21:26:03
Message-ID: CAMkU=1xxnNJBh_w6SGU-nYszuLKkq3hPyMKk9fadZdtbU2=o9A@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 21, 2015 at 8:40 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:

> On Thu, May 21, 2015 at 3:53 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> > On Mon, Mar 16, 2015 at 11:05 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> >>
> >> On Mon, Feb 23, 2015 at 8:56 AM, Heikki Linnakangas
> >> <hlinnakangas(at)vmware(dot)com> wrote:
> >>>
> >>>
> >>> Everyone seems to be happy with the names and behaviour of the GUCs, so
> >>> committed.
> >>
> >>
> >>
> >> The docs suggest that max_wal_size will be respected during archive
> >> recovery (causing restartpoints and recycling), but I'm not seeing that
> >> happening. Is this a doc bug or an implementation bug?
> >
> >
> > I think the old behavior, where restartpoints were driven only by time and
> > not by volume, was a misfeature. But not a bug, because it was documented.
> >
> > One of the points of max_wal_size and its predecessor is to limit how big
> > pg_xlog can grow. But running out of disk space on pg_xlog is no more fun
> > during archive recovery than it is during normal operations. So why
> > shouldn't max_wal_size be active during recovery?
>
> The following message of commit 7181530 explains why.
>
> In standby mode, respect checkpoint_segments in addition to
> checkpoint_timeout to trigger restartpoints. We used to deliberately only
> do time-based restartpoints, because if checkpoint_segments is small we
> would spend time doing restartpoints more often than really necessary.
> But now that restartpoints are done in bgwriter, they're not as
> disruptive as they used to be. Secondly, because streaming replication
> stores the streamed WAL files in pg_xlog, we want to clean it up more
> often to avoid running out of disk space when checkpoint_timeout is large
> and checkpoint_segments small.
>
> Previously users were more likely to fall into this trouble (i.e., too
> frequent occurrence of restartpoints) because the default value of
> checkpoint_segments was very small, I guess. But we increased the default
> of max_wal_size, so now the risk of that trouble seems to be smaller than
> before, and maybe we can allow max_wal_size to trigger restartpoints.
>

I see. The old behavior was present for the same reason we decided to split
checkpoint_segments into max_wal_size and min_wal_size.

That is, the default checkpoint_segments was small, and it had to be small
because increasing it would cause more space to be used even when that
extra space was not helpful.
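
To put rough numbers on that, here is a minimal sketch. It assumes the
estimate given in the pre-9.5 documentation, (2 +
checkpoint_completion_target) * checkpoint_segments + 1 segment files, and the
default 16 MB segment size. The function name is made up for illustration and
is not anything in PostgreSQL.

```python
import math

# Assumption: WAL segments are the default 16 MB (not changed at build time).
WAL_SEGMENT_MB = 16


def estimated_max_pg_xlog_mb(checkpoint_segments, checkpoint_completion_target=0.5):
    """Rough upper bound on pg_xlog size under the old checkpoint_segments GUC.

    Based on the pre-9.5 documentation's estimate of
    (2 + checkpoint_completion_target) * checkpoint_segments + 1 segment files.
    This is an estimate of normal behaviour, not a hard limit.
    """
    files = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    # Round up to whole segment files before converting to megabytes.
    return math.ceil(files) * WAL_SEGMENT_MB


if __name__ == "__main__":
    for segments in (3, 32, 64):
        print(segments, estimated_max_pg_xlog_mb(segments), "MB")
```

With the old default of 3 this comes to about 144 MB, while a setting of 64
comes to roughly 2.5 GB, which is the kind of space cost that kept the default
small.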

So perhaps we can consider this change a completion of the max_wal_size
work, rather than a new feature?

Cheers,

Jeff
