Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Date: 2014-10-29 18:36:07
Message-ID: 54513397.8060509@fuzzy.cz
Lists: pgsql-hackers

On 29.10.2014 18:31, Robert Haas wrote:
> On Mon, Oct 27, 2014 at 8:01 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>> (3) write-heavy workloads / large template database
>>
>> Current approach wins, for two reasons: (a) for large databases the
>> WAL-logging overhead may generate much more I/O than a checkpoint,
>> and (b) it may generate so many WAL segments it eventually triggers
>> a checkpoint anyway (even repeatedly).
>
> I would tend not to worry too much about this case. I'm skeptical
> that there are a lot of people using large template databases. But
> if there are, or if some particular one of those people hits this
> problem, then they can raise checkpoint_segments to avoid it. The
> reverse problem, which you are encountering, cannot be fixed by
> adjusting settings.

That, however, solves "only" the checkpoint, not the doubled amount of
I/O from writing both the data files and the WAL, no? But maybe that's
OK.
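
Just to put rough numbers on that (a back-of-envelope sketch, assuming
a 10 GB template database and ignoring indexes, fsync costs etc.):

    current approach:     ~10 GB of data files + forced checkpoint
    WAL-logged approach:  ~10 GB of data files + ~10 GB of WAL
                          (every copied block is also written to WAL)

So for a large template the WAL-logged copy writes roughly twice the
data.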

Also, all of this is a concern only with 'wal_level != minimal', but
ISTM 'with wal_level=minimal it's fast' is a rather poor argument.
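
(For the record, by 'minimal' I mean the GUC value - any setup doing
WAL archiving or streaming replication has to run with one of the
higher levels, e.g.

    wal_level = hot_standby    # or 'archive', anything but 'minimal'

so there the copy gets WAL-logged no matter what.)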

>
> (This reminds me, yet again, that it would be really nice to have something
> smarter than checkpoint_segments. If there is little WAL activity
> between one checkpoint and the next, we should reduce the number of
> segments we're keeping around to free up disk space and ensure that
> we're recycling a file new enough that it's likely to still be in
> cache. Recycling files long-since evicted from cache is poor. But
> then we should also let the number of WAL files ratchet back up if the
> system again becomes busy. Isn't this more or less what Heikki's
> soft-WAL-limit patch did? Why did we reject that, again?)

What about simply reusing the files in a different way? Instead of
looping through the files in a round-robin manner, couldn't we just
reuse the most recently used file, instead of going all the way back to
the first one? This won't free the disk space, but IMHO that's not a
problem, because no one should be using that space anyway (relying on
it would be a risk once all the segments are in use again).
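
To make that concrete, here's a toy sketch (standalone C, not the
actual xlog.c logic - all names are made up): keep the recycled
segments on a LIFO stack, so the segment we reuse next is the one
recycled most recently (and thus most likely still in the page cache),
instead of cycling round-robin back to the oldest one:

    #include <stdio.h>

    #define NSEGMENTS 8

    static int freelist[NSEGMENTS];  /* recycled segment numbers */
    static int ntop = 0;             /* stack depth */

    /* push - the most recently recycled segment ends up on top */
    static void recycle_segment(int segno)
    {
        freelist[ntop++] = segno;
    }

    /* pop - reuse the "hottest" segment first */
    static int reuse_segment(void)
    {
        if (ntop == 0)
            return -1;      /* nothing recycled, create a new file */
        return freelist[--ntop];
    }

    int main(void)
    {
        /* segments 0..3 get recycled, 3 most recently */
        for (int i = 0; i < 4; i++)
            recycle_segment(i);

        /* FIFO would hand back 0 (long since evicted from cache);
         * LIFO hands back 3, then 2, ... */
        printf("reuse %d\n", reuse_segment());  /* prints: reuse 3 */
        printf("reuse %d\n", reuse_segment());  /* prints: reuse 2 */
        return 0;
    }

The real code would of course track segments by file name and fall back
to creating a new file when the stack is empty, but the ordering change
is the whole point.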

Tomas
