Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT

From: "Tomas Vondra" <tv(at)fuzzy(dot)cz>
To: "Heikki Linnakangas" <hlinnakangas(at)vmware(dot)com>
Cc: "Tomas Vondra" <tv(at)fuzzy(dot)cz>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Date: 2014-10-27 13:21:58
Message-ID: 429fe6fc1aa1d3a91804af032bbf1b5d.squirrel@2.emaily.eu
Lists: pgsql-hackers

On 27 October 2014, 10:47, Heikki Linnakangas wrote:
> On 10/26/2014 11:47 PM, Tomas Vondra wrote:
>> After eyeballing the code for an hour or two, I think CREATE DATABASE
>> should be fine with performing only a 'partial checkpoint' on the
>> template database - calling FlushDatabaseBuffers and processing unlink
>> requests, as suggested by the comment in createdb().
>
> Hmm. You could replace the first checkpoint with that, but I don't think
> that's enough for the second. To get any significant performance
> benefit, you need to get rid of both checkpoints, because doing two
> checkpoints one after another is almost as fast as doing a single
> checkpoint; the second checkpoint has very little work to do because the
> first checkpoint already flushed out everything.
>
> The second checkpoint, after copying but before commit, is done because
> (from the comments in createdb function):
>
>> * #1: When PITR is off, we don't XLOG the contents of newly created
>> * indexes; therefore the drop-and-recreate-whole-directory behavior
>> * of DBASE_CREATE replay would lose such indexes.
>>
>> * #2: Since we have to recopy the source database during DBASE_CREATE
>> * replay, we run the risk of copying changes in it that were
>> * committed after the original CREATE DATABASE command but before the
>> * system crash that led to the replay. This is at least unexpected
>> * and at worst could lead to inconsistencies, eg duplicate table
>> * names.
>
> Doing only FlushDatabaseBuffers would not prevent these issues - you
> need a full checkpoint. These issues are better explained here:
> http://www.postgresql.org/message-id/28884.1119727671@sss.pgh.pa.us

Thinking about this a bit more, do we really need a full checkpoint, that
is, a checkpoint of all the databases in the cluster? Why is checkpointing
only the source database not enough?

I mean, when we use database A as a template, why do we need to checkpoint
B, C, D and F too? (Apologies if this is somehow obvious, I'm way out of
my comfort zone in this part of the code.)
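
To make issue #2 from the quoted createdb() comment concrete, here is a toy
model of it. This is only a sketch and not PostgreSQL code: databases are
plain lists of table names, the WAL is a list of tuples, and the names
replay, redo_start and disk_at_crash are made up for illustration. It
assumes that the later change to the template had already reached disk
before the crash.

```python
import copy


def replay(wal, redo_start, disk):
    """Replay WAL records from index redo_start onto a copy of the disk state."""
    state = copy.deepcopy(disk)
    for record in wal[redo_start:]:
        kind = record[0]
        if kind == "DBASE_CREATE":
            _, src, dst = record
            # Replay drops the target and recopies the source as it exists
            # on disk now, not as it was when CREATE DATABASE first ran.
            state[dst] = list(state[src])
        elif kind == "CREATE_TABLE":
            _, db, table = record
            if table not in state[db]:
                state[db].append(table)
    return state


# Hypothetical on-disk state at crash time: table t2 was created in the
# template after CREATE DATABASE copied it, and was already written out.
disk_at_crash = {"template": ["t1", "t2"], "newdb": ["t1"]}
wal = [
    ("DBASE_CREATE", "template", "newdb"),
    ("CREATE_TABLE", "template", "t2"),
]

# No checkpoint after the copy: redo starts before DBASE_CREATE.
without_checkpoint = replay(wal, 0, disk_at_crash)
# Checkpoint after the copy: redo starts after DBASE_CREATE.
with_checkpoint = replay(wal, 1, disk_at_crash)

print(without_checkpoint["newdb"])
print(with_checkpoint["newdb"])
```

In this model, replaying the DBASE_CREATE record pulls t2 into newdb even
though it was committed after the original command, while starting redo
after that record leaves newdb as it was copied. The model says nothing
about which buffers the checkpoint has to flush, which is the part I am
asking about.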

regards
Tomas
