Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Date: 2014-10-27 22:57:27
Message-ID: 544ECDD7.1040500@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 10/27/2014 05:58 PM, Tomas Vondra wrote:
> On 27.10.2014 17:24, Heikki Linnakangas wrote:
>> On 10/27/2014 03:46 PM, Tom Lane wrote:
>>> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>>>> On 10/27/2014 03:21 PM, Tomas Vondra wrote:
>>>>> Thinking about this a bit more, do we really need a full checkpoint?
>>>>> That
>>>>> is a checkpoint of all the databases in the cluster? Why
>>>>> checkpointing the
>>>>> source database is not enough?
>>>> A full checkpoint ensures that you always begin recovery *after* the
>>>> DBASE_CREATE record. I.e. you never replay a DBASE_CREATE record during
>>>> crash recovery (except when you crash before the transaction commits, in
>>>> which case it doesn't matter if the new database's directory is borked).
>>> Yeah. After re-reading the 2005 thread, I wonder if we shouldn't just
>>> bite the bullet and redesign CREATE DATABASE as you suggest, ie, WAL-log
>>> all the copied files instead of doing a "cp -r"-equivalent directory
>>> copy.
>>> That would fix a number of existing replay hazards as well as making it
>>> safe to do what Tomas wants. In the small scale this would cause more
>>> I/O
>>> (2 copies of the template database's data) but in production situations
>>> we might well come out ahead by avoiding a forced checkpoint of the rest
>>> of the cluster. Also I guess we could skip WAL-logging if WAL archiving
>>> is off, similarly to the existing optimization for CREATE INDEX etc.
>> That would be a nasty surprise for anyone who's using CREATE DATABASE
>> as a fast way to clone a large database. But I would be OK with that,
>> at least if we can skip the WAL-logging with wal_level=minimal.
> That's true. Sadly, I can't think of a solution that would address both
> use cases at the same time :-(
>
> The only thing I can think of is having two CREATE DATABASE "flavors".
> One keeping the current approach (suitable for fast cloning) and one
> with the WAL logging (minimizing the CREATE DATABASE duration the impact
> on other backends).
>
> It will probably make the code significantly more complex, which is not
> exactly desirable, I guess. Also, if we keep the current code (even if
> only as a special case) it won't eliminate the existing replay hazards
> (which was one of the Tom's arguments for biting the bullet).
>
> I'm also thinking that for wal_level=archive and large databases, this
> won't really eliminate the checkpoint as it will likely generate enough
> WAL to hit checkpoint_segments and trigger a checkpoint anyway. No?
>
> That being said, our CREATE DATABASE docs currently say this
>
> Although it is possible to copy a database other than template1 by
> specifying its name as the template, this is not (yet) intended as
> a general-purpose "COPY DATABASE" facility. The principal
> limitation is that no other sessions can be connected to the
> template database while it is being copied. CREATE DATABASE will
> fail if any other connection exists when it starts; otherwise, new
> connections to the template database are locked out until CREATE
> DATABASE completes. See Section 21.3 for more information.
>
> I think that this limitation pretty much means no one should use CREATE
> DATABASE for cloning live databases in production environment (because
> of the locking).
>
> It also seems to me the "general-purpose COPY DATABASE" described in the
> docs is what we're describing in this thread.
>

Notwithstanding what the docs say, I have seen CREATE DATABASE used
plenty of times, and quite effectively, to clone databases. I don't
think making it do twice the IO in the general case is going to go down
well.

cheers

andrew

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-10-27 23:01:47 Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Previous Message Jeff Janes 2014-10-27 22:57:11 Re: pgcrypto: PGP signatures