Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal: CREATE DATABASE vs. (partial) CHECKPOINT
Date: 2014-10-27 21:58:31
Message-ID: 544EC007.40702@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 27.10.2014 17:24, Heikki Linnakangas wrote:
> On 10/27/2014 03:46 PM, Tom Lane wrote:
>> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>>> On 10/27/2014 03:21 PM, Tomas Vondra wrote:
>>>> Thinking about this a bit more, do we really need a full checkpoint?
>>>> That
>>>> is a checkpoint of all the databases in the cluster? Why
>>>> checkpointing the
>>>> source database is not enough?
>>
>>> A full checkpoint ensures that you always begin recovery *after* the
>>> DBASE_CREATE record. I.e. you never replay a DBASE_CREATE record during
>>> crash recovery (except when you crash before the transaction commits, in
>>> which case it doesn't matter if the new database's directory is borked).
>>
>> Yeah. After re-reading the 2005 thread, I wonder if we shouldn't just
>> bite the bullet and redesign CREATE DATABASE as you suggest, ie, WAL-log
>> all the copied files instead of doing a "cp -r"-equivalent directory
>> copy.
>> That would fix a number of existing replay hazards as well as making it
>> safe to do what Tomas wants. In the small scale this would cause more
>> I/O
>> (2 copies of the template database's data) but in production situations
>> we might well come out ahead by avoiding a forced checkpoint of the rest
>> of the cluster. Also I guess we could skip WAL-logging if WAL archiving
>> is off, similarly to the existing optimization for CREATE INDEX etc.
>
> That would be a nasty surprise for anyone who's using CREATE DATABASE
> as a fast way to clone a large database. But I would be OK with that,
> at least if we can skip the WAL-logging with wal_level=minimal.

That's true. Sadly, I can't think of a solution that would address both
use cases at the same time :-(

The only thing I can think of is having two CREATE DATABASE "flavors".
One keeping the current approach (suitable for fast cloning) and one
with the WAL logging (minimizing the CREATE DATABASE duration the impact
on other backends).

It will probably make the code significantly more complex, which is not
exactly desirable, I guess. Also, if we keep the current code (even if
only as a special case) it won't eliminate the existing replay hazards
(which was one of the Tom's arguments for biting the bullet).

I'm also thinking that for wal_level=archive and large databases, this
won't really eliminate the checkpoint as it will likely generate enough
WAL to hit checkpoint_segments and trigger a checkpoint anyway. No?

That being said, our CREATE DATABASE docs currently say this

Although it is possible to copy a database other than template1 by
specifying its name as the template, this is not (yet) intended as
a general-purpose "COPY DATABASE" facility. The principal
limitation is that no other sessions can be connected to the
template database while it is being copied. CREATE DATABASE will
fail if any other connection exists when it starts; otherwise, new
connections to the template database are locked out until CREATE
DATABASE completes. See Section 21.3 for more information.

I think that this limitation pretty much means no one should use CREATE
DATABASE for cloning live databases in production environment (because
of the locking).

It also seems to me the "general-purpose COPY DATABASE" described in the
docs is what we're describing in this thread.

regards
Tomas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Adam Brightwell 2014-10-27 21:59:54 Re: Directory/File Access Permissions for COPY and Generic File Access Functions
Previous Message Alvaro Herrera 2014-10-27 21:57:45 Re: jsonb generator functions