Re: [PATCH] pg_upgrade: support for btrfs copy-on-write clones

From: Oskari Saarenmaa <os(at)ohmu(dot)fi>
To: Larry Rosenman <ler(at)lerctr(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] pg_upgrade: support for btrfs copy-on-write clones
Date: 2013-10-04 19:42:46
Message-ID: 524F1A36.4070007@ohmu.fi
Lists: pgsql-hackers

On 03.10.2013 01:35, Larry Rosenman wrote:
> On 2013-10-02 17:32, Josh Berkus wrote:
>>> No fundamental reason; I'm hoping ZFS will be supported in addition to
>>> btrfs, but I don't have any systems with ZFS filesystems at the moment
>>> so I haven't been able to test it or find out the mechanisms ZFS uses
>>> for cloning. On btrfs cloning is implemented with a custom
>>> btrfs-specific ioctl, ZFS probably has something similar which would be
>>> pretty easy to add on top of this patch.
>>
>> Would you like a VM with ZFS on it? I'm pretty sure I can supply one.
>>
> I can also supply SSH access to a FreeBSD 10 system that is totally ZFS.

Thanks for the offers, but it looks like ZFS doesn't actually implement
a similar file-level clone operation. See
https://github.com/zfsonlinux/zfs/issues/405 for discussion of a feature
request for it.

ZFS does support cloning entire datasets, which seems similar to btrfs
subvolume snapshots, and a clone could be used to set up the new
$PGDATA. This would require the original $PGDATA to be a
dataset/subvolume of its own, and it needs quite different logic from
just adding another file copy method to pg_upgrade, because the new
version's $PGDATA has to be initialized as a snapshot/clone of the
original. It would work like this: clone the original $PGDATA
dataset/subvolume to a new location, move the cloned files out of the
way of the new PG installation, and run pg_upgrade in link mode against
the clone. I'm not sure if there's a good way to integrate this into
pg_upgrade, or if it's just something that could be documented as a fast
way to run pg_upgrade without touching the original files.

With btrfs tooling the sequence would be something like this:

btrfs subvolume snapshot /srv/pg92 /srv/pg93
mv /srv/pg93/data /srv/pg93/data92
initdb /srv/pg93/data
pg_upgrade --link \
--old-datadir=/srv/pg93/data92 \
--new-datadir=/srv/pg93/data
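
With ZFS the same steps would presumably map onto a dataset snapshot
followed by a clone. The following is an untested sketch of that idea,
not something the patch does: the dataset names (tank/pg92, tank/pg93),
the snapshot name and the mountpoint are made up for illustration, and
the helper function names are hypothetical. It only prints the commands
unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only: dataset names, snapshot name and mountpoint are assumptions.
# Prints the commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# zfs_clone_upgrade OLD_DATASET NEW_DATASET NEW_MOUNTPOINT
# Assumes the old cluster lives in the "data" directory of OLD_DATASET.
zfs_clone_upgrade() {
    old_ds="$1"
    new_ds="$2"
    new_mnt="$3"

    # A ZFS clone is always created from a snapshot, so take one first.
    run zfs snapshot "$old_ds@pg_upgrade"
    run zfs clone -o mountpoint="$new_mnt" "$old_ds@pg_upgrade" "$new_ds"

    # Same steps as in the btrfs sequence, run inside the clone.
    run mv "$new_mnt/data" "$new_mnt/data92"
    run initdb "$new_mnt/data"
    # pg_upgrade also needs the old and new binary directories
    # (--old-bindir/--new-bindir or PGBINOLD/PGBINNEW); left out here
    # as in the btrfs sequence.
    run pg_upgrade --link \
        --old-datadir="$new_mnt/data92" \
        --new-datadir="$new_mnt/data"
}
```

Calling "zfs_clone_upgrade tank/pg92 tank/pg93 /srv/pg93" prints the
five commands in order so they can be reviewed before running anything
against a real pool.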

/ Oskari
