Re: Better support of exported snapshots with pg_dump

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-09-22 00:24:07
Message-ID: CAB7nPqQkhgUai3PSUUqHxyS8oSgV=1VHppsxFukPAZmyKsCVVg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 4, 2014 at 11:33 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Thoughts?

I have been poking at this during the long flight back from Chicago
and created the attached patch, which makes pg_dump able to create a
replication slot (hence pg_dump can get its hands on a synchronized
snapshot describing the state of the data at slot creation), then take
a dump using the exported snapshot while keeping the replication
connection that created the slot alive for the duration of the dump.
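
For reference, the underlying mechanics look roughly like this (just
a sketch; the snapshot name and output values below are illustrative,
not actual output):

# On a replication connection (note replication=database):
$ psql "dbname=postgres replication=database" \
    -c "CREATE_REPLICATION_SLOT bar LOGICAL test_decoding;"
 slot_name | consistent_point | snapshot_name | output_plugin
-----------+------------------+---------------+---------------
 bar       | 0/16B1970        | 000003A1-1    | test_decoding
# Then, on the normal connection used for the dump:
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';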

Taking a dump consistent with a replication slot is useful first for
online upgrade cases, because you can simply run pg_dump, have a slot
created, and also get a state of the database consistent with the slot
creation before replaying changes in one way or another. Using that, a
decoder that generates raw queries, and a receiver able to apply
changes on a remote Postgres server, it is possible to get a kind of
live migration solution from one Postgres instance to another for a
single database, as long as the origin server runs 9.4. Having the
receiver report write and flush positions also makes it possible for
the origin server to use the synchronous replication protocol to be
sure that changes have been applied on the remote side before
performing a switchover from the origin to the remote (that may
actually explain why multi-syncrep would be useful here for multiple
databases). Also, I imagine that users could even use this tool in
pg_dump to do some post-processing on the dumped data, in accordance
with the decoder plugin, before applying changes to a remote server.
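
To sketch out the migration workflow (my_sql_decoder here is a
hypothetical plugin that emits executable SQL, which test_decoding
does not do; it only emits debug text):

# 1. Dump the origin database, creating the slot at the same time:
$ pg_dump --slot bar --plugin-name my_sql_decoder -f dump.sql origin_db
# 2. Restore the dump on the target server:
$ psql -d target_db -f dump.sql
# 3. Replay the changes accumulated since slot creation:
$ pg_recvlogical -d origin_db --slot bar --start -f - | psql -d target_db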

Now, this is done with the addition of two options in pg_dump to
control the logical slot creation:
- --slot, to define the name of the slot being created
- --plugin-name, to define the name of the decoder plugin
You can then of course do things like this:
# Raw data dump on a slot
$ pg_dump --slot bar --plugin-name test_decoding
# Existing parallel dump not changed:
$ pg_dump -j 4 -f data -F d
# Parallel dump on a slot
$ pg_dump -j 4 --slot bar --plugin-name test_decoding -f data -F d

This patch does not solve the existing problems related to relation
locking, i.e. the window between the LOCK taken on tables and the
moment a snapshot is exported (that's actually a different problem),
but similarly to parallel pg_dump it reduces the window of exposure to
schema changes to a minimum. This required adding some logic to make
pg_dump aware of replication connections. Parallel dumps are supported
as well, the trick being to make sure that the existing parallel dump
facility still uses the snapshot from the main database connection,
and not the replication connection, while allowing parallel dumps to
use the snapshot from the newly created slot.
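
Concretely, each worker connection adopts the exported snapshot the
same way parallel pg_dump already synchronizes its workers (a sketch;
the snapshot name is illustrative):

BEGIN;
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '000003A1-1';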

The first patch attached is the feature itself. The second patch,
which can be applied on top of the first one, outputs some useful logs
to track the snapshot creation depending on the code paths taken. I
used it for debugging purposes only and am just posting it here for
reference. I'll add this to the next commit fest (the patch contains
docs as well).

Regards,
--
Michael

Attachment Content-Type Size
0001-pg_dump_repslot_core.patch text/x-patch 16.5 KB
0002-pg_dump_repslot_debug.patch text/x-patch 1.2 KB
