Re: Better support of exported snapshots with pg_dump

Lists: pgsql-hackers
From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Better support of exported snapshots with pg_dump
Date: 2014-09-01 08:00:32
Message-ID: CAB7nPqTyCtpg_rkMVMMwUKn0f31dy2LCALwBn2=E-jLvoOumJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

Currently pg_dump does not allow a user to specify an exported snapshot
name that he would like to use for a dump using SET TRANSACTION SNAPSHOT
(now pg_export_snapshot is only used for parallel pg_dump within it). I
imagine that this would be handy to take a consistent dump of a given
database after creating a logical replication slot on it. Thoughts?
Regards,
--
Michael


From: Bernd Helmle <mailings(at)oopsware(dot)de>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-09-01 08:25:58
Message-ID: 1BE0F00AB388000B75A8E7D7@eje.credativ.lan
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

--On 1. September 2014 17:00:32 +0900 Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:

> Currently pg_dump does not allow a user to specify an exported snapshot
> name that he would like to use for a dump using SET TRANSACTION SNAPSHOT
> (now pg_export_snapshot is only used for parallel pg_dump within it). I
> imagine that this would be handy to take a consistent dump of a given
> database after creating a logical replication slot on it. Thoughts?

There was a discussion of this kind of feature some time ago here:

<http://www.postgresql.org/message-id/CA+U5nMK9+TTCff_-4MfdxWHnASTAuHuq7u7uedD57vaY28AsQA@mail.gmail.com>

Not sure if all the arguments holds still true with the appearance of MVCC
catalog scans.

--
Thanks

Bernd


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Bernd Helmle <mailings(at)oopsware(dot)de>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-09-01 09:30:52
Message-ID: 20140901093052.GB7400@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 2014-09-01 10:25:58 +0200, Bernd Helmle wrote:
> --On 1. September 2014 17:00:32 +0900 Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>
> >Currently pg_dump does not allow a user to specify an exported snapshot
> >name that he would like to use for a dump using SET TRANSACTION SNAPSHOT
> >(now pg_export_snapshot is only used for parallel pg_dump within it). I
> >imagine that this would be handy to take a consistent dump of a given
> >database after creating a logical replication slot on it. Thoughts?

Yes, I always wanted that option.

> There was a discussion of this kind of feature some time ago here:
>
> <http://www.postgresql.org/message-id/CA+U5nMK9+TTCff_-4MfdxWHnASTAuHuq7u7uedD57vaY28AsQA@mail.gmail.com>

I was never convinced of the reasoning in that thread. Possibly things
have changed enough now that logical decoding is in core...

> Not sure if all the arguments holds still true with the appearance of MVCC
> catalog scans.

I don't think they change anything here. The problem is the, pretty
fundamental, problem that you need to know a relation exists before
executing a LOCK ...; on it. During that time somebody can change the
schema.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-09-01 12:54:24
Message-ID: CAB7nPqQ=yJu3UkgeEaLnZmd9diBBXCFQCDNHMESRDD=UMNEi8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 1, 2014 at 6:30 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On 2014-09-01 10:25:58 +0200, Bernd Helmle wrote:
>> There was a discussion of this kind of feature some time ago here:
>>
>> <http://www.postgresql.org/message-id/CA+U5nMK9+TTCff_-4MfdxWHnASTAuHuq7u7uedD57vaY28AsQA@mail.gmail.com>
Thanks. It is not surprising to see similar threads.

> I was never convinced of the reasoning in that thread. Possibly things
> have changed enough now that logical decoding is in core...

Well, the test case I got in mind is only for taking a dump using the
latest state of a replication slot and not the snapshot export itself.
So what about the following: we let the user specify a slot name with
pg_dump, and take a dump using the latest snapshot that this
replication slot has reported to a user. We could track the name of
the latest snapshot reported to user by adding a new field in
MyReplicationSlot, field updated in walsender.c when calling
SnapBuildExportSnapshot. Then we could expose that in
pg_replication_slots or with a separate SQL function that pg_dump
could use. That's just a rough idea, but something like that would
greatly help users writing online upgrade scripts.

>> Not sure if all the arguments holds still true with the appearance of MVCC
>> catalog scans.
>
> I don't think they change anything here. The problem is the, pretty
> fundamental, problem that you need to know a relation exists before
> executing a LOCK ...; on it. During that time somebody can change the
> schema.

Doesn't this window exist as well with parallel pg_dump? Looking at
the code snapshot export is taken before any locks on tables are
taken. This window is smaller, but still...
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-09-01 12:58:06
Message-ID: 20140901125806.GD20990@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-09-01 21:54:24 +0900, Michael Paquier wrote:
> On Mon, Sep 1, 2014 at 6:30 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > I was never convinced of the reasoning in that thread. Possibly things
> > have changed enough now that logical decoding is in core...
>
> Well, the test case I got in mind is only for taking a dump using the
> latest state of a replication slot and not the snapshot export itself.

I don't think what you're proposing is really possible. Could you
describe it in a bit more detail?

> So what about the following: we let the user specify a slot name with
> pg_dump, and take a dump using the latest snapshot that this
> replication slot has reported to a user.

There exists no snapshot sufficient for user data after slot creation.

> > I don't think they change anything here. The problem is the, pretty
> > fundamental, problem that you need to know a relation exists before
> > executing a LOCK ...; on it. During that time somebody can change the
> > schema.
>
> Doesn't this window exist as well with parallel pg_dump?

Yes. I didn't say those reasons were convincing. The window is quite a
bit smaller though. With the exported snapshot from CREATE REPLICATION
SLOT it could convinceably be hours.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bernd Helmle <mailings(at)oopsware(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-09-03 14:57:05
Message-ID: CA+TgmoYeYgb5DWwXKFUSR2P9+jqWA1SV496ZTTKs4DaXoKomYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 1, 2014 at 5:30 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> Hi,
>
>> >Currently pg_dump does not allow a user to specify an exported snapshot
>> >name that he would like to use for a dump using SET TRANSACTION SNAPSHOT
>> >(now pg_export_snapshot is only used for parallel pg_dump within it). I
>> >imagine that this would be handy to take a consistent dump of a given
>> >database after creating a logical replication slot on it. Thoughts?
>
> Yes, I always wanted that option.

I didn't find that option to be terribly important then, but I don't
see how we can possibly get by without it now, unless our goal is to
make logical decoding as hard to use as we possibly can.

Tom's got a good point about the order of locking vs. snapshot taking,
but I think the way to address that is by adding some capability to
temporarily lock out all DDL on non-temporary objects across the
entire system, rather than by trying to make pg_dump (or the walsender
creating the replication slot) lock every table. Even if we could get
that to work, it still leaves the very-much-related problem that dumps
of databases containing many tables can easily exhaust the lock table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-09-04 14:33:39
Message-ID: CAB7nPqTg_o+VRYywPEahQFKgUWxogC+vitcOpoEM8UQ1KXY5OQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 3, 2014 at 11:57 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I didn't find that option to be terribly important then, but I don't
> see how we can possibly get by without it now, unless our goal is to
> make logical decoding as hard to use as we possibly can.

Yes. With 9.4 it is possible to take a consistent database snapshot
when creating a slot but it is tricky because of how ephemeral
exported snapshots are:
- When using CREATE_REPLICATION_SLOT, an exported snapshot lives only
for the time replication connection is done.
- pg_export_snapshot result only lives for the duration of the
transaction where function is called
- pg_create_logical_replication_slot cannot export a snapshot
So now (if I am correct), the only way to get a consistent dump from
database is to maintain open a replication connection after opening a
replication slot on it. Still it is really application-dependent,
assuming as well that schema is not modified as mentioned in this
thread. Any ways to facilitate the user experience on this side would
be a great step for things like online upgrades. Perhaps we could get
pg_dump or a wrapper on top of pg_dump creating a logical replication
slot, then taking a consistent image of the database it is based on
while replication connection is open.

> Tom's got a good point about the order of locking vs. snapshot taking,
> but I think the way to address that is by adding some capability to
> temporarily lock out all DDL on non-temporary objects across the
> entire system, rather than by trying to make pg_dump (or the walsender
> creating the replication slot) lock every table. Even if we could get
> that to work, it still leaves the very-much-related problem that dumps
> of databases containing many tables can easily exhaust the lock table.

Yes this is an idea to dig. Having system-wide DDL locking is
something that has been discussed at some point in XC development for
the addition of new nodes (needed to ensure that schema was consistent
during migration of data) if I recall correctly. Now looking quickly
at the XC code git-grepping is showing a method based on
pg_try_advisory_lock_shared and a global boolean variable set in
PostgresMain, coupled with a check in ProcessUtility preventing a
certain category of DDL from running if a lock is taken. The good
point is that there is already some work done to detect what are the
utility statements that could be allowed even if lock is hold
(EXECUTE, VACUUM, CLUSTER, etc.).
Now, wouldn't a variable in shared memory controlled by some system
function a better option? There are as well some utility code paths
that we wouldn't want to block so we would end up with a switch on all
the DDL Stmt nodes or a large portion of them. Thoughts?
Regards,
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-09-22 00:24:07
Message-ID: CAB7nPqQkhgUai3PSUUqHxyS8oSgV=1VHppsxFukPAZmyKsCVVg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 4, 2014 at 11:33 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> Thoughts?

I have been poking at that during the long flight back from Chicago
and created the attached patch that makes pg_dump able to create a
replication slot (hence have pg_dump put its hands on a synchronized
snapshot describing data at the state of slot creation), then take a
dump using the exported snapshot while maintaining the replication
connection for slot creation alive for the duration of the dump.

Taking a dump consistent with a replication slot is useful for online
upgrade cases first, because you can simply run pg_dump, have a slot
created, and get as well a state of the database consistent with the
slot creation before replaying changes in a way or another. Using
that, a decoder that generates raw queries, and a receiver able to
apply changes on a remote Postgres server, it is possible to get a
kind of live migration solution from a Postgres instance to another
for a single database, as long as the origin server uses 9.4. Making
the receiver report write and flush positions makes also possible the
origin server to use synchronous replication protocol to be sure that
changes got applied on remote before performing a switch from the
origin to the remote (that may actually explain why multi-syncrep
would be useful here for multiple databases). Also, I imagine that
users could even use this tool in pg_dump for example to do some post
processing on the data dumped in accordance to the decoder plugin
before applying changes to a remote source.

Now, this is done with the addition of two options in pg_dump to
control the logical slot creation:
- --slot to define the name of the slot being created
- --plugin-name, to define the name of the decoder plugin
And then you can of course do things like that:
# Raw data dump on a slot
$ pg_dump --slot bar --plugin-name test_decoding
# Existing parallel dump not changed:
$ pg_dump -j 4 -f data -F d
# Parallel dump on a slot
$ pg_dump -j 4 --slot bar --plugin-name test_decoding -f data -F d

This patch does not solve the existing problems related to relation
locking between LOCK taken on tables and the moment a snapshot is
exported (actually that's a different problem), but similarly to
parallel pg_dump it reduces the exposition window to schema changes to
a minimum. This has needed the addition of some logic to make pg_dump
aware of replication connection. Parallel dumps are supported as well,
the trick being to be sure that the existing parallel dump facility is
still using the snapshots from the main db connection, and not the
replication connection, while parallel dumps are possible using the
snapshot from the slot created.

The first patch attached is the feature itself. The second patch, that
can be applied on top the first one, outputs some useful logs to track
the snapshot creation depending on the code paths taken. I used that
for debugging purposes only, just posting it here for reference. I'll
add that to the next commit fest (patch contains docs as well).

Regards,
--
Michael

Attachment Content-Type Size
0001-pg_dump_repslot_core.patch text/x-patch 16.5 KB
0002-pg_dump_repslot_debug.patch text/x-patch 1.2 KB

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-14 14:55:14
Message-ID: 543D3952.5040905@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22/09/14 02:24, Michael Paquier wrote:
> On Thu, Sep 4, 2014 at 11:33 PM, Michael Paquier
>
> Taking a dump consistent with a replication slot is useful for online
> upgrade cases first, because you can simply run pg_dump, have a slot
> created, and get as well a state of the database consistent with the
> slot creation before replaying changes in a way or another. Using
> that, a decoder that generates raw queries, and a receiver able to
> apply changes on a remote Postgres server, it is possible to get a
> kind of live migration solution from a Postgres instance to another
> for a single database, as long as the origin server uses 9.4. Making
> the receiver report write and flush positions makes also possible the
> origin server to use synchronous replication protocol to be sure that
> changes got applied on remote before performing a switch from the
> origin to the remote (that may actually explain why multi-syncrep
> would be useful here for multiple databases). Also, I imagine that
> users could even use this tool in pg_dump for example to do some post
> processing on the data dumped in accordance to the decoder plugin
> before applying changes to a remote source.
>
> Now, this is done with the addition of two options in pg_dump to
> control the logical slot creation:
> - --slot to define the name of the slot being created
> - --plugin-name, to define the name of the decoder plugin
> And then you can of course do things like that:
> # Raw data dump on a slot
> $ pg_dump --slot bar --plugin-name test_decoding
> # Existing parallel dump not changed:
> $ pg_dump -j 4 -f data -F d
> # Parallel dump on a slot
> $ pg_dump -j 4 --slot bar --plugin-name test_decoding -f data -F d
>

Wouldn't it be better to have the slot handling done outside of pg_dump
by whatever replication solution you use and just have pg_dump accept
the snapshot as input parameter? I am not sure how much I like pg_dump
creating the slot. I am aware that you need to have the replication
connection open but that's IMHO just matter of scripting it together.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-14 22:09:10
Message-ID: CAB7nPqRJfThS902YsjJnL07u9a0GhbYJ5X+KkXuUsTrU_No7xg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 14, 2014 at 11:55 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> On 22/09/14 02:24, Michael Paquier wrote:
>>
>> On Thu, Sep 4, 2014 at 11:33 PM, Michael Paquier
>>
>> Taking a dump consistent with a replication slot is useful for online
>> upgrade cases first, because you can simply run pg_dump, have a slot
>> created, and get as well a state of the database consistent with the
>> slot creation before replaying changes in a way or another. Using
>> that, a decoder that generates raw queries, and a receiver able to
>> apply changes on a remote Postgres server, it is possible to get a
>> kind of live migration solution from a Postgres instance to another
>> for a single database, as long as the origin server uses 9.4. Making
>> the receiver report write and flush positions makes also possible the
>> origin server to use synchronous replication protocol to be sure that
>> changes got applied on remote before performing a switch from the
>> origin to the remote (that may actually explain why multi-syncrep
>> would be useful here for multiple databases). Also, I imagine that
>> users could even use this tool in pg_dump for example to do some post
>> processing on the data dumped in accordance to the decoder plugin
>> before applying changes to a remote source.
>>
>> Now, this is done with the addition of two options in pg_dump to
>> control the logical slot creation:
>> - --slot to define the name of the slot being created
>> - --plugin-name, to define the name of the decoder plugin
>> And then you can of course do things like that:
>> # Raw data dump on a slot
>> $ pg_dump --slot bar --plugin-name test_decoding
>> # Existing parallel dump not changed:
>> $ pg_dump -j 4 -f data -F d
>> # Parallel dump on a slot
>> $ pg_dump -j 4 --slot bar --plugin-name test_decoding -f data -F d
>>
>
> Wouldn't it be better to have the slot handling done outside of pg_dump by
> whatever replication solution you use and just have pg_dump accept the
> snapshot as input parameter? I am not sure how much I like pg_dump creating
> the slot. I am aware that you need to have the replication connection open
> but that's IMHO just matter of scripting it together.
The whole point of the operation is to reduce the amount of time the
external snapshot is taken to reduce the risk of race conditions that
may be induced by schema changes. See for example discussions related
to why we do not authorize specifying a snapshot name as an option of
pg_dump.
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-15 05:06:33
Message-ID: 20141015050633.GB7242@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-15 07:09:10 +0900, Michael Paquier wrote:
> > whatever replication solution you use and just have pg_dump accept the
> > snapshot as input parameter? I am not sure how much I like pg_dump creating
> > the slot. I am aware that you need to have the replication connection open
> > but that's IMHO just matter of scripting it together.
> The whole point of the operation is to reduce the amount of time the
> external snapshot is taken to reduce the risk of race conditions that
> may be induced by schema changes. See for example discussions related
> to why we do not authorize specifying a snapshot name as an option of
> pg_dump.

I think that's completely the wrong way to go at this. The time it takes
to create a replication slot under write load is far larger than the
time it takes to start pg_dump and load. This really doesn't add any
actual safety. Also, the inability to use the snapshot outside of
pg_dump restricts the feature far too much imo.

I personally think we should just disregard the race here and add a
snapshot parameter. The race is already there and not exactly
small. Let's not kid ourselves that hiding it solves anything.

But if that's not the way to go, we need to think about a way of how to
prevent "problematic" DDL that's not racy.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-15 05:28:16
Message-ID: CAB7nPqQj_kk_UoyxGgN-3mkZ8FsiRPrYhaBv9ESeyZYa-T4Utg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 15, 2014 at 2:06 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:

> I think that's completely the wrong way to go at this. The time it takes
> to create a replication slot under write load is far larger than the
> time it takes to start pg_dump and load. This really doesn't add any
> actual safety. Also, the inability to use the snapshot outside of
> pg_dump restricts the feature far too much imo.
>
> I personally think we should just disregard the race here and add a
> snapshot parameter. The race is already there and not exactly
> small. Let's not kid ourselves that hiding it solves anything.
>
> But if that's not the way to go, we need to think about a way of how to
> prevent "problematic" DDL that's not racy.
>

Well, I would be perfectly happy to be able to specify a snapshot for
pg_dump, now the reason why this approach is used is to be able to isolate
the DDL conflicts into pg_dump itself without relying on any external
mechanism, be it an extra client controlling lock on the objects being
dumped, or a system-wide lock preventing any DDL command (LOCK SYSTEM kind
of thing). This seems more user-friendly. But well I agree that we could do
a larger set of things that could be used for even other purposes:
- Ability to define snapshot name with pg_dump
- Take system or database-wide lock
- Extra client application running the whole
Now is this set of features worth doing knowing that export snapshot has
been designed for multi-threaded closed applications? Not much sure.
Regards,
--
Michael


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-15 05:46:34
Message-ID: 20141015054634.GD7242@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-10-15 14:28:16 +0900, Michael Paquier wrote:
> On Wed, Oct 15, 2014 at 2:06 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
>
> > I think that's completely the wrong way to go at this. The time it takes
> > to create a replication slot under write load is far larger than the
> > time it takes to start pg_dump and load. This really doesn't add any
> > actual safety. Also, the inability to use the snapshot outside of
> > pg_dump restricts the feature far too much imo.
> >
> > I personally think we should just disregard the race here and add a
> > snapshot parameter. The race is already there and not exactly
> > small. Let's not kid ourselves that hiding it solves anything.
> >
> > But if that's not the way to go, we need to think about a way of how to
> > prevent "problematic" DDL that's not racy.
> >
>
> Well, I would be perfectly happy to be able to specify a snapshot for
> pg_dump, now the reason why this approach is used is to be able to isolate
> the DDL conflicts into pg_dump itself without relying on any external
> mechanism, be it an extra client controlling lock on the objects being
> dumped, or a system-wide lock preventing any DDL command (LOCK SYSTEM kind
> of thing).

There's no 'isolation' here imo. pg_dump *does not* detect these
cases. I've posted a couple of examples of that in some earlier thread
about this.

> This seems more user-friendly. But well I agree that we could do
> a larger set of things that could be used for even other purposes:
> - Ability to define snapshot name with pg_dump
> - Take system or database-wide lock
> - Extra client application running the whole
> Now is this set of features worth doing knowing that export snapshot has
> been designed for multi-threaded closed applications? Not much sure.
> Regards,

What do you mean with "designed for multi-threaded closed applications"?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-15 06:06:46
Message-ID: CAB7nPqSUbT59OEX-9C0VsaKCS+t8dAx6ZJ5qgo4EF0Y-Lk8v-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 15, 2014 at 2:46 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:

> > This seems more user-friendly. But well I agree that we could do
> > a larger set of things that could be used for even other purposes:
> > - Ability to define snapshot name with pg_dump
> > - Take system or database-wide lock
> > - Extra client application running the whole
> > Now is this set of features worth doing knowing that export snapshot has
> > been designed for multi-threaded closed applications? Not much sure.
> > Regards,
>
> What do you mean with "designed for multi-threaded closed applications"?
>
External snapshots creation and control should be localized within
dedicated client applications only. At least that's what I understand from
it as that's how it is used now.
--
Michael


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-16 15:19:14
Message-ID: CA+TgmoYaKxL=UNU4rk2fSU8qSqMSj3ELtqqFj=Y4bL_gAAYhNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 15, 2014 at 1:06 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> I personally think we should just disregard the race here and add a
> snapshot parameter. The race is already there and not exactly
> small. Let's not kid ourselves that hiding it solves anything.

I, too, favor that plan.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-17 04:25:44
Message-ID: CAB7nPqQ6Qo1vbeYQhyfFcCyDybc62qdDsxvXB7oZzHKCCG98=A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 17, 2014 at 12:19 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Oct 15, 2014 at 1:06 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
> wrote:
> > I personally think we should just disregard the race here and add a
> > snapshot parameter. The race is already there and not exactly
> > small. Let's not kid ourselves that hiding it solves anything.
> I, too, favor that plan.
>
Two votes in favor of that from two committers sounds like a deal. Here is
an refreshed version of the patch introducing --snapshot from here, after
fixing a couple of things and adding documentation:
http://www.postgresql.org/message-id/CA+U5nMK9+TTCff_-4MfdxWHnASTAuHuq7u7uedD57vaY28AsQA@mail.gmail.com

When the snapshot specified by user is not a valid one, here is the error
returned by pg_dump:
$ pg_dump --snapshot 'ppo'
pg_dump: [archiver (db)] query failed: ERROR: invalid snapshot identifier:
"ppo"
pg_dump: [archiver (db)] query was: SET TRANSACTION SNAPSHOT 'ppo'
I thinks that's fine, and it makes the code lighter to rely on the existing
error machinery.

Regards,
--
Michael

Attachment Content-Type Size
20141017_pg_dump_snapshot.patch text/x-patch 6.9 KB

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Petr Jelinek <petr(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-27 10:37:34
Message-ID: 544E206E.3010302@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17/10/14 06:25, Michael Paquier wrote:
> Two votes in favor of that from two committers sounds like a deal. Here
> is an refreshed version of the patch introducing --snapshot from here,
> after fixing a couple of things and adding documentation:
> http://www.postgresql.org/message-id/CA+U5nMK9+TTCff_-4MfdxWHnASTAuHuq7u7uedD57vaY28AsQA@mail.gmail.com
>

Hi,

I have two minor things:

+ printf(_(" --snapshot use given synchronous
snapshot for the dump\n"));

I think this should say --snapshot=NAME or something like that to make
it visible that you are supposed to provide the parameter.

The other thing is, you changed logic of the parallel dump behavior a
little - before your patch it works in a way that one connection exports
snapshot and others do "SET TRANSACTION SNAPSHOT", after your patch,
even the connection that exported the snapshot does the "SET TRANSACTION
SNAPSHOT" which is unnecessary. I don't see it as too big deal but I
don't see the need to change that behavior either.

You could do something like this instead maybe?:
if (AH->sync_snapshot_id)
SET TRANSACTION SNAPSHOT
else if (AH->numWorkers > 1 && AH->remoteVersion >= 90200 &&
!dopt->no_synchronized_snapshots)
AH->sync_snapshot_id = get_synchronized_snapshot(AH);

And remove the else if for the if (dumpsnapshot) part.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-28 02:15:43
Message-ID: CAB7nPqSjOEieh4hBTJ_QeUtdGDqnnEGXVEiVJ_xi6k95qV2xig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 27, 2014 at 7:37 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> I have two minor things:
> + printf(_(" --snapshot use given synchronous
> snapshot for the dump\n"));
Thanks for your input!

> I think this should say --snapshot=NAME or something like that to make it
> visible that you are supposed to provide the parameter.
Right. Let's do --snapshot=SNAPSHOT then.

> The other thing is, you changed logic of the parallel dump behavior a little
> - before your patch it works in a way that one connection exports snapshot
> and others do "SET TRANSACTION SNAPSHOT", after your patch, even the
> connection that exported the snapshot does the "SET TRANSACTION SNAPSHOT"
> which is unnecessary. I don't see it as too big deal but I don't see the
> need to change that behavior either.
Indeed, let's save some resources and not have the connection
exporting the snapshot use SET TRANSACTION SNAPSHOT if that's not
necessary.

> You could do something like this instead maybe?:
> if (AH->sync_snapshot_id)
> SET TRANSACTION SNAPSHOT
> else if (AH->numWorkers > 1 && AH->remoteVersion >= 90200 &&
> !dopt->no_synchronized_snapshots)
> AH->sync_snapshot_id = get_synchronized_snapshot(AH);
> And remove the else if for the if (dumpsnapshot) part.
Right as well. We still need to check for dumpsnapshot to set up
sync_snapshot_id for the first connection such as it can pass the
value to the other workers though.

Updated patch with those comments addressed is attached.
Regards,
--
Michael

Attachment Content-Type Size
20141028_pg_dump_snapshot_v2.patch text/x-patch 6.9 KB

From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-28 11:08:20
Message-ID: 544F7924.7040502@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

On 28/10/14 03:15, Michael Paquier wrote:
>
> Updated patch with those comments addressed is attached.

Great, I have no further comments so I consider this patch ready for
committer (and will mark it so momentarily).

Just as a note - there is the issue you raised yourself about the less
than perfect error message, but I really don't think it's worth the
trouble to have special handling for it as the message coming from the
server is clear enough IMHO.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Petr Jelinek <petr(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-10-28 11:34:36
Message-ID: CAB7nPqQbMY_3v7Ytv=crq6AkNsHn2pG56p844Fmk981GJfroYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 28, 2014 at 8:08 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
> Hi,
>
> On 28/10/14 03:15, Michael Paquier wrote:
>>
>>
>> Updated patch with those comments addressed is attached.
>
>
> Great, I have no further comments so I consider this patch ready for
> committer (and will mark it so momentarily).
OK thanks!

> Just as a note - there is the issue you raised yourself about the less than
> perfect error message, but I really don't think it's worth the trouble to
> have special handling for it as the message coming from the server is clear
> enough IMHO.
Yeah thinking the same. Let's see what others think (btw, if this code
gets committed, be sure to mark Simon as a co-author).
Regards,
--
Michael


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-11-17 22:13:55
Message-ID: CA+U5nMLTmEO+CRHaxvgYTgKK22dh54GmMV94ASObVcq05m5sHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 28 October 2014 11:34, Michael Paquier <michael(dot)paquier(at)gmail(dot)com> wrote:
> On Tue, Oct 28, 2014 at 8:08 PM, Petr Jelinek <petr(at)2ndquadrant(dot)com> wrote:
>> Hi,
>>
>> On 28/10/14 03:15, Michael Paquier wrote:
>>>
>>>
>>> Updated patch with those comments addressed is attached.
>>
>>
>> Great, I have no further comments so I consider this patch ready for
>> committer (and will mark it so momentarily).
> OK thanks!
>
>> Just as a note - there is the issue you raised yourself about the less than
>> perfect error message, but I really don't think it's worth the trouble to
>> have special handling for it as the message coming from the server is clear
>> enough IMHO.
> Yeah thinking the same. Let's see what others think (btw, if this code
> gets committed, be sure to mark Simon as a co-author).

Committed.

Thanks very much for pushing forwards with this.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Petr Jelinek <petr(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better support of exported snapshots with pg_dump
Date: 2014-11-17 23:54:10
Message-ID: CAB7nPqT5QcZ=pDqjQu=C1+fo38Zvbf-US6LBzP_Znqi+jiZCig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 18, 2014 at 7:13 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Committed.
>
> Thanks very much for pushing forwards with this.
Thanks.
--
Michael