Re: pg_dump and pg_dumpall in real life (proposal)

Lists: pgsql-hackers
From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: rafael(at)postgresql(dot)org(dot)es
Subject: pg_dump and pg_dumpall in real life
Date: 2013-11-11 13:59:10
Message-ID: 5280E2AE.8070106@usit.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello

After some discussions in #pgconfeu, this is an attempt to relaunch
the discussion about how pg_dump and pg_dumpall work and the
challenges they give us in real life.

We have got bitten sometimes because of their behavior and we can see
it is a difficult subject for new postgres users even if they have
long experience with others databases.

Logical backups are used for restores or cloning purposes. If as a
database administrator you want to be able to do one of these
procedures for a database, you need to do this in advance today:

1) Use pg_dump to dump schema, data and privileges of the database.
2) Use pg_dumpall -g to dump global objects
3) Use pg_dumpall -g to dump ALTER ROLE ... SET ... data
4) Use pg_dumpall to dump possible ALTER DATABASE ... SET ... data

For a cloning procedure we need the samme steps but usually and in
addition we have to change the name of the owner/database when
importing the dumps.

If you have just a few and not very complicated databases in your
cluster, these steps will not be very complicated although very
irritating.

Imagine you have several hundred databases in your cluster, with
several hundred users owning some objects and with grants in others.

Imagine you are cloning or restoring only one or a few of these
databases to another server. For 2), 3) and 4) you will have to parse
the output from pg_dumpall to get the few global objects, ALTER ROLE
and ALTER DATABASE data for the few databases you are restoring.

In addition, if you have used "GRANT .. ON .. TO ..." in your database
objects you will have to take care of this manually to find out who
has extra privileges in your objects, so you can also get the right
information from pg_dumpall -g.

You don't need a lot of imagination to understand what a mess this can
be when moving data around. Not to talk about the possibility of doing
something wrong in the process and not be a very robust solution.

After many years of using pg_dump/pg_dumpall, this is our experience,
our wishes and thoughts:

* We need a pg_dump solution that can generate in one step all the
necessary pieces of information needed when restoring or cloning a
database. (schema, data, privileges, users and alter database/role data)

* It would be great to be able to tell pg_restore that user1 in the
dump will became user2 in the restored/cloned database. The same for
the name of the database.

* For serious backup management of large and complicated databases,
pg_dump with the custom output + pg_restore is the only feasible solution.

What do you think about the subject? Does it sound like a reasonable
proposition? What do we need to implement some of these changes?

Thanks in advance for your time.

Some background information:

Ref:
http://wiki.postgresql.org/wiki/Todo
http://www.postgresql.org/message-id/4864F001.50909@archonet.com
http://www.postgresql.org/message-id/11646.1272814212@sss.pgh.pa.us

regards,
- --
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAlKA4q4ACgkQBhuKQurGihSJJACglhZnjSTGFvzz6Rl0Vhrl3BrY
gssAni2l7kOQFxzr6IlDHAd0oMryDkT5
=Ti6V
-----END PGP SIGNATURE-----


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, rafael(at)postgresql(dot)org(dot)es
Subject: Re: pg_dump and pg_dumpall in real life
Date: 2013-11-11 14:24:34
Message-ID: 20131111142434.GA2706@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Rafael Martinez (r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no) wrote:
> * We need a pg_dump solution that can generate in one step all the
> necessary pieces of information needed when restoring or cloning a
> database. (schema, data, privileges, users and alter database/role data)

This sounds pretty reasonable and should be possible to implement- but
realize that, on the restore side, you might end up with multiple
attempts to create the same objects. Consider a role that's depended
upon by objects in two databases- it would be included in the dump of
both of those databases and if you restored both of those into the same
cluster, one of the CREATE ROLE statements would fail.

I'd think this would also be a new pg_dump option along the lines of
'include global dependencies' or similar. Reading the older threads, I
also agree that a '--create' version of pg_dump should include the
various SET commands for the database to be configured the same as the
one being dump'd. The next part seems simple- let's get someone to do
it.. :)

> * It would be great to be able to tell pg_restore that user1 in the
> dump will became user2 in the restored/cloned database. The same for
> the name of the database.

This is a lot uglier, unfortunately. We've seen this multiple times
before- there's not a good way to provide such a mapping as a command
line option. There may also be issues with the dependency resolution..

> * For serious backup management of large and complicated databases,
> pg_dump with the custom output + pg_restore is the only feasible solution.

Sure; is there a question here? I don't think that means we're going to
change the default, though there is a whole other thread on that
subject.

> What do you think about the subject? Does it sound like a reasonable
> proposition? What do we need to implement some of these changes?

As for 'what we need', I'd think someone with the time and energy to
write the patch and work with the community to implement it..

Thanks,

Stephen


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, rafael(at)postgresql(dot)org(dot)es
Subject: Re: pg_dump and pg_dumpall in real life
Date: 2013-11-11 14:28:51
Message-ID: 5280E9A3.4010304@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/11/2013 08:59 AM, Rafael Martinez wrote:
>
> * It would be great to be able to tell pg_restore that user1 in the
> dump will became user2 in the restored/cloned database. The same for
> the name of the database.

A general ability to rename things would be good. In particular,
restoring schema x into schema y or table x into table y would be very
useful, especially if you need to be able to compare old with new.

Unfortunately, this would involve a fairly significant change in the
design of pg_dump / pg_restore. The stored SQL is currently fairly
opaque, and a renaming scheme would probably need to implement instead
some sort of placeholder mechanism. That would mean a LOT of work.

cheers

andrew


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump and pg_dumpall in real life
Date: 2013-11-11 22:20:21
Message-ID: 52815825.70109@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/11/2013 06:24 AM, Stephen Frost wrote:
> * Rafael Martinez (r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no) wrote:
>> * We need a pg_dump solution that can generate in one step all the
>> necessary pieces of information needed when restoring or cloning a
>> database. (schema, data, privileges, users and alter database/role data)
>
> This sounds pretty reasonable and should be possible to implement- but
> realize that, on the restore side, you might end up with multiple
> attempts to create the same objects. Consider a role that's depended
> upon by objects in two databases- it would be included in the dump of
> both of those databases and if you restored both of those into the same
> cluster, one of the CREATE ROLE statements would fail.

Well, then we just need pg_restore to handle the "role already exists"
error message gracefully. That's all. Or a "CREATE ROLE IF NOT EXISTS"
statement, and use that for roles.

>> * It would be great to be able to tell pg_restore that user1 in the
>> dump will became user2 in the restored/cloned database. The same for
>> the name of the database.
>
> This is a lot uglier, unfortunately. We've seen this multiple times
> before- there's not a good way to provide such a mapping as a command
> line option. There may also be issues with the dependency resolution..

This sounds like one of the reasons we still *have* text-mode dumps.
For stuff like this.

> As for 'what we need', I'd think someone with the time and energy to
> write the patch and work with the community to implement it..

+1

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: David Johnston <polobo(at)yahoo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump and pg_dumpall in real life
Date: 2013-11-11 22:50:39
Message-ID: 1384210239631-5777816.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote
> A general ability to rename things would be good. In particular,
> restoring schema x into schema y or table x into table y would be very
> useful, especially if you need to be able to compare old with new.

compare old and new what? I would imagine that schema comparisons would be
much easier if the only thing that is different is the database name and you
compare database "old" to database "new".

Are there any existing threads or posts, that you recollect, that detail
solid use-cases for "clone-and-rename" mechanics? I don't seem to recall
anything in the past year or so but my coverage is probably only about 70%
in that timeframe.

SQL seems particularly unfriendly to renaming and runtime name resolution in
general (largely due to caching effects). Some kind of alias mechanism
makes sense conceptually but the performance hit for such isn't likely to be
worth incurring.

I could see having table name aliases so that raw data in a dump from one
database could be restored into another but I'd likely require that the user
be able to generate the target schema from source themselves. That would
facilitate the use-case where the DBA/programmer is able to fully recreate
their schema from source and only require that actual data be restored into
the newly created database. I can see where grants may fall into a grey
middle-area but functions/view/triggers and the like would need to be
synchronized with any schema naming changes and that should, IMO, be driven
from source and not facilitated by a dump/restore process.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-pg-dumpall-in-real-life-tp5777718p5777816.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: David Johnston <polobo(at)yahoo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump and pg_dumpall in real life
Date: 2013-11-11 23:06:22
Message-ID: 1384211182348-5777823.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote
> Well, then we just need pg_restore to handle the "role already exists"
> error message gracefully. That's all. Or a "CREATE ROLE IF NOT EXISTS"
> statement, and use that for roles.

My only qualm here is if the exists check is based off of role name only.
If database "A" and database "B" came from different clusters but both have
a role "david" the actual identity of "david" is (could be) different
because the source cluster.

The risk of such occurring is a high-security situation is likely to be
small but some kind of "--ignore-different-cluster-same-role" flag may be
worthwhile such that pg_restore will error unless that flag is set (i.e.,
high security by default). The error itself should be rare enough most
people wouldn't even notice it is there but seeing such an error (with a
hint provided as well) would be easily able to disable and continue on with
the restore.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/pg-dump-and-pg-dumpall-in-real-life-tp5777718p5777823.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump and pg_dumpall in real life
Date: 2013-11-11 23:16:31
Message-ID: 5281654F.10007@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/11/2013 03:06 PM, David Johnston wrote:
> Josh Berkus wrote
>> Well, then we just need pg_restore to handle the "role already exists"
>> error message gracefully. That's all. Or a "CREATE ROLE IF NOT EXISTS"
>> statement, and use that for roles.
>
> My only qualm here is if the exists check is based off of role name only.
> If database "A" and database "B" came from different clusters but both have
> a role "david" the actual identity of "david" is (could be) different
> because the source cluster.
>
> The risk of such occurring is a high-security situation is likely to be
> small but some kind of "--ignore-different-cluster-same-role" flag may be
> worthwhile such that pg_restore will error unless that flag is set (i.e.,
> high security by default). The error itself should be rare enough most
> people wouldn't even notice it is there but seeing such an error (with a
> hint provided as well) would be easily able to disable and continue on with
> the restore.

I'd do the opposite: let's optimize for the most common case, not the
least common one. So we'd do --role-errors, which would throw a fatal
error on duplicate roles, instead of just posting a WARNING.

Again, this is all rather academic, unless you know someone who's eager
to dig into pg_dump/pg_restore.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump and pg_dumpall in real life
Date: 2013-11-12 00:15:36
Message-ID: 52817328.7040001@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/11/2013 05:50 PM, David Johnston wrote:
> Andrew Dunstan wrote
>> A general ability to rename things would be good. In particular,
>> restoring schema x into schema y or table x into table y would be very
>> useful, especially if you need to be able to compare old with new.
> compare old and new what?

Data is what I had in mind.

There have been plenty of times when I've been asked ex post to find out
what's changed in some table in the last 24 hours or something like
that, and all I've had to work with is yesterday's dump file. The
handsprings you have to turn in order to get the old version of the
table and the new version side by side make it painful - it would be
nice to be able to say "restore this table but with that name," or
"restore this table but into that schema".

cheers

andrew


From: Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump and pg_dumpall in real life
Date: 2013-11-12 02:43:43
Message-ID: CAFcNs+qGXuraex5kvWja6sCiTK_PXOFrcAaNwT=YYxbtgQynFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 11, 2013 at 8:20 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>
> [...]
>
> Well, then we just need pg_restore to handle the "role already exists"
> error message gracefully. That's all. Or a "CREATE ROLE IF NOT EXISTS"
> statement, and use that for roles.
>

I'm working in a patch to add IF NOT EXISTS for all CREATE statements,
including of course the CREATE ROLE statement.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello


From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: rafael(at)postgresql(dot)org(dot)es
Subject: Re: pg_dump and pg_dumpall in real life
Date: 2013-11-12 06:16:42
Message-ID: 5281C7CA.7000407@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/11/2013 09:59 PM, Rafael Martinez wrote:
> * We need a pg_dump solution that can generate in one step all the
> necessary pieces of information needed when restoring or cloning a
> database. (schema, data, privileges, users and alter database/role
> data)

... and if some users/roles already exist, but have different
meanings? Or some roles exist and some don't?

I'm very strongly against adding and using CREATE ROLE IF NOT EXISTS.
pg_restore should handle this case-by-case, forcing the user to
specify explicitly role-by-role that they want a given role in the
existing DB re-used if it exists, or want a new one created with a new
name in case of a clash.

A --rename-all-conflicting-roles and --reuse-all-conflicting-roles
option could be added for the all-or-none options. IMO if neither is
specified, the existence of any role name conflict should be a restore
error.

> * It would be great to be able to tell pg_restore that user1 in
> the dump will became user2 in the restored/cloned database. The
> same for the name of the database.

Agreed that this would be useful. Needs to deal with the case where
the users should be separated but they should remain a member of some
common role, though - eg "olduser" becomes "newuser" but the dumped
"olduser" was member of role "users" and "newuser" should also be
member of "users", not some renamed role.

- --
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.15 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQEcBAEBAgAGBQJSgcfKAAoJELBXNkqjr+S2XWIH/2c1Mcd4ldVTAPw/jAln4gNM
YH8SRPlsGU0fqfbYoKg/1y0K/Wdjdlac9bjjGzYpODryaXGopf1i+pWaphF2kJTM
LeMRVgEFEW7u2Dr6FXajQTQCiXLnA8C16NmmgIdqZZgYCsOwCorG+gFNfI8fZyft
okCQpYcljGXzlc218DI6/o4OZBBSdLh8diTzF8+xywoXJZopdAwfHDPPpAvizPye
rcUUkq1svArq78HakSuI8HoCy3ZHuiCf8mQEUPcLhFrwgh+bkrs29W7YAdD75gr4
yp32XeyOY5npXHaG9mHghs7anbUnwywJVEzpwKAf0SyPe7zunw8fdtx2NSF70no=
=dusS
-----END PGP SIGNATURE-----


From: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: pgsql-hackers(at)postgresql(dot)org
Cc: rafael(at)postgresql(dot)org(dot)es
Subject: Re: pg_dump and pg_dumpall in real life (proposal)
Date: 2013-11-12 09:21:17
Message-ID: 5281F30D.2000805@usit.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/11/2013 11:20 PM, Josh Berkus wrote:
> On 11/11/2013 06:24 AM, Stephen Frost wrote:
>> * Rafael Martinez (r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no) wrote:
>>> * We need a pg_dump solution that can generate in one step all
>>> the necessary pieces of information needed when restoring or
>>> cloning a database. (schema, data, privileges, users and alter
>>> database/role data)
>>
[.........]
>
>> As for 'what we need', I'd think someone with the time and energy
>> to write the patch and work with the community to implement it..
>
> +1
>

Well, I am willing to take a chance on the first suggestion if nobody
else has the time or energy.

I have never sent a patch or have worked with the postgres code, but I
think it can be done without a lot of work with some reuse of the code
used in pg_dumpall.

This is a proposal based on the feedback we have received:

* pg_dump will also deliver information about "ALTER DATABASE ... SET"
data for a given database when the option '--create' is used.

* pg_dump will deliver information about ROLES used and "ALTER ROLE
... SET" data for a given database when a new option i.e.
"--roles-global" is used.

* pg_restore will restore ROLE information when used with a new option
i.e. "--roles-global" and "ALTER DATABASE ... SET" information when
used with the '--create' option.

* We need to do something with how pg_restore will handle ROLES
information because some security concerns when restoring roles that
already exists on the target server.

Some of the suggestions are:

a) Implement and use CREATE ROLE IF NOT EXISTS and just throw a
warning or handle the "role already exists" error message gracefully.

b) Use a new option i.e. "--reuse-roles-in-conflict" to behave like
suggestion a). If this option is not used, pg_restore will stop with a
fatal error when a role already exist.

c) Use a new option i.e. "--on-role-error-stop" to stop with a fatal
error when a role already exist. If this option is not used pg_restore
will behave like suggestion a).

d) Use a new option i.e. "--rename-roles-in-conflict" to rename the
roles that already exists. If this option is not used, pg_restore will
stop with a fatal error when a role already exist.

I think I prefer b) to continue with the postgres spirit of security
by default. d) is too complicated for me due to lack of knowledge of
the postgres code.

Comments?
regards,
- --
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.14 (GNU/Linux)

iEYEARECAAYFAlKB8w0ACgkQBhuKQurGihSq7QCfZdzreRGNRx0vUzXPjYqzNOIP
LqgAoJnfeCYjsfEUmsYvvp3DSL959IRL
=8Ynv
-----END PGP SIGNATURE-----


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Rafael Martinez <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
Cc: pgsql-hackers(at)postgresql(dot)org, rafael(at)postgresql(dot)org(dot)es
Subject: Re: pg_dump and pg_dumpall in real life (proposal)
Date: 2013-11-12 14:28:57
Message-ID: 20131112142857.GD17272@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Rafael Martinez (r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no) wrote:
> Comments?

Create a wiki page for it. :)

Thanks,

Stephen


From: Rafael Martinez Guerrero <r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, rafael(at)postgresql(dot)org(dot)es
Subject: Re: pg_dump and pg_dumpall in real life (proposal)
Date: 2013-11-12 15:49:12
Message-ID: 52824DF8.8000902@usit.uio.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 11/12/2013 03:28 PM, Stephen Frost wrote:
> * Rafael Martinez (r(dot)m(dot)guerrero(at)usit(dot)uio(dot)no) wrote:
>> Comments?
>
> Create a wiki page for it. :)
>

What about this to start with?:
https://wiki.postgresql.org/wiki/Pg_dump_improvements

- --
Rafael Martinez Guerrero
Center for Information Technology
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlKCTfAACgkQBhuKQurGihSrYQCeKyYVthpbk47hGjayBjidqaFL
nysAn3JJjGT/8SuDUi2Nt6hEZ4eu1smz
=7wjV
-----END PGP SIGNATURE-----