Re: pg_dump schema breakup

Lists: pgsql-hackers
From: Naz Gassiep <naz(at)mira(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pg_dump schema breakup
Date: 2006-08-18 16:28:51
Message-ID: 44E5EAC3.9010900@mira.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This is my first post to a PostgreSQL mailing list, so please forgive me
if I have posted to the wrong place

Currently pg_dump has flags for dumping only table definitions and/or
data. These flags are respectively:
--schema-only
--data-only

I propose that two more be added:
--tables-only
--constraints-only

These would essentially break up the output of --schema-only into two
sections. I.e., the output of --tables-only plus the output of
--constraints-only would be identical to the output of --schema-only .

There are a number of scenarios where this may be useful, I will
describe the one that I would use it for.

When making changes to my database schema I would like to take the
schema from the newly modified database, and dump the data from the old
database into it to ensure that the new schema is able to take the data
that exists in the live database. If it isn't then I modify the new
schema or the live dataset as appropriate, and try again.

This requires the following steps:

1. Create temporary database and apply modified schema to it
2. Dump new database using --schema-only
3. Split new schema into table definitions and constraints
4. Apply new schema table definitions from step 34 to the testing database
5. Dump the existing database using --data-only
6. Apply the dataset from step 5 to the testing database
7. Apply new schema constraints from step 3 to the testing database

All of these steps are easily scriptable except step 3, which means that
making quick changes to the new schema and re-applying includes the
tedious task of opening the 5,000 line schema file in a text editor and
manually pasting the relevant sections into new files. Step 3 really
does hold up the development process with regards to testing changes to
the schema.

Generalizing the nature of this task, the pg_dump features I propose
would allow the easy scripting of dumping databases, making changes to
the dumps and then re-applying them in a non paradox-inducing order.

FWIW I thought this would be a very simple patch, and had a look at the
code for pg_dump myself, despite the fact that I've not even written a
Hello World in C as yet. That attempt failed miserably, so now I am
reduced to selling the merits of this idea to real developers.
Incidentally, --schema-only appears to break tables and constraints into
two sections already, as evidenced by the fact that all tables are
created first, and constraints afterwards.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Naz Gassiep <naz(at)mira(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-18 16:52:11
Message-ID: 21511.1155919931@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Naz Gassiep <naz(at)mira(dot)net> writes:
> I propose that two more be added:
> --tables-only
> --constraints-only

This doesn't seem well-defined at all. There are many objects in a
database that are definitely neither tables nor constraints, and it's
not very clear what things should be considered constraints either.

I think what you may really be after is "the stuff that should be loaded
before inserting data" and "the stuff that should be loaded after", but
the above are poor names for these concepts.

regards, tom lane


From: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-18 16:57:25
Message-ID: 200608181857.25252.andreak@officenet.no
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Friday 18 August 2006 18:52, Tom Lane wrote:
> Naz Gassiep <naz(at)mira(dot)net> writes:
> > I propose that two more be added:
> > --tables-only
> > --constraints-only
>
> This doesn't seem well-defined at all. There are many objects in a
> database that are definitely neither tables nor constraints, and it's
> not very clear what things should be considered constraints either.
>
> I think what you may really be after is "the stuff that should be loaded
> before inserting data" and "the stuff that should be loaded after", but
> the above are poor names for these concepts.

But it certainly would be nice to be able to dump all that "stuff":-)

--
Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Karenslyst Allé 11 | know how to do a thing and to watch |
PO. Box 529 Skøyen | somebody else doing it wrong, without |
0214 Oslo | comment. |
NORWAY | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+


From: Naz Gassiep <naz(at)mira(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-18 17:02:26
Message-ID: 44E5F2A2.2020709@mira.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<br>
Tom Lane wrote:
<blockquote cite="mid21511(dot)1155919931(at)sss(dot)pgh(dot)pa(dot)us" type="cite">
<pre wrap="">Naz Gassiep <a class="moz-txt-link-rfc2396E" href="mailto:naz(at)mira(dot)net">&lt;naz(at)mira(dot)net&gt;</a> writes:
</pre>
<blockquote type="cite">
<pre wrap="">I propose that two more be added:
--tables-only
--constraints-only
</pre>
</blockquote>
<pre wrap=""><!---->
This doesn't seem well-defined at all. There are many objects in a
database that are definitely neither tables nor constraints, and it's
not very clear what things should be considered constraints either.

I think what you may really be after is "the stuff that should be loaded
before inserting data" and "the stuff that should be loaded after", but
the above are poor names for these concepts.

regards, tom lane</pre>
</blockquote>
And here I was thinking that I'd been clear :)<br>
<br>
But yes, you are right, what I want is "the stuff that gets loaded
before data insertion" and "the stuff that goes after data insertion"<br>
<br>
Essentially, the CREATE TABLE statements are the first part as that is
what is needed for data to be inserted. Everything else is the second
part.
</body>
</html>

Attachment Content-Type Size
unknown_filename text/html 1.3 KB

From: Naz Gassiep <naz(at)mira(dot)net>
To: Andreas Joseph Krogh <andreak(at)officenet(dot)no>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-18 17:33:03
Message-ID: 44E5F9CF.1080407@mira.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-15"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Andreas Joseph Krogh wrote:
<blockquote cite="mid200608181857(dot)25252(dot)andreak(at)officenet(dot)no"
type="cite">
<pre wrap="">On Friday 18 August 2006 18:52, Tom Lane wrote:
</pre>
<blockquote type="cite">
<pre wrap="">Naz Gassiep <a class="moz-txt-link-rfc2396E" href="mailto:naz(at)mira(dot)net">&lt;naz(at)mira(dot)net&gt;</a> writes:
</pre>
<blockquote type="cite">
<pre wrap="">I propose that two more be added:
--tables-only
--constraints-only
</pre>
</blockquote>
<pre wrap="">This doesn't seem well-defined at all. There are many objects in a
database that are definitely neither tables nor constraints, and it's
not very clear what things should be considered constraints either.

I think what you may really be after is "the stuff that should be loaded
before inserting data" and "the stuff that should be loaded after", but
the above are poor names for these concepts.
</pre>
</blockquote>
<pre wrap=""><!---->
But it certainly would be nice to be able to dump all that "stuff":-)</pre>
</blockquote>
Yea, I've been told that this would not be a high demand feature. So do
I have a second vote? ;-)<br>
</body>
</html>

Attachment Content-Type Size
unknown_filename text/html 1.3 KB

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Naz Gassiep <naz(at)mira(dot)net>
Cc: Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-18 17:39:38
Message-ID: 44E5FB5A.6000901@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>>> I think what you may really be after is "the stuff that should be loaded
>>> before inserting data" and "the stuff that should be loaded after", but
>>> the above are poor names for these concepts.
>>>
>>
>> But it certainly would be nice to be able to dump all that "stuff":-)
> Yea, I've been told that this would not be a high demand feature. So
> do I have a second vote? ;-)

The ability to have a dump that automatically separated the before-data
and after-data objects is definitely
useful. The amount of times I have had to dump the schema and data
separately just so I can modify the
schema before restore, or disable some function that is causing problems
can not even be counted.

Sincerely,

Joshua D. Drake


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Naz Gassiep <naz(at)mira(dot)net>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-18 17:48:38
Message-ID: 44E5FD76.2070905@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:
>
>>>> I think what you may really be after is "the stuff that should be
>>>> loaded
>>>> before inserting data" and "the stuff that should be loaded after",
>>>> but
>>>> the above are poor names for these concepts.
>>>>
>>>
>>> But it certainly would be nice to be able to dump all that "stuff":-)
>> Yea, I've been told that this would not be a high demand feature. So
>> do I have a second vote? ;-)
>
> The ability to have a dump that automatically separated the
> before-data and after-data objects is definitely
> useful. The amount of times I have had to dump the schema and data
> separately just so I can modify the
> schema before restore, or disable some function that is causing
> problems can not even be counted.
>

We already have a highly selective and configurable restore mechanism,
using the -L feature of pg_restore. Maybe there's a good special case
for this particular split, but it is hardly undoable now.

As for Naz' needs - I gave him a perl script I whipped up in few minutes
to do the split he wanted - and I'm sure you could do the same in python ;-)

cheers

andrew


From: Naz Gassiep <naz(at)mira(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-18 18:29:50
Message-ID: 44E6071E.5050403@mira.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> We already have a highly selective and configurable restore mechanism,
> using the -L feature of pg_restore. Maybe there's a good special case
> for this particular split, but it is hardly undoable now.
>
> As for Naz' needs - I gave him a perl script I whipped up in few
> minutes to do the split he wanted - and I'm sure you could do the same
> in python ;-)
>
> cheers
>
> andrew
And very appreciative I am too. While I concede it is doable now, and I
could do this in Python as well, I do feel that a feature like this
would have wider use than may be apparent on a first glance.
Furthermore, I think that the simplicity of implementing it makes it
really a question of "why not" rather than "why".
- Naz


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Naz Gassiep <naz(at)mira(dot)net>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-18 19:33:06
Message-ID: 20060818193306.GV32181@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
> Joshua D. Drake wrote:
> >
> >>>>I think what you may really be after is "the stuff that should be
> >>>>loaded
> >>>>before inserting data" and "the stuff that should be loaded after",
> >>>>but
> >>>>the above are poor names for these concepts.
> >>>>
> >>>
> >>>But it certainly would be nice to be able to dump all that "stuff":-)
> >>Yea, I've been told that this would not be a high demand feature. So
> >>do I have a second vote? ;-)
> >
> >The ability to have a dump that automatically separated the
> >before-data and after-data objects is definitely
> >useful. The amount of times I have had to dump the schema and data
> >separately just so I can modify the
> >schema before restore, or disable some function that is causing
> >problems can not even be counted.
> >
>
> We already have a highly selective and configurable restore mechanism,
> using the -L feature of pg_restore. Maybe there's a good special case
> for this particular split, but it is hardly undoable now.
>
> As for Naz' needs - I gave him a perl script I whipped up in few minutes
> to do the split he wanted - and I'm sure you could do the same in python ;-)

Maybe what we need is a program to process the object list from
pg_restore -L. Or a mode in pg_restore, "from this dump give me all
the sutff to be loaded before inserting data", and the reverse.

I mean, should the problem be attacked while dumping, or while
restoring?

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Naz Gassiep <naz(at)mira(dot)net>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-18 19:55:13
Message-ID: 44E61B21.9060704@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>
>> Joshua D. Drake wrote:
>>
>>>>>> I think what you may really be after is "the stuff that should be
>>>>>> loaded
>>>>>> before inserting data" and "the stuff that should be loaded after",
>>>>>> but
>>>>>> the above are poor names for these concepts.
>>>>>>
>>>>>>
>>>>> But it certainly would be nice to be able to dump all that "stuff":-)
>>>>>
>>>> Yea, I've been told that this would not be a high demand feature. So
>>>> do I have a second vote? ;-)
>>>>
>>> The ability to have a dump that automatically separated the
>>> before-data and after-data objects is definitely
>>> useful. The amount of times I have had to dump the schema and data
>>> separately just so I can modify the
>>> schema before restore, or disable some function that is causing
>>> problems can not even be counted.
>>>
>>>
>> We already have a highly selective and configurable restore mechanism,
>> using the -L feature of pg_restore. Maybe there's a good special case
>> for this particular split, but it is hardly undoable now.
>>
>> As for Naz' needs - I gave him a perl script I whipped up in few minutes
>> to do the split he wanted - and I'm sure you could do the same in python ;-)
>>
>
> Maybe what we need is a program to process the object list from
> pg_restore -L. Or a mode in pg_restore, "from this dump give me all
> the sutff to be loaded before inserting data", and the reverse.
>
> I mean, should the problem be attacked while dumping, or while
> restoring?
>
>

Well, the other issue is how many canned breakup schemes we are going to
support. If this particular one is of sufficiently general usefulness
then I have no objection. But when you can produce it trivially from the
output of "pg_dump -s", the need to hardcode it hardly seems pressing.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Naz Gassiep <naz(at)mira(dot)net>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-18 20:20:58
Message-ID: 24553.1155932458@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Well, the other issue is how many canned breakup schemes we are going to
> support. If this particular one is of sufficiently general usefulness
> then I have no objection. But when you can produce it trivially from the
> output of "pg_dump -s", the need to hardcode it hardly seems pressing.

FWIW, I am in favor of providing a way to break up the dump output like
this, I was merely objecting to the vocabulary ;-). We have certainly
seen tons of people burnt by the performance problems inherent in
separate-data-and-schema restores, and splitting the dump into three
parts instead of two seems like it would fix that.

But I also like Alvaro's comment that this should be on the restore side
not so much the dump side. If you do two or three successive pg_dump
runs to make your dump then you run a nontrivial risk of not getting
consistent dumps. My advice to people would be to do *one* full
"pg_dump -Fc" and then extract three scripts out of that.

The question then is whether it's worth providing the extraction
functionality in a more canned, user-friendly form than "here, hack up
the -L output with this perl script". I'd vote yes.

regards, tom lane


From: Naz Gassiep <naz(at)mira(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Andreas Joseph Krogh <andreak(at)officenet(dot)no>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_dump schema breakup
Date: 2006-08-19 07:44:46
Message-ID: 44E6C16E.8000200@mira.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Tom Lane wrote:
<blockquote cite="mid24553(dot)1155932458(at)sss(dot)pgh(dot)pa(dot)us" type="cite">
<pre wrap="">Andrew Dunstan <a class="moz-txt-link-rfc2396E"
href="mailto:andrew(at)dunslane(dot)net">&lt;andrew(at)dunslane(dot)net&gt;</a> writes:
</pre>
<blockquote type="cite">
<pre wrap="">Well, the other issue is how many canned breakup schemes we are going to
support. If this particular one is of sufficiently general usefulness
then I have no objection. But when you can produce it trivially from the
output of "pg_dump -s", the need to hardcode it hardly seems pressing.
</pre>
</blockquote>
<pre wrap=""><!---->
FWIW, I am in favor of providing a way to break up the dump output like
this, I was merely objecting to the vocabulary ;-). We have certainly
seen tons of people burnt by the performance problems inherent in
separate-data-and-schema restores, and splitting the dump into three
parts instead of two seems like it would fix that.

But I also like Alvaro's comment that this should be on the restore side
not so much the dump side. If you do two or three successive pg_dump
runs to make your dump then you run a nontrivial risk of not getting
consistent dumps. My advice to people would be to do *one* full
"pg_dump -Fc" and then extract three scripts out of that.

The question then is whether it's worth providing the extraction
functionality in a more canned, user-friendly form than "here, hack up
the -L output with this perl script". I'd vote yes.

regards, tom lane</pre>
</blockquote>
I greatly appreciate the comments here and am glad that my initial idea
has support. This thread highlights to me the difference between the
"hey there's a good idea there despite the fact that's he's obviously
not a veteran software developer" culture that the PostgreSQL community
has instead of the "he is obviously not a veteran software developer so
what on Earth could he have to offer us" responses I've had from
various other open source projects.<br>
<br>
On a less obsequious note, I agree that pg_dump should be used to dump
everything in a single run to avoid consistency issues, and the
selection of data to be restored should be done with pg_restore. As
this is a feature that I would benefit greatly from, how do I go about
ensuring that this idea finds its way to the appropriate developer and
doesn't get forgotten in the mountain of ideas in the "that'd be nice
to have some day" category?<br>
<br>
- Naz<br>
</body>
</html>

Attachment Content-Type Size
unknown_filename text/html 2.6 KB