Re: Anyone working on pg_dump dependency ordering?

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Anyone working on pg_dump dependency ordering?
Date: 2003-11-21 16:40:10
Message-ID: 12906.1069432810@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm thinking about attacking pg_dump's lack of knowledge about using
dependencies to determine a safe dump order. But if there's someone
out there actively working on the problem, I don't want to tread on
your toes ... anyone?

Also, if you've got uncommitted patches for pg_dump, please let me know.

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-22 02:24:19
Message-ID: 3FBEC8D3.3040807@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I'm thinking about attacking pg_dump's lack of knowledge about using
> dependencies to determine a safe dump order. But if there's someone
> out there actively working on the problem, I don't want to tread on
> your toes ... anyone?

I've done a whole lot of _thinking_, but basically no _doing_, so go
right ahead :)

I may as well let you know my thoughts:

There are two levels (sort of) of dependency. The first is that whole
classes of objects can be dependent on whole other classes. eg.
databases depend on users, or ALL FK's can be dumped after ALL tables,
etc.. It would make the dump more readable if you dumped those definite
dependencies in that order, rather than shuffling everything up.

The second level of dependency is when a bunch of object types can
depend on each other. The current solution for that is to sort by OID,
but this fails when it is possible to add a dependency to an object
after it has been created.

eg:

- Adding a column (with a type) to a table
- All the CREATE OR REPLACE commands
- etc.

Hence, a full db wide topological sort might not be necessary.

Lastly, I presume it's possible to create a system of circular
dependencies (eg create or replace view), which really cannot be solved
without a system of 'shells', similar to that needed to dump types and
their i/o functions.

Views seem to be by far the nastiest object. They can be dependent on
almost everything in the database.

> Also, if you've got uncommitted patches for pg_dump, please let me know.

Yes, my 'COMMENT ON' mega patch in the queue contains dumping of
conversions and comments on a bunch of objects.

BTW, if you commit that patch - you might want to change my comment on
type patch to put the "" around "any", and change the results file
appropriately. I noticed I accidentally included that in the patch, and
was about to mention it.

CHris


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-22 11:25:26
Message-ID: 3FBF47A6.5000404@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Christopher Kings-Lynne wrote:

>
>
> There are two levels (sort of) of dependency. The first is that whole
> classes of objects can be dependent on whole other classes. eg.
> databases depend on users, or ALL FK's can be dumped after ALL tables,
> etc.. It would make the dump more readable if you dumped those
> definite dependencies in that order, rather than shuffling everything up.
>
I agree that dumping should be done class-wise (Tables, Functions,
Views) whenever possible, but I don't agree on FKs dumped separately
from the table. IMHO indexes and constraints belong to the table, and
modifying the dump will be hard if a table's code is scattered all around.

Regards,
Andreas


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-22 16:21:52
Message-ID: 20031122081814.E14063@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 22 Nov 2003, Andreas Pflug wrote:

> Christopher Kings-Lynne wrote:
>
> >
> >
> > There are two levels (sort of) of dependency. The first is that whole
> > classes of objects can be dependent on whole other classes. eg.
> > databases depend on users, or ALL FK's can be dumped after ALL tables,
> > etc.. It would make the dump more readable if you dumped those
> > definite dependencies in that order, rather than shuffling everything up.
> >
> I agree that dumping should be done class-wise (Tables, Functions,
> Views) whenever possible, but I don't agree on FKs dumped separately
> from the table. IMHO indexes and constraints belong to the table, and
> modifying the dump will be hard if a table's code is scattered all around.

You're going to potentially have the constraints scattered in any case due
to circular dependency chains. I'd think that having all the constraints
in one place would be easier than trying to go through the list of tables
that might be in a circular chain in order to find the constraints.


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-22 21:53:59
Message-ID: 3FBFDAF7.8090703@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stephan Szabo wrote:

>You're going to potentially have the constraints scattered in any case due
>to circular dependency chains. I'd think that having all the constraints
>in one place would be easier than trying to go through the list of tables
>that might be in a circular chain in order to find the constraints.
>
>
I still disagree. cyclic dependencies should be avoided anyhow. You'll
get an awful lot of trouble loading data in such a case. Some database
systems refuse to create such stuff right away (mssql).

Regards,
Andreas


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-22 22:20:48
Message-ID: 1069539647.16912.124.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2003-11-22 at 16:53, Andreas Pflug wrote:
> Stephan Szabo wrote:
>
> >You're going to potentially have the constraints scattered in any case due
> >to circular dependency chains. I'd think that having all the constraints
> >in one place would be easier than trying to go through the list of tables
> >that might be in a circular chain in order to find the constraints.
> >
> >
> I still disagree. cyclic dependencies should be avoided anyhow. You'll
> get an awful lot of trouble loading data in such a case. Some database
> systems refuse to create such stuff right away (mssql).

CREATE TABLE a (col integer primary key);
CREATE TABLE b (col integer primary key);
ALTER TABLE a ADD FOREIGN KEY (col) REFERENCES b INITIALLY DEFERRED;
ALTER TABLE b ADD FOREIGN KEY (col) REFERENCES a;

How does MSSQL deal with the above?


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Rod Taylor <pg(at)rbt(dot)ca>
Cc: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-22 23:02:10
Message-ID: 3FBFEAF2.8020700@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Rod Taylor wrote:

>
>
>CREATE TABLE a (col integer primary key);
>CREATE TABLE b (col integer primary key);
>ALTER TABLE a ADD FOREIGN KEY (col) REFERENCES b INITIALLY DEFERRED;
>ALTER TABLE b ADD FOREIGN KEY (col) REFERENCES a;
>
>How does MSSQL deal with the above?#
>
>
It depends. Restricting FKs are generated silently, while ON DELETE
CASCADE will throw a message and refuse to create. MSSQL doesn't know
about deferred FKs; no chance to enter spuriously inconsistent data.
Still, using cyclic references is IMHO bad design style. I can't accept
an exceptional case as reason to break *all* table's definition into
pieces. The CREATE TABLE syntax shows that I'm probably not the only one
thinking like this: it may include all constraint definitions as well.

There might be discussions whether its better to script
CREATE TABLE xxx ..;
ALTER TABLE xxx ADD PRIMARY KEY ....;
ALTER TABLE xxx ADD FOREIGN KEY ....;
or
CREATE TABLE xxx (...., PRIMARY KEY (..), FOREIGN KEY (..));

I'd opt for the second version (a little formatted, maybe :-)

Regards,
Andreas


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-22 23:47:19
Message-ID: 1069544838.16912.133.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> >CREATE TABLE a (col integer primary key);
> >CREATE TABLE b (col integer primary key);
> >ALTER TABLE a ADD FOREIGN KEY (col) REFERENCES b INITIALLY DEFERRED;
> >ALTER TABLE b ADD FOREIGN KEY (col) REFERENCES a;

> Still, using cyclic references is IMHO bad design style. I can't accept

They're extremely useful when you have normalized data and a very
expensive (but repeatable) process whose results you want to cache for
performance reasons. It enforces that original data and cache are both
added in a single transaction.

But you're right. Normally they're a bad idea.

> an exceptional case as reason to break *all* table's definition into
> pieces. The CREATE TABLE syntax shows that I'm probably not the only one
> thinking like this: it may include all constraint definitions as well.
>
> There might be discussions whether its better to script
> CREATE TABLE xxx ..;
> ALTER TABLE xxx ADD PRIMARY KEY ....;
> ALTER TABLE xxx ADD FOREIGN KEY ....;
> or
> CREATE TABLE xxx (...., PRIMARY KEY (..), FOREIGN KEY (..));
>
> I'd opt for the second version (a little formatted, maybe :-)

Well.. the second one will be much slower when the foreign keys verify.
Primary, unique constraints I'll buy in the create statement. Check
constraints and defaults are a little fuzzier.

Logic will be required to pull them out in the event they call functions
which depends on the table or we enable subqueries (assertion like
constraints) in them.


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Rod Taylor <pg(at)rbt(dot)ca>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump dependency / physical hot backup
Date: 2003-11-23 12:39:24
Message-ID: 3FC0AA7C.4010108@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Rod Taylor wrote:

>>There might be discussions whether its better to script
>>CREATE TABLE xxx ..;
>>ALTER TABLE xxx ADD PRIMARY KEY ....;
>>ALTER TABLE xxx ADD FOREIGN KEY ....;
>>or
>>CREATE TABLE xxx (...., PRIMARY KEY (..), FOREIGN KEY (..));
>>
>>I'd opt for the second version (a little formatted, maybe :-)
>>
>>
>
>Well.. the second one will be much slower when the foreign keys verify.
>
>
Verifying zero rows in the freshly created table should be quite fast...

>Primary, unique constraints I'll buy in the create statement. Check
>constraints and defaults are a little fuzzier.
>
>Logic will be required to pull them out in the event they call functions
>which depends on the table or we enable subqueries (assertion like
>constraints) in them.
>
>
>
Yes, complicated constructions might prevent creating a table's objects
in a single step. It's probably possible to design an object that can't
be extracted automatically and restored at all (e.g. a view using itself).

I wonder if there have been discussions about other ways to
backup/restore. The current pg_dump/pg_restore way will rebuild all
indexes etc, resulting in a fresh and consistent database after restore,
enabling backend upgrades, but it's tricky to guarantee everything runs
smoothly. And it can be quite slow. In a case of a disaster recovery,
this is probably very unlucky.
I wonder if a mixture of pg_dump and physical cluster backup(currently
only possible if backend is shut down) could be implemented, i.e. a
BACKUP sql command. This command should stream out all data from the
physical files, taking a snapshot. When restoring, the command would
create a new database from the input data, by more or less copying the
data to files. This way, all (hopefully non-existent) inconsistencies in
the database would be restored as well (probably including non-vacuumed
rows), but because no checks are applied the restore process would be as
fast as possible. This would be possible only for the same
backend/architecture version, but in case of disaster recovery that's
enough.

Regards,
Andreas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Rod Taylor <pg(at)rbt(dot)ca>
Cc: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-23 18:53:36
Message-ID: 28573.1069613616@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Rod Taylor <pg(at)rbt(dot)ca> writes:
> Well.. the second one will be much slower when the foreign keys verify.
> Primary, unique constraints I'll buy in the create statement. Check
> constraints and defaults are a little fuzzier.

FK, primary, and unique constraints are already split out from the
CREATE TABLE for performance reasons. We could think about folding them
back in in a schema-only dump, but in a full dump I don't think it's
negotiable --- you really want to load the table data before you install
these constraints.

regards, tom lane


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Rod Taylor <pg(at)rbt(dot)ca>, Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-23 21:21:12
Message-ID: 3FC124C8.5040105@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

>Rod Taylor <pg(at)rbt(dot)ca> writes:
>
>
>>Well.. the second one will be much slower when the foreign keys verify.
>>Primary, unique constraints I'll buy in the create statement. Check
>>constraints and defaults are a little fuzzier.
>>
>>
>
>FK, primary, and unique constraints are already split out from the
>CREATE TABLE for performance reasons. We could think about folding them
>back in in a schema-only dump, but in a full dump I don't think it's
>negotiable --- you really want to load the table data before you install
>these constraints.
>
>
IMHO here we have opposite requirements: The dump/restore process should
run as fast as possible, so constraints have to be generated separately,
but pg_dump is also used to reengineer and slightly modify the schema,
where it's helpful if definitions are grouped. I've been asked for
complete schema extraction features in pgAdmin3, and I replied 'use
pg_dump'. It seems that pg_dump tries to serve both requirements, being
a compromise where two dedicated tools could do it better.

Regards,
Andreas


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Rod Taylor <pg(at)rbt(dot)ca>, Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_dump dependency / physical hot backup
Date: 2003-11-24 01:26:33
Message-ID: 3FC15E49.3080801@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Verifying zero rows in the freshly created table should be quite fast...

It's hundreds of times faster to add an index to a full table than add
rows to a table with an index.

Chris


From: ow <oneway_111(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Rod Taylor <pg(at)rbt(dot)ca>
Cc: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-24 02:11:57
Message-ID: 20031124021157.72911.qmail@web21402.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

--- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> FK, primary, and unique constraints are already split out from the
> CREATE TABLE for performance reasons. We could think about folding them
> back in in a schema-only dump, but in a full dump I don't think it's
> negotiable --- you really want to load the table data before you install
> these constraints.

IMHO, not only data need to loaded before FK constraints are created but also
there has got to be a feature to allow creation of an FK constraint WITHOUT
doing the verification that all loaded/existing records satisfy the FK
constraint. The ability to create a FK constraint without verification of
existing records should exist by itself (maybe only to superuser) and also as
an option in pg_restore.

More details:
http://archives.postgresql.org/pgsql-admin/2003-11/msg00308.php
http://archives.postgresql.org/pgsql-admin/2003-11/msg00323.php

Thanks

__________________________________
Do you Yahoo!?
Free Pop-Up Blocker - Get it now
http://companion.yahoo.com/


From: prinsarian(at)zonnet(dot)nl (Arian Prins)
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Anyone working on pg_dump dependency ordering?
Date: 2003-11-25 08:39:10
Message-ID: 254336a1.0311250039.4ae8803a@posting.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

chriskl(at)familyhealth(dot)com(dot)au (Christopher Kings-Lynne) wrote in message news:<3FBEC8D3(dot)3040807(at)familyhealth(dot)com(dot)au>...
> Lastly, I presume it's possible to create a system of circular
> dependencies (eg create or replace view), which really cannot be solved
> without a system of 'shells', similar to that needed to dump types and
> their i/o functions.
>
> Views seem to be by far the nastiest object. They can be dependent on
> almost everything in the database.

Hello Group,

It might be an idea to keep track of all data-definition changes
during the lifetime of a database. Keep all the
SQL-definition-commands in a seperate systemtable somewhere. Then,
when the schema is dumped you'd have the choice of dumping:
- the most recent schema with database-determined ordering (as is the
case in the current situation)
- "replay" the "recorded" datadefinition of the past, so you know the
data definition is executed in a "sound" sequence.

Of course, data itself would be extracted seperate of the definitions
and there would also be the need to remove constrains while the data
is being loaded. This last thing could be done by parsing the
"recorded" data-definition-commands or by first blindly running the
"recorded" commands, then removing (or disabling) any constraints that
are there and finally replacing (or re-enabling) the constraints.

Negative side of this idea is obviously that you need to make changes
to a lot of internals of the postgresql database. AND you need to make
perfectly sure that the state of the database is always consistent
with the state of the recorded SQL.

Good Luck,
Arian.