pg_dump versus SERIAL, round N

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: "Morus Walter" <morus(dot)walter(at)experteer(dot)de>
Subject: pg_dump versus SERIAL, round N
Date: 2006-08-19 15:47:39
Message-ID: 27967.1156002459@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

We have still another complaint here:
http://archives.postgresql.org/pgsql-bugs/2006-08/msg00109.php
about pg_dump failing to cope nicely with any slightly-unusual
condition related to a SERIAL column. We've had previous
discussions about this, most recently this thread:
http://archives.postgresql.org/pgsql-hackers/2006-04/msg01118.php
but still no consensus on what to do.

I'm pretty well convinced at this point that having pg_dump dump
serial columns via SERIAL declarations is a failure. Doing it
that way means that the underlying sequence may change names
during reload, which pg_dump is not well prepared to cope with ---
we have a hack in place for setval() but not for anything else,
notably GRANT, nondefault ALTER SEQUENCE parameters, and references
to the sequence from other tables. Even if there were a reasonably
simple way to fix all those things, since when is it part of pg_dump's
charter to editorialize on your database schema? The goal ought to
be to reproduce the state of the database, not to "clean it up".

I think that we ought to make pg_dump dump these things as if they
were separate sequence objects, ie, in the style

CREATE SEQUENCE foo_bar_seq ... any nondefault parameters here ...

...

CREATE TABLE foo (
bar int default nextval('foo_bar_seq'),
...

This fixes most of the problems at a stroke. The one thing it breaks
is that after loading such a dump, there is no longer any dependency
from the sequence to the column, and thus dropping the column wouldn't
cause the sequence to go away, as it would've in the original database.

The new thought I had this morning was to fix that head-on: provide
a way for pg_dump to re-establish that dependency. If you were willing
to load the dump as superuser it could just INSERT a row into pg_depend,
but that's certainly not an acceptable assumption (and it wouldn't be
future-proof anyway; we learned the folly of that with pg_dumpall's
hacking on the pg_group table...) However, I see nothing wrong with
providing a slightly more abstract way of declaring that a sequence
"belongs to" a column. If we're willing to hack up the grammar a bit
we could make ALTER SEQUENCE do it, perhaps

ALTER SEQUENCE foo_bar_seq SERIAL FOR foo.bar;

which I would propose being allowed to anyone who owns both the sequence
and the table in question. Or the poor man's way to do it would
involve creating a pg_set_serial_sequence() function that does the
same thing. Further down the road we could consider allowing this
command to drop a serial-sequence association or reattach a sequence
to a different owning column, though pg_dump needs neither of these.

This seems no uglier to me than the existing pg_dump hack that uses
pg_get_serial_sequence(), and AFAICS it allows exact reproduction
of the state of the database, even in the face of renames, ALTER
COLUMN DEFAULT, etc.

In terms of the discussion I mentioned above, this amounts to embracing
the "SERIAL is a macro" philosophy rather than "SERIAL is a black box",
and recognizing that there's one little piece of the implementation
that still needs to be exposed so that we can describe exactly what
the macro consists of.

Comments?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2006-08-19 16:03:14 Re: pg_dump versus SERIAL, round N
Previous Message Tom Lane 2006-08-19 13:39:03 Re: Input Function (domain_in) Call