Re: pg_migrator versus inherited columns

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: pg_migrator versus inherited columns
Date: 2009-07-01 22:36:19
Message-ID: 17540.1246487779@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I was testing pg_migrator the other day on the regression database,
and found out that it doesn't work:

Restoring database schema psql:/home/postgres/pg_migrator_dump_db.sql:4545: ERROR: column "........pg.dropped.1........" of relation "dropcolumnchild" does not exist

Some investigation found out the reason. pg_migrator needs to be able
to create tables that exactly match the physical column layout of the
old database, including dropped columns if any. The way that it does
this is that the --binary-upgrade switch added to pg_dump causes pg_dump
to go ahead and include dropped columns in CREATE TABLE commands, and
then immediately drop them. That's okay for simple tables, but it's
got exactly zero hope of working in inheritance scenarios. A column
that was created and later dropped in a parent table might or might not
exist (though now dropped) in child tables.

More generally, the approach is fatally broken for inheritance children
even if they contain no dropped columns. Adding a column to a parent
table results in child-table column ordering that is different from what
you get after a dump and reload, for example after

create table p (f1 int, f2 int);
create table c (f3 int) inherits (p);
alter table p add column f4 int;

the column order in c is f1,f2,f3,f4 ... but after dump and reload
it will be f1,f2,f4,f3, because f4 will be part of the initial
definition of p rather than getting tacked on later.

We understood years ago that pg_dump has to take extra measures to deal
with this --- that's why it has to use COPY with a column list.

The only solution I can see that has a chance of working without a
massive amount of new logic in pg_dump is to change the way that
inheritance is managed. Instead of dumping the above situation as

create table p (f1 int, f2 int, f4 int);
create table c (f3 int) inherits (p);

I propose that when --binary-upgrade is active, it should be dumped as

create table p (f1 int, f2 int, f4 int);
create table c (f1 int, f2 int, f4 int, f3 int);
alter table c add inherit p;

(It's a good thing we added ADD INHERIT in 8.4, or we'd be completely
up the creek here.) In this way we can ensure that the physical order
of columns really is what it needs to be in the child table. Dropped
columns can then be managed in the same way as the current code does,
but it'll actually work. (We have to drop them before doing the
ADD INHERIT of course.)

Comments?

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_migrator versus inherited columns
Date: 2009-07-02 00:17:42
Message-ID: 407d949e0907011717h2f44a8e6tf8a41e472c20c760@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 1, 2009 at 11:36 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> (It's a good thing we added ADD INHERIT in 8.4, or we'd be completely
> up the creek here.)  In this way we can ensure that the physical order
> of columns really is what it needs to be in the child table.  Dropped
> columns can then be managed in the same way as the current code does,
> but it'll actually work.  (We have to drop them before doing the
> ADD INHERIT of course.)
>
> Comments?

Uhm, we've had ADD INHERIT since 8.2 -- that was my first patch.

This will result in all the columns being marked attislocal. Ie, if
they're dropped from the parent they won't be dropped automatically
from the children any longer.

Frankly I never really liked attislocal -- it seems to me the user
knows when he's dropping the column whether he wants it dropped from
the children and should be able to explicitly request it to cascade or
not. I can't imagine many use cases where you want it to cascade to
some children but not others based on how they were originally
created.

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_migrator versus inherited columns
Date: 2009-07-02 00:25:56
Message-ID: 8479.1246494356@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Wed, Jul 1, 2009 at 11:36 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Comments?

> Uhm, we've had ADD INHERIT since 8.2 -- that was my first patch.

Hmm, 8.3 doesn't seem to recognize the syntax:

regression=# alter table c add inherit p;
ERROR: type "p" does not exist
LINE 1: alter table c add inherit p;
^

> This will result in all the columns being marked attislocal. Ie, if
> they're dropped from the parent they won't be dropped automatically
> from the children any longer.

Good point. We could have pg_dump fix that up, I suppose. On the other
hand, I'm not entirely sure that the current dump methodology guarantees
to preserve attislocal correctly anyway.

> Frankly I never really liked attislocal -- it seems to me the user
> knows when he's dropping the column whether he wants it dropped from
> the children and should be able to explicitly request it to cascade or
> not.

The original discussions about attislocal/attinhcount came up with some
cases that seemed to make it necessary, but I've long forgotten the
details.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_migrator versus inherited columns
Date: 2009-07-02 00:51:04
Message-ID: 407d949e0907011751g170cc2ffrc6abf782de85c7e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 2, 2009 at 1:25 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Hmm, 8.3 doesn't seem to recognize the syntax:
>
> regression=# alter table c add inherit p;
> ERROR:  type "p" does not exist
> LINE 1: alter table c add inherit p;
>                                  ^

Oh I remember being caught by this myself. The above is trying to add
a new column named "inherit". The syntax to add an inheritance parent
is just "alter table c inherit p"

--
greg
http://mit.edu/~gsstark/resume.pdf