logical column position

Lists: pgsql-hackers
From: Neil Conway <neilc(at)samurai(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Dave Cramer <davec(at)fastcrypt(dot)com>
Subject: logical column position
Date: 2003-11-17 16:48:01
Message-ID: 87wu9zndq6.fsf@mailbox.samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'd like to add a new column to pg_attribute that specifies the
attribute's "logical position" within its relation. The idea here is
to separate the logical order of the columns in a relation from the
on-disk storage of the relation's tuples. This allows us to easily &
quickly change column order, add an additional column before or after
an existing column, etc.

At present, attnum basically does three things: identifies an column
within a relation, indicates which columns are system columns, and
defines the order of a relation's columns. I'd like to move this last
functionality into a separate pg_attribute column named "attpos" (or
"attlogicalpos"):

- when the table is created, attnum == attpos. System columns
have attpos < 0, as with attnum. At no point will two
columns of the same relation have the same attpos.

- when returning output to the client and no column ordering
is implied by the query (e.g. "SELECT * ..."), we sort the
columns in ascending attpos order.

- when storing a tuple on disk, we don't consider attpos

- if we want to change the order of the column's in a
relation, we can do so merely by updating pg_attribute; no
changes to the on-disk storage of the relation should be
necessary

A few notes:

(a) ISTM this should also apply to COPY TO and COPY FROM if the user
didn't supply a column list. Is this reasonable? It would break
dumps of the table's contents, but then again, dumps aren't
guaranteed to remain valid over arbitrary changes to the table's
meta-data.

(b) Using the above scheme that attnum == attpos initially, there
won't be any gaps in the sequence of attpos values. That means
that if, for example, we want to move the column in position 50
to position 1, we'll need to change the position's of all the
columns in positions [1..49] (and suffer the resulting MVCC
bloat in pg_attribute). Changing the column order is hardly a
performance critical operation, so that might be acceptable.

If we want to avoid this, one easy (but arguably unclean) way to
do so would be to make the initial value of attpos == attnum *
1000, and make attpos an int4 rather than an int2. Then, we can
do most column reordering operations with only a single
pg_attribute update -- in the worst-case that enough
re-orderings are done that we overflow the 999 "padding"
positions, we can just fall-back to doing multiple pg_attribute
updates. Is this worth doing, and/or is there a better way to
achieve the same effect?

(c) Do I need to consider inheritance?

Comments are welcome.

-Neil


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Dave Cramer <davec(at)fastcrypt(dot)com>
Subject: Re: logical column position
Date: 2003-11-18 01:40:15
Message-ID: 200311180140.hAI1eFD20021@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway wrote:
> I'd like to add a new column to pg_attribute that specifies the
> attribute's "logical position" within its relation. The idea here is
> to separate the logical order of the columns in a relation from the
> on-disk storage of the relation's tuples. This allows us to easily &
> quickly change column order, add an additional column before or after
> an existing column, etc.
>
> At present, attnum basically does three things: identifies an column
> within a relation, indicates which columns are system columns, and
> defines the order of a relation's columns. I'd like to move this last
> functionality into a separate pg_attribute column named "attpos" (or
> "attlogicalpos"):
>
> - when the table is created, attnum == attpos. System columns
> have attpos < 0, as with attnum. At no point will two
> columns of the same relation have the same attpos.
>
> - when returning output to the client and no column ordering
> is implied by the query (e.g. "SELECT * ..."), we sort the
> columns in ascending attpos order.

Seems the only cases where attpos would be used would be SELECT *,
INSERT with no column list, and COPY --- seems like a nifty feature.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical column position
Date: 2003-11-20 15:39:24
Message-ID: 4357.1069342764@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> At present, attnum basically does three things: identifies an column
> within a relation, indicates which columns are system columns, and
> defines the order of a relation's columns. I'd like to move this last
> functionality into a separate pg_attribute column named "attpos" (or
> "attlogicalpos"):

"attpos" is a horrid choice of name, because no one will be able to
remember which of "attnum" and "attpos" is which. Pick a more distinct
name. Offhand the best thing I can think of is "attlognum" or "attlogpos".

> - when the table is created, attnum == attpos. System columns
> have attpos < 0, as with attnum. At no point will two
> columns of the same relation have the same attpos.

What are you going to do with deleted columns? I'd be inclined to give
them all attlogpos = 0, but that destroys your last comment.

> (a) ISTM this should also apply to COPY TO and COPY FROM if the user
> didn't supply a column list. Is this reasonable?

Yes, also INSERT INTO, also the implicit ordering of output columns of a
JOIN, also the matching of aliases to columns in a FROM-list alias,
probably one or two other places. SQL exposes column ordering in more
places than just "SELECT *".

> If we want to avoid this, one easy (but arguably unclean) way to
> do so would be to make the initial value of attpos == attnum *
> 1000, and make attpos an int4 rather than an int2. Then, we can
> do most column reordering operations with only a single
> pg_attribute update -- in the worst-case that enough
> re-orderings are done that we overflow the 999 "padding"
> positions, we can just fall-back to doing multiple pg_attribute
> updates. Is this worth doing, and/or is there a better way to
> achieve the same effect?

That seems horribly messy. Just renumber.

> (c) Do I need to consider inheritance?

Yes. I think it'd be good if things were constrained so that columns
1..n in a parent table always matched columns 1..n in every child,
which is not true now after adding/dropping columns. That would make it
easier/cheaper/more reliable to match up which child columns are to be
referenced in an inherited query (see adjust_inherited_attrs). I think
the effective constraints would have to be about the same as what we now
impose on column names in an inheritance hierarchy.

You have not presented any proposal for exactly what ALTER TABLE
operations would be offered to manipulate the column positions.
My recollection is that some consensus was reached on that point
in the last thread we had on this issue --- have you consulted the
archives?

regards, tom lane


From: Alvaro Herrera Munoz <alvherre(at)dcc(dot)uchile(dot)cl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical column position
Date: 2003-11-20 15:57:16
Message-ID: 20031120155716.GC24110@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 20, 2003 at 10:39:24AM -0500, Tom Lane wrote:

> > (c) Do I need to consider inheritance?
>
> Yes. I think it'd be good if things were constrained so that columns
> 1..n in a parent table always matched columns 1..n in every child,
> which is not true now after adding/dropping columns. That would make it
> easier/cheaper/more reliable to match up which child columns are to be
> referenced in an inherited query (see adjust_inherited_attrs).

No way, because of multiple inheritance. Each child should have an
attparentnum, which would point to the parent's attnum for this to work ...

--
Alvaro Herrera (<alvherre[(at)]dcc(dot)uchile(dot)cl>)
"Aprender sin pensar es intil; pensar sin aprender, peligroso" (Confucio)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera Munoz <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Neil Conway <neilc(at)samurai(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical column position
Date: 2003-11-20 16:02:43
Message-ID: 4553.1069344163@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera Munoz <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> On Thu, Nov 20, 2003 at 10:39:24AM -0500, Tom Lane wrote:
> (c) Do I need to consider inheritance?
>>
>> Yes. I think it'd be good if things were constrained so that columns
>> 1..n in a parent table always matched columns 1..n in every child,
>> which is not true now after adding/dropping columns.

> No way, because of multiple inheritance. Each child should have an
> attparentnum, which would point to the parent's attnum for this to work ...

Hm, good point. And I think we merge identically-named columns
inherited from different parents, which would mean that "attparentnum"
wouldn't have a unique value anyway.

Perhaps rearranging a parent's columns shouldn't have *any* direct
effect on a child? Seems ugly though.

regards, tom lane


From: Neil Conway <neilc(at)samurai(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical column position
Date: 2003-11-21 00:10:24
Message-ID: 877k1ua8en.fsf@mailbox.samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> "attpos" is a horrid choice of name, because no one will be able to
> remember which of "attnum" and "attpos" is which. Pick a more
> distinct name. Offhand the best thing I can think of is "attlognum"
> or "attlogpos".

Actually, I deliberately chose attpos rather than attlognum (which is
what some people had been calling this feature earlier). My reasoning
was that the "logical number" is really a nonsensical idea: we just
invented it on the spot. In contrast, a "position" is a fairly natural
thing for an attribute to have -- it's a notion with some counterpart
in the real world. To me, at least, it seems intuitive that an
"attnum" would identify a column whereas an "attpos" would specify the
column's position.

I'm happy to change the name if there's a consensus that attpos isn't
a good choice -- what does everyone think?

> What are you going to do with deleted columns? I'd be inclined to
> give them all attlogpos = 0, but that destroys your last comment.

I hadn't planned to do anything in particular for deleted columns:
since they are never displayed to the user, does it matter what their
attpos is?

In any event, the property that no two columns in a table have the
same logical number isn't important anyway.

> You have not presented any proposal for exactly what ALTER TABLE
> operations would be offered to manipulate the column positions.

I'd like to get the backend storage side of things implemented
first. I'll take a look at the archives before I do any UI work --
thanks for the suggestion.

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical column position
Date: 2003-11-21 04:27:31
Message-ID: 9022.1069388851@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Neil Conway <neilc(at)samurai(dot)com> writes:
> Actually, I deliberately chose attpos rather than attlognum (which is
> what some people had been calling this feature earlier). My reasoning
> was that the "logical number" is really a nonsensical idea: we just
> invented it on the spot.

True ...

> In contrast, a "position" is a fairly natural
> thing for an attribute to have -- it's a notion with some counterpart
> in the real world.

But "position" could at least as logically be considered to mean the
physical position in the tuple. I still say that these names are ripe
for confusion.

I don't have a better choice of name offhand, but if we spend 1% of the
time already spent arguing about these issues on finding a better name,
I'm sure we can think of one ;-)

regards, tom lane


From: Rod Taylor <pg(at)rbt(dot)ca>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical column position
Date: 2003-11-21 04:40:26
Message-ID: 1069389625.32017.36.camel@jester
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> I don't have a better choice of name offhand, but if we spend 1% of the
> time already spent arguing about these issues on finding a better name,
> I'm sure we can think of one ;-)

virtual (attvirtnum)
external (attextnum)

atttisoywnum -> attribute this is the one you want number


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical column position
Date: 2003-11-21 14:30:11
Message-ID: 1069425011.10333.12494.camel@camel
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2003-11-20 at 23:27, Tom Lane wrote:
> Neil Conway <neilc(at)samurai(dot)com> writes:
> > Actually, I deliberately chose attpos rather than attlognum (which is
> > what some people had been calling this feature earlier). My reasoning
> > was that the "logical number" is really a nonsensical idea: we just
> > invented it on the spot.
>
> True ...
>
> > In contrast, a "position" is a fairly natural
> > thing for an attribute to have -- it's a notion with some counterpart
> > in the real world.
>
> But "position" could at least as logically be considered to mean the
> physical position in the tuple. I still say that these names are ripe
> for confusion.
>
> I don't have a better choice of name offhand, but if we spend 1% of the
> time already spent arguing about these issues on finding a better name,
> I'm sure we can think of one ;-)
>

Seems merging the two would work... attlogpos, the attributes logical
position.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


From: Neil Conway <neilc(at)samurai(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: logical column position
Date: 2003-11-24 23:28:33
Message-ID: 87d6bh4a8u.fsf@mailbox.samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> writes:
> Seems merging the two would work... attlogpos, the attributes
> logical position.

Unless anyone has any further objections, I'll switch to using attlogpos.

-Neil