Re: WIP: extensible enums

Lists: pgsql-hackers
From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: WIP: extensible enums
Date: 2010-08-23 09:35:09
Message-ID: 4C7240CD.8090306@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Attached is a WIP patch that allows enums to be extended with additional
labels arbitrarily. As previously discussed, it works by adding an
explicit sort order column to pg_enum. It keeps track of whether the
labels are correctly sorted by oid value, and if so uses that for
comparison, so the possible performance impact on existing uses, and on
almost all cases where a label is added at the end of the list, should
be negligible.

Open items include

* some additional error checking required
* missing documentation
* pg_upgrade considerations

To add a label at the end of the list, do:

ALTER TYPE myenum ADD 'newlabel';

To add a label somewhere else, do:

ALTER TYPE myenum ADD 'newlabel' BEFORE 'existinglabel';

or

ALTER TYPE myenum ADD 'newlabel' AFTER 'existinglabel';

I'm not wedded to the syntax. Let the bikeshedding begin.

cheers

andrew

Attachment Content-Type Size
venum.patch text/x-patch 95.7 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 15:49:41
Message-ID: 1282578489-sup-9412@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Andrew Dunstan's message of lun ago 23 05:35:09 -0400 2010:

> To add a label at the end of the list, do:
>
> ALTER TYPE myenum ADD 'newlabel';
>
> To add a label somewhere else, do:
>
> ALTER TYPE myenum ADD 'newlabel' BEFORE 'existinglabel';
>
> or
>
> ALTER TYPE myenum ADD 'newlabel' AFTER 'existinglabel';

What do you need AFTER for? Seems to me that BEFORE should be enough.
(You already have the unadorned syntax for adding an item after the last
one, which is the corner case that BEFORE alone doesn't cover).

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 16:56:00
Message-ID: 4E4E37D1-D13F-4905-A6E7-E8CA08A35E49@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug 23, 2010, at 2:35 AM, Andrew Dunstan wrote:

> I'm not wedded to the syntax. Let the bikeshedding begin.

Seems pretty good to me as-is.

David


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 17:42:29
Message-ID: 62936.203.166.48.226.1282585349.squirrel@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, August 23, 2010 11:49 am, Alvaro Herrera wrote:
> Excerpts from Andrew Dunstan's message of lun ago 23 05:35:09 -0400 2010:
>
>> To add a label at the end of the list, do:
>>
>> ALTER TYPE myenum ADD 'newlabel';
>>
>> To add a label somewhere else, do:
>>
>> ALTER TYPE myenum ADD 'newlabel' BEFORE 'existinglabel';
>>
>> or
>>
>> ALTER TYPE myenum ADD 'newlabel' AFTER 'existinglabel';
>
> What do you need AFTER for? Seems to me that BEFORE should be enough.
> (You already have the unadorned syntax for adding an item after the last
> one, which is the corner case that BEFORE alone doesn't cover).
>

You're right. Strictly speaking we don't need it. But it doesn't hurt much
to provide it for a degree of symmetry.

cheers

andrew


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 17:53:05
Message-ID: 65021.203.166.48.226.1282585985.squirrel@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, August 23, 2010 11:49 am, Alvaro Herrera wrote:
> Excerpts from Andrew Dunstan's message of lun ago 23 05:35:09 -0400 2010:
>
>> To add a label at the end of the list, do:
>>
>> ALTER TYPE myenum ADD 'newlabel';
>>
>> To add a label somewhere else, do:
>>
>> ALTER TYPE myenum ADD 'newlabel' BEFORE 'existinglabel';
>>
>> or
>>
>> ALTER TYPE myenum ADD 'newlabel' AFTER 'existinglabel';
>
> What do you need AFTER for? Seems to me that BEFORE should be enough.
> (You already have the unadorned syntax for adding an item after the last
> one, which is the corner case that BEFORE alone doesn't cover).
>

You're right. Strictly speaking we don't need it. But it doesn't hurt much
to provide it for a degree of symmetry.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 17:54:40
Message-ID: 4759.1282586080@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
> On Mon, August 23, 2010 11:49 am, Alvaro Herrera wrote:
>> What do you need AFTER for? Seems to me that BEFORE should be enough.
>> (You already have the unadorned syntax for adding an item after the last
>> one, which is the corner case that BEFORE alone doesn't cover).

> You're right. Strictly speaking we don't need it. But it doesn't hurt much
> to provide it for a degree of symmetry.

I'm with Alvaro: drop the AFTER variant. It provides more than one way
to do the same thing, which isn't that exciting, and it's also going to
make it harder to document the performance issues. Without that, you
can just say "ADD BEFORE will make the enum slower, but plain ADD won't"
(ignoring the issue of OID wraparound, which'll confuse matters in any
case).

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 18:01:08
Message-ID: 4C72B764.5020003@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> You're right. Strictly speaking we don't need it. But it doesn't hurt much
> to provide it for a degree of symmetry.

Swami Josh predicts that if we don't add AFTER now, we'll be adding it
in 2 years when enough people complain about it.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Thom Brown <thom(at)linux(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 18:09:12
Message-ID: AANLkTimSZ7kTbvA7dNYfvpYF0OgXj07qGca_iZoao_OW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 23 August 2010 10:35, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> Attached is a WIP patch that allows enums to be extended with additional
> labels arbitrarily. As previously discussed, it works by adding an explicit
> sort order column to pg_enum. It keeps track of whether the labels are
> correctly sorted by oid value, and if so uses that for comparison, so the
> possible performance impact on existing uses, and on almost all cases where
> a label is added at the end of the list, should be negligible.
>
> Open items include
>
>   * some additional error checking required
>   * missing documentation
>   * pg_upgrade considerations
>
>
> To add a label at the end of the list, do:
>
>  ALTER TYPE myenum ADD 'newlabel';
>
> To add a label somewhere else, do:
>
>  ALTER TYPE myenum ADD 'newlabel' BEFORE 'existinglabel';
>
> or
>
>  ALTER TYPE myenum ADD 'newlabel' AFTER 'existinglabel';
>
>
> I'm not wedded to the syntax. Let the bikeshedding begin.
>
> cheers
>
> andrew

When you write the supporting doc changes, you might want to add a
note in to mention that you cannot remove a label once it has been
added.

Will the modified enums remain intact after a dump/restore?

--
Thom Brown
Registered Linux user: #516935


From: David Fetter <david(at)fetter(dot)org>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 18:13:07
Message-ID: 20100823181307.GA3253@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 23, 2010 at 11:49:41AM -0400, Alvaro Herrera wrote:
> Excerpts from Andrew Dunstan's message of lun ago 23 05:35:09 -0400 2010:
>
> > To add a label at the end of the list, do:
> >
> > ALTER TYPE myenum ADD 'newlabel';
> >
> > To add a label somewhere else, do:
> >
> > ALTER TYPE myenum ADD 'newlabel' BEFORE 'existinglabel';
> >
> > or
> >
> > ALTER TYPE myenum ADD 'newlabel' AFTER 'existinglabel';
>
> What do you need AFTER for? Seems to me that BEFORE should be enough.
> (You already have the unadorned syntax for adding an item after the last
> one, which is the corner case that BEFORE alone doesn't cover).

Making things easier for the users is a good thing all by itself :)

+1 for including both BEFORE and AFTER. Would it be worth it to allow
something like FIRST and LAST?

ALTER TYPE myenum ADD 'newlabel' FIRST;

ALTER TYPE myenum ADD 'newlabel' LAST;

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: David Fetter <david(at)fetter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 18:14:06
Message-ID: 20100823181406.GB3253@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 23, 2010 at 01:54:40PM -0400, Tom Lane wrote:
> "Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
> > On Mon, August 23, 2010 11:49 am, Alvaro Herrera wrote:
> >> What do you need AFTER for? Seems to me that BEFORE should be
> >> enough. (You already have the unadorned syntax for adding an
> >> item after the last one, which is the corner case that BEFORE
> >> alone doesn't cover).
>
> > You're right. Strictly speaking we don't need it. But it doesn't
> > hurt much to provide it for a degree of symmetry.
>
> I'm with Alvaro: drop the AFTER variant. It provides more than one
> way to do the same thing, which isn't that exciting,

Not to you, maybe, but to users, it's really handy to have intuitive,
rather than strictly orthogonal, ways to do things.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 18:16:45
Message-ID: 8370.1282587405@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Swami Josh predicts that if we don't add AFTER now, we'll be adding it
> in 2 years when enough people complain about it.

If it's not there, no one will ever miss it. You might as well argue
that there should be a way of creating a foreign key reference by
ALTER'ing the referenced table instead of the referencing table.
Sure, if the SQL committee was into symmetry, they might have provided
such a thing. But they didn't and no one misses it.

regards, tom lane


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 18:25:42
Message-ID: AANLkTim5sOrtq9mxHDEqmX-Ez-Vgx+ssxdPf-ymRC2G1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 23, 2010 at 1:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
>> On Mon, August 23, 2010 11:49 am, Alvaro Herrera wrote:
>>> What do you need AFTER for?  Seems to me that BEFORE should be enough.
>>> (You already have the unadorned syntax for adding an item after the last
>>> one, which is the corner case that BEFORE alone doesn't cover).
>
>> You're right. Strictly speaking we don't need it. But it doesn't hurt much
>> to provide it for a degree of symmetry.
>
> I'm with Alvaro: drop the AFTER variant.  It provides more than one way
> to do the same thing, which isn't that exciting, and it's also going to
> make it harder to document the performance issues.  Without that, you
> can just say "ADD BEFORE will make the enum slower, but plain ADD won't"
> (ignoring the issue of OID wraparound, which'll confuse matters in any
> case).

But what if you want to insert an OID at the end? You can't do it if
all you've got is BEFORE:

CREATE TYPE colors AS ENUM ('red', 'green', 'blue');

If I want it to become ('red', 'green', 'blue', 'orange'), what am I to do?


From: Thom Brown <thom(at)linux(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 18:29:33
Message-ID: AANLkTinow=Ta--VA36qkYeCprZEcRyMjB6nwwxBn+vwE@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 23 August 2010 19:25, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> On Mon, Aug 23, 2010 at 1:54 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> "Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
>>> On Mon, August 23, 2010 11:49 am, Alvaro Herrera wrote:
>>>> What do you need AFTER for?  Seems to me that BEFORE should be enough.
>>>> (You already have the unadorned syntax for adding an item after the last
>>>> one, which is the corner case that BEFORE alone doesn't cover).
>>
>>> You're right. Strictly speaking we don't need it. But it doesn't hurt much
>>> to provide it for a degree of symmetry.
>>
>> I'm with Alvaro: drop the AFTER variant.  It provides more than one way
>> to do the same thing, which isn't that exciting, and it's also going to
>> make it harder to document the performance issues.  Without that, you
>> can just say "ADD BEFORE will make the enum slower, but plain ADD won't"
>> (ignoring the issue of OID wraparound, which'll confuse matters in any
>> case).
>
> But what if you want to insert an OID at the end?  You can't do it if
> all you've got is BEFORE:
>
> CREATE TYPE colors AS ENUM ('red', 'green', 'blue');
>
> If I want it to become ('red', 'green', 'blue', 'orange'), what am I to do?
>

ALTER TYPE colors ADD 'orange';

--
Thom Brown
Registered Linux user: #516935


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 19:06:26
Message-ID: 9988.1282590386@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thom Brown <thom(at)linux(dot)com> writes:
> On 23 August 2010 19:25, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> But what if you want to insert an OID at the end?

> ALTER TYPE colors ADD 'orange';

Alternatively, if people are dead set on symmetry, what we should do
to simplify is drop *this* syntax, and just have the BEFORE and AFTER
variants.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 19:08:58
Message-ID: 4C72C74A.8080802@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> If it's not there, no one will ever miss it. You might as well argue
> that there should be a way of creating a foreign key reference by
> ALTER'ing the referenced table instead of the referencing table.
> Sure, if the SQL committee was into symmetry, they might have provided
> such a thing. But they didn't and no one misses it.

That's a very different situation, since the relationship is not
symmetrical, and it would take far more than a single keyword. Analogy
fail.

And one of the reasons people don't miss it is because far too many
users don't use FKs in the first place. ;-( The only reason why users
wouldn't notice the absence of AFTER (or, more likely, try it and then
ask on IRC for error message diagnosis) is because they're not using the
feature. (In which case it doesn't matter how it operates)

Docs which say "Add new enums BEFORE the enum you want to add them to,
and if you need to add an enum at the end, then add it without the
BEFORE keyword" is unnecessarily confusing to users. Saying "Add new
enum values using the BEFORE or AFTER keyword before or after the
appropriate value" is vastly easier to understand.

I really don't see the value in making a command substantially less
intuitive in order to avoid a single keyword, unless it affects areas of
Postgres outside of this particular command.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thom Brown <thom(at)linux(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 19:20:15
Message-ID: 4C72C9EF.6010307@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 23/08/10 22:06, Tom Lane wrote:
> Thom Brown<thom(at)linux(dot)com> writes:
>> On 23 August 2010 19:25, Joseph Adams<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>>> But what if you want to insert an OID at the end?
>
>> ALTER TYPE colors ADD 'orange';
>
> Alternatively, if people are dead set on symmetry, what we should do
> to simplify is drop *this* syntax, and just have the BEFORE and AFTER
> variants.

Then you need to know the last existing value to add a new one to the
end. Seems awkward.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 19:20:39
Message-ID: 10301.1282591239@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> I really don't see the value in making a command substantially less
> intuitive in order to avoid a single keyword, unless it affects areas of
> Postgres outside of this particular command.

It's the three variants to do two things that I find unintuitive.

As I mentioned a minute ago, dropping the "abbreviated" syntax and
just having BEFORE and AFTER would be a good way of achieving
symmetry if you find that important.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thom Brown <thom(at)linux(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 19:25:54
Message-ID: AANLkTikvNBpdU3j0iKW599pkCZ=hV4X9tn_4-4UBU78Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 23, 2010 at 3:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thom Brown <thom(at)linux(dot)com> writes:
>> On 23 August 2010 19:25, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>>> But what if you want to insert an OID at the end?
>
>> ALTER TYPE colors ADD 'orange';
>
> Alternatively, if people are dead set on symmetry, what we should do
> to simplify is drop *this* syntax, and just have the BEFORE and AFTER
> variants.

FWIW, I think Andrew's originally proposed syntax is fine and useful,
and we should just go with it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Thom Brown" <thom(at)linux(dot)com>, "Joseph Adams" <joeyadams3(dot)14159(at)gmail(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 20:57:20
Message-ID: 49618.203.166.48.226.1282597040.squirrel@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, August 23, 2010 3:20 pm, Heikki Linnakangas wrote:
> On 23/08/10 22:06, Tom Lane wrote:
>> Thom Brown<thom(at)linux(dot)com> writes:
>>> On 23 August 2010 19:25, Joseph Adams<joeyadams3(dot)14159(at)gmail(dot)com>
>>> wrote:
>>>> But what if you want to insert an OID at the end?
>>
>>> ALTER TYPE colors ADD 'orange';
>>
>> Alternatively, if people are dead set on symmetry, what we should do
>> to simplify is drop *this* syntax, and just have the BEFORE and AFTER
>> variants.
>
> Then you need to know the last existing value to add a new one to the
> end. Seems awkward.
>

I agree. This is a non-starter, I think. The most common case in my
experience is where the user doesn't care at all about the order, and just
wants to add a new label. We should make that as easy as possible,
especially since it's the most efficient.

cheers

andrew


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 21:34:12
Message-ID: BDB588B1-4AE2-4DD0-852A-AED8A10E5EA7@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Aug 23, 2010, at 12:20 PM, Tom Lane wrote:

> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> I really don't see the value in making a command substantially less
>> intuitive in order to avoid a single keyword, unless it affects areas of
>> Postgres outside of this particular command.
>
> It's the three variants to do two things that I find unintuitive.

I strongly suspect that you are in the minority on this one.

Best,

David


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 22:36:23
Message-ID: 4C72F7E7.2020904@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 8/23/10 12:20 PM, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> I really don't see the value in making a command substantially less
>> intuitive in order to avoid a single keyword, unless it affects areas of
>> Postgres outside of this particular command.
>
> It's the three variants to do two things that I find unintuitive.

Actually, it's 3 different things:

1. BEFORE adds a value before the value cited.
2. AFTER adds a value after the value cited.
3. unqualified adds a value at the end.

The fact that AFTER allows you to add a value at the end is
circumstantial overlap. While executing an AFTER, you wouldn't *know*
that you were adding it to the end, necessarily.

The other reason to have AFTER is that, in scripts, the user may not
have the before value handy due to context (i.e. dynamically building an
enum).

Anyway, this'll still be useful with BEFORE only. I'm just convinced
that we'll end up adding AFTER in 9.2 or 9.3 after we get a bunch of
user complaints and questions. So why not add it now?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 23:12:02
Message-ID: 201008232312.o7NNC2r15738@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> On 8/23/10 12:20 PM, Tom Lane wrote:
> > Josh Berkus <josh(at)agliodbs(dot)com> writes:
> >> I really don't see the value in making a command substantially less
> >> intuitive in order to avoid a single keyword, unless it affects areas of
> >> Postgres outside of this particular command.
> >
> > It's the three variants to do two things that I find unintuitive.
>
> Actually, it's 3 different things:
>
> 1. BEFORE adds a value before the value cited.
> 2. AFTER adds a value after the value cited.
> 3. unqualified adds a value at the end.
>
> The fact that AFTER allows you to add a value at the end is
> circumstantial overlap. While executing an AFTER, you wouldn't *know*
> that you were adding it to the end, necessarily.
>
> The other reason to have AFTER is that, in scripts, the user may not
> have the before value handy due to context (i.e. dynamically building an
> enum).
>
> Anyway, this'll still be useful with BEFORE only. I'm just convinced
> that we'll end up adding AFTER in 9.2 or 9.3 after we get a bunch of
> user complaints and questions. So why not add it now?

CREATE ENUM in PG 9.0 allows you to create an enum with no columns,
e.g.:

test=> CREATE TYPE etest AS ENUM ();
CREATE TYPE

so I think we have to have the ability add an enum without a
before/after. This ability was added for pg_upgrade.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 23:34:26
Message-ID: 201008232334.o7NNYQK19748@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
> Attached is a WIP patch that allows enums to be extended with additional
> labels arbitrarily. As previously discussed, it works by adding an
> explicit sort order column to pg_enum. It keeps track of whether the
> labels are correctly sorted by oid value, and if so uses that for
> comparison, so the possible performance impact on existing uses, and on
> almost all cases where a label is added at the end of the list, should
> be negligible.
>
> Open items include
>
> * some additional error checking required
> * missing documentation
> * pg_upgrade considerations

I looked at the pg_upgrade ramifications of this change and it seems
some adjustments will have to be made. Right now pg_upgrade creates an
empty enum type:

CREATE TYPE etest AS ENUM ();

and then it calls EnumValuesCreate() to create the enum labels.
EnumValuesCreate() is called from within DefineEnum() where the enum
type is created, and that assumes the enums are always created initially
sorted. That would not be true when pg_upgrade is calling
EnumValuesCreate() directly with oid assignment as part of an upgrade.

I think the cleanest solution would be to modify pg_dump.c so that it
creates an empty ENUM type like before, but uses the new ALTER TYPE
myenum ADD 'newlabel' syntax to add the enum labels (with oid assignment
like we do for CREATE TYPE, etc.) The existing code had to hack to call
EnumValuesCreate() but with this new syntax it will no longer be
necessary. The call to EnumValuesCreate() for enums is the only time
pg_upgrade_support calls into a function rather than just assigning an
oid to a global variable, so it would be great to remove that last
direct function call usage.

Does this code handle the case where CREATE ENUM oid wraps around while
the enum label oids are being assigned? Does our existing code handle
that case?

I also noticed you grab an oid for the new type using the oid counter
without trying to make it in sorted order. Seems that would be possible
for adding enums to the end of the list, and might not be worth it. A
quick hack might be to just try of an oid+1 from the last enum and see
if that causes a conflict with the pg_enum.oid index.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-23 23:44:43
Message-ID: 15719.1282607083@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)kineticode(dot)com> writes:
> On Aug 23, 2010, at 12:20 PM, Tom Lane wrote:
>> It's the three variants to do two things that I find unintuitive.

> I strongly suspect that you are in the minority on this one.

Yeah, seems like I'm losing the argument. Oh well.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-24 00:26:10
Message-ID: 16345.1282609570@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> I also noticed you grab an oid for the new type using the oid counter
> without trying to make it in sorted order. Seems that would be possible
> for adding enums to the end of the list, and might not be worth it. A
> quick hack might be to just try of an oid+1 from the last enum and see
> if that causes a conflict with the pg_enum.oid index.

That wouldn't be race-condition-safe.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-24 08:53:24
Message-ID: 4C738884.5050307@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/23/2010 07:12 PM, Bruce Momjian wrote:
> Josh Berkus wrote:
>> On 8/23/10 12:20 PM, Tom Lane wrote:
>>> Josh Berkus<josh(at)agliodbs(dot)com> writes:
>>>> I really don't see the value in making a command substantially less
>>>> intuitive in order to avoid a single keyword, unless it affects areas of
>>>> Postgres outside of this particular command.
>>> It's the three variants to do two things that I find unintuitive.
>> Actually, it's 3 different things:
>>
>> 1. BEFORE adds a value before the value cited.
>> 2. AFTER adds a value after the value cited.
>> 3. unqualified adds a value at the end.
>>
>> The fact that AFTER allows you to add a value at the end is
>> circumstantial overlap. While executing an AFTER, you wouldn't *know*
>> that you were adding it to the end, necessarily.
>>
>> The other reason to have AFTER is that, in scripts, the user may not
>> have the before value handy due to context (i.e. dynamically building an
>> enum).
>>
>> Anyway, this'll still be useful with BEFORE only. I'm just convinced
>> that we'll end up adding AFTER in 9.2 or 9.3 after we get a bunch of
>> user complaints and questions. So why not add it now?
> CREATE ENUM in PG 9.0 allows you to create an enum with no columns,
> e.g.:
>
> test=> CREATE TYPE etest AS ENUM ();
> CREATE TYPE
>
> so I think we have to have the ability add an enum without a
> before/after. This ability was added for pg_upgrade.
>

No we don't. pg_upgrade calls a C function. There is no support for this
at the SQL level AIUI. And the ability to add labels at arbitrary
positions in the sort order is an essential part of this feature.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-24 12:07:56
Message-ID: 201008241207.o7OC7ug05594@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
>
> On 08/23/2010 07:12 PM, Bruce Momjian wrote:
> > Josh Berkus wrote:
> >> On 8/23/10 12:20 PM, Tom Lane wrote:
> >>> Josh Berkus<josh(at)agliodbs(dot)com> writes:
> >>>> I really don't see the value in making a command substantially less
> >>>> intuitive in order to avoid a single keyword, unless it affects areas of
> >>>> Postgres outside of this particular command.
> >>> It's the three variants to do two things that I find unintuitive.
> >> Actually, it's 3 different things:
> >>
> >> 1. BEFORE adds a value before the value cited.
> >> 2. AFTER adds a value after the value cited.
> >> 3. unqualified adds a value at the end.
> >>
> >> The fact that AFTER allows you to add a value at the end is
> >> circumstantial overlap. While executing an AFTER, you wouldn't *know*
> >> that you were adding it to the end, necessarily.
> >>
> >> The other reason to have AFTER is that, in scripts, the user may not
> >> have the before value handy due to context (i.e. dynamically building an
> >> enum).
> >>
> >> Anyway, this'll still be useful with BEFORE only. I'm just convinced
> >> that we'll end up adding AFTER in 9.2 or 9.3 after we get a bunch of
> >> user complaints and questions. So why not add it now?
> > CREATE ENUM in PG 9.0 allows you to create an enum with no columns,
> > e.g.:
> >
> > test=> CREATE TYPE etest AS ENUM ();
> > CREATE TYPE
> >
> > so I think we have to have the ability add an enum without a
> > before/after. This ability was added for pg_upgrade.
> >
>
> No we don't. pg_upgrade calls a C function. There is no support for this
> at the SQL level AIUI. And the ability to add labels at arbitrary
> positions in the sort order is an essential part of this feature.

pg_upgrade calls a C API to add labels, but the ability to create an
enum with no labels is supported at the SQL level, as I showed above. I
am not saying we don't need before/after, but I am saying we need the
ability to add labels without using before/after because there are no
labels in an empty enum.

I am not sure what you are arguing for/against. I thought we were
agreed to allow before/after, and no specification too. I am just
pointing out that we need the "no specification" syntax for logical as
well as practical reasons.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-25 07:29:23
Message-ID: 4C74C653.1070200@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/23/2010 07:34 PM, Bruce Momjian wrote:
> I looked at the pg_upgrade ramifications of this change and it seems
> some adjustments will have to be made. Right now pg_upgrade creates an
> empty enum type:
>
> CREATE TYPE etest AS ENUM ();
>
> and then it calls EnumValuesCreate() to create the enum labels.
> EnumValuesCreate() is called from within DefineEnum() where the enum
> type is created, and that assumes the enums are always created initially
> sorted. That would not be true when pg_upgrade is calling
> EnumValuesCreate() directly with oid assignment as part of an upgrade.
>
> I think the cleanest solution would be to modify pg_dump.c so that it
> creates an empty ENUM type like before, but uses the new ALTER TYPE
> myenum ADD 'newlabel' syntax to add the enum labels (with oid assignment
> like we do for CREATE TYPE, etc.) The existing code had to hack to call
> EnumValuesCreate() but with this new syntax it will no longer be
> necessary. The call to EnumValuesCreate() for enums is the only time
> pg_upgrade_support calls into a function rather than just assigning an
> oid to a global variable, so it would be great to remove that last
> direct function call usage.
>

I've just been taking another look at this suggestion. I think it will
work quite cleanly. As long as we add the enums in the correct order it
should just do the Right Thing (tm).

To answer your other question, Oid wraparound will not be a problem.

Will get coding.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-26 09:24:51
Message-ID: 4C7632E3.7010200@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/25/2010 03:29 AM, Andrew Dunstan wrote:
>
>
> I've just been taking another look at this suggestion. I think it will
> work quite cleanly. As long as we add the enums in the correct order
> it should just do the Right Thing (tm).
>
> To answer your other question, Oid wraparound will not be a problem.
>
> Will get coding.
>
>

Revised patch with pg_dump and pg_upgrade support is attached.

cheers

andrew

Attachment Content-Type Size
patchenum2 text/plain 94.4 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-08-26 09:56:23
Message-ID: 4C763A47.1000105@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/26/2010 05:24 AM, Andrew Dunstan wrote:
>
>
> On 08/25/2010 03:29 AM, Andrew Dunstan wrote:
>>
>>
>> I've just been taking another look at this suggestion. I think it
>> will work quite cleanly. As long as we add the enums in the correct
>> order it should just do the Right Thing (tm).
>>
>> To answer your other question, Oid wraparound will not be a problem.
>>
>> Will get coding.
>>
>>
>
> Revised patch with pg_dump and pg_upgrade support is attached.
>
>
>

This time in context diff format.

cheers

andrew

Attachment Content-Type Size
venum2.patch text/x-patch 108.7 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-09-29 19:46:10
Message-ID: 4CA39782.9040502@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Attached is a a slightly updated version of this with the bitrot fixed.

cheers

andrew

Attachment Content-Type Size
venum3.patch text/x-patch 109.5 KB

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-01 08:35:12
Message-ID: AANLkTinO42TEbcrJhArHUCeSNoL1L+PWL2a7mc7Sva2C@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 29 September 2010 20:46, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> Attached is a a slightly updated version of this with the bitrot fixed.
>
> cheers
>
> andrew
>

Hi,

I had a quick look at this last night. I haven't had time to give it a
full review, but I did spot a couple of things:

1). It still has no docs.

2). In enum_ccmp(), when you cache the full list of elements, you're
not updating mycache->sort_list_length, so it will keep fetching the
full list each time. Also, I think that function could use a few more
comments.

3). I think you need to update psql so that \dT+ returns the enum
elements in the right order.

Otherwise I like it, and I definitely prefer the flexibility that this
syntax gives.

Regards,
Dean


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-01 11:12:24
Message-ID: 4CA5C218.4030006@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/01/2010 04:35 AM, Dean Rasheed wrote:
> 2). In enum_ccmp(), when you cache the full list of elements, you're
> not updating mycache->sort_list_length, so it will keep fetching the
> full list each time. Also, I think that function could use a few more
> comments.

Good catch. Will fix.

> 3). I think you need to update psql so that \dT+ returns the enum
> elements in the right order.

Yeah. Will do.

I will post a revised patch soon.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-13 06:08:56
Message-ID: AANLkTin3FXrSBcxLN+4F+LZ=Mo4o=cX-jwf4kdprYw4s@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 1, 2010 at 7:12 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On 10/01/2010 04:35 AM, Dean Rasheed wrote:
>>
>> 2). In enum_ccmp(), when you cache the full list of elements, you're
>> not updating mycache->sort_list_length, so it will keep fetching the
>> full list each time. Also, I think that function could use a few more
>> comments.
>
> Good catch. Will fix.
>
>> 3). I think you need to update psql so that \dT+ returns the enum
>> elements in the right order.
>
> Yeah. Will do.
>
> I will post a revised patch soon.

Should we postpone this to the next CommitFest?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-13 11:33:58
Message-ID: 4CB59926.7050108@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/13/2010 02:08 AM, Robert Haas wrote:
> On Fri, Oct 1, 2010 at 7:12 AM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>> On 10/01/2010 04:35 AM, Dean Rasheed wrote:
>>> 2). In enum_ccmp(), when you cache the full list of elements, you're
>>> not updating mycache->sort_list_length, so it will keep fetching the
>>> full list each time. Also, I think that function could use a few more
>>> comments.
>> Good catch. Will fix.
>>
>>> 3). I think you need to update psql so that \dT+ returns the enum
>>> elements in the right order.
>> Yeah. Will do.
>>
>> I will post a revised patch soon.
> Should we postpone this to the next CommitFest?

Sorry, got distracted. Here's a new patch that fixes the above and also
contains some documentation.

cheers

andrew

Attachment Content-Type Size
venum4.patch text/x-patch 117.0 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-13 22:17:45
Message-ID: AANLkTikJb1=h7R-ydx-1DWhkUaTPija7xgWzPmxQJi7v@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 13, 2010 at 7:33 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Sorry, got distracted. Here's a new patch that fixes the above and also
> contains some documentation.

Someone want to review this and (hopefully) mark it Ready for
Committer? I see that Brendan Jurd is the reviewer of record in the
CF app, but it seems Dean Rasheed is the person who has actually
reviewed it recently. Either way...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-14 07:39:15
Message-ID: AANLkTikxJaEr39rkD9osL6ry+aV82TrLG40+xJ1BvqE+@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13 October 2010 23:17, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Oct 13, 2010 at 7:33 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Sorry, got distracted. Here's a new patch that fixes the above and also
>> contains some documentation.
>
> Someone want to review this and (hopefully) mark it Ready for
> Committer?  I see that Brendan Jurd is the reviewer of record in the
> CF app, but it seems Dean Rasheed is the person who has actually
> reviewed it recently.  Either way...
>

I'm happy to take another look at it, but I'm short on time, so I
doubt that I be able to do anything before the weekend. If anyone
wants to jump in before then, feel free.

Regards,
Dean

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-15 08:33:47
Message-ID: AANLkTi=yRixwEMq4YT5=0LsoOHHKyd2pwum8YL7xtF7W@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14 October 2010 08:39, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>> Someone want to review this and (hopefully) mark it Ready for
>> Committer?  I see that Brendan Jurd is the reviewer of record in the
>> CF app, but it seems Dean Rasheed is the person who has actually
>> reviewed it recently.  Either way...
>>
>
> I'm happy to take another look at it, but I'm short on time, so I
> doubt that I be able to do anything before the weekend. If anyone
> wants to jump in before then, feel free.
>

I started looking at this last night, but ran out of time. I'll
continue this evening / over the weekend. Here are my comments so far:

Patch applies cleanly to current git master with no offsets.
Compiles cleanly with no warnings.
Regression tests pass.

The regression tests look reasonable, but I'd like to see a test of
\dT+. Also it could be made to exercise the comparison function more
if the test query did an ORDER BY CAST(enumlabel as planets).

The docs for ALTER TYPE have been updated. I found a few minor typos,
and also a couple of other sections of the manual that needed
updating.

Attached is an updated version with these changes. Andrew, let me know
if you're happy with these tweaks and I'll continue reviewing over the
weekend.

Regards,
Dean

Attachment Content-Type Size
venum5.patch text/x-patch 121.3 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-15 14:02:40
Message-ID: 4CB85F00.9020009@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/15/2010 04:33 AM, Dean Rasheed wrote:
> I started looking at this last night, but ran out of time. I'll
> continue this evening / over the weekend. Here are my comments so far:
>
> Patch applies cleanly to current git master with no offsets.
> Compiles cleanly with no warnings.
> Regression tests pass.
>
> The regression tests look reasonable, but I'd like to see a test of
> \dT+. Also it could be made to exercise the comparison function more
> if the test query did an ORDER BY CAST(enumlabel as planets).
>
> The docs for ALTER TYPE have been updated. I found a few minor typos,
> and also a couple of other sections of the manual that needed
> updating.
>
> Attached is an updated version with these changes. Andrew, let me know
> if you're happy with these tweaks and I'll continue reviewing over the
> weekend.

Thanks for the review. This looks good.

cheers

andrew


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-16 17:25:16
Message-ID: AANLkTinxjqw30v6NDvKZd6KyqaPYV90V0fHTDaKKP9di@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 10/15/2010 04:33 AM, Dean Rasheed wrote:
>>
>> I started looking at this last night, but ran out of time. I'll
>> continue this evening / over the weekend.

Continuing my review of this patch...

Usability review
----------------

What the patch does:

This patch adds syntax to allow additional enum values to be added to
an enum type, at any position in the list. The syntax has already been
discussed and a broad consensus reached. To me the new syntax seems
very logical and easy to use/remember.

Do we want that?

Yes, I think so. I can think of a few past projects where I could have
used this. A possible future extension would be the ability to remove
enum values, but that is likely to have additional complications, and
I think the number of use cases for it is smaller. I see no reason to
insist that it be part of this patch.

Do we already have it?

No.

Does it follow SQL spec, or the community-agreed behavior?

Not in the SQL spec, but it does follow agreed behaviour.

Does it include pg_dump support (if applicable)?

Yes.

Are there dangers?

None that I can think of.

Have all the bases been covered?

I just noticed that a couple of columns have been added to pg_type, so
there's another bit documentation that needs updating.

Feature test
------------

Does the feature work as advertised?

Yes.

Are there corner cases the author has failed to consider?

None that I could find.

Are there any assertion failures or crashes?

Yes, I'm afraid I managed to provoke a crash by running the regression
tests with -DCATCACHE_FORCE_RELEASE, after spotting a suspicious bit
of code (see below).

Performance review
------------------

Does the patch slow down simple tests?

There is a documented performance penalty when working with enum
values that are not in OID order. To attempt to measure this I created
2 tables, foo and bar, each with 800,000 rows. foo had an enum column
using the planets enum from the regression test, with values not in
OID order. bar had a similar enum column, but with values in the
default OID order.

Performance differences for comparison-heavy operations are most noticable:

SELECT MAX(p) FROM foo;
max
---------
neptune
(1 row)

Time: 132.305 ms

SELECT MAX(p) FROM bar;
max
---------
neptune
(1 row)

Time: 93.313 ms

SELECT p FROM foo ORDER BY p OFFSET 500000 LIMIT 1;
p
--------
saturn
(1 row)

Time: 1516.725 ms

SELECT p FROM bar ORDER BY p OFFSET 500000 LIMIT 1;
p
--------
saturn
(1 row)

Time: 1043.010 ms

(optimised build, asserts off)

That's a bigger performance hit than a I would have expected, even
though it is documented.

enum_ccmp() is using bsearch(), so there's a fair amount of function
call overhead there, and perhaps for small enums, a simple linear scan
would be faster. I'm not sure to what extent this is worth worrying
about.

Code review
-----------

I'm not familar enough with the pg_upgrade code to really comment on it.

Looking at AddEnumLabel() I found it a bit hard to follow, but near
the end of the block used when BEFORE/AFTER is specified, it does
this:

ReleaseCatCacheList(list);

/* are the labels sorted by OID? */
if (result && newelemorder > 1)
result = newOid > HeapTupleGetOid(existing[newelemorder-2]);
if (result && newelemorder < nelems + 1)
result = newOid < HeapTupleGetOid(existing[newelemorder-1]);

It looks to me as though 'existing[...]' is a pointer into a member of
the list that's just been freed, so it risks reading freed memory.
That seems to be confirmed by running the tests with
-DCATCACHE_FORCE_RELEASE. Doing so causes a number of the tests to
fail/crash, but I didn't dig any deeper to confirm that this was the
cause.

For the most part, this patch looks good, but I think there is still a
bit of tidying up to do.

Regards,
Dean


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 09:30:27
Message-ID: AANLkTi=hAuym4E8CeMHXYswAsP7Uf8Qg2++8H-H9cNwG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16 October 2010 18:25, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
> Are there corner cases the author has failed to consider?
>
> None that I could find.
>
> Are there any assertion failures or crashes?
>

I just thought of another corner case, which can lead to a crash. The
comparison code assumes that the number of elements in the enumeration
is constant during a query, but that's not necessarily the case. For
example the following will crash it:

CREATE TYPE test_enum AS ENUM('Elem 1');

CREATE OR REPLACE FUNCTION test_fn(a int) RETURNS test_enum AS
$$
DECLARE
new_label text;
BEGIN
new_label := 'Elem '||a;
EXECUTE 'ALTER TYPE test_enum ADD '''||new_label||''' BEFORE ''Elem 1''';
RETURN new_label::test_enum;
END;
$$
LANGUAGE plpgsql;

WITH t(a) AS (SELECT test_fn(i) FROM generate_series(2, 10) g(i))
SELECT MAX(a) from t;

Of course that's a pathalogical example, but we should protect against
it, preferrably without compromising performance in more normal cases.

Regards,
Dean


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 14:34:08
Message-ID: 4CBB0960.7050005@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/17/2010 05:30 AM, Dean Rasheed wrote:
> On 16 October 2010 18:25, Dean Rasheed<dean(dot)a(dot)rasheed(at)gmail(dot)com> wrote:
>> Are there corner cases the author has failed to consider?
>>
>> None that I could find.
>>
>> Are there any assertion failures or crashes?
>>
> I just thought of another corner case, which can lead to a crash. The
> comparison code assumes that the number of elements in the enumeration
> is constant during a query, but that's not necessarily the case. For
> example the following will crash it:
>
> CREATE TYPE test_enum AS ENUM('Elem 1');
>
> CREATE OR REPLACE FUNCTION test_fn(a int) RETURNS test_enum AS
> $$
> DECLARE
> new_label text;
> BEGIN
> new_label := 'Elem '||a;
> EXECUTE 'ALTER TYPE test_enum ADD '''||new_label||''' BEFORE ''Elem 1''';
> RETURN new_label::test_enum;
> END;
> $$
> LANGUAGE plpgsql;
>
> WITH t(a) AS (SELECT test_fn(i) FROM generate_series(2, 10) g(i))
> SELECT MAX(a) from t;
>
> Of course that's a pathalogical example, but we should protect against
> it, preferrably without compromising performance in more normal cases.

Yeah, good point. But how do we manage that? Looking up the number of
elements on each function call will cause a performance degradation, I
suspect. I'll think about it, but if you have any ideas please speak up.
I'm fairly sure we should also recheck the cached sorted property for
the same reason, incidentally.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 14:38:51
Message-ID: 8554.1287326331@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/17/2010 05:30 AM, Dean Rasheed wrote:
>> I just thought of another corner case, which can lead to a crash. The
>> comparison code assumes that the number of elements in the enumeration
>> is constant during a query, but that's not necessarily the case.
>> ...
>> Of course that's a pathalogical example, but we should protect against
>> it, preferrably without compromising performance in more normal cases.

> Yeah, good point. But how do we manage that?

Why is it crashing? I can see that this sort of thing might lead to
nonsensical answers, but a crash is harder to understand.

regards, tom "haven't read the patch" lane


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 15:32:03
Message-ID: AANLkTinYNXf96xO37LWM9p3sENuQ3hk_C9NJQjCfZZHv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 October 2010 15:38, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 10/17/2010 05:30 AM, Dean Rasheed wrote:
>>> I just thought of another corner case, which can lead to a crash. The
>>> comparison code assumes that the number of elements in the enumeration
>>> is constant during a query, but that's not necessarily the case.
>>> ...
>>> Of course that's a pathalogical example, but we should protect against
>>> it, preferrably without compromising performance in more normal cases.
>
>> Yeah, good point. But how do we manage that?
>
> Why is it crashing?  I can see that this sort of thing might lead to
> nonsensical answers, but a crash is harder to understand.
>

Hmm, it's harder than I thought. The crash is because each time it
finds a label it hasn't seen before it adds it to the array of cached
values without checking the array bounds. That array is only as big as
the number of elements in the enum the first time it was called.
Allowing the array to grow would prevent crashes, but not protect
again returning incorrect answers.

Perhaps it should just read and cache all the values the first time it
is called. Then if it ever fails to find a value in the array, it
knows that the enum must have grown, and it can rebuild the whole
array.

Regards,
Dean

>                        regards, tom "haven't read the patch" lane
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 15:35:24
Message-ID: 4CBB17BC.3080508@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/17/2010 10:38 AM, Tom Lane wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>> On 10/17/2010 05:30 AM, Dean Rasheed wrote:
>>> I just thought of another corner case, which can lead to a crash. The
>>> comparison code assumes that the number of elements in the enumeration
>>> is constant during a query, but that's not necessarily the case.
>>> ...
>>> Of course that's a pathalogical example, but we should protect against
>>> it, preferrably without compromising performance in more normal cases.
>> Yeah, good point. But how do we manage that?
> Why is it crashing? I can see that this sort of thing might lead to
> nonsensical answers, but a crash is harder to understand.
>
> regards, tom "haven't read the patch" lane

Heh.

I've been deep in buildfarm work, but I'll look at this now to see what
I can find.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 15:49:20
Message-ID: 10188.1287330560@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> Hmm, it's harder than I thought. The crash is because each time it
> finds a label it hasn't seen before it adds it to the array of cached
> values without checking the array bounds. That array is only as big as
> the number of elements in the enum the first time it was called.

[ scratches head... ] And where does it get that number of elements
from, if not by doing the same work that would allow it to fill the
array completely? Something seems ill-designed here.

> Allowing the array to grow would prevent crashes, but not protect
> again returning incorrect answers.

Well, one of the questions here is exactly how wrong the answers can
be. Offhand, it seems to me that comparisons of two existing entries
can never be falsified by adding a new entry, so I'm not seeing that
there could be any real problem. If we allowed rearrangement of the
sort order of existing entries, it'd be problematic.

> Perhaps it should just read and cache all the values the first time it
> is called. Then if it ever fails to find a value in the array, it
> knows that the enum must have grown, and it can rebuild the whole
> array.

This is kept in typcache, right? ISTM the right thing to do is arrange
to invalidate the cached array when a cache flush event occurs, and
rebuild the whole array on next use. Then you just throw an error if
you're passed a value that isn't there.

regards, tom lane


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 16:02:25
Message-ID: AANLkTi=rfLQBi6TLBxBkS-V7zW=KX_aUA-LSoSGSDe4E@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 October 2010 16:49, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> Hmm, it's harder than I thought. The crash is because each time it
>> finds a label it hasn't seen before it adds it to the array of cached
>> values without checking the array bounds. That array is only as big as
>> the number of elements in the enum the first time it was called.
>
> [ scratches head... ]  And where does it get that number of elements
> from, if not by doing the same work that would allow it to fill the
> array completely?  Something seems ill-designed here.
>

Hmm. That's coming from a new column added to pg_type (typnlabels).
Perhaps that's not safe though. Are there potential race conditions
there?

>> Allowing the array to grow would prevent crashes, but not protect
>> again returning incorrect answers.
>
> Well, one of the questions here is exactly how wrong the answers can
> be.  Offhand, it seems to me that comparisons of two existing entries
> can never be falsified by adding a new entry, so I'm not seeing that
> there could be any real problem.  If we allowed rearrangement of the
> sort order of existing entries, it'd be problematic.
>
>> Perhaps it should just read and cache all the values the first time it
>> is called. Then if it ever fails to find a value in the array, it
>> knows that the enum must have grown, and it can rebuild the whole
>> array.
>
> This is kept in typcache, right?  ISTM the right thing to do is arrange
> to invalidate the cached array when a cache flush event occurs, and
> rebuild the whole array on next use.  Then you just throw an error if
> you're passed a value that isn't there.
>

Makes sense.

Regards,
Dean

>                        regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 17:20:47
Message-ID: 11628.1287336047@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On 17 October 2010 16:49, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> [ scratches head... ] And where does it get that number of elements
>> from, if not by doing the same work that would allow it to fill the
>> array completely? Something seems ill-designed here.

> Hmm. That's coming from a new column added to pg_type (typnlabels).
> Perhaps that's not safe though. Are there potential race conditions
> there?

I knew I shoulda read this patch ;-). That seems a lot more invasive
than this feature justifies. And I share your qualms about whether it's
race-condition-proof. We don't have very much locking on pg_type
entries, so making a hard assumption about consistency between two
different catalogs seems pretty risky.

The way I'd be inclined to design this is that altering an enum doesn't
change its pg_type entry at all, just add another row to pg_enum.
When first needing to compare values of an enum, load up the typcache
entry for it. This involves scanning all the entries for that type OID
in pg_enum, and determining from that whether you can compare the easy
way or not. If not, build the array that tells you how to sort, and put
it in the typcache entry.

The missing piece in this is how to determine when the typcache entry
has to be flushed. That should be driven by sinval signalling. There
are two different ways you could do it:

1. Watch for SI events on pg_enum. The problem here is we don't have
a syscache on pg_enum, and there's no obvious other reason to want one.
Also I think you'd have to flush *all* enum typcache entries, since you
couldn't always tell which one had been modified.

2. Watch for SI events on the pg_type row. However, since according
to what I said above an ALTER TYPE wouldn't actually modify the pg_type
row, you'd need to have the ALTER send out a "dummy" SI event. This
is not hard to do --- we have the concept of dummy inval events on
relations already --- but it's a bit ugly because of the risk of
omission. But the number of places that would need to do this seems
small, so I don't think that risk is large.

On balance I like the second idea better.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 17:53:06
Message-ID: 12056.1287337986@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> The missing piece in this is how to determine when the typcache entry
> has to be flushed. That should be driven by sinval signalling.

On reflection that doesn't seem good enough. Immediately after someone
else has committed an ALTER TYPE, your typcache entry is out of date,
and won't be updated until you get around to noticing the SI signal.
I was thinking that wouldn't matter because you'd never need to make a
comparison involving the new enum value --- it couldn't be in any table
rows you'd see as committed good. But this is wrong because you might
have to make index comparisons involving the new value, even before you
consider that the rows the index entries reference are good.

We could fix that with Dean's idea of reloading the cache whenever
we see that we are being asked to compare a value we don't have in the
cache entry. However, that assumes that we even notice that it's not
in the cache entry. If we're trying to use "fast" comparison then we
wouldn't notice that.

So the hard part of this really is to force other backends to switch
from "fast" to "slow" comparison in a timely fashion when an ALTER makes
that necessary. Right offhand I don't see any good way to do that,
at least not while having the "fast" method as fast as it is now.

regards, tom lane


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 18:19:30
Message-ID: AANLkTikQs6Op02T+1vcS2y1QMAfAayxfcupOhVdQtOMS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17 October 2010 18:53, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> The missing piece in this is how to determine when the typcache entry
>> has to be flushed.  That should be driven by sinval signalling.
>
> On reflection that doesn't seem good enough.  Immediately after someone
> else has committed an ALTER TYPE, your typcache entry is out of date,
> and won't be updated until you get around to noticing the SI signal.
> I was thinking that wouldn't matter because you'd never need to make a
> comparison involving the new enum value --- it couldn't be in any table
> rows you'd see as committed good.  But this is wrong because you might
> have to make index comparisons involving the new value, even before you
> consider that the rows the index entries reference are good.
>
> We could fix that with Dean's idea of reloading the cache whenever
> we see that we are being asked to compare a value we don't have in the
> cache entry.  However, that assumes that we even notice that it's not
> in the cache entry.  If we're trying to use "fast" comparison then we
> wouldn't notice that.
>

That makes me think maybe the "fast" and "slow" comparisons should
both be done the same way, having a cache so that we notice
immediately if we get a new value.

Obviously that's not going to be as fast as the current "fast" method,
but the question is, can it be made sufficiently close? I think the
current sort+bsearch method is always going to be significantly
slower, but perhaps a dedicated hash table algorithm might work.

Regards,
Dean

> So the hard part of this really is to force other backends to switch
> from "fast" to "slow" comparison in a timely fashion when an ALTER makes
> that necessary.  Right offhand I don't see any good way to do that,
> at least not while having the "fast" method as fast as it is now.
>
>                        regards, tom lane
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 18:25:21
Message-ID: 4CBB3F91.4070803@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/17/2010 01:20 PM, Tom Lane wrote:
> I knew I shoulda read this patch ;-). That seems a lot more invasive
> than this feature justifies. And I share your qualms about whether it's
> race-condition-proof. We don't have very much locking on pg_type
> entries, so making a hard assumption about consistency between two
> different catalogs seems pretty risky.
>
> The way I'd be inclined to design this is that altering an enum doesn't
> change its pg_type entry at all, just add another row to pg_enum.
> When first needing to compare values of an enum, load up the typcache
> entry for it. This involves scanning all the entries for that type OID
> in pg_enum, and determining from that whether you can compare the easy
> way or not. If not, build the array that tells you how to sort, and put
> it in the typcache entry.

Perhaps mistakenly I wanted to avoid doing that as it would slow down a
retail comparison quite a lot, especially in the case of an enum with a
very large label set. That's why I put the sorted property and label
count in pg_type.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 19:17:09
Message-ID: 13145.1287343029@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On 17 October 2010 18:53, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> We could fix that with Dean's idea of reloading the cache whenever
>> we see that we are being asked to compare a value we don't have in the
>> cache entry. However, that assumes that we even notice that it's not
>> in the cache entry. If we're trying to use "fast" comparison then we
>> wouldn't notice that.

> That makes me think maybe the "fast" and "slow" comparisons should
> both be done the same way, having a cache so that we notice
> immediately if we get a new value.

Actually ... the race conditions can be a lot worse than just a race.
Consider

begin;
alter type myenum add 'some-value';
insert into mytab values('some-value');
rollback;

If mytab has an index on the enum col, we now have an index entry that
contains an enum value that isn't valid according to anybody, and nobody
knows how to compare it. If that entry is near the root then the index
is hopelessly corrupt: no one can tell which way to descend when
comparing it to some valid value.

I think what this says is that we cannot allow any manipulations that
involve an uncommitted enum value. Probably the easiest way is to make
the ALTER TYPE operation disallowed-inside-transaction-block. That's
pretty ugly, but doesn't seem like a serious restriction in practice
(though for example it'd mean we couldn't use it in pg_dump).

I'm not sure if enforcing such a restriction helps much in terms of
managing cache invalidations. Even with that, it seems possible to
encounter index entries for values that you haven't yet noticed the
invalidation message for.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 19:31:17
Message-ID: 4CBB4F05.2050601@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/17/2010 02:19 PM, Dean Rasheed wrote:
> That makes me think maybe the "fast" and "slow" comparisons should
> both be done the same way, having a cache so that we notice
> immediately if we get a new value.
>
> Obviously that's not going to be as fast as the current "fast" method,
> but the question is, can it be made sufficiently close? I think the
> current sort+bsearch method is always going to be significantly
> slower, but perhaps a dedicated hash table algorithm might work.
>

Making that as fast as "Is this sorted? If yes, compare the two oids" or
even acceptably slower seems likely to be a challenge. I thought about
the sort of approach you suggest initially and didn't come up with
anything that seemed likely to work well enough.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 19:49:05
Message-ID: 4CBB5331.20205@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/17/2010 03:17 PM, Tom Lane wrote:
> Dean Rasheed<dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> On 17 October 2010 18:53, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> We could fix that with Dean's idea of reloading the cache whenever
>>> we see that we are being asked to compare a value we don't have in the
>>> cache entry. However, that assumes that we even notice that it's not
>>> in the cache entry. If we're trying to use "fast" comparison then we
>>> wouldn't notice that.
>> That makes me think maybe the "fast" and "slow" comparisons should
>> both be done the same way, having a cache so that we notice
>> immediately if we get a new value.
> Actually ... the race conditions can be a lot worse than just a race.
> Consider
>
> begin;
> alter type myenum add 'some-value';
> insert into mytab values('some-value');
> rollback;
>
> If mytab has an index on the enum col, we now have an index entry that
> contains an enum value that isn't valid according to anybody, and nobody
> knows how to compare it. If that entry is near the root then the index
> is hopelessly corrupt: no one can tell which way to descend when
> comparing it to some valid value.
>
> I think what this says is that we cannot allow any manipulations that
> involve an uncommitted enum value. Probably the easiest way is to make
> the ALTER TYPE operation disallowed-inside-transaction-block. That's
> pretty ugly, but doesn't seem like a serious restriction in practice
> (though for example it'd mean we couldn't use it in pg_dump).

Even in binary upgrade mode?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 19:56:07
Message-ID: 13685.1287345367@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Making that as fast as "Is this sorted? If yes, compare the two oids" or
> even acceptably slower seems likely to be a challenge. I thought about
> the sort of approach you suggest initially and didn't come up with
> anything that seemed likely to work well enough.

The fundamental problem here is that we can't tell by examining an enum
value whether we have to apply the "fast" or "slow" comparison method.
But what if we could?

The sneaky idea that just came to me is to declare that even-numbered
OID values can be sorted by direct comparison, whereas odd-numbered OIDs
can't. It seems fairly easy to ensure that that property holds while
creating the values, as long as you don't mind "burning" OIDs: if you
get a value you don't like, just demand another one. Then, as long as
both OIDs involved in a comparison are even, you just do a direct
comparison and no cache entry is needed at all. When either is odd, you
know you need a cache entry. You can also tell whether an existing
cache entry is stale: if it doesn't contain both values then you need to
refresh it. If it does have both, then it's good enough for the
immediate purpose, even if there are other values it doesn't know
about. So with this design we don't actually have to watch for inval
events at all ... we just refresh the cache entry whenever a comparison
finds that that's needed.

The one problem I can see with this is that it's only partially
on-disk-compatible with existing enum types: it'll almost certainly
think that they require slow comparison, even when they don't.
Maybe that's acceptable.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 20:10:06
Message-ID: 13890.1287346206@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/17/2010 03:17 PM, Tom Lane wrote:
>> I think what this says is that we cannot allow any manipulations that
>> involve an uncommitted enum value. Probably the easiest way is to make
>> the ALTER TYPE operation disallowed-inside-transaction-block. That's
>> pretty ugly, but doesn't seem like a serious restriction in practice
>> (though for example it'd mean we couldn't use it in pg_dump).

> Even in binary upgrade mode?

Binary upgrade can probably be treated as a special case.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 20:26:00
Message-ID: 14145.1287347160@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/17/2010 01:20 PM, Tom Lane wrote:
>> The way I'd be inclined to design this is that altering an enum doesn't
>> change its pg_type entry at all, just add another row to pg_enum.
>> When first needing to compare values of an enum, load up the typcache
>> entry for it.

> Perhaps mistakenly I wanted to avoid doing that as it would slow down a
> retail comparison quite a lot, especially in the case of an enum with a
> very large label set. That's why I put the sorted property and label
> count in pg_type.

Just going back to this point: I don't buy that argument at all.
If you have to consult pg_type to find out whether fast or slow
comparison is needed, you've already burned all the cycles required
for a cache lookup. The only way that a large typcache entry would
really be a performance issue compared to just consulting pg_type
is if it had to be refreshed a lot, which doesn't seem like a likely
problem.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 21:39:38
Message-ID: 4CBB6D1A.5020805@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/17/2010 03:56 PM, Tom Lane wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>> Making that as fast as "Is this sorted? If yes, compare the two oids" or
>> even acceptably slower seems likely to be a challenge. I thought about
>> the sort of approach you suggest initially and didn't come up with
>> anything that seemed likely to work well enough.
> The fundamental problem here is that we can't tell by examining an enum
> value whether we have to apply the "fast" or "slow" comparison method.
> But what if we could?
>
> The sneaky idea that just came to me is to declare that even-numbered
> OID values can be sorted by direct comparison, whereas odd-numbered OIDs
> can't. It seems fairly easy to ensure that that property holds while
> creating the values, as long as you don't mind "burning" OIDs: if you
> get a value you don't like, just demand another one. Then, as long as
> both OIDs involved in a comparison are even, you just do a direct
> comparison and no cache entry is needed at all. When either is odd, you
> know you need a cache entry. You can also tell whether an existing
> cache entry is stale: if it doesn't contain both values then you need to
> refresh it. If it does have both, then it's good enough for the
> immediate purpose, even if there are other values it doesn't know
> about. So with this design we don't actually have to watch for inval
> events at all ... we just refresh the cache entry whenever a comparison
> finds that that's needed.

Hmm, nice. What I like about this is that it's not an all or nothing
deal. If you add one label that needs an odd Oid, most of the
comparisons will still be fast.

I think the rule for choosing the Oid for the new entry would go
something like this:

* if we're adding a label at the end and the Oid of the last entry
is even, take the first Oid that is either even and greater than
the Oid of the last entry, or odd and less than the Oid of the
last entry
* for all other positions take the first odd Oid

> The one problem I can see with this is that it's only partially
> on-disk-compatible with existing enum types: it'll almost certainly
> think that they require slow comparison, even when they don't.
> Maybe that's acceptable.

Yeah, not sure about that.

cheers

andrew


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-17 22:50:34
Message-ID: AANLkTikNDvyC-Mdu1CjPcx2LXPFy_USomv2kTmHqp+7=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 17, 2010 at 12:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>        begin;
>        alter type myenum add 'some-value';
>        insert into mytab values('some-value');
>        rollback;
....
> I think what this says is that we cannot allow any manipulations that
> involve an uncommitted enum value.  Probably the easiest way is to make
> the ALTER TYPE operation disallowed-inside-transaction-block.  That's
> pretty ugly, but doesn't seem like a serious restriction in practice
> (though for example it'd mean we couldn't use it in pg_dump).

I fear this is the camel's nose under the door towards making all DDL
non-transactional a la Oracle.

The alternative is that there are two steps to creating an enum. A
low-level modification which adds the new value and its collation
value to the list of valid things to compare. This would be done as
something like an autonomous transaction and be committed regardless
of whether the outer transaction commits. It wouldn't add the value to
the user-visible list of values or allowable values for inserting.
Only once that was committed could we then make the transactional
modification to the user visible DDL.

The main problem with this is of course that we don't actually have
autonomous transactions...

--
greg


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-18 14:14:23
Message-ID: 4CBC563F.3060106@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/17/2010 03:56 PM, Tom Lane wrote:

[clever scheme to treat even numbered enum Oids as sorted]
> The one problem I can see with this is that it's only partially
> on-disk-compatible with existing enum types: it'll almost certainly
> think that they require slow comparison, even when they don't.
> Maybe that's acceptable.

Thinking more about this, could we do some sort of hybrid scheme that
stashes the 'oids are sorted correctly' property of the type somewhat
like what I proposed?

Binary upgrade would be mildly tricky but not impossible.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-18 14:52:21
Message-ID: 19609.1287413541@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/17/2010 03:56 PM, Tom Lane wrote:
>> [clever scheme to treat even numbered enum Oids as sorted]
>> The one problem I can see with this is that it's only partially
>> on-disk-compatible with existing enum types: it'll almost certainly
>> think that they require slow comparison, even when they don't.
>> Maybe that's acceptable.

> Thinking more about this, could we do some sort of hybrid scheme that
> stashes the 'oids are sorted correctly' property of the type somewhat
> like what I proposed?

I think that fundamentally doesn't work in the face of a concurrent
ALTER TYPE. And even if it did work, by the time you've done the
cache lookup to check your stashed flag, what have you really saved?
The only way to have comparisons that are on par with current
performance is to not need to do any lookup at all.

The scenario that seems the most dangerous to me is

1. Begin transaction T1.
2. T1 does cache lookup, or any other pushup you want, and
decides that the enum type is correctly sorted.
3. T2 commits an ALTER TYPE that adds a non-sorted OID.
4. T3 inserts that OID in an index.
5. T1 encounters the out-of-order OID in the index.

If T1 is not able to recognize that the OID it's looking at is not
in-order, despite its previously cached information, it's going to lose
badly. It's impractical to do an AcceptInvalidationMessages on every
single comparison, so AFAICT you *have to* be able to deal correctly
with enum values that were added since your cache entry was made.

We could possibly deal with enum types that follow the existing
convention if we made the cache entry hold a list of all the original,
known-to-be-sorted OIDs. (This could be reasonably compact and cheap to
probe if it were represented as a starting OID and a Bitmapset of delta
values, since we can assume that the initial set of OIDs is pretty close
together.) But we have to have that cache entry, and we have to consult
it on every single comparison, so it's definitely going to be slower
than before.

So I'm thinking the comparison procedure goes like this:

1. Both OIDs even?
If so, just compare them numerically, and we're done.

2. Lookup cache entry for enum type.

3. Both OIDs in list of known-sorted OIDs?
If so, just compare them numerically, and we're done.

4. Search the part of the cache entry that lists sort positions.
If not both present, refresh the cache entry.
If still not present, throw error.

5. Compare by sort positions.

Step 4 is the slowest part but would be avoided in most cases.
However, step 2 is none too speedy either, and would usually
be required when dealing with pre-existing enums.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-18 17:15:16
Message-ID: 4CBC80A4.4080508@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/18/2010 10:52 AM, Tom Lane wrote:
> We could possibly deal with enum types that follow the existing
> convention if we made the cache entry hold a list of all the original,
> known-to-be-sorted OIDs. (This could be reasonably compact and cheap to
> probe if it were represented as a starting OID and a Bitmapset of delta
> values, since we can assume that the initial set of OIDs is pretty close
> together.)

How are we going to know what those are?

We could keep track of the beginning and end of the original range maybe
and refuse to create any new enum values for the type inside that range.
That might make binary upgrade a bit ugly, but it's probably manageable
even so.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-18 17:28:14
Message-ID: 22176.1287422894@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/18/2010 10:52 AM, Tom Lane wrote:
>> We could possibly deal with enum types that follow the existing
>> convention if we made the cache entry hold a list of all the original,
>> known-to-be-sorted OIDs. (This could be reasonably compact and cheap to
>> probe if it were represented as a starting OID and a Bitmapset of delta
>> values, since we can assume that the initial set of OIDs is pretty close
>> together.)

> How are we going to know what those are?

You read pg_enum while constructing the cache entry, sort by nominal
sort position, and look to see how many OIDs form a sorted and reasonably
compact range. I don't see a need to know which of those OIDs were
actually original; you can get close enough with heuristic reverse
engineering. The only real limitation is not wanting the bitmapset to
get too enormous, so you might often be able to incorporate OIDs that
in fact weren't original, so long as they chanced to be OK order-wise.
This'd be a win anytime it saved you from having to proceed to step 4
in comparisons.

regards, tom lane


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-18 17:40:02
Message-ID: AANLkTinX1N6xNtfyr7m7+UN+V1_zoyBqQ5ey35cz4mgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18 October 2010 15:52, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> So I'm thinking the comparison procedure goes like this:
>
> 1. Both OIDs even?
>        If so, just compare them numerically, and we're done.
>
> 2. Lookup cache entry for enum type.
>
> 3. Both OIDs in list of known-sorted OIDs?
>        If so, just compare them numerically, and we're done.
>
> 4. Search the part of the cache entry that lists sort positions.
>        If not both present, refresh the cache entry.
>        If still not present, throw error.
>
> 5. Compare by sort positions.
>
> Step 4 is the slowest part but would be avoided in most cases.
> However, step 2 is none too speedy either, and would usually
> be required when dealing with pre-existing enums.
>

I was thinking that steps 2-5 could be sped up by doing something like:

2. If first time in:
Build a lightweight hash map: [ oid -> sort position ] with
all the enum values

3. Look up each oid in the hash map
If not both present, rebuild the hash map
If still not present, throw error.

4. Compare by sort positions.

I think the hash map lookups could be made pretty quick, if we're
prepared to write a bit of dedicated code there rather than relying on
the more general-purpose caching code.

Regards,
Dean

>                        regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-18 17:48:45
Message-ID: 22515.1287424125@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> I think the hash map lookups could be made pretty quick, if we're
> prepared to write a bit of dedicated code there rather than relying on
> the more general-purpose caching code.

It's still going to be very significantly slower than what I'm
suggesting --- I'm *already* assuming that step 4 is using a hash
or something else smarter than just traversing a list. My step 3
is just a bit-array index operation (well, 2 of 'em).

(I'm pretty dubious of unsubstantiated claims that you can build a hash
table that's significantly faster than our existing hash code, anyway.)

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-18 17:54:06
Message-ID: 4CBC89BE.5060100@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/18/2010 01:40 PM, Dean Rasheed wrote:
> On 18 October 2010 15:52, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> So I'm thinking the comparison procedure goes like this:
>>
>> 1. Both OIDs even?
>> If so, just compare them numerically, and we're done.
>>
>> 2. Lookup cache entry for enum type.
>>
>> 3. Both OIDs in list of known-sorted OIDs?
>> If so, just compare them numerically, and we're done.
>>
>> 4. Search the part of the cache entry that lists sort positions.
>> If not both present, refresh the cache entry.
>> If still not present, throw error.
>>
>> 5. Compare by sort positions.
>>
>> Step 4 is the slowest part but would be avoided in most cases.
>> However, step 2 is none too speedy either, and would usually
>> be required when dealing with pre-existing enums.
>>
> I was thinking that steps 2-5 could be sped up by doing something like:
>
> 2. If first time in:
> Build a lightweight hash map: [ oid -> sort position ] with
> all the enum values
>
> 3. Look up each oid in the hash map
> If not both present, rebuild the hash map
> If still not present, throw error.
>
> 4. Compare by sort positions.
>
> I think the hash map lookups could be made pretty quick, if we're
> prepared to write a bit of dedicated code there rather than relying on
> the more general-purpose caching code.
>

If you have want to work on it and prove it's going to be better, please
do. I'm not convinced it will do a whole lot better than a binary search
that in most cases will do no more than a handful of probes.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-18 18:08:37
Message-ID: 22977.1287425317@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> If you have want to work on it and prove it's going to be better, please
> do. I'm not convinced it will do a whole lot better than a binary search
> that in most cases will do no more than a handful of probes.

Yeah, that's a good point. There's a range of table sizes where hashing
is faster than binary search, but I'm not sure that typical enums will
fall into that range.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-19 04:21:18
Message-ID: 4CBD1CBE.9040808@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/18/2010 10:52 AM, Tom Lane wrote:
> We could possibly deal with enum types that follow the existing
> convention if we made the cache entry hold a list of all the original,
> known-to-be-sorted OIDs. (This could be reasonably compact and cheap to
> probe if it were represented as a starting OID and a Bitmapset of delta
> values, since we can assume that the initial set of OIDs is pretty close
> together.) But we have to have that cache entry, and we have to consult
> it on every single comparison, so it's definitely going to be slower
> than before.
>
> So I'm thinking the comparison procedure goes like this:
>
> 1. Both OIDs even?
> If so, just compare them numerically, and we're done.
>
> 2. Lookup cache entry for enum type.
>
> 3. Both OIDs in list of known-sorted OIDs?
> If so, just compare them numerically, and we're done.
>
> 4. Search the part of the cache entry that lists sort positions.
> If not both present, refresh the cache entry.
> If still not present, throw error.
>
> 5. Compare by sort positions.
>
> Step 4 is the slowest part but would be avoided in most cases.
> However, step 2 is none too speedy either, and would usually
> be required when dealing with pre-existing enums.

OK, I've made adjustments that I think do what you're suggesting.

Patch is attached.

Alternatively this can be pulled from
<git(at)github(dot)com:adunstan/postgresql-dev.git>

cheers

andrew

Attachment Content-Type Size
venum6.patch text/x-patch 55.2 KB

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-19 08:00:26
Message-ID: AANLkTimf8DZF6uCfSjHzYRUm8ChjWXDN38rGNmzLPW5U@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 19 October 2010 05:21, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 10/18/2010 10:52 AM, Tom Lane wrote:
>>
>> We could possibly deal with enum types that follow the existing
>> convention if we made the cache entry hold a list of all the original,
>> known-to-be-sorted OIDs.  (This could be reasonably compact and cheap to
>> probe if it were represented as a starting OID and a Bitmapset of delta
>> values, since we can assume that the initial set of OIDs is pretty close
>> together.)  But we have to have that cache entry, and we have to consult
>> it on every single comparison, so it's definitely going to be slower
>> than before.
>>
>> So I'm thinking the comparison procedure goes like this:
>>
>> 1. Both OIDs even?
>>        If so, just compare them numerically, and we're done.
>>
>> 2. Lookup cache entry for enum type.
>>
>> 3. Both OIDs in list of known-sorted OIDs?
>>        If so, just compare them numerically, and we're done.
>>
>> 4. Search the part of the cache entry that lists sort positions.
>>        If not both present, refresh the cache entry.
>>        If still not present, throw error.
>>
>> 5. Compare by sort positions.
>>
>> Step 4 is the slowest part but would be avoided in most cases.
>> However, step 2 is none too speedy either, and would usually
>> be required when dealing with pre-existing enums.
>
> OK, I've made adjustments that I think do what you're suggesting.
>
> Patch is attached.
>

Ah, I'd missed the point about the bitmapset. In the most common case,
most of the enum elements are probably going to be in the right order,
so you save a lot by identifying that case quickly.

I didn't have time to play with hash maps myself, but I don't think it
will make much difference now because hopefully the binary search
isn't going to be hit a lot anyway.

There are a couple of things that look odd about this code though (I
haven't tested it, but they look wrong):

In the loop identifying the longest range:

for (list_end = start_pos+1; list_end < num; list_end++)
if (mycache->sort_order_list[list_end].sort_order <
mycache->sort_order_list[list_end - 1].sort_order ||
mycache->sort_order_list[list_end].enum_oid -
mycache->sort_order_list[list_end].enum_oid >= BITMAPSIZE)

That last test isn't doing anything. Shouldn't the second "list_end"
should be "start_pos"?

In the loop that sets bits in the bitmap, it looks like it's assuming
the range starts at 0. I haven't had any caffeine yet, so maybe I'm
misunderstanding it, but I can't see anywhere that sets bitmap_base.

Regards,
Dean

> Alternatively this can be pulled from
> <git(at)github(dot)com:adunstan/postgresql-dev.git>
>
> cheers
>
> andrew
>


From: Thom Brown <thom(at)linux(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-19 08:19:16
Message-ID: AANLkTinUrfs3FfMYDHbhL=N2Qw7MyV4jL+vSzw8nZ36G@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 19 October 2010 05:21, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 10/18/2010 10:52 AM, Tom Lane wrote:
>>
>> We could possibly deal with enum types that follow the existing
>> convention if we made the cache entry hold a list of all the original,
>> known-to-be-sorted OIDs.  (This could be reasonably compact and cheap to
>> probe if it were represented as a starting OID and a Bitmapset of delta
>> values, since we can assume that the initial set of OIDs is pretty close
>> together.)  But we have to have that cache entry, and we have to consult
>> it on every single comparison, so it's definitely going to be slower
>> than before.
>>
>> So I'm thinking the comparison procedure goes like this:
>>
>> 1. Both OIDs even?
>>        If so, just compare them numerically, and we're done.
>>
>> 2. Lookup cache entry for enum type.
>>
>> 3. Both OIDs in list of known-sorted OIDs?
>>        If so, just compare them numerically, and we're done.
>>
>> 4. Search the part of the cache entry that lists sort positions.
>>        If not both present, refresh the cache entry.
>>        If still not present, throw error.
>>
>> 5. Compare by sort positions.
>>
>> Step 4 is the slowest part but would be avoided in most cases.
>> However, step 2 is none too speedy either, and would usually
>> be required when dealing with pre-existing enums.
>
> OK, I've made adjustments that I think do what you're suggesting.
>
> Patch is attached.
>
> Alternatively this can be pulled from
> <git(at)github(dot)com:adunstan/postgresql-dev.git>

Andrew, can't you get your own repo at git.postgresql.org?

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-19 08:19:56
Message-ID: 4CBD54AC.8070408@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/19/2010 04:00 AM, Dean Rasheed wrote:
> There are a couple of things that look odd about this code though (I
> haven't tested it, but they look wrong):

You're right, thanks. I have fixed those. New patch attached.

cheers

andrew

Attachment Content-Type Size
venum7.patch text/x-patch 55.3 KB

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-19 08:23:31
Message-ID: AANLkTikNmQgOMShU9xsXuZ1X6hPPzd6K5UCPGUOA8Vqm@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 19, 2010 at 10:19, Thom Brown <thom(at)linux(dot)com> wrote:
> On 19 October 2010 05:21, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>
>>
>> On 10/18/2010 10:52 AM, Tom Lane wrote:
>>>
>>> We could possibly deal with enum types that follow the existing
>>> convention if we made the cache entry hold a list of all the original,
>>> known-to-be-sorted OIDs.  (This could be reasonably compact and cheap to
>>> probe if it were represented as a starting OID and a Bitmapset of delta
>>> values, since we can assume that the initial set of OIDs is pretty close
>>> together.)  But we have to have that cache entry, and we have to consult
>>> it on every single comparison, so it's definitely going to be slower
>>> than before.
>>>
>>> So I'm thinking the comparison procedure goes like this:
>>>
>>> 1. Both OIDs even?
>>>        If so, just compare them numerically, and we're done.
>>>
>>> 2. Lookup cache entry for enum type.
>>>
>>> 3. Both OIDs in list of known-sorted OIDs?
>>>        If so, just compare them numerically, and we're done.
>>>
>>> 4. Search the part of the cache entry that lists sort positions.
>>>        If not both present, refresh the cache entry.
>>>        If still not present, throw error.
>>>
>>> 5. Compare by sort positions.
>>>
>>> Step 4 is the slowest part but would be avoided in most cases.
>>> However, step 2 is none too speedy either, and would usually
>>> be required when dealing with pre-existing enums.
>>
>> OK, I've made adjustments that I think do what you're suggesting.
>>
>> Patch is attached.
>>
>> Alternatively this can be pulled from
>> <git(at)github(dot)com:adunstan/postgresql-dev.git>
>
> Andrew, can't you get your own repo at git.postgresql.org?

He certainly could, but github provides more features and a nicer look
:-) And since it's git, it doesn't matter where the repo is.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Thom Brown <thom(at)linux(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-19 14:05:55
Message-ID: 1287497037-sup-4324@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Magnus Hagander's message of mar oct 19 05:23:31 -0300 2010:
>
> He certainly could, but github provides more features and a nicer look
> :-) And since it's git, it doesn't matter where the repo is.

Yeah. If you have a checked out copy of the GIT repo (preferably one
with the "master" branch in it), try this:

git remote add venum git://github.com/adunstan/postgresql-dev.git
git fetch venum venums:venums
git checkout venums

Then you have the patch in all its Git glory, in branch "venums".

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-19 16:53:01
Message-ID: 4CBDCCED.7040503@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/19/2010 12:21 AM, Andrew Dunstan wrote:
>
>
> On 10/18/2010 10:52 AM, Tom Lane wrote:
>> We could possibly deal with enum types that follow the existing
>> convention if we made the cache entry hold a list of all the original,
>> known-to-be-sorted OIDs. (This could be reasonably compact and cheap to
>> probe if it were represented as a starting OID and a Bitmapset of delta
>> values, since we can assume that the initial set of OIDs is pretty close
>> together.) But we have to have that cache entry, and we have to consult
>> it on every single comparison, so it's definitely going to be slower
>> than before.
>>
>> So I'm thinking the comparison procedure goes like this:
>>
>> 1. Both OIDs even?
>> If so, just compare them numerically, and we're done.
>>
>> 2. Lookup cache entry for enum type.
>>
>> 3. Both OIDs in list of known-sorted OIDs?
>> If so, just compare them numerically, and we're done.
>>
>> 4. Search the part of the cache entry that lists sort positions.
>> If not both present, refresh the cache entry.
>> If still not present, throw error.
>>
>> 5. Compare by sort positions.
>>
>> Step 4 is the slowest part but would be avoided in most cases.
>> However, step 2 is none too speedy either, and would usually
>> be required when dealing with pre-existing enums.
>
> OK, I've made adjustments that I think do what you're suggesting.
>
>

I've discovered and fixed a couple more bugs in this. I have one or two
more things to fix and then I'll send a new patch.

Meanwhile, I've been testing a database that was upgraded from 9.0, so
it has a lot of odd-numbered Oids. It's not really clear from
performance testing that the bitmap is a huge win, or even a win at all.
(Of course, my implementation might suck too.) I'll continue testing.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-19 21:42:43
Message-ID: 4CBE10D3.4030909@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/19/2010 12:53 PM, Andrew Dunstan wrote:
>
>
> On 10/19/2010 12:21 AM, Andrew Dunstan wrote:
>>
>>
>> On 10/18/2010 10:52 AM, Tom Lane wrote:
>>> We could possibly deal with enum types that follow the existing
>>> convention if we made the cache entry hold a list of all the original,
>>> known-to-be-sorted OIDs. (This could be reasonably compact and
>>> cheap to
>>> probe if it were represented as a starting OID and a Bitmapset of delta
>>> values, since we can assume that the initial set of OIDs is pretty
>>> close
>>> together.) But we have to have that cache entry, and we have to
>>> consult
>>> it on every single comparison, so it's definitely going to be slower
>>> than before.
>>>
>>> So I'm thinking the comparison procedure goes like this:
>>>
>>> 1. Both OIDs even?
>>> If so, just compare them numerically, and we're done.
>>>
>>> 2. Lookup cache entry for enum type.
>>>
>>> 3. Both OIDs in list of known-sorted OIDs?
>>> If so, just compare them numerically, and we're done.
>>>
>>> 4. Search the part of the cache entry that lists sort positions.
>>> If not both present, refresh the cache entry.
>>> If still not present, throw error.
>>>
>>> 5. Compare by sort positions.
>>>
>>> Step 4 is the slowest part but would be avoided in most cases.
>>> However, step 2 is none too speedy either, and would usually
>>> be required when dealing with pre-existing enums.
>>
>> OK, I've made adjustments that I think do what you're suggesting.
>>
>>
>
> I've discovered and fixed a couple more bugs in this. I have one or
> two more things to fix and then I'll send a new patch.
>
> Meanwhile, I've been testing a database that was upgraded from 9.0,
> so it has a lot of odd-numbered Oids. It's not really clear from
> performance testing that the bitmap is a huge win, or even a win at
> all. (Of course, my implementation might suck too.) I'll continue
> testing.
>
>

Well a bit more testing shows some benefit. I've sorted out a few kinks,
so this seems to work. In particular, with the above tables, the version
imported from 9.0 can create have an index created in about the same
time as on the fresh table (identical data, but all even numbered Oids).

Of course, with lots of odd numbered Oids, if a label gets added the
imported version will degrade in performance much more quickly.

The test timed is:

do $$ begin for i in 1 .. 20 loop drop index if exists idx1;
create index idx1 on mydata(label); end loop; end; $$;

Latest patch attached.

cheers

andrew

Attachment Content-Type Size
venum9.patch text/x-patch 56.0 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-20 00:51:16
Message-ID: AANLkTimcZwMOx3s6hkoaX-J7kT9mn_QVsK47HOq8Hm4L@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 19, 2010 at 5:42 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Well a bit more testing shows some benefit. I've sorted out a few kinks, so
> this seems to work. In particular, with the above tables, the version
> imported from 9.0 can create have an index created in about the same time as
> on the fresh table (identical data, but all even numbered Oids).
>
> Of course, with lots of odd numbered Oids, if a label gets added the
> imported version will degrade in performance much more quickly.

I'm quite impressed by the amount of time and thought being put into
optimizing this. I didn't realize people cared so much about enum
performance; but it's good that they do.

I hope to see more such efforts in other parts of the system.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-20 01:15:33
Message-ID: 4CBE42B5.6000908@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/19/2010 08:51 PM, Robert Haas wrote:
> On Tue, Oct 19, 2010 at 5:42 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>> Well a bit more testing shows some benefit. I've sorted out a few kinks, so
>> this seems to work. In particular, with the above tables, the version
>> imported from 9.0 can create have an index created in about the same time as
>> on the fresh table (identical data, but all even numbered Oids).
>>
>> Of course, with lots of odd numbered Oids, if a label gets added the
>> imported version will degrade in performance much more quickly.
> I'm quite impressed by the amount of time and thought being put into
> optimizing this. I didn't realize people cared so much about enum
> performance; but it's good that they do.
>
> I hope to see more such efforts in other parts of the system.

:-)

Efficiency has always been one of the major reasons for using enums, so
it's important that we make them extensible without badly affecting
performance.

cheers

andrew


From: David Fetter <david(at)fetter(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-20 19:16:52
Message-ID: 20101020191652.GD11162@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 19, 2010 at 08:51:16PM -0400, Robert Haas wrote:
> On Tue, Oct 19, 2010 at 5:42 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> > Well a bit more testing shows some benefit. I've sorted out a few kinks, so
> > this seems to work. In particular, with the above tables, the version
> > imported from 9.0 can create have an index created in about the same time as
> > on the fresh table (identical data, but all even numbered Oids).
> >
> > Of course, with lots of odd numbered Oids, if a label gets added the
> > imported version will degrade in performance much more quickly.
>
> I'm quite impressed by the amount of time and thought being put into
> optimizing this. I didn't realize people cared so much about enum
> performance; but it's good that they do.
>
> I hope to see more such efforts in other parts of the system.

Which parts of the system, in particular, do you have in mind? Other
people from EDB have mentioned that slimming down the on-disk
representation was one such target. What other ones would you see as
needing such attention?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-20 20:23:25
Message-ID: AANLkTinUwb1pYtr437rNcc15b4cQx7nw7oQkPLy3LsAq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 20, 2010 at 3:16 PM, David Fetter <david(at)fetter(dot)org> wrote:
> On Tue, Oct 19, 2010 at 08:51:16PM -0400, Robert Haas wrote:
>> On Tue, Oct 19, 2010 at 5:42 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> > Well a bit more testing shows some benefit. I've sorted out a few kinks, so
>> > this seems to work. In particular, with the above tables, the version
>> > imported from 9.0 can create have an index created in about the same time as
>> > on the fresh table (identical data, but all even numbered Oids).
>> >
>> > Of course, with lots of odd numbered Oids, if a label gets added the
>> > imported version will degrade in performance much more quickly.
>>
>> I'm quite impressed by the amount of time and thought being put into
>> optimizing this.  I didn't realize people cared so much about enum
>> performance; but it's good that they do.
>>
>> I hope to see more such efforts in other parts of the system.
>
> Which parts of the system, in particular, do you have in mind?  Other
> people from EDB have mentioned that slimming down the on-disk
> representation was one such target.  What other ones would you see as
> needing such attention?

On-disk footprint.
WAL volume.
COPY speed.
Checkpoint I/O.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-20 22:54:05
Message-ID: AANLkTinef2UYmru5aAOc3oxjKDkw_XU1+Sraroh4y29i@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 19, 2010 at 9:15 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Efficiency has  always been one of the major reasons for using enums, so
> it's important that we make them extensible without badly affecting
> performance.

on that note is it worthwhile backpatching recent versions to allocate
enums with even numbered oids? That way people binary upgrading can
get the benefit of the optimization they should qualify for...

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-20 22:56:42
Message-ID: AANLkTinxuFwcGBcPk9CUctTRQ2v4tFHjxgYeRD3vLw5E@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Oct 20, 2010 at 6:54 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Tue, Oct 19, 2010 at 9:15 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Efficiency has  always been one of the major reasons for using enums, so
>> it's important that we make them extensible without badly affecting
>> performance.
>
> on that note is it worthwhile backpatching recent versions to allocate
> enums with even numbered oids? That way people binary upgrading can
> get the benefit of the optimization they should qualify for...

Uh, -1 from me. This is not a bug fix, and it will only help people
who create new enums between the time they upgrade to the relevant
minor release and the time they upgrade to 9.1. We are not into the
business of back-patching marginal peformance enhancements. If we
want to have a 9.0R2 release, or whatever, then so be it, but let's
not be modifying behavior in stable branches unless there's a *bug*.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-23 23:12:10
Message-ID: 10588.1287875530@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Latest patch attached.

I've been working through this patch. It occurs to me that there's a
fairly serious problem with the current implementation of insertion of
new values within the bounds of the current sort ordering. Namely, that
it does that by reassigning the enumsortorder values of pre-existing
rows. That creates a race condition: suppose that our backend is doing
that while another process is busy loading its internal cache of values
of the enum. If we commit partway through the other process's loading
of its cache, it may end up with a cache containing some pre-commit
entries and some post-commit entries. In the worst case it might even
have two images of the same enum label, with different enumsortorder
values. Needless to say, this is catastrophic for correctness of
subsequent comparisons in the other process.

We could try to avoid the race condition, but it's not going to be easy.
I think a better idea is to avoid having to issue updates against
pg_enum rows once they are inserted. To do that, I propose that instead
of integer enumsortorder values, we use float8 values. The initial
entries for a type would still have numbers 1..n, but when we need to
insert a value between two existing entries, we assign it a value
halfway between their enumsortorder values. Then we never have to alter
pre-existing entries, and there's no race condition: at worst, a
process's cache entry might be missing some just-committed rows, and we
know how to deal with that.

The disadvantage of this scheme is that if you repeatedly insert entries
in the "same place" in the sort order, you halve the available range
each time, so you'd run out of room after order-of-fifty halvings.
The values would eventually differ by only one unit in the last place,
so it'd not be possible to insert another value that would still be
distinguishable in the sort order. Is that an acceptable restriction?
(If you did run into this, you could manually reassign enumsortorder
values to get out of it, without having to dump-and-reload; but you'd
have to beware of the same race condition as above.) Of course adding
new values at the start or end of the enum's list wouldn't have that
restriction.

Thoughts?

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-23 23:21:42
Message-ID: 4CC36E06.3000409@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> The disadvantage of this scheme is that if you repeatedly insert entries
> in the "same place" in the sort order, you halve the available range
> each time, so you'd run out of room after order-of-fifty halvings.

This is not a real issue. If anyone is using an ENUM with 1000 values
in it, they're doing it wrong. However, we'd have to present an
intelligible error message in that case.

The float method would also have a couple other issues:

(1) larger value for the enum order, so more RAM. Do we care?
(2) would need to create a view which hid the floats from admins who
just want to look at the enum ordering.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-23 23:29:14
Message-ID: 10815.1287876554@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> The disadvantage of this scheme is that if you repeatedly insert entries
>> in the "same place" in the sort order, you halve the available range
>> each time, so you'd run out of room after order-of-fifty halvings.

> This is not a real issue. If anyone is using an ENUM with 1000 values
> in it, they're doing it wrong. However, we'd have to present an
> intelligible error message in that case.

You wouldn't need 1000 values to get into trouble. If you did

CREATE TYPE e AS ENUM ('a', 'b');

ALTER TYPE e ADD 'c1' BEFORE 'b';
ALTER TYPE e ADD 'c2' BEFORE 'b';
ALTER TYPE e ADD 'c3' BEFORE 'b';
ALTER TYPE e ADD 'c4' BEFORE 'b';
...

you'd hit the failure somewhere around c50, assuming IEEE-style floats.
I think an "intelligible error message" wouldn't be hard; it'd say
something like "no more room to insert another enum label between enum
labels c49 and b".

> The float method would also have a couple other issues:

> (1) larger value for the enum order, so more RAM. Do we care?

The in-memory representation wouldn't be any bigger, because we don't
actually need to keep the enumorder values in the cache entries.
pg_enum rows would get wider, but not materially so considering they
already contain a 64-byte name column.

> (2) would need to create a view which hid the floats from admins who
> just want to look at the enum ordering.

Why? We weren't going to hide the enumorder values before, why would we
do it with this?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-23 23:33:32
Message-ID: 39B6E838-4028-41A3-8C7C-90FCA3D80F79@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Oct 23, 2010, at 7:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> Latest patch attached.
>
> I've been working through this patch. It occurs to me that there's a
> fairly serious problem with the current implementation of insertion of
> new values within the bounds of the current sort ordering. Namely, that
> it does that by reassigning the enumsortorder values of pre-existing
> rows. That creates a race condition:

It strikes me that this is merely one facet of our failure to do proper locking on DDL objects other than relations, and that this would be as good a time as any to start fixing it. ISTM that ALTER TYPE should grab a self-excluding lock just as ALTER TABLE already does.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-23 23:52:22
Message-ID: 11144.1287877942@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Oct 23, 2010, at 7:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I've been working through this patch. It occurs to me that there's a
>> fairly serious problem with the current implementation of insertion of
>> new values within the bounds of the current sort ordering. Namely, that
>> it does that by reassigning the enumsortorder values of pre-existing
>> rows. That creates a race condition:

> It strikes me that this is merely one facet of our failure to do proper locking on DDL objects other than relations, and that this would be as good a time as any to start fixing it. ISTM that ALTER TYPE should grab a self-excluding lock just as ALTER TABLE already does.

The point of all the design thrashing we've been doing here is to
*avoid* taking locks while comparing enum OIDs. So I'm not impressed
with this proposal. (A self-exclusive lock to prevent concurrent
ALTER TYPEs might be a good idea, but I don't want to block enum
comparisons too.)

I did just think of a possible solution that would work with the
updating implementation: readers of pg_enum could use an MVCC snapshot
instead of SnapshotNow while loading their caches. I'm not certain
offhand how unpleasant this'd be at the C-code level, but it should
be possible. I still prefer the idea of not changing rows once they're
inserted, though --- doing so could possibly also cause transient
failures in, eg, enum_in/enum_out, since those depend on syscaches that
are loaded with SnapshotNow.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-23 23:59:06
Message-ID: 2DE02FF5-E726-405E-99E5-D36FC28E065B@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Oct 23, 2010, at 7:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I still prefer the idea of not changing rows once they're
> inserted, though

Me too. But I really dislike the idea of having a failure mode where we can't insert for no reason that the user can understand. So I'm trying to think of a better option.

Why would you need to lock out type comparisons? Locking out concurrent DDL seems sufficient.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 00:11:11
Message-ID: 11405.1287879071@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Why would you need to lock out type comparisons?

Didn't you get the point? The hazard is to a concurrent process that is
merely trying to load up its enum-values cache so that it can perform an
enum comparison. I don't want such an operation to have to block,
especially not against something that's trying to acquire a more or less
exclusive lock.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-24 00:35:15
Message-ID: 4CC37F43.2070506@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/23/2010 07:12 PM, Tom Lane wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>> Latest patch attached.
> I've been working through this patch.

Cool.

[snip: reallocating enum sortorder on existing values has race
conditions. Suggestion to use a float8 instead and add new value half
way between neighbours, so no reassignment is necessary]

> The disadvantage of this scheme is that if you repeatedly insert entries
> in the "same place" in the sort order, you halve the available range
> each time, so you'd run out of room after order-of-fifty halvings.
> The values would eventually differ by only one unit in the last place,
> so it'd not be possible to insert another value that would still be
> distinguishable in the sort order. Is that an acceptable restriction?
> (If you did run into this, you could manually reassign enumsortorder
> values to get out of it, without having to dump-and-reload; but you'd
> have to beware of the same race condition as above.) Of course adding
> new values at the start or end of the enum's list wouldn't have that
> restriction.

Well, it's a tiny bit like my initial proposal for enum Oid ranges with
gaps, but with a level of indirection :-)

Seriously, I think it might be OK. Could we provide some safe way of
resetting the sortorder values? Or even a not-entirely-safe
superuser-only function might be useful. Binary upgrade could probably
call it safely, for example.

We've sure expended quite a lot of neurons on this feature :-)

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-24 00:54:36
Message-ID: 11933.1287881676@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Seriously, I think it might be OK. Could we provide some safe way of
> resetting the sortorder values? Or even a not-entirely-safe
> superuser-only function might be useful. Binary upgrade could probably
> call it safely, for example.

You could do it with plain SQL, as long as you weren't concerned about
confusing processes that were concurrently loading their enum caches.

Another thought here is that the split-in-half rule might be
unnecessarily dumb. It leaves equal amounts of code space on both sides
of the new value, even though the odds of subsequent insertions on both
sides are probably unequal. But I'm not sure if we can predict the
usage pattern well enough to know which side is more likely.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 04:04:03
Message-ID: AANLkTi=VnQV+XhRDSR9WyRyUZCXSrVdC3CTRCzyt6Ag=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 23, 2010 at 8:11 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Why would you need to lock out type comparisons?
>
> Didn't you get the point?  The hazard is to a concurrent process that is
> merely trying to load up its enum-values cache so that it can perform an
> enum comparison.  I don't want such an operation to have to block,
> especially not against something that's trying to acquire a more or less
> exclusive lock.

Hmm, yeah, I missed the point. Sorry.

I suppose you could fix this by always updating every row, and storing
in each row the total count of elements (or a random number). Then
it'd be obvious if you'd read an inconsistent view of the world.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-10-24 04:07:58
Message-ID: 4CC3B11E.9030707@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/23/2010 08:54 PM, Tom Lane wrote:
> Another thought here is that the split-in-half rule might be
> unnecessarily dumb. It leaves equal amounts of code space on both sides
> of the new value, even though the odds of subsequent insertions on both
> sides are probably unequal. But I'm not sure if we can predict the
> usage pattern well enough to know which side is more likely.

We can't. In particular, we can't rely on the label to tell us, so we
have no information at all to go on, really. Let's just go with the
simple midpoint.

Are you going to try doing this?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 04:26:07
Message-ID: 23598.1287894367@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I suppose you could fix this by always updating every row, and storing
> in each row the total count of elements (or a random number). Then
> it'd be obvious if you'd read an inconsistent view of the world.

Well, the easy way to read a consistent view of the world is to load the
cache using an MVCC snapshot instead of SnapshotNow. The current code
structure isn't amenable to that because it's relying on a syscache to
fetch the data for it, but that seems pretty inefficient anyway. I'm
thinking of changing it around so that the enum cache gets loaded with
a regular systable_beginscan() scan, and then we could load with an
MVCC snapshot.

I'm kind of inclined to go to the float-based representation anyway,
though, just because not having to update other rows to do an insert
seems like a good thing. But we could combine that with an MVCC
snapshot on the read side, which would make renumbering safe, which
would mean that we could auto-renumber when we ran out of code space
and not otherwise. Is that getting too complicated?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 05:11:13
Message-ID: AANLkTikTZTk9M8aGpgJArziOVEYKwVWSt4=0sqzG80LO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 24, 2010 at 12:26 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I suppose you could fix this by always updating every row, and storing
>> in each row the total count of elements (or a random number).  Then
>> it'd be obvious if you'd read an inconsistent view of the world.
>
> Well, the easy way to read a consistent view of the world is to load the
> cache using an MVCC snapshot instead of SnapshotNow.

Yeah, I have to admit I had never given much thought to how evil
SnapshotNow is. I wonder if we have latent bugs because of this.
Seems like anything where an object is represented by more than one
table row is suspect.

> The current code
> structure isn't amenable to that because it's relying on a syscache to
> fetch the data for it, but that seems pretty inefficient anyway.  I'm
> thinking of changing it around so that the enum cache gets loaded with
> a regular systable_beginscan() scan, and then we could load with an
> MVCC snapshot.

That sounds reasonable.

> I'm kind of inclined to go to the float-based representation anyway,
> though, just because not having to update other rows to do an insert
> seems like a good thing.  But we could combine that with an MVCC
> snapshot on the read side, which would make renumbering safe, which
> would mean that we could auto-renumber when we ran out of code space
> and not otherwise.  Is that getting too complicated?

The complexity doesn't bother me, but I'd probably just renumber every
time. It's worth optimizing enums for speed, but micro-optimizing the
enum DDL for speed may not be worth it, especially because it will
ensure that the renumbering code is very rarely tested.

If you are going to only renumber when necessary, I'd probably make
the ordering position an integer rather than a float, and just set the
values for the initial labels to something like 256 * 10^16, 257 *
10^16, 258 * 10^16, 259 * 10^16; and then some regression tests that
add labels a, z, y, x, .... I dunno if floats have completely
consistent representations on all the platforms we support, but with
integers it should be quite easy to predict the exact point when
you're going to run out of space and what the contents of pg_enum
should be just before and just after that point.

--
Robert Haas
President, Floating-Point Haters Anonymous


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 07:41:20
Message-ID: AANLkTinSYJ=w1WZ+vWRnqTGZ00Q8wdUPYAp6n=kZuiGq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 23, 2010 at 10:11 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I dunno if floats have completely
> consistent representations on all the platforms we support, but with
> integers it should be quite easy to predict the exact point when
> you're going to run out of space and what the contents of pg_enum
> should be just before and just after that point.

There's nothing magic about the integral types here. If you use a
string then you could always split by making the string longer. It
would get weird after you get to 256 (you can still handle it but
there would be weird special case code) but an array of integers would
be just as flexible and wouldn't have the same problem.

--
greg


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 12:01:26
Message-ID: AANLkTikk=CJ=mG=aKUNmWUR2jYVkhPF_uMo+2-5HajOo@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24 October 2010 05:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Well, the easy way to read a consistent view of the world is to load the
> cache using an MVCC snapshot instead of SnapshotNow.  The current code
> structure isn't amenable to that because it's relying on a syscache to
> fetch the data for it, but that seems pretty inefficient anyway.  I'm
> thinking of changing it around so that the enum cache gets loaded with
> a regular systable_beginscan() scan, and then we could load with an
> MVCC snapshot.
>
> I'm kind of inclined to go to the float-based representation anyway,
> though, just because not having to update other rows to do an insert
> seems like a good thing.  But we could combine that with an MVCC
> snapshot on the read side, which would make renumbering safe, which
> would mean that we could auto-renumber when we ran out of code space
> and not otherwise.  Is that getting too complicated?
>

As an alternative, how about storing the sort order as an array of
OIDs, in a single row in pg_type, or a new table, rather than across
multiple rows in pg_enum.

Code that read it would be guaranteed a consistent view of the data,
and at worst, it might be missing recently added elements, which the
comparison code can already deal with by re-reading the array.

Regards,
Dean

>                        regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 16:20:22
Message-ID: 3415.1287937222@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> There's nothing magic about the integral types here. If you use a
> string then you could always split by making the string longer.

The entire objective here is to make enum comparisons fast. Somehow,
going to a non-primitive data type to represent the sort values does
not seem like a win from that perspective.

(Likewise for Dean's suggestion for an array of another kind.)

What I'm currently thinking is that we should actually use float4 not
float8. That eliminates any concerns about making the representation
larger than it was before. We'll definitely have to have the logic
to do renumbering, but it'll still be an exceedingly rare thing in
practice. With float4 the implementation would fail at somewhere
around 2^24 elements in an enum (since even with renumbering, there
wouldn't be enough bits to give each element a distinguishable value).
I don't see that as a real objection, and anyway if you were trying
to have an enum with many elements, you'd want the in-memory
representation to be compact.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 16:48:04
Message-ID: 4CC46344.9040005@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/24/2010 12:20 PM, Tom Lane wrote:
> With float4 the implementation would fail at somewhere
> around 2^24 elements in an enum (since even with renumbering, there
> wouldn't be enough bits to give each element a distinguishable value).
> I don't see that as a real objection, and anyway if you were trying
> to have an enum with many elements, you'd want the in-memory
> representation to be compact.

Anything beyond the square root of this is getting pretty insane,
IMNSHO, so I'm really not that bothered by that number.

Assuming we renumber the sortorder as even positive integers, that
number comes down a couple of bits, but even so that gives us lots of
head room I think.

cheers

andrew


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 16:58:57
Message-ID: AANLkTimOJRzpBx62T_NrvK4FPVdOfwT00vqc3a+qeq9a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 24 October 2010 17:20, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <gsstark(at)mit(dot)edu> writes:
>> There's nothing magic about the integral types here. If you use a
>> string then you could always split by making the string longer.
>
> The entire objective here is to make enum comparisons fast.  Somehow,
> going to a non-primitive data type to represent the sort values does
> not seem like a win from that perspective.
>
> (Likewise for Dean's suggestion for an array of another kind.)
>

The point with an OID array is that you wouldn't need to store the
enumsortorder values at all. The sort order would just be the index of
the OID in the array. So the comparison code would read the OID array,
traverse it building an array of enum_sort structs {oid, idx}, sort
that by OID and cache it. Then do binary searches to efficiently find
the index (i.e., sort order) for any given OID. That's pretty much
what the comparison code is doing now, except without explicitly
stored sort positions.

The reason I thought this might help is that it would be a single
atomic operation to read the whole enum sort order, so you'd never get
a mix of old and new sort positions, if another transaction was
altering the enum.

> What I'm currently thinking is that we should actually use float4 not
> float8.  That eliminates any concerns about making the representation
> larger than it was before.  We'll definitely have to have the logic
> to do renumbering, but it'll still be an exceedingly rare thing in
> practice.  With float4 the implementation would fail at somewhere
> around 2^24 elements in an enum (since even with renumbering, there
> wouldn't be enough bits to give each element a distinguishable value).
> I don't see that as a real objection, and anyway if you were trying
> to have an enum with many elements, you'd want the in-memory
> representation to be compact.
>

That seems like a lot of additional complexity to do the renumbering,
and you'd have to make the comparison code safe against a concurrent
renumbering.

Regards,
Dean

>                        regards, tom lane
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 19:28:03
Message-ID: 15699.1287948483@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/24/2010 12:20 PM, Tom Lane wrote:
>> With float4 the implementation would fail at somewhere
>> around 2^24 elements in an enum (since even with renumbering, there
>> wouldn't be enough bits to give each element a distinguishable value).
>> I don't see that as a real objection, and anyway if you were trying
>> to have an enum with many elements, you'd want the in-memory
>> representation to be compact.

> Anything beyond the square root of this is getting pretty insane,
> IMNSHO, so I'm really not that bothered by that number.

Here's a WIP patch that incorporates most of what's been discussed here.
The critical part of it is summed up in the comments for RenumberEnumType:

/*
* RenumberEnumType
* Renumber existing enum elements to have sort positions 1..n.
*
* We avoid doing this unless absolutely necessary; in most installations
* it will never happen. The reason is that updating existing pg_enum
* entries creates hazards for other backends that are concurrently reading
* pg_enum with SnapshotNow semantics. A concurrent SnapshotNow scan could
* see both old and new versions of an updated row as valid, or neither of
* them, if the commit happens between scanning the two versions. It's
* also quite likely for a concurrent scan to see an inconsistent set of
* rows (some members updated, some not).
*
* We can avoid these risks by reading pg_enum with an MVCC snapshot
* instead of SnapshotNow, but that forecloses use of the syscaches.
* We therefore make the following choices:
*
* 1. Any code that is interested in the enumsortorder values MUST read
* pg_enum with an MVCC snapshot, or else acquire lock on the enum type
* to prevent concurrent execution of AddEnumLabel(). The risk of
* seeing inconsistent values of enumsortorder is too high otherwise.
*
* 2. Code that is not examining enumsortorder can use a syscache
* (for example, enum_in and enum_out do so). The worst that can happen
* is a transient failure to find any valid value of the row. This is
* judged acceptable in view of the infrequency of use of RenumberEnumType.
*/

This patch isn't committable as-is because I haven't made enum_first,
enum_last, enum_range follow these coding rules: they need to stop
using the syscache and instead use an indexscan on
pg_enum_typid_sortorder_index to locate the relevant rows. That should
be just a small fix though, and it seems likely to be a net win for
performance anyway. There are a couple of other loose ends
too, in particular I still think we need to prevent ALTER TYPE ADD
inside a transaction block because of the risk of finding undefined
enum OIDs in indexes.

Anybody really unhappy with this approach? If not, I'll finish this
up and commit it.

regards, tom lane

Attachment Content-Type Size
venum10.patch text/x-patch 74.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 19:33:42
Message-ID: 15783.1287948822@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> The point with an OID array is that you wouldn't need to store the
> enumsortorder values at all. The sort order would just be the index of
> the OID in the array. So the comparison code would read the OID array,
> traverse it building an array of enum_sort structs {oid, idx}, sort
> that by OID and cache it.

Hmm. But I guess we'd have to keep that array in the pg_type row,
and it'd be a huge PITA to work with at the SQL level. For instance,
psql and pg_dump can easily be adapted to use enumsortorder instead
of pg_enum.oid when they want to read out the labels in sorted order.
Doing the same with an array representation would be a very different
and much uglier query. I'm not eager to contort the catalog
representation that much.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 21:09:18
Message-ID: 4CC4A07E.2040104@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/24/2010 03:33 PM, Tom Lane wrote:
> Dean Rasheed<dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> The point with an OID array is that you wouldn't need to store the
>> enumsortorder values at all. The sort order would just be the index of
>> the OID in the array. So the comparison code would read the OID array,
>> traverse it building an array of enum_sort structs {oid, idx}, sort
>> that by OID and cache it.
> Hmm. But I guess we'd have to keep that array in the pg_type row,
> and it'd be a huge PITA to work with at the SQL level. For instance,
> psql and pg_dump can easily be adapted to use enumsortorder instead
> of pg_enum.oid when they want to read out the labels in sorted order.
> Doing the same with an array representation would be a very different
> and much uglier query. I'm not eager to contort the catalog
> representation that much.

If that's the only objection I don't know that it's terribly serious.
psql and pg_dump could sanely use something like:

select enum_oid, row_number() over () as sort_order
from unnest(null::myenum) as enum_oid

That said, I'm generally wary of array fields, especially in the catalog.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 21:20:55
Message-ID: 4CC4A337.9080300@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/24/2010 03:28 PM, Tom Lane wrote:
> This patch isn't committable as-is because I haven't made enum_first,
> enum_last, enum_range follow these coding rules: they need to stop
> using the syscache and instead use an indexscan on
> pg_enum_typid_sortorder_index to locate the relevant rows. That should
> be just a small fix though, and it seems likely to be a net win for
> performance anyway. There are a couple of other loose ends too, in
> particular I still think we need to prevent ALTER TYPE ADD inside a
> transaction block because of the risk of finding undefined enum OIDs
> in indexes. Anybody really unhappy with this approach? If not, I'll
> finish this up and commit it.

I'll look at it tonight. At first glance it looks OK.

(BTW, this would be a good case for publishing your development branch
somewhere (e.g. git.postgresql.org). That way I could import it and
easily do a diff between your patch and mine, in about three simple
commands. There are other ways of getting at it, and I'll go and do
that, but we should start to look at using the nice facilities git
provides.)

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 21:24:07
Message-ID: 4CC4A3F7.4000208@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/24/2010 05:09 PM, Andrew Dunstan wrote:
>
>
> select enum_oid, row_number() over () as sort_order
> from unnest(null::myenum) as enum_oid
>
>

Of course, I meant:

select enum_label, row_number() over () as sort_order
from unnest(enum_range(null::myenum)) as enum_label

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 21:34:16
Message-ID: 18841.1287956056@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

BTW, I've forgotten --- did anyone publish a test case for checking
performance of enum comparisons? Or should I just cons up something
privately?

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-24 22:16:53
Message-ID: 4CC4B055.3060605@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/24/2010 05:34 PM, Tom Lane wrote:
> BTW, I've forgotten --- did anyone publish a test case for checking
> performance of enum comparisons? Or should I just cons up something
> privately?
>

I have been running tests with
<http://developer.postgresql.org/~adunstan/enumtest.dmp>

The table "mydata" contains 100k rows with the column "label" containing
random values of an enum type with 500 labels. Basically, my main test
is to set up an index on that column. The alter the enum type, and test
again. Then alter some of the rows, and test again.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-25 00:12:46
Message-ID: 23536.1287965566@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/24/2010 05:34 PM, Tom Lane wrote:
>> BTW, I've forgotten --- did anyone publish a test case for checking
>> performance of enum comparisons? Or should I just cons up something
>> privately?

> I have been running tests with
> <http://developer.postgresql.org/~adunstan/enumtest.dmp>

> The table "mydata" contains 100k rows with the column "label" containing
> random values of an enum type with 500 labels. Basically, my main test
> is to set up an index on that column. The alter the enum type, and test
> again. Then alter some of the rows, and test again.

OK, I did some timing consisting of building a btree index with
maintenance_work_mem set reasonably high. This is on a debug-enabled
build, so it's not representative of production performance, but it will
do for seeing what we're doing to enum comparison performance. Here's
what I tried:

Stock 9.0.1 24.9 sec

patch, all OIDs even 25.2 sec (~ 1% hit)

patch, half of OIDs odd 27.2 sec (~ 9% hit)

same, bitmapset forced null 64.9 sec (~ 160% hit)

(Note that the noise level in these measurements is about 1%;
I'm not entirely convinced that the all-even case is really measurably
slower than 9.0.)

The "half of OIDs odd" case is what you'd get for a binary upgrade
from a 9.0 database. The last case shows what happens if the
intermediate bitmapset-test optimization is disabled, forcing all
comparisons to do binary searches in the sorted-by-OID array
(except for the one-quarter of cases where both OIDs are even
by chance). It's pretty grim but it represents a worst case that
you'd be very unlikely to hit in practice.

This shows that the bitmapset optimization really is quite effective,
at least for cases where all the enum labels are sorted by OID after
all. That motivated me to change the bitmapset setup code to what's
attached. This is potentially a little slower at initializing the
cache, but it makes up for that by still marking most enum members
as sorted even when a few out-of-order members have been inserted.
The key point is that an out-of-order member in the middle of the
array doesn't prevent us from considering following members as
properly sorted, as long as they are correctly ordered with respect to
the other properly-sorted members.

With this approach we can honestly say that inserting an out-of-order
enum value doesn't impact comparison performance for pre-existing
enum members, only for comparisons involving the out-of-order value
itself; even when the existing members were binary-upgraded and thus
weren't all even. I think that's a worthwhile advantage.

IMHO this level of performance is good enough. Anyone unhappy?

regards, tom lane

/*
* Here, we create a bitmap listing a subset of the enum's OIDs that are
* known to be in order and can thus be compared with just OID comparison.
*
* The point of this is that the enum's initial OIDs were certainly in
* order, so there is some subset that can be compared via OID comparison;
* and we'd rather not do binary searches unnecessarily.
*
* This is somewhat heuristic, and might identify a subset of OIDs that
* isn't exactly what the type started with. That's okay as long as
* the subset is correctly sorted.
*/
bitmap_base = InvalidOid;
bitmap = NULL;
bm_size = 1; /* only save sets of at least 2 OIDs */

for (start_pos = 0; start_pos < numitems - 1; start_pos++)
{
/*
* Identify longest sorted subsequence starting at start_pos
*/
Bitmapset *this_bitmap = bms_make_singleton(0);
int this_bm_size = 1;
Oid start_oid = items[start_pos].enum_oid;
float4 prev_order = items[start_pos].sort_order;
int i;

for (i = start_pos + 1; i < numitems; i++)
{
Oid offset;

offset = items[i].enum_oid - start_oid;
/* quit if bitmap would be too large; cutoff is arbitrary */
if (offset >= 1024)
break;
/* include the item if it's in-order */
if (items[i].sort_order > prev_order)
{
prev_order = items[i].sort_order;
this_bitmap = bms_add_member(this_bitmap, (int) offset);
this_bm_size++;
}
}

/* Remember it if larger than previous best */
if (this_bm_size > bm_size)
{
bms_free(bitmap);
bitmap_base = start_oid;
bitmap = this_bitmap;
bm_size = this_bm_size;
}
else
bms_free(this_bitmap);

/*
* Done if it's not possible to find a longer sequence in the rest
* of the list. In typical cases this will happen on the first
* iteration, which is why we create the bitmaps on the fly instead
* of doing a second pass over the list.
*/
if (bm_size >= (numitems - start_pos - 1))
break;
}


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-25 00:55:30
Message-ID: 4CC4D582.2070709@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/24/2010 08:12 PM, Tom Lane wrote:
> OK, I did some timing consisting of building a btree index with
> maintenance_work_mem set reasonably high. This is on a debug-enabled
> build, so it's not representative of production performance, but it will
> do for seeing what we're doing to enum comparison performance. Here's
> what I tried:
>
> Stock 9.0.1 24.9 sec
>
> patch, all OIDs even 25.2 sec (~ 1% hit)
>
> patch, half of OIDs odd 27.2 sec (~ 9% hit)
>
> same, bitmapset forced null 64.9 sec (~ 160% hit)
>
> (Note that the noise level in these measurements is about 1%;
> I'm not entirely convinced that the all-even case is really measurably
> slower than 9.0.)

Yeah, that was my conclusion. I tested with debug/cassert turned off,
but my results were similar.

> The "half of OIDs odd" case is what you'd get for a binary upgrade
> from a 9.0 database. The last case shows what happens if the
> intermediate bitmapset-test optimization is disabled, forcing all
> comparisons to do binary searches in the sorted-by-OID array
> (except for the one-quarter of cases where both OIDs are even
> by chance). It's pretty grim but it represents a worst case that
> you'd be very unlikely to hit in practice.
>
> This shows that the bitmapset optimization really is quite effective,
> at least for cases where all the enum labels are sorted by OID after
> all. That motivated me to change the bitmapset setup code to what's
> attached. This is potentially a little slower at initializing the
> cache, but it makes up for that by still marking most enum members
> as sorted even when a few out-of-order members have been inserted.
> The key point is that an out-of-order member in the middle of the
> array doesn't prevent us from considering following members as
> properly sorted, as long as they are correctly ordered with respect to
> the other properly-sorted members.

That's nice. It's a tradeoff though. Bumping up the cost of setting up
the cache won't have much effect on a creating a large index, but could
affect to performance of retail comparisons significantly. But this is
probably worth it. You'd have to work hard to create the perverse case
that could result in seriously worse cache setup cost.

> With this approach we can honestly say that inserting an out-of-order
> enum value doesn't impact comparison performance for pre-existing
> enum members, only for comparisons involving the out-of-order value
> itself; even when the existing members were binary-upgraded and thus
> weren't all even. I think that's a worthwhile advantage.

Yeah, that's nice.

> IMHO this level of performance is good enough. Anyone unhappy?

No, seems good.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-25 01:20:58
Message-ID: 25742.1287969658@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 10/24/2010 08:12 PM, Tom Lane wrote:
>> This shows that the bitmapset optimization really is quite effective,
>> at least for cases where all the enum labels are sorted by OID after
>> all. That motivated me to change the bitmapset setup code to what's
>> attached. This is potentially a little slower at initializing the
>> cache, but it makes up for that by still marking most enum members
>> as sorted even when a few out-of-order members have been inserted.

> That's nice. It's a tradeoff though. Bumping up the cost of setting up
> the cache won't have much effect on a creating a large index, but could
> affect to performance of retail comparisons significantly. But this is
> probably worth it. You'd have to work hard to create the perverse case
> that could result in seriously worse cache setup cost.

Well, notice that I moved the caching into typcache.c, rather than
having it be associated with query startup. So unless you're actively
frobbing the enum definition, that's going to be paid only once per
session.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-10-25 01:38:41
Message-ID: 4CC4DFA1.3000306@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/24/2010 09:20 PM, Tom Lane wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>> On 10/24/2010 08:12 PM, Tom Lane wrote:
>>> This shows that the bitmapset optimization really is quite effective,
>>> at least for cases where all the enum labels are sorted by OID after
>>> all. That motivated me to change the bitmapset setup code to what's
>>> attached. This is potentially a little slower at initializing the
>>> cache, but it makes up for that by still marking most enum members
>>> as sorted even when a few out-of-order members have been inserted.
>> That's nice. It's a tradeoff though. Bumping up the cost of setting up
>> the cache won't have much effect on a creating a large index, but could
>> affect to performance of retail comparisons significantly. But this is
>> probably worth it. You'd have to work hard to create the perverse case
>> that could result in seriously worse cache setup cost.
> Well, notice that I moved the caching into typcache.c, rather than
> having it be associated with query startup. So unless you're actively
> frobbing the enum definition, that's going to be paid only once per
> session.

Oh, yes. Good. I'm just starting to look at this in detail.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-12 18:36:46
Message-ID: 201011121836.oACIak620194@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > On 10/24/2010 08:12 PM, Tom Lane wrote:
> >> This shows that the bitmapset optimization really is quite effective,
> >> at least for cases where all the enum labels are sorted by OID after
> >> all. That motivated me to change the bitmapset setup code to what's
> >> attached. This is potentially a little slower at initializing the
> >> cache, but it makes up for that by still marking most enum members
> >> as sorted even when a few out-of-order members have been inserted.
>
> > That's nice. It's a tradeoff though. Bumping up the cost of setting up
> > the cache won't have much effect on a creating a large index, but could
> > affect to performance of retail comparisons significantly. But this is
> > probably worth it. You'd have to work hard to create the perverse case
> > that could result in seriously worse cache setup cost.
>
> Well, notice that I moved the caching into typcache.c, rather than
> having it be associated with query startup. So unless you're actively
> frobbing the enum definition, that's going to be paid only once per
> session.

Thanks for modifying pg_upgrade so it works with this new format. The
changes look good and cleaner than what I had to do for 9.0.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-12 18:40:28
Message-ID: 201011121840.oACIeS820559@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > On 10/24/2010 08:12 PM, Tom Lane wrote:
> >> This shows that the bitmapset optimization really is quite effective,
> >> at least for cases where all the enum labels are sorted by OID after
> >> all. That motivated me to change the bitmapset setup code to what's
> >> attached. This is potentially a little slower at initializing the
> >> cache, but it makes up for that by still marking most enum members
> >> as sorted even when a few out-of-order members have been inserted.
>
> > That's nice. It's a tradeoff though. Bumping up the cost of setting up
> > the cache won't have much effect on a creating a large index, but could
> > affect to performance of retail comparisons significantly. But this is
> > probably worth it. You'd have to work hard to create the perverse case
> > that could result in seriously worse cache setup cost.
>
> Well, notice that I moved the caching into typcache.c, rather than
> having it be associated with query startup. So unless you're actively
> frobbing the enum definition, that's going to be paid only once per
> session.

FYI, I marked the TODO item for adding enums as completed. The TODO
item used to also mention renaming or removing enums, but I have seen
few requests for that so I removed that suggestion. We can always
re-add it if there is demand.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-12 18:50:00
Message-ID: 4CDD8C58.1020802@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/12/2010 01:40 PM, Bruce Momjian wrote:
> FYI, I marked the TODO item for adding enums as completed. The TODO
> item used to also mention renaming or removing enums, but I have seen
> few requests for that so I removed that suggestion. We can always
> re-add it if there is demand.

Renaming an item would not be terribly hard. Removing one is that nasty
case. There are all sorts of places the old value could be referred to:
table data, view definitions, check constraints, functions etc.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-12 18:55:18
Message-ID: 14331.1289588118@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 11/12/2010 01:40 PM, Bruce Momjian wrote:
>> FYI, I marked the TODO item for adding enums as completed. The TODO
>> item used to also mention renaming or removing enums, but I have seen
>> few requests for that so I removed that suggestion. We can always
>> re-add it if there is demand.

> Renaming an item would not be terribly hard. Removing one is that nasty
> case. There are all sorts of places the old value could be referred to:
> table data, view definitions, check constraints, functions etc.

Well, you can rename an item today if you don't mind doing a direct
UPDATE on pg_enum. I think that's probably sufficient if the demand
only amounts to one or two requests a year. I'd say leave it off the
TODO list till we see if there's more demand than that.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-12 18:58:24
Message-ID: AANLkTimqDS6JKhDCnNNyNnwhuMmEoK84bm2vRTHGCN2u@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 12, 2010 at 1:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 11/12/2010 01:40 PM, Bruce Momjian wrote:
>>> FYI, I marked the TODO item for adding enums as completed.  The TODO
>>> item used to also mention renaming or removing enums, but I have seen
>>> few requests for that so I removed that suggestion.  We can always
>>> re-add it if there is demand.
>
>> Renaming an item would not be terribly hard. Removing one is that nasty
>> case. There are all sorts of places the old value could be referred to:
>> table data, view definitions, check constraints, functions etc.
>
> Well, you can rename an item today if you don't mind doing a direct
> UPDATE on pg_enum.  I think that's probably sufficient if the demand
> only amounts to one or two requests a year.  I'd say leave it off the
> TODO list till we see if there's more demand than that.

I'd say put it on and mark it with an [E]. We could use some more
[E]asy items for that list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-12 19:20:22
Message-ID: 14886.1289589622@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Nov 12, 2010 at 1:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, you can rename an item today if you don't mind doing a direct
>> UPDATE on pg_enum. I think that's probably sufficient if the demand
>> only amounts to one or two requests a year. I'd say leave it off the
>> TODO list till we see if there's more demand than that.

> I'd say put it on and mark it with an [E]. We could use some more
> [E]asy items for that list.

We don't need to add marginally-useful features just because they're
easy. If it doesn't have a real use-case, the incremental maintenance
cost of more code is a good reason to reject it.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-12 21:52:39
Message-ID: 1289598714-sup-8514@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Bruce Momjian's message of vie nov 12 15:40:28 -0300 2010:

> FYI, I marked the TODO item for adding enums as completed. The TODO
> item used to also mention renaming or removing enums, but I have seen
> few requests for that so I removed that suggestion. We can always
> re-add it if there is demand.

I'm sure there's going to be more demand for ENUM features, now that
they are more usable.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-12 22:19:57
Message-ID: B3D9A4CF-AC75-451E-8AC5-A767D4C22D23@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Nov 12, 2010, at 2:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Nov 12, 2010 at 1:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Well, you can rename an item today if you don't mind doing a direct
>>> UPDATE on pg_enum. I think that's probably sufficient if the demand
>>> only amounts to one or two requests a year. I'd say leave it off the
>>> TODO list till we see if there's more demand than that.
>
>> I'd say put it on and mark it with an [E]. We could use some more
>> [E]asy items for that list.
>
> We don't need to add marginally-useful features just because they're
> easy. If it doesn't have a real use-case, the incremental maintenance
> cost of more code is a good reason to reject it.

If we allow users to name objects, we ought to make every effort to also allow renaming them. In my mind, the only way renaming is too marginal to be useful is if the feature itself is too marginal to be useful.

...Robert


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-12 22:34:01
Message-ID: 1289601241.4869.16.camel@jd-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2010-11-12 at 14:20 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Fri, Nov 12, 2010 at 1:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Well, you can rename an item today if you don't mind doing a direct
> >> UPDATE on pg_enum. I think that's probably sufficient if the demand
> >> only amounts to one or two requests a year. I'd say leave it off the
> >> TODO list till we see if there's more demand than that.
>
> > I'd say put it on and mark it with an [E]. We could use some more
> > [E]asy items for that list.
>
> We don't need to add marginally-useful features just because they're
> easy. If it doesn't have a real use-case, the incremental maintenance
> cost of more code is a good reason to reject it.

Perhaps we should remove the ability to rename tables and databases too.
It would certainly lighten the code path.

JD

>
> regards, tom lane
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: jd(at)commandprompt(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-13 02:18:20
Message-ID: 201011130218.oAD2IKt22373@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joshua D. Drake wrote:
> On Fri, 2010-11-12 at 14:20 -0500, Tom Lane wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > On Fri, Nov 12, 2010 at 1:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > >> Well, you can rename an item today if you don't mind doing a direct
> > >> UPDATE on pg_enum. I think that's probably sufficient if the demand
> > >> only amounts to one or two requests a year. I'd say leave it off the
> > >> TODO list till we see if there's more demand than that.
> >
> > > I'd say put it on and mark it with an [E]. We could use some more
> > > [E]asy items for that list.
> >
> > We don't need to add marginally-useful features just because they're
> > easy. If it doesn't have a real use-case, the incremental maintenance
> > cost of more code is a good reason to reject it.
>
> Perhaps we should remove the ability to rename tables and databases too.
> It would certainly lighten the code path.

OK, got it. Added incomplete TODO item:

Allow renaming and deleting enumerated values from an existing
enumerated data type

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: jd(at)commandprompt(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-13 02:56:01
Message-ID: 4CDDFE41.90508@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/12/2010 09:18 PM, Bruce Momjian wrote:
> OK, got it. Added incomplete TODO item:
>
> Allow renaming and deleting enumerated values from an existing
> enumerated data type
>

I have serious doubts that deleting will ever be sanely doable.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: jd(at)commandprompt(dot)com, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-13 03:05:29
Message-ID: 201011130305.oAD35T300525@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
>
> On 11/12/2010 09:18 PM, Bruce Momjian wrote:
> > OK, got it. Added incomplete TODO item:
> >
> > Allow renaming and deleting enumerated values from an existing
> > enumerated data type
> >
>
>
> I have serious doubts that deleting will ever be sanely doable.

True. Should we not mention it then? I can't think of many objects we
can't delete.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-13 17:30:03
Message-ID: 1289669403.13321.8.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On fre, 2010-11-12 at 17:19 -0500, Robert Haas wrote:
> If we allow users to name objects, we ought to make every effort to
> also allow renaming them. In my mind, the only way renaming is too
> marginal to be useful is if the feature itself is too marginal to be
> useful.

The bottom line is, any kind of database object needs to be changeable
and removable, otherwise there will always be hesitations about its use.
And when there are hesitations about the use, it's often easiest not to
bother.

I remember ten years ago or so we used to send people away who requested
the ability to drop columns, claiming they didn't plan their database
properly, or they should load it from scratch. Nowadays that is
ludicrous; databases live forever, development is agile, everything
needs to be changeable.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: extensible enums
Date: 2010-11-13 22:31:57
Message-ID: AANLkTimyJiA=ypP5Cjd-JMdhrXmKpLsho4PcAESA7mWt@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Nov 13, 2010 at 12:30 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On fre, 2010-11-12 at 17:19 -0500, Robert Haas wrote:
>> If we allow users to name objects, we ought to make every effort to
>> also allow renaming them.  In my mind, the only way renaming is too
>> marginal to be useful is if the feature itself is too marginal to be
>> useful.
>
> The bottom line is, any kind of database object needs to be changeable
> and removable, otherwise there will always be hesitations about its use.
> And when there are hesitations about the use, it's often easiest not to
> bother.
>
> I remember ten years ago or so we used to send people away who requested
> the ability to drop columns, claiming they didn't plan their database
> properly, or they should load it from scratch.  Nowadays that is
> ludicrous; databases live forever, development is agile, everything
> needs to be changeable.

It was ludicrous then, too. I picked MySQL for several projects early
on for precisely the lack of the ability to drop columns in
PostgreSQL.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP: extensible enums
Date: 2010-11-14 20:58:17
Message-ID: 4CE04D69.7040907@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> I'd say put it on and mark it with an [E]. We could use some more
>> [E]asy items for that list.
>
> We don't need to add marginally-useful features just because they're
> easy. If it doesn't have a real use-case, the incremental maintenance
> cost of more code is a good reason to reject it.

I'll bite.

Use-case:

1) DBA adds "Department Role" enum, with set
{'Director','Secretary','Staff','Support','Temporary','Liason'}.

2) 3-person data entry team updates all employee records with those roles.

3) First summary report is run.

4) Manager points out that "Liason" is misspelled.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com