Quick Links

Re: extensible enum types

Lists:	pgsql-hackers

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	extensible enum types
Date:	2010-06-18 15:50:58
Message-ID:	4C1B95E2.6090508@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

A recent discussion with a developer convinced me that enums would be
much more useful if new values could be added to the label list easily
without changing the stored values. Given the current representation of
enums as a globally unique oid, I think this is just about impossible if
the new label is to go somewhere other than on the end of the list
(assuming that's where the next oid would get allocated).

So I have been thinking about a new type family which for the sake of
argument I will call varenum (please hold the bikeshedding till later).
Staying with 4 bytes as the stored value, the idea is that we partition
it into two 16 bit pieces. The high-order piece is the varenum type
identifier. The low-order piece would uniquely identify the label within
the set for that varenum. Initial values would be allocated like this:
calculate the space between values p as 2**16 / (16 + number_of_labels).
Then set the first value at 8 * p, then next at 9* p and so on. This is
designed to allow more space to add labels at the beginning and end of
the list, where this is more likely. Adding a label would be a matter of
finding the labels adjacent to the position where we want to add the new
label, and placing it half way between them, possibly with special rules
for the list ends. If we want to add the label between two labels having
values n and n+1 the addition would fail.

All this would mean a) we can't have more than 65536 varenum types in a
system, and no varenum type could have more than 65536 values (and
possibly less, depending on how they are added). In practice I think
these are quite reasonable restrictions, and 99.99% of users would never
come close to bumping up against them.

Given all this, we could then allow things like:

ALTER TYPE varenumtype ADD 'newlabel' [ BEFORE | AFTER 'existinglabel' ]

I haven't looked at how we'd set this up in the catalog, but I assume
that by analogy with pg_enum it should be fairly straightforward.

We could actually allow something like the above for existing enum types
without the optional clause, which would just add the label wherever the
next oid happened to fall, which would commonly be at the end of the
list. That would handle the common case where the application doesn't
care about the label ordering. That should be a much simpler change and
is probably worth doing first.

There was some discussion of this here:
<http://archives.postgresql.org/message-id/20080425182718.GD5888@alvh.no-ip.org>
But I'm fairly reluctant to do things which will require table rewrites.
I'm also less excited about the idea of removing values - that is
something I don't ever recall being asked about, but I have often been
asked about adding labels. For people prepared to rewrite tables, there
is currently a workaround: create a new enum type, alter the table to
use the new enum type, drop the old enum type, rename the new enum type
to the old enum type.

Thoughts?

cheers

andrew

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 16:07:47
Message-ID:	AANLkTikdrGIJ7wCy9qS7r_9RA0a7QTMJR3TSZ-BzPlme@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 18, 2010 at 11:50 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Then set the
> first value at 8 * p, then next at 9* p and so on. This is designed to
> allow more space to add labels at the beginning and end of the list, where
> this is more likely. Adding a label would be a matter of finding the labels
> adjacent to the position where we want to add the new label, and placing it
> half way between them, possibly with special rules for the list ends. If we
> want to add the label between two labels having values n and n+1 the
> addition would fail.

I like the idea of being able to modify enums on the fly, but I'm
skeptical of an implementation that won't always work. Maybe it's
still better than what we have now, but it seems grotty.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 16:34:35
Message-ID:	4C1BA01B.9080202@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Fri, Jun 18, 2010 at 11:50 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>> Then set the
>> first value at 8 * p, then next at 9* p and so on. This is designed to
>> allow more space to add labels at the beginning and end of the list, where
>> this is more likely. Adding a label would be a matter of finding the labels
>> adjacent to the position where we want to add the new label, and placing it
>> half way between them, possibly with special rules for the list ends. If we
>> want to add the label between two labels having values n and n+1 the
>> addition would fail.
>>
>
> I like the idea of being able to modify enums on the fly, but I'm
> skeptical of an implementation that won't always work. Maybe it's
> still better than what we have now, but it seems grotty.
>
>

I'd be perfectly happy to hear a reasonable alternative. Assuming we use
some integer representation, given two labels represented by n and n+1,
we can't add a label between them without rewriting the tables that use
the type, whether it's my representation scheme or some other. Maybe we
could have a FORCE option which would rewrite if necessary.

cheers

andrew

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 16:35:45
Message-ID:	A9935587-F377-427F-AAF3-A5BBEBA83B17@kineticode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jun 18, 2010, at 9:07 AM, Robert Haas wrote:

>> Then set the
>> first value at 8 * p, then next at 9* p and so on. This is designed to
>> allow more space to add labels at the beginning and end of the list, where
>> this is more likely. Adding a label would be a matter of finding the labels
>> adjacent to the position where we want to add the new label, and placing it
>> half way between them, possibly with special rules for the list ends. If we
>> want to add the label between two labels having values n and n+1 the
>> addition would fail.
>
> I like the idea of being able to modify enums on the fly, but I'm
> skeptical of an implementation that won't always work. Maybe it's
> still better than what we have now, but it seems grotty.

Yes, other than that I fully endorse the idea. What's the likelihood of a failure? And would the position of the new label (represented by its internal number) be predictive? IOW, would updating the same varenumtype in two databases (or on two servers) yield the same underlying positional value?

Best,

David

From:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 16:36:23
Message-ID:	28C0DDCD-59F6-476E-8769-BE68E4B70BED@kineticode.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Jun 18, 2010, at 9:34 AM, Andrew Dunstan wrote:

> I'd be perfectly happy to hear a reasonable alternative. Assuming we use some integer representation, given two labels represented by n and n+1, we can't add a label between them without rewriting the tables that use the type, whether it's my representation scheme or some other. Maybe we could have a FORCE option which would rewrite if necessary.

People would likely always use it.

Why not use a decimal number?

Best,

David

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 16:43:23
Message-ID:	1276879403.24576.39.camel@jd-desktop.unknown.charter.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2010-06-18 at 12:34 -0400, Andrew Dunstan wrote:
>
> Robert Haas wrote:
> > On Fri, Jun 18, 2010 at 11:50 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> >
> >> Then set the
> >> first value at 8 * p, then next at 9* p and so on. This is designed to
> >> allow more space to add labels at the beginning and end of the list, where
> >> this is more likely. Adding a label would be a matter of finding the labels
> >> adjacent to the position where we want to add the new label, and placing it
> >> half way between them, possibly with special rules for the list ends. If we
> >> want to add the label between two labels having values n and n+1 the
> >> addition would fail.
> >>
> >
> > I like the idea of being able to modify enums on the fly, but I'm
> > skeptical of an implementation that won't always work. Maybe it's
> > still better than what we have now, but it seems grotty.
> >
> >
>
> I'd be perfectly happy to hear a reasonable alternative. Assuming we use
> some integer representation, given two labels represented by n and n+1,
> we can't add a label between them without rewriting the tables that use
> the type, whether it's my representation scheme or some other. Maybe we
> could have a FORCE option which would rewrite if necessary.

I apologize as this thread has already moved past the initial proposal.
I had mail problems so I didn't see any part of the thread in my client
until now.

My standard response to enums is, don't use them. Specifically because
you can't modify them. When I talk to developers and they see we have
enums they get excited and then I tell them you can't modify them and
they get very downtrodden. I tell them just to use a look up table.

Anyway, Andrew if you can find a reasonable way to make them modifiable,
I am all for it :)

On that note and yes this would be weird but have we considered some
kind of wrapper around just a lookup table? I mean what if there was a
system table that listed :

nspname, relation, column, value

the "type" enum would know to look in that table for valid values

Then we just add various syntactical sugar to manage the values in the
table via that specific enum:

ALTER TABLE foo ALTER COLUMN bar VALUES (bar,baz,bing,foo)

ALTER TABLE would know its an enum and would perform proper operations
on the system table.

Sincerely,

Joshua D. Drake

>
> cheers
>
> andrew
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 16:45:10
Message-ID:	14542.1276879510@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Robert Haas wrote:
>> I like the idea of being able to modify enums on the fly, but I'm
>> skeptical of an implementation that won't always work. Maybe it's
>> still better than what we have now, but it seems grotty.

> I'd be perfectly happy to hear a reasonable alternative.

Insert a sort order column into pg_enum, and rearrange the values in
that whenever the user wants to add a new value in a particular place.
You give up cheap comparisons in exchange for flexibility. I think lots
of people would accept that tradeoff, especially if they could make it
per-datatype.

One point here is that you'd have to restrict the rearrangements so that
the relative sort order of existing values never changes, else you break
(for example) indexes on columns of that type.

regards, tom lane

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 16:56:14
Message-ID:	4C1BA52E.6060809@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

David E. Wheeler wrote:
> What's the likelihood of a failure?

Constructing a failure case would be simple. In practice, I suspect it
would be very low.

> And would the position of the new label (represented by its internal number) be predictive? IOW, would updating the same varenumtype in two databases (or on two servers) yield the same underlying positional value?
>
>
>

The algorithm I outlined is deterministic, so the same sequence of
operations on the type would yield the same set of values on the
low-order 16 bits. But that doesn't mean they would have the same high
order 16 bits. That would depend on the history of the system. But more
importantly, why do you care? the stored value is an implementation
artefact that should be totally invisible to users. There would be no
guarantee of the same underlying values on dump/reload either, just as
there is not now for enums.

cheers

andrew

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 16:59:10
Message-ID:	4C1BA5DE.2090501@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

David E. Wheeler wrote:
> On Jun 18, 2010, at 9:34 AM, Andrew Dunstan wrote:
>
>
>> I'd be perfectly happy to hear a reasonable alternative. Assuming we use some integer representation, given two labels represented by n and n+1, we can't add a label between them without rewriting the tables that use the type, whether it's my representation scheme or some other. Maybe we could have a FORCE option which would rewrite if necessary.
>>
>
> People would likely always use it.
>
> Why not use a decimal number?
>
>
>

You are just bumping up the storage cost. Part of the attraction of
enums is their efficiency.

cheers

andrew

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 17:17:36
Message-ID:	4C1BAA30.7060402@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Robert Haas wrote:
>>
>>> I like the idea of being able to modify enums on the fly, but I'm
>>> skeptical of an implementation that won't always work. Maybe it's
>>> still better than what we have now, but it seems grotty.
>>>
>
>
>> I'd be perfectly happy to hear a reasonable alternative.
>>
>
> Insert a sort order column into pg_enum, and rearrange the values in
> that whenever the user wants to add a new value in a particular place.
> You give up cheap comparisons in exchange for flexibility. I think lots
> of people would accept that tradeoff, especially if they could make it
> per-datatype.
>
> One point here is that you'd have to restrict the rearrangements so that
> the relative sort order of existing values never changes, else you break
> (for example) indexes on columns of that type.
>
>
>

Hmm. Yes, that could work. The assumption in my proposal was that
existing values would not be reordered anyway.

But I'm not happy about giving up cheap comparison. And how would it be
per data-type? That part isn't clear to me. Would we mark a given enum
type as having its oids in order? It would also be sensible to quantify
how much more expensive comparisons would become. If the sort order data
were kept in the syscache the extra cost might get very small.

What I actually like most about this suggestion is that we would be able
to apply it cleanly to existing enum types without inventing anything
much new.

cheers

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 17:18:11
Message-ID:	AANLkTilM3cEGxjGpunambSEuzVHeADH9vsSNY_BeChd-@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> You are just bumping up the storage cost. Part of the attraction of enums is
> their efficiency.

What's efficient about them? Aren't we using 4 bytes to store a value
that will nearly always fit in 2, if not 1?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 17:59:09
Message-ID:	4C1BB3ED.9090000@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>> You are just bumping up the storage cost. Part of the attraction of enums is
>> their efficiency.
>>
>
> What's efficient about them? Aren't we using 4 bytes to store a value
> that will nearly always fit in 2, if not 1?
>
>
This was debated when we implemented enums. As between 1,2 and 4 there
is often not much to choose, as alignment padding makes it pretty much
the same. But any of them are more efficient than storing a numeric
value or the label itself.

Anyway, it might well be moot.

cheers

andrew

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 18:06:07
Message-ID:	AANLkTinpnzqhQyMOBpXlODvhgy3P92TZAcSRh3BE-jTI@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 18, 2010 at 1:59 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> This was debated when we implemented enums. As between 1,2 and 4 there is
> often not much to choose, as alignment padding makes it pretty much the
> same. But any of them are more efficient than storing a numeric value or the
> label itself.

I was assuming the alternative was an integer, rather than a
numeric... but yeah, a numeric or the label itself would definitely
be larger.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 18:28:06
Message-ID:	AANLkTik8eUZheEiEDW5nS7hgOzrPuC5F1hipNhz3INa3@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 18, 2010 at 6:17 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> Tom Lane wrote:
>>
>> Insert a sort order column into pg_enum, and rearrange the values in
>> that whenever the user wants to add a new value in a particular place.

+1 I was going to say exactly the same thing.

>> You give up cheap comparisons in exchange for flexibility. I think lots
>> of people would accept that tradeoff, especially if they could make it
>> per-datatype.
> Hmm. Yes, that could work. The assumption in my proposal was that existing
> values would not be reordered anyway.
>
> But I'm not happy about giving up cheap comparison. And how would it be per
> data-type? That part isn't clear to me. Would we mark a given enum type as
> having its oids in order? It would also be sensible to quantify how much
> more expensive comparisons would become. If the sort order data were kept in
> the syscache the extra cost might get very small.

I think you would need a syscache or something like it. My first
instinct was to load the whole enum value->sort order mapping into a
hash table the first time you're asked to compare two values in a
given type. Then your comparison operator amounts to "look
up/initialize hash table for this enum type, look up both sort orders
in hash table, return comparison". You might need something like a
syscache for the hash tables so that you don't keep the hash tables
around forever.

Using a syscache for the individual sort values would be slower to
initially load if you're sorting a list since you would be doing a lot
of retail lookups of individual values. But then perhaps it's a cheap
way to protect against very large enums which using a hash table per
enum type would be fragile against.

--
greg

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 19:18:51
Message-ID:	16830.1276888731@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> Insert a sort order column into pg_enum, and rearrange the values in
>> that whenever the user wants to add a new value in a particular place.
>> You give up cheap comparisons in exchange for flexibility. I think lots
>> of people would accept that tradeoff, especially if they could make it
>> per-datatype.

> But I'm not happy about giving up cheap comparison.

I don't think it would be all that bad. We could teach typcache.c to
cache the ordering data for any type that's in active use. It'd
certainly be a lot more expensive than OID comparison, but perhaps not
worse than NUMERIC comparisons.

> And how would it be per data-type?

Well, there'd be two kinds of enums, just as you were saying before.
I'm not sure how we'd expose that to users exactly, or whether there
could be provisions for switching a type's behavior after creation.

regards, tom lane

From:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 22:17:50
Message-ID:	AANLkTilS_39PNwqwFRt7hD-Of_naEEProIsYpzXQymfE@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 18, 2010 at 1:59 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> Robert Haas wrote:
>>
>> On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>> wrote:
>>
>>>
>>> You are just bumping up the storage cost. Part of the attraction of enums
>>> is
>>> their efficiency.
>>>
>>
>> What's efficient about them? Aren't we using 4 bytes to store a value
>> that will nearly always fit in 2, if not 1?
>>
>>
>
> This was debated when we implemented enums. As between 1,2 and 4 there is
> often not much to choose, as alignment padding makes it pretty much the
> same. But any of them are more efficient than storing a numeric value or the
> label itself.
>
> Anyway, it might well be moot.
>
> cheers
>
> andrew

Something I don't understand in all this is: why can't the type of an
enum be determined statically rather than stored in every single
value? For instance, if we have:

CREATE TYPE number AS ENUM ('one', 'two', 'three');
CREATE TYPE color AS ENUM ('red', 'green', 'blue');

PostgreSQL won't allow a comparison between two different enum types, e.g.:

> SELECT 'one'::number = 'red'::color;
ERROR: operator does not exist: number = color

However, when we say:

SELECT 'one'::number = 'one'::number

Couldn't enum_eq just use get_fn_expr_argtype to determine the type of
enum input rather than rely on it being stored in the value (either
implicitly via OID or explicitly as a word half)?

Also, I can't seem to find the original debates from when enums were
implemented. Does anyone have a link to that thread in the archives?
Thanks.

Joey Adams

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 22:33:12
Message-ID:	4C1BF428.8050506@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Joseph Adams wrote:
> Also, I can't seem to find the original debates from when enums were
> implemented. Does anyone have a link to that thread in the archives?
> Thanks.
>

Start here
<http://archives.postgresql.org/pgsql-hackers/2006-08/msg00979.php>

cheers

andrew

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, David E(dot) Wheeler <david(at)kineticode(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 22:36:05
Message-ID:	1276900502-sup-1914@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Excerpts from Joseph Adams's message of vie jun 18 18:17:50 -0400 2010:

> Couldn't enum_eq just use get_fn_expr_argtype to determine the type of
> enum input rather than rely on it being stored in the value (either
> implicitly via OID or explicitly as a word half)?
>
> Also, I can't seem to find the original debates from when enums were
> implemented. Does anyone have a link to that thread in the archives?

Probably around here
http://archives.postgresql.org/pgsql-patches/2006-12/msg00099.php

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 22:53:20
Message-ID:	20587.1276901600@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> writes:
> Couldn't enum_eq just use get_fn_expr_argtype to determine the type of
> enum input rather than rely on it being stored in the value

No. Support functions have to work in many contexts where there is no
"side channel" such as get_fn_expr_argtype. What's more, it's very
difficult to provide a side channel without creating security holes.
We used to think it was OK for output functions to rely on a type OID
passed separately from the actual value, but that's insecure:
http://archives.postgresql.org/pgsql-hackers/2005-04/msg00998.php

regards, tom lane

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-18 23:59:45
Message-ID:	4C1C0871.8080500@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
>> And how would it be per data-type?
>>
>
> Well, there'd be two kinds of enums, just as you were saying before.
> I'm not sure how we'd expose that to users exactly, or whether there
> could be provisions for switching a type's behavior after creation.
>
>
>

I'd be a whole lot happier if we didn't have to do that. I've been
trying to wrack my brain for some clever way to avoid it (so far without
success).

cheers

andrew

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-19 18:58:59
Message-ID:	4C1D1373.5020103@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>
>>> Insert a sort order column into pg_enum, and rearrange the values in
>>> that whenever the user wants to add a new value in a particular place.
>>> You give up cheap comparisons in exchange for flexibility. I think lots
>>> of people would accept that tradeoff, especially if they could make it
>>> per-datatype.
>>>
>
>
>> But I'm not happy about giving up cheap comparison.
>>
>
> I don't think it would be all that bad. We could teach typcache.c to
> cache the ordering data for any type that's in active use. It'd
> certainly be a lot more expensive than OID comparison, but perhaps not
> worse than NUMERIC comparisons.
>
>
>

Another thought: could we add a column to pg_type with a flag that's
true if the oids are in sort order? Then the comparison routines could
just look that up in the type cache and if it's true (as it often will
be) just return the oid comparison.

cheers

andrew

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-19 19:03:15
Message-ID:	10080.1276974195@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Another thought: could we add a column to pg_type with a flag that's
> true if the oids are in sort order? Then the comparison routines could
> just look that up in the type cache and if it's true (as it often will
> be) just return the oid comparison.

Well, having to do a cache lookup already makes it a couple orders of
magnitude more expensive than an OID comparison. However, it's hard to
say how much that matters in terms of total application performance.
We really could do with a bit of performance testing here ...

regards, tom lane

From:	Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-19 20:14:29
Message-ID:	AANLkTimFxo6594Fs_AlodJpqP0is0XS71uGsqBSiEJWf@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Jun 18, 2010 at 12:59 PM, Andrew Dunstan <andrew(at)dunslane(dot)net>wrote:

>
>
> David E. Wheeler wrote:
>
>> On Jun 18, 2010, at 9:34 AM, Andrew Dunstan wrote:
>>
>>
>>
>>> I'd be perfectly happy to hear a reasonable alternative. Assuming we use
>>> some integer representation, given two labels represented by n and n+1, we
>>> can't add a label between them without rewriting the tables that use the
>>> type, whether it's my representation scheme or some other. Maybe we could
>>> have a FORCE option which would rewrite if necessary.
>>>
>>>
>>
>> People would likely always use it.
>>
>> Why not use a decimal number?
>>
>>
>>
>>
>
> You are just bumping up the storage cost. Part of the attraction of enums
> is their efficiency.
>
>
Probably it'd be the same as the decimal suggestion above, but we can use
floating-point data type.

It will allow injection of a new label at any stage.

CREATE leads to

Label1 -> 1.0
Label2 -> 2.0
Label3 -> 3.0

ALTER ... ADD Label4 AFTER Label2; leads to
Label1 -> 1.0
Label2 -> 2.0
Label4 -> 2.5
Label3 -> 3.0

ALTER ... ADD Label5 AFTER Label2; leads to
Label1 -> 1.0
Label2 -> 2.0
Label5 -> 2.25
Label4 -> 2.5
Label3 -> 3.0

Since floating-point implementation is architecture dependent, the ALTER
command should check that the injected value does not equate to any value
around it (eg. comparisons of (2.5 == 2) and (2.25 == 2.5) should not yield
0); and if it does, then throw an error and ask the user to use the
rewrite-the-table version of the command.

And since it is still 32 bit, and comparisons done by machine, performance
should be acceptably close to current integer comparisons, and much faster
that the cache lookups etc. being proposed.

This is very similar to Andrew's original suggestion of splitting 32 bits
into 16+16, but managed by the machine hence no complicated comparison algos
needed on our part. Also, since this is all transparent to the SQL
interface, our dump-reload cycle or Slony replication, etc. should not be
affected either.

Regards,
--
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh(dot)gurjeet(at){ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>
Cc:	"David E(dot) Wheeler" <david(at)kineticode(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-19 20:55:10
Message-ID:	4C1D2EAE.1090000@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Gurjeet Singh wrote:
>
>
> This is very similar to Andrew's original suggestion of splitting 32
> bits into 16+16, but managed by the machine hence no complicated
> comparison algos needed on our part. Also, since this is all
> transparent to the SQL interface, our dump-reload cycle or Slony
> replication, etc. should not be affected either.
>
>

It would break the on-disk representation, though. That's not something
we want to do any more if it can possibly be avoided. We want to keep
pg_upgrade working.

cheers

andrew

From:	Merlin Moncure <mmoncure(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Gurjeet Singh <singh(dot)gurjeet(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-19 21:11:43
Message-ID:	AANLkTikGvSO81dbMWIACBJzt2qa2AgkKZhgZigXMe4zc@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sat, Jun 19, 2010 at 4:55 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Gurjeet Singh wrote:
>>
>>
>> This is very similar to Andrew's original suggestion of splitting 32 bits
>> into 16+16, but managed by the machine hence no complicated comparison algos
>> needed on our part. Also, since this is all transparent to the SQL
>> interface, our dump-reload cycle or Slony replication, etc. should not be
>> affected either.
>>
>>
>
> It would break the on-disk representation, though. That's not something we
> want to do any more if it can possibly be avoided. We want to keep
> pg_upgrade working.

I was partial to your original idea -- i thought it was quite clever
actually. enums are a performance side of a tradeoff already so I
think any improvement for them should be looked at through that lens.

16 bits is IMO enough to pick a reasonable bucket size that gives you
enough play to handle the vast majority of cases that are appropriate
for enums. your workaround in the rare case you actually hit the
limitations (most of these would fall under the 'oops, i used the
wrong tool' category) seems perfectly ok imo.

merlin

From:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-20 00:15:36
Message-ID:	1276992936.23257.102053.camel@ebony
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, 2010-06-18 at 11:50 -0400, Andrew Dunstan wrote:

> Thoughts?

enum types exist as an optimisation-by-avoidance of referential
integrity.

We're a relational database, so IMHO we should spend time performance
tuning RI.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-20 02:16:38
Message-ID:	4C1D7A06.6060401@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Fri, 2010-06-18 at 11:50 -0400, Andrew Dunstan wrote:
>
>
>> Thoughts?
>>
>
> enum types exist as an optimisation-by-avoidance of referential
> integrity.
>
> We're a relational database, so IMHO we should spend time performance
> tuning RI.
>
>

I don't accept your initial assertion at all. But in any case, these are
not mutually exclusive. Your work tuning RI will not obstruct mine in
making enums more useful, nor vice versa.

cheers

andrew

From:	Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-20 02:42:48
Message-ID:	AANLkTikpg5yDICt0ng_fPgp5iJHKCU3wPTFJyzFFxqL2@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

>> Thoughts?
>
> enum types exist as an optimisation-by-avoidance of referential
> integrity.
>
> We're a relational database, so IMHO we should spend time performance
> tuning RI.

I take the view that they exist as a way of representing enumerations
of application/domain values - if it's hard coded in the application,
it's hard coded in the database by using an enum. This is why it isn't
that big a problem that they cannot be amended - they ought to be very
static, immutable values in the first place. I still think it would be
handy to be able to append new values though, and not have to ALTER
COLUMN USING to a new enum type. Besides, using enums in place of
lookup tables doesn't really make much sense as an optimisation.

It's very cool to be able to write queries like SELECT * FROM payments
WHERE payment_type = 'cash', rather than having to recall time and
again what the PK of cash is within your lookup table.

--
Regards,
Peter Geoghegan

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-20 03:50:50
Message-ID:	1277005850.5130.9.camel@jd-desktop.unknown.charter.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, 2010-06-20 at 03:42 +0100, Peter Geoghegan wrote:
> >> Thoughts?

> It's very cool to be able to write queries like SELECT * FROM payments
> WHERE payment_type = 'cash', rather than having to recall time and
> again what the PK of cash is within your lookup table.

Ahem. That is what a natural key is for :)

Joshua D. Drake

>
> --
> Regards,
> Peter Geoghegan
>

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering

From:	Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To:	jd(at)commandprompt(dot)com
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-20 15:02:37
Message-ID:	AANLkTim2Zzn9vs_AusOBXt-LIJFZFlYv88aetunBXJJf@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

> Ahem. That is what a natural key is for :)

Well, they have their own drawbacks that don't make them particularly
appealing to use with lookup tables to ape enums. How many lookup
tables have you seen in the wild with a natural key?

People sometimes represent things like US states as enums. This is
probably a mistake, because you cannot control or predict if there'll
be a new US state, unlikely though that me be. You *can* control, for
example, what types of payment your application can deal with, and
you'll probably have to hardcode differences in dealing with each
inside your application, which makes enums a good choice. In my
earlier example, in addition to 'cash', there is a value for
payment_type of 'credit_card' . There is a separate column in the
payments table that references a credit_cards table, because credit
cards are considered transitory. A check constraint enforces that
credit_cards_id is null or not null as appropriate.

I don't like the idea of having values in a table that aren't so much
data as an integral part of your application/database. I think it's
wrong-headed. That's why I am not in favour of your enums as a lookup
table wrapper suggestion.

--
Regards,
Peter Geoghegan

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-21 14:43:45
Message-ID:	4C1F7AA1.6070000@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Another thought: could we add a column to pg_type with a flag that's
>> true if the oids are in sort order? Then the comparison routines could
>> just look that up in the type cache and if it's true (as it often will
>> be) just return the oid comparison.
>>
>
> Well, having to do a cache lookup already makes it a couple orders of
> magnitude more expensive than an OID comparison. However, it's hard to
> say how much that matters in terms of total application performance.
> We really could do with a bit of performance testing here ...
>
>
>

I have done some. The performance hit is fairly horrible. Adding cache
lookups for the enum rows to the comarison routines made a REINDEX on a
1m row table where the index is on an enum column (the enum has 500
randomly ordered labels) jump from around 10s to around 70s. I think
that probably rules out doing anything like this for the existing enum
types. I think the most we can reasonably do there is to allow adding a
label to the end of the enum list. I'm fairly resistant to doing
something which will have a major performance impact, as I know there
are users who are relying on enums for performce reasons. I'm also
fairly resistant to doing things which will require table rewriting.

So the question then is: do we want to allow lots of flexibility for
positioning new labels with significant degradation in comparison
performace for a new enum variant, or have a new variant with some
restrictions which probably won't impact most users but would have
equivalent performance to the current enum family, or do nothing?

cheers

andrew

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-21 15:01:05
Message-ID:	12725.1277132465@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Tom Lane wrote:
>> Well, having to do a cache lookup already makes it a couple orders of
>> magnitude more expensive than an OID comparison. However, it's hard to
>> say how much that matters in terms of total application performance.
>> We really could do with a bit of performance testing here ...

> I have done some. The performance hit is fairly horrible. Adding cache
> lookups for the enum rows to the comarison routines made a REINDEX on a
> 1m row table where the index is on an enum column (the enum has 500
> randomly ordered labels) jump from around 10s to around 70s.

Hmmm... that's bad, but I bet it's still less than the cost of comparing
NUMERICs. Also, did you make any attempt to avoid repetitive cache
lookups by storing a pointer in fn_extra (cf array comparisons)?

regards, tom lane

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-21 16:28:18
Message-ID:	4C1F9322.3010600@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
>> Adding cache
>> lookups for the enum rows to the comarison routines made a REINDEX on a
>> 1m row table where the index is on an enum column (the enum has 500
>> randomly ordered labels) jump from around 10s to around 70s.
>>
>
> Hmmm... that's bad, but I bet it's still less than the cost of comparing
> NUMERICs. Also, did you make any attempt to avoid repetitive cache
> lookups by storing a pointer in fn_extra (cf array comparisons)?
>
>
>

No. Will work on that. Thanks.

cheers

andrew

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-22 02:38:49
Message-ID:	4C202239.5070901@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Tom Lane wrote:
>>
>>> Well, having to do a cache lookup already makes it a couple orders of
>>> magnitude more expensive than an OID comparison. However, it's hard to
>>> say how much that matters in terms of total application performance.
>>> We really could do with a bit of performance testing here ...
>>>
>
>
>> I have done some. The performance hit is fairly horrible. Adding cache
>> lookups for the enum rows to the comarison routines made a REINDEX on a
>> 1m row table where the index is on an enum column (the enum has 500
>> regards, tom lane
>>
>> randomly ordered labels) jump from around 10s to around 70s.
>>
>
> Hmmm... that's bad, but I bet it's still less than the cost of comparing
> NUMERICs. Also, did you make any attempt to avoid repetitive cache
> lookups by storing a pointer in fn_extra (cf array comparisons)?
>
>
>

OK, making a bit of progress. Attached is a sort of proof of concept
patch that does that. It stores a bsearchable list of {enum, sort_order}
pairs in fn_extra, along with a flag that indicates if the oids are in
fact ordered. This flag, which would be maintained in and populated from
pg_type, would allow avoidance of any significant performance penalty in
such cases by relying on straight Oid comparison. We'd probably need to
keep a count of labels in pg_type too so we could size the cache
appropriately. This approach just about buys the best of both worlds.
The execution time for the test mentioned above is down from around 70s
to around 20s. I think for a worst case that's not too bad, especially
when it is completely avoided unless we have perturbed the sort order.

If anyone wants to play along, my test set is available at
<http://developer.postgresql.org/~adunstan/enumtest.dmp> It's about 8.5Mb.

cheers

andrew

Attachment	Content-Type	Size
venum-poc.patch	text/x-patch	5.4 KB

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-23 21:57:36
Message-ID:	201006232157.o5NLva517972@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > Robert Haas wrote:
> >> I like the idea of being able to modify enums on the fly, but I'm
> >> skeptical of an implementation that won't always work. Maybe it's
> >> still better than what we have now, but it seems grotty.
>
> > I'd be perfectly happy to hear a reasonable alternative.
>
> Insert a sort order column into pg_enum, and rearrange the values in
> that whenever the user wants to add a new value in a particular place.
> You give up cheap comparisons in exchange for flexibility. I think lots
> of people would accept that tradeoff, especially if they could make it
> per-datatype.
>
> One point here is that you'd have to restrict the rearrangements so that
> the relative sort order of existing values never changes, else you break
> (for example) indexes on columns of that type.

Sorry to be commenting late, but don't most people want to add to the
end or beginning of the enum list, rather than in the middle, and can't
we support that already?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-23 22:01:24
Message-ID:	29149.1277330484@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Sorry to be commenting late, but don't most people want to add to the
> end or beginning of the enum list, rather than in the middle, and can't
> we support that already?

We could allow adding a value, but we couldn't guarantee where it would
appear in the type's sort ordering. Depending on the current OID
counter it would usually show up either at the end or the beginning.
I think the general feeling is that this is too implementation-dependent
to be acceptable.

regards, tom lane

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-23 22:05:36
Message-ID:	201006232205.o5NM5aJ21869@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Sorry to be commenting late, but don't most people want to add to the
> > end or beginning of the enum list, rather than in the middle, and can't
> > we support that already?
>
> We could allow adding a value, but we couldn't guarantee where it would
> appear in the type's sort ordering. Depending on the current OID
> counter it would usually show up either at the end or the beginning.
> I think the general feeling is that this is too implementation-dependent
> to be acceptable.

Well, we don't need the enum value to map into the entire oid range.
Can't we just add one to the top-most value and see if there is a
conflict?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Bruce Momjian <bruce(at)momjian(dot)us>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-23 22:08:55
Message-ID:	29292.1277330935@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Well, we don't need the enum value to map into the entire oid range.
> Can't we just add one to the top-most value and see if there is a
> conflict?

If you don't use the OID counter to generate the new value, you're going
to have problems with race conditions. There's also that small chance
that there is no free value before 2^32.

The bottom line here is not wanting a feature that "usually" works but
fails once in awhile on the basis of conditions the user can't control.

regards, tom lane

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: extensible enum types
Date:	2010-06-23 23:30:29
Message-ID:	4C229915.3080803@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>
>> Well, we don't need the enum value to map into the entire oid range.
>> Can't we just add one to the top-most value and see if there is a
>> conflict?
>>
>
> If you don't use the OID counter to generate the new value, you're going
> to have problems with race conditions. There's also that small chance
> that there is no free value before 2^32.
>
> The bottom line here is not wanting a feature that "usually" works but
> fails once in awhile on the basis of conditions the user can't control.
>
>

Yeah, what I'm now hoping to be able to do, following good suggestions
from Tom, is to provide a way to get virtually no degradation in bulk
comparison performance in the common case where any additions have been
made at the end of the list with no oid wraparound, and acceptable
performance otherwise, but provide for an ability to add new values at
arbitrary places in the ordering, with no limit.

If we can do that why would we want less?

cheers

andrew