Re: Custom Data Type Question

Lists: pgsql-hackers
From: Greg Mitchell <gmitchell(at)atdesk(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Custom Data Type Question
Date: 2006-11-15 21:18:37
Message-ID: 455B842D.6000007@atdesk.com

I'm trying to create a custom data type similar to an enumeration type.
However, I'd like the mapping of the int<->string to be dynamic instead
of hard coded. I'd like to have a table that contains this mapping that
can be appended to. Creating this type is not very difficult. However,
for performance reasons, I'd like to cache the mapping so that the table
is only queried once per connection unless it changes. I'm thinking a
combination of a flag that can be triggered on insert and a transaction
id could be used to decide if the table needs to be reloaded.
Unfortunately, I'm not exactly sure how to get started on this, any ideas?

Thanks,
Greg


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Mitchell <gmitchell(at)atdesk(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-15 21:38:59
Message-ID: 455B88F3.2020204@dunslane.net

Greg Mitchell wrote:
> I'm trying to create a custom data type similar to an enumeration
> type. However, I'd like the mapping of the int<->string to be dynamic
> instead of hard coded. I'd like to have a table that contains this
> mapping that can be appended to. Creating this type is not very
> difficult. However, for performance reasons, I'd like to cache the
> mapping so that the table is only queried once per connection unless
> it changes. I'm thinking a combination of a flag that can be triggered
> on insert and a transaction id could be used to decide if the table
> needs to be reloaded. Unfortunately, I'm not exactly sure how to get
> started on this, any ideas?
>
>

Are you aware that there is a patch for first class enumeration types
waiting to be reviewed for 8.3? The mapping is kept entirely internal,
and you should never see what it is kept as underneath. It does not
provide for dynamically extending the enumeration set, for various
reasons, but there is an easy workaround, namely to create a new type
with the extra member(s) and then do:

alter table foo alter column bar type newtype using bar::newtype;

My little enumkit tool allows you to create enumerations today very
easily, but its values are completely hardcoded. However, the above
trick still works. The downside is that each enumeration type requires a
tiny bit of compilation.

cheers

andrew


From: Tom Dunstan <pgsql(at)tomd(dot)cc>
To: Greg Mitchell <gmitchell(at)atdesk(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-15 22:15:29
Message-ID: 455B9181.3090205@tomd.cc

Hi Greg

Greg Mitchell wrote:
> I'm trying to create a custom data type similar to an enumeration type.
> However, I'd like the mapping of the int<->string to be dynamic instead
> of hard coded. I'd like to have a table that contains this mapping that
> can be appended to. Creating this type is not very difficult. However,
> for performance reasons, I'd like to cache the mapping so that the table
> is only queried once per connection unless it changes.

A simpler way to do this might be to only cache the list per query
context. In your IO functions, you could whack a pointer to your cache
onto fcinfo->flinfo->fn_extra, and the same flinfo gets passed in for
e.g. all output function calls for that column for that query, IIRC.
This was what I had in mind originally when I did the enum patch, but I
ended up just using syscaches, which I think would be unavailable to you
writing a UDT.

The upside of the above is that for a given query, the contents of your
table shouldn't change, so there's no mucking about with trying to
keep things in other backends up to date. The downside is that you have
to do the lookup per query, but if you're dealing with lots of data then
it'll get dwarfed by the actual query, and if not, who cares?

The other question that leaps to mind is whether you want to have more
than one of these types. If you do, you may have to have multiple
versions of the IO functions, otherwise e.g. your output function might
be passed the value 0, but was that the 0 representing the 'red' string
from the rgb enum, or the 'northern' string from the hemisphere enum?
You don't know, and postgresql won't tell you directly.

There are a few ways around this. In your case, it might be ok to
compile different versions of the IO functions for each enum which point
to different tables, or the same table with a discriminator. Or you
could see the various different proposals when my patch was first
discussed. See the thread starting at
http://archives.postgresql.org/pgsql-hackers/2006-08/msg00979.php or if
you want a peek at the patch, see
http://archives.postgresql.org/pgsql-patches/2006-09/msg00000.php. A
rather simpler starting point might be Andrew's enumkit
http://www.oreillynet.com/pub/a/databases/2006/01/06/enumerated-fields-in-postgresql.html?page=last&x-showcontent=text,
or possibly Martijn's tagged types at
http://svana.org/kleptog/pgsql/taggedtypes.html.

Cheers

Tom


From: Greg Mitchell <gmitchell(at)atdesk(dot)com>
To: Tom Dunstan <pgsql(at)tomd(dot)cc>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-16 13:15:16
Message-ID: 455C6464.7040202@atdesk.com


> A simpler way to do this might be to only cache the list per query
> context. In your IO functions, you could whack a pointer to your cache
> onto fcinfo->flinfo->fn_extra, and the same flinfo gets passed in for
> e.g. all output function calls for that column for that query, IIRC.
> This was what I had in mind originally when I did the enum patch, but I
> ended up just using syscaches, which I think would be unavailable to you
> writing a UDT.

If my understanding is correct, if fn_extra is null, I would palloc() my
data cache and store the pointer in fn_extra? What about freeing this
pointer? Or is cleanup automatic?

Also, are there any ADTs like a hash-map or tree-map in the server
libraries (my background is C++ and I'm used to having std::map<>), or do I
need to roll my own?

I am using enumkit for some true enums I have in the DB and like it very
much, though I tend to customize the C code to optimize it for my use.

Thanks,
Greg


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Mitchell <gmitchell(at)atdesk(dot)com>
Cc: Tom Dunstan <pgsql(at)tomd(dot)cc>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-16 14:00:48
Message-ID: 455C6F10.700@dunslane.net

Greg Mitchell wrote:
>
>
>
> Also, are there any ADTs like a hash-map or tree-map in the server
> libraries (my background is C++ and I'm used to having std::map<>), or do I
> need to roll my own?
>
>

Look at the dynahash code. I just used it for the first time in a plperl
patch, and it's reasonably straightforward.

cheers

andrew


From: Greg Mitchell <gmitchell(at)atdesk(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-16 18:03:31
Message-ID: 455CA7F3.5080106@atdesk.com

As far as memory management goes, do I just use hash_create() and assign
that pointer to fn_extra and at the end of the query it will be freed?
Or will it not be freed until the end of the transaction? I'm really
having trouble understanding the memory management issues with Postgres.

Greg

Andrew Dunstan wrote:
> Greg Mitchell wrote:
>>
>>
>>
>> Also, are there any ADTs like a hash-map or tree-map in the server
>> libraries (my background is C++ and I'm used to having std::map<>), or do I
>> need to roll my own?
>>
>>
>
> Look at the dynahash code. I just used it for the first time in a plperl
> patch, and it's reasonably straightforward.
>
> cheers
>
> andrew
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Mitchell <gmitchell(at)atdesk(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-16 18:43:26
Message-ID: 19969.1163702606@sss.pgh.pa.us

Greg Mitchell <gmitchell(at)atdesk(dot)com> writes:
> As far as memory management goes, do I just use hash_create() and assign
> that pointer to fn_extra and at the end of the query it will be freed?
> Or will it not be freed until the end of the transaction? I'm really
> having trouble understanding the memory management issues with Postgres.

You have to be careful that the hashtable is created in the correct
"memory context" --- in this case you want it to be in a query-lifespan
context, not the short-term (per-tuple-lifespan) context that your
function will be called in. The usual procedure for cases like this is
to use the context identified by fn_mcxt. src/backend/utils/mmgr/README
might make useful reading for you.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Greg Mitchell" <gmitchell(at)atdesk(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom Data Type Question
Date: 2006-11-18 14:31:12
Message-ID: 1163860272.27956.811.camel@silverbirch.site

On Wed, 2006-11-15 at 16:38 -0500, Andrew Dunstan wrote:

> My little enumkit tool allows you to create enumerations today very
> easily, but its values are completely hardcoded. However, the above
> trick still works. The downside is that each enumeration type requires a
> tiny bit of compilation.

Andrew,

Your enum sounds good, apart from the hardcoded/compilation thing. That
is a data management nightmare AFAICS and so restricts the usefulness of
the solution.

It would be much better to read things dynamically into an array, so
using an init function in *preload_libraries would work well.

I'd also love any suggestions as to how we might be able to use a
similar local-data-cacheing mechanism to work when we specify SQL
standard FOREIGN KEYs. It would be really cool to say USING LOCAL CACHE
or some way of avoiding the overhead of all those stored after triggers
and SPI SELECT statements when we've got checks against tables with only
a few rows where the values hardly ever change. The enum concept departs
radically from the declarative Referential Integrity concepts that many
of us are already used to. I'd like to be able to speed things up
without radical re-design of the database... so a few nicely sprinkled
ALTER TABLE statements would be a much better way of implementing this
IMHO.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Greg Mitchell <gmitchell(at)atdesk(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-18 15:54:13
Message-ID: 455F2CA5.1000500@dunslane.net

Simon Riggs wrote:
> On Wed, 2006-11-15 at 16:38 -0500, Andrew Dunstan wrote:
>
>
>> My little enumkit tool allows you to create enumerations today very
>> easily, but its values are completely hardcoded. However, the above
>> trick still works. The downside is that each enumeration type requires a
>> tiny bit of compilation.
>>
>
> Andrew,
>
> Your enum sounds good, apart from the hardcoded/compilation thing. That
> is a data management nightmare AFAICS and so restricts the usefulness of
> the solution.
>
>
>

Simon, Tom Dunstan has submitted a patch for first class enum types that
do not have the compilation requirement - it's in the queue for 8.3. You
might want to review that.

cheers

andrew


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Greg Mitchell" <gmitchell(at)atdesk(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom Data Type Question
Date: 2006-11-20 20:57:54
Message-ID: 1164056275.3841.178.camel@silverbirch.site

On Sat, 2006-11-18 at 10:54 -0500, Andrew Dunstan wrote:

> > Your enum sounds good, apart from the hardcoded/compilation thing. That
> > is a data management nightmare AFAICS and so restricts the usefulness of
> > the solution.

> Simon, Tom Dunstan has submitted a patch for first class enum types that
> do not have the compilation requirement - it's in the queue for 8.3. You
> might want to review that.

Well, the link to the previous discussion ends: "How about being more
specific about what you are trying to accomplish?" My thoughts:

IMHO we need to optimise FOREIGN KEY checks. One way of doing that is by
having enums that build the allowable values into the datatypes, I can
think of others. An overall evaluation of the various approaches should
be made before we settle on a specific one.

My requirements list would be to allow FOREIGN KEYS to
- be specified in SQL standard syntax
- work as fast as CHECK (col IN (1,2,3))
- use less memory and completely avoid any spill-to-disk nightmare(*)
- have the list of allowable values to be dynamically updateable,
automatically as INSERTs/DELETEs occur on the referenced table
- optimize without needing to change/reload database, just by adding
minimum number of statements (zero being the best)

(*) doesn't exist yet, that's why it's a nightmare

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Greg Mitchell <gmitchell(at)atdesk(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-20 21:30:30
Message-ID: 45621E76.7000506@dunslane.net

Simon Riggs wrote:
> On Sat, 2006-11-18 at 10:54 -0500, Andrew Dunstan wrote:
>
>
>>> Your enum sounds good, apart from the hardcoded/compilation thing. That
>>> is a data management nightmare AFAICS and so restricts the usefulness of
>>> the solution.
>>>
>
>
>> Simon, Tom Dunstan has submitted a patch for first class enum types that
>> do not have the compilation requirement - it's in the queue for 8.3. You
>> might want to review that.
>>
>
> Well, the link to the previous discussion ends: "How about being more
> specific about what you are trying to accomplish?" My thoughts:
>
> IMHO we need to optimise FOREIGN KEY checks. One way of doing that is by
> having enums that build the allowable values into the datatypes, I can
> think of others. An overall evaluation of the various approaches should
> be made before we settle on a specific one.
>

Using the submitted patch, FK checks should be VERY fast - the stored
values are just oids.

> My requirements list would be to allow FOREIGN KEYS to
> - be specified in SQL standard syntax
> - work as fast as CHECK (col IN (1,2,3))
>

If I understand it, you are really not talking about doing an FK check
on an enum value, but rather using an FK check as a means of validating
an enum. That is not what we are talking about. But the validity checks
will be at least as fast as any check constraint.

> - use less memory and completely avoid any spill-to-disk nightmare(*)
>

See above.

> - have the list of allowable values to be dynamically updateable,
> automatically as INSERTs/DELETEs occur on the referenced table
>

Why? People seem so hung up on this. If you want a dynamically updatable
set, then use a reference table. The whole point of this exercise was to
provide first class enum types that work *just*like*other *types*. If
you want to change a column's type, you do 'alter table foo alter column
bar type newtype'. And so you would if you need to change to a different
enum type. What if you deleted a value in the allowed set? Boom goes
data integrity.

> - optimize without needing to change/reload database, just by adding
> minimum number of statements (zero being the best)
>
> (*) doesn't exist yet, that's why it's a nightmare
>

I should add that it would have been nice if this discussion had
happened back in August when the work was being done.

cheers

andrew


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Greg Mitchell" <gmitchell(at)atdesk(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom Data Type Question
Date: 2006-11-20 23:18:56
Message-ID: 1164064736.3841.192.camel@silverbirch.site

On Mon, 2006-11-20 at 16:30 -0500, Andrew Dunstan wrote:

> > IMHO we need to optimise FOREIGN KEY checks. One way of doing that is by
> > having enums that build the allowable values into the datatypes, I can
> > think of others. An overall evaluation of the various approaches should
> > be made before we settle on a specific one.
> >
>
> Using the submitted patch, FK checks should be VERY fast - the stored
> values are just oids.

I see that they would be.

> > My requirements list would be to allow FOREIGN KEYS to
> > - be specified in SQL standard syntax
> > - work as fast as CHECK (col IN (1,2,3))
> >
>
> If I understand it, you are really not talking about doing an FK check
> on an enum value, but rather using an FK check as a means of validating
> an enum. That is not what we are talking about. But the validity checks
> will be at least as fast as any check constraint.

Not really talking about enums at all. Just trying to understand how to
optimize the general case of Foreign Keys, which I have suddenly
realised is exactly what you are trying to achieve with enums.

> Boom goes
> data integrity.

My point exactly (assuming I didn't quote you out of context).

> I should add that it would have been nice if this discussion had
> happened back in August when the work was being done.

I'm happy to apologise. I hadn't made the connection at that point
between the role of an enum datatype and the need to optimise FKs.

Since MySQL has an enum type, it will aid portability to allow an enum
type. However, I'm interested in providing a mechanism to speed up FKs
without the *need* to adopt domain-specific datatypes.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Greg Mitchell" <gmitchell(at)atdesk(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-21 00:16:55
Message-ID: 2491.24.211.165.134.1164068215.squirrel@www.dunslane.net

Simon Riggs wrote:
> On Mon, 2006-11-20 at 16:30 -0500, Andrew Dunstan wrote:

>> I should add that it would have been nice if this discussion had
>> happened back in August when the work was being done.
>
> I'm happy to apologise. I hadn't made the connection at that point
> between the role of an enum datatype and the need to optimise FKs.
>
> Since MySQL has an enum type, it will aid portability to allow an enum
> type. However, I'm interested in providing a mechanism to speed up FKs
> without the *need* to adopt domain-specific datatypes.
>

No problem.

The original genesis of this work was an article I wrote for O'Reilly last
year that looked at the taxonomy of enums. Essentially, the idea was to
get the advantages of enums done using lookup tables without the
disadvantages. Enumkit was a first step, but had the disadvantage of
needing compilation and having values hardcoded. Tom's work is essentially
to provide the same facility but with grammar and catalog support and
without requiring compilation.

Note that MySQL enum types are essentially anonymous types, and have a
number of very odd quirks. What Tom has submitted is, in my opinion, enums
done right, and orthogonally with the rest of the type system. IOW, a very
typical piece of Postgres development :-)

cheers

andrew


From: Tom Dunstan <pgsql(at)tomd(dot)cc>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Greg Mitchell <gmitchell(at)atdesk(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-21 02:11:33
Message-ID: 45626055.5020802@tomd.cc

Andrew Dunstan wrote:
> Simon Riggs wrote:
>> My requirements list would be to allow FOREIGN KEYS to
>> - be specified in SQL standard syntax
>> - work as fast as CHECK (col IN (1,2,3))
>>
>
> If I understand it, you are really not talking about doing an FK check
> on an enum value, but rather using an FK check as a means of validating
> an enum. That is not what we are talking about. But the validity checks
> will be at least as fast as any check constraint.

Right. Enums (as implemented) require no explicit FK check. When you try
to enter an enum, the input function does a syscache lookup using the
typoid and the string value entered and returns the enum value OID.
There's probably a point at which the syscache lookup becomes faster
than a CHECK constraint, as I assume that the CHECK will get slower
linearly as the number of allowed values increases. That number might be
quite small.

>> - have the list of allowable values to be dynamically updateable,
>> automatically as INSERTs/DELETEs occur on the referenced table
>>
>
> Why? People seem so hung up on this. If you want a dynamically updatable
> set, then use a reference table. The whole point of this exercise was to
> provide first class enum types that work *just*like*other *types*. If
> you want to change a column's type, you do 'alter table foo alter column
> bar type newtype'. And so you would if you need to change to a different
> enum type. What if you deleted a value in the allowed set? Boom goes
> data integrity.

Well, if there really is demand, there are some things that we could do
here to make life a bit easier. Firstly, just adding a new value to the
end of the enum list should be trivial (just add a new row to pg_enum)
except in the case of OID wraparound. When that happens, or if they want
to add a value in the middle or start of the enum, we could possibly do
the create new type, alter referenced tables, drop old type, rename new
type automagically. Dropping a value from an enum would be a matter of
checking that no referencing tables had the to-be-dropped value stored,
and I suppose locking them while the delete from pg_enum is performed.
Maybe that would be easy, maybe hard, but these things aren't
impossible, just more work than it seemed it was worth at the time. If
other people have use cases that require changing these more than we
anticipated, however, maybe they'll leap forward with contributions. :)

What I *would* say, though, is that if people want these to be
sufficiently dynamic that they can ever foresee using code rather than a
schema script to change them, then they're using the wrong solution, and
should go back to using a table.

Cheers

Tom


From: Tom Dunstan <pgsql(at)tomd(dot)cc>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Greg Mitchell <gmitchell(at)atdesk(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-21 02:51:40
Message-ID: 456269BC.2030009@tomd.cc

Simon Riggs wrote:
> I'd also love any suggestions as to how we might be able to use a
> similar local-data-cacheing mechanism to work when we specify SQL
> standard FOREIGN KEYs. It would be really cool to say USING LOCAL CACHE
> or some way of avoiding the overhead of all those stored after triggers
> and SPI SELECT statements when we've got checks against tables with only
> a few rows where the values hardly ever change.

Uh, sounds like an enum is a perfect fit. :) This is certainly one of
the use-cases that I've encountered when I wished that I had had an enum
type to use.

> The enum concept departs
> radically from the declarative Referential Integrity concepts that many
> of us are already used to.

I have to challenge this. It's *just another type*. Is using a boolean
type a radical departure from RI just because you're not referencing
some external table with the definitions for true and false in it? After
all, from a functional point of view, booleans are just another
enumerated type.

A major annoyance with SQL has been that it hasn't had a good solution
for this pattern. I've seen any number of broken solutions, from lots of
little mostly-static tables littered all over your data model, to single
big code tables that every other table references, and for which you
need triggers to enforce data integrity because standard RI doesn't
work, to chars and varchars with incorrect ordering or meaningless names
or which suck storage-wise. Don't even get me started on MySQL enums.

The reason that I wanted to do the enum patch was because *all* of those
solutions suck. Requiring a table to represent a small fixed set of
allowable values that a column should take is broken. But because it's
the least ugly solution that we've had using vanilla SQL, it's what
we've used, and dare I suggest that because we've all done it for so
long, we start to think that *not* doing it that way is broken.

Enums, as implemented in the patch, are reasonably efficient, typesafe
and properly ordered. Plus they make your data model look cleaner, your
queries don't need to have lookups anymore and you use less disk space.
Oh, and they also bring you coffee and put out the trash :)

> I'd like to be able to speed things up
> without radical re-design of the database... so a few nicely sprinkled
> ALTER TABLE statements would be a much better way of implementing this
> IMHO.

OK, back to what you'd like to do. :)

If your external tables are so small and static, just how long does the
FK check take? Are they really that slow?

I would have thought that it might be difficult to get rid of the FK
check altogether, but perhaps, in the context of a single query (I'm
thinking a bulk insert) you could have some sort of LRU cache. If you
want the cache to stick around, you've got to deal with what happens
when it goes out of date... notifying all the backends etc, and what
happens when one of the other backends is halfway through a
transaction. Maybe you could set this "cached mode" on, but would have
to switch it off before updating the tables in question or something
like that. I dunno. That stuff sounds hard; I found it easier to just
implement my own type ;)

Cheers

Tom


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Dunstan" <pgsql(at)tomd(dot)cc>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Greg Mitchell" <gmitchell(at)atdesk(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Custom Data Type Question
Date: 2006-11-21 09:02:58
Message-ID: 1164099779.3841.216.camel@silverbirch.site

On Tue, 2006-11-21 at 02:51 +0000, Tom Dunstan wrote:

> Requiring a table to represent a small fixed set of
> allowable values that a column should take is broken. But because
> it's
> the least ugly solution that we've had using vanilla SQL, it's what
> we've used, and dare I suggest that because we've all done it for so
> long, we start to think that *not* doing it that way is broken.

I do support your goal of higher performance.

Putting data in tables is reasonably accepted practice, round here at
least.

I see the strong need to optimise the case where people want/need to
follow the SQL standard and have defined their databases that way. There
is also the need to support DELETE RESTRICT functionality from the
referenced to the referencing table, as a protection against data
quality problems. A link between two tables is important - otherwise we
introduce another DBA task and the possibility of error that results.

If there is a body of opinion behind enums, then that's good. The MySQL
way is not something to be ignored and that is a good argument for
inclusion. I've got no problem with multiple ways of doing things.

In the long run, as currently envisaged, enums don't do all that I would
like. I see the need to performance tune Referential Integrity more
directly.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Dunstan <pgsql(at)tomd(dot)cc>, Greg Mitchell <gmitchell(at)atdesk(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Custom Data Type Question
Date: 2006-11-21 14:29:38
Message-ID: 45630D52.5000301@dunslane.net

Simon Riggs wrote:
> On Tue, 2006-11-21 at 02:51 +0000, Tom Dunstan wrote:
>
> In the long run, as currently envisaged, enums don't do all that I would
> like.

In a sense they do more than you want. They will provide the ability to
set values much faster than anything using an FK constraint, I expect,
and without having to use any explicit constraint.

> I see the need to performance tune Referential Integrity more
> directly.
>
>

Sure. Go for it. As far as enums go, the only cases I can think of where
that will have any application are:
. you don't use enums because you want strictly vanilla SQL, or
. you don't use enums because you want to be able to alter the set of
allowed values arbitrarily.

That still leaves lots of applications (e.g. those I work on in my day
job) that will benefit from enums.

cheers

andrew