Obstacles to user-defined range canonicalization functions

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Obstacles to user-defined range canonicalization functions
Date: 2011-11-24 03:33:19
Message-ID: 10967.1322105599@sss.pgh.pa.us

I got religion this evening about the potential usefulness of
user-defined canonicalization functions --- the example that did it for
me was thinking about a range type over timestamp that quantizes
boundaries to hours, or half hours, or 15 minutes, or any scheduling
unit that is standard in a particular environment. In that sort of
situation you really want a discrete range type, which the standard
tsrange type is not. So how hard is it to build a user-defined
canonicalization function to support such an application? The logic
doesn't seem terribly difficult ... but *you have to write the darn
thing in C*. There are two reasons why:

* The underlying range_serialize function is only exposed at the C
level. If you try to write something in, say, plpgsql then you are
going to end up going through range_constructorN or range_in to produce
your result value, and those call the type's canonical function.
Infinite recursion, here we come.

* The only way to create a canonicalization function in advance of
declaring the range type is to declare it against a shell type. But the
PL languages all reject creating PL functions that take or return a
shell type. Maybe we could relax that, but it's nervous-making, and
anyway the first problem still remains.

Now you could argue that for performance reasons everybody should write
their canonicalization functions in C anyway, but I'm not sure I buy
that --- at the very least, it'd be nice to write the functions in
something higher-level while prototyping.
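The quantization logic itself is small. As a rough illustration only, here is a standalone sketch in plain C of one plausible canonicalization for the quantized timestamp range described above. It does not use PostgreSQL's function-manager or range APIs, and TsBounds, floor_to_step, ceil_to_step and quantize_canonical are hypothetical names. It assumes timestamps are 64-bit microsecond counts and treats the multiples of step as the discrete values. Every range is rewritten into the [) form used by the built-in discrete range types. Infinite bounds and overflow are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds of a timestamp range; values are microseconds. */
typedef struct {
    int64_t lower;
    int64_t upper;
    bool lower_inc;
    bool upper_inc;
    bool empty;
} TsBounds;

/* Largest multiple of step that is <= v (C's % truncates toward zero). */
static int64_t floor_to_step(int64_t v, int64_t step)
{
    int64_t r = v % step;
    if (r < 0)
        r += step;
    return v - r;
}

/* Smallest multiple of step that is >= v. */
static int64_t ceil_to_step(int64_t v, int64_t step)
{
    int64_t f = floor_to_step(v, step);
    return f == v ? v : f + step;
}

/*
 * Rewrite b so both bounds are multiples of step, the lower bound is
 * inclusive and the upper bound exclusive. Infinite bounds and
 * overflow near INT64_MAX are not handled in this sketch.
 */
TsBounds quantize_canonical(TsBounds b, int64_t step)
{
    assert(step > 0);
    if (b.empty)
        return b;

    /* First grid point inside the range. */
    int64_t lo = b.lower_inc ? ceil_to_step(b.lower, step)
                             : floor_to_step(b.lower, step) + step;
    /* First grid point past the end of the range. */
    int64_t hi = b.upper_inc ? floor_to_step(b.upper, step) + step
                             : ceil_to_step(b.upper, step);

    TsBounds out = { lo, hi, true, false, lo >= hi };
    return out;
}
```

With step set to 15 minutes (900000000 microseconds), bounds of 10:07 and 10:50, both exclusive, come out as [10:15, 11:00).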

I have no immediate proposal for how to fix this, but I think it's
something we ought to think about.

One possibility that just came to me is to decree that every discrete
range type has to be based on an underlying continuous range type (with
all the same properties except no canonicalization function), and then
the discrete range's canonicalization function could be declared to take
and return the underlying range type instead of the discrete type
itself. Haven't worked through the details though.
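Here is a rough sketch of why this would avoid the recursion. It is standalone C, and the hypothetical names ContRange, DiscreteRange, CanonicalFn and make_discrete stand in for the real range machinery. The canonical function takes and returns only the continuous type. The discrete constructor therefore calls it exactly once, and it never has to build a discrete value itself. The rule shown is the integer one int4range applies (exclusive lower bound plus one, inclusive upper bound plus one). Empty ranges and infinite bounds are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical continuous range: same bounds, no canonical function. */
typedef struct {
    int64_t lower, upper;
    bool lower_inc, upper_inc;
} ContRange;

/* Hypothetical discrete range: a distinct type wrapping a canonical ContRange. */
typedef struct {
    ContRange r;
} DiscreteRange;

/* A discrete type's canonical function maps ContRange to ContRange. */
typedef ContRange (*CanonicalFn)(ContRange);

/* The integer rule int4range uses: rewrite to the [) form. */
ContRange int_canonical(ContRange r)
{
    if (!r.lower_inc) {
        r.lower += 1;
        r.lower_inc = true;
    }
    if (r.upper_inc) {
        r.upper += 1;
        r.upper_inc = false;
    }
    return r;
}

/*
 * Hypothetical discrete constructor: build the continuous value,
 * canonicalize it once, then wrap it. The canonical function only
 * receives and returns ContRange values, so it has no need to build
 * a DiscreteRange and re-enter this constructor.
 */
DiscreteRange make_discrete(CanonicalFn canonical, int64_t lower,
                            int64_t upper, bool lower_inc, bool upper_inc)
{
    ContRange raw = { lower, upper, lower_inc, upper_inc };
    DiscreteRange d = { canonical(raw) };
    /* Sanity check: the result is in the [) form this sketch expects. */
    assert(d.r.lower_inc && !d.r.upper_inc);
    return d;
}
```

For example, make_discrete(int_canonical, 1, 5, false, true), i.e. (1,5], wraps [2,6).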

regards, tom lane


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Obstacles to user-defined range canonicalization functions
Date: 2011-11-24 03:49:42
Message-ID: 30EF645F-F3EB-4349-997E-F9B0E3FFC909@phlo.org

On Nov 24, 2011, at 04:33, Tom Lane wrote:
> One possibility that just came to me is to decree that every discrete
> range type has to be based on an underlying continuous range type (with
> all the same properties except no canonicalization function), and then
> the discrete range's canonicalization function could be declared to take
> and return the underlying range type instead of the discrete type
> itself. Haven't worked through the details though.

We could also make the canonicalization function receive the boundaries
and their inclusive/exclusive flags as separate arguments, and return
them the same way.

In plpgsql the signature could be

canonicalize(inout lower base_type, inout upper base_type,
             inout lower_inclusive boolean, inout upper_inclusive boolean)

Not exactly pretty, but it avoids the need for a second continuous range
type...
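In C, the same idea gives a canonical function that rewrites plain bound values and flags passed by pointer, one pointer per INOUT parameter. It never sees a range value, so it has no reason to call a range constructor and cannot recurse. This is a standalone sketch. canonicalize_tens, make_range, CanonicalFn and the multiples-of-10 rule are hypothetical and chosen only for illustration. The emptiness test follows the usual bound comparison: a range is empty if the lower bound is above the upper, or if the bounds are equal but not both inclusive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Largest multiple of step that is <= v (C's % truncates toward zero). */
static int64_t floor_to_step(int64_t v, int64_t step)
{
    assert(step > 0);
    int64_t r = v % step;
    return r < 0 ? v - r - step : v - r;
}

/*
 * Hypothetical canonical function mirroring the proposed
 *   canonicalize(inout lower, inout upper,
 *                inout lower_inclusive, inout upper_inclusive)
 * with each INOUT parameter passed by pointer. Assumed rule: the
 * discrete values are the multiples of 10, normalized to the [) form.
 */
void canonicalize_tens(int64_t *lower, int64_t *upper,
                       bool *lower_inclusive, bool *upper_inclusive)
{
    const int64_t step = 10;
    int64_t lo_floor = floor_to_step(*lower, step);
    int64_t hi_floor = floor_to_step(*upper, step);

    /* First multiple of step inside the range. */
    if (*lower_inclusive)
        *lower = (lo_floor == *lower) ? *lower : lo_floor + step;
    else
        *lower = lo_floor + step;

    /* First multiple of step past the end of the range. */
    if (*upper_inclusive)
        *upper = hi_floor + step;
    else
        *upper = (hi_floor == *upper) ? *upper : hi_floor + step;

    *lower_inclusive = true;
    *upper_inclusive = false;
}

typedef void (*CanonicalFn)(int64_t *, int64_t *, bool *, bool *);

typedef struct {
    int64_t lower, upper;
    bool lower_inc, upper_inc, empty;
} Range;

/*
 * Hypothetical constructor: rewrite the raw bounds with the type's
 * canonical function, then build the value. The canonical function
 * only ever sees scalars, so it cannot recurse into this constructor.
 */
Range make_range(CanonicalFn canonical, int64_t lower, int64_t upper,
                 bool lower_inc, bool upper_inc)
{
    canonical(&lower, &upper, &lower_inc, &upper_inc);
    Range r = { lower, upper, lower_inc, upper_inc,
                lower > upper ||
                (lower == upper && !(lower_inc && upper_inc)) };
    return r;
}
```

With inclusive bounds 3 and 27, make_range(canonicalize_tens, ...) yields [10,30). Exclusive bounds 10 and 20 canonicalize to [20,20), which is empty.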

best regards,
Florian Pflug


From: "David E. Wheeler" <david(at)justatheory(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Obstacles to user-defined range canonicalization functions
Date: 2011-11-24 04:07:18
Message-ID: DA9C5E17-9919-41D0-9C96-07700EBA16DE@justatheory.com

On Nov 23, 2011, at 10:33 PM, Tom Lane wrote:

> Now you could argue that for performance reasons everybody should write
> their canonicalization functions in C anyway, but I'm not sure I buy
> that --- at the very least, it'd be nice to write the functions in
> something higher-level while prototyping.

I would apply this argument to every part of the system where extending the database requires writing C, including:

* I/O functions (for custom data types)
* tsearch parsers
* use of RECORD arguments

And probably many others. There are a *lot* of problems I’d love to be able to solve with prototypes written in PLs other than C, and in small databases (there are a lot of them out there), they may remain the production solutions.

So I buy the argument in the case of creating range canonicalization functions, too, of course!

Best,

David


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Obstacles to user-defined range canonicalization functions
Date: 2011-11-24 19:01:52
Message-ID: 1322161312.16623.11.camel@jdavis

On Wed, 2011-11-23 at 22:33 -0500, Tom Lane wrote:
> * The underlying range_serialize function is only exposed at the C
> level. If you try to write something in, say, plpgsql then you are
> going to end up going through range_constructorN or range_in to produce
> your result value, and those call the type's canonical function.
> Infinite recursion, here we come.

That seems solvable, unless I'm missing something.

> * The only way to create a canonicalization function in advance of
> declaring the range type is to declare it against a shell type. But the
> PL languages all reject creating PL functions that take or return a
> shell type. Maybe we could relax that, but it's nervous-making, and
> anyway the first problem still remains.

That seems a little more challenging.

> One possibility that just came to me is to decree that every discrete
> range type has to be based on an underlying continuous range type (with
> all the same properties except no canonicalization function), and then
> the discrete range's canonicalization function could be declared to take
> and return the underlying range type instead of the discrete type
> itself. Haven't worked through the details though.

An interesting approach. I wonder whether there would be any reason to
tie such types together other than the canonical function. Would you
have to define everything in terms of the continuous range, or could it
be a constraint hierarchy; e.g. a step size of 100 based on a step size
of 10, which is in turn based on numeric?
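Here is one way to read the hierarchy question, sketched in standalone C. Suppose each level's canonical function is applied to the output of its parent's. Then a step-100 range built on a step-10 range gives the same bounds as quantizing to 100 directly, provided each step divides the next. Integer bounds stand in for numeric. Bounds, quantize, canonical_chain and the [) normalization rule are hypothetical, not part of any proposed API. Empty ranges are not tracked separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    int64_t lower, upper;
    bool lower_inc, upper_inc;
} Bounds;

/* Largest multiple of step that is <= v (C's % truncates toward zero). */
static int64_t floor_to(int64_t v, int64_t step)
{
    int64_t r = v % step;
    return r < 0 ? v - r - step : v - r;
}

/* Smallest multiple of step that is >= v. */
static int64_t ceil_to(int64_t v, int64_t step)
{
    int64_t f = floor_to(v, step);
    return f == v ? v : f + step;
}

/* Normalize b to [) with both bounds on multiples of step. */
Bounds quantize(Bounds b, int64_t step)
{
    assert(step > 0);
    Bounds out;
    out.lower = b.lower_inc ? ceil_to(b.lower, step)
                            : floor_to(b.lower, step) + step;
    out.upper = b.upper_inc ? floor_to(b.upper, step) + step
                            : ceil_to(b.upper, step);
    out.lower_inc = true;
    out.upper_inc = false;
    return out;
}

/*
 * Apply each level's canonicalization to its parent's output,
 * e.g. steps = {10, 100}. Each step must be a multiple of the one before.
 */
Bounds canonical_chain(Bounds b, const int64_t *steps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || steps[i] % steps[i - 1] == 0);
        b = quantize(b, steps[i]);
    }
    return b;
}
```

For the bounds (3, 250], both the chained and the direct canonicalization give [100, 300).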

Regards,
Jeff Davis