Range Types, constructors, and the type system

Lists: pgsql-hackers
From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Range Types, constructors, and the type system
Date: 2011-06-25 22:29:33
Message-ID: 1309040973.2443.123.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Different ranges over the same subtype make sense when using different
total orders for the subtype. This is most apparent with text collation,
but makes sense (at least mathematically, if not practically) for any
subtype.

For instance:
[a, Z)
is a valid range in "en_US", but not in "C", so it makes sense to have
multiple ranges over the same subtype with different collations.

But what if you have a function (like a constructor), of the form:
(anyelement, anyelement) -> anyrange
? To work with the type system, you need to be able to figure out the
return type from the arguments; which means to support functions like
this we need a mapping from the subtype to the range type.
Unfortunately, that restricts us to one range type per subtype (this
isn't a problem for ARRAYs, because there is only one useful array type
for a given element type).

This problem first came up a while ago:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php

My workaround was to use domains, but that's not a very clean solution
(you have to add a bunch of casts to make sure the right domain is
chosen). It became entirely unworkable with collations, because people
would be using different text collations a lot more frequently than,
say, a different ordering for timestamptz. Tom mentioned that here:

http://archives.postgresql.org/message-id/24831.1308579443@sss.pgh.pa.us

I think Florian proposed the most promising line of attack here:

http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org

by suggesting that functions of the form:
(anyelement, [other non-anyrange arguments]) -> anyrange
might be expendable. After all, they are only useful for constructors as
far as we can tell. Other range functions will have an anyrange
parameter, and we can use the actual type of the argument to know the
range type (as well as the subtype).

Although it's very nice to be able to say:
range(1,10)
and get an int4range out of it, it's not the only way, and it's not
without its problems anyway. For instance, to get an int8range you have
to do:
range(1::int8, 10::int8)
or similar.

So, we could just make functions like:
int4range(int4, int4)
int8range(int8, int8)
...
when creating the range type, and it would actually be a usability
improvement.

There are at least a few constructors that would need to be made for
each rangetype: the constructor above, the singleton constructor,
constructors that have infinite bounds, the empty constructor, and all
of the permutations for inclusivity/exclusivity. That adds up to quite a
few catalog entries per range type.

We could reduce some of the permutations by using extra arguments
somehow, but that seems like it adds to the ugliness. This might also be
a time to revisit whether there is a better way to present all of these
constructors (rather than the _[co][co] suffixes to names, etc.).

Even if we're willing to put up with a bunch of catalog entries, it will
take a little creativity to figure out how to run the functions
generically from a fixed set of C functions.

Are there other thoughts or ideas about eliminating the need for generic
constructors like range()?

Another idea Florian suggested (in the same email) was the ability to
declare the return type of a function, and then use the declared type to
infer the argument types. That would be nice because you would just have
to do:
range(1,10)::int8range
However, that's kind of backwards from how our type inference system
works now, and sounds like a big change.

Maybe we could consider a syntax addition for constructing range values?
That was kicked around briefly, but perhaps we should revisit the
possibilities there.

Regards,
Jeff Davis


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-26 07:57:55
Message-ID: 4E06E683.2070009@darrenduncan.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

To eludicate my earlier comments on this subject ...

I believe that the best general solution here is for every ordered base type to
just have a single total order, which is always used with that type in any
generic order-sensitive operation, including any ranges defined over it, or any
ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1
system-defined collation that can't be changed, and it would be something simple
and generic, such as simply sorting on the codepoint as integers.

When we want to have some other "native" ordering for an existing type, such as
when we want to use a different text collation, we do this by creating a *new
base type*, using CREATE TYPE or some shorthand thereof, and this new type
defines its own ordering, such as that a particular text collation is used.

For example:

CREATE TYPE text__en_US AS (v text);

CREATE TYPE text__C AS (v text);

These will not compare equal to each other or to text, and that is good, because
having a different text collation implies that we consider 'foo'::text__en_US
and 'foo'::text__C to be different values.

I believe that any other approach is worse, and in particular I believe that
creating DOMAIN over text is worse, because DOMAIN are supposed to be subtypes
of some other type, whose set of member values is a subset of the other type's
values, and that have the same ordering. Multiple CREATE type over the same
base type don't interfere with each other like multiple DOMAIN could.

Assuming that what CREATE TYPE produces is actually a base type, I believe there
is no better solution using the facilities that SQL provides.

If there is concern about performance related to CREATE TYPE being a composite
type, I'm sure it is possible to engineer an optimization for when the type has
just 1 attribute so that performance isn't an issue. The main point I'm trying
to raise here is about semantics and good type systems.

Likewise, don't let concern about syntax for using values of such composite
types. Once again, there can be shorthands if necessary.

In fact, Postgres could provide a general shorthand for creating a composite
type of 1 attribute whose purpose is to make one type that is like but unequal
to another, and using this shorthand could also cause the composite type to
overload/polymorph all the operators of it's attribute type, so that the syntax
to define one is very short.

For example:

CREATE TYPE text__en_US WRAPS text COLLATE en_US;

... and I assume the name of that attribute would just be system-defined.

Note that the above is specific to wrapping text, and the COLLATE is just
shorthand for defining an ordering function for text__en_US. A more general
form could be:

CREATE TYPE bar WRAPS foo ORDER USING FUNCTION baz (lhs foo, rhs foo) ...;

And then we can say:

RANGE OF text__en_US

RANGE OF text

... similarly to how we declare array types with ARRAY.

One can also just define range values as they do array values, such as like this:

range('foo','bar') # default collation

range('foo'::text__en_US, 'bar'::text__en_US) # en_us collation

If that seems verbose, I have a few words for you:

1. Users should in practice name their wrapper types over their intended
meaning, not their mechanics, such as like this (not using text for variety),
and that may be more terse:

CREATE TYPE acct_num WRAPS integer; # inherits integer ordering by default

2. If the wrapper types overload the base operators, either automatically or
selectively (does it make sense to multiply an acct_num?), one doesn't have to
keep unpacking and packing them to use them in most cases. For example, I'd
expect many text wrappers to polymorph catenation or substring etc, so no extra
syntax.

3. In practice, most literal values come from applications and are given to SQL
code either as function parameters or bind parameter arguments. While lots of
example code may have literal values in it, I would think that most real-work
code would hardly have any, and hence you'd rarely see any 'foo'::text__en_US
for example. You'd more likely see the less common var::text__en_US or such.

So that's my position, CREATE TYPE on the regular types or the like is the best
solution, and anything else is an inferior solution.

Such a design is also how I do collations and ranges in my Muldis D language.

-- Darren Duncan

Jeff Davis wrote:
> Different ranges over the same subtype make sense when using different
> total orders for the subtype. This is most apparent with text collation,
> but makes sense (at least mathematically, if not practically) for any
> subtype.
>
> For instance:
> [a, Z)
> is a valid range in "en_US", but not in "C", so it makes sense to have
> multiple ranges over the same subtype with different collations.
>
> But what if you have a function (like a constructor), of the form:
> (anyelement, anyelement) -> anyrange
> ? To work with the type system, you need to be able to figure out the
> return type from the arguments; which means to support functions like
> this we need a mapping from the subtype to the range type.
> Unfortunately, that restricts us to one range type per subtype (this
> isn't a problem for ARRAYs, because there is only one useful array type
> for a given element type).
>
> This problem first came up a while ago:
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php
>
> My workaround was to use domains, but that's not a very clean solution
> (you have to add a bunch of casts to make sure the right domain is
> chosen). It became entirely unworkable with collations, because people
> would be using different text collations a lot more frequently than,
> say, a different ordering for timestamptz. Tom mentioned that here:
>
> http://archives.postgresql.org/message-id/24831.1308579443@sss.pgh.pa.us
>
> I think Florian proposed the most promising line of attack here:
>
> http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org
>
> by suggesting that functions of the form:
> (anyelement, [other non-anyrange arguments]) -> anyrange
> might be expendable. After all, they are only useful for constructors as
> far as we can tell. Other range functions will have an anyrange
> parameter, and we can use the actual type of the argument to know the
> range type (as well as the subtype).
>
> Although it's very nice to be able to say:
> range(1,10)
> and get an int4range out of it, it's not the only way, and it's not
> without its problems anyway. For instance, to get an int8range you have
> to do:
> range(1::int8, 10::int8)
> or similar.
>
> So, we could just make functions like:
> int4range(int4, int4)
> int8range(int8, int8)
> ...
> when creating the range type, and it would actually be a usability
> improvement.
>
> There are at least a few constructors that would need to be made for
> each rangetype: the constructor above, the singleton constructor,
> constructors that have infinite bounds, the empty constructor, and all
> of the permutations for inclusivity/exclusivity. That adds up to quite a
> few catalog entries per range type.
>
> We could reduce some of the permutations by using extra arguments
> somehow, but that seems like it adds to the ugliness. This might also be
> a time to revisit whether there is a better way to present all of these
> constructors (rather than the _[co][co] suffixes to names, etc.).
>
> Even if we're willing to put up with a bunch of catalog entries, it will
> take a little creativity to figure out how to run the functions
> generically from a fixed set of C functions.
>
> Are there other thoughts or ideas about eliminating the need for generic
> constructors like range()?
>
> Another idea Florian suggested (in the same email) was the ability to
> declare the return type of a function, and then use the declared type to
> infer the argument types. That would be nice because you would just have
> to do:
> range(1,10)::int8range
> However, that's kind of backwards from how our type inference system
> works now, and sounds like a big change.
>
> Maybe we could consider a syntax addition for constructing range values?
> That was kicked around briefly, but perhaps we should revisit the
> possibilities there.
>
> Regards,
> Jeff Davis
>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Darren Duncan <darren(at)darrenduncan(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-26 15:06:47
Message-ID: 11379.1309100807@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Darren Duncan <darren(at)darrenduncan(dot)net> writes:
> I believe that the best general solution here is for every ordered base type to
> just have a single total order, which is always used with that type in any
> generic order-sensitive operation, including any ranges defined over it, or any
> ORDER BY or any <,>,etc.

We've spent years and blood on making sure that Postgres could support
multiple orderings for any datatype; and there are plenty of natural
examples for the usefulness of that. So I'm not at all impressed by
any line of reasoning that starts out by baldly throwing that away.

> When we want to have some other "native" ordering for an existing type, such as
> when we want to use a different text collation, we do this by creating a *new
> base type*,

Nope. This has all sorts of problems that you're conveniently ignoring,
beginning with the need to duplicate all of the infrastructure for the
type (such as non-ordering-related operators), and then moving into
difficulties arising from added ambiguity as to which operator is meant.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Darren Duncan <darren(at)darrenduncan(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-26 17:06:36
Message-ID: 1309107996.2443.186.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2011-06-26 at 00:57 -0700, Darren Duncan wrote:
> I believe that the best general solution here is for every ordered base type to
> just have a single total order, which is always used with that type in any
> generic order-sensitive operation, including any ranges defined over it, or any
> ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1
> system-defined collation that can't be changed, and it would be something simple
> and generic, such as simply sorting on the codepoint as integers.

Well, we're trying to support SQL, and SQL supports collations, so I
don't think we can just ignore that.

I also agree with Tom that it's not a good idea. My reasons are:

* Practical considerations, such as having a bunch of cruft from
duplicated types all over the system. With sufficient changes to the
type system, maybe that could be overcome. Or perhaps domains could be
used to make that work for range types (sort of), but the result would
not be very consistent with the rest of the system.

* It doesn't seem to be based in any mathematical argument. A type is a
set of values, and there's no reason it can't have several total orders;
or no total order at all. So it appears to just be piggybacking on the
type system infrastructure as a place to hold the metadata for a total
order.

* Who's to say that a "compare" function is the only way to specify a
total order? There might be other interfaces that would support
something closer to a lexicographic sort. So, from a theoretical
standpoint, trying to attach a single notion of total order to a type
seems strange, because there might be multiple interfaces for specifying
even one total order.

* It would require extra explicit type annotations. If you have 12 text
types, the only way to practically use any text type is to constantly
specify which more-specific text type it actually is (probably using
the :: operator). That is not necessarily a bad choice if starting a
language from scratch and forming the syntax in a way that it's
reasonable to do. But this is SQL, and lots of type annotations are
un-SQL-like.

Regards,
Jeff Davis


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-26 22:56:39
Message-ID: 150CAF5F-E764-4539-94CF-7C6B643836F5@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun26, 2011, at 00:29 , Jeff Davis wrote:
> declare the return type of a function, and then use the declared type to
> infer the argument types. That would be nice because you would just have
> to do:
> range(1,10)::int8range
> However, that's kind of backwards from how our type inference system
> works now, and sounds like a big change.

Well, there actually *is* some precedence for that kind of top-down
(form a syntactic perspective) type inference. We *enforce* the cast
in
array[]::<arraytype>
and actually for a very similar reason - without the case, there's no
way of knowing which type of empty array was meant. I think we also
special-case
'literal'::<type>
to use the input function of type directly, instead of first creating
a text value and later casting it to <type>.

best regards,
Florian Pflug


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-27 00:48:47
Message-ID: 1309135727.2443.206.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2011-06-27 at 00:56 +0200, Florian Pflug wrote:
> Well, there actually *is* some precedence for that kind of top-down
> (form a syntactic perspective) type inference. We *enforce* the cast
> in
> array[]::<arraytype>
> and actually for a very similar reason - without the case, there's no
> way of knowing which type of empty array was meant. I think we also

That's a good point.

Although, I'm not sure whether that's an argument that we can make the
type system work as-is, or if it means that we should add syntax like
ARRAY[].

> special-case
> 'literal'::<type>
> to use the input function of type directly, instead of first creating
> a text value and later casting it to <type>.

That is certainly true. Quoted strings never start out as text, they
start out as "unknown" and wait for the type inference to determine the
type. I'm not entirely sure whether a quoted string followed by a cast
is briefly unknown and then cast, or if it's directly interpreted using
the cast's type input function.

I don't know if that's a good example though because it's near the end
of the line and there's no function call in between the arguments and
the cast. It might get more complex with cases like:

range(lower(range(1,2)),upper(range(1,2)))::int8range

but maybe that can be done more easily than I think?

Regards,
Jeff Davis


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-27 05:29:06
Message-ID: 4E081522.3040101@darrenduncan.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Darren Duncan <darren(at)darrenduncan(dot)net> writes:
>> I believe that the best general solution here is for every ordered base type to
>> just have a single total order, which is always used with that type in any
>> generic order-sensitive operation, including any ranges defined over it, or any
>> ORDER BY or any <,>,etc.
>
> We've spent years and blood on making sure that Postgres could support
> multiple orderings for any datatype; and there are plenty of natural
> examples for the usefulness of that. So I'm not at all impressed by
> any line of reasoning that starts out by baldly throwing that away.

I'm not saying that you can't use multiple orderings with a data type. I'm just
saying that the type only has *at most* one (possibly none) *native* ordering,
which is what is used when you do something ordered-sensitive with the type,
such as have a range.

To be specific, if the type system supports a concept like Perl 6 roles (or
other languages have similar concepts) where types can declare themselves
members of a union type such as "Ordered", then types of that union would have
the native ordering and other types wouldn't and then generic range operators
could be declared over ANYORDERED or such.

>> When we want to have some other "native" ordering for an existing type, such as
>> when we want to use a different text collation, we do this by creating a *new
>> base type*,
>
> Nope. This has all sorts of problems that you're conveniently ignoring,
> beginning with the need to duplicate all of the infrastructure for the
> type (such as non-ordering-related operators), and then moving into
> difficulties arising from added ambiguity as to which operator is meant.

Well a related solution is to have exactly 1 text wrapper type which has 2
attributes, one being the text value and the other being the collation name.
Then you just have 1 type that does the job instead a separate one per
collation. But to keep the semantics, the collation name is part of the
identity of the type. For example:

CREATE TYPE collated_text AS (t text, c collation);

The key point I'm trying to support is that collation issues are firmly attached
to the text type, not the range type.

Anyway, if a better solution can be arrived at for the problem at hand, then
good for the team; meanwhile, what I've proposed is the best one I can think of.

-- Darren Duncan


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-27 06:03:06
Message-ID: 4E081D1A.8060305@darrenduncan.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis wrote:
> On Sun, 2011-06-26 at 00:57 -0700, Darren Duncan wrote:
>> I believe that the best general solution here is for every ordered base type to
>> just have a single total order, which is always used with that type in any
>> generic order-sensitive operation, including any ranges defined over it, or any
>> ORDER BY or any <,>,etc. The built-in generic text type would have exactly 1
>> system-defined collation that can't be changed, and it would be something simple
>> and generic, such as simply sorting on the codepoint as integers.
>
> Well, we're trying to support SQL, and SQL supports collations, so I
> don't think we can just ignore that.

I'm not saying you can't support collations. See also my reply to Tom.

> I also agree with Tom that it's not a good idea. My reasons are:
>
> * Practical considerations, such as having a bunch of cruft from
> duplicated types all over the system. With sufficient changes to the
> type system, maybe that could be overcome. Or perhaps domains could be
> used to make that work for range types (sort of), but the result would
> not be very consistent with the rest of the system.

Yes, duplication can be avoided.

> * It doesn't seem to be based in any mathematical argument. A type is a
> set of values, and there's no reason it can't have several total orders;
> or no total order at all. So it appears to just be piggybacking on the
> type system infrastructure as a place to hold the metadata for a total
> order.

Yes, I agree that a type is a set of values, and a type can have 0..N total
orders. My proposal is just that, for those types that have at least 1 total
order, exactly 1 of those is defined to be used implicitly in contexts where a
total order is desired and no explicit collation is given, such as in ranges.

> * Who's to say that a "compare" function is the only way to specify a
> total order? There might be other interfaces that would support
> something closer to a lexicographic sort. So, from a theoretical
> standpoint, trying to attach a single notion of total order to a type
> seems strange, because there might be multiple interfaces for specifying
> even one total order.

Thank you for bringing this up, the notion of multiple interfaces for specifying
even one total order. My example of a compare function was just an example, and
it is valuable to consider that this may not be the only way to do it.

> * It would require extra explicit type annotations. If you have 12 text
> types, the only way to practically use any text type is to constantly
> specify which more-specific text type it actually is (probably using
> the :: operator). That is not necessarily a bad choice if starting a
> language from scratch and forming the syntax in a way that it's
> reasonable to do. But this is SQL, and lots of type annotations are
> un-SQL-like.

Well sometimes it doesn't hurt to suggest solutions from the point of view that
one can start the language from scratch, because that provides a clean way to
conceptualize and explain a feature. And then people can find some middle
ground that adapts benefits from that idea for the feature without changing SQL
more than needed. Witness the various improvements to Perl 5 that were first
expressed in terms of Perl 6.

-- Darren Duncan


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-27 10:16:01
Message-ID: 57903AC0-61EB-485E-8720-A784FD1BAA46@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun27, 2011, at 02:48 , Jeff Davis wrote:
> On Mon, 2011-06-27 at 00:56 +0200, Florian Pflug wrote:
>> Well, there actually *is* some precedence for that kind of top-down
>> (form a syntactic perspective) type inference. We *enforce* the cast
>> in
>> array[]::<arraytype>
>> and actually for a very similar reason - without the case, there's no
>> way of knowing which type of empty array was meant. I think we also
>
> That's a good point.
>
> Although, I'm not sure whether that's an argument that we can make the
> type system work as-is, or if it means that we should add syntax like
> ARRAY[].

It was meant as an argument for the former, i.e. for extending the type
system (or rather the function call syntax, as I argue below).

>> special-case
>> 'literal'::<type>
>> to use the input function of type directly, instead of first creating
>> a text value and later casting it to <type>.
>
> That is certainly true. Quoted strings never start out as text, they
> start out as "unknown" and wait for the type inference to determine the
> type. I'm not entirely sure whether a quoted string followed by a cast
> is briefly unknown and then cast, or if it's directly interpreted using
> the cast's type input function.

It's at least labelled with type "unknown" for a while AFAIK.

> I don't know if that's a good example though because it's near the end
> of the line and there's no function call in between the arguments and
> the cast. It might get more complex with cases like:
>
> range(lower(range(1,2)),upper(range(1,2)))::int8range
>
> but maybe that can be done more easily than I think?

I wouldn't take it that far. What I had in mind was to *only* support
the case where the cast directly follows the function call, i.e. the case
f(...)::type

I view this more as an extension of the function call syntax than of
type inference. In other languages with polymorphism, there usually is
an explicit syntactic construct for specifying the type arguments to
a polymorphic function. For example, C++ you'd write
make_range<int>(3,4)
to call the polymorphic function make_range() with it's (first)
type argument set to "int". I think of
f(...)::type
as essentially the same thing, but re-using already existing syntax
instead of inventing new one.

I just checked - we currently special case "array[]::type" in transformExpr()
by detecting the case of an array expression being the immediate child
of a cast expression. I suggest we do the same for "f(...)::type", i.e.
also special case a function call being the immediate child of a cast
expression and pass down the forced result type to the function call node.

Function call nodes would then usually ignore that passed-down result type,
except in the case of a polymorphic functions whose argument types don't
uniquely define its result type.

But I haven't tried doing that, so there might be stumbling block down
that road that I missed...

best regards,
Florian Pflug


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-27 16:45:52
Message-ID: 1309193152.2443.233.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2011-06-27 at 12:16 +0200, Florian Pflug wrote:
> I wouldn't take it that far. What I had in mind was to *only* support
> the case where the cast directly follows the function call, i.e. the case
> f(...)::type

OK, so instead of writing:
range(lower(range(1,2)),upper(range(1,2)))::int8range

users would write:
range(lower(range(1,2)::int8range),upper(range(1,2)::int8range))::int8range

A little more verbose, but it seems like it wouldn't be a practical
problem in very many cases. Multiple levels of constructors seem like
they'd be fairly uncommon, and probably a case where a function should
be written anyway.

OK, I'll have to think about this a little more, but it seems like a
reasonable approach.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Darren Duncan <darren(at)darrenduncan(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-27 17:00:09
Message-ID: 1309194009.2443.238.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2011-06-26 at 22:29 -0700, Darren Duncan wrote:
> Tom Lane wrote:
> > Darren Duncan <darren(at)darrenduncan(dot)net> writes:
> >> I believe that the best general solution here is for every ordered base type to
> >> just have a single total order, which is always used with that type in any
> >> generic order-sensitive operation, including any ranges defined over it, or any
> >> ORDER BY or any <,>,etc.
> >
> > We've spent years and blood on making sure that Postgres could support
> > multiple orderings for any datatype; and there are plenty of natural
> > examples for the usefulness of that. So I'm not at all impressed by
> > any line of reasoning that starts out by baldly throwing that away.
>
> I'm not saying that you can't use multiple orderings with a data type. I'm just
> saying that the type only has *at most* one (possibly none) *native* ordering,
> which is what is used when you do something ordered-sensitive with the type,
> such as have a range.

So, are you saying that it would be impossible to have a range that uses
a different ordering? What about ORDER BY? What about BTrees?

And if those things can use different orders for the same type, then
what is the difference between what you are suggesting and a default
ordering for the type (which we already support)?

I suppose it's hard to tell what you mean by "native".

Regards,
Jeff Davis


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-27 18:15:13
Message-ID: 4E08C8B1.3050705@darrenduncan.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis wrote:
> On Sun, 2011-06-26 at 22:29 -0700, Darren Duncan wrote:
>> Tom Lane wrote:
>>> Darren Duncan <darren(at)darrenduncan(dot)net> writes:
>>>> I believe that the best general solution here is for every ordered base type to
>>>> just have a single total order, which is always used with that type in any
>>>> generic order-sensitive operation, including any ranges defined over it, or any
>>>> ORDER BY or any <,>,etc.
>>> We've spent years and blood on making sure that Postgres could support
>>> multiple orderings for any datatype; and there are plenty of natural
>>> examples for the usefulness of that. So I'm not at all impressed by
>>> any line of reasoning that starts out by baldly throwing that away.
>> I'm not saying that you can't use multiple orderings with a data type. I'm just
>> saying that the type only has *at most* one (possibly none) *native* ordering,
>> which is what is used when you do something ordered-sensitive with the type,
>> such as have a range.
>
> So, are you saying that it would be impossible to have a range that uses
> a different ordering? What about ORDER BY? What about BTrees?
>
> And if those things can use different orders for the same type, then
> what is the difference between what you are suggesting and a default
> ordering for the type (which we already support)?
>
> I suppose it's hard to tell what you mean by "native".
>
> Regards,
> Jeff Davis

Maybe I'm just talking about "default ordering" then. -- Darren Duncan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-27 18:50:45
Message-ID: BANLkTin=K8F9pytSvx1Fr-7Nk5LYtBXe2Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jun 25, 2011 at 6:29 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> Different ranges over the same subtype make sense when using different
> total orders for the subtype. This is most apparent with text collation,
> but makes sense (at least mathematically, if not practically) for any
> subtype.
>
> For instance:
>  [a, Z)
> is a valid range in "en_US", but not in "C", so it makes sense to have
> multiple ranges over the same subtype with different collations.
>
> But what if you have a function (like a constructor), of the form:
>  (anyelement, anyelement) -> anyrange
> ? To work with the type system, you need to be able to figure out the
> return type from the arguments; which means to support functions like
> this we need a mapping from the subtype to the range type.
> Unfortunately, that restricts us to one range type per subtype (this
> isn't a problem for ARRAYs, because there is only one useful array type
> for a given element type).
>
> This problem first came up a while ago:
> http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php
>
> My workaround was to use domains, but that's not a very clean solution
> (you have to add a bunch of casts to make sure the right domain is
> chosen). It became entirely unworkable with collations, because people
> would be using different text collations a lot more frequently than,
> say, a different ordering for timestamptz. Tom mentioned that here:
>
> http://archives.postgresql.org/message-id/24831.1308579443@sss.pgh.pa.us
>
> I think Florian proposed the most promising line of attack here:
>
> http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org
>
> by suggesting that functions of the form:
>  (anyelement, [other non-anyrange arguments]) -> anyrange
> might be expendable. After all, they are only useful for constructors as
> far as we can tell. Other range functions will have an anyrange
> parameter, and we can use the actual type of the argument to know the
> range type (as well as the subtype).
>
> Although it's very nice to be able to say:
>  range(1,10)
> and get an int4range out of it, it's not the only way, and it's not
> without its problems anyway. For instance, to get an int8range you have
> to do:
>  range(1::int8, 10::int8)
> or similar.
>
> So, we could just make functions like:
>  int4range(int4, int4)
>  int8range(int8, int8)
>  ...
> when creating the range type, and it would actually be a usability
> improvement.

Couldn't we also do neither of these things? I mean, presumably
'[1,10]'::int8range had better work.

I'm not saying that's ideal from a usability perspective but I fear
this patch is going to be unmanageably large, and separating out the
things that you need for it to work at all from the things that you
need in order for it to be convenient might have some merit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-28 03:42:48
Message-ID: 1309232568.10707.31.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2011-06-27 at 14:50 -0400, Robert Haas wrote:
> Couldn't we also do neither of these things? I mean, presumably
> '[1,10]'::int8range had better work.

I think that if we combine this idea with Florian's "PAIR" suggestion
here:
http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org

then I think we have a solution.

If we add a type RANGEINPUT that is not a pseudotype, we can use that as
an intermediate type that is returned by range constructors. Then, we
add casts from RANGEINPUT to each range type. That would allow
range(1,2)::int8range
to work without changing the type system around, because range() would
have the signature:
range(ANYELEMENT, ANYELEMENT) -> RANGEINPUT
and then the cast would change it into an int8range. But we only need
the one cast per range type, and we can also support all of the other
kinds of constructors like:
range_cc(ANYELEMENT, ANYELEMENT) -> RANGEINPUT
range_linf_c(ANYELEMENT) -> RANGEINPUT
without additional hassle.

The RANGEINPUT type itself would hold similar information to actual
range types: the subtype OID (instead of the range type, because it's
not a range yet), optionally the two bounds (depending on the flags),
and the flags byte. The cast to a real range type would read the
subtype, and try to coerce the bounds to the subtype of the range you're
casting to, set the range type oid, leave the flags byte the same, and
it's done.

So, in effect, RANGEINPUT is a special type used only for range
constructors. If someone tried to output it, it would throw an
exception, and we'd even have enough information at that point to print
a nice error message with a hint.

Actually, this is pretty much exactly Florian's idea (thanks again,
Florian), but at the time I didn't like it because "pair" didn't capture
everything that I wanted to capture, like infinite bounds, etc. But
there's no reason that it can't, and your point made me realize that --
you are effectively just using TEXT as the intermediate type (which
works, but has some undesirable characteristics).

Do we think that this is a good way forward? The only thing I can think
of that's undesirable is that it's not normal to be required to cast the
result of a function, and might be slightly difficult to explain in the
documentation in a straightforward way.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-28 14:58:14
Message-ID: BANLkTimW6cpAi7BNPGgqSn3Z9je7io6GfQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> So, in effect, RANGEINPUT is a special type used only for range
> constructors. If someone tried to output it, it would throw an
> exception, and we'd even have enough information at that point to print
> a nice error message with a hint.

I don't think I like the idea of throwing an error when you try to
output it, but the rest seems reasonably sensible.

> Actually, this is pretty much exactly Florian's idea (thanks again,
> Florian), but at the time I didn't like it because "pair" didn't capture
> everything that I wanted to capture, like infinite bounds, etc. But
> there's no reason that it can't, and your point made me realize that --
> you are effectively just using TEXT as the intermediate type (which
> works, but has some undesirable characteristics).

What undesirable characteristics?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-28 16:30:22
Message-ID: C1FC1438-68E3-4F0D-95A8-6EC8E6D4055A@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 27, 2011, at 8:42 PM, Jeff Davis wrote:

> Do we think that this is a good way forward? The only thing I can think
> of that's undesirable is that it's not normal to be required to cast the
> result of a function, and might be slightly difficult to explain in the
> documentation in a straightforward way

That's the part that bothers me. I think that if it's not cast it should somehow be useful. Maybe default to a text range or something?

Best,

David


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-28 16:58:11
Message-ID: 1309280291.10707.52.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-06-28 at 10:58 -0400, Robert Haas wrote:
> On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > So, in effect, RANGEINPUT is a special type used only for range
> > constructors. If someone tried to output it, it would throw an
> > exception, and we'd even have enough information at that point to print
> > a nice error message with a hint.
>
> I don't think I like the idea of throwing an error when you try to
> output it, but the rest seems reasonably sensible.

I thought it might add a little confusion if people thought they had a
range type but really had RANGEINPUT. For instance, if you do a "create
table as select range(1,2)" then the result might be slightly
unexpected.

But it's probably no more unexpected than "create table as select
'foo'". So, I suppose there's not much reason to throw an error. We can
just output it in the same format as a range type.

It's also much easier to explain something in the documentation that has
an output format, because at least it's tangible.

> > Actually, this is pretty much exactly Florian's idea (thanks again,
> > Florian), but at the time I didn't like it because "pair" didn't capture
> > everything that I wanted to capture, like infinite bounds, etc. But
> > there's no reason that it can't, and your point made me realize that --
> > you are effectively just using TEXT as the intermediate type (which
> > works, but has some undesirable characteristics).
>
> What undesirable characteristics?

Well, for one, outputting something as text and then reading it back in
does not always produce the same value. For instance, for float, it only
does that if you have extra_float_digits set to some high-enough value.
I suppose I could save the GUC, set it, and set it back; but that seems
like unnecessary ugliness.

There's also the deparsing/reparsing cycle. That might not really matter
for performance, but it seems unnecessary.

And there's always the fallback that "we have types for a reason".
Wouldn't it be odd if you wrote a query like:
select range(1,2) || 'foo'
and it succeeded? I'm sure that kind of thing can lead to some dangerous
situations.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-28 17:18:41
Message-ID: 1309281521.10707.68.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-06-28 at 09:30 -0700, David E. Wheeler wrote:
> On Jun 27, 2011, at 8:42 PM, Jeff Davis wrote:
>
> > Do we think that this is a good way forward? The only thing I can think
> > of that's undesirable is that it's not normal to be required to cast the
> > result of a function, and might be slightly difficult to explain in the
> > documentation in a straightforward way
>
> That's the part that bothers me.

Yeah, that bothered me, too.

> I think that if it's not cast it should somehow be useful.

Let's see, what can one do with a range that has no ordering yet? ;)

Robert suggested that we don't need to throw an error, and I think I
agree. Just having a working output function solves most of the
documentation problem, because it makes it less abstract.

The only operators that we could really support are accessors, which
seems somewhat reasonable. However, I'd have some concerns even about
that, because if you do range(10,1), then what's the upper bound?

> Maybe default to a text range or something?

That sounds a little dangerous:
select range('1','09')
would fail before it could be cast to int4range.

We could invent an UNKNOWNRANGE type or something. But I don't
particularly like that; it would start out working nicely when people
only had one textrange type, and then their old queries would start
failing when they added another range type based on text.

I think it's fine if the RANGEINPUT type isn't too useful by itself.
It's already a common requirement to cast unknown literals, and this
isn't too much different. It's only for constructors, so it still fits
pretty closely with that idea.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-28 17:24:32
Message-ID: BANLkTi=wB2ADMSof-L_Wr3m+GMf2u_DHxw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 28, 2011 at 12:58 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, 2011-06-28 at 10:58 -0400, Robert Haas wrote:
>> On Mon, Jun 27, 2011 at 11:42 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> > So, in effect, RANGEINPUT is a special type used only for range
>> > constructors. If someone tried to output it, it would throw an
>> > exception, and we'd even have enough information at that point to print
>> > a nice error message with a hint.
>>
>> I don't think I like the idea of throwing an error when you try to
>> output it, but the rest seems reasonably sensible.
>
> I thought it might add a little confusion if people thought they had a
> range type but really had RANGEINPUT. For instance, if you do a "create
> table as select range(1,2)" then the result might be slightly
> unexpected.

True...

> But it's probably no more unexpected than "create table as select
> 'foo'". So, I suppose there's not much reason to throw an error. We can
> just output it in the same format as a range type.

+1.

> It's also much easier to explain something in the documentation that has
> an output format, because at least it's tangible.

+1.

>> > Actually, this is pretty much exactly Florian's idea (thanks again,
>> > Florian), but at the time I didn't like it because "pair" didn't capture
>> > everything that I wanted to capture, like infinite bounds, etc. But
>> > there's no reason that it can't, and your point made me realize that --
>> > you are effectively just using TEXT as the intermediate type (which
>> > works, but has some undesirable characteristics).
>>
>> What undesirable characteristics?
>
> Well, for one, outputting something as text and then reading it back in
> does not always produce the same value. For instance, for float, it only
> does that if you have extra_float_digits set to some high-enough value.
> I suppose I could save the GUC, set it, and set it back; but that seems
> like unnecessary ugliness.

Yeah, I don't think we want to go there.

> There's also the deparsing/reparsing cycle. That might not really matter
> for performance, but it seems unnecessary.
>
> And there's always the fallback that "we have types for a reason".
> Wouldn't it be odd if you wrote a query like:
>  select range(1,2) || 'foo'
> and it succeeded? I'm sure that kind of thing can lead to some dangerous
> situations.

That's pretty much what we tried to get rid of with the 8.3 casting
changes, so agreed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-28 20:20:50
Message-ID: 610E50C6-1793-44C5-8EAB-54DE3F5D98D6@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun28, 2011, at 05:42 , Jeff Davis wrote:
> On Mon, 2011-06-27 at 14:50 -0400, Robert Haas wrote:
>> Couldn't we also do neither of these things? I mean, presumably
>> '[1,10]'::int8range had better work.
>
> I think that if we combine this idea with Florian's "PAIR" suggestion
> here:
> http://archives.postgresql.org/message-id/AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org
>
> then I think we have a solution.
>
> If we add a type RANGEINPUT that is not a pseudotype, we can use that as
> an intermediate type that is returned by range constructors. Then, we
> add casts from RANGEINPUT to each range type. That would allow
> range(1,2)::int8range
> to work without changing the type system around, because range() would
> have the signature:
> range(ANYELEMENT, ANYELEMENT) -> RANGEINPUT
> and then the cast would change it into an int8range. But we only need
> the one cast per range type, and we can also support all of the other
> kinds of constructors like:
> range_cc(ANYELEMENT, ANYELEMENT) -> RANGEINPUT
> range_linf_c(ANYELEMENT) -> RANGEINPUT
> without additional hassle.

Hm, so RANGEINPUT would actually be what was previously discussed as
the "range as a pair of bounds" definition, as opposed to the
"range as a set of values" definition. So essentially we'd add a
second concept of what a "range" is to work around the range input
troubles.

I believe if we go that route we should make RANGEINPUT a full-blown
type, having "pair of bound" semantics. Adding a lobotomized version
just for the sake of range input feels a bit like a kludge to me.

Alternatively, we could replace RANGEINPUT simply with ANYARRAY,
and add functions ANYRANGE->ANYRANGE which allow specifying the
bound operator (<, <= respectively >,>=) after construction.

So you'd write (using the functions-as-fields syntax I believe
we support)
(ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]'
and
ARRAY[NULL,2]::int8range for '[-inf,2]'

All assuming that modifying the type system to support polymorphic
type resolution based on the return type is out of the question... ;-)

best regards,
Florian Pflug


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 03:02:54
Message-ID: 1309316574.10707.87.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-06-28 at 22:20 +0200, Florian Pflug wrote:
> Hm, so RANGEINPUT would actually be what was previously discussed as
> the "range as a pair of bounds" definition, as opposed to the
> "range as a set of values" definition. So essentially we'd add a
> second concept of what a "range" is to work around the range input
> troubles.
>
> I believe if we go that route we should make RANGEINPUT a full-blown
> type, having "pair of bound" semantics. Adding a lobotomized version
> just for the sake of range input feels a bit like a kludge to me.

I think David Wheeler was trying to make a similar point, but I'm still
not convinced.

It's not a pair, because it can be made up of 0, 1, or 2 scalar values
(unless you count infinity as one of those values, in which case 0 or
2). And without ordering, it's not clear that those values are really
"bounds".

The type needs to:
* represent two values, either of which might be a special infinite
value
* represent the value "empty"
* represent inclusivity/exclusivity of both values

and those things seem fairly specific to ranges, so I don't really see
what other use we'd have for such a type. But I'm open to suggestion.

I don't think that having an extra type around is so bad. It solves a
lot of problems, and doesn't seem like it would get in the way. And it's
only for the construction of ranges out of scalars, which seems like the
most natural place where a cast might be required (similar to casting an
unknown literal, which is fairly common).

> Alternatively, we could replace RANGEINPUT simply with ANYARRAY,
> and add functions ANYRANGE->ANYRANGE which allow specifying the
> bound operator (<, <= respectively >,>=) after construction.
>
> So you'd write (using the functions-as-fields syntax I believe
> we support)
> (ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]'
> and
> ARRAY[NULL,2]::int8range for '[-inf,2]'

I think we can rule this one out:
* The functions-as-fields syntax is all but deprecated (or should be)
* That's hardly a readability improvement
* It still suffers similar problems as casting back and forth to text:
ANYARRAY is too general, doesn't really take advantage of the type
system, and not a great fit anyway.

> All assuming that modifying the type system to support polymorphic
> type resolution based on the return type is out of the question... ;-)

It's still not out of the question, but I thought that the intermediate
type would be a less-intrusive alternative (and Robert seemed concerned
about how intrusive it was).

There also might be a little more effort educating users if we selected
the function based on the return type, because they might think that
casting the inputs explicitly would be enough to get it to pick the
right function. If it were a new syntax like RANGE[]::int8range, then I
think it would be easier to understand.

Regards,
Jeff Davis


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 11:35:48
Message-ID: 613D83B8-CB7F-450C-A97B-BA6E3DB50374@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun29, 2011, at 05:02 , Jeff Davis wrote:
> On Tue, 2011-06-28 at 22:20 +0200, Florian Pflug wrote:
>> I believe if we go that route we should make RANGEINPUT a full-blown
>> type, having "pair of bound" semantics. Adding a lobotomized version
>> just for the sake of range input feels a bit like a kludge to me.
>
> It's not a pair, because it can be made up of 0, 1, or 2 scalar values
> (unless you count infinity as one of those values, in which case 0 or
> 2). And without ordering, it's not clear that those values are really
> "bounds".

Hm, yeah, the lack of an ordering operator is trouble. There also seem
to be more problems with that idea, see below for that. So scratch
the idea of turning RANGEINPUT into a full-blown type...

> I don't think that having an extra type around is so bad. It solves a
> lot of problems, and doesn't seem like it would get in the way. And it's
> only for the construction of ranges out of scalars, which seems like the
> most natural place where a cast might be required (similar to casting an
> unknown literal, which is fairly common).

What I'm concerned about is how elegantly we'd be able to tie up all
the loose ends. What'd be the result of
select range(1,2)
for example? Or
create table (r rangeinput)
for that matter.

I think we'd want to forbid both of these, and more or less every other
use except
range(1,2)::<some range type>
but that seems to require special-casing RANGEINPUT in a lot of places.

If we don't restrict RANGEINPUT that way, I think we ought to provide
at least a basic set of operators and functions for it - e.g.
input, output, lower(), upper(), ...

*Pondering this*

But we can't do that easily, since RANGEINPUT would actually be a kind of
VARIANT type (i.e. can hold values of arbitrary types). That's something
that our type system doesn't really support. We do have RECORD, which is
similar in a way, but its implementation is about as intrusive as it
gets...

>> Alternatively, we could replace RANGEINPUT simply with ANYARRAY,
>> and add functions ANYRANGE->ANYRANGE which allow specifying the
>> bound operator (<, <= respectively >,>=) after construction.
>>
>> So you'd write (using the functions-as-fields syntax I believe
>> we support)
>> (ARRAY[1,2]::int8range).left_open.right_closed for '(1,2]'
>> and
>> ARRAY[NULL,2]::int8range for '[-inf,2]'
>
> I think we can rule this one out:
> * The functions-as-fields syntax is all but deprecated (or should be)

Is it? That's actually too bad, since I kinda like it. But anyway,
if that's a concern it could also be
range_bounds(ARRAY[1,2]::int8range, '(]')

> * That's hardly a readability improvement

Granted, it won't win any beauty contest, but

> * It still suffers similar problems as casting back and forth to text:
> ANYARRAY is too general, doesn't really take advantage of the type
> system, and not a great fit anyway.

I believe it alleviates the gravest problems of casting back and forth
to text. It doesn't have quoting issues and it doesn't potentially lose
information.

In any case, I wouldn't expect this to *stay* the only way to construct
a range forever. But I does have it's virtues for a first incarnation of
range type, I believe, mostly because it's completely unintrusive and
won't cause any backwards-compatbility headaches in the future

>> All assuming that modifying the type system to support polymorphic
>> type resolution based on the return type is out of the question... ;-)
>
> It's still not out of the question, but I thought that the intermediate
> type would be a less-intrusive alternative (and Robert seemed concerned
> about how intrusive it was).

I fear that the intermediate type will turn out to be quite intrusive,
at least if we try to handle all the corner cases and loose ends. And if
we don't, I'm concerned that we're painting ourselves into a corner here...

> There also might be a little more effort educating users if we selected
> the function based on the return type, because they might think that
> casting the inputs explicitly would be enough to get it to pick the
> right function. If it were a new syntax like RANGE[]::int8range, then I
> think it would be easier to understand.

There's certainly a risk of confusion here, simply because the relationship
between ANYRANGE and ANYLEMENT will be quite different than that of
ANYARRAY and ANYLEMENT. All we can do is state this very clearly in the
docs I think, and explain that it must be that way to support multiple
range types over the same base type.

best regards,
Florian Pflug


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 12:52:27
Message-ID: BANLkTikBNXXLQv281wVHoUExbYoMo8AxDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jun 28, 2011 at 11:02 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> It's still not out of the question, but I thought that the intermediate
> type would be a less-intrusive alternative (and Robert seemed concerned
> about how intrusive it was).

I'm no great fan of our existing type system, and I'm not opposed to
trying to improve it. However, I'm a bit wary of the theory that we
can just tweak X, Y, or Z and then everything will go more smoothly
for range types. I fear that there will be knock-on consequences that
we'll spend a lot of time either (a) arguing about or (b) fixing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 15:41:49
Message-ID: 1309362109.10707.129.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-06-29 at 13:35 +0200, Florian Pflug wrote:
> What I'm concerned about is how elegantly we'd be able to tie up all
> the loose ends. What'd be the result of
> select range(1,2)
> for example? Or
> create table (r rangeinput)
> for that matter.
>
> I think we'd want to forbid both of these, and more or less every other
> use except
> range(1,2)::<some range type>
> but that seems to require special-casing RANGEINPUT in a lot of places.

We could make it a pseudo-type and make the IO functions generate
exceptions. That should prevent most mistakes and effectively hide it
from the user (sure, they could probably use it somewhere if they really
want to, but I wouldn't be worried about breaking backwards
compatibility with undocumented usage like that). There are plenty of
types that are hidden from users in one way or another -- trigger, void,
internal, fdw_handler, etc., so I don't see this as special-casing at
all.

That might make it slightly harder to document, but I think it can be
done. All we have to do is document the range constructors saying "you
must cast the result to a valid range type; trying to use the result of
these functions directly raises an exception". In fact, I think I'll
take back the "hard to document" claim from before: it will be pretty
easy to document, and if someone gets it wrong, we can throw a helpful
error and hint.

Robert didn't really seem to like the idea of throwing an error though
-- Robert, can you expand on your reasoning here?

I tend to lean toward throwing an error as well, but I don't really have
much of an opinion.

> If we don't restrict RANGEINPUT that way, I think we ought to provide
> at least a basic set of operators and functions for it - e.g.
> input, output, lower(), upper(), ...
>
> *Pondering this*
>
> But we can't do that easily, since RANGEINPUT would actually be a kind of
> VARIANT type (i.e. can hold values of arbitrary types). That's something
> that our type system doesn't really support. We do have RECORD, which is
> similar in a way, but its implementation is about as intrusive as it
> gets...

I don't want to go down the road of making this a fully supported type.
I don't see any use case for it at all, and I think it's a bad idea to
design something with no idea how people might want to use it.

> Is it? That's actually too bad, since I kinda like it. But anyway,
> if that's a concern it could also be
> range_bounds(ARRAY[1,2]::int8range, '(]')

What type would the result of that be? What value?

> > * It still suffers similar problems as casting back and forth to text:
> > ANYARRAY is too general, doesn't really take advantage of the type
> > system, and not a great fit anyway.
>
> I believe it alleviates the gravest problems of casting back and forth
> to text. It doesn't have quoting issues and it doesn't potentially lose
> information.

I think it still circumvents the type system to a degree. We're just
putting stuff in an array with no intention of really using it that way.

> In any case, I wouldn't expect this to *stay* the only way to construct
> a range forever. But I does have it's virtues for a first incarnation of
> range type, I believe, mostly because it's completely unintrusive and
> won't cause any backwards-compatbility headaches in the future

I'm not sure that your overloading of arrays is completely immune from
backwards-compatibility problems, should we decide to change it later.

But regardless, we have quite a lot of time to make a decision before
9.2 is released; so let's do it once and do it right.

> I fear that the intermediate type will turn out to be quite intrusive,
> at least if we try to handle all the corner cases and loose ends. And if
> we don't, I'm concerned that we're painting ourselves into a corner here...

Can you expand on some of the corner-cases and loose ends you're
concerned about? Does marking it as a pseudotype and making the IO
functions throw exceptions handle them?

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 15:42:48
Message-ID: 1309362168.10707.130.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-06-29 at 08:52 -0400, Robert Haas wrote:
> On Tue, Jun 28, 2011 at 11:02 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > It's still not out of the question, but I thought that the intermediate
> > type would be a less-intrusive alternative (and Robert seemed concerned
> > about how intrusive it was).
>
> I'm no great fan of our existing type system, and I'm not opposed to
> trying to improve it. However, I'm a bit wary of the theory that we
> can just tweak X, Y, or Z and then everything will go more smoothly
> for range types. I fear that there will be knock-on consequences that
> we'll spend a lot of time either (a) arguing about or (b) fixing.

Agreed.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 16:34:06
Message-ID: BANLkTi=cGF2s0Y=UCQtLDityppVO2jdJ4w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 29, 2011 at 11:41 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> Robert didn't really seem to like the idea of throwing an error though
> -- Robert, can you expand on your reasoning here?

I guess I don't have any terribly well-thought out reasoning - maybe
it's fine. It just seems strange to have a type that you can't
display.

But now that I'm thinking about this a little more, I'm worried about this case:

CREATE TABLE foo AS RANGE('something'::funkytype, 'somethingelse'::funktype);
DROP TYPE funkytype;

It seems to me that the first statement had better fail, or else the
second one is going to create a hopeless mess (imagine that a new type
comes along and gets the OID of funkytype).

It also seems a bit strange to me that we're contemplating a system
where users are always going to have to cast the return type.
Generally, casts are annoying and we want to minimize the need for
them. I'm not sure what the alternative is, though, unless we create
separate constructor functions for each type: int8range_cc(1, 2).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 17:01:55
Message-ID: E93EA7F3-9995-43B3-89C1-3808A073C095@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 28, 2011, at 8:02 PM, Jeff Davis wrote:

> I think David Wheeler was trying to make a similar point, but I'm still
> not convinced.
>
> It's not a pair, because it can be made up of 0, 1, or 2 scalar values
> (unless you count infinity as one of those values, in which case 0 or
> 2). And without ordering, it's not clear that those values are really
> "bounds".
>
> The type needs to:
> * represent two values, either of which might be a special infinite
> value
> * represent the value "empty"
> * represent inclusivity/exclusivity of both values
>
> and those things seem fairly specific to ranges, so I don't really see
> what other use we'd have for such a type. But I'm open to suggestion.
>
> I don't think that having an extra type around is so bad. It solves a
> lot of problems, and doesn't seem like it would get in the way. And it's
> only for the construction of ranges out of scalars, which seems like the
> most natural place where a cast might be required (similar to casting an
> unknown literal, which is fairly common).

I'm fine with that, but my point is that if it's going to be exposed to users somehow, it needs to be useful on its own, without casting. Because some wit will make a column of this type. If it's not somehow useful on its own, then it should be an implementation detail or internal that I never see in SQL. IMHO.

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 17:05:31
Message-ID: B5DE5864-6B43-4CAE-9AEE-9F5A259ED65D@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 29, 2011, at 8:41 AM, Jeff Davis wrote:

> We could make it a pseudo-type and make the IO functions generate
> exceptions. That should prevent most mistakes and effectively hide it
> from the user (sure, they could probably use it somewhere if they really
> want to, but I wouldn't be worried about breaking backwards
> compatibility with undocumented usage like that). There are plenty of
> types that are hidden from users in one way or another -- trigger, void,
> internal, fdw_handler, etc., so I don't see this as special-casing at
> all.

That could work.

> I don't want to go down the road of making this a fully supported type.
> I don't see any use case for it at all, and I think it's a bad idea to
> design something with no idea how people might want to use it.

+1

I'm still not clear, though, on why the return type of range() should not be related to the types of its arguments. So

range(1, 5)

Should return intrange, and

range(1::int8, 5::int8)

Should return int8range, and

range('foo', 'foooo')

Should return textrange.

Best,

David


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 17:11:52
Message-ID: 55DA0626-7369-491A-AAD9-0CC5903E4680@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun29, 2011, at 18:34 , Robert Haas wrote:
> It also seems a bit strange to me that we're contemplating a system
> where users are always going to have to cast the return type.
> Generally, casts are annoying and we want to minimize the need for
> them. I'm not sure what the alternative is, though, unless we create
> separate constructor functions for each type: int8range_cc(1, 2).

Well, if we want multiple range types per base type (which we do), then
the user needs to specify which one to use somehow. A cast seems the most
natural way to do that to me - after all, casting is *the* way to coerce
value to a certain type.

best regards,
Florian Pflug


From: Florian Pflug <fgp(at)phlo(dot)org>
To: David E(dot) Wheeler <david(at)kineticode(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 17:13:55
Message-ID: 6A92A4E6-9176-4713-8121-EA3342927656@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun29, 2011, at 19:05 , David E. Wheeler wrote:
> I'm still not clear, though, on why the return type of range()
> should not be related to the types of its arguments. So
>
> range(1, 5)
>
> Should return intrange, and
>
> range(1::int8, 5::int8)
>
> Should return int8range, and
>
> range('foo', 'foooo')
>
> Should return textrange.

Because there might be more than one range type for a
base type. Say there are two range types over text, one
with collation 'de_DE' and one with collation 'en_US'.
What would the type of
range('foo', 'foooo')
be?

best regards,
Florian Pflug


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 17:15:05
Message-ID: 0D42378D-0DB3-4882-84B0-8F10798CB134@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote:

> Because there might be more than one range type for a
> base type. Say there are two range types over text, one
> with collation 'de_DE' and one with collation 'en_US'.
> What would the type of
> range('foo', 'foooo')
> be?

The one that corresponds to the current LC_COLLATE setting.

Best,

David


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-29 21:44:53
Message-ID: 1309383893.4921.2.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On ons, 2011-06-29 at 10:15 -0700, David E. Wheeler wrote:
> On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote:
>
> > Because there might be more than one range type for a
> > base type. Say there are two range types over text, one
> > with collation 'de_DE' and one with collation 'en_US'.
> > What would the type of
> > range('foo', 'foooo')
> > be?
>
> The one that corresponds to the current LC_COLLATE setting.

Yes, or more generally, we have logic that determines, for example, what
collation to use for

'foo' < 'foooo'

The same logic can be used to determine what collation to use for

range('foo', 'foooo')

(In fact, if you implement range() as a user-space function, that will
happen automatically.)


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-30 06:45:10
Message-ID: CFF6B5E9-66D7-4E78-B580-8E98E5F632D7@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun29, 2011, at 23:44 , Peter Eisentraut wrote:
> On ons, 2011-06-29 at 10:15 -0700, David E. Wheeler wrote:
>> On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote:
>>> Because there might be more than one range type for a
>>> base type. Say there are two range types over text, one
>>> with collation 'de_DE' and one with collation 'en_US'.
>>> What would the type of
>>> range('foo', 'foooo')
>>> be?
>>
>> The one that corresponds to the current LC_COLLATE setting.
>
> Yes, or more generally, we have logic that determines, for example, what
> collation to use for
>
> 'foo' < 'foooo'
>
> The same logic can be used to determine what collation to use for
>
> range('foo', 'foooo')
>
> (In fact, if you implement range() as a user-space function, that will
> happen automatically.)

I don't think it will - as it stands, there isn't a single collatable
type RANGE but instead one *distinct* type per combination of base type,
btree opclass and collation. The reasons for that were discussed at length -
the basic argument for doing it that way was to make a range represent
a fixed set of values.

There's also no guarantee that a range type with collation LC_COLLATE
even exists.

best regards,
Florian Pflug


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-30 07:05:29
Message-ID: 1309417529.26660.11.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tor, 2011-06-30 at 08:45 +0200, Florian Pflug wrote:
> I don't think it will - as it stands, there isn't a single collatable
> type RANGE but instead one *distinct* type per combination of base
> type, btree opclass and collation. The reasons for that were discussed
> at length - the basic argument for doing it that way was to make a
> range represent a fixed set of values.

How would the system catalogs be initialized under that theory: surely
you're not going to seed (nr. of types) * (nr. of collations) * (nr. of
opclasses) range types in initdb?


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-30 07:11:01
Message-ID: 3E66E0E9-0D2C-45B6-825D-C003423550DD@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun30, 2011, at 09:05 , Peter Eisentraut wrote:
> On tor, 2011-06-30 at 08:45 +0200, Florian Pflug wrote:
>> I don't think it will - as it stands, there isn't a single collatable
>> type RANGE but instead one *distinct* type per combination of base
>> type, btree opclass and collation. The reasons for that were discussed
>> at length - the basic argument for doing it that way was to make a
>> range represent a fixed set of values.
>
> How would the system catalogs be initialized under that theory: surely
> you're not going to seed (nr. of types) * (nr. of collations) * (nr. of
> opclasses) range types in initdb?

There's CREATE RANGE. By default, no range types would exists I believe.

best regards,
Florian Pflug


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-30 10:28:49
Message-ID: 9CB77E02-614B-43A7-B2DF-68E7CFEA7A29@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun29, 2011, at 17:41 , Jeff Davis wrote:
>> Is it? That's actually too bad, since I kinda like it. But anyway,
>> if that's a concern it could also be
>> range_bounds(ARRAY[1,2]::int8range, '(]')
>
> What type would the result of that be? What value?

ARRAY[1,2]::int8range would return an int8range instance representing
the range [1,2] (i.e. the set of values {1,2}). range_bounds would then
modify the left bound to be exclusive, and thus return the int8range
(1,2] (i.e. the set of values {2}). range_bounds would have the signature
range_bounds(anyrange) returns anyrange.

I do think we'll probably want to have functions to modify the boundary
type (open or closed) anyway, so it wouldn't be that huge of a deal if
the range constructor didn't let you specify them.

Empty ranges would be constructed by
ARRAY[]::int8range
(Hm, ok, now I'm cheating... Currently you'd need to write
ARRAY[]::int8[]::int8range, but fixing that only needs a few lines
in the transformExpression* function that makes ARRAY[]::int8[] work).

>>> * It still suffers similar problems as casting back and forth to text:
>>> ANYARRAY is too general, doesn't really take advantage of the type
>>> system, and not a great fit anyway.
>>
>> I believe it alleviates the gravest problems of casting back and forth
>> to text. It doesn't have quoting issues and it doesn't potentially lose
>> information.
>
> I think it still circumvents the type system to a degree. We're just
> putting stuff in an array with no intention of really using it that way.

Well, arrays are containers, and we need two values to construct a range,
so putting them into a container first and then creating the range from that
doesn't seem so bad to me. We do use the full set of features that arrays
provide, since we only ever expect zero, one or two entries. But I don't
think this is different from functions who only support single-dimensional
arrays - they too choose to use only a subset of the features set of arrays.

>> In any case, I wouldn't expect this to *stay* the only way to construct
>> a range forever. But I does have it's virtues for a first incarnation of
>> range type, I believe, mostly because it's completely unintrusive and
>> won't cause any backwards-compatbility headaches in the future
>
> I'm not sure that your overloading of arrays is completely immune from
> backwards-compatibility problems, should we decide to change it later.
>
> But regardless, we have quite a lot of time to make a decision before
> 9.2 is released; so let's do it once and do it right.
>
>> I fear that the intermediate type will turn out to be quite intrusive,
>> at least if we try to handle all the corner cases and loose ends. And if
>> we don't, I'm concerned that we're painting ourselves into a corner here...
>
> Can you expand on some of the corner-cases and loose ends you're
> concerned about? Does marking it as a pseudotype and making the IO
> functions throw exceptions handle them?

Hm, I guess. I'm sill no huge fan of RANGEINPUT, but if we prevent
it from being used as a column type and from being used as an argument
type, then I guess it's workable...

Btw, what happened to the idea of making RANGE(...) a special syntactic
construct instead of a normal function call? Did we discard that for its
intrusiveness, or were there other reasons?

best regards,
Florian Pflug


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-30 16:29:37
Message-ID: 1309451377.10707.137.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2011-06-30 at 09:11 +0200, Florian Pflug wrote:
> > How would the system catalogs be initialized under that theory: surely
> > you're not going to seed (nr. of types) * (nr. of collations) * (nr. of
> > opclasses) range types in initdb?
>
> There's CREATE RANGE.

Right. In that respect, it's more like a record type: many possible
record types exist, but you only define the ones you want.

> By default, no range types would exists I believe.

I was planning to include _some_ by default. Probably not text ranges,
but integer and timestamp[tz] ranges. If nothing else, it makes it
easier to document.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-30 16:34:18
Message-ID: 1309451658.10707.141.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-06-29 at 10:15 -0700, David E. Wheeler wrote:
> On Jun 29, 2011, at 10:13 AM, Florian Pflug wrote:
>
> > Because there might be more than one range type for a
> > base type. Say there are two range types over text, one
> > with collation 'de_DE' and one with collation 'en_US'.
> > What would the type of
> > range('foo', 'foooo')
> > be?
>
> The one that corresponds to the current LC_COLLATE setting.

Then how do you get a text range that doesn't correspond to the
LC_COLLATE setting? Does that mean you couldn't dump/reload from a
system with one collation and get the same values in a system with a
different collation? That would be very strange.

Or, what about other types that just happen to have multiple useful
total orders?

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-30 16:40:53
Message-ID: 1309452053.10707.144.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-06-29 at 12:34 -0400, Robert Haas wrote:
> But now that I'm thinking about this a little more, I'm worried about this case:
>
> CREATE TABLE foo AS RANGE('something'::funkytype, 'somethingelse'::funktype);
> DROP TYPE funkytype;
>
> It seems to me that the first statement had better fail, or else the
> second one is going to create a hopeless mess (imagine that a new type
> comes along and gets the OID of funkytype).

Interesting point. I don't think it's a problem because pseudo-types
can't be used that way, so that provides us a mechanism to stop it. But
it means that we have to be a little more sure that such values can't
persist anywhere.

Regards,
Jeff Davis


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-30 16:58:42
Message-ID: ACDCDE86-F857-4C47-968F-BECC190E170A@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 30, 2011, at 9:29 AM, Jeff Davis wrote:

> Right. In that respect, it's more like a record type: many possible
> record types exist, but you only define the ones you want.

Well, okay. How is this same problem handled for RECORD types, then?

>> By default, no range types would exists I believe.
>
> I was planning to include _some_ by default. Probably not text ranges,
> but integer and timestamp[tz] ranges. If nothing else, it makes it
> easier to document.

+1

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-06-30 16:59:43
Message-ID: A9F84B6D-9EA2-4EEF-8A8E-00132CE97B5F@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun 30, 2011, at 9:34 AM, Jeff Davis wrote:

> Then how do you get a text range that doesn't correspond to the
> LC_COLLATE setting?

You cast it.

> Does that mean you couldn't dump/reload from a
> system with one collation and get the same values in a system with a
> different collation? That would be very strange.

No, pg_dump should always explicitly cast things. But there should be a reasonable default behavior if I'm in psql and don't cast.

> Or, what about other types that just happen to have multiple useful
> total orders?

Cast where you need it explicit, and have a reasonable default when it's not cast.

Best,

David


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-01 05:52:39
Message-ID: 1309499559.10707.146.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2011-06-30 at 09:58 -0700, David E. Wheeler wrote:
> On Jun 30, 2011, at 9:29 AM, Jeff Davis wrote:
>
> > Right. In that respect, it's more like a record type: many possible
> > record types exist, but you only define the ones you want.
>
> Well, okay. How is this same problem handled for RECORD types, then?

What problem, exactly? For a given list of subtypes, there is only one
valid record type.

Also, record is not a great example. The implementation uses at least
one pretty horrible hack.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-01 06:01:10
Message-ID: 1309500070.10707.155.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2011-06-30 at 09:59 -0700, David E. Wheeler wrote:
> On Jun 30, 2011, at 9:34 AM, Jeff Davis wrote:
>
> > Then how do you get a text range that doesn't correspond to the
> > LC_COLLATE setting?
>
> You cast it.

My original solution was something like this, except involving domains.
With a sufficient amount of casting of all arguments to anything
involving a range type, it works, but it's a little too un-SQL-like.
There was at least one fairly strong objection to my approach, but if
you have some further thoughts along that line, I'm open to suggestion.

Also, what if the LC_COLLATE is C, and you want to cast it to en_US
collation?
range('a','Z')
would be invalid in the C locale, and it would fail before you had a
chance to cast it.

> Cast where you need it explicit, and have a reasonable default when
> it's not cast.

I thought about that, too, but it's not ideal, either. That means that
something might start out as the only range type for a given subtype,
and doesn't need explicit casts. Then you define another range type over
that subtype, and all the original queries break because they are now
ambiguous.

I think the fundamental differences with range types that we're dealing
with are:
1. multiple range types might reasonbly exist for a single subtype
2. the order is a fundamental part of the type definition, not just an
extra argument useful for operations on the range type

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-01 06:09:04
Message-ID: 1309500544.10707.162.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2011-06-30 at 12:28 +0200, Florian Pflug wrote:
> Well, arrays are containers, and we need two values to construct a range,

What about empty ranges? What about infinite ranges?

It seems quite a bit more awkward to shoehorn ranges into an array than
to use a real type (even if it's intermediate and otherwise useless).

> Hm, I guess. I'm sill no huge fan of RANGEINPUT, but if we prevent
> it from being used as a column type and from being used as an argument
> type, then I guess it's workable...
>
> Btw, what happened to the idea of making RANGE(...) a special syntactic
> construct instead of a normal function call? Did we discard that for its
> intrusiveness, or were there other reasons?

It has not been discarded; as far as I'm concerned it's still on the
table. The main advantage is that it doesn't require an intermediate
type, and that requiring a cast (or some specification of the range
type) might be a little more natural. The downside is that, well, it's
new syntax, and there's a little inertia there.

But if it's actually better, we should do it. If an intermediate type
seems to be problematic, or if people think it's strange to require
casting, then I think this is reasonable.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-05 14:06:24
Message-ID: CA+TgmoburoG8WdJGe5VXn-miZ-Lz1RsE6aA0rcUcWV3vj8fYNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 1, 2011 at 2:09 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Thu, 2011-06-30 at 12:28 +0200, Florian Pflug wrote:
>> Well, arrays are containers, and we need two values to construct a range,
>
> What about empty ranges? What about infinite ranges?
>
> It seems quite a bit more awkward to shoehorn ranges into an array than
> to use a real type (even if it's intermediate and otherwise useless).
>
>> Hm, I guess. I'm sill no huge fan of RANGEINPUT, but if we prevent
>> it from being used as a column type and from being used as an argument
>> type, then I guess it's workable...
>>
>> Btw, what happened to the idea of making RANGE(...) a special syntactic
>> construct instead of a normal function call? Did we discard that for its
>> intrusiveness, or were there other reasons?
>
> It has not been discarded; as far as I'm concerned it's still on the
> table. The main advantage is that it doesn't require an intermediate
> type, and that requiring a cast (or some specification of the range
> type) might be a little more natural. The downside is that, well, it's
> new syntax, and there's a little inertia there.
>
> But if it's actually better, we should do it. If an intermediate type
> seems to be problematic, or if people think it's strange to require
> casting, then I think this is reasonable.

I don't understand how the bespoke syntax avoids the need for a cast?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-05 15:11:02
Message-ID: 1309878662.3012.21.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-07-05 at 10:06 -0400, Robert Haas wrote:
> > But if it's actually better, we should do it. If an intermediate type
> > seems to be problematic, or if people think it's strange to require
> > casting, then I think this is reasonable.
>
> I don't understand how the bespoke syntax avoids the need for a cast?

It doesn't, it just avoids the need for an intermediate type.

What I meant was that it might be strange to require a cast on the
result of a function call, because we don't really do that anywhere
else. Florian pointed out that it's common to require casting the
ARRAY[] constructor, so that has more of a precedent. I'm not really
sure how much that matters.

I'm OK with the intermediate type, but Florian seems skeptical of that
idea.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-05 15:26:12
Message-ID: CA+Tgmoa93w=4cv3tHHZexprUFQyF1xs5WZdtJz_mEACBDCbbcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 5, 2011 at 11:11 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, 2011-07-05 at 10:06 -0400, Robert Haas wrote:
>> > But if it's actually better, we should do it. If an intermediate type
>> > seems to be problematic, or if people think it's strange to require
>> > casting, then I think this is reasonable.
>>
>> I don't understand how the bespoke syntax avoids the need for a cast?
>
> It doesn't, it just avoids the need for an intermediate type.
>
> What I meant was that it might be strange to require a cast on the
> result of a function call, because we don't really do that anywhere
> else. Florian pointed out that it's common to require casting the
> ARRAY[] constructor, so that has more of a precedent. I'm not really
> sure how much that matters.
>
> I'm OK with the intermediate type, but Florian seems skeptical of that
> idea.

How about the idea of creating a family of four constructor functions
for each new range type? The functions would be named after the range
type, with "_cc", "_co", "_oc", and "_oo" appended. So, then, instead
of writing:

RANGE(1,8,'c','o')::int8range

...or somesuch, you could just say:

int8range_co(1,8)

...which is both more compact and less ugly, IMHO, and seems to
circumvent all the type system problems as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-05 15:48:58
Message-ID: 1FE22405-4DBA-408A-A785-31C405F36F54@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul5, 2011, at 17:11 , Jeff Davis wrote:
> I'm OK with the intermediate type, but Florian seems skeptical of that
> idea.

I'm starting to get used to it, though ;-) I do now believe that it can
be made safe against accidental miss-use, it seem that I was overly
anxious there.

What I still don't like about it is that it feels like a workaround for
a feature missing in our type system - the possibility of having function
with a polymorphic return type, but no polymorphic arguments. I feel
somewhat strongly about this, because it bit me when I tried to implement
record_getfield() and record_setfield() to get and set a record's field
based on it's name.

However, placing the burden of solving that onto the range type patch
doesn't seem fair.

Plus, I've realized now that a RANGEINPUT type would allow us to easily
support some things that otherwise seem hard. We could, for example,
make the cast from RANGEINPUT to the individual range types an assignment
cast (or even implicit), thereby removing the need for an explicit
cast in a lot of common cases like insert into a table with a range column.

best regards,
Florian Pflug


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-05 16:23:22
Message-ID: CAHyXU0zw6x2a_FqwQc80a_StwrjVTkyxbqDDhMX8MpzjcUce_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 5, 2011 at 10:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Jul 5, 2011 at 11:11 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> On Tue, 2011-07-05 at 10:06 -0400, Robert Haas wrote:
>>> > But if it's actually better, we should do it. If an intermediate type
>>> > seems to be problematic, or if people think it's strange to require
>>> > casting, then I think this is reasonable.
>>>
>>> I don't understand how the bespoke syntax avoids the need for a cast?
>>
>> It doesn't, it just avoids the need for an intermediate type.
>>
>> What I meant was that it might be strange to require a cast on the
>> result of a function call, because we don't really do that anywhere
>> else. Florian pointed out that it's common to require casting the
>> ARRAY[] constructor, so that has more of a precedent. I'm not really
>> sure how much that matters.
>>
>> I'm OK with the intermediate type, but Florian seems skeptical of that
>> idea.
>
> How about the idea of creating a family of four constructor functions
> for each new range type?  The functions would be named after the range
> type, with "_cc", "_co", "_oc", and "_oo" appended.  So, then, instead
> of writing:
>
> RANGE(1,8,'c','o')::int8range
>
> ...or somesuch, you could just say:
>
> int8range_co(1,8)
>
> ...which is both more compact and less ugly, IMHO, and seems to
> circumvent all the type system problems as well.

+1 on this (so you wouldn't even then directly cast to a range?)

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-05 16:24:00
Message-ID: CA+TgmoY3PPCsw6T_VD0t_NgUw7Vjw4b3Mmno-Z=FFf2g9T8rJw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 5, 2011 at 12:23 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Tue, Jul 5, 2011 at 10:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Tue, Jul 5, 2011 at 11:11 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>>> On Tue, 2011-07-05 at 10:06 -0400, Robert Haas wrote:
>>>> > But if it's actually better, we should do it. If an intermediate type
>>>> > seems to be problematic, or if people think it's strange to require
>>>> > casting, then I think this is reasonable.
>>>>
>>>> I don't understand how the bespoke syntax avoids the need for a cast?
>>>
>>> It doesn't, it just avoids the need for an intermediate type.
>>>
>>> What I meant was that it might be strange to require a cast on the
>>> result of a function call, because we don't really do that anywhere
>>> else. Florian pointed out that it's common to require casting the
>>> ARRAY[] constructor, so that has more of a precedent. I'm not really
>>> sure how much that matters.
>>>
>>> I'm OK with the intermediate type, but Florian seems skeptical of that
>>> idea.
>>
>> How about the idea of creating a family of four constructor functions
>> for each new range type?  The functions would be named after the range
>> type, with "_cc", "_co", "_oc", and "_oo" appended.  So, then, instead
>> of writing:
>>
>> RANGE(1,8,'c','o')::int8range
>>
>> ...or somesuch, you could just say:
>>
>> int8range_co(1,8)
>>
>> ...which is both more compact and less ugly, IMHO, and seems to
>> circumvent all the type system problems as well.
>
> +1 on this (so you wouldn't even then directly cast to a range?)

You wouldn't need to, because these functions would be declared to
return the range type.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-05 16:54:34
Message-ID: 1309884874.3012.66.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-07-05 at 11:26 -0400, Robert Haas wrote:
> How about the idea of creating a family of four constructor functions
> for each new range type? The functions would be named after the range
> type, with "_cc", "_co", "_oc", and "_oo" appended. So, then, instead
> of writing:
>
> RANGE(1,8,'c','o')::int8range

It would be something like: range_co(1,8)::int8range

(just so we're comparing apples to apples)

The intermediate type proposal doesn't require that we move the "c" and
"o" into the parameter list.

> int8range_co(1,8)
>
> ...which is both more compact and less ugly, IMHO, and seems to
> circumvent all the type system problems as well.

I brought that up before:
http://archives.postgresql.org/pgsql-hackers/2011-06/msg02046.php

It certainly circumvents the polymorphic type problems, but the problem
is that it adds up to quite a few permutations. Not only are there
cc/co/oc/oo, but there are also variations for infinite bounds and empty
ranges. So I think we're talking 10+ functions per range type rather
than 4.

Also, if someone has an idea for another constructor, like the one you
mention above:
range(1,8,'c','o')
then they have to create it for every range type, and they can't
anticipate new range types that someone might create. In other words,
the constructors wouldn't benefit from the polymorphism. However, if we
used an intermediate type, then they could create the above constructor
and it would work for any range type automatically.

I don't object to this idea, but we'll need to come up with a pretty
exhaustive list of possibly-useful constructors.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-05 17:06:22
Message-ID: CA+TgmoYwO5oZm62i__ChkdA0CCYiHcwm9cBDDgZ9jeQYkbuheQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 5, 2011 at 12:54 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, 2011-07-05 at 11:26 -0400, Robert Haas wrote:
>> How about the idea of creating a family of four constructor functions
>> for each new range type?  The functions would be named after the range
>> type, with "_cc", "_co", "_oc", and "_oo" appended.  So, then, instead
>> of writing:
>>
>> RANGE(1,8,'c','o')::int8range
>
> It would be something like: range_co(1,8)::int8range
>
> (just so we're comparing apples to apples)
>
> The intermediate type proposal doesn't require that we move the "c" and
> "o" into the parameter list.

Well, you have to specify the bounds somewhere...

>> int8range_co(1,8)
>>
>> ...which is both more compact and less ugly, IMHO, and seems to
>> circumvent all the type system problems as well.
>
> I brought that up before:
> http://archives.postgresql.org/pgsql-hackers/2011-06/msg02046.php
>
> It certainly circumvents the polymorphic type problems, but the problem
> is that it adds up to quite a few permutations. Not only are there
> cc/co/oc/oo, but there are also variations for infinite bounds and empty
> ranges. So I think we're talking 10+ functions per range type rather
> than 4.

OK, so let's pass the information on the bounds as a separate
argument. Like this:

int8range(1,8,'co')

Then you can instead pass 'o' for open or 'i' for infinity (passing
NULL for the corresponding argument position in that case). The third
argument can be optional and default to 'cc'.

For empty ranges I doubt we need a separate constructor function;
presumably the representation of an empty range is some fixed string
and users can just write 'empty'::int8range or similar.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-06 05:19:45
Message-ID: 1309929585.3012.97.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-07-05 at 13:06 -0400, Robert Haas wrote:
> On Tue, Jul 5, 2011 at 12:54 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > It would be something like: range_co(1,8)::int8range
> >
> > (just so we're comparing apples to apples)
> >
> > The intermediate type proposal doesn't require that we move the "c" and
> > "o" into the parameter list.
>
> Well, you have to specify the bounds somewhere...

That's true. In my example it's in the function name.

> OK, so let's pass the information on the bounds as a separate
> argument. Like this:
>
> int8range(1,8,'co')

That has a lot going for it, in the sense that it avoids dealing with
the type problems.

> Then you can instead pass 'o' for open or 'i' for infinity (passing
> NULL for the corresponding argument position in that case). The third
> argument can be optional and default to 'cc'.

The fact that there can be a default for the third argument makes this
quite a lot more appealing than I had originally thought (although I
think 'co' is the generally-accepted default).

There's some slight ugliness around the NULL/infinity business, but I
think that I could be convinced. I'd like to avoid confusion between
NULL and infinity if possible.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-06 13:10:02
Message-ID: CA+TgmoZ=fGup0MSdJmc-oNT=RUjbzuFYHV-EFGmh0jDOx7O+cQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 6, 2011 at 1:19 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Tue, 2011-07-05 at 13:06 -0400, Robert Haas wrote:
>> On Tue, Jul 5, 2011 at 12:54 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> > It would be something like: range_co(1,8)::int8range
>> >
>> > (just so we're comparing apples to apples)
>> >
>> > The intermediate type proposal doesn't require that we move the "c" and
>> > "o" into the parameter list.
>>
>> Well, you have to specify the bounds somewhere...
>
> That's true. In my example it's in the function name.
>
>> OK, so let's pass the information on the bounds as a separate
>> argument.  Like this:
>>
>> int8range(1,8,'co')
>
> That has a lot going for it, in the sense that it avoids dealing with
> the type problems.
>
>> Then you can instead pass 'o' for open or 'i' for infinity (passing
>> NULL for the corresponding argument position in that case).  The third
>> argument can be optional and default to 'cc'.
>
> The fact that there can be a default for the third argument makes this
> quite a lot more appealing than I had originally thought (although I
> think 'co' is the generally-accepted default).
>
> There's some slight ugliness around the NULL/infinity business, but I
> think that I could be convinced. I'd like to avoid confusion between
> NULL and infinity if possible.

I was thinking that if you passed 'i' for one of the bounds, it would
ignore the supplied argument and substitute its special infinity
value. But you'd still need to supply some argument in that position,
which could be NULL or anything else. It doesn't really seem worth
having additional constructor functions to handle the case where one
or both arguments are infinite.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-06 16:22:13
Message-ID: 1309969333.3012.114.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-07-06 at 09:10 -0400, Robert Haas wrote:
> > There's some slight ugliness around the NULL/infinity business, but I
> > think that I could be convinced. I'd like to avoid confusion between
> > NULL and infinity if possible.
>
> I was thinking that if you passed 'i' for one of the bounds, it would
> ignore the supplied argument and substitute its special infinity
> value. But you'd still need to supply some argument in that position,
> which could be NULL or anything else. It doesn't really seem worth
> having additional constructor functions to handle the case where one
> or both arguments are infinite.

Right, that's what I assumed that you meant. I can't think of anything
better, either, because I like the fact that two arguments are there so
that you can visually see which sides are bounded/unbounded.

I suppose we could have constructors like:
range(text, subtype)
and
range(subtype, text)
where the text field is used to specify "infinity". But that has the
obvious problem "what if the subtype is text?". So, of course, we make a
special new pseudotype to represent infinity... ;)

But seriously, your idea is starting to look more appealing.

To get into some more details: how exactly would this constructor be
generated on the fly? Clearly we want only one underlying C function
that accepts something like:
range_internal(lower, upper, flags, Oid rangetype)
So how do we get the rangetype in there? I suppose a default 4th
argument?

That would be kind of an interesting option, but what if someone
actually specified that 4th argument? We couldn't allow that.

Also, are default arguments always applied in all the contexts where
this function might be called?

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-06 16:51:50
Message-ID: CA+Tgmob4AtC-W60570a3eE4it2gJhkchTShODwSe7gj=1TsYRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 6, 2011 at 12:22 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> To get into some more details: how exactly would this constructor be
> generated on the fly? Clearly we want only one underlying C function
> that accepts something like:
>  range_internal(lower, upper, flags, Oid rangetype)
> So how do we get the rangetype in there?

I think that the C function could call get_call_result_type() and get
the return type OID back via the second argument.

> Also, are default arguments always applied in all the contexts where
> this function might be called?

Uh, I'm not sure. But I don't see why it would need different
handling than any other function which takes default arguments. It
shouldn't be needed during bootstrapping or anything funky like that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-06 17:04:00
Message-ID: 1309971840.3012.124.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-07-06 at 12:51 -0400, Robert Haas wrote:
> On Wed, Jul 6, 2011 at 12:22 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> > To get into some more details: how exactly would this constructor be
> > generated on the fly? Clearly we want only one underlying C function
> > that accepts something like:
> > range_internal(lower, upper, flags, Oid rangetype)
> > So how do we get the rangetype in there?
>
> I think that the C function could call get_call_result_type() and get
> the return type OID back via the second argument.

I'm also a little unclear on the rules for when that might be set
properly or not.

I ran into problems with that before... I think with the I/O functions.
I don't think that's a problem here, but I thought I'd ask.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-06 19:14:30
Message-ID: 25069.1309979670@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Wed, 2011-07-06 at 12:51 -0400, Robert Haas wrote:
>> On Wed, Jul 6, 2011 at 12:22 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>>> To get into some more details: how exactly would this constructor be
>>> generated on the fly? Clearly we want only one underlying C function
>>> that accepts something like:
>>> range_internal(lower, upper, flags, Oid rangetype)
>>> So how do we get the rangetype in there?

>> I think that the C function could call get_call_result_type() and get
>> the return type OID back via the second argument.

> I'm also a little unclear on the rules for when that might be set
> properly or not.

> I ran into problems with that before... I think with the I/O functions.
> I don't think that's a problem here, but I thought I'd ask.

I think it'd probably be all right to do that. The places where you
might find shortcuts being taken are where functions are called directly
by C code, such as I/O function calls --- but these constructors should
only ever get invoked from SQL queries, no?

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types, constructors, and the type system
Date: 2011-07-07 03:08:58
Message-ID: 1310008138.3012.144.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-07-06 at 15:14 -0400, Tom Lane wrote:
> > I ran into problems with that before... I think with the I/O functions.
> > I don't think that's a problem here, but I thought I'd ask.
>
> I think it'd probably be all right to do that. The places where you
> might find shortcuts being taken are where functions are called directly
> by C code, such as I/O function calls --- but these constructors should
> only ever get invoked from SQL queries, no?

Perhaps index expressions/predicates as well (which are also fine). I
was more worried about some case that I hadn't thought of.

Regards,
Jeff Davis