Re: Range Types and extensions

Lists: pgsql-hackers
From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Range Types and extensions
Date: 2011-06-05 18:59:08
Message-ID: 1307300348.2402.127.camel@jdavis

I'd like to take another look at Range Types and whether part of it
should be an extension. Some of these issues relate to extensions in
general, not just range types.

First of all, what are the advantages to being in core?

1. ANYRANGE + CREATE TYPE ... AS RANGE
--------------------------------------
This is the most compelling, in my opinion. People can define new range
functions and new range types independently and each one gets the
benefit of the other automatically. Without this, there will be an
explosion of functions and a bunch of inconsistencies like functions
that support most range types but not all (merely because the function
author didn't know that the type existed).

In the several talks that I've given, a common question is related to
"multiranges" (ranges with holes). These get a little complex, and I
don't have a complete answer. However, multiranges can be approximated
with ordered arrays of non-overlapping, non-adjacent ranges. If someone
wants to take it upon themselves to develop a set of operators here,
that would be great -- but without ANYRANGE the operators would be
unmanageable.
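
The following minimal Python sketch shows the normalization step that such a multirange operator set would need: it merges a list of ranges into an ordered array of non-overlapping, non-adjacent ranges. For illustration only, it assumes half-open integer ranges stored as (lower, upper) tuples. normalize_multirange and multirange_contains are hypothetical names, not part of any PostgreSQL API.

```python
from typing import List, Tuple

# Hypothetical representation: a half-open pair (lower, upper) meaning
# lower <= x < upper. Half-open bounds make adjacency easy to detect:
# (1, 3) and (3, 4) touch because the first ends where the second starts.
Range = Tuple[int, int]


def normalize_multirange(ranges: List[Range]) -> List[Range]:
    """Return an ordered list of non-overlapping, non-adjacent ranges
    covering exactly the same points as the input."""
    # Drop empty ranges, then sort by lower bound.
    items = sorted(r for r in ranges if r[0] < r[1])
    result: List[Range] = []
    for lo, hi in items:
        if result and lo <= result[-1][1]:
            # Overlaps or touches the previous range: merge the two.
            prev_lo, prev_hi = result[-1]
            result[-1] = (prev_lo, max(prev_hi, hi))
        else:
            result.append((lo, hi))
    return result


def multirange_contains(mr: List[Range], x: int) -> bool:
    """Point containment test for a normalized multirange."""
    return any(lo <= x < hi for lo, hi in mr)
```

For example, normalizing [(5, 9), (1, 3), (3, 4), (20, 25), (7, 12)] yields [(1, 4), (5, 12), (20, 25)]. The integer 4 remains a hole between the first two ranges.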

2. Documentation and Tests
--------------------------
Let's say we take a minimalist view, and only have ANYRANGE and CREATE
TYPE ... AS RANGE in core; and leave the rest as an extension.

What exactly would the documentation say? I think it would be even more
hypothetical and abstract than the documentation for Exclusion
Constraints. So, there is a certain documentation advantage to having at
least enough functionality to allow someone to try out the feature.

And the tests for such a minimalist feature would be a significant
challenge -- what do we do there? Get pg_regress to load the extension
from PGXN?

3. Quality
----------
PostgreSQL has a great reputation for quality, and for good reason. But
extensions don't follow the same quality-control standards; and even if
some do, there is no visible stamp of approval. So, to ask someone to
use an extension means that they have to evaluate the quality for
themselves, which is a pretty high barrier.

Since PGXN (thanks David Wheeler) and EXTENSIONs (thanks Dmitri) solve
many of the other issues, quality control is one of the biggest ones
remaining. I still get questions about when the temporal type will be
"in core", and I think this is why.

I don't think this is a good excuse to put it in core though. We need to
solve this problem, and the best way to start is by getting
well-reviewed, high-quality extensions out there.

4. Future work -- RANGE KEY, RANGE FOREIGN KEY, RANGE MERGE JOIN, etc.
---------------------------------
There are a few aspects of range types that aren't in the first patch,
but are fairly obvious follow-up additions. These will require some
knowledge about ranges in the backend, like finding the "overlaps"
operator for a range. The current patch provides this knowledge by
providing a built-in overlaps operator for ANYRANGE. This would be a
non-issue if we had a good type interface system (that works on
polymorphic types) -- we could just have a built-in "range" interface,
and the range extension could add "&&" as the range interface's overlaps
operator for the type ANYRANGE.
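
A single overlaps operator defined once for ANYRANGE can serve every range type because the test needs only the subtype's comparison operators plus the two inclusivity flags. The Python sketch below illustrates this. It is not the patch's C code; it assumes both ranges are non-empty and finite, and Range and overlaps are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Range:
    # Hypothetical bounded range over any ordered subtype.
    lower: Any
    upper: Any
    lower_inc: bool = True   # '[' if True, '(' if False
    upper_inc: bool = False  # ']' if True, ')' if False


def _starts_before_end_of(a: Range, b: Range) -> bool:
    """True if a's lower bound does not lie past b's upper bound."""
    if a.lower < b.upper:
        return True
    if a.lower == b.upper:
        # Touching endpoints share a point only if both sides include it.
        return a.lower_inc and b.upper_inc
    return False


def overlaps(a: Range, b: Range) -> bool:
    """Sketch of an '&&'-style test: do the two ranges share a point?"""
    return _starts_before_end_of(a, b) and _starts_before_end_of(b, a)
```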

=================================

So, where on this spectrum should range types fall? I think the most
minimalist would be to only support #1 (and the necessary type IO
functions); and leave all other functions, operators, and opclasses to
an extension. That has a lot of appeal, but I don't think we can ignore
the challenges above.

On the other hand, trying to make it a complete feature in core has
challenges as well. For instance, even with Range Types, Exclusion
Constraints aren't practical out-of-the-box unless we also have
BTree-GiST in core. So there's a snowball effect.

There might also be some middle ground, where it's like the minimalist
approach, but with a few very basic constructors and accessors. That
would at least make it easier to test, but then to be actually useful
(with index support, operators, fancy functions, etc.) you'd need the
extension.

Thoughts?

Regards,
Jeff Davis


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 04:51:57
Message-ID: 4DEC5CED.6030105@darrenduncan.net
Lists: pgsql-hackers

Jeff Davis wrote:
> I'd like to take another look at Range Types and whether part of it
> should be an extension. Some of these issues relate to extensions in
> general, not just range types.
>
> First of all, what are the advantages to being in core?

I believe that ranges aka intervals are widely useful generic types, next after
relations/tuples/arrays, and they *should* be supported in core, same as arrays are.

In particular, the usefulness of ranges/intervals is often orthogonal to many
other things, and they apply to many types, including numbers, strings, and temporals.

Now assuming that a range/interval value is generally defined in terms of a pair
of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or
{<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value is
capable of distinguishing open and closed intervals.

For example, a range value can be represented by a tuple with 4 attributes,
where two of those are the endpoint values, and two of those are booleans saying
whether each of the endpoints is inside or outside the range/interval.

Also, if Postgres has some concept of type-generic special values -Inf and +Inf
(which always sort before or after any other value in the type system), those
can be used as endpoints to indicate that the interval is unbounded.
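
The Python sketch below models the four-attribute representation with unbounded endpoints. It uses None as a stand-in for -Inf/+Inf, an assumption made here to avoid needing a type-generic infinity value. Interval is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Interval:
    # Two endpoint values plus two flags saying whether each endpoint is
    # inside the interval. None means unbounded (-Inf for lower, +Inf for
    # upper); the matching flag is then ignored.
    lower: Optional[Any]
    upper: Optional[Any]
    lower_inc: bool
    upper_inc: bool

    def contains(self, x: Any) -> bool:
        if self.lower is not None:
            if x < self.lower or (x == self.lower and not self.lower_inc):
                return False
        if self.upper is not None:
            if x > self.upper or (x == self.upper and not self.upper_inc):
                return False
        return True
```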

Unless you have some other syntax in mind, I suggest lifting the range literal
syntax from Perl 6, where ".." is an infix operator building a range between its
arguments, and a "^" on either side means that side is open, I think; so there
are 4 variants: {..,^..,..^,^..^}.
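
The four Perl 6 variants map directly onto the two inclusivity flags. The Python sketch below parses integer literals written in that syntax. The literal grammar and parse_range_literal are assumptions made for illustration. Because "." is not valid in a PostgreSQL operator name, this shows only the mapping, not something PostgreSQL could accept as-is.

```python
import re
from typing import Tuple

# Each Perl 6-style infix operator -> (lower_inclusive, upper_inclusive).
# A '^' marks the open (excluded) side.
_OPERATORS = {
    "..": (True, True),
    "^..": (False, True),
    "..^": (True, False),
    "^..^": (False, False),
}

# Hypothetical literal grammar: <int> <op> <int>, spaces optional.
_LITERAL = re.compile(r"^\s*(-?\d+)\s*(\^?\.\.\^?)\s*(-?\d+)\s*$")


def parse_range_literal(text: str) -> Tuple[int, int, bool, bool]:
    """Parse e.g. '1..^10' into (lower, upper, lower_inc, upper_inc)."""
    m = _LITERAL.match(text)
    if m is None:
        raise ValueError(f"not a range literal: {text!r}")
    lower, op, upper = int(m.group(1)), m.group(2), int(m.group(3))
    lower_inc, upper_inc = _OPERATORS[op]
    return lower, upper, lower_inc, upper_inc
```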

Now as to general usefulness of intervals ...

Any operation that wants to deal with a range somehow, such as the BETWEEN
syntax, could instead use a range/interval; for example, both of:

foo in 1..10

foo between 1 and 10

... would mean the same thing, but the 1..10 can be replaced by an arbitrary
value expression or variable reference.

Likewise with:

date in start ..^ end

date >= start and date < end

... mean the same thing.

The LIMIT clause could take a range to specify take and skip count at once.

Array slicing can be done using foo[first..last] or such.

A random number generator that takes endpoints can take a range argument.
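
For integer subtypes, the inclusivity flags translate directly into which endpoints can be drawn. The Python sketch below illustrates this; random_in_int_range is a hypothetical helper. Floating-point subtypes would need more care and are not covered here.

```python
import random
from typing import Optional


def random_in_int_range(lower: int, upper: int, lower_inc: bool = True,
                        upper_inc: bool = False,
                        rng: Optional[random.Random] = None) -> int:
    """Pick a uniformly random integer from a range whose flags state
    explicitly whether each endpoint is a possible result."""
    if rng is None:
        rng = random.Random()
    lo = lower if lower_inc else lower + 1
    hi = upper if upper_inc else upper - 1
    if lo > hi:
        raise ValueError("range contains no integers")
    # random.randint includes both of its arguments.
    return rng.randint(lo, hi)
```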

An array or relation of these ranges can represent ranges with holes, and the
general results of range union operations.

-- Darren Duncan


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Darren Duncan <darren(at)darrenduncan(dot)net>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 04:56:47
Message-ID: BANLkTi=cuV66mWyi9EoHnmUXZRmiVRqXig@mail.gmail.com
Lists: pgsql-hackers

2011/6/6 Darren Duncan <darren(at)darrenduncan(dot)net>:
> Jeff Davis wrote:
>>
>> I'd like to take another look at Range Types and whether part of it
>> should be an extension. Some of these issues relate to extensions in
>> general, not just range types.
>>
>> First of all, what are the advantages to being in core?

it should be supported by the FOREACH statement in PL/pgSQL

Pavel


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 05:21:20
Message-ID: 4DEC63D0.1050304@darrenduncan.net
Lists: pgsql-hackers

Pavel Stehule wrote:
> 2011/6/6 Darren Duncan <darren(at)darrenduncan(dot)net>:
>> Jeff Davis wrote:
>>> I'd like to take another look at Range Types and whether part of it
>>> should be an extension. Some of these issues relate to extensions in
>>> general, not just range types.
>>>
>>> First of all, what are the advantages to being in core?
>
> it should be supported by the FOREACH statement in PL/pgSQL

Yes, absolutely. I know this feature is loved in Perl. But this usage would
only work for a more limited range of data types, namely those over which one
can build a sequence generator, such as integers, because they have a
next-value/prev-value function defined. In other words, while range types in
general would work for any ordered type, FOREACH would only work for the subset
of those that are ordinal types. -- Darren Duncan
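
The Python sketch below illustrates the distinction: iterating a range is well defined only when the caller can supply the subtype's true successor function. iterate_discrete is a hypothetical helper, and it takes the successor as an explicit argument rather than looking it up from the type.

```python
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def iterate_discrete(lower: T, upper: T, succ: Callable[[T], T],
                     lower_inc: bool = True,
                     upper_inc: bool = False) -> Iterator[T]:
    """Yield every value of a discrete (ordinal) range in order.

    Only meaningful when `succ` is the real next-value function of the
    subtype, e.g. n + 1 for integers or date + 1 day for dates.
    """
    x = lower if lower_inc else succ(lower)
    while x < upper or (upper_inc and x == upper):
        yield x
        x = succ(x)
```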


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 12:42:51
Message-ID: 8739jnjflg.fsf@hi-media-techno.com
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> I'd like to take another look at Range Types and whether part of it
> should be an extension. Some of these issues relate to extensions in
> general, not just range types.

That's a good question :)

I think the way things are going to be organised now is that we will
have core-blessed extensions: don't mix the mechanism and the policy.

> non-issue if we had a good type interface system (that works on
> polymorphic types) -- we could just have a built-in "range" interface,
> and the range extension could add "&&" as the range interface's overlaps
> operator for the type ANYRANGE.

That means that this is, IMHO, the right approach. Have core support
that enables user defined RANGE types with indexing and planner support,
etc, like we have OPERATOR CLASS and FAMILY and all the jazz.

And the useful stuff you need to have to benefit from that core support
would be an extension. It could be a core maintained extension, and it
could even get installed by default, so that all the users would need to
do is 'CREATE EXTENSION timeranges;', for example.

So, I see us getting those different kinds of extensions in the future:

a. core extensions, shipped by default
b. contribs, not shipped by default, maintained by core hackers
c. examples, included in the source code only, maintained as contribs
d. “trusted network” of extensions (pgdg, pgxn, debian, privates, etc)
e. external independent extensions, just as now

The other main difference between a core extension and a contrib will be
where it's documented. Greg Smith proposed a patch wherein he moved
some contribs to a new extension/ directory, and had them auto-installed.

I think the consensus is to instead add a new chapter (maybe between
current chapters 9. Functions and Operators and 10. Type Conversion) and
host “core extensions” docs there. The source code organisation is
controversial because it is not technically necessary. We have to keep the
work Greg did to keep those contribs shipped by default. Oh, and that
is on the 9.1 Open Items, right?

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Darren Duncan <darren(at)darrenduncan(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 16:18:14
Message-ID: 1307377094.2402.140.camel@jdavis
Lists: pgsql-hackers

On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote:
> 2011/6/6 Darren Duncan <darren(at)darrenduncan(dot)net>:
> > Jeff Davis wrote:
> >>
> >> I'd like to take another look at Range Types and whether part of it
> >> should be an extension. Some of these issues relate to extensions in
> >> general, not just range types.
> >>
> >> First of all, what are the advantages to being in core?
>
> it should be supported by the FOREACH statement in PL/pgSQL

Oh, good idea. It would only work for discrete ranges though.

However, I would need to somehow reintroduce the concept of "next",
which has some hazards to it (as Tom pointed out, we don't want someone
to define the "next" for a float to be "+1.0"). I'll have to think about
this.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Darren Duncan <darren(at)darrenduncan(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 16:26:46
Message-ID: 1307377606.2402.146.camel@jdavis
Lists: pgsql-hackers

On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:
> Jeff Davis wrote:
> > I'd like to take another look at Range Types and whether part of it
> > should be an extension. Some of these issues relate to extensions in
> > general, not just range types.
> >
> > First of all, what are the advantages to being in core?
>
> I believe that ranges aka intervals are widely useful generic types, next after
> relations/tuples/arrays, and they *should* be supported in core, same as arrays are.

I think we all agree that ranges are important. I am not suggesting that
we sacrifice on the semantics to make it an extension; I'm just trying
to see if involving extensions for some of the approximately 5000 lines
would be a good idea.

> Now assuming that a range/interval value is generally defined in terms of a pair
> of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or
> {<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value is
> capable of distinguishing open and closed intervals.

Right, it already does that explicitly. I'd appreciate your input on
some of the previous discussion though.

> Also, if Postgres has some concept of type-generic special values -Inf and +Inf
> (which always sort before or after any other value in the type system), those
> can be used as endpoints to indicate that the interval is unbounded.

I already introduced +/- infinity to range types. They are not generic
outside of ranges, however -- therefore you can't select the upper bound
of an upper-infinite range.

> Unless you have some other syntax in mind, I suggest lifting the range literal
> syntax from Perl 6, where ".." is an infix operator building a range between its
> arguments, and a "^" on either side means that side is open, I think; so there
> are 4 variants: {..,^..,..^,^..^}.

Oh, interesting syntax. That might make a good operator version of a
constructor. Unfortunately, "." is not valid in an operator name in PG.
Maybe I can use tilde or dash?

> Any operation that wants to deal with a range somehow, such as the BETWEEN
> syntax, could instead use a range/interval; for example, both of:
>
> foo in 1..10

I don't know if it's reasonable to introduce syntax like "in" here.
Maybe we could just still use "between" and it would recognize that the
RHS is a range?

> The LIMIT clause could take a range to specify take and skip count at once.

Interesting idea.

> Array slicing can be done using foo[first..last] or such.

I like that, but we already have foo[3:7], so it might be better not to
introduce redundancy. Too bad I can't use ":" as an operator.

> A random number generator that takes endpoints can take a range argument.

Sounds useful because it would make it more explicit whether the
endpoints are possible results.

> An array or relation of these ranges can represent ranges with holes, and the
> general results of range union operations.

Right, that's been brought up before as well. In particular, Scott
Bailey has done some thinking/writing on this topic.

Regards,
Jeff Davis


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Darren Duncan <darren(at)darrenduncan(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 16:28:05
Message-ID: BANLkTimKR_=yaN8vLWocoK5j4FYZenL-zQ@mail.gmail.com
Lists: pgsql-hackers

2011/6/6 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Mon, 2011-06-06 at 06:56 +0200, Pavel Stehule wrote:
>> 2011/6/6 Darren Duncan <darren(at)darrenduncan(dot)net>:
>> > Jeff Davis wrote:
>> >>
>> >> I'd like to take another look at Range Types and whether part of it
>> >> should be an extension. Some of these issues relate to extensions in
>> >> general, not just range types.
>> >>
>> >> First of all, what are the advantages to being in core?
>>
>> it should be supported by the FOREACH statement in PL/pgSQL
>
> Oh, good idea. It would only work for discrete ranges though.
>
> However, I would need to somehow reintroduce the concept of "next",
> which has some hazards to it (as Tom pointed out, we don't want someone
> to define the "next" for a float to be "+1.0"). I'll have to think about
> this.

we can define a step

FOREACH x IN RANGE ..... BY ....
LOOP
END LOOP
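
The Python sketch below is a rough analogue of that loop, assuming the caller supplies the step. Because the step is explicit, no "next" function has to be defined for the subtype, which sidesteps the float hazard mentioned above. The same code works for dates with a timedelta step. foreach_by is a hypothetical name, and this is not PL/pgSQL syntax or semantics.

```python
from datetime import date, timedelta
from typing import Any, Iterator


def foreach_by(lower: Any, upper: Any, step: Any,
               upper_inc: bool = False) -> Iterator[Any]:
    """Yield lower, lower + step, ... while still inside the range.

    Only + and the comparison operators of the subtype are used, so the
    caller's step decides the granularity (0.25 for floats, one day for
    dates, and so on).
    """
    # Guard against a non-positive step, which would loop forever.
    if not lower + step > lower:
        raise ValueError("step must be positive")
    x = lower
    while x < upper or (upper_inc and x == upper):
        yield x
        x = x + step


if __name__ == "__main__":
    # Prints 2011-06-01, 2011-06-02 and 2011-06-03, one per line.
    for d in foreach_by(date(2011, 6, 1), date(2011, 6, 4), timedelta(days=1)):
        print(d)
```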

Regards

Pavel

>
> Regards,
>        Jeff Davis
>
>


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 16:45:06
Message-ID: 1307378706.2402.160.camel@jdavis
Lists: pgsql-hackers

On Mon, 2011-06-06 at 14:42 +0200, Dimitri Fontaine wrote:
> I think the way things are going to be organised now is that we will
> have core-blessed extensions: don't mix the mechanism and the policy.

I like that idea.

> > non-issue if we had a good type interface system (that works on
> > polymorphic types) -- we could just have a built-in "range" interface,
> > and the range extension could add "&&" as the range interface's overlaps
> > operator for the type ANYRANGE.
>
> That means that this is, IMHO, the right approach. Have core support
> that enables user defined RANGE types with indexing and planner support,
> etc, like we have OPERATOR CLASS and FAMILY and all the jazz.

If we take the minimal approach, the index support would be the first to
be moved to an extension. In order to have index support in core, we
need quite a few functions and a significant amount of code.

Minimal would be:
* CREATE TYPE ... AS RANGE
* ANYRANGE
* The IO functions
* Possibly the constructors and accessors ( range(),
range_oc(), lower(), upper(), etc.)

Regarding the type interfaces, the only thing that really worries me
there is that my future work will depend on them existing, and I haven't
really thought through the details. For instance, it just occurred to me
recently that it would need to support polymorphic types, which might be
a little bit more complicated than a simple lookup.

I suppose it's easier to put a few functions in core later if we get
stuck than to rip them out.

> And the useful stuff you need to have to benefit from that core support
> would be an extension. It could be a core maintained extension, and it
> could even get installed by default, so that all the users would need to
> do is 'CREATE EXTENSION timeranges;', for example.

Sounds good to me. However, would the extension be available in
pg_regress? If not, I will need to include those constructors/accessors
to be able to test anything.

> I think the consensus is to instead add a new chapter (maybe between
> current chapters 9. Functions and Operators and 10. Type Conversion) and
> host “core extensions” docs there. The source code organisation is
> controversial because it is not technically necessary. We have to keep the
> work Greg did to keep those contribs shipped by default. Oh, and that
> is on the 9.1 Open Items, right?

OK, so there are still a few things to be decided around documentation
and tests. Both of those things can take a significant amount of time to
rework, so I think I'll leave it alone until we have more of a
consensus.

We still have time before 9.2 to break some of the code out into an
extension when we do have the doc/test issues resolved.

Regards,
Jeff Davis


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 16:45:38
Message-ID: BANLkTikdXYLr+_giAFXJjpsLg9sTVrpc7Q@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 5, 2011 at 6:59 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> There might also be some middle ground, where it's like the minimalist
> approach, but with a few very basic constructors and accessors. That
> would at least make it easier to test, but then to be actually useful
> (with index support, operators, fancy functions, etc.) you'd need the
> extension.
>
> Thoughts?

I can see merit to having parts of RANGE implemented in core, along
with some of the "usage parts" implemented as extensions, so that if
I'm not actually using (say) INET ranges, then the database isn't
cluttered up with all the functions and operators for INET ranges.

How to slice it apart into an appropriate admixture of core and
extensions is a good question, though it seems pretty likely that
having an extension for each data type that is to be mixed into a
range is a reasonable way to go.

I think this also can make some would-be arguments against RANGE go away...

"I hate that this RANGE extension means we have to draw 5000 lines of
code into every database, and draws in 275 operator functions"
evaporates if the "base" part is much smaller, and if you only
draw in all the functions and operators when you request loading of each
of the 17 extensions.

Per-type extensions offer a pretty natural partitioning of the code
for each type, which seems pretty good.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Darren Duncan <darren(at)darrenduncan(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 16:47:38
Message-ID: 1307378858.2402.161.camel@jdavis
Lists: pgsql-hackers

On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote:
> we can define a step
>
> FOREACH x IN RANGE ..... BY ....

That wouldn't need any of the range infrastructure at all -- it would be
purely syntactic, right?

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 16:53:20
Message-ID: 1307379200.2402.165.camel@jdavis
Lists: pgsql-hackers

On Mon, 2011-06-06 at 16:45 +0000, Christopher Browne wrote:
> How to slice it apart into an appropriate admixture of core and
> extensions is a good question, though it seems pretty likely that
> having an extension for each data type that is to be mixed into a
> range is a reasonable way to go.

...

> Per-type extensions offer a pretty natural partitioning of the code
> for each type, which seems pretty good.

Ideally, most range types can be created with a simple:

CREATE TYPE foorange AS RANGE (subtype=foo);

There might be a few subtype-specific functions, like the canonical
function, but overall it should be a small amount of code per range.
However, I'd say just bundle a bunch of rangetypes together in one
extension. There's not really much cost -- if you are using one range
type, you'll use a few more.
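
The Python sketch below shows how small such a subtype-specific function can be. It models what a canonical function does for an integer subtype: it rewrites every range into one bound form, so that ranges covering the same integers compare equal. The half-open target form and the name int_canonical are choices made for this sketch; it does not mirror the C signature PostgreSQL uses. Adding 1 is safe only because integers are discrete, which is exactly the "next" hazard discussed earlier for floats.

```python
from typing import Tuple

# (lower, upper, lower_inc, upper_inc)
Bounds = Tuple[int, int, bool, bool]


def int_canonical(lower: int, upper: int,
                  lower_inc: bool, upper_inc: bool) -> Bounds:
    """Rewrite an integer range into the half-open '[lower, upper)' form,
    so that equal sets of integers get identical representations."""
    if not lower_inc:
        lower += 1   # (1, ...  becomes  [2, ...
    if upper_inc:
        upper += 1   # ..., 5]  becomes  ..., 6)
    return lower, upper, True, False
```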

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 16:53:49
Message-ID: BANLkTinTTie+aezHve5VLKP+MqiA74iPpQ@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 5, 2011 at 2:59 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> So, where on this spectrum should range types fall? I think the most
> minimalist would be to only support #1 (and the necessary type IO
> functions); and leave all other functions, operators, and opclasses to
> an extension. That has a lot of appeal, but I don't think we can ignore
> the challenges above.
>
> On the other hand, trying to make it a complete feature in core has
> challenges as well. For instance, even with Range Types, Exclusion
> Constraints aren't practical out-of-the-box unless we also have
> BTree-GiST in core. So there's a snowball effect.
>
> There might also be some middle ground, where it's like the minimalist
> approach, but with a few very basic constructors and accessors. That
> would at least make it easier to test, but then to be actually useful
> (with index support, operators, fancy functions, etc.) you'd need the
> extension.

I don't have a clear feeling on this question in general, but if we're
going to break this up into pieces, it's important that they be
logical pieces. Putting half the feature in core and half into an
extension just because we can will complicate code
maintenance to no good end. The snowball effect is something to
avoid, and we need to watch out for that, but if the upshot of putting
part of it in core is that the core code can no longer be understood
or maintained because it depends heavily on a bunch of non-core code,
that's not helpful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Darren Duncan <darren(at)darrenduncan(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 17:13:40
Message-ID: BANLkTikpyXzq4pfROgyk1WjpMBO7m4Gccw@mail.gmail.com
Lists: pgsql-hackers

2011/6/6 Jeff Davis <pgsql(at)j-davis(dot)com>:
> On Mon, 2011-06-06 at 18:28 +0200, Pavel Stehule wrote:
>> we can define a step
>>
>> FOREACH x IN RANGE ..... BY ....
>
> That wouldn't need any of the range infrastructure at all -- it would be
> purely syntactic, right?
>

I don't think so. For a lot of types, the specification of a step is
necessary, e.g. for date and float.

Pavel

> Regards,
>        Jeff Davis
>
>


From: "Ross J(dot) Reedstrom" <reedstrm(at)rice(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 18:03:06
Message-ID: 20110606180306.GC23930@rice.edu
Lists: pgsql-hackers

On Mon, Jun 06, 2011 at 12:53:49PM -0400, Robert Haas wrote:
>
> I don't have a clear feeling on this question in general, but if we're
> going to break this up into pieces, it's important that they be
> logical pieces. Putting half the feature in core and half into an
> extension just because we can will complicate code
> maintenance to no good end. The snowball effect is something to
> avoid, and we need to watch out for that, but if the upshot of putting
> part of it in core is that the core code can no longer be understood
> or maintained because it depends heavily on a bunch of non-core code,
> that's not helpful.
>
And concretely, code paths that cannot be exercised easily from
core-only code will not get regression tested, and will therefore rot.

Ross
--
Ross Reedstrom, Ph.D. reedstrm(at)rice(dot)edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 18:27:32
Message-ID: 20110606182732.GN18128@tamriel.snowman.net
Lists: pgsql-hackers

* Dimitri Fontaine (dimitri(at)2ndQuadrant(dot)fr) wrote:
> That means that this is, IMHO, the right approach. Have core support
> that enables user defined RANGE types with indexing and planner support,
> etc, like we have OPERATOR CLASS and FAMILY and all the jazz.

Yes, we do, however..

> And the useful stuff you need to have to benefit from that core support
> would be an extension. It could be a core maintained extension, and it
> could even get installed by default, so that all the users would need to
> do is 'CREATE EXTENSION timeranges;', for example.

I don't like the idea of having a capability which is not utilized
in core. We should make it so extensions can *also* have access to
define their own, but we should have the basics covered in core.

> a. core extensions, shipped by default

Having it as a core extension might work, but I'm not really 'sold' on
it.

Thanks,

Stephen


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 18:34:02
Message-ID: BANLkTimR5rcNauYzgzQiJ=jWio2CW9TkcA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 5, 2011 at 1:59 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> In the several talks that I've given, a common question is related to
> "multiranges" (ranges with holes). These get a little complex, and I
> don't have a complete answer. However, multiranges can be approximated
> with ordered arrays of non-overlapping, non-adjacent ranges. If someone
> wants to take it upon themselves to develop a set of operators here,
> that would be great -- but without ANYRANGE the operators would be
> unmanageable.
>
> 2. Documentation and Tests
> --------------------------
> Let's say we take a minimalist view, and only have ANYRANGE and CREATE
> TYPE ... AS RANGE in core; and leave the rest as an extension.
>
> What exactly would the documentation say? I think it would be even more
> hypothetical and abstract than the documentation for Exclusion
> Constraints. So, there is a certain documentation advantage to having at
> least enough functionality to allow someone to try out the feature.
>
> And the tests for such a minimalist feature would be a significant
> challenge -- what do we do there? Get pg_regress to load the extension
> from PGXN?
>
>
> 3. Quality
> ----------
> PostgreSQL has a great reputation for quality, and for good reason. But
> extensions don't follow the same quality-control standards; and even if
> some do, there is no visible stamp of approval. So, to ask someone to
> use an extension means that they have to evaluate the quality for
> themselves, which is a pretty high barrier.
>
> Since PGXN (thanks David Wheeler) and EXTENSIONs (thanks Dmitri) solve
> many of the other issues, quality control is one of the biggest ones
> remaining. I still get questions about when the temporal type will be
> "in core", and I think this is why.
>
> I don't think this is a good excuse to put it in core though. We need to
> solve this problem, and the best way to start is by getting
> well-reviewed, high-quality extensions out there.
>
>
> 4. Future work -- RANGE KEY, RANGE FOREIGN KEY, RANGE MERGE JOIN, etc.
> ---------------------------------
> There are a few aspects of range types that aren't in the first patch,
> but are fairly obvious follow-up additions. These will require some
> knowledge about ranges in the backend, like finding the "overlaps"
> operator for a range. The current patch provides this knowledge by
> providing a built-in overlaps operator for ANYRANGE. This would be a
> non-issue if we had a good type interface system (that works on
> polymorphic types) -- we could just have a built-in "range" interface,
> and the range extension could add "&&" as the range interface's overlaps
> operator for the type ANYRANGE.
>
> =================================
>
> So, where on this spectrum should range types fall? I think the most
> minimalist would be to only support #1 (and the necessary type IO
> functions); and leave all other functions, operators, and opclasses to
> an extension. That has a lot of appeal, but I don't think we can ignore
> the challenges above.
>
> On the other hand, trying to make it a complete feature in core has
> challenges as well. For instance, even with Range Types, Exclusion
> Constraints aren't practical out-of-the-box unless we also have
> BTree-GiST in core. So there's a snowball effect.
>
> There might also be some middle ground, where it's like the minimalist
> approach, but with a few very basic constructors and accessors. That
> would at least make it easier to test, but then to be actually useful
> (with index support, operators, fancy functions, etc.) you'd need the
> extension.
>
> Thoughts?

ISTM (I haven't followed all the lead up so apologies if this is
already covered) a range is a 3rd pseudo 'container' type (the other
two being composites and arrays). Do you see:

*) being able to make arrays of ranges/ranges of arrays?
*) range of composites?

I vote for at minimum the type itself and ANYRANGE to be in core.
From there you could make it like arrays where the range type is
automatically generated for each POD type. I would consider that for
sure on basis of simplicity in user-land unless all the extra types
and operators are a performance hit.

A clean and highly usable implementation in the type system in the
spirit of arrays would be fantastic. I'm particularly interested in
hypothetical constructor/destructor and in/out mechanics...an 'unnest'
like function, a range(a,b,c) that does as row(a,b,c) does, etc,
especially if you can work it out so that everything is not hammered
through textual processing.

merlin


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 19:18:36
Message-ID: m2oc2aepkj.fsf@2ndQuadrant.fr
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> I don't like the idea of having a capability which is not utilized
> in core. We should make it so extensions can *also* have access to
> define their own, but we should have the basics covered in core.

Well if another part of core depends on the feature set, then of course
you don't have a choice to make it an extension any more. I think
that's where I would draw the line.

> Having it as a core extension might work, but I'm not really 'sold' on
> it.

Well, core extension means built by default, part of default regression
tests and all. The regression test simply begins with the create
extension stanza, that's about it.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 21:42:22
Message-ID: 4DED49BE.9070902@darrenduncan.net
Lists: pgsql-hackers

Jeff Davis wrote:
> On Sun, 2011-06-05 at 21:51 -0700, Darren Duncan wrote:
>> Jeff Davis wrote:
>>> I'd like to take another look at Range Types and whether part of it
>>> should be an extension. Some of these issues relate to extensions in
>>> general, not just range types.
>>>
>>> First of all, what are the advantages to being in core?
>> I believe that ranges aka intervals are widely useful generic types, next after
>> relations/tuples/arrays, and they *should* be supported in core, same as arrays are.
>
> I think we all agree that ranges are important. I am not suggesting that
> we sacrifice on the semantics to make it an extension; I'm just trying
> to see if involving extensions for some of the approximately 5000 lines
> would be a good idea.

Generally speaking, the best way to go about this is to define the *generic*
data type in the core, and leave most operators to extensions. So, in core, we
need to have the way to select a range value over ANYTYPE either completely as a
value literal or in terms of endpoint values from arbitrary expressions or
variables, store the range value in a database, retrieve it, and access its
component attributes (endpoints, open/closed) in user-defined constraint and
operator definitions.

The fundamental value of ranges is the fact that they're a concise way to store
and express an interval over an ordered type, and to either compare such
intervals or test whether individual values or sets of values are in intervals.
And people do that a *lot* (such as with dates), so I think having this range
type, which is generic and orthogonal to other types in the same way as arrays
or tables are, in core makes the most sense; and, as previously illustrated,
ranges are useful in places one might not always think about.

Ranges are also much more flexible than BETWEEN for what it does, because AFAIK
you can't indicate open or closed with BETWEEN.
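As a rough illustration of the BETWEEN point (a Python sketch, not PostgreSQL code; the `Interval` class, its field names, and `between` are hypothetical), a range value can carry an inclusive/exclusive flag on each bound, whereas BETWEEN always corresponds to both flags being inclusive. Implementing `__contains__` also shows the set-membership style `x in r` test. Only the element type's ordering operators are used, so the same code works for integers, floats, and strings.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Interval:
    """Hypothetical range value: two bounds plus an open/closed flag for each."""
    lower: Any
    upper: Any
    lower_inc: bool = True   # '[' when True, '(' when False
    upper_inc: bool = False  # ']' when True, ')' when False

    def __contains__(self, x: Any) -> bool:
        # Only '<' and '<=' of the element type are needed.
        above = self.lower <= x if self.lower_inc else self.lower < x
        below = x <= self.upper if self.upper_inc else x < self.upper
        return above and below

    def __str__(self) -> str:
        lo = "[" if self.lower_inc else "("
        hi = "]" if self.upper_inc else ")"
        return f"{lo}{self.lower}, {self.upper}{hi}"


def between(x: Any, lo: Any, hi: Any) -> bool:
    """SQL-style BETWEEN: always closed at both ends."""
    return lo <= x <= hi


print(between(10, 1, 10))                      # True
print(10 in Interval(1, 10))                   # False, since it is [1, 10)
print(10 in Interval(1, 10, upper_inc=True))   # True, since it is [1, 10]
print(str(Interval(1, 10, lower_inc=False)))   # (1, 10)
print(1.0 in Interval(0.5, 1.5), "apricot" in Interval("apple", "banana"))  # True True
```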

You should not need to define separate range types or operators for each ordered
type, just as you don't have to for arrays; and where such functionality is
defined should be similar: whatever functionality for arrays you do or don't
define in core, do the corresponding things for ranges.

>> Now assuming that a range/interval value is generally defined in terms of a pair
>> of endpoints of some ordered type (that is, a type for which ORDER BY or RANK or
>> {<,>,<=,>=} etc or LIMIT makes sense), it will be essential that this value is
>> capable of distinguishing open and closed intervals.
>
> Right, it already does that explicitly. I'd appreciate your input on
> some of the previous discussion though.

On this note, here's a *big* thing that needs discussion ...

Citing this whole FOREACH talk, we need to recognize that this talk about ranges
is actually being overloaded for 2 very distinct concepts, which are probably
best dealt with separately, possibly as distinct types.

This discussion came up in the development of Perl 6 too, and that discussion is
probably worth looking into.

Ranges/intervals in the general sense can *not* be used to enumerate a list of
values in a standard type-sensical manner, such as FOREACH requires.
Ranges/intervals are about *comparison*, meaning combinations of tests of how 2
arbitrary values of an ordered type sort relative to each other, and that's it.
This usage works for integers, other numbers, strings, dates, and so on, all
in a natural manner.

Value enumeration, such as in a FOREACH, is a *separate* concept.

The comparison and enumeration tasks have distinct sets of operators and are
used in distinct contexts. Enumeration requires next/prev-value operators,
while ranges/intervals in general do not. Enumeration requires discrete types
(or the faking of such) like integers while ranges work for continuous types.

Moreover, in practice, one probably wants enumerations to be more flexible than
just monotonic increases. With enumerations you'd probably want to go top-down
or bottom-up, and you might want to increase geometrically or by some other
formula rather than incrementally.

I totally agree with sharing syntax and using ranges/intervals to define
sequence generators, but a range value should be considered immutable like a
number or string while a sequence generator may mutate.
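The comparison-versus-enumeration distinction above can be sketched in Python, under the assumption that a "sequence generator" is simply a start value, a step function, and a stopping predicate (`in_interval` and `sequence` are hypothetical names, not proposed PostgreSQL functions). Membership needs only ordering, so it works for floats, which have no natural next value; enumeration needs a caller-supplied step, can run upward, downward, or geometrically, and produces a generator whose state changes as it is consumed, unlike an immutable interval value.

```python
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def in_interval(x, lower, upper) -> bool:
    """Comparison only: a half-open [lower, upper) test using '<=' and '<'."""
    return lower <= x < upper


def sequence(start: T, step: Callable[[T], T],
             keep_going: Callable[[T], bool]) -> Iterator[T]:
    """Enumeration: requires a step ("next value") function from the caller."""
    x = start
    while keep_going(x):
        yield x
        x = step(x)


print(in_interval(0.25, 0.0, 1.0))                            # True
print(list(sequence(1, lambda x: x + 1, lambda x: x < 5)))    # [1, 2, 3, 4]
print(list(sequence(1, lambda x: x * 2, lambda x: x <= 16)))  # [1, 2, 4, 8, 16]
print(list(sequence(10, lambda x: x - 3, lambda x: x > 0)))   # [10, 7, 4, 1]

# A generator is stateful: consuming it changes what it yields next.
gen = sequence(1, lambda x: x + 1, lambda x: x < 4)
print(next(gen))   # 1
print(list(gen))   # [2, 3]
print(list(gen))   # []
```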

For syntax, one could use "x..y" to define an interval and "x...y" to define a
sequence generator; that's what Perl 6 does.

See also http://perlcabal.org/syn/S03.html#Range_and_RangeIter_semantics that
talks about how Perl 6 does ranges.

>> Also, if Postgres has some concept of type-generic special values -Inf and +Inf
>> (which always sort before or after any other value in the type system), those
>> can be used as endpoints to indicate that the interval is unbounded.
>
> I already introduced +/- infinity to range types. They are not generic
> outside of ranges, however -- therefore you can't select the upper bound
> of an upper-infinite range.

Well, what you have is the least one would want.

>> Unless you have some other syntax in mind, I suggest lifting the range literal
>> syntax from Perl 6, where ".." is an infix operator building a range between its
>> arguments, and a "^" on either side means that side is open, I think; so there
>> are 4 variants: {..,^..,..^,^..^}.
>
> Oh, interesting syntax. That might make a good operator version of a
> constructor. Unfortunately, "." is not valid in an operator name in PG.
> Maybe I can use tilde or dash?

Can Pg be changed to support "." in operator names as long as they don't just
appear by themselves? What would this break to do so?

>> Any operation that wants to deal with a range somehow, such as the BETWEEN
>> syntax, could instead use a range/interval; for example, both of:
>>
>> foo in 1..10
>
> I don't know if it's reasonable to introduce syntax like "in" here.
> Maybe we could just still use "between" and it would recognize that the
> RHS is a range?

I believe it is quite reasonable to treat ranges like sets, in an abstract
sense, and so using set membership syntax like "in" is valid. Same as one
should be able to use "in" to test whether a value is in an array. I would
expect "in" to be a polymorphic infix operator same as "<" or "=" etc are,
aren't they? This shouldn't conflict with testing tuples in relations as they
are different types, same as you can use the same "<" for numbers and strings,
can't you?

We could add parentheses if that helps:

foo in (1..10)

>> The LIMIT clause could take a range to specify take and skip count at once.
>
> Interesting idea.
>
>> Array slicing can be done using foo[first..last] or such.
>
> I like that, but we already have foo[3:7], so it might be better not to
> introduce redundancy. Too bad I can't use ":" as an operator.

On that note, some languages use ":" for defining intervals rather than "..".

Some languages also use round parentheses or curly braces to define intervals,
but I really don't like that and we shouldn't use it.

>> A random number generator that takes endpoints can take a range argument.
>
> Sounds useful because it would make it more explicit whether the
> endpoints are possible results.

Exactly.

>> An array or relation of these range can represent ranges with holes, and the
>> general results of range union operations.
>
> Right, that's been brought up before as well. In particular, Scott
> Bailey has done some thinking/writing on this topic.

I also see these as considerably less important and useful in practice than the
continuous intervals. Facilities for discontinuous intervals could more easily
be left to extensions than those for continuous ones. I see the continuous as
more fundamental, at least in the same manner as seeing integers as more
fundamental than rationals (you can define the latter with the former), though
one could define things in the opposite manner too.

> Regards,
> Jeff Davis

-- Darren Duncan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-06 23:23:55
Message-ID: 15340.1307402635@sss.pgh.pa.us
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> I vote for at minimum the type itself and ANYRANGE to be in core.
> From there you could make it like arrays where the range type is
> automatically generated for each POD type. I would consider that for
> sure on basis of simplicity in user-land unless all the extra types
> and operators are a performance hit.

Auto-generation of range types isn't going to happen, simply because the
range type needs more information than is provided by the base type
declaration. (First, you need a btree opclass, and second, you need a
"next" function if it's a discrete type.)

By my count there are only about 20 datatypes in core for which it looks
sensible to provide a range type (ie, it's a non-deprecated,
non-composite type with a standard default btree opclass). For that
many, we might as well just build 'em in.

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-07 14:57:19
Message-ID: BANLkTikGwZXnJ=d-WkwWFdwM00cN6QZkwg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 6, 2011 at 6:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>> I vote for at minimum the type itself and ANYRANGE to be in core.
>> From there you could make it like arrays where the range type is
>> automatically generated for each POD type.  I would consider that for
>> sure on basis of simplicity in user-land unless all the extra types
>> and operators are a performance hit.
>
> Auto-generation of range types isn't going to happen, simply because the
> range type needs more information than is provided by the base type
> declaration.  (First, you need a btree opclass, and second, you need a
> "next" function if it's a discrete type.)
>
> By my count there are only about 20 datatypes in core for which it looks
> sensible to provide a range type (ie, it's a non-deprecated,
> non-composite type with a standard default btree opclass).  For that
> many, we might as well just build 'em in.

right. hm -- can you have multiple range type definitions for a
particular type? I was thinking about a type reduction for casting
like we have for arrays: select '[1,3)'::int{}. but maybe that isn't
specific enough?

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-07 15:15:38
Message-ID: 13756.1307459738@sss.pgh.pa.us
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Mon, Jun 6, 2011 at 6:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> By my count there are only about 20 datatypes in core for which it looks
>> sensible to provide a range type (ie, it's a non-deprecated,
>> non-composite type with a standard default btree opclass). For that
>> many, we might as well just build 'em in.

> right. hm -- can you have multiple range type definitions for a
> particular type?

In principle, sure, if the type has multiple useful sort orderings.
I don't immediately see any core types for which we'd bother. (In
particular I don't see a use case for range types corresponding to
the *_pattern_ops btree opclasses, especially now that COLLATE "C"
has rendered them sorta obsolete.)

BTW, Jeff, have you worked out the implications of collations for
textual range types? I confess to not having paid much attention
to range types lately.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-07 17:20:49
Message-ID: 1307467249.2402.175.camel@jdavis
Lists: pgsql-hackers

On Tue, 2011-06-07 at 11:15 -0400, Tom Lane wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> > right. hm -- can you have multiple range type definitions for a
> > particular type?
>
> In principle, sure, if the type has multiple useful sort orderings.

Right. Additionally, you might want to use different "canonical"
functions for the same subtype.

> I don't immediately see any core types for which we'd bother.

Agreed.

> BTW, Jeff, have you worked out the implications of collations for
> textual range types?

Well, "it seems to work" is about as far as I've gotten.

As far as the implications, I'll need to do a little more research and
thinking. But I don't immediately see anything too worrisome.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Darren Duncan <darren(at)darrenduncan(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-07 17:31:40
Message-ID: 1307467900.2402.185.camel@jdavis
Lists: pgsql-hackers

On Mon, 2011-06-06 at 14:42 -0700, Darren Duncan wrote:
> On this note, here's a *big* thing that needs discussion ...

[ referring to the concept of "discrete" versus "continuous" ranges ]

Yes, there has been much discussion on this topic already.

The solution right now is that they both behave like continuous ranges
for most operations. But each time a value is produced, a discrete range
has a "canonicalize" function that aligns it to the proper boundaries
and chooses a convention from [], [), (], (). For discrete ranges that's
only a convention, because multiple representations are equal in value,
but that's not so for continuous ranges.

Another approach would be to offer "next" and "prev" functions instead
of "canonical", or a "plus(thetype, integer)" and "minus(thetype,
integer)".
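A minimal Python sketch of the canonicalization idea, assuming integers as the discrete subtype and `[)` as the chosen convention. `IntRange`, `successor`, and `canonicalize` are hypothetical names; this sketch happens to build the canonical function out of a successor ("next") function, which is one way the two approaches above relate. Four different spellings of the set {2, 3, 4} collapse to one representation, which is why the bracket choice is only a convention for discrete ranges; a continuous subtype such as float has no successor, so `[1.0, 2.0]` and `[1.0, 2.0)` genuinely differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class IntRange:
    """Hypothetical discrete range over int; each bound has an inclusive flag."""
    lower: int
    upper: int
    lower_inc: bool
    upper_inc: bool


def successor(v: int) -> int:
    """The "next" function for the integer subtype."""
    return v + 1


def canonicalize(r: IntRange,
                 next_value: Callable[[int], int] = successor) -> IntRange:
    """Rewrite a discrete range into the '[)' convention.

    An exclusive lower bound moves up to the next value, and an inclusive
    upper bound moves up to the next value and becomes exclusive.
    """
    lower = r.lower if r.lower_inc else next_value(r.lower)
    upper = next_value(r.upper) if r.upper_inc else r.upper
    return IntRange(lower, upper, True, False)


# Four spellings of the set {2, 3, 4}: [2,4], [2,5), (1,4], (1,5)
spellings = [
    IntRange(2, 4, True, True),
    IntRange(2, 5, True, False),
    IntRange(1, 4, False, True),
    IntRange(1, 5, False, False),
]
print({canonicalize(r) for r in spellings})
# {IntRange(lower=2, upper=5, lower_inc=True, upper_inc=False)}
```

Once every produced value passes through `canonicalize`, plain field-by-field equality is enough to compare discrete ranges.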

> Can Pg be changed to support "." in operator names as long as they don't just
> appear by themselves? What would this break to do so?

Someone else would have to comment on that. My feeling is that it might
create problems with qualified names, and also with PG's "arg.function"
call syntax.

> >> foo in 1..10

> I believe it is quite reasonable to treat ranges like sets, in an abstract
> sense, and so using set membership syntax like "in" is valid.

OK, I think I agree with this now. I'll think about it some more.

> I also see these as considerably less important and useful in practice than the
> continuous intervals.

[ multiranges ]

Agreed. I've left those alone for now, because it's a separate concept.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Darren Duncan <darren(at)darrenduncan(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-07 17:59:33
Message-ID: 16950.1307469573@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Mon, 2011-06-06 at 14:42 -0700, Darren Duncan wrote:
>> Can Pg be changed to support "." in operator names as long as they don't just
>> appear by themselves? What would this break to do so?

> Someone else would have to comment on that.

DOT_DOT is already a token in plpgsql; trying to make it be also an
operator name would break a lot of existing plpgsql code.

regards, tom lane


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-07 18:28:51
Message-ID: 4DEE6DE3.4090308@darrenduncan.net
Lists: pgsql-hackers

Jeff Davis wrote:
> On Tue, 2011-06-07 at 11:15 -0400, Tom Lane wrote:
>> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>>> right. hm -- can you have multiple range type definitions for a
>>> particular type?
>> In principle, sure, if the type has multiple useful sort orderings.
>
> Right. Additionally, you might want to use different "canonical"
> functions for the same subtype.
>
>> I don't immediately see any core types for which we'd bother.
>
> Agreed.
>
>> BTW, Jeff, have you worked out the implications of collations for
>> textual range types?
>
> Well, "it seems to work" is about as far as I've gotten.
>
> As far as the implications, I'll need to do a little more research and
> thinking. But I don't immediately see anything too worrisome.

I would expect ranges to have exactly the same semantics as ORDER BY or "<" etc
with respect to collations for textual range types.

If collation is an attribute of a textual type, meaning that the textual type or
its values have a sense of their collation built-in, then ranges for those
textual types should "just work" without any extra range-specific syntax, same
as you could say ORDER BY without any further qualifiers.

If collation is not an attribute of a textual type, meaning that you normally
have to qualify the desired collation for each order-sensitive operation using
it (even if that can be defined by a session/etc setting which still just
ultimately works at the operator rather than type level), or if a textual type
can have it built in but it is overridable per operator, then either ranges
should have an extra attribute saying what collation (or other type-specific
order-determining function) to use, or all range operators should take an
optional collation parameter, as ORDER BY does.

Personally, I think it is a more elegant programming language design for an
ordered type to have its own sense of one true canonical ordering of its
values, and where one could conceptually have multiple orderings, there would be
a separate data type for each one. That is, while you probably only need a
single type with respect to ordering for any real numeric type, for textual
types you could have a separate textual type for each collation.

In particular, I say separate type because, as far as I know, a collation can
sometimes affect which text values compare as "same".

On a tangent, I believe that various insensitive comparisons or sortings are
very reasonably expressed as collations rather than some other mechanism, e.g. if
you wanted sortings that compare different letter case as same or not, or with
or without accents as same or not.

So under this "elegant" system, there is no need to ever specify collation at
the operator level (which could become quite verbose and unwieldy), but instead
you can cast data types if you want to change their sense of canonical ordering.

Now if the various text-specific operators are polymorphic across these text
type variants, users don't generally have to know the difference except when it
matters.

On a tangent, I believe that the best definition of "equal" or "same" in a type
system is global substitutability. Ignoring implementation details, if a
program ever finds that 2 operands to the generic "=" (equality test) operator
result in TRUE, then the program should feel free to replace all occurrences of
one operand in the program with occurrences of the other, for optimization,
because generic "=" returning TRUE means one is just as good as the other. This
assumes generally that we're dealing with immutable value types.

-- Darren Duncan


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-07 18:40:02
Message-ID: 4DEE7082.6070102@darrenduncan.net
Lists: pgsql-hackers

Jeff Davis wrote:
> On Mon, 2011-06-06 at 14:42 -0700, Darren Duncan wrote:
>> Can Pg be changed to support "." in operator names as long as they don't just
>> appear by themselves? What would this break to do so?
>
> Someone else would have to comment on that. My feeling is that it might
> create problems with qualified names, and also with PG's "arg.function"
> call syntax.

With respect to qualified names or "arg.function", unless the "function" can
be symbolic, I considered your examples to fall under the "appear by
themselves" case, hence "." by itself wouldn't become a new operator; I
generally assumed here that any multi-character operator containing "." would
be symbolic.

In any event, I also saw Tom's reply about DOT_DOT being a token already.

-- Darren Duncan


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-08 15:46:47
Message-ID: 1307548007.2402.195.camel@jdavis
Lists: pgsql-hackers

On Tue, 2011-06-07 at 10:20 -0700, Jeff Davis wrote:
> > BTW, Jeff, have you worked out the implications of collations for
> > textual range types?
>
> Well, "it seems to work" is about as far as I've gotten.
>
> As far as the implications, I'll need to do a little more research and
> thinking. But I don't immediately see anything too worrisome.

I take that back :(

It looks like the type input function may be a problem, because it
doesn't look like it knows what the collation is yet. In other words,
PG_GET_COLLATION() is zero for the type input function.

But I need to do a comparison to find out if the range is valid or not.
For instance:
'[a, Z)'::textrange
is valid in "en_US" but not "C".

The range constructor:
range('a', 'Z')
is fine though.

Not sure what to do here.
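To see why the input function needs a collation at all, here is a small Python sketch of the validity check, using two sort keys as stand-ins for collations (`COLLATION_KEYS` and `bounds_are_valid` are hypothetical names). The code-point key matches what "C" does for ASCII text; the case-folding key is only a crude approximation of "en_US" ordering, not real locale rules. The same bounds pass under one key and fail under the other, so an input function that receives no collation (PG_GET_COLLATION() returning zero, as described above) has nothing to compare with.

```python
# Sort keys standing in for two collations (illustrative assumptions, not
# real locale data):
#   "C"          -> code-point order, where 'Z' (U+005A) sorts before 'a' (U+0061)
#   "en_US-like" -> case-insensitive first, code point as a tie-breaker
COLLATION_KEYS = {
    "C": lambda s: s,
    "en_US-like": lambda s: (s.casefold(), s),
}


def bounds_are_valid(lower: str, upper: str, collation: str) -> bool:
    """The check a range input function would need: lower must not sort after upper."""
    key = COLLATION_KEYS[collation]
    return key(lower) <= key(upper)


print(bounds_are_valid("a", "Z", "C"))           # False
print(bounds_are_valid("a", "Z", "en_US-like"))  # True
```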

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-08 16:04:34
Message-ID: 10907.1307549074@sss.pgh.pa.us
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> It looks like the type input function may be a problem, because it
> doesn't look like it knows what the collation is yet. In other words,
> PG_GET_COLLATION() is zero for the type input function.

Yeah, we've assumed that I/O functions do not need to know collation.

regards, tom lane


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-09 22:26:30
Message-ID: BF45665B-D5C9-4E9F-88A6-0E7FABF4550F@phlo.org
Lists: pgsql-hackers

On Jun8, 2011, at 17:46 , Jeff Davis wrote:
> It looks like the type input function may be a problem, because it
> doesn't look like it knows what the collation is yet. In other words,
> PG_GET_COLLATION() is zero for the type input function.
>
> But I need to do a comparison to find out if the range is valid or not.
> For instance:
> '[a, Z)'::textrange
> is valid in "en_US" but not "C".

Maybe that check should just be removed? If one views the range
'[L, U)' as a concise way of expressing "L <= x AND x < U" for some
x, then allowing the case L > U seems quite natural. There won't
be any such x of course, but the range is still valid, just empty.

Actually, thinking about this a bit, I believe this is the only
way text ranges can support collations. If the validity of a range
depends on the collation, then changing the collation after creation
seems weird, since it can make previously valid ranges invalid and
vice versa.

There could be a function RANGE_EMPTY() which people can put into
their CHECK constraints if they don't want such ranges to sneak
into their tables...
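A minimal Python sketch of this proposal, assuming a half-open `[L, U)` text range: the value is just a pair of strings accepted as-is, and the collation (modelled as a sort-key function) is consulted only when comparing or testing emptiness. `TextRange`, `range_empty`, `contains`, `code_point`, and `case_folded` are hypothetical names, and the two keys are crude stand-ins for real collations. The same stored value is empty under one key and non-empty under the other, which is exactly why changing a column's collation could change what a `RANGE_EMPTY()` CHECK constraint says about existing rows.

```python
from dataclasses import dataclass
from typing import Callable


def code_point(s: str):
    """Stand-in for the "C" collation: plain code-point order."""
    return s


def case_folded(s: str):
    """Crude stand-in for a locale such as en_US: fold case, then tie-break."""
    return (s.casefold(), s)


@dataclass(frozen=True)
class TextRange:
    """Hypothetical '[lower, upper)' text range stored as a plain pair.
    Input accepts any pair, so no collation is needed at input time."""
    lower: str
    upper: str


def range_empty(r: TextRange, key: Callable = code_point) -> bool:
    """Sketch of a RANGE_EMPTY() predicate: '[L, U)' holds no x with
    L <= x < U exactly when L does not sort strictly before U."""
    return not (key(r.lower) < key(r.upper))


def contains(r: TextRange, x: str, key: Callable = code_point) -> bool:
    return key(r.lower) <= key(x) < key(r.upper)


r = TextRange("a", "Z")
print(range_empty(r, code_point))   # True:  'a' (U+0061) sorts after 'Z' (U+005A)
print(range_empty(r, case_folded))  # False: 'a' sorts before 'z' once case is folded
```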

best regards,
Florian Pflug


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-12 02:37:59
Message-ID: BANLkTik3CnqYYNgEft44qTRp8nairns+rg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jun 9, 2011 at 6:26 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
> On Jun8, 2011, at 17:46 , Jeff Davis wrote:
>> It looks like the type input function may be a problem, because it
>> doesn't look like it knows what the collation is yet. In other words,
>> PG_GET_COLLATION() is zero for the type input function.
>>
>> But I need to do a comparison to find out if the range is valid or not.
>> For instance:
>>  '[a, Z)'::textrange
>> is valid in "en_US" but not "C".
>
> Maybe that check should just be removed? If one views the range
> '[L, U)' as a concise way of expressing "L <= x AND x < U" for some
> x, then allowing the case L > U seems quite natural. There won't
> be any such x of course, but the range is still valid, just empty.
>
> Actually, thinking about this a bit, I believe this is the only
> way text ranges can support collations. If the validity of a range
> depends on the collation, then changing the collation after creation
> seems weird, since it can make previously valid ranges invalid and
> vice versa.
>
> There could be a function RANGE_EMPTY() which people can put into
> their CHECK constraints if they don't want such ranges to sneak
> into their tables...

I think the collation is going to have to be baked into the type
definition, no? You can't just up and change the collation of the
column as you could for a straight text column, if that might cause
the contents of some rows to be viewed as invalid.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-12 11:53:08
Message-ID: 73795993-273D-4C15-86C2-BCDFFC255A0F@phlo.org
Lists: pgsql-hackers

On Jun12, 2011, at 04:37 , Robert Haas wrote:
> On Thu, Jun 9, 2011 at 6:26 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> On Jun8, 2011, at 17:46 , Jeff Davis wrote:
>>> It looks like the type input function may be a problem, because it
>>> doesn't look like it knows what the collation is yet. In other words,
>>> PG_GET_COLLATION() is zero for the type input function.
>>>
>>> But I need to do a comparison to find out if the range is valid or not.
>>> For instance:
>>> '[a, Z)'::textrange
>>> is valid in "en_US" but not "C".
>>
>> Maybe that check should just be removed? If one views the range
>> '[L, U)' as a concise way of expressing "L <= x AND x < U" for some
>> x, then allowing the case L > U seems quite natural. There won't
>> be any such x of course, but the range is still valid, just empty.
>>
>> Actually, thinking about this a bit, I believe this is the only
>> way text ranges can support collations. If the validity of a range
>> depends on the collation, then changing the collation after creation
>> seems weird, since it can make previously valid ranges invalid and
>> vice versa.
>>
>> There could be a function RANGE_EMPTY() which people can put into
>> their CHECK constraints if they don't want such ranges to sneak
>> into their tables...
>
> I think the collation is going to have to be baked into the type
> definition, no? You can't just up and change the collation of the
> column as you could for a straight text column, if that might cause
> the contents of some rows to be viewed as invalid.

Now you've lost me. If a text range is simply a pair of strings,
as I suggested, and collations are applied only during comparison
and RANGE_EMPTY(), why would the collation have to be baked into
the type?

If you're referring to the case
(1) Create table with text-range column and collation C1
(2) Add check constraint containing RANGE_EMPTY()
(3) Add data
(4) Alter column to have collation C2, possibly changing
the result of RANGE_EMPTY() for existing ranges.
then that points to a problem with ALTER COLUMN.
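
For illustration only, here is a minimal Python sketch of this pair-of-bounds reading (it is not PostgreSQL code). It assumes a collation can be modeled as a sort-key function. `c_key` is plain code-point order. `en_us_like_key` is a hypothetical, case-insensitive-first stand-in for en_US and does not follow the real locale rules. The stored pair never changes; only `range_empty()` consults a collation.

```python
from dataclasses import dataclass
from typing import Callable

# Assumption: a "collation" is modeled as a sort-key function.
Collation = Callable[[str], object]

def c_key(s: str) -> str:
    # "C": code-point order, so 'Z' (U+005A) sorts before 'a' (U+0061).
    return s

def en_us_like_key(s: str) -> tuple:
    # Hypothetical stand-in for en_US: case-insensitive first, case breaks ties.
    return (s.casefold(), s)

@dataclass(frozen=True)
class TextRange:
    # Pair-of-bounds reading: '[lower, upper)' just stores two strings.
    lower: str
    upper: str

def range_empty(r: TextRange, key: Collation) -> bool:
    # '[L, U)' means "L <= x AND x < U"; no such x exists unless L < U.
    return not key(r.lower) < key(r.upper)

r = TextRange("a", "Z")
print(range_empty(r, en_us_like_key))  # False: 'a' sorts before 'Z'
print(range_empty(r, c_key))           # True: 'Z' sorts before 'a'
```

Because the emptiness test depends on the key in force, a CHECK built on a function like `range_empty()` can go from true to false when the column's collation changes, even though the stored pair stays the same.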

best regards,
Florian Pflug


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-13 04:00:08
Message-ID: BANLkTimT0qecKObBXumaKTWhSJoaFjneVA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jun 12, 2011 at 7:53 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> I think the collation is going to have to be baked into the type
>> definition, no?  You can't just up and change the collation of the
>> column as you could for a straight text column, if that might cause
>> the contents of some rows to be viewed as invalid.
>
> Now you've lost me. If a text range is simply a pair of strings,
> as I suggested, and collations are applied only during comparison
> and RANGE_EMPTY(), why would the collation have to be baked into
> the type?
>
> If you're referring to the case
>  (1) Create table with text-range column and collation C1
>  (2) Add check constraint containing RANGE_EMPTY()
>  (3) Add data
>  (4) Alter column to have collation C2, possibly changing
>      the result of RANGE_EMPTY() for existing ranges.
> then that points to a problem with ALTER COLUMN.

No, I'm saying that you might have a column containing '[a, Z)', and
someone might change the collation of the column from en_US to C.
When the collation was en_US, the column could legally contain that
value, but now that the collation is C, it can't. ALTER TABLE isn't
going to recheck the validity of the data when someone changes the
collation: that's only supposed to affect the sort order, not the
definition of what is a legal value.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-13 04:47:28
Message-ID: 4DF59660.8080206@darrenduncan.net
Lists: pgsql-hackers

Robert Haas wrote:
> On Sun, Jun 12, 2011 at 7:53 AM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>>> I think the collation is going to have to be baked into the type
>>> definition, no? You can't just up and change the collation of the
>>> column as you could for a straight text column, if that might cause
>>> the contents of some rows to be viewed as invalid.
>> Now you've lost me. If a text range is simply a pair of strings,
>> as I suggested, and collations are applied only during comparison
>> and RANGE_EMPTY(), why would the collation have to be baked into
>> the type?
>>
>> If you're referring to the case
>> (1) Create table with text-range column and collation C1
>> (2) Add check constraint containing RANGE_EMPTY()
>> (3) Add data
>> (4) Alter column to have collation C2, possibly changing
>> the result of RANGE_EMPTY() for existing ranges.
>> then that points to a problem with ALTER COLUMN.
>
> No, I'm saying that you might have a column containing '[a, Z)', and
> someone might change the collation of the column from en_US to C.
> When the collation was en_US, the column could legally contain that
> value, but now that the collation is C, it can't. ALTER TABLE isn't
> going to recheck the validity of the data when someone changes the
> collation: that's only supposed to affect the sort order, not the
> definition of what is a legal value.

You can have the same collation problem even without range types.

Consider the following:
(1) Create table with the 2 text columns {L,R} and both columns have the
collation en_US.
(2) Add check constraint requiring "L <= R".
(3) Add a record with the value 'a' for L and 'Z' for R.
(4) Alter the columns to have the collation C.

Good language design principles demand that the semantics for this simplified
case and the semantics for replacing {L,R} with a single range-of-text-typed
column be the same, including what happens with CHECK and ALTER TABLE.

Likewise, anything that affects ORDER BY should affect {<,>,<=,>=} and friends
the same way and vice-versa and likewise should affect range validity.

It makes sense for collation to be considered part of text data types, and
changing collation is casting from one text type to another. Generally
speaking, any inherent or applied aspect of a text or other value (such as
collation) that affects the results of any deterministic operations on those
values (such as sorting) should be considered part of the data type of those values.

-- Darren Duncan


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Darren Duncan <darren(at)darrenduncan(dot)net>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-13 04:58:28
Message-ID: BANLkTik=9f=FGrq-GyLyA7gDhDwhDcFUpw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 13, 2011 at 12:47 AM, Darren Duncan <darren(at)darrenduncan(dot)net> wrote:
>>> If you're referring to the case
>>>  (1) Create table with text-range column and collation C1
>>>  (2) Add check constraint containing RANGE_EMPTY()
>>>  (3) Add data
>>>  (4) Alter column to have collation C2, possibly changing
>>>     the result of RANGE_EMPTY() for existing ranges.
>>> then that points to a problem with ALTER COLUMN.
>>
>> No, I'm saying that you might have a column containing  '[a, Z)', and
>> someone might change the collation of the column from en_US to C.
>> When the collation was en_US, the column could legally contain that
>> value, but now that the collation is C, it can't.  ALTER TABLE isn't
>> going to recheck the validity of the data when someone changes the
>> collation: that's only supposed to affect the sort order, not the
>> definition of what is a legal value.
>
> You can have the same collation problem even without range types.
>
> Consider the following:
>  (1) Create table with the 2 text columns {L,R} and both columns have the
> collation en_US.
>  (2) Add check constraint requiring "L <= R".
>  (3) Add a record with the value 'a' for L and 'Z' for R.
>  (4) Alter the columns to have the collation C.

Oh, good point.

rhaas=# create table sample (t text collate "en_US", check (t < 'Z'));
CREATE TABLE
rhaas=# insert into sample values ('a');
INSERT 0 1
rhaas=# alter table sample alter column t type text collate "C";
ERROR: check constraint "sample_t_check" is violated by some row

But interestingly, my Mac has a different notion of how this collation
works: it thinks 'a' > 'Z' even in en_US. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-18 08:10:01
Message-ID: 1308384601.2597.56.camel@jdavis
Lists: pgsql-hackers

On Fri, 2011-06-10 at 00:26 +0200, Florian Pflug wrote:
> Maybe that check should just be removed? If one views the range
> '[L, U)' as a concise way of expressing "L <= x AND x < U" for some
> x, then allowing the case L > U seems quite natural. There won't
> be any such x of course, but the range is still valid, just empty.

[ Please excuse the late reply, I was on vacation. ]

That's an interesting perspective, but I don't think it's a good idea.

Up to this point, I've considered a range value to be a set of
contiguous values, and the endpoints just happen to be a way to
represent that set. If changing the collation changes a set of positive
cardinality into an empty set, clearly it's a different value.

We don't want the COLLATE clause to change the value, because things
that do change the value (like a typecast) should offer the opportunity
to call a function so that you can verify that it's valid or change it
to some canonical form.

So, I believe that you are proposing to change the concept of a range
value from "a contiguous set of values" to "a pair of bounds". There are
numerous implications, one of which is that I don't think that we can
maintain the equality of all empty ranges. Consider these expressions,
where x is a non-empty range with collation "A", but is empty in
collation "B" (and "*" means "range intersection"):

(x COLLATE "B") COLLATE "A"
((x COLLATE "B") * '(-Inf, Inf)') COLLATE "A"
('-'::textrange * '(-Inf, Inf)') COLLATE "A"

All of those expressions should be equal (according to global
substitutability, as Darren mentioned). But they can't be, because the
last expression is always an empty range, whereas the first one is not
(because merely changing the collation back and forth offers no
opportunity to even notice that you have an empty range at one point).
So, I believe that we'd be stuck with non-equal empty ranges, as well as
many other possibly non-intuitive implications.
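
Here is a minimal Python sketch of this argument. None of its assumptions are taken from the actual patch: a range is stored as a pair of bounds tagged with a collation, `collate()` only relabels that tag, `intersect()` uses the left operand's collation and canonicalizes an empty result to a single `EMPTY` sentinel, and "A" and "B" are modeled by two hypothetical key functions.

```python
from dataclasses import dataclass
from typing import Callable, Optional

def key_a(s: str) -> tuple:
    # Hypothetical collation "A": case-insensitive first, so 'a' < 'Z'.
    return (s.casefold(), s)

def key_b(s: str) -> str:
    # Hypothetical collation "B": code-point order, so 'Z' < 'a'.
    return s

@dataclass(frozen=True)
class PairRange:
    lower: Optional[str]               # None means -infinity
    upper: Optional[str]               # None means +infinity (upper is exclusive)
    collation: Callable[[str], object]

EMPTY = object()  # the single canonical empty range

def collate(r, key):
    # COLLATE only relabels the range; it never compares the bounds,
    # so it has no chance to notice that the range is empty under `key`.
    return r if r is EMPTY else PairRange(r.lower, r.upper, key)

def intersect(r, s):
    # Intersection builds a new range under r's collation and canonicalizes
    # an empty result to EMPTY, as a set-producing function would.
    if r is EMPTY or s is EMPTY:
        return EMPTY
    key = r.collation
    lowers = [b for b in (r.lower, s.lower) if b is not None]
    uppers = [b for b in (r.upper, s.upper) if b is not None]
    lo = max(lowers, key=key) if lowers else None
    hi = min(uppers, key=key) if uppers else None
    if lo is not None and hi is not None and not key(lo) < key(hi):
        return EMPTY
    return PairRange(lo, hi, key)

x = PairRange("a", "Z", key_a)             # '[a, Z)': non-empty under "A"
everything = PairRange(None, None, key_b)  # '(-Inf, Inf)'

e1 = collate(collate(x, key_b), key_a)
e2 = collate(intersect(collate(x, key_b), everything), key_a)
e3 = collate(intersect(EMPTY, everything), key_a)

print(e1 == x)      # True: the round trip restores the original pair
print(e2 is EMPTY)  # True: emptiness was noticed under "B" and baked in
print(e1 == e2)     # False: "equal" expressions now disagree
```

The round trip through `collate()` never looks at the bounds, so it cannot notice that `x` was empty under "B". Any operation that does look at the bounds bakes the emptiness in, and the expressions stop being interchangeable.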

So, I lean strongly toward the interpretation that a range is a
contiguous set of values, and changing the collation should not change
the value. Things that do change the value (like a typecast) should
offer the opportunity to handle cases like this with a function call,
but changing collation does not.

This leaves making the collation a part of the range type itself (as
Robert suggested).

Comments?

Regards,
Jeff Davis


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-18 20:19:10
Message-ID: B593F1CC-C33B-4525-9130-7A56C20454C9@phlo.org
Lists: pgsql-hackers

On Jun18, 2011, at 10:10 , Jeff Davis wrote:
> On Fri, 2011-06-10 at 00:26 +0200, Florian Pflug wrote:
> So, I believe that you are proposing to change the concept of a range
> value from "a contiguous set of values" to "a pair of bounds".

Yeah, though mostly because I figured that'd make defining their
semantics easier, not necessarily because that interpretation is
better.

> There are
> numerous implications, one of which is that I don't think that we can
> maintain the equality of all empty ranges. Consider these expressions,
> where x is a non-empty range with collation "A", but is empty in
> collation "B" (and "*" means "range intersection"):
>
> (x COLLATE "B") COLLATE "A"
> ((x COLLATE "B") * '(-Inf, Inf)') COLLATE "A"
> ('-'::textrange * '(-Inf, Inf)') COLLATE "A"
>
> All of those expressions should be equal (according to global
> substitutability, as Darren mentioned). But they can't be, because the
> last expression is always an empty range, whereas the first one is not
> (because merely changing the collation back and forth offers no
> opportunity to even notice that you have an empty range at one point).
> So, I believe that we'd be stuck with non-equal empty ranges, as well as
> many other possibly non-intuitive implications.

Yeah. Once you give up the idea that a range is a set, extensionality
(i.e. the axiom "there's only one empty range", or more precisely
"there is only one range of which no object is a member") has to go too.

> So, I lean strongly toward the interpretation that a range is a
> contiguous set of values,

Yeah, I agree now, mainly because defining them as sets gives rise
to richer semantics than defining them as pairs. If someone
needs just a pair of values and maybe a BETWEEN operator, that is
easily done with CREATE TYPE and a few SQL or PL/pgSQL functions.

> and changing the collation should not change
> the value. Things that do change the value (like a typecast) should
> offer the opportunity to handle cases like this with a function call,
> but changing collation does not.
>
> This leaves making the collation a part of the range type itself (as
> Robert suggested).

Yes, that seems necessary for consistency. That leaves the question
of what to do if someone tries to modify a textrange's collation with
a COLLATE clause. For example, what's the result of
'Ä' in '[A,Z]'::textrange_german COLLATE 'C'
where 'Ä' is a German umlaut A, which sorts after 'A' but before 'B'
in locale 'de_DE' but after 'Z' in locale 'C'? (I'm assuming
that textrange_german was defined with collation 'de_DE'.)
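
Whether the answer is true or false depends only on which ordering the containment test uses. Here is a Python sketch that uses code-point order for "C" and a hypothetical `de_like_key` for German. `de_like_key` approximates German ordering by comparing with diacritics stripped first; real de_DE collation rules are more involved.

```python
import unicodedata

def c_key(s: str) -> str:
    # Code-point order: 'Ä' is U+00C4, which sorts after 'Z' (U+005A).
    return s

def de_like_key(s: str) -> tuple:
    # Hypothetical stand-in for de_DE: compare with diacritics stripped first,
    # then fall back to the original string (so 'A' < 'Ä' < 'B').
    base = "".join(ch for ch in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(ch))
    return (base, s)

def contains_closed(lower: str, upper: str, x: str, key) -> bool:
    # '[lower, upper]' contains x iff lower <= x <= upper under the given key.
    return key(lower) <= key(x) <= key(upper)

print(contains_closed("A", "Z", "Ä", de_like_key))  # True
print(contains_closed("A", "Z", "Ä", c_key))        # False
```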

With the set-based definition of ranges, the only sensible thing
is to simply ignore the COLLATE clause I think.

best regards,
Florian Pflug


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-18 22:23:00
Message-ID: 1308435780.2597.68.camel@jdavis
Lists: pgsql-hackers

On Sat, 2011-06-18 at 22:19 +0200, Florian Pflug wrote:
> Yes, that seems necessary for consistency. That leaves the question
> of what to do if someone tries to modify a textrange's collation with
> a COLLATE clause. For example, what's the result of
> 'Ä' in '[A,Z]'::textrange_german COLLATE 'C'
> where 'Ä' is a German umlaut A, which sorts after 'A' but before 'B'
> in locale 'de_DE' but after 'Z' in locale 'C'? (I'm assuming
> that textrange_german was defined with collation 'de_DE'.)
>
> With the set-based definition of ranges, the only sensible thing
> is to simply ignore the COLLATE clause I think.

I think rejecting it makes more sense, so a range would not be a
collatable type; it just happens to use collations of the subtype
internally.

Regards,
Jeff Davis


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-19 09:21:28
Message-ID: EE2E5083-83A0-4A9B-AB11-CB3BB01EA7D4@phlo.org
Lists: pgsql-hackers

On Jun19, 2011, at 00:23 , Jeff Davis wrote:
> On Sat, 2011-06-18 at 22:19 +0200, Florian Pflug wrote:
>> Yes, that seems necessary for consistency. That leaves the question
>> of what to do if someone tries to modify a textrange's collation with
>> a COLLATE clause. For example, what's the result of
>> 'Ä' in '[A,Z]'::textrange_german COLLATE 'C'
>> where 'Ä' is a German umlaut A, which sorts after 'A' but before 'B'
>> in locale 'de_DE' but after 'Z' in locale 'C'? (I'm assuming
>> that textrange_german was defined with collation 'de_DE'.)
>>
>> With the set-based definition of ranges, the only sensible thing
>> is to simply ignore the COLLATE clause I think.
>
> I think rejecting it makes more sense, so a range would not be a
> collatable type; it just happens to use collations of the subtype
> internally.

Ah, crap, I put the COLLATE in the wrong place. What I actually
had in mind was
('Ä' COLLATE 'C') in '[A,Z]'::textrange_german

I was afraid that the "in" operator cannot distinguish this case
from
field in '[A,Z]'::textrange_german
where "field" is declared with "COLLATE 'C'".

In the second case, throwing an error seems a bit harsh...

There's also this fun little case
field in '[A,Z]'
(note lack of an explicit cast). Here the input function would
probably need to verify that there's a range type corresponding
to the field's type *and* that the range type's collation matches
the field's collation. I wonder if that's possible - Tom said
somewhere that input functions don't receive collation information,
though I don't know if that restriction applies in this case.

best regards,
Florian Pflug


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-19 10:24:37
Message-ID: 20110619102436.GA3197@svana.org
Lists: pgsql-hackers

On Sun, Jun 19, 2011 at 11:21:28AM +0200, Florian Pflug wrote:
> > I think rejecting it makes more sense, so a range would not be a
> > collatable type; it just happens to use collations of the subtype
> > internally.
>
> Ah, crap, I put the COLLATE in the wrong place. What I actually
> had in mind was
> ('Ä' COLLATE 'C') in '[A,Z]'::textrange_german

Operators don't have to be collation-sensitive. If they're not, then the
COLLATE in the above statement is redundant. You can decide that an
interval needs an implicit collation and you can just use that.

> I was afraid that the "in" operator cannot distinguish this case
> from
> field in '[A,Z]'::textrange_german
> where "field" is declared with "COLLATE 'C'".

It should be able to, after all in the first case the collation is
explicit, in the latter implicit.

> There's also this fun little case
> field in '[A,Z]'
> (note lack of an explicit cast). Here the input function would
> probably need to verify that there's a range type corresponding
> to the field's type *and* that the range type's collation matches
> the field's collation. I wonder if that's possible - Tom said
> somewhere that input functions don't receive collation information,
> though I don't know if that restriction applies in this case.

Collation checking is generally done by the planner. I don't see why
the input function should check; the result of an input function is by
definition DEFAULT. It's up to the 'in' operator to check.

Note that the whole idea of collation is not really supposed to be
assigned to objects for storage. How that can be resolved I'm not sure.

Mvg,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-19 18:08:39
Message-ID: 1308506919.2597.83.camel@jdavis
Lists: pgsql-hackers

On Sun, 2011-06-19 at 12:24 +0200, Martijn van Oosterhout wrote:
> Collation checking is generally done by the planner. I don't see why
> the input function should check; the result of an input function is by
> definition DEFAULT. It's up to the 'in' operator to check.
>
> Note that the whole idea of collation is not really supposed to be
> assigned to objects for storage. How that can be resolved I'm not sure.

I think if we just say that it's a property of the range type
definition, then that's OK. It's similar to specifying a non-default
btree opclass for the range type -- it just changes which total order
the range type adheres to.

If you meant that the collation shouldn't be stored along with the value
itself, then I agree.

Regards,
Jeff Davis


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-19 19:29:31
Message-ID: 83B5F0BA-FFE3-44AB-9CC0-A29ED142411F@phlo.org
Lists: pgsql-hackers

On Jun19, 2011, at 20:08 , Jeff Davis wrote:
> On Sun, 2011-06-19 at 12:24 +0200, Martijn van Oosterhout wrote:
>> Collation checking is generally done by the planner. I don't see why
>> the input function should check; the result of an input function is by
>> definition DEFAULT. It's up to the 'in' operator to check.
>>
>> Note that the whole idea of collation is not really supposed to be
>> assigned to objects for storage. How that can be resolved I'm not sure.
>
> I think if we just say that it's a property of the range type
> definition, then that's OK. It's similar to specifying a non-default
> btree opclass for the range type -- it just changes which total order
> the range type adheres to.

In fact, it's exactly the same, because what we *actually* need to
specify is not an opclass but a comparison operator. Which is only
well-defined if you know *both* an opclass *and* a collation.

That reminds me - the conclusion there was that we cannot have
two range types with the same base type but different opclasses,
wasn't it?

AFAIR precisely because otherwise there's no sensible way to handle
'text' in '[lower,upper]'

If I'm not mistaken about this, that would imply that we also cannot
have two range types with the same base type, the same opclass,
but different collations. Which seems rather unfortunate... In fact,
if that's true, maybe restricting range types to the database collation
would be best...

best regards,
Florian Pflug


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 06:33:02
Message-ID: 1308551582.2597.119.camel@jdavis
Lists: pgsql-hackers

On Sun, 2011-06-19 at 21:29 +0200, Florian Pflug wrote:
> If I'm not mistaken about this, that would imply that we also cannot
> have two range types with the same base type, the same opclass,
> but different collations. Which seems rather unfortunate... In fact,
> if that's true, maybe restricting range types to the database collation
> would be best...

Yes, we cannot have two range types with the same base type. That is a
consequence of the polymorphic type system, which needs to be able to
determine the range type given the base type.

A workaround is to use domains. That is effective, but awkward. For
instance, given:
CREATE DOMAIN textdomain AS text;
CREATE TYPE textdomainrange AS RANGE (subtype=textdomain);
then:
'[a,z)'::textdomainrange @> 'b'::textdomain
would work, but:
'[a,z)'::textdomainrange @> 'b'
would not, which would be annoying.

I don't see a way around this. It's not a collation problem, but a
general "multiple range types with the same subtype" problem.
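
Here is a toy Python model of the lookup described above; it is not how the PostgreSQL parser is implemented. A hypothetical registry maps each subtype name to the range types declared over it. Resolving ANYRANGE from ANYELEMENT succeeds only when there is exactly one candidate, and a domain helps only because it is a distinct subtype name.

```python
class AmbiguousRangeType(Exception):
    pass

# Hypothetical registry: subtype name -> range types declared over it.
RANGE_TYPES: dict = {}

def create_range_type(name: str, subtype: str) -> None:
    RANGE_TYPES.setdefault(subtype, []).append(name)

def resolve_anyrange(anyelement_type: str) -> str:
    # A signature like (ANYELEMENT, ANYELEMENT) -> ANYRANGE must pick the
    # range type from the element type alone, so it needs exactly one candidate.
    candidates = RANGE_TYPES.get(anyelement_type, [])
    if len(candidates) != 1:
        raise AmbiguousRangeType(
            f"{len(candidates)} range types over {anyelement_type!r}")
    return candidates[0]

create_range_type("textrange", "text")
create_range_type("textdomainrange", "textdomain")  # domain: distinct subtype name
print(resolve_anyrange("text"))        # textrange
print(resolve_anyrange("textdomain"))  # textdomainrange

create_range_type("textrange_c", "text")  # a second range type over plain text
try:
    resolve_anyrange("text")
except AmbiguousRangeType as e:
    print("ambiguous:", e)  # ambiguous: 2 range types over 'text'
```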

I don't think there's much benefit in restricting it to the DB
collation. If someone really needs a different collation (or opclass,
for that matter), it might as well be allowed, even if you have to do
extra type annotations.

Regards,
Jeff Davis


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 13:19:21
Message-ID: BANLkTinaaZPiavc_VDp88Ozkdn2JvyauXw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 20, 2011 at 2:33 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Sun, 2011-06-19 at 21:29 +0200, Florian Pflug wrote:
>> If I'm not mistaken about this, that would imply that we also cannot
>> have two range types with the same base type, the same opclass,
>> but different collations. Which seems rather unfortunate... In fact,
>> if that's true, maybe restricting range types to the database collation
>> would be best...
>
> Yes, we cannot have two range types with the same base type. That is a
> consequence of the polymorphic type system, which needs to be able to
> determine the range type given the base type.

Boy, that's an unfortunate limitation. :-(

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: David Fetter <david(at)fetter(dot)org>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 13:44:52
Message-ID: 20110620134452.GD19779@fetter.org
Lists: pgsql-hackers

On Sun, Jun 19, 2011 at 11:33:02PM -0700, Jeff Davis wrote:
> On Sun, 2011-06-19 at 21:29 +0200, Florian Pflug wrote:
> > If I'm not mistaken about this, that would imply that we also
> > cannot have two range types with the same base type, the same
> > opclass, but different collations. Which seems rather
> > unfortunate... In fact, if that's true, maybe restricting range
> > types to the database collation would be best...
>
> Yes, we cannot have two range types with the same base type. That is
> a consequence of the polymorphic type system, which needs to be able
> to determine the range type given the base type.
>
> A workaround is to use domains. That is effective, but awkward. For
> instance, given:
> CREATE DOMAIN textdomain AS text;
> CREATE TYPE textdomainrange AS RANGE (subtype=textdomain);
> then:
> '[a,z)'::textdomainrange @> 'b'::textdomain
> would work, but:
> '[a,z)'::textdomainrange @> 'b'
> would not, which would be annoying.
>
> I don't see a way around this. It's not a collation problem, but a
> general "multiple range types with the same subtype" problem.

How might you address that problem, assuming you had the needed
resources to do it?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 14:01:00
Message-ID: AD4FC75D-DB99-48ED-9082-52EE3A4D74A6@phlo.org
Lists: pgsql-hackers

On Jun20, 2011, at 15:19 , Robert Haas wrote:
> On Mon, Jun 20, 2011 at 2:33 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> On Sun, 2011-06-19 at 21:29 +0200, Florian Pflug wrote:
>>> If I'm not mistaken about this, that would imply that we also cannot
>>> have two range types with the same base type, the same opclass,
>>> but different collations. Which seems rather unfortunate... In fact,
>>> if that's true, maybe restricting range types to the database collation
>>> would be best...
>>
>> Yes, we cannot have two range types with the same base type. That is a
>> consequence of the polymorphic type system, which needs to be able to
>> determine the range type given the base type.
>
> Boy, that's an unfortunate limitation. :-(

Hm, I'm starting to wonder if there isn't a way around that. It seems that
this restriction comes from the desire to allow functions with the
polymorphic signature
(ANYELEMENT, ANYELEMENT) -> ANYRANGE.

The only such function I can currently come up with is the generic
range constructor. Is having that worth the restriction to one
range type per base type?

Another option might be to extend polymorphic argument matching
to allow functions with the signature
(<non-polymorphic args>) -> <polymorphic type>
but to require the concrete output type to be specified with a cast
at the call site. For the generic range constructor, you'd then
have to write
RANGE(lower, upper)::range_type

(If we had that, we could also (finally) provide functions to
set and get fields of composite types by name. As it stands,
doing that cleanly is hard because the desired signature of
the get function, namely
(record, fieldname text) -> anyelement
is not supported.)

A third approach might be to first define a PAIR type and then
define ranges on top of that. Since PAIR types wouldn't include
comparison operators, the restriction to one PAIR type per
base type wouldn't matter. Instead of a generic RANGE constructor
you'd then use the generic PAIR constructor and cast the resulting
PAIR to whatever range you desire, i.e. write
PAIR(lower, upper)::range_type.
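
Here is a Python sketch of the second and third options; all names in it are hypothetical. `pair()` builds an unordered pair, and `cast_pair()` takes the target range type explicitly. This lets two range types over the same element type coexist, one with a case-insensitive-first order and one with code-point order. Validity is checked under the target type's ordering.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class RangeType:
    # Hypothetical range type: a name plus the total order used for its subtype.
    name: str
    key: Callable[[str], object]

@dataclass(frozen=True)
class Range:
    rtype: RangeType
    lower: str
    upper: str

def pair(lower: str, upper: str) -> tuple:
    # Generic PAIR constructor: no ordering involved, so one PAIR type
    # per element type is enough.
    return (lower, upper)

def cast_pair(p: tuple, target: RangeType) -> Range:
    # The cast names the concrete range type, so validity (lower <= upper)
    # is checked under that type's ordering, not inferred from the element type.
    lower, upper = p
    if target.key(upper) < target.key(lower):
        raise ValueError(f"lower bound exceeds upper bound for {target.name}")
    return Range(target, lower, upper)

textrange_ci = RangeType("textrange_ci", lambda s: (s.casefold(), s))
textrange_c = RangeType("textrange_c", lambda s: s)

p = pair("a", "Z")
print(cast_pair(p, textrange_ci).rtype.name)  # textrange_ci
try:
    cast_pair(p, textrange_c)
except ValueError as e:
    print(e)  # lower bound exceeds upper bound for textrange_c
```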

best regards,
Florian Pflug


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 14:17:23
Message-ID: 24831.1308579443@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Jun 20, 2011 at 2:33 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> Yes, we cannot have two range types with the same base type. That is a
>> consequence of the polymorphic type system, which needs to be able to
>> determine the range type given the base type.

> Boy, that's an unfortunate limitation. :-(

Given the need to deal with multiple collations for collatable types,
I'd say it's not so much "unfortunate" as "utterly unworkable". At
least unless you give up the notion of binding the collation into the
type definition ... which has other issues, per discussion a few days
ago. Even ignoring collations, I really think we want to allow multiple
range types for base types that have multiple btree sort orderings.

regards, tom lane


From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 15:09:28
Message-ID: BANLkTi=VOTnkb9tD71RjXo1JwUhibVHvHXXAFyzv5=yZdOfsPw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 20, 2011 at 3:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Given the need to deal with multiple collations for collatable types,
> I'd say it's not so much "unfortunate" as "utterly unworkable".  At
> least unless you give up the notion of binding the collation into the
> type definition ... which has other issues, per discussion a few days
> ago.  Even ignoring collations, I really think we want to allow multiple
> range types for base types that have multiple btree sort orderings.

I was imagining it would be not part of the type but part of the
internal data in the range type. The dumped representation would look
something like ['bar','baz','en_US'] and input forms like
['bar','baz'] would just default to the database default collation or
the session's default collation or whatever.

The most disturbing thing about this is that it would make
unrestorable dumps if any of those collation names change or are not
installed before the data is loaded. It's kind of like having your
table names embedded in a text column in your tables. It could make
things awkward to manage later.
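To make the concern concrete, here is a toy parser for the value format sketched above. The installed-collation set, the default collation, and parse_range() are hypothetical; the point is only that a value naming a collation cannot be loaded until that collation exists on the target.

```python
from typing import Set, Tuple

def parse_range(text: str, installed: Set[str], default: str) -> Tuple[str, str, str]:
    """Parse the sketched format "['lo','hi']" or "['lo','hi','collation']".
    Toy parser: values containing commas or quotes are not handled."""
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("expected [...]")
    items = [s.strip().strip("'") for s in text[1:-1].split(",")]
    if len(items) == 2:
        items.append(default)  # no collation given: use the default
    if len(items) != 3:
        raise ValueError("expected two bounds and an optional collation")
    lower, upper, collation = items
    if collation not in installed:
        # The restore hazard: a dumped value names a collation that the
        # target installation does not (yet) have.
        raise LookupError(f'collation "{collation}" does not exist')
    return lower, upper, collation
```

Loading "['bar','baz','en_US']" succeeds only where en_US is installed, and the check has to happen per value rather than once per column.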

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Florian Pflug <fgp(at)phlo(dot)org>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 15:33:06
Message-ID: 26722.1308583986@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <stark(at)mit(dot)edu> writes:
> On Mon, Jun 20, 2011 at 3:17 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Given the need to deal with multiple collations for collatable types,
>> I'd say it's not so much "unfortunate" as "utterly unworkable". At
>> least unless you give up the notion of binding the collation into the
>> type definition ... which has other issues, per discussion a few days
>> ago. Even ignoring collations, I really think we want to allow multiple
>> range types for base types that have multiple btree sort orderings.

> I was imagining it would be not part of the type but part of the
> internal data in the range type. The dumped representation would look
> something like ['bar','baz','en_US'] and input forms like
> ['bar','baz'] would just default to the database default collation or
> the session's default collation or whatever.

> The most disturbing thing about this is that it would make
> unrestorable dumps if any of those collation names change or are not
> installed before the data is loaded. It's kind of like having your
> table names embedded in a text column in your tables. It could make
> things awkward to manage later.

Yeah. In particular this would cause issues for pg_upgrade, which would
have to somehow ensure that collation OIDs didn't change between old and
new installations, which is just about impossible given the current
method for assigning them. I think we need to avoid that, really.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 16:21:12
Message-ID: 1308586872.2597.129.camel@jdavis
Lists: pgsql-hackers

On Mon, 2011-06-20 at 16:01 +0200, Florian Pflug wrote:
> Hm, I'm starting to wonder if there isn't a way around that. It seems that
> this restriction comes from the desire to allow functions with the
> polymorphic signature
> (ANYELEMENT, ANYELEMENT) -> ANYRANGE.
>
> The only such function I can currently come up with is the generic
> range constructor. Is having that worth the restriction to one
> range type per base type?

Good point.

Having constructors is obviously important, but perhaps they don't have
to be generic. We could generate catalog entries for each constructor
for each range type, and name them after the range type itself. So,
instead of:
range(1, 10)
you'd write:
int4range(1,10)

That actually might be better anyway, because relying on the polymorphic
version is not perfect now anyway. For instance, if you want an
int8range using the generic range() constructor, you need a cast.
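A sketch of generating one non-polymorphic constructor per range type at definition time, modelled in Python. CATALOG, create_range_type() and the tuple used as a range value are hypothetical stand-ins for catalog entries and real range datums.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical function catalog: constructor name -> function.
CATALOG: Dict[str, Callable[..., Tuple[str, Any, Any]]] = {}

def create_range_type(name: str, base: type, key: Callable[[Any], Any] = lambda v: v):
    """Hypothetical CREATE TYPE ... AS RANGE: besides the type itself, register
    a non-polymorphic constructor named after it, so no inference is needed."""
    def constructor(lower, upper):
        for bound in (lower, upper):
            if not isinstance(bound, base):
                raise TypeError(f"{name}() expects {base.__name__} bounds")
        # The type's own ordering (key) validates the bounds.
        if key(lower) > key(upper):
            raise ValueError(f"{name}(): lower bound must not exceed upper bound")
        return (name, lower, upper)  # stand-in for a range value
    constructor.__name__ = name
    CATALOG[name] = constructor
    return constructor

int4range = create_range_type("int4range", int)
int8range = create_range_type("int8range", int)
```

Because each constructor is a separate function with a fixed result type, int8range(1, 10) needs no cast, and two range types over the same base type no longer compete for one generic constructor.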

We'd still need to get the polymorphic type system to work the way we
want in this case. I'll look into that.

> Another option might be to extend polymorphic argument matching
> to allow functions with the signature
> (<non-polymorphic args>) -> <polymorphic type>
> but to require the concrete output type to be specified with a cast
> at the call site. For the generic range constructor, you'd then
> have to write
> RANGE(lower, upper)::range_type

Interesting idea.

> A third approach might be to first define a PAIR type and then
> define ranges on top of that. Since PAIR types wouldn't include
> comparison operators, the restriction to one PAIR type per
> base type wouldn't matter. Instead of a generic RANGE constructor
> you'd then use the generic PAIR constructor and cast the resulting
> PAIR to whatever range you desire, i.e. write
> PAIR(lower, upper)::range_type.

Another interesting idea. A little awkward though, and doesn't offer
much opportunity to specify inclusivity/exclusivity of the bounds.

Regards,
Jeff Davis


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 17:16:43
Message-ID: BANLkTi=4_D0QD1EZoLEcnFZBG9xJT-cdew@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 20, 2011 at 11:21 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Mon, 2011-06-20 at 16:01 +0200, Florian Pflug wrote:
>> Hm, I'm starting to wonder if there isn't a way around that. It seems that
>> this restriction comes from the desire to allow functions with the
>> polymorphic signature
>>   (ANYELEMENT, ANYELEMENT) -> ANYRANGE.
>>
>> The only such function I can currently come up with is the generic
>> range constructor. Is having that worth the restriction to one
>> range type per base type?
>
> Good point.
>
> Having constructors is obviously important, but perhaps they don't have
> to be generic. We could generate catalog entries for each constructor
> for each range type, and name them after the range type itself. So,
> instead of:
>  range(1, 10)
> you'd write:
>  int4range(1,10)
>
> That actually might be better anyway, because relying on the polymorphic
> version is not perfect now anyway. For instance, if you want an
> int8range using the generic range() constructor, you need a cast.
>
> We'd still need to get the polymorphic type system to work the way we
> want in this case. I'll look into that.

hm, what if there *was* only one range type per base type, but in the
various contexts where specific ordering and collation was important
you could optionally pass them in? Meaning, the specific ordering was
not bound rigidly to the type, but to the operation?

merlin


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 17:33:19
Message-ID: A98D8DC3-4368-4B87-A09C-43FB3F78A424@phlo.org
Lists: pgsql-hackers

On Jun20, 2011, at 19:16 , Merlin Moncure wrote:
> On Mon, Jun 20, 2011 at 11:21 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> hm, what if there *was* only one range type per base type, but in the
> various contexts where specific ordering and collation was important
> you could optionally pass them in? Meaning, the specific ordering was
> not bound rigidly to the type, but to the operation?

I suggested that previously here
http://archives.postgresql.org/pgsql-hackers/2011-06/msg00846.php

In the ensuing discussion, however, it became clear that by doing so
range types become little more than a pair of values. More specifically,
a range then *doesn't* represent a set of values, because whether or
not a value is "in" the range depends on a specific sort order.
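A minimal illustration of why membership stops being well defined, assuming text compared either by code point or case-insensitively (str.casefold stands in for a second collation; in_range() is a hypothetical helper):

```python
def in_range(value: str, lower: str, upper: str, key=lambda v: v) -> bool:
    """Inclusive membership test; `key` stands in for whichever sort order
    (opclass/collation) is applied at the time of the test."""
    return key(lower) <= key(value) <= key(upper)

# The same bounds and the same value give two different answers:
print(in_range("B", "a", "c"))                    # False: 'B' sorts before 'a'
print(in_range("B", "a", "c", key=str.casefold))  # True: 'b' lies between 'a' and 'c'
```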

Actually, you'd probably even lose the possibility of having a
normalization function for discrete base types (which makes sure
that we know that "[1,2]" is the same as "[1,3)"), because it's
hard to imagine one normalization function that works sensibly for
two different orderings.

So by doing that, you effectively turn a RANGE into a quadruple
(lower type, lower_included bool, upper type, upper_included bool).
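A sketch of the normalization mentioned above, for integer ranges represented as that quadruple. IntRange and canonicalize() are hypothetical names, and the half-open [lower, upper) target form is an assumption of this sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IntRange:
    # The "quadruple": lower bound, inclusivity, upper bound, inclusivity.
    lower: int
    lower_inc: bool
    upper: int
    upper_inc: bool

def canonicalize(r: IntRange) -> IntRange:
    """Rewrite an integer range into half-open [lower, upper) form.

    This relies on "the next integer" being well defined, i.e. on the one
    natural ordering of int; under a reversed ordering the successor of 2
    would be 1 and this function would give wrong answers. Empty ranges
    are not handled in this sketch."""
    lower = r.lower if r.lower_inc else r.lower + 1
    upper = r.upper + 1 if r.upper_inc else r.upper
    return IntRange(lower, True, upper, False)

a = IntRange(1, True, 2, True)   # [1,2]
b = IntRange(1, True, 3, False)  # [1,3)
print(a == b)                              # False: the raw quadruples differ
print(canonicalize(a) == canonicalize(b))  # True: both become [1,3)
```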

best regards,
Florian Pflug


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-20 17:43:27
Message-ID: 22970.1308591807@sss.pgh.pa.us
Lists: pgsql-hackers

Florian Pflug <fgp(at)phlo(dot)org> writes:
> On Jun20, 2011, at 19:16 , Merlin Moncure wrote:
>> On Mon, Jun 20, 2011 at 11:21 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> hm, what if there *was* only one range type per base type, but in the
>> various contexts where specific ordering and collation was important
>> you could optionally pass them in? Meaning, the specific ordering was
>> not bound rigidly to the type, but to the operation?

> I suggested that previously here
> http://archives.postgresql.org/pgsql-hackers/2011-06/msg00846.php

> In the ensuing discussion, however, it became clear that by doing so
> range types become little more than a pair of values. More specifically,
> a range then *doesn't* represent a set of values, because whether or
> not a value is "in" the range depends on a specific sort order.

Yeah, that doesn't seem like the way to go. If a range value doesn't
represent a well-defined set of base-type values, we lose a lot of the
mathematical underpinnings for range operations.

So ... just how awful would it be if we hard-wired range types to always
use their base type's default btree sort ordering and the database's
default collation? In principle that sucks, but I'm not sure how wide
the use-cases actually will be for other choices.

The other viable alternative seems to be to require those two properties
(btree opclass and collation) to be part of a specific range type
definition. The complaint about that seemed to be that we couldn't
infer an ANYRANGE type given only ANYELEMENT, but could we alleviate
that by identifying one range type as the default for the base type,
and then using that one in cases where we have no ANYRANGE input?
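A rough model of that fallback, with a hypothetical catalog in which each range type records its base type and whether it is the default for that base. pick_range_type() is illustrative only, not a proposed backend function.

```python
from typing import Dict, Optional, Tuple

# Hypothetical catalog rows: range type -> (base type, default for that base?)
RANGE_TYPES: Dict[str, Tuple[str, bool]] = {
    "textrange": ("text", True),     # default btree opclass + default collation
    "textrange_c": ("text", False),  # same base type, a different collation
    "int4range": ("int4", True),
}

def pick_range_type(base: str, anyrange_arg: Optional[str] = None) -> str:
    """Resolve an ANYRANGE result. An ANYRANGE input, when present, decides;
    with only ANYELEMENT inputs, fall back to the default range type."""
    if anyrange_arg is not None:
        if RANGE_TYPES[anyrange_arg][0] != base:
            raise TypeError(f"{anyrange_arg} is not a range over {base}")
        return anyrange_arg
    defaults = [name for name, (b, is_default) in RANGE_TYPES.items()
                if b == base and is_default]
    if len(defaults) != 1:
        raise LookupError(f"no default range type over {base}")
    return defaults[0]
```

A non-default type such as textrange_c can still exist; it is reached only when an ANYRANGE input already identifies it.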

regards, tom lane


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-20 18:38:58
Message-ID: 4DFF93C2.2030500@darrenduncan.net
Lists: pgsql-hackers

Tom Lane wrote:
> Florian Pflug <fgp(at)phlo(dot)org> writes:
>> On Jun20, 2011, at 19:16 , Merlin Moncure wrote:
>>> On Mon, Jun 20, 2011 at 11:21 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>>> hm, what if there *was* only one range type per base type, but in the
>>> various contexts where specific ordering and collation was important
>>> you could optionally pass them in? Meaning, the specific ordering was
>>> not bound rigidly to the type, but to the operation?
>
>> I suggested that previously here
>> http://archives.postgresql.org/pgsql-hackers/2011-06/msg00846.php
>
>> In the ensuing discussion, however, it became clear that by doing so
>> range types become little more than a pair of values. More specifically,
>> a range then *doesn't* represent a set of values, because whether or
>> not a value is "in" the range depends on a specific sort order.
>
> Yeah, that doesn't seem like the way to go. If a range value doesn't
> represent a well-defined set of base-type values, we lose a lot of the
> mathematical underpinnings for range operations.
>
> So ... just how awful would it be if we hard-wired range types to always
> use their base type's default btree sort ordering and the database's
> default collation? In principle that sucks, but I'm not sure how wide
> the use-cases actually will be for other choices.
>
> The other viable alternative seems to be to require those two properties
> (btree opclass and collation) to be part of a specific range type
> definition. The complaint about that seemed to be that we couldn't
> infer an ANYRANGE type given only ANYELEMENT, but could we alleviate
> that by identifying one range type as the default for the base type,
> and then using that one in cases where we have no ANYRANGE input?
>
> regards, tom lane

I still think that the most elegant solution is for stuff like collation to just
be built into the base types that the range is ranging over, meaning we have a
separate text base type for each text collation, and the text operators are
polymorphic over all those base types. Having collations and stuff as something
off to the side not built into text/etc types is the root of the problem. The
range-specific stuff can remain ANYELEMENT and no special-casing is required.
Also, besides range constructors, a generic membership test like "value in
range" is polymorphic. -- Darren Duncan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Darren Duncan <darren(at)darrenduncan(dot)net>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-20 18:58:21
Message-ID: 2176.1308596301@sss.pgh.pa.us
Lists: pgsql-hackers

Darren Duncan <darren(at)darrenduncan(dot)net> writes:
> I still think that the most elegant solution is for stuff like collation to just
> be built into the base types that the range is ranging over, meaning we have a
> separate text base type for each text collation, and the text operators are
> polymorphic over all those base types. Having collations and stuff as something
> off to the side not built into text/etc types is the root of the
> problem.

I tend to agree that this aspect of the SQL standard isn't terribly well
designed, but it's the standard and we're stuck with it. We're not
going to support two parallel methods of dealing with collations.

regards, tom lane


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Darren Duncan <darren(at)darrenduncan(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-20 19:19:04
Message-ID: EE45E2EB-86BA-4367-89FE-D4F07A7E71E4@phlo.org
Lists: pgsql-hackers

On Jun20, 2011, at 20:58 , Tom Lane wrote:
> Darren Duncan <darren(at)darrenduncan(dot)net> writes:
>> I still think that the most elegant solution is for stuff like collation to just
>> be built into the base types that the range is ranging over, meaning we have a
>> separate text base type for each text collation, and the text operators are
>> polymorphic over all those base types. Having collations and stuff as something
>> off to the side not built into text/etc types is the root of the
>> problem.
>
> I tend to agree that this aspect of the SQL standard isn't terribly well
> designed, but it's the standard and we're stuck with it. We're not
> going to support two parallel methods of dealing with collations.

Plus, you can always define a DOMAIN for every collation you intend to use,
and stay clear of COLLATE clauses except as part of these domain definitions.

Most interestingly, this is also the workaround Jeff Davis suggested for
those who absolutely need two range types over the same base type (i.e.
define one of the ranges over a domain).
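A small model of why the domain workaround satisfies the one-range-type-per-element-type rule: the rule is keyed on the element type's identity, and a domain is a distinct type even though its values are those of its base type. SqlType, RANGE_FOR and the domain text_en are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class SqlType:
    name: str
    base: Optional[str] = None       # set for a domain: the type it is defined over
    collation: Optional[str] = None  # e.g. a COLLATE clause in the domain definition

RANGE_FOR: Dict[str, str] = {}  # still at most one range type per element type

def create_range_type(range_name: str, subtype: SqlType) -> None:
    # The one-per-type rule is checked against the subtype's own name,
    # so a domain counts as a type distinct from its base type.
    if subtype.name in RANGE_FOR:
        raise ValueError(f"{subtype.name} already has range type {RANGE_FOR[subtype.name]}")
    RANGE_FOR[subtype.name] = range_name

text = SqlType("text")
text_en = SqlType("text_en", base="text", collation="en_US")  # hypothetical domain
create_range_type("textrange", text)
create_range_type("textrange_en", text_en)
```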

best regards,
Florian Pflug


From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-20 19:54:52
Message-ID: 4DFFA58C.5050504@darrenduncan.net
Lists: pgsql-hackers

Florian Pflug wrote:
> On Jun20, 2011, at 20:58 , Tom Lane wrote:
>> Darren Duncan <darren(at)darrenduncan(dot)net> writes:
>>> I still think that the most elegant solution is for stuff like collation to just
>>> be built into the base types that the range is ranging over, meaning we have a
>>> separate text base type for each text collation, and the text operators are
>>> polymorphic over all those base types. Having collations and stuff as something
>>> off to the side not built into text/etc types is the root of the
>>> problem.
>> I tend to agree that this aspect of the SQL standard isn't terribly well
>> designed, but it's the standard and we're stuck with it. We're not
>> going to support two parallel methods of dealing with collations.
>
> Plus, you can always define a DOMAIN for every collation you intend to use,
> and stay clear of COLLATE clauses except as part of these domain definitions.
>
> Most interestingly, this is also the workaround Jeff Davis suggested for
> those who absolutely need two range types over the same base type (i.e.
> define one of the ranges over a domain).
>
> best regards,
> Florian Pflug

That DOMAIN-based solution ostensibly sounds like a good one then, under the
circumstances. What I *don't* want to see is for things like ranges to have
their own collations and the like. From the perspective of all range-specific
things, the types over which they're defined like text should just have their
own native ordering, which defines the range's sense of "before" and "after".
If DOMAIN effectively does that for text types, then that is the way to go. --
Darren Duncan


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Darren Duncan <darren(at)darrenduncan(dot)net>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Range Types and extensions
Date: 2011-06-21 08:29:58
Message-ID: 1308644998.2443.11.camel@jdavis
Lists: pgsql-hackers

On Mon, 2011-06-20 at 12:54 -0700, Darren Duncan wrote:
> That DOMAIN-based solution ostensibly sounds like a good one then, under the
> circumstances.

It's not bad from a theoretical standpoint, but it does require some
extra type annotation, which is not really the "SQL way".

> What I *don't* want to see is for things like ranges to have
> their own collations and the like.

I'm not 100% sure what you mean here. If you mean that you don't want
range types to pay attention to COLLATE clauses, etc., then I agree. I
would also agree if you mean that range values should not carry the
collation with them.

However, it looks like we might try to make the opclass/collation pair a
property of the range type definition. That seems nice, because it
allows us to keep the nice properties of ranges as well as the type
inference and polymorphism for everything except the constructors.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgresql(dot)org, Darren Duncan <darren(at)darrenduncan(dot)net>
Subject: Re: Range Types and extensions
Date: 2011-06-21 08:31:11
Message-ID: 1308645071.2443.12.camel@jdavis
Lists: pgsql-hackers

On Mon, 2011-06-20 at 13:43 -0400, Tom Lane wrote:
> The other viable alternative seems to be to require those two properties
> (btree opclass and collation) to be part of a specific range type
> definition. The complaint about that seemed to be that we couldn't
> infer an ANYRANGE type given only ANYELEMENT, but could we alleviate
> that by identifying one range type as the default for the base type,
> and then using that one in cases where we have no ANYRANGE input?

Yes, that sounds similar to Florian's suggestion, and I think there may
be a solution down this path. However, if we're going to have range
types with non-default orderings, then we need a way to construct them.

I suggested that, if constructors are the primary problem case, then
just generate non-polymorphic constructors at range type definition
time, named after the range type name. I'll look into that approach.

Regards,
Jeff Davis