pre-proposal: type interfaces

Lists: pgsql-hackers
From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pre-proposal: type interfaces
Date: 2009-10-23 05:17:33
Message-ID: 1256275053.2580.782.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I am starting to plan a few features that are important for temporal
data, and one prerequisite for several of them is the ability to find an
operator that fills a certain role.

For instance, one feature that I'm considering now is a "temporal join"
which is a join on "overlaps" rather than "equals", e.g.:

SELECT * FROM a, b WHERE a.x && b.x;

I might try to provide a modified merge join algorithm to implement that
more efficiently in some cases. I don't mean to discuss the feature in
detail here (I will make a separate proposal) but the algorithm requires
that I find the "strictly left of" operator.

So, after I recognize that a temporal join is required, I need to be
able to identify the << operator. But how? In other languages, it would
probably be done with something like an "interface", but we don't have
that concept here. The internals generally use operators attached to the
default btree opclass, but I don't think that works very well here.

The way I see it, we have two approaches:
1. Try to make the current system work by standardizing the strategy
numbers for GiST somehow, and then use the default GiST operator class,
if available.
2. Invent a new system, perhaps interfaces, perhaps something else.
3. Use extra flags in CREATE OPERATOR somehow

Thoughts?

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pre-proposal: type interfaces
Date: 2009-10-23 14:34:34
Message-ID: 10295.1256308474@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> For instance, one feature that I'm considering now is a "temporal join"
> which is a join on "overlaps" rather than "equals", e.g.:

> SELECT * FROM a, b WHERE a.x && b.x;

> I might try to provide a modified merge join algorithm to implement that
> more efficiently in some cases. I don't mean to discuss the feature in
> detail here (I will make a separate proposal) but the algorithm requires
> that I find the "strictly left of" operator.

Well, actually you need *two* things. The first prerequisite is to know
that the operator named && has the semantics of "overlaps" in some
generalized sense. The second prerequisite is to identify the
associated "strictly left of" operator.

As you say, we've done this in the past using operator classes. That's
worked reasonably well because what the existing code needs to know
about is equality, ordering, and hashing, and that all matches up nicely
with btree and hash opclasses.

> The way I see it, we have two approaches:
> 1. Try to make the current system work by standardizing the strategy
> numbers for GiST somehow, and then use the default GiST operator class,
> if available.

This proposal is a non-starter, because one of the whole points of GIST
is that it can support multiple sets of semantics. There is no reason
at all to assume that every GIST opclass must include operators having
the semantics of overlaps and to-left-of.

> 2. Invent a new system, perhaps interfaces, perhaps something else.
> 3. Use extra flags in CREATE OPERATOR somehow

The thing that would require the least amount of invention is to use the
operator class/family machinery with a new index type. Perhaps a dummy
entry in pg_am would be acceptable. (You would probably create just
operator families, with no contained classes, since the only point of a
class is to identify the minimum support set for an index and there
won't be any indexes.) An alternative is to somehow mark those GIST
opclasses that do meet the expectation about operator semantics.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pre-proposal: type interfaces
Date: 2009-10-23 17:16:34
Message-ID: 407d949e0910231016k2d11cf6ai54fa304f7bc25515@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 23, 2009 at 7:34 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The way I see it, we have two approaches:
>>  1. Try to make the current system work by standardizing the strategy
>> numbers for GiST somehow, and then use the default GiST operator class,
>> if available.
>
> This proposal is a non-starter, because one of the whole points of GIST
> is that it can support multiple sets of semantics.  There is no reason
> at all to assume that every GIST opclass must include operators having
> the semantics of overlaps and to-left-of.

I always thought it was strange that the GIST strategy numbers were
completely meaningless. It does seem like assigning meaning to
strategy numbers gradually as we learn new interrelated indexable
strategies. We would still have a range of values for new non-standard
semantics, but at least the common ones would be nailed down.

I'm not sure that solves the problem because the "default" gist
operator class isn't necessarily going to be the one with the
strategies this needs, though I suppose normally people would want to
make the default operator class for each data type one which supports
the most strategies.

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pre-proposal: type interfaces
Date: 2009-10-23 19:16:06
Message-ID: 25418.1256325366@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> I always thought it was strange that the GIST strategy numbers were
> completely meaningless. It does seem like assigning meaning to
> strategy numbers gradually as we learn new interrelated indexable
> strategies. We would still have a range of values for new non-standard
> semantics, but at least the common ones would be nailed down.

Well, the problem with that is that GIST strategy numbers are
historically an internal implementation detail for any particular
opclass, and so trying to standardize them now is going to mean lots
of incompatibility with third-party opclasses.

I'd feel more comfortable with being able to add some flags to an
opclass (or more likely an opfamily) that assert that its strategy
numbers agree with some convention or other. Maybe a bitmap so
that there's room for multiple future conventions of this kind?
For instance it's certainly conceivable that a GIST class could
support both btree-compatible and overlaps operators.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pre-proposal: type interfaces
Date: 2009-10-23 20:25:26
Message-ID: 26048.1256329526@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> I'm not sure that solves the problem because the "default" gist
> operator class isn't necessarily going to be the one with the
> strategies this needs,

Forgot to mention: I do not think default-ness of opclasses enters
into it at all. The meaning of the query is fully defined by the
operator that is in it. All we need to know is what are the
semantics of that operator. If we can find it in the "overlaps"
position of *any* opclass, we are entitled to suppose that it
behaves like overlaps and the associated left-of operator can be
used to optimize it. Conceivably we could get different left-of
operators out of different opclasses, but if they don't behave
effectively the same, the user has messed up the opclasses.

As an example, suppose we are trying to implement DISTINCT via
sort-and-unique, and we've chosen an appropriate '=' operator that
defines what distinct-ness means. That operator might appear in more
than one opclass having more than one sort order, but it doesn't matter
which one we choose. As long as the opclasses are self-consistent,
we should get correct answers with any one.

The case where default-ness of opclasses matters is where we are
trying to assign specific meaning to some generic construct like
DISTINCT or ORDER BY. For instance, it makes sense to require
that ORDER BY be interpreted by reference to a default opclass,
because otherwise you don't have a way to know which sort ordering
the user wants.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pre-proposal: type interfaces
Date: 2009-10-23 20:40:44
Message-ID: 1256330444.28858.135.camel@monkey-cat.sm.truviso.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2009-10-23 at 16:25 -0400, Tom Lane wrote:
> Forgot to mention: I do not think default-ness of opclasses enters
> into it at all. The meaning of the query is fully defined by the
> operator that is in it. All we need to know is what are the
> semantics of that operator. If we can find it in the "overlaps"
> position of *any* opclass, we are entitled to suppose that it
> behaves like overlaps and the associated left-of operator can be
> used to optimize it.

Interesting, that sounds we've got a good approach to the problem now.
This thread has been useful.

> Conceivably we could get different left-of
> operators out of different opclasses, but if they don't behave
> effectively the same, the user has messed up the opclasses.

It would probably be worthwhile to make an attempt to throw a useful
error there, but I agree it's not really a problem.

> The case where default-ness of opclasses matters is where we are
> trying to assign specific meaning to some generic construct like
> DISTINCT or ORDER BY. For instance, it makes sense to require
> that ORDER BY be interpreted by reference to a default opclass,
> because otherwise you don't have a way to know which sort ordering
> the user wants.

That makes sense.

Thanks,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pre-proposal: type interfaces
Date: 2009-10-23 21:55:30
Message-ID: 29044.1256334930@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Fri, 2009-10-23 at 16:25 -0400, Tom Lane wrote:
>> Conceivably we could get different left-of
>> operators out of different opclasses, but if they don't behave
>> effectively the same, the user has messed up the opclasses.

> It would probably be worthwhile to make an attempt to throw a useful
> error there, but I agree it's not really a problem.

Sure, right after we solve the halting problem ;-). The point I
was trying to make is that getting different operators is not wrong.
It's only wrong if their behavior isn't consistent with what the
opclass asserts, and there's no practical way to determine that.
We have to trust the opclass maker. (This is one of the main
reasons why CREATE OPERATOR CLASS is superuser-only.)

regards, tom lane


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pre-proposal: type interfaces
Date: 2009-10-24 20:10:43
Message-ID: 7A2BCECB-EB5D-4084-AE40-42DCFBCEBB40@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Still on the phone...

--
dim

Le 23 oct. 2009 à 21:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> a écrit :
> I'd feel more comfortable with being able to add some flags to an
> opclass (or more likely an opfamily) that assert that its strategy
> numbers agree with some convention or other.

The overlap operator only concern multi dimensional data types I
think, it seems to me we can use the term "range".

So whart about applying semantics to strategy numbers for RANGE
OPERATOR CLASS ... GIST ...

I'm not sure how scalable we want to be there, so maybe those
(options, range, ...) would be better, but the flexibility Will
certainly not be that good as each option Will require unique strategy
numbers...

Regards,