Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)

Lists: pgsql-hackers
From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-10-07 08:54:21
Message-ID: CAPpHfduGQ==r0sf7JFG5C=EBbyc5v=jFhhzg0Kqgm76cbGARoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 7, 2011 at 7:41 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> I'd prefer to include it in the initial patch. If the current GiST code
> is going to be replaced, then there's not much sense reviewing/testing
> it.
>
> You may need to consider unbounded and empty ranges specially. I made an
> attempt to do so in the current GiST code, and you might want to take a
> look at that first. I'm not particularly attached to my approach, but we
> should do something reasonable with unbounded and empty ranges.
>

The first thing caught my eye in existing GiST code is idea of
subtype_float. float8 has limited precision and can't respresent, for
example, varlena values good enough. Even if we have large int8 value we can
loose lower bits, but data distribution can be so that these bits are
valuable. Wouldn't it better to have function like subtype_diff_float which
returns difference between two values of subtype as an float? Using of such
function could make penalty more sensible to even small difference between
values, and accordingly more relevant.

------
With best regards,
Alexander Korotkov.


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-10-08 09:01:29
Message-ID: 1318064489.1724.83.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2011-10-07 at 12:54 +0400, Alexander Korotkov wrote:

> The first thing caught my eye in existing GiST code is idea of
> subtype_float. float8 has limited precision and can't respresent, for
> example, varlena values good enough. Even if we have large int8 value
> we can loose lower bits, but data distribution can be so that these
> bits are valuable. Wouldn't it better to have function like
> subtype_diff_float which returns difference between two values of
> subtype as an float? Using of such function could make penalty more
> sensible to even small difference between values, and accordingly more
> relevant.

The reason I did it that way is for unbounded ranges. With
subtype_diff_float, it's difficult for the GiST code to differentiate
between [10,) and [100000,), because infinity minus anything is
infinity. But when inserting the range [100,200), the penalty for the
first one should be zero and the second one should have some positive
penalty, right?

Maybe we can still use subtype_diff_float by calling it on various pairs
of bounds to come up with a reasonable cost?

I'm open to suggestion. I'd just like to make sure that unbounded ranges
are a part of the consideration.

Regards,
Jeff Davis


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-10-08 14:43:55
Message-ID: CAPpHfdsQrW74FpMe9ndb0PDddXGZNjvZvKJ-=hY0WMc0_Fh5tQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 8, 2011 at 1:01 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> On Fri, 2011-10-07 at 12:54 +0400, Alexander Korotkov wrote:
>
> > The first thing caught my eye in existing GiST code is idea of
> > subtype_float. float8 has limited precision and can't respresent, for
> > example, varlena values good enough. Even if we have large int8 value
> > we can loose lower bits, but data distribution can be so that these
> > bits are valuable. Wouldn't it better to have function like
> > subtype_diff_float which returns difference between two values of
> > subtype as an float? Using of such function could make penalty more
> > sensible to even small difference between values, and accordingly more
> > relevant.
>
> The reason I did it that way is for unbounded ranges. With
> subtype_diff_float, it's difficult for the GiST code to differentiate
> between [10,) and [100000,), because infinity minus anything is
> infinity. But when inserting the range [100,200), the penalty for the
> first one should be zero and the second one should have some positive
> penalty, right?
>
I meant that penalty can be determined as sum of difference of old and new
bounds of range, i.e. penalty = subtype_diff_float(new_lower, old_lower)
+ subtype_diff_float(old_upper, new_upper).
When we insert [100,200) into [10,+inf), union([100,200), [10,+inf))
= [10,+inf), so penalty = subtype_diff_float(10,10)
+ subtype_diff_float(+inf, +inf) = 0 + 0 = 0.
When we insert [100,200) into [100000,), union([100,200), [100000,+inf))
= [100,+inf), so penalty = subtype_diff_float(100,100000)
+ subtype_diff_float(+inf, +inf) = 99900 + 0 = 99900.

But, there are still the problem, when we'are inserting open interval when
there is no such open intervals yet. For example, we're going to insert
[0,+inf), while root page contains [0,10), [10,20), [20,30). Each penalty
will be infinity, while it seems to be better to insert it into [0,10). But,
it seems to me to be general limitation of current GiST interface, when we
have to express penalty in a single float.

------
With best regards,
Alexander Korotkov.


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-10-08 16:27:53
Message-ID: 1318091273.1724.89.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2011-10-08 at 18:43 +0400, Alexander Korotkov wrote:

> I meant that penalty can be determined as sum of difference of old and
> new bounds of range, i.e. penalty = subtype_diff_float(new_lower,
> old_lower) + subtype_diff_float(old_upper, new_upper).
> When we insert [100,200) into [10,+inf), union([100,200), [10,+inf))
> = [10,+inf), so penalty = subtype_diff_float(10,10)
> + subtype_diff_float(+inf, +inf) = 0 + 0 = 0.
> When we insert [100,200) into [100000,), union([100,200), [100000,
> +inf)) = [100,+inf), so penalty = subtype_diff_float(100,100000)
> + subtype_diff_float(+inf, +inf) = 99900 + 0 = 99900.
>
OK, I like that. I will make the change.

> But, there are still the problem, when we'are inserting open interval
> when there is no such open intervals yet. For example, we're going to
> insert [0,+inf), while root page contains [0,10), [10,20), [20,30).
> Each penalty will be infinity, while it seems to be better to insert
> it into [0,10). But, it seems to me to be general limitation of
> current GiST interface, when we have to express penalty in a single
> float.

That seems like an acceptable limitation. I don't think my solution
handles it any better.

Regards,
Jeff Davis

>


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-10-16 21:43:05
Message-ID: 1318801385.10757.22.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2011-10-07 at 12:54 +0400, Alexander Korotkov wrote:

> The first thing caught my eye in existing GiST code is idea of
> subtype_float. float8 has limited precision and can't respresent, for
> example, varlena values good enough. Even if we have large int8 value
> we can loose lower bits, but data distribution can be so that these
> bits are valuable. Wouldn't it better to have function like
> subtype_diff_float which returns difference between two values of
> subtype as an float? Using of such function could make penalty more
> sensible to even small difference between values, and accordingly more
> relevant.
>
I started implementing subtype_diff, and I noticed that it requires
defining an extra function for each range type. Previously, the numeric
types could just use a cast, which was convenient for user-defined range
types.

If you have any other ideas to make that cleaner, please let me know.
Otherwise I'll just finish implementing subtype_diff.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-10-17 08:38:12
Message-ID: 1318840692.10757.34.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 2011-10-16 at 14:43 -0700, Jeff Davis wrote:
> On Fri, 2011-10-07 at 12:54 +0400, Alexander Korotkov wrote:
>
> > The first thing caught my eye in existing GiST code is idea of
> > subtype_float. float8 has limited precision and can't respresent, for
> > example, varlena values good enough. Even if we have large int8 value
> > we can loose lower bits, but data distribution can be so that these
> > bits are valuable. Wouldn't it better to have function like
> > subtype_diff_float which returns difference between two values of
> > subtype as an float? Using of such function could make penalty more
> > sensible to even small difference between values, and accordingly more
> > relevant.
> >
> I started implementing subtype_diff, and I noticed that it requires
> defining an extra function for each range type. Previously, the numeric
> types could just use a cast, which was convenient for user-defined range
> types.
>
> If you have any other ideas to make that cleaner, please let me know.
> Otherwise I'll just finish implementing subtype_diff.

I'm beginning to think that we should just allow the user to specify
their own gist_penalty function. Specifying just the subtype_diff
doesn't save much time, and it can only be limiting. Additionally, it's
harder for users to understand the purpose of the function.

Regards,
Jeff Davis


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-10-24 11:05:15
Message-ID: CAPpHfduORTpMLB=Lxi5eDEjx8mfvy2g7=wKyN83iXxKHW0P3+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

On Mon, Oct 17, 2011 at 12:38 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> > I started implementing subtype_diff, and I noticed that it requires
> > defining an extra function for each range type. Previously, the numeric
> > types could just use a cast, which was convenient for user-defined range
> > types.
> >
> > If you have any other ideas to make that cleaner, please let me know.
> > Otherwise I'll just finish implementing subtype_diff.
>
I think implementing subtype_diff for each datatype is ok. We could
implement some universal function based on minus operator and casting to
double precision. But such solution might be unacceptable in both
*predictability
(operator and casting function might do not the things we expect) and
performance.*

I'm beginning to think that we should just allow the user to specify
> their own gist_penalty function. Specifying just the subtype_diff
> doesn't save much time, and it can only be limiting. Additionally, it's
> harder for users to understand the purpose of the function.
>
If we allow user to specify own gist_penalty function, then such function
should deal with:
1) GiST-specific data structures such as GISTENTRY.
2) Decomposing ranges using range_deserialize.
3) Inifinities, which we could handle in general penalty functions.
Thats why I prefere to implement subtype_diff.

------
With best regards,
Alexander Korotkov.


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-10-25 18:48:05
Message-ID: CAPpHfdvBZEoDjdYKdV5nbB_P7Q5DUSRZS3HmZS4xb-Fc=7qieQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 24, 2011 at 3:05 PM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:

> If we allow user to specify own gist_penalty function, then such function
> should deal with:
> 1) GiST-specific data structures such as GISTENTRY.
> 2) Decomposing ranges using range_deserialize.
> 3) Inifinities, which we could handle in general penalty functions.
> Thats why I prefere to implement subtype_diff.
>
I forgot another agument for having subtype_diff:
4) In my picksplit algorithm it would be more natural to use subtype_diff
for measuring overlap than use penalty function.

------
With best regards,
Alexander Korotkov.


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-01 04:33:32
Message-ID: 1320122012.23464.4.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2011-10-24 at 15:05 +0400, Alexander Korotkov wrote:

> I think implementing subtype_diff for each datatype is ok. We could
> implement some universal function based on minus operator and casting
> to double precision. But such solution might be unacceptable in
> both predictability (operator and casting function might do not the
> things we expect) and performance.
>
Done.

Everything is complete in this patch with the exception of two optional
things, which I still intend to do but might best be done in a separate
commit:

* support typmod for ranges
* support casts between different range types

Both of these things, I believe, require the introduction of an
RangeCoerceExpr, similar to ArrayCoerceExpr. That's fine, but it creates
a rather large diff, so it might be best left for a later commit.

Regards,
Jeff Davis

Attachment Content-Type Size
rangetypes-20111101.gz application/x-gzip 49.5 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-02 19:29:31
Message-ID: 4EB19A1B.3070205@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01.11.2011 06:33, Jeff Davis wrote:
> On Mon, 2011-10-24 at 15:05 +0400, Alexander Korotkov wrote:
>
>> I think implementing subtype_diff for each datatype is ok. We could
>> implement some universal function based on minus operator and casting
>> to double precision. But such solution might be unacceptable in
>> both predictability (operator and casting function might do not the
>> things we expect) and performance.
>>
> Done.

Thanks, I'm looking into this now.

> + else if (lower1.infinite || upper1.infinite)
> + length1 = 1.0/0.0;

That seems wrong. I take it that the point is to set length1 to infinity?

PS. I note the docs still refer to subtype_float. I'll fix that before
committing.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-02 19:48:22
Message-ID: 6217.1320263302@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> On 01.11.2011 06:33, Jeff Davis wrote:
>> + else if (lower1.infinite || upper1.infinite)
>> + length1 = 1.0/0.0;

> That seems wrong. I take it that the point is to set length1 to infinity?

Please use get_float[48]_infinity() or get_float[48]_nan(), as
appropriate (I think the latter may be intended here), rather than
making up your own way of getting those values.

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-02 20:59:49
Message-ID: 4EB1AF45.3070606@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01.11.2011 06:33, Jeff Davis wrote:
> On Mon, 2011-10-24 at 15:05 +0400, Alexander Korotkov wrote:
>
>> I think implementing subtype_diff for each datatype is ok. We could
>> implement some universal function based on minus operator and casting
>> to double precision. But such solution might be unacceptable in
>> both predictability (operator and casting function might do not the
>> things we expect) and performance.
>>
> Done.
>
> Everything is complete in this patch with the exception of two optional
> things, which I still intend to do but might best be done in a separate
> commit:
>
> * support typmod for ranges
> * support casts between different range types
>
> Both of these things, I believe, require the introduction of an
> RangeCoerceExpr, similar to ArrayCoerceExpr. That's fine, but it creates
> a rather large diff, so it might be best left for a later commit.

Using the test table from the rangetypes test case:

postgres=# select * from test_range_gist where 10 <@ ir;
ERROR: unsupported type: 3904

This seems to be coming from the selectivity estimation function. The
selectivity function for <@ is scalargtsel, which is usually used for
scalar > and >=. That doesn't seem right. But what do we store in the
statistics for range types in the first place, and what would be the
right thing to do for selectivity estimation?

I'll dig deeper into this tomorrow...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-02 21:01:26
Message-ID: 4EB1AFA6.4040904@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02.11.2011 22:59, Heikki Linnakangas wrote:
> I'll dig deeper into this tomorrow...

Forgot to mention: I have pushed what I have done this far to my git
repository at git://git.postgresql.org/git/users/heikki/postgres.git, if
you want to take a look. Nothing major, just garden-variety cleanup.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-03 08:40:09
Message-ID: 1320309609.32341.24.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-11-02 at 21:29 +0200, Heikki Linnakangas wrote:
> > + else if (lower1.infinite || upper1.infinite)
> > + length1 = 1.0/0.0;
>
> That seems wrong. I take it that the point is to set length1 to infinity?

I reworked this in commit (on my private repo, of course):
6197fbffb00f729feba8082136801cdef5ac850e

For the archives, it's essentially taking the difference on the left
side of the range, and the difference on the right side of the range,
and adding them together. There are just a lot of special cases for
infinite boundaries, empty ranges, and the lack of a subtype_diff
function.

I think it's a little closer to what Alexander intended, which I think
is an improvement. It should now be able to recognize that expanding
[10,) into [0,) has a penalty of 10.

> PS. I note the docs still refer to subtype_float. I'll fix that before
> committing.

Thank you. The only change I found strange was the test that used \c to
reconnect; but I can't say that my solution was any better.

Regards,
Jeff Davis


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-03 08:42:29
Message-ID: 1320309749.32341.27.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-11-02 at 22:59 +0200, Heikki Linnakangas wrote:
> This seems to be coming from the selectivity estimation function. The
> selectivity function for <@ is scalargtsel, which is usually used for
> scalar > and >=. That doesn't seem right. But what do we store in the
> statistics for range types in the first place, and what would be the
> right thing to do for selectivity estimation?

I'll have to think more about that, and it depends on the operator. It
seems like an easier problem for "contains a point" than "contains
another range" or "overlaps with another range".

Right now I don't have a very good answer, and even for the "contains a
point" case I'll have to think about the representation in pg_statistic.

Regards,
Jeff Davis


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-03 11:59:56
Message-ID: 4EB2823C.4010701@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03.11.2011 10:42, Jeff Davis wrote:
> On Wed, 2011-11-02 at 22:59 +0200, Heikki Linnakangas wrote:
>> This seems to be coming from the selectivity estimation function. The
>> selectivity function for<@ is scalargtsel, which is usually used for
>> scalar> and>=. That doesn't seem right. But what do we store in the
>> statistics for range types in the first place, and what would be the
>> right thing to do for selectivity estimation?
>
> I'll have to think more about that, and it depends on the operator. It
> seems like an easier problem for "contains a point" than "contains
> another range" or "overlaps with another range".
>
> Right now I don't have a very good answer, and even for the "contains a
> point" case I'll have to think about the representation in pg_statistic.

I've committed this now, after some more cleanup. I removed the
selectivity estimation functions from operators where they were bogus,
so writing those is a clear TODO. But that can well be done as a
separate patch.

Thanks!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-03 17:54:43
Message-ID: 2930549A-4802-4B60-AFBA-C3125A7B6847@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Nov 3, 2011, at 4:59 AM, Heikki Linnakangas wrote:

> I've committed this now, after some more cleanup. I removed the selectivity estimation functions from operators where they were bogus, so writing those is a clear TODO. But that can well be done as a separate patch.
>
> Thanks!

Woo! Congrats Jeff. Awesome news. Very excited about this feature. Thanks for getting this in, Heikki.

Best,

David


From: Florian Pflug <fgp(at)phlo(dot)org>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-03 18:20:58
Message-ID: 0CE6E428-7ED8-487B-93E8-34AF953C3887@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Nov3, 2011, at 18:54 , David E. Wheeler wrote:
> On Nov 3, 2011, at 4:59 AM, Heikki Linnakangas wrote:
>> I've committed this now, after some more cleanup. I removed the selectivity estimation functions from operators where they were bogus, so writing those is a clear TODO. But that can well be done as a separate patch.
>>
>> Thanks!
>
> Woo! Congrats Jeff. Awesome news. Very excited about this feature. Thanks for getting this in, Heikki.

+1. Great work, guys!

best regards,
Florian Pflug


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-03 20:31:18
Message-ID: CAPpHfduEyUp3jnBY69E+dEYQmuZv_UAtiL98ZbOPk2_C=UnTbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Nov 3, 2011 at 3:59 PM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> I've committed this now, after some more cleanup. I removed the
> selectivity estimation functions from operators where they were bogus, so
> writing those is a clear TODO. But that can well be done as a separate
> patch.
>
Cool! Patch with GiST on range types improvements from me will be soon.

------
With best regards,
Alexander Korotkov.


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-07 18:19:55
Message-ID: CAPpHfdsDRXz5KS1-67KatDzpmQfKYsiXp6X0edKEwxiBnfeczg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

During work on gist for range types I've faced with following problem:

test=# select 'empty'::int4range !?;
ERROR: operator does not exist: int4range !?
LINE 1: select 'empty'::int4range !?;
^
HINT: No operator matches the given name and argument type(s). You might
need to add explicit type casts.

test=# select 'empty'::int4range ?;
ERROR: operator does not exist: int4range ?
LINE 1: select 'empty'::int4range ?;
^
HINT: No operator matches the given name and argument type(s). You might
need to add explicit type casts.

So, !? and ? operators are mentioned in documentation, but don't present in
catalog. Are them just missed in the catalog or there is some more serious
problem?

------
With best regards,
Alexander Korotkov.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-07 18:36:23
Message-ID: 29235.1320690983@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> So, !? and ? operators are mentioned in documentation, but don't present in
> catalog. Are them just missed in the catalog or there is some more serious
> problem?

IIRC, Heikki removed them from the final commit. Sounds like he missed
some documentation.

regards, tom lane


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-07 19:34:51
Message-ID: CAPpHfdvH31CJ5rCm+g2UTtzDpOLFR9PDu9gkCcYx_ZruJQ-xrQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

First version of GiST for range types patch is here. Comments & refactoring
& testing are coming soon.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
rangetypegist-0.1.patch.gz application/x-gzip 8.2 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-08 07:48:06
Message-ID: 4EB8DEB6.2050203@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07.11.2011 20:36, Tom Lane wrote:
> Alexander Korotkov<aekorotkov(at)gmail(dot)com> writes:
>> So, !? and ? operators are mentioned in documentation, but don't present in
>> catalog. Are them just missed in the catalog or there is some more serious
>> problem?
>
> IIRC, Heikki removed them from the final commit. Sounds like he missed
> some documentation.

Yep. Fixed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-09 16:24:00
Message-ID: CAPpHfds5M21x4idTOO4gYyPEu3B0JBtUqX2srTDPa4dDNT3UyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

New version of GiST for range types patch is here. This version seems to be
complete and ready for review.

Main features of this patch are:
1) Different handling of empty ranges. Problem of "column <@ const"
acceleration is that all empty ranges satisfies to it. It's because empty
range can be anywhere in the tree, because empty range is contained in any
range. My solution is to use additional bit in range flags especially for
GiST. This bit means that internal range have underlying empty ranges. This
approach allows to accelerate "column <@ const" case and also fast search
for empty ranges.
2) Penalty and picksplit methods are trying not to mix ordinal ranges with
special cases ((-inf;x), (x; +inf), (-inf, +inf) and empty). Picksplit
divides ranges to classes (see get_gist_range_class for details). If
several classes are presented in input vector, split will be between
classes. If we have only one class, picksplit method splits ranges using
double sorting split for ordinal ranges, and simple sorting split for
(-inf,x) and (x,+inf) cases.

Questions:
1) I'm not sure about whether we need support of <> in GiST, because it
always produces full index scan (except search for non-empty ranges).
2) With no selectivity estimation functions we have no GiST index usage for
&&, <@, @>. Possible temporarty solution is to create lower constant
selectivity estimation like we have for geometric datatypes.

A little testing:
Let's create dataset with 1000 empty ranges, 1000 (-inf, x) ranges, 1000
(x, inf) ranges, 1000 (-inf, + inf) ranges and 1000000 ordinal ranges and
mix them.

create table temp (r int4range);
insert into temp (r) (select int4range() from generate_series(1,1000));
insert into temp (r) (select int4range(NULL, NULL) from
generate_series(1,1000));
insert into temp (r) (select int4range((random()*1000000)::int, NULL) from
generate_series(1,1000));
insert into temp (r) (select int4range(NULL,(random()*1000000)::int) from
generate_series(1,1000));
insert into temp (select int4range(v,v+1+(random()*100)::int) from (select
(random()*1000000)::int v from generate_series(1,1000000)) x);
create table range_test as (select * from temp order by random());
drop table temp;

There are some results without patch.

test=# create index range_idx on range_test using gist (r);
CREATE INDEX
Time: 174578,153 ms

test=# explain (analyze, buffers) select * from range_test where r @>
int4range(500000,500010);
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on range_test (cost=17659.03..29840.03 rows=502000
width=17) (actual time=87.961..116.074 rows=2028 loops=1)
Recheck Cond: (r @> '[500000,500010)'::int4range)
Buffers: shared hit=192 read=1856 written=886
-> Bitmap Index Scan on range_idx (cost=0.00..17533.53 rows=502000
width=0) (actual time=87.591..87.591 rows=2028 loops=1)
Index Cond: (r @> '[500000,500010)'::int4range)
Buffers: shared hit=185 read=139
Total runtime: 118.804 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where r <@
int4range(500000,501000);
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on range_test (cost=17659.03..29840.03 rows=502000
width=17) (actual time=1725.249..1744.327 rows=1994 loops=1)
Recheck Cond: (r <@ '[500000,501000)'::int4range)
Buffers: shared hit=3 read=8604
-> Bitmap Index Scan on range_idx (cost=0.00..17533.53 rows=502000
width=0) (actual time=1724.834..1724.834 rows=1994 loops=1)
Index Cond: (r <@ '[500000,501000)'::int4range)
Buffers: shared hit=3 read=6880
Total runtime: 1747.007 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where r &&
int4range(500000,501000);
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on range_test (cost=17659.03..29840.03 rows=502000
width=17) (actual time=88.196..105.423 rows=3082 loops=1)
Recheck Cond: (r && '[500000,501000)'::int4range)
Buffers: shared hit=1183 read=1545
-> Bitmap Index Scan on range_idx (cost=0.00..17533.53 rows=502000
width=0) (actual time=87.638..87.638 rows=3082 loops=1)
Index Cond: (r && '[500000,501000)'::int4range)
Buffers: shared hit=44 read=286
Total runtime: 109.486 ms
(7 rows)

And results for same queries with patch.

test=# create index range_idx on range_test using gist (r);
CREATE INDEX
Time: 79220,888 ms

test=# explain (analyze, buffers) select * from range_test where r @>
int4range(500000,500010);
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on range_test (cost=15954.99..28135.99 rows=502000
width=36) (actual time=9.699..36.352 rows=2028 loops=1)
Recheck Cond: (r @> '[500000,500010)'::int4range)
Buffers: shared hit=8 read=1734 written=556
-> Bitmap Index Scan on range_idx (cost=0.00..15829.49 rows=502000
width=0) (actual time=9.297..9.297 rows=2028 loops=1)
Index Cond: (r @> '[500000,500010)'::int4range)
Buffers: shared hit=8 read=10
Total runtime: 39.229 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where r <@
int4range(500000,501000);
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on range_test (cost=15954.99..28135.99 rows=502000
width=17) (actual time=15.886..25.571 rows=1994 loops=1)
Recheck Cond: (r <@ '[500000,501000)'::int4range)
Buffers: shared hit=1199 read=554
-> Bitmap Index Scan on range_idx (cost=0.00..15829.49 rows=502000
width=0) (actual time=15.518..15.518 rows=1994 loops=1)
Index Cond: (r <@ '[500000,501000)'::int4range)
Buffers: shared hit=23 read=6
Total runtime: 28.243 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where r &&
int4range(500000,501000);
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on range_test (cost=15954.99..28135.99 rows=502000
width=17) (actual time=7.845..19.686 rows=3082 loops=1)
Recheck Cond: (r && '[500000,501000)'::int4range)
Buffers: shared hit=2389 read=33
-> Bitmap Index Scan on range_idx (cost=0.00..15829.49 rows=502000
width=0) (actual time=7.282..7.282 rows=3082 loops=1)
Index Cond: (r && '[500000,501000)'::int4range)
Buffers: shared hit=24
Total runtime: 23.728 ms
(7 rows)

We can see pretty dramatical acceleration. Also index build becomes faster,
actually I don't know why :)

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
rangetypegist-0.2.patch.zip application/zip 10.6 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-26 07:11:13
Message-ID: 1322291473.3439.2.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-11-09 at 20:24 +0400, Alexander Korotkov wrote:
> New version of GiST for range types patch is here. This version seems
> to be complete and ready for review.
>
There's been some significant change in rangetypes_gist.c, can you
please rebase this patch?

I like the patch conceptually, though I'm still working through the
details.

Regards,
Jeff Davis


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-27 12:28:19
Message-ID: CAPpHfdtstaqm5OwJk1BpugfrgD8WcwEt=Ri2S5Nuo3xmtWQxQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Nov 26, 2011 at 11:11 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> There's been some significant change in rangetypes_gist.c, can you
> please rebase this patch?
>
OK, rebased with head.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
rangetypegist-0.3.patch.gz application/x-gzip 11.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-27 18:43:37
Message-ID: 4870.1322419417@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> On Sat, Nov 26, 2011 at 11:11 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> There's been some significant change in rangetypes_gist.c, can you
>> please rebase this patch?

> OK, rebased with head.

I looked at this patch a bit. I agree with the aspect of it that says
"let's add a flag bit so we can tell whether an upper GiST item includes
any empty ranges"; I think we really need that in order to make
contained_by searches usable. However, I'm not so happy with the
proposed rewrite of the penalty/picksplit functions. I see two problems
there:

1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
values obtained from subtype_diff. This is not good, because you have
no idea what scale the subtype differences will be expressed on. The
hard-wired values could be greatly larger than range widths, or greatly
smaller, resulting in randomly different index behavior.

2. It's too large/complicated. You're proposing to add nearly a
thousand lines to rangetypes_gist.c, and I do not see any reason to
think that this is so much better than what's there now as to justify
that kind of increment in the code size. I saw your performance
results, but one set of results on an arbitrary (not-real-world) test
case doesn't prove a lot to me; and in particular it doesn't prove that
we couldn't do as well with a much smaller and simpler patch.

There are a lot of garden-variety coding problems, too, for instance here:

+ *penalty = Max(DatumGetFloat8(FunctionCall2(
+ subtype_diff, orig_lower.val, new_lower.val)), 0.0);

which is going to uselessly call the subtype_diff function twice most of
the time (Max() is only a macro), plus you left off the collation
argument. But I don't think it's worth worrying about those until the
big picture is correct, which I feel it isn't yet.

Earlier in the thread you wrote:

> Questions:
> 1) I'm not sure about whether we need support of <> in GiST, because it
> always produces full index scan (except search for non-empty ranges).

I was thinking the same thing; that opclass entry seems pretty darn
useless.

I propose to pull out and apply the changes related to the
RANGE_CONTAIN_EMPTY flag, and also remove the <> opclass entry,
because I think these are uncontroversial and in the nature of
"must fix quickly". The redesign of the penalty and picksplit
functions should be discussed separately.

regards, tom lane


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-27 19:11:28
Message-ID: CAPpHfdvkC23+hTzw+E7LwG23r0dbCo-zT8zeYa47ycbo-XmqCw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Nov 27, 2011 at 10:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> 1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
> values obtained from subtype_diff. This is not good, because you have
> no idea what scale the subtype differences will be expressed on. The
> hard-wired values could be greatly larger than range widths, or greatly
> smaller, resulting in randomly different index behavior.
>
Current GiST code only compare penalty values of inserting same tuple. And
don't see why it may alters. So, values obtained from subtype_diff
and hard-wired values would be never compared each other.

> 2. It's too large/complicated. You're proposing to add nearly a
> thousand lines to rangetypes_gist.c, and I do not see any reason to
> think that this is so much better than what's there now as to justify
> that kind of increment in the code size. I saw your performance
> results, but one set of results on an arbitrary (not-real-world) test
> case doesn't prove a lot to me; and in particular it doesn't prove that
> we couldn't do as well with a much smaller and simpler patch.
>
I've tested double sorting split algorithm itself pretty much on synthetic
datasets. See paper for details. Strategy of separation of different
classes of ranges really need more testing. But obtaining large enough
real-life datasets is pretty *problematic for me.*

> There are a lot of garden-variety coding problems, too, for instance here:
>
> + *penalty = Max(DatumGetFloat8(FunctionCall2(
> + subtype_diff, orig_lower.val, new_lower.val)), 0.0);
>
> which is going to uselessly call the subtype_diff function twice most of
> the time (Max() is only a macro), plus you left off the collation
> argument. But I don't think it's worth worrying about those until the
> big picture is correct, which I feel it isn't yet.
>
Oh, I see. It will be fixed.

I propose to pull out and apply the changes related to the
> RANGE_CONTAIN_EMPTY flag, and also remove the <> opclass entry,
> because I think these are uncontroversial and in the nature of
> "must fix quickly". The redesign of the penalty and picksplit
> functions should be discussed separately.
>
I think the same.

------
With best regards,
Alexander Korotkov.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-28 00:00:02
Message-ID: 21060.1322438402@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> On Sun, Nov 27, 2011 at 10:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> 1. penalty is using both hard-wired penalty values (1.0, 2.0, etc) and
>> values obtained from subtype_diff. This is not good, because you have
>> no idea what scale the subtype differences will be expressed on. The
>> hard-wired values could be greatly larger than range widths, or greatly
>> smaller, resulting in randomly different index behavior.

> Current GiST code only compare penalty values of inserting same tuple. And
> don't see why it may alters. So, values obtained from subtype_diff
> and hard-wired values would be never compared each other.

I see your point that we only need the penalty values to be comparable
for the same "new" value, but I don't think that really answers my
objection, because you've had to lobotomize the logic. As an example,
if we have a new empty range to insert, and all the existing root-page
entries are ordinary finite ranges, this code will throw up its hands
and give them all the same 4.0 penalty value. Surely it would be better
to attempt to pick the smallest (narrowest) existing range. But to do
that, you have to pay attention to the subdiff value.

regards, tom lane


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-11-28 07:24:27
Message-ID: CAPpHfdvv1g6J30iMDpxnMD_BCVPzCaxZPR3xryBhjhxKMGKppw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 28, 2011 at 3:00 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I see your point that we only need the penalty values to be comparable
> for the same "new" value, but I don't think that really answers my
> objection, because you've had to lobotomize the logic. As an example,
> if we have a new empty range to insert, and all the existing root-page
> entries are ordinary finite ranges, this code will throw up its hands
> and give them all the same 4.0 penalty value. Surely it would be better
> to attempt to pick the smallest (narrowest) existing range. But to do
> that, you have to pay attention to the subdiff value.
>
I believe it's a problem of the current GiST interface. If using subdiff
value as an penalty for insertion of empty range, we have to return 0
penalty for any entry with RANGE_CONTAIN_EMPTY flag. And for plain empty
entry too without any chance to define priority between them. In my opinion
solution is that penalty function should return vector of floats instead of
single float. With current GiST interface we have to do will solution of
handling some cases better and some cases worse. For example, GiST for
boxes also suffers from interface limitation. In many papers I met
recommendation to choose smallest box from boxes with same extention (it's
not a rare situation to have multiple boxes with zero extention) for tuple
insertion. But with current interface, we can't implement it.

------
With best regards,
Alexander Korotkov.


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org, Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-02 11:48:59
Message-ID: CAPpHfdt6cr+s6CATtyKiS3MbwThGn4OL5H3gqNuf8O1O_joo+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Rebased with head.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
rangetypegist-0.4.patch.gz application/x-gzip 10.3 KB

From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-10 14:14:09
Message-ID: 4EE36931.9040207@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/02/2011 06:48 AM, Alexander Korotkov wrote:
> Rebased with head.

Could you comment a little more on what changed? There were a couple of
areas Tom commented on:

-General code fixes
-"pull out and apply the changes related to the RANGE_CONTAIN_EMPTY
flag, and also remove the <> opclass entry"
-Subdiff issues

The third one sounded hard to deal with, so presumably nothing there.
I'm not sure if your updated rebase addresses either of those first two
yet or not, or if it was just fixing bitrot from upstream code changes.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-12 18:41:10
Message-ID: 1323715270.8276.14.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2011-12-02 at 15:48 +0400, Alexander Korotkov wrote:
> Rebased with head.
>
Thank you. I have attached a patch that's mostly just cleanup to this
one.

Comments:

* You use the term "ordinal range" quite a lot, which I haven't heard
before. Is that a mathematical term, or do you mean something more like
"ordinary"?

* There's a lot of code for range_gist_penalty. Rather than having
special cases for all combinations of properties in the new an original,
is it possible to use something a little simpler? Maybe just start the
penalty at zero, and add something for each property of the predicate
range that must be changed. The penalties added might vary, e.g., if the
original range has an infinite lower bound, changing it to have an
infinite upper bound might be a higher penalty.

* It looks like LIMIT_RATIO is not always considered. Should it be?

* You defined get/set_range_contain_empty, but didn't use them. I think
this was a merge error, but I removed them. So now there are no changes
in rangetypes.c.

Regards,
Jeff Davis

Attachment Content-Type Size
range_gist_cleanup.patch.gz application/x-gzip 6.2 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-13 21:04:21
Message-ID: CAPpHfdvxepwsCPiVZW72RYw0NXG=fot6GYoGZsq68UbU=WixTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

On Mon, Dec 12, 2011 at 10:41 PM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> Thank you. I have attached a patch that's mostly just cleanup to this
> one.
>
Thanks a lot for cleanup. Path with applied cleanup is attached.

> Comments:
>
> * You use the term "ordinal range" quite a lot, which I haven't heard
> before. Is that a mathematical term, or do you mean something more like
> "ordinary"?
>
Actually I meant "ordinal" range to be finite, non-empty and
non-contain-empty range. It's not mathematical term. Probably there is some
better word for that, but my english is not strong enough :).

> * There's a lot of code for range_gist_penalty. Rather than having
> special cases for all combinations of properties in the new an original,
> is it possible to use something a little simpler? Maybe just start the
> penalty at zero, and add something for each property of the predicate
> range that must be changed. The penalties added might vary, e.g., if the
> original range has an infinite lower bound, changing it to have an
> infinite upper bound might be a higher penalty.
>
I belive it's possible to make it simplier. I've coded quite intuitively.
Probably, we should select some representive datasets in order to determine
which logic is reasonable by tests.

* It looks like LIMIT_RATIO is not always considered. Should it be?
>
Yes, it's so. In this part I repeat logic of GiST with NULLs. It makes
NULLs to be separated from non-NULLs even if it's produce worse ratio. I'm
not sure about how it should be. It seems to be tradeoff between having
some excess pages and having slightly worse tree.

> * You defined get/set_range_contain_empty, but didn't use them. I think
> this was a merge error, but I removed them. So now there are no changes
> in rangetypes.c.
>
Ok, thanks.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
rangetypegist-0.5.patch.gz application/x-gzip 9.8 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-13 21:11:17
Message-ID: CAPpHfdusUvNn6MHSWkUwPtrCFyX+N7QCsH0bVWm=mHvQ5kEpSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 10, 2011 at 6:14 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> On 12/02/2011 06:48 AM, Alexander Korotkov wrote:
>
>> Rebased with head.
>>
>
> Could you comment a little more on what changed? There were a couple of
> areas Tom commented on:
>
> -General code fixes
>
Expensibe usage of "Max" macro is fixed in 0.5 version of patch.

-"pull out and apply the changes related to the RANGE_CONTAIN_EMPTY flag,
> and also remove the <> opclass entry"
>
It's already done by Tom.

-Subdiff issues
>
> The third one sounded hard to deal with, so presumably nothing there.

As I wrote before, I believe there is some limitation of current GiST
interface. Most likely we're not going to change GiST interface now and
have to do will solution of tradeoff. I think good way to do it is to
select representive datasets and do some tests which will show which logic
is more reasonable. Actually, I need some help with that, because I don't
have enough of datasets.

------
With best regards,
Alexander Korotkov.


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-16 19:36:03
Message-ID: 4EEB9DA3.6000300@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/13/2011 04:04 PM, Alexander Korotkov wrote:
> On Mon, Dec 12, 2011 at 10:41 PM, Jeff Davis <pgsql(at)j-davis(dot)com
> <mailto:pgsql(at)j-davis(dot)com>> wrote:
>
> * There's a lot of code for range_gist_penalty. Rather than having
> special cases for all combinations of properties in the new an
> original,
> is it possible to use something a little simpler? Maybe just start the
> penalty at zero, and add something for each property of the predicate
> range that must be changed. The penalties added might vary, e.g.,
> if the
> original range has an infinite lower bound, changing it to have an
> infinite upper bound might be a higher penalty.
>
> I belive it's possible to make it simplier. I've coded quite
> intuitively. Probably, we should select some representive datasets in
> order to determine which logic is reasonable by tests.

That seems to be a sticking point; you mentioned before that finding
larger data sets useful for your purposes was hard.

I'm not sure where you'll find data fitting your needs here, but it
seems difficult to validate all of what you've done so far without it.
I'm going to mark this one returned and hope you can dig up something
useful to nail this down. You might also describe what it is you're
looking for better and see if anyone else has a suggestion.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-20 09:22:41
Message-ID: CAPpHfduBWP15Ufc6F_W=+VYxQmBdSdXbjz3EygM-Uc-5i60W8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

Studying this question little more I found that current approach of range
indexing can be dramatically inefficient in some cases. It's not because of
penalty or split implementation, but because of approach itself. Mapping
intervals to two-dimensional space produce much better results in case of
high-overlapping ranges and "@>", "<@" operators with low selectivity.

There is a simple test case for proof of concept.

create table source as (select l, (l + s) as r from (select
(random()*10000)::int as l, (random()*1000 + 1)::int s from
generate_series(1,1000000) g) x);
create table range_test as (select int4range(l,r) as x from source);
create table point_test as (select point(l,r) as x from source);
create index range_test_idx on range_test using gist (x);
create index point_test_idx on point_test using gist (x);

test=# explain (analyze, buffers) select * from range_test where x <@
int4range(5000,5010);
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on range_test (cost=40.31..2585.65 rows=1000 width=32)
(actual time=37.304..37.310 rows=2 loops=1)
Recheck Cond: (x <@ '[5000,5010)'::int4range)
Buffers: shared hit=767
-> Bitmap Index Scan on range_test_idx (cost=0.00..40.06 rows=1000
width=0) (actual time=37.288..37.288 rows=2 loops=1)
Index Cond: (x <@ '[5000,5010)'::int4range)
Buffers: shared hit=765
Total runtime: 37.385 ms
(7 rows)

test=# explain (analyze, buffers) select * from point_test where x <@
box(point(5000,5000),point(5010,5010));
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on point_test (cost=44.36..2589.69 rows=1000 width=16)
(actual time=0.197..0.206 rows=2 loops=1)
Recheck Cond: (x <@ '(5010,5010),(5000,5000)'::box)
Buffers: shared hit=5
-> Bitmap Index Scan on point_test_idx (cost=0.00..44.11 rows=1000
width=0) (actual time=0.182..0.182 rows=2 loops=1)
Index Cond: (x <@ '(5010,5010),(5000,5000)'::box)
Buffers: shared hit=3
Total runtime: 0.265 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where x @>
int4range(5000,5990);
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on range_test (cost=40.31..2585.65 rows=1000 width=32)
(actual time=4.578..4.603
rows=5 loops=1)
Recheck Cond: (x @> '[5000,5990)'::int4range)
Buffers: shared hit=52
-> Bitmap Index Scan on range_test_idx (cost=0.00..40.06 rows=1000
width=0) (actual time=4.561..4.561 rows=5 loops=1)
Index Cond: (x @> '[5000,5990)'::int4range)
Buffers: shared hit=47
Total runtime: 4.669 ms
(7 rows)

test=# explain (analyze, buffers) select * from point_test where x <@
box(point('-inf'::float,5990),point(5000,'+inf'::float));
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on point_test (cost=44.36..2589.69 rows=1000 width=16)
(actual time=0.328..0.353 rows=5 loops=1)
Recheck Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
Buffers: shared hit=8
-> Bitmap Index Scan on point_test_idx (cost=0.00..44.11 rows=1000
width=0) (actual time=0.312..0.312 rows=5 loops=1)
Index Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
Buffers: shared hit=3
Total runtime: 0.419 ms
(7 rows)

If you like to learn more information about such mapping you can start from
here: http://www.comsis.org/ComSIS/Vol7No4/RegularPapers/paper16.pdf

Any thoughts?

-----
With best regards,
Alexander Korotkov.


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-22 07:46:46
Message-ID: 1324540006.7608.79.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-12-14 at 01:04 +0400, Alexander Korotkov wrote:
> Hi!
>
Thank you! Attached a few changes:

* Change "ordinal" to "normal" for clarity (at least to me).
* Some comment cleanup
* Change classes_groups to be an enum of SPLIT_LEFT and SPLIT_RIGHT,
rather than using 1 and 2.
* Changed the "bounds_lower" and "bounds_upper" variables into
"by_lower" and "by_upper" to indicate that the arrays are distinguished
by sort order. It was confusing me to read it otherwise.

A few comments:

* In range_gist_picksplit, it would be nice to have a little bit more
intuitive description of what's going on with the nonEmptyCount and
nonInfCount numbers. For instance, it appears to depend on the fact that
a range must either be in nonEmptyCount or in nonInfCount. Also, can you
describe the reason you're multiplying by two and taking the absolute
value? It seems to work, but I am missing the intuition behind those
operations.

* The penalty function is fairly hard to read still. At a high level, I
think we're trying to accomplish a few things (in order from most to
least important):
- Keep normal ranges separate.
- Avoid broadening the class of the original predicate (e.g. turning
single-infinite into double-infinite).
- Avoid broadening (as determined by subtype_diff) the original
predicate.
- Favor adding ranges to narrower original predicates.

Do you agree? If so, perhaps we can attack those more directly and it
might be a little more readable.

Additionally, the arbitrary numbers might become a problem. Can we
choose better constants there? They would still be arbitrary when
compared with real numbers derived from subtype_diff, but maybe we can
still do better than what's there.

* Regarding the leftover "common" entries that can go to either side:
what is the "delta" measure trying to accomplish? When I work through
some examples, it seems to favor putting larger common ranges on the
left (small delta) and smaller common ranges on the right (smaller
delta). Why is that good? Or did I misread the code?

Intuitively, I would think that we'd want to push the ranges with lower
upper bounds to the left and higher lower bounds to the right -- in
other words, recurse. Obviously, we'd need to make sure it terminated at
some point, but splitting the common entries does seem like a smaller
version of the original problem. Thoughts?

Thank you for the helpful comments! It took me a while to work through
the logic, but I would have been lost completely without the comments
around the double sorting split.

Regards,
Jeff Davis

Attachment Content-Type Size
range_gist_changes.diff text/x-patch 22.2 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-22 07:52:18
Message-ID: 1324540338.7608.85.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-12-20 at 13:22 +0400, Alexander Korotkov wrote:
> Hi!
>
>
> Studying this question little more I found that current approach of
> range indexing can be dramatically inefficient in some cases. It's not
> because of penalty or split implementation, but because of approach
> itself. Mapping intervals to two-dimensional space produce much better
> results in case of high-overlapping ranges and "@>", "<@" operators
> with low selectivity.
>
Thank you for testing this. I agree that your approach is much better
especially dealing with widely varying range sizes, etc. My approach
really only tackled the simple (and hopefully common) case when the
ranges are about the same size.

Regards,
Jeff Davis


From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2012-01-24 12:07:15
Message-ID: CAPpHfduv5bwwXgdErZvneX4bT0hDb6MLznk63+Ht9vGKBE2n1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

New version of patch is attached.
On Thu, Dec 22, 2011 at 11:46 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> A few comments:
>
> * In range_gist_picksplit, it would be nice to have a little bit more
> intuitive description of what's going on with the nonEmptyCount and
> nonInfCount numbers. For instance, it appears to depend on the fact that
> a range must either be in nonEmptyCount or in nonInfCount. Also, can you
> describe the reason you're multiplying by two and taking the absolute
> value? It seems to work, but I am missing the intuition behind those
> operations.
>
total_count - 2*nonEmptyCount = (total_count - nonEmptyCount) -
nonEmptyCount = emptyCount - nonEmptyCount
So, it's really not evident. I've simplified it.

> * The penalty function is fairly hard to read still. At a high level, I
> think we're trying to accomplish a few things (in order from most to
> least important):
> - Keep normal ranges separate.
> - Avoid broadening the class of the original predicate (e.g. turning
> single-infinite into double-infinite).
> - Avoid broadening (as determined by subtype_diff) the original
> predicate.
> - Favor adding ranges to narrower original predicates.
>
> Do you agree? If so, perhaps we can attack those more directly and it
> might be a little more readable.
>
> Additionally, the arbitrary numbers might become a problem. Can we
> choose better constants there? They would still be arbitrary when
> compared with real numbers derived from subtype_diff, but maybe we can
> still do better than what's there.
>
I've changes some comments and add constants for penalty values.

> * Regarding the leftover "common" entries that can go to either side:
> what is the "delta" measure trying to accomplish? When I work through
> some examples, it seems to favor putting larger common ranges on the
> left (small delta) and smaller common ranges on the right (smaller
> delta). Why is that good? Or did I misread the code?
>
> Intuitively, I would think that we'd want to push the ranges with lower
> upper bounds to the left and higher lower bounds to the right -- in
> other words, recurse. Obviously, we'd need to make sure it terminated at
> some point, but splitting the common entries does seem like a smaller
> version of the original problem. Thoughts?
>
That was a bug. Actually, no "abs" is needed. Indeed it doesn't affect
result significantly.

-----
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
rangetypegist-0.6.patch.gz application/x-gzip 10.2 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2012-01-29 21:39:48
Message-ID: 1327873188.19000.17.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-01-24 at 16:07 +0400, Alexander Korotkov wrote:
> Hi!
>
>
> New version of patch is attached.

Thank you for the updates. I have a small patch attached.

The only code change I made was very minor: I changed the constants used
in the penalty function because your version used INFINITE_BOUND_PENALTY
when adding an empty range, and that didn't quite make sense to me. If
I'm mistaken you can leave it as-is.

I also attached range-gist-test.sql, which I used for a performance
test. I mix various types of ranges together in a larger table of 1.1M
tuples. And then I create a smaller table that only contains normal
ranges and empty ranges. There are two tests:
1. Create an index on the big table
2. Do a "range join" (using "overlaps" rather than "equals") where the
smaller table is on the outer side of a nested loop join and an index
scan over the larger table on the inner.

The index creation time reduces by a small amount with the patch, from
around 16s without the patch to around 13s with the patch. The query
time, however, dropped from around 26s to around 14s! Almost 2x speedup
with the patch!

Moreover, looking at the loop timing in the explain analyze output, it
goes from about "7..24" ms per loop down to about "1.5..13" ms per loop.
That seems to indicate that the index distribution is better, with more
queries returning quickly.

So, great work Alexander! Very convincing results.

Marking "ready for committer", but please apply my comment fixes at your
discretion.

Regards,
Jeff Davis

PS: the test was run on my workstation (Intel(R) Core(TM) i7-2600 CPU @
3.40GHz) with work_mem=512MB, shared_buffers=512MB, and
checkpoint_segments=32. The rest of the settings were default.

Attachment Content-Type Size
range-gist-comments.patch.gz application/x-gzip 1.2 KB
range-gist-test.sql text/x-sql 1.8 KB

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2012-01-30 07:31:49
Message-ID: CAPpHfdvvpBCUyEjUyRZHPMLqq-XHNgbc+BdBc6sp0hNLyCjRVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 30, 2012 at 1:39 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> Thank you for the updates. I have a small patch attached.
>
> The only code change I made was very minor: I changed the constants used
> in the penalty function because your version used INFINITE_BOUND_PENALTY
> when adding an empty range, and that didn't quite make sense to me. If
> I'm mistaken you can leave it as-is.
>
> I also attached range-gist-test.sql, which I used for a performance
> test. I mix various types of ranges together in a larger table of 1.1M
> tuples. And then I create a smaller table that only contains normal
> ranges and empty ranges. There are two tests:
> 1. Create an index on the big table
> 2. Do a "range join" (using "overlaps" rather than "equals") where the
> smaller table is on the outer side of a nested loop join and an index
> scan over the larger table on the inner.
>
> The index creation time reduces by a small amount with the patch, from
> around 16s without the patch to around 13s with the patch. The query
> time, however, dropped from around 26s to around 14s! Almost 2x speedup
> with the patch!
>
> Moreover, looking at the loop timing in the explain analyze output, it
> goes from about "7..24" ms per loop down to about "1.5..13" ms per loop.
> That seems to indicate that the index distribution is better, with more
> queries returning quickly.
>
> So, great work Alexander! Very convincing results.
>
Great! Thank you for reviewing this patch!

> Marking "ready for committer", but please apply my comment fixes at your
> discretion.
>
Patch with your comment fixes is attached.

-----
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
rangetypegist-0.7.patch.gz application/x-gzip 10.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2012-03-05 03:52:30
Message-ID: 18459.1330919550@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> On Mon, Jan 30, 2012 at 1:39 AM, Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> Marking "ready for committer", but please apply my comment fixes at your
>> discretion.

> Patch with your comment fixes is attached.

Applied with revisions, some cosmetic, some not so much.

regards, tom lane