Re: overlaps performance

Lists: pgsql-hackers
From: Grzegorz Jaśkiewicz <gj(at)pointblue(dot)com(dot)pl>
To: pgsql-hackers(at)postgresql(dot)org
Subject: overlaps performance
Date: 2008-07-21 11:08:50
Message-ID: 48846E42.4050404@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hey guys,

I am asking here, because I know there is bunch of people here that know
the topic very well.
I need to use few 'overlaps' for timedate in my query. And I guess it is
quite inefficient there. So my question would be, why isn't postgresql
using indexes for OVERLAPS, and why optimizer doesn't substitute it with
something like:

(c <= a AND d > a) OR ( c >= a AND c < b)

instead of

(a,b) overlaps (c,d)

any corner cases, or particular reasons ?

(source of example)
http://www.depesz.com/index.php/2007/11/21/find-overlapping-time-ranges/

thanks.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Grzegorz Jaśkiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: overlaps performance
Date: 2008-07-21 13:46:55
Message-ID: 877ibf73jk.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Grzegorz Jaśkiewicz <gj(at)pointblue(dot)com(dot)pl> writes:

> So my question would be, why isn't postgresql using indexes for OVERLAPS,
> and why optimizer doesn't substitute it with something like:
>
> (c <= a AND d > a) OR ( c >= a AND c < b)

How would you use an index for that?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!


From: Grzegorz Jaśkiewicz <gj(at)pointblue(dot)com(dot)pl>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: overlaps performance
Date: 2008-07-21 14:29:09
Message-ID: 48849D35.1090100@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark pisze:
> Grzegorz Jaśkiewicz <gj(at)pointblue(dot)com(dot)pl> writes:
>
>> So my question would be, why isn't postgresql using indexes for OVERLAPS,
>> and why optimizer doesn't substitute it with something like:
>>
>> (c <= a AND d > a) OR ( c >= a AND c < b)
>
> How would you use an index for that?
>
check Depesz'es article for that. I included it at the bottom of my email.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Grzegorz Jaśkiewicz <gj(at)pointblue(dot)com(dot)pl>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: overlaps performance
Date: 2008-07-21 14:44:12
Message-ID: 13304.1216651452@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Grzegorz Jakiewicz <gj(at)pointblue(dot)com(dot)pl> writes:
>> So my question would be, why isn't postgresql using indexes for OVERLAPS,
>> and why optimizer doesn't substitute it with something like:
>>
>> (c <= a AND d > a) OR ( c >= a AND c < b)

> How would you use an index for that?

I believe you can index overlaps-like tests using GIST on an
interval-like data type --- look at contrib/seg for an example.

The reason we don't automatically translate OVERLAPS is that the spec's
definition of OVERLAPS is too weird for that to work; in particular
it demands a true result for some cases in which one of the four
endpoints is NULL, which'd be pretty hard to do with an interval-style
index.

regards, tom lane


From: Grzegorz Jaśkiewicz <gj(at)pointblue(dot)com(dot)pl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: overlaps performance
Date: 2008-07-21 15:09:28
Message-ID: 4884A6A8.90401@pointblue.com.pl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane pisze:
> The reason we don't automatically translate OVERLAPS is that the spec's
> definition of OVERLAPS is too weird for that to work; in particular
> it demands a true result for some cases in which one of the four
> endpoints is NULL, which'd be pretty hard to do with an interval-style
> index.
shame, I just work on a thing that would benefit from index that could
be used in OVERLAPS. I don't know psql internals , except for how GiST
works, hence my question.

thanks for the answer.


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Grzegorz Jaśkiewicz <gj(at)pointblue(dot)com(dot)pl>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: overlaps performance
Date: 2008-07-22 09:15:45
Message-ID: 87wsje2sam.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Grzegorz Jaśkiewicz <gj(at)pointblue(dot)com(dot)pl> writes:

> Tom Lane pisze:
>> The reason we don't automatically translate OVERLAPS is that the spec's
>> definition of OVERLAPS is too weird for that to work; in particular
>> it demands a true result for some cases in which one of the four
>> endpoints is NULL, which'd be pretty hard to do with an interval-style
>> index.
>
> shame, I just work on a thing that would benefit from index that could be used
> in OVERLAPS. I don't know psql internals , except for how GiST works, hence my
> question.

Ah, but the transformation given is actually a bit of a red herring. If you
look at the plan it's doing two bitmap index scans which together are actually
effectively doing a full index scan. The benefit comes from applying the full
overlap condition to the index tuples and only scanning the heap for matching
tuples. Presumably this index is much smaller than the table and/or cached in
memory so the random accesses are outweighed by the lower i/o.

This does raise the possibility that we should check for index scan paths if
we have selective enough columns even if the pathkeys aren't a prefix of the
index pathkeys. We would have to do a full index scan but the cost might still
be lower.

I think the reason we don't (aside from it not being at all useful in he past)
is that it would lead to a lot of possible index scans being considered.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!