Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, tgl(at)sss(dot)pgh(dot)pa(dot)us
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: GiST for range types (was Re: Range Types - typo + NULL string constructor)
Date: 2011-12-20 09:22:41
Message-ID: CAPpHfduBWP15Ufc6F_W=+VYxQmBdSdXbjz3EygM-Uc-5i60W8Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

Studying this question little more I found that current approach of range
indexing can be dramatically inefficient in some cases. It's not because of
penalty or split implementation, but because of approach itself. Mapping
intervals to two-dimensional space produce much better results in case of
high-overlapping ranges and "@>", "<@" operators with low selectivity.

There is a simple test case for proof of concept.

create table source as (select l, (l + s) as r from (select
(random()*10000)::int as l, (random()*1000 + 1)::int s from
generate_series(1,1000000) g) x);
create table range_test as (select int4range(l,r) as x from source);
create table point_test as (select point(l,r) as x from source);
create index range_test_idx on range_test using gist (x);
create index point_test_idx on point_test using gist (x);

test=# explain (analyze, buffers) select * from range_test where x <@
int4range(5000,5010);
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on range_test (cost=40.31..2585.65 rows=1000 width=32)
(actual time=37.304..37.310 rows=2 loops=1)
Recheck Cond: (x <@ '[5000,5010)'::int4range)
Buffers: shared hit=767
-> Bitmap Index Scan on range_test_idx (cost=0.00..40.06 rows=1000
width=0) (actual time=37.288..37.288 rows=2 loops=1)
Index Cond: (x <@ '[5000,5010)'::int4range)
Buffers: shared hit=765
Total runtime: 37.385 ms
(7 rows)

test=# explain (analyze, buffers) select * from point_test where x <@
box(point(5000,5000),point(5010,5010));
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on point_test (cost=44.36..2589.69 rows=1000 width=16)
(actual time=0.197..0.206 rows=2 loops=1)
Recheck Cond: (x <@ '(5010,5010),(5000,5000)'::box)
Buffers: shared hit=5
-> Bitmap Index Scan on point_test_idx (cost=0.00..44.11 rows=1000
width=0) (actual time=0.182..0.182 rows=2 loops=1)
Index Cond: (x <@ '(5010,5010),(5000,5000)'::box)
Buffers: shared hit=3
Total runtime: 0.265 ms
(7 rows)

test=# explain (analyze, buffers) select * from range_test where x @>
int4range(5000,5990);
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on range_test (cost=40.31..2585.65 rows=1000 width=32)
(actual time=4.578..4.603
rows=5 loops=1)
Recheck Cond: (x @> '[5000,5990)'::int4range)
Buffers: shared hit=52
-> Bitmap Index Scan on range_test_idx (cost=0.00..40.06 rows=1000
width=0) (actual time=4.561..4.561 rows=5 loops=1)
Index Cond: (x @> '[5000,5990)'::int4range)
Buffers: shared hit=47
Total runtime: 4.669 ms
(7 rows)

test=# explain (analyze, buffers) select * from point_test where x <@
box(point('-inf'::float,5990),point(5000,'+inf'::float));
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on point_test (cost=44.36..2589.69 rows=1000 width=16)
(actual time=0.328..0.353 rows=5 loops=1)
Recheck Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
Buffers: shared hit=8
-> Bitmap Index Scan on point_test_idx (cost=0.00..44.11 rows=1000
width=0) (actual time=0.312..0.312 rows=5 loops=1)
Index Cond: (x <@ '(5000,inf),(-inf,5990)'::box)
Buffers: shared hit=3
Total runtime: 0.419 ms
(7 rows)

If you like to learn more information about such mapping you can start from
here: http://www.comsis.org/ComSIS/Vol7No4/RegularPapers/paper16.pdf

Any thoughts?

-----
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2011-12-20 10:13:27 Re: JSON for PG 9.2
Previous Message Nikhil Sontakke 2011-12-20 06:14:29 Re: Review: Non-inheritable check constraints