Fix for gistchoose

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Fix for gistchoose
Date: 2012-08-20 16:57:20
Message-ID: CAPpHfdunMMv3qUv-MXTzVwWP4L9mc-A9We_OKGQEB19pse6nGQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I found that code of gistchoose doesn't follow it's logic. Idea of
gistchoose is that first column penalty is more important than penalty of
second column. If we meet same penalty values of first column then we
choose minimal penalty value of second column among them.

Current code doesn't follow described logic. Let's illustrate it on
example. Imagine we've following input data.

tuple col1 col2
1 1 0
2 0 2
3 0 1

According to function logic, tuple #3 should be selected. But gistchoose
will choose #2, because after processing #2 which_grow will contain two
zeros. It's enough to remove "&& i == FirstOffsetNumber" from gistchoose in
order to fix it. I've discussed it with Teodor in person and he have
confirmed my thoughts.

Let's consider an example.

Create table with two columns where first column contain only few distinct
values.
test=# create table test as (select (random()*5)::integer, random() from
generate_series(1,1000000));
test=# create index test_idx on test using gist (x,y);

Without patch:

test=# explain (analyze, buffers) select * from test where x = 1 and y
between 0.1 and 0.2;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=986.89..6737.30 rows=19681 width=12)
(actual time=40.543..52.033 rows=19894 loops=1)
Recheck Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <=
0.2::double precision))
Buffers: shared hit=5957
-> Bitmap Index Scan on test_idx (cost=0.00..981.96 rows=19681
width=0) (actual time=39.248..39.248 rows=19894 loops=1)
Index Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <=
0.2::double precision))
Buffers: shared hit=*699*
Total runtime: 53.494 ms
(7 rows)

test=# explain (analyze, buffers) select * from test where x = 1 and y
between 0.5 and 0.6;
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=998.29..6753.38 rows=19948 width=12)
(actual time=38.646..50.136 rows=19830 loops=1)
Recheck Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <=
0.6::double precision))
Buffers: shared hit=6243
-> Bitmap Index Scan on test_idx (cost=0.00..993.30 rows=19948
width=0) (actual time=37.342..37.342 rows=19830 loops=1)
Index Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <=
0.6::double precision))
Buffers: shared hit=*974*
Total runtime: 51.561 ms
(7 rows)

With patch

test=# explain (analyze, buffers) select * from test2 where x = 1 and y
between 0.1 and 0.2;
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test2 (cost=889.96..6645.37 rows=19966 width=12)
(actual time=17.761..29.499 rows=19894 loops=1)
Recheck Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <=
0.2::double precision))
Buffers: shared hit=5436
-> Bitmap Index Scan on test2_idx (cost=0.00..884.97 rows=19966
width=0) (actual time=16.598..16.598 rows=19894 loops=1)
Index Cond: ((x = 1) AND (y >= 0.1::double precision) AND (y <=
0.2::double precision))
Buffers: shared hit=*178*
Total runtime: 30.996 ms
(7 rows)

test=# explain (analyze, buffers) select * from test2 where x = 1 and y
between 0.5 and 0.6;
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test2 (cost=885.34..6639.89 rows=19917 width=12)
(actual time=21.734..33.655 rows=19830 loops=1)
Recheck Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <=
0.6::double precision))
Buffers: shared hit=5452
-> Bitmap Index Scan on test2_idx (cost=0.00..880.36 rows=19917
width=0) (actual time=20.604..20.604 rows=19830 loops=1)
Index Cond: ((x = 1) AND (y >= 0.5::double precision) AND (y <=
0.6::double precision))
Buffers: shared hit=*183*
Total runtime: 35.136 ms
(7 rows)

You can see that difference in number of read buffers during index scan is
pretty significant.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gistchoose_fix-1.patch application/octet-stream 1.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2012-08-20 17:13:16 Re: gistchoose vs. bloat
Previous Message Rushabh Lathia 2012-08-20 16:46:44 Re: Primary Key Constraint on inheritance table not getting route to child tables