Fix for cube picksplit function

Lists: pgsql-hackers
From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: greg(at)turnstep(dot)com, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Fix for cube picksplit function
Date: 2010-10-29 18:59:41
Message-ID: AANLkTimC8W6guHpWJeWdjQA6WGoVH-7qG9Ar4pem2N2V@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hackers,

Gist picksplit method implementation in cube contrib module contains a bug
in Guttman's split algorithm implementation. It was mentioned before but it
is still not fixed yet. Current implementation doesn't cause incorrect
behavior, but index becomes very bad, because each index scan require to
read significant part of index.

test=# create table test (a cube);
test=# insert into test (select cube(array[random(),random(),random()]) from
generate_series(1,1000000));
test=# create index test_cube_idx on test using gist(a);

Before this patch, index scan of relatively small area requires read of 1235
index pages.

test=# explain (analyze,buffers) select * from test where a <@
cube(array[0.1,0.1,0.1],array[0.15,0.15,0.15]);
QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=88.77..3046.68 rows=1000 width=56) (actual
time=17.497..18.188 rows=120 loops=1)
Recheck Cond: (a <@ '(0.1, 0.1, 0.1),(0.15, 0.15, 0.15)'::cube)
Buffers: shared hit=1354
-> Bitmap Index Scan on test_cube_idx (cost=0.00..88.52 rows=1000
width=0) (actual time=17.428..17.428 rows=120 loops=1)
Index Cond: (a <@ '(0.1, 0.1, 0.1),(0.15, 0.15, 0.15)'::cube)
Buffers: shared hit=1235
Total runtime: 18.407 ms
(7 rows)

Time: 19,501 ms

After this patch, index scan of same area requires read of only 44 index
pages.

test=# explain (analyze,buffers) select * from test where a <@
cube(array[0.1,0.1,0.1],array[0.15,0.15,0.15]);
QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on test (cost=76.66..3034.57 rows=1000 width=56) (actual
time=0.828..1.543 rows=120 loops=1)
Recheck Cond: (a <@ '(0.1, 0.1, 0.1),(0.15, 0.15, 0.15)'::cube)
Buffers: shared hit=163
-> Bitmap Index Scan on test_cube_idx (cost=0.00..76.41 rows=1000
width=0) (actual time=0.767..0.767 rows=120 loops=1)
Index Cond: (a <@ '(0.1, 0.1, 0.1),(0.15, 0.15, 0.15)'::cube)
Buffers: shared hit=44
Total runtime: 1.759 ms
(7 rows)

Seg contrib module and pg_temporal project contain same bug. But there are
another fix for them, because there is no need to use Guttman's split
algorithm at all in unidimensional case.
I'm going to add this patch to commitfest.

----
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
cube_picksplit.patch text/x-patch 361 bytes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, greg(at)turnstep(dot)com
Subject: Re: Fix for cube picksplit function
Date: 2010-11-15 02:34:23
Message-ID: AANLkTikroS6akLbKQNrqiU9eEz7etCFhH=3yJ5zovqg3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Oct 29, 2010 at 2:59 PM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com> wrote:
> Gist picksplit method implementation in cube contrib module contains a bug
> in Guttman's split algorithm implementation. It was mentioned before but it
> is still not fixed yet. Current implementation doesn't cause incorrect
> behavior, but index becomes very bad, because each index scan require to
> read significant part of index.

It looks to me like this can be safely back-patched, because it
doesn't affect the contents of index entries already on disk, just the
manner in which we construct new ones.

So I've committed this and back-patched it all the way back to 8.1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company