Re: Index-only scans for GIST

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only scans for GIST
Date: 2014-08-06 20:30:38
Message-ID: 53E2906E.8080707@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 08/01/2014 10:58 AM, Anastasia Lubennikova wrote:
> Hi, hackers!
> I work on a GSoC project "Index-only scans for GIST"
> https://wiki.postgresql.org/wiki/Support_for_Index-only_scans_for_GIST_GSoC_2014
>
> Repository is
> https://github.com/lubennikovaav/postgres/tree/indexonlygist2
> Patch is in attachments.

Thanks!

Some comments:

* I got this compiler warning:

gistget.c:556:5: warning: ISO C90 forbids mixed declarations and code
[-Wdeclaration-after-statement]
ListCell *tmpPageData = so->curPageData;
^

* I'm getting two regression failures with this (opr_sanity and join).

* After merging with master, build fails because of duplicate OIDs.

* The regression test queries that use LIMIT are not guaranteed to
always return the same rows, hence they're not very good regression test
cases. I'd suggest using more restricting WHERE clauses, so that each
query only returns a handful of rows.

* What's the reason for turning GISTScanOpaqueData.pageData from an
array to a List?

* I think it's leaking memory, in GIST scan context. I tested this with
a variant of the regression tests:

insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
0.05*i)),
point(0.05*i, 0.05*i) FROM generate_series(0,
10000000) as i;
CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);

set enable_seqscan=off;
set enable_bitmapscan=off;

explain analyze select p from gist_tbl where p <@ box(point(0,0),
point(9999999,9999999)) and length(p::text) < 10;

while the final query runs, 'top' shows constantly increasing memory usage.

> It includes index-only scans for multicolumn GIST and new regression test.
> Fetch() method is realized for box and point opclasses.

Can we have Fetch functions for all the datatypes in btree_gist contrib
module, please? Do other contrib modules contain GiST opclasses that
could have Fetch functions?

> Documentation is not updated yet, but I'm going to do it till the end of
> GSoC.
>
> I've got one question about query with OR condition. It is the last query
> in regression test "gist_indexonly". It doesn't fail but it doensn't use
> index-only scans. Could someone explain to me how it works?
> It seems to depend on build_paths_for_OR
> <http://doxygen.postgresql.org/indxpath_8c.html#ae660d2e886355e53ed3b9ec693e4afd2>
> function.
> But I couldn't understand how.

The query is:

select * from gist_tbl
where b <@ box(point(5,5), point(6,6))
or p <@ box(point(0,0), point(100,100)) limit 10;

It cannot use an index(-only) scan for this, because a single index scan
can only return rows based on one key. In this case, you need to do two
scans, and then return the rows returned by either scan, removing
duplicates. A bitmap scan is possible, because it can remove the
duplicates, but the planner can't produce a plain index scan plan that
would do the same.

A common trick when that happens in a real-world application is to
re-write the query using UNION:

select * from gist_tbl
where b <@ box(point(5,5), point(6,6))
UNION
select * from gist_tbl
where p <@ box(point(0,0), point(100,100))
limit 10;

Although that doesn't seem to actually work:

ERROR: could not identify an equality operator for type box
LINE 1: select * from gist_tbl
^

but that's not your patch's fault, the same happens with unpatched master.

IOW, you don't need to worry about that case.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-08-06 20:45:28 Re: select_common_type()'s behavior doesn't match the documentation
Previous Message Pavel Stehule 2014-08-06 20:18:37 Re: PostrgeSQL vs oracle doing 1 million sqrts am I doing it wrong?