Quick Links

Re: Index-only-scans, indexam API changes

Lists:	pgsql-hackers

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Index-only-scans, indexam API changes
Date:	2009-07-13 07:19:02
Message-ID:	4A5ADFE6.6060507@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

At the moment, amgettuple only returns pointers to heap tuples. There is
no way to return data from the index tuples. That needs to be changed to
support index-only scans.

I propose that we split index_getnext into two steps: fetching the next
match from the index (index_next()), and fetching the corresponding heap
tuple (index_fetch()). Patch attached. There is still a shorthand
index_getnext() that is compatible with the old function, but it now
just calls index_next()+index_fetch().

The new index_fetch() function can only fetch one match from a HOT
chain. That greatly simplifies the code in indexam.c. The only caller
needing to fetch more than one tuple from a HOT chain (= using
SnapshotAny) is CLUSTER, so I moved the HOT-chain following logic into
cluster.c, with small changes to heap_hot_search_buffer().

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
split-index_gettuple-1.patch	text/x-diff	27.0 KB

From:	Greg Stark <gsstark(at)mit(dot)edu>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Index-only-scans, indexam API changes
Date:	2009-07-13 09:02:45
Message-ID:	407d949e0907130202s33124d3drea41fb66dddd2ea6@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 13, 2009 at 8:19 AM, Heikki
Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
> I propose that we split index_getnext into two steps: fetching the next
> match from the index (index_next()), and fetching the corresponding heap
> tuple (index_fetch()).

A pretty trivial concern, but it seems confusing that the function to
fetch a tuple from the heap is called "index_fetch()"... Perhaps
something more explicit like index_fetch_heap_tuple() would be
clearer?

--
greg
http://mit.edu/~gsstark/resume.pdf

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Index-only-scans, indexam API changes
Date:	2009-07-13 14:18:49
Message-ID:	22904.1247494729@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> At the moment, amgettuple only returns pointers to heap tuples. There is
> no way to return data from the index tuples. That needs to be changed to
> support index-only scans.

What are you going to do for index types that don't store the original
data (e.g. hash)?

regards, tom lane

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Index-only-scans, indexam API changes
Date:	2009-07-13 15:03:34
Message-ID:	4A5B4CC6.9020702@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> At the moment, amgettuple only returns pointers to heap tuples. There is
>> no way to return data from the index tuples. That needs to be changed to
>> support index-only scans.
>
> What are you going to do for index types that don't store the original
> data (e.g. hash)?

They will obviously not be able to regurgitate index tuples. I have not
yet decided how that's going to be signaled. In the prototype patch I
have, I have hard-coded that only b-trees can do that. A new column
"amcanreturntuples" column in pg_am seems like the most straightforward way.

(This indexam API patch isn't bothered with that yet. It just splits
index_gettuple() into two)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject:	Re: Index-only-scans, indexam API changes
Date:	2009-07-13 15:16:23
Message-ID:	24049.1247498183@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Tom Lane wrote:
>> What are you going to do for index types that don't store the original
>> data (e.g. hash)?

> They will obviously not be able to regurgitate index tuples. I have not
> yet decided how that's going to be signaled.

Well, I think that's a pretty critical part of the API change.

> (This indexam API patch isn't bothered with that yet. It just splits
> index_gettuple() into two)

One thought here is that an AM call isn't really free, and doing two of
them instead of one mightn't be such a good idea. I would suggest
either having a separate AM entry point to get both bits of data
("amgettupledata"?) or adding an optional parameter to amgettuple.
A small attraction of the alternate entry point is that then you can
flag the "unsupported" case by putting a zero in that pgam column, as
indeed we already do for amgettuple; so you don't need an additional
bool column.

[ thinks a bit ... ] At least for GIST, it is possible that whether
data can be regurgitated will vary depending on the selected opclass.
Some opclasses use the STORAGE modifier and some don't. I am not sure
how hard we want to work to support flexibility there. Would it be
sufficient to hard-code the check as "pgam says the AM can do it,
and the opclass does not have a STORAGE property"? Or do we need
additional intelligence about GIST opclasses?

regards, tom lane

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject:	Re: Index-only-scans, indexam API changes
Date:	2009-07-13 15:32:22
Message-ID:	4A5B5386.3070100@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> One thought here is that an AM call isn't really free, and doing two of
> them instead of one mightn't be such a good idea. I would suggest
> either having a separate AM entry point to get both bits of data
> ("amgettupledata"?) or adding an optional parameter to amgettuple.

I'm thinking of adding a new flag to IndexScanDesc, "needIndexTuple".
When that is set, amgettuple stores a pointer to the IndexTuple in
another new field in IndexScanDesc, xs_itup. (In case of b-tree at
least, the IndexTuple is a palloc'd copy of the tuple on disk)

> [ thinks a bit ... ] At least for GIST, it is possible that whether
> data can be regurgitated will vary depending on the selected opclass.
> Some opclasses use the STORAGE modifier and some don't. I am not sure
> how hard we want to work to support flexibility there. Would it be
> sufficient to hard-code the check as "pgam says the AM can do it,
> and the opclass does not have a STORAGE property"? Or do we need
> additional intelligence about GIST opclasses?

Well, the way I have it implemented is that the am is not required to
return the index tuple, even if requested. I implemented the B-tree
changes similar to how we implement "kill_prior_tuple": in btgettuple,
lock the index page and see if the tuple is still at the same position
that we remembered in the scan opaque struct. If not (because of
concurrent changes to the page), we give up and don't return the index
tuple. The executor will then perform a heap fetch as before.

Alternatively, we could copy all the matching index tuples to private
memory when we step to a new index page, but that seems pretty expensive
if we're only going to use the first few matching tuples (LIMIT), and I
fear the memory management gets complicated. But in any case, the GiST
issue would still be there.

Since we're discussing it, I'm attaching the prototype patch I have for
returning tuples from b-tree and using them to filter rows before heap
fetches. I was going to post it in the morning along with description
about the planner and executor changes, but here goes. It applies on top
of the indexam API patch I started this thread with.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
indexfilter-1.patch	text/x-diff	40.6 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject:	Re: Index-only-scans, indexam API changes
Date:	2009-07-21 17:10:57
Message-ID:	603c8f070907211010v266bcf2w744bfd533272a7a0@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jul 13, 2009 at 11:32 AM, Heikki
Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Tom Lane wrote:
>> One thought here is that an AM call isn't really free, and doing two of
>> them instead of one mightn't be such a good idea. I would suggest
>> either having a separate AM entry point to get both bits of data
>> ("amgettupledata"?) or adding an optional parameter to amgettuple.
>
> I'm thinking of adding a new flag to IndexScanDesc, "needIndexTuple".
> When that is set, amgettuple stores a pointer to the IndexTuple in
> another new field in IndexScanDesc, xs_itup. (In case of b-tree at
> least, the IndexTuple is a palloc'd copy of the tuple on disk)
>
>> [ thinks a bit ... ] At least for GIST, it is possible that whether
>> data can be regurgitated will vary depending on the selected opclass.
>> Some opclasses use the STORAGE modifier and some don't. I am not sure
>> how hard we want to work to support flexibility there. Would it be
>> sufficient to hard-code the check as "pgam says the AM can do it,
>> and the opclass does not have a STORAGE property"? Or do we need
>> additional intelligence about GIST opclasses?
>
> Well, the way I have it implemented is that the am is not required to
> return the index tuple, even if requested. I implemented the B-tree
> changes similar to how we implement "kill_prior_tuple": in btgettuple,
> lock the index page and see if the tuple is still at the same position
> that we remembered in the scan opaque struct. If not (because of
> concurrent changes to the page), we give up and don't return the index
> tuple. The executor will then perform a heap fetch as before.
>
> Alternatively, we could copy all the matching index tuples to private
> memory when we step to a new index page, but that seems pretty expensive
> if we're only going to use the first few matching tuples (LIMIT), and I
> fear the memory management gets complicated. But in any case, the GiST
> issue would still be there.
>
> Since we're discussing it, I'm attaching the prototype patch I have for
> returning tuples from b-tree and using them to filter rows before heap
> fetches. I was going to post it in the morning along with description
> about the planner and executor changes, but here goes. It applies on top
> of the indexam API patch I started this thread with.

I'm not sure where we are on this patch for reviewing purposes.
Heikki, are you planning to provide an updated patch? Or what should
we be doing here from an RRR standpoint?

...Robert

From:	Teodor Sigaev <teodor(at)sigaev(dot)ru>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject:	Re: Index-only-scans, indexam API changes
Date:	2009-07-28 15:46:31
Message-ID:	4A6F1D57.1000204@sigaev.ru
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

GiST: btree_gist uses STORAGE option, although original value is accessible from
index's tuple.

GIN doesn't require setting of STORAGE option even if it's impossible to
reconstruct original heap value from indexed value. Right now, only btree_gin's
opclasses could be used for index only scans (and only for single-column index
scan!).

So, STORAGE option could not indicate "reconstruct-ability" original heap value
:(. It seems to me, opclass definition could contain boolean field about that,
but with STORAGE option specified it's needed to have separate "reconstructing"
interface method. IMHO, it's too complex for now and it doesn't give significant
benefits.

Although index-only scan over GIN/GiST could be useful for some aggregates
queries like count(*).

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/