Index-only quals

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Index-only quals
Date: 2009-08-21 11:43:45
Message-ID: 4A8E8871.9040105@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is an updated version of my patch to return data from b-tree
indexes, and use it to satisfy quals.

I added a new column 'amregurgitate' to pg_am, to mark which indexams
can return index tuples. Also, the data type of the index column in
pg_attribute must match the type of the heap column - this catches the
hack that 'name' is stored as cstring, that I had hardcoded before.

As discussed, GiST/GIN would need more infrastructure to mark which
opclasses can return tuples, but as long as GiST/GIN doesn't support
regurgitation at all, I'm not going to complicate the catalogs with that.

There's also some planner fixes - indexes that are only useful because
of index-only quals are not considered for bitmap scans, and volatile
expressions mustn't be used as index-only quals.

This patch comes in two parts. Indexam API changes, which just splits
the indexam_getnext function into two without providing any new
functionality, and the main patch that applies on top of the indexam API
changes. The patches are also available at
git://git.postgresql.org/git/users/heikki/postgres.git, branches
'indexam-api-changes and 'indexfilter'.

Barring objections, I'm going to apply the indexam API changes part,
since that simplifies the code in question regardless of the rest of the
work. I'm pretty happy now with the indexfilter patch as well, but want
to do some more testing on that before committing. Some more eyeballs
would be appreciated as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
split-index_gettuple-2.patch text/x-diff 27.0 KB
indexfilter-2.patch text/x-diff 51.7 KB

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only quals
Date: 2009-08-21 11:57:05
Message-ID: 407d949e0908210457h5d4b6b94lef1910ac2edd594a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 21, 2009 at 12:43 PM, Heikki
Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
> I added a new column 'amregurgitate' to pg_am, to mark which indexams
> can return index tuples.

Very picturesque but uh, perhaps the more mundane "amcanrettuples"
would be clearer?

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only quals
Date: 2009-08-21 12:13:55
Message-ID: 407d949e0908210513o24812144ncb82df2f3d8b34dc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 21, 2009 at 12:43 PM, Heikki
Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Here is an updated version of my patch to return data from b-tree
> indexes, and use it to satisfy quals.

+ if (!found_clause && useful_pathkeys == NIL && !useful_predicate)
+ ipath->scantype = ST_INDEXSCAN;
+ else
+ {
+ ipath->scantype = 0;
+ if (index->amhasgettuple)
+ ipath->scantype |= ST_INDEXSCAN;
+ if (index->amhasgetbitmap)
+ ipath->scantype |= ST_BITMAPSCAN;
+ }
+

Does this section need to check amhasgettuple for the index-only scan
case as well? It looks like right now if an indexam has amregurgitate
set but not amhasgettuple then weird things could happen.

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only quals
Date: 2009-08-21 12:24:56
Message-ID: 4A8E9218.2070206@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> On Fri, Aug 21, 2009 at 12:43 PM, Heikki
> Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Here is an updated version of my patch to return data from b-tree
>> indexes, and use it to satisfy quals.
>
> + if (!found_clause && useful_pathkeys == NIL && !useful_predicate)
> + ipath->scantype = ST_INDEXSCAN;
> + else
> + {
> + ipath->scantype = 0;
> + if (index->amhasgettuple)
> + ipath->scantype |= ST_INDEXSCAN;
> + if (index->amhasgetbitmap)
> + ipath->scantype |= ST_BITMAPSCAN;
> + }
> +
>
>
> Does this section need to check amhasgettuple for the index-only scan
> case as well? It looks like right now if an indexam has amregurgitate
> set but not amhasgettuple then weird things could happen.

We check earlier in the function before we construct indexonlyQuals that
the index has amhasgettuple. Hmm, can you find an easier-to-understand
way to write that?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only quals
Date: 2009-08-21 12:27:54
Message-ID: 4A8E92CA.7000604@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> It looks like right now if an indexam has amregurgitate
> set but not amhasgettuple then weird things could happen.

The combination (amregurgitate && !amhasgettuple) makes no sense, BTW.
If an indexam has no gettuple function, there's no way it can return
data from the index.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jaime Casanova <jcasanov(at)systemguards(dot)com(dot)ec>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only quals
Date: 2009-08-21 17:28:33
Message-ID: 3073cc9b0908211028h6acca53bld3338ba8fda0b140@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 21, 2009 at 7:27 AM, Heikki
Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Greg Stark wrote:
>> It looks like right now if an indexam has amregurgitate
>> set but not amhasgettuple then weird things could happen.
>
> The combination (amregurgitate && !amhasgettuple) makes no sense, BTW.
> If an indexam has no gettuple function, there's no way it can return
> data from the index.
>

to have two columns that can conflict is not error prone? why not make
amhasgettuple an enum?

--
Atentamente,
Jaime Casanova
Soporte y capacitación de PostgreSQL
Asesoría y desarrollo de sistemas
Guayaquil - Ecuador
Cel. +59387171157


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only quals
Date: 2009-08-22 15:14:27
Message-ID: 9475.1250954067@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Barring objections, I'm going to apply the indexam API changes part,
> since that simplifies the code in question regardless of the rest of the
> work. I'm pretty happy now with the indexfilter patch as well, but want
> to do some more testing on that before committing. Some more eyeballs
> would be appreciated as well.

The first patch definitely needs another editorial pass, eg you have
complicated the behavior of heap_hot_search_buffer's all_dead parameter
without adjusting its documentation. The patch as-submitted is really
quite hard to review, though. It's hard to convince myself that all the
logic you removed from index_getnext is all accounted for elsewhere.
Could you run through the reasoning on that?

I think the second patch needs a fair amount of work. The fact that
_bt_getindextuple can fail to get the right tuple seems quite bogus;
what I think it means is that you've attached the functionality in the
wrong place. nbtree certainly had its hands on the right tuple at some
point, and what you should have done was saved the tuple aside at that
point. I feel it's important that an indexam either be able to give
back tuples or not; this "maybe I can" semantics will prevent us from
doing many interesting things with the capability. (More about that
in an upcoming message.)

ExecStoreIndexTuple seems rather confused as well, as it's applying what
apparently is a heap tupdesc to an index tuple. That might accidentally
fail to fail at the moment, but I think it would be better to keep the
two concepts well separated. Moreover attr-by-attr extraction is not
what you want, for efficiency reasons. Use index_deform_tuple(), now
that that's there. (The internal implementation of that is no better,
but now that there is actually a performance reason to improve it, we
could do that.)

I concur with the objection to "regurgitate" terminology, it seems a
bit yucky.

I haven't had time to go through the planner part in any detail...

regards, tom lane


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only quals
Date: 2009-08-22 20:01:53
Message-ID: 4A904EB1.70100@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Barring objections, I'm going to apply the indexam API changes part,
>> since that simplifies the code in question regardless of the rest of the
>> work. I'm pretty happy now with the indexfilter patch as well, but want
>> to do some more testing on that before committing. Some more eyeballs
>> would be appreciated as well.
>
> The first patch definitely needs another editorial pass, eg you have
> complicated the behavior of heap_hot_search_buffer's all_dead parameter
> without adjusting its documentation.

Ok, I'll take a look.

> The patch as-submitted is really
> quite hard to review, though. It's hard to convince myself that all the
> logic you removed from index_getnext is all accounted for elsewhere.
> Could you run through the reasoning on that?

The removed logic in index_getnext() dealt with HOT chains. As the code
stands without the patch, index_getnext() returns all visible tuples
from a HOT chain, and therefore needs bookkeeping to keep track of the
current tuple in the current HOT chain.

With a normal MVCC snapshot, only one tuple in a HOT chain can be
visible, so all that logic is unnecessary for a MVCC snapshot. The same
holds for SnapshotSelf and SnapshotNow, but not for SnapshotAny.
However, there is only caller of index_getnext() that uses SnapshotAny:
CLUSTER. I modified heap_hot_search_buffer() so that it can be used to
continue searching a HOT chain after the first visible tuple, and
modified cluster.c to use that for the HOT chain following. This allowed
me to dumb down index_next()/index_fetch() so that index_fetch() only
needs to find the first visible tuple in a HOT chain.

> I think the second patch needs a fair amount of work. The fact that
> _bt_getindextuple can fail to get the right tuple seems quite bogus;
> what I think it means is that you've attached the functionality in the
> wrong place. nbtree certainly had its hands on the right tuple at some
> point, and what you should have done was saved the tuple aside at that
> point. I feel it's important that an indexam either be able to give
> back tuples or not; this "maybe I can" semantics will prevent us from
> doing many interesting things with the capability. (More about that
> in an upcoming message.)

Yeah, I think you're right. That seemed like the quickest way to get it
done, but I'm seeing that it's also pretty bad from a performance
standpoint - the index page needs to be locked and unlocked for each
tuple, which in a quick microbenchmark makes the index scan slower than
a seqscan when the data is in cache. I'm seeing about 10-15% of CPU time
spent on that locking, and the index scan is about that much slower than
the seq scan.

That can certainly be improved. Unfortunately we can't just return
pointers to the pinned index page in shared buffer, because a pin
doesn't stop page splits or moving tuples like it does on heap pages.
But we can copy all matching tuples to local memory along with the
matching TIDs when we step on a page.

> I concur with the objection to "regurgitate" terminology, it seems a
> bit yucky.

Apparently that word evokes stronger images in native speakers :-).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only quals
Date: 2010-02-23 17:11:37
Message-ID: 201002231711.o1NHBbZ17363@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I added this URL to the existing TODO item.

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> Here is an updated version of my patch to return data from b-tree
> indexes, and use it to satisfy quals.
>
> I added a new column 'amregurgitate' to pg_am, to mark which indexams
> can return index tuples. Also, the data type of the index column in
> pg_attribute must match the type of the heap column - this catches the
> hack that 'name' is stored as cstring, that I had hardcoded before.
>
> As discussed, GiST/GIN would need more infrastructure to mark which
> opclasses can return tuples, but as long as GiST/GIN doesn't support
> regurgitation at all, I'm not going to complicate the catalogs with that.
>
> There's also some planner fixes - indexes that are only useful because
> of index-only quals are not considered for bitmap scans, and volatile
> expressions mustn't be used as index-only quals.
>
>
> This patch comes in two parts. Indexam API changes, which just splits
> the indexam_getnext function into two without providing any new
> functionality, and the main patch that applies on top of the indexam API
> changes. The patches are also available at
> git://git.postgresql.org/git/users/heikki/postgres.git, branches
> 'indexam-api-changes and 'indexfilter'.
>
> Barring objections, I'm going to apply the indexam API changes part,
> since that simplifies the code in question regardless of the rest of the
> work. I'm pretty happy now with the indexfilter patch as well, but want
> to do some more testing on that before committing. Some more eyeballs
> would be appreciated as well.
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +