Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date: 2009-09-14 09:41:55
Message-ID: 4AAE0FE3.40600@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> It strikes me that in the cases where it wouldn't be necessary to
>> compute junk sort-key columns, it would be because we were scanning an
>> index that includes those values. So if the plan were set up to pull
>> those values from the index and return them, then we'd not have to add
>> this extra complexity to grouping_planner --- the argument that it's not
>> worth it to get rid of the junk columns comes back into play. Moreover,
>> such an ability would also mean that if the user *does* ask for the
>> sort column value as output (ie it's not resjunk), we can still satisfy
>> the query from the index without recomputing the expensive function.
>>
>> So this is where we come to the connection to Heikki's index-only-quals
>> patch. As submitted, that code is only able to use an index column in
>> a scan qual, it's not able to return it as part of the scan result.
>> This example makes it clear that that definition is missing a large
>> part of the potential benefit of an index value extraction capability.
>>
>> To be able to do anything along that line would require some more work
>> in the executor and a *lot* more work in the planner, and I'm honestly
>> not sure what the planner part of it would look like.
>
> I think we should separate the Heap Fetch operation from the IndexScan.

I've been hacking on that approach. It's quite unfinished, but before I
spend any more time on it, I'd like to get some feedback on the overall
design.

The attached patch can create plans where quals are checked and joins
are performed using values from indexes only, and the heap tuples are
fetched only for matching rows. Passes regression tests, but code is
quite ugly at points. Cost estimation is bogus. The patch builds on the
indexam-api-changes patch I posted earlier, which is also attached. I
haven't yet done the changes to that patch that were discussed.

I haven't done any performance testing. The overhead of an extra
executor node for each index scan could slow down simple queries, we
might need to compensate that somehow, maybe reintroduce a fastpath
combined IndexScan+HeapFetch node. I'm also afraid the extra work I've
pushed to the stage where Paths are constructed could slow down planning
quite a bit if you have a lot of indexes.

Path nodes now carry a targetlist. That's because when you have a path like:

HeapFetch
-> Join
...

You won't have all the columns of the join rel available at the join
node yet, because they will be fetched in the HeapFetch node above. The
targetlist in Path nodes reflect that, and the targetlist of the final
Plan nodes are created from the targetlists in the Path nodes instead of
the ones in RelOptInfos.

Per earlier discussion, I changed the way index tuple fetching works in
B-tree, so that it can now be relied on. Matching index tuples are
copied to backend-local memory when the scan steps on a page.

Var nodes that refer to index columns (indexquals and the new index-only
filters) now have a new field, varindexno, set. While we could've
continued with the old representation, now that we have more expressions
that refer to index vars instead of heap vars, this makes debugging easier.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
indexam-api-changes-2.patch text/x-diff 23.2 KB
heapfetch-1.patch text/x-diff 118.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Itagaki Takahiro 2009-09-14 09:58:55 Re: Triggers on columns
Previous Message Jeff Davis 2009-09-14 09:33:59 Re: WIP: generalized index constraints