Quick Links

Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000

Lists:	pgsql-hackers

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:	2009-08-22 15:59:07
Message-ID:	9957.1250956747@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I've been looking at bug #5000 (must be some kind of milestone), in
which the complaint was that the planner won't use an indexscan on a
functional index to satisfy an ORDER BY. Of course it *can* do that,
it's just not being very bright about it. Consider the following
example in the regression database, using all default settings:

regression=# create or replace function foo(int) returns int as
regression-# 'begin return $1; end' language plpgsql immutable strict
regression-# cost 10000;
CREATE FUNCTION
regression=# create index fooi on tenk1 (foo(unique1));
CREATE INDEX
regression=# explain verbose select * from tenk1 order by foo(unique1);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using fooi on public.tenk1 (cost=0.00..251702.25 rows=10000 width=244)
Output: unique1, unique2, two, four, ten, twenty, hundred, thousand, twothousand, fivethous, tenthous, odd, even, stringu1, stringu2, string4, foo(unique1)
(2 rows)

The first thing you'll notice is that the cost estimate is supposing
that the expensive function gets re-evaluated at each row. And the
second thing you'll notice is that the cost estimate is not incorrect,
because this plan actually *is* evaluating foo(unique1) at each row.
(I am using CVS HEAD here so that you can see resjunk output columns...)
So this is hardly going to satisfy the complainant, even if the planner
chose it over the seqscan-and-sort alternative.

The reason for this behavior is that the Query representation emitted
by the parser includes the ORDER BY expressions as "resjunk" columns
in the query targetlist. grouping_planner() faithfully includes all
such columns in the final plan's targetlist, even ones that are not
really needed because of the form of the chosen plan. Because it always
uses the same targetlist, it also feels that it's not necessary to worry
about the evaluation cost of the targetlist while choosing the plan
--- so it settles on indexscan or seqscan before it's ever even bothered
to compute the cost of the tlist.

This isn't a horrid strategy in "normal" cases where the sort keys
are just simple columns; even if they end up not being referenced,
pulling a couple of extra datums from the heap tuples isn't all that
much extra expense. But if you're talking about an expensive function
then it gets objectionable.

I looked into fixing the code to omit unused resjunk sort columns, and
think that it could probably be dealt with via some reasonably
straightforward changes in grouping_planner and some of its immediate
subroutines. However, grouping_planner is already a huge pile of subtly
intertwined spaghetti, and adding yet another set of considerations to
it will make it that much harder to understand or maintain. (If
anyone's got some ideas about refactoring it, I'm all ears.) So I
looked around for alternatives.

It strikes me that in the cases where it wouldn't be necessary to
compute junk sort-key columns, it would be because we were scanning an
index that includes those values. So if the plan were set up to pull
those values from the index and return them, then we'd not have to add
this extra complexity to grouping_planner --- the argument that it's not
worth it to get rid of the junk columns comes back into play. Moreover,
such an ability would also mean that if the user *does* ask for the
sort column value as output (ie it's not resjunk), we can still satisfy
the query from the index without recomputing the expensive function.

So this is where we come to the connection to Heikki's index-only-quals
patch. As submitted, that code is only able to use an index column in
a scan qual, it's not able to return it as part of the scan result.
This example makes it clear that that definition is missing a large
part of the potential benefit of an index value extraction capability.

To be able to do anything along that line would require some more work
in the executor and a *lot* more work in the planner, and I'm honestly
not sure what the planner part of it would look like. The main point
at the moment is that to do something like this, we'd want the indexscan
to certainly extract the function value from the index. Heikki's patch
views that as an optional behavior with no real penalty for failure.
I don't think that's good enough.

I'm not sure whether to go ahead with trying to fix the unused-sort-key
behavior right now, or to leave it until something gets done on the
index value extraction front. Comments?

regards, tom lane

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgreSQL(dot)org
Subject:	Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:	2009-08-22 19:18:39
Message-ID:	4A90448F.4060601@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> It strikes me that in the cases where it wouldn't be necessary to
> compute junk sort-key columns, it would be because we were scanning an
> index that includes those values. So if the plan were set up to pull
> those values from the index and return them, then we'd not have to add
> this extra complexity to grouping_planner --- the argument that it's not
> worth it to get rid of the junk columns comes back into play. Moreover,
> such an ability would also mean that if the user *does* ask for the
> sort column value as output (ie it's not resjunk), we can still satisfy
> the query from the index without recomputing the expensive function.
>
> So this is where we come to the connection to Heikki's index-only-quals
> patch. As submitted, that code is only able to use an index column in
> a scan qual, it's not able to return it as part of the scan result.
> This example makes it clear that that definition is missing a large
> part of the potential benefit of an index value extraction capability.
>
> To be able to do anything along that line would require some more work
> in the executor and a *lot* more work in the planner, and I'm honestly
> not sure what the planner part of it would look like.

I think we should separate the Heap Fetch operation from the IndexScan.
So in your example, you'd get a plan like:

postgres=# explain verbose select unique1 from tenk1 order by foo(unique1);
QUERY
PLAN

--------------------------------------------------------------------------------
-----------------------------------------------------------------
Heap Fetch on public.tenk1
Output: unique1
-> Index Scan using fooi on public.tenk1 (cost=0.00..7298.60
rows=290 width=244)
Output: foo(unique1), ctid

IndexScanPath would be correspondingly split into HeapFetchPath and
IndexPath. There would need to be magic to figure out how to decompose
each top target list entry to components fetched from the heap and the
index. In a more complex scenario, we might have
"foo(unique1)*foo(unique2)" in the SELECT list. That could be evaluated
by fetching "foo(unique1)" from the index, "unique2" from the heap, and
projecting those with "A*foo(B)" at the Heap Fetch node.

The cost estimates of IndexPath would no longer include the cost of
fetching from heap - that would be taken into account in HeapFetchPath.
The IndexScan executor node would be a lot simpler as well, as it would
only need to deal with the index, not the heap.

The really cool thing about this is that we could then "bubble up" the
HeapFetch nodes when we construct joins. For example, when we construct
a nested loop join path on a SeqScan on tenk2 and HeapFetch+IndexPath on
tenk1, we could move the HeapFetch above the join node:

postgres=# explain select tenk1.unique1, tenk2.unique2 from tenk1, tenk2
where tenk1.unique1 = tenk2.unique1

Heap Fetch on public.tenk1
Output: tenk1.unique1, tenk2.unique1
-> Nested Loop (tenk1.unique1 = tenk2.unique1)
Output: tenk1.unique1, tenk2.unique1, tenk1.ctid
-> Index Scan using foo_unique1 on public.tenk1
Output: tenk1.unique1, tenk1.ctid
-> Seq Scan on tenk2
Output: tenk2.unique1

In this example, HeapFetch actually only needs to check the visibility -
it's easy to see how we could turn this into a true index-only scan if
we had a crash-safe visibility map.

> The main point
> at the moment is that to do something like this, we'd want the indexscan
> to certainly extract the function value from the index. Heikki's patch
> views that as an optional behavior with no real penalty for failure.
> I don't think that's good enough.

Yes, so it seems. One complication is the 'rechecks' that an Index scan
sometimes has to do. The heap tuple is required for that. If we split
the index scan to HeapFetch and IndexScan as I'm envisioning, the
rechecks could be done in the HeapFetch node, but then we need to
propagate the recheck flag from IndexScan to HeapFetch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	pgsql-hackers(at)postgreSQL(dot)org
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:	2009-09-14 09:41:55
Message-ID:	4AAE0FE3.40600@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas wrote:
> Tom Lane wrote:
>> It strikes me that in the cases where it wouldn't be necessary to
>> compute junk sort-key columns, it would be because we were scanning an
>> index that includes those values. So if the plan were set up to pull
>> those values from the index and return them, then we'd not have to add
>> this extra complexity to grouping_planner --- the argument that it's not
>> worth it to get rid of the junk columns comes back into play. Moreover,
>> such an ability would also mean that if the user *does* ask for the
>> sort column value as output (ie it's not resjunk), we can still satisfy
>> the query from the index without recomputing the expensive function.
>>
>> So this is where we come to the connection to Heikki's index-only-quals
>> patch. As submitted, that code is only able to use an index column in
>> a scan qual, it's not able to return it as part of the scan result.
>> This example makes it clear that that definition is missing a large
>> part of the potential benefit of an index value extraction capability.
>>
>> To be able to do anything along that line would require some more work
>> in the executor and a *lot* more work in the planner, and I'm honestly
>> not sure what the planner part of it would look like.
>
> I think we should separate the Heap Fetch operation from the IndexScan.

I've been hacking on that approach. It's quite unfinished, but before I
spend any more time on it, I'd like to get some feedback on the overall
design.

The attached patch can create plans where quals are checked and joins
are performed using values from indexes only, and the heap tuples are
fetched only for matching rows. Passes regression tests, but code is
quite ugly at points. Cost estimation is bogus. The patch builds on the
indexam-api-changes patch I posted earlier, which is also attached. I
haven't yet done the changes to that patch that were discussed.

I haven't done any performance testing. The overhead of an extra
executor node for each index scan could slow down simple queries, we
might need to compensate that somehow, maybe reintroduce a fastpath
combined IndexScan+HeapFetch node. I'm also afraid the extra work I've
pushed to the stage where Paths are constructed could slow down planning
quite a bit if you have a lot of indexes.

Path nodes now carry a targetlist. That's because when you have a path like:

HeapFetch
-> Join
...

You won't have all the columns of the join rel available at the join
node yet, because they will be fetched in the HeapFetch node above. The
targetlist in Path nodes reflect that, and the targetlist of the final
Plan nodes are created from the targetlists in the Path nodes instead of
the ones in RelOptInfos.

Per earlier discussion, I changed the way index tuple fetching works in
B-tree, so that it can now be relied on. Matching index tuples are
copied to backend-local memory when the scan steps on a page.

Var nodes that refer to index columns (indexquals and the new index-only
filters) now have a new field, varindexno, set. While we could've
continued with the old representation, now that we have more expressions
that refer to index vars instead of heap vars, this makes debugging easier.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
indexam-api-changes-2.patch	text/x-diff	23.2 KB
heapfetch-1.patch	text/x-diff	118.4 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:	2009-09-15 03:15:17
Message-ID:	603c8f070909142015t1edd5067kaeccf4a9f148d2b3@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Sep 14, 2009 at 5:41 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Heikki Linnakangas wrote:
>> Tom Lane wrote:
>>> It strikes me that in the cases where it wouldn't be necessary to
>>> compute junk sort-key columns, it would be because we were scanning an
>>> index that includes those values. So if the plan were set up to pull
>>> those values from the index and return them, then we'd not have to add
>>> this extra complexity to grouping_planner --- the argument that it's not
>>> worth it to get rid of the junk columns comes back into play. Moreover,
>>> such an ability would also mean that if the user *does* ask for the
>>> sort column value as output (ie it's not resjunk), we can still satisfy
>>> the query from the index without recomputing the expensive function.
>>>
>>> So this is where we come to the connection to Heikki's index-only-quals
>>> patch. As submitted, that code is only able to use an index column in
>>> a scan qual, it's not able to return it as part of the scan result.
>>> This example makes it clear that that definition is missing a large
>>> part of the potential benefit of an index value extraction capability.
>>>
>>> To be able to do anything along that line would require some more work
>>> in the executor and a *lot* more work in the planner, and I'm honestly
>>> not sure what the planner part of it would look like.
>>
>> I think we should separate the Heap Fetch operation from the IndexScan.
>
> I've been hacking on that approach. It's quite unfinished, but before I
> spend any more time on it, I'd like to get some feedback on the overall
> design.
>
> The attached patch can create plans where quals are checked and joins
> are performed using values from indexes only, and the heap tuples are
> fetched only for matching rows. Passes regression tests, but code is
> quite ugly at points. Cost estimation is bogus. The patch builds on the
> indexam-api-changes patch I posted earlier, which is also attached. I
> haven't yet done the changes to that patch that were discussed.
>
> I haven't done any performance testing. The overhead of an extra
> executor node for each index scan could slow down simple queries, we
> might need to compensate that somehow, maybe reintroduce a fastpath
> combined IndexScan+HeapFetch node. I'm also afraid the extra work I've
> pushed to the stage where Paths are constructed could slow down planning
> quite a bit if you have a lot of indexes.
>
>
> Path nodes now carry a targetlist. That's because when you have a path like:
>
> HeapFetch
> -> Join
> ...
>
> You won't have all the columns of the join rel available at the join
> node yet, because they will be fetched in the HeapFetch node above. The
> targetlist in Path nodes reflect that, and the targetlist of the final
> Plan nodes are created from the targetlists in the Path nodes instead of
> the ones in RelOptInfos.
>
> Per earlier discussion, I changed the way index tuple fetching works in
> B-tree, so that it can now be relied on. Matching index tuples are
> copied to backend-local memory when the scan steps on a page.
>
> Var nodes that refer to index columns (indexquals and the new index-only
> filters) now have a new field, varindexno, set. While we could've
> continued with the old representation, now that we have more expressions
> that refer to index vars instead of heap vars, this makes debugging easier.

Hi, I'm reviewing this patch for the 2009-09 CommitFest.

It doesn't seem to compile.

make[4]: Entering directory `/home/rhaas/pgsql-git/src/backend/optimizer/path'
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I../../../../src/include -D_GNU_SOURCE
-I/usr/include/libxml2 -c -o joinpath.o joinpath.c -MMD -MP -MF
.deps/joinpath.Po
joinpath.c: In function ‘can_bubbleup’:
joinpath.c:170: warning: implicit declaration of function ‘make_indexonly_expr’
joinpath.c: In function ‘bubbleup_step’:
joinpath.c:187: warning: implicit declaration of function ‘makeVar’
joinpath.c:188: error: ‘SelfItemPointerAttributeNumber’ undeclared
(first use in this function)
joinpath.c:188: error: (Each undeclared identifier is reported only once
joinpath.c:188: error: for each function it appears in.)
joinpath.c:189: error: ‘TIDOID’ undeclared (first use in this function)
joinpath.c:189: warning: assignment makes pointer from integer without a cast

Actually, before I even tried compiling this, I was looking through
the joinpath.c changes, since that is an area of the code with which I
have some familiarity. As I'm sure you're aware, the lack of
commenting makes it quite difficult to understand what this is trying
to do, and the functions are poorly named. It isn't self-explanatory
what "bubbling up" means, even in the limited context of joinpath.c.

Leaving that aside, I think that the approach here is likely wrong;
the decision about when to perform a heap fetch doesn't seem to be
based on cost, which I think it needs to be. Consider A IJ B, with
the scan over A implemented as an index scan. It seems to me that if
the join selectivity is < 1, then assuming there's a choice, we
probably want to join A to B and then do the heap fetches against A
afterwards. But if the join selectivity is > 1 (consider, for
example, a cross join), we probably want to do the heap fetches first.

Am I right to think that the heap fetch node is not optional? i.e.
even if only columns in the index are ever used, we need to make sure
that a heap fetch node gets inserted to check visibility. Is that
right?

I also think that the use of the term index-only quals is misleading.
It seemed good when you first posted to -hackers about it, but looking
at the patch, it means we have both index quals and index-only quals,
and it seems like that might not be the greatest naming. Maybe we
could call them index-tuple quals or index-evaluated quals or
something.

I'll try to review this a bit more later in the week - need to get
some sleep now.

...Robert

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:	2009-09-15 09:47:37
Message-ID:	4AAF62B9.2000003@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> Hi, I'm reviewing this patch for the 2009-09 CommitFest.

Thank you!

> It doesn't seem to compile.

Looks like the patch has bitrotted, sorry about that. Attached is an
updated version. I've also pushed this to my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch "heapfetch".

> Actually, before I even tried compiling this, I was looking through
> the joinpath.c changes, since that is an area of the code with which I
> have some familiarity. As I'm sure you're aware, the lack of
> commenting makes it quite difficult to understand what this is trying
> to do, and the functions are poorly named. It isn't self-explanatory
> what "bubbling up" means, even in the limited context of joinpath.c.

Yeah, understood :-(. The bubbling up refers to moving HeapFetch nodes
above a join, which I explained earlier in this thread:
http://archives.postgresql.org/message-id/4A90448F.4060601@enterprisedb.com,

> Leaving that aside, I think that the approach here is likely wrong;
> the decision about when to perform a heap fetch doesn't seem to be
> based on cost, which I think it needs to be.

Right, cost estimation and making choices based on it still missing.

> Consider A IJ B, with
> the scan over A implemented as an index scan. It seems to me that if
> the join selectivity is < 1, then assuming there's a choice, we
> probably want to join A to B and then do the heap fetches against A
> afterwards. But if the join selectivity is > 1 (consider, for
> example, a cross join), we probably want to do the heap fetches first.

Hmm, good point. I didn't consider that join selectivity can be > 1.

A more common scenario is that there's an additional filter condition on
the HeapFetch, with a selectivity < 1. It can then cheaper to perform
the heap fetches first and only join the remaining rows that satisfy the
filter condition.

It could also be cheaper to perform HeapFetch first if there's a lot of
dead tuples in the table, as the heap fetch weeds them out. I'm not sure
if we can estimate that in a meaningful way.

> Am I right to think that the heap fetch node is not optional? i.e.
> even if only columns in the index are ever used, we need to make sure
> that a heap fetch node gets inserted to check visibility. Is that
> right?

Correct. The plan is to eventually use the visibility map to skip the
heap access altogether, but we'll need to make the visibility map 100%
reliable first. We'll still need a HeapFetch node to perform the
visibility check, but it could perform it against the visibility map
instead of the heap.

> I also think that the use of the term index-only quals is misleading.
> It seemed good when you first posted to -hackers about it, but looking
> at the patch, it means we have both index quals and index-only quals,
> and it seems like that might not be the greatest naming. Maybe we
> could call them index-tuple quals or index-evaluated quals or
> something.

Hmm, I'm not too fond of the naming either. Not sure I like those
alternatives any better, though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
indexam-api-changes-3.patch	text/x-diff	23.2 KB
heapfetch-2.patch	text/x-diff	118.4 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:	2009-09-15 11:53:13
Message-ID:	603c8f070909150453g4b3a9871x50b05314945b6d55@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 15, 2009 at 5:47 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Robert Haas wrote:
>> Hi, I'm reviewing this patch for the 2009-09 CommitFest.
>
> Thank you!
>
>> It doesn't seem to compile.
>
> Looks like the patch has bitrotted, sorry about that. Attached is an
> updated version. I've also pushed this to my git repository at
> git://git.postgresql.org/git/users/heikki/postgres.git, branch "heapfetch".

OK, I'll pull from that.

>> Actually, before I even tried compiling this, I was looking through
>> the joinpath.c changes, since that is an area of the code with which I
>> have some familiarity. As I'm sure you're aware, the lack of
>> commenting makes it quite difficult to understand what this is trying
>> to do, and the functions are poorly named. It isn't self-explanatory
>> what "bubbling up" means, even in the limited context of joinpath.c.
>
> Yeah, understood :-(. The bubbling up refers to moving HeapFetch nodes
> above a join, which I explained earlier in this thread:
> http://archives.postgresql.org/message-id/4A90448F.4060601@enterprisedb.com,
>
>> Leaving that aside, I think that the approach here is likely wrong;
>> the decision about when to perform a heap fetch doesn't seem to be
>> based on cost, which I think it needs to be.
>
> Right, cost estimation and making choices based on it still missing.
>
>> Consider A IJ B, with
>> the scan over A implemented as an index scan. It seems to me that if
>> the join selectivity is < 1, then assuming there's a choice, we
>> probably want to join A to B and then do the heap fetches against A
>> afterwards. But if the join selectivity is > 1 (consider, for
>> example, a cross join), we probably want to do the heap fetches first.
>
> Hmm, good point. I didn't consider that join selectivity can be > 1.
>
> A more common scenario is that there's an additional filter condition on
> the HeapFetch, with a selectivity < 1. It can then cheaper to perform
> the heap fetches first and only join the remaining rows that satisfy the
> filter condition.

Well, again, it seems to me that it entirely depends on whether the IJ
increases or decreases the number of rows. You want to do the heap
fetches at the point where there are the fewest of them to do, and you
can't know that a priori. When you start talking about "more common"
scenarios, what you really mean is "more common in the queries I
normally do", and that's not the same as what other people's queries
do. (See, for example, previous discussions on -performance, where it
turns out that my suggested "fix" for a particular kind of planner
problem is the exact opposite of Kevin Grittner's fix for a problem
with the same code; the existing code bounds a certain value from
below at 1 - I suggested raising it to 2, he suggested lowering it to
0.)

> It could also be cheaper to perform HeapFetch first if there's a lot of
> dead tuples in the table, as the heap fetch weeds them out. I'm not sure
> if we can estimate that in a meaningful way.

Depends whether the selectivity calculations take this into account -
off the top of my head, I'm not sure.

>> Am I right to think that the heap fetch node is not optional? i.e.
>> even if only columns in the index are ever used, we need to make sure
>> that a heap fetch node gets inserted to check visibility. Is that
>> right?
>
> Correct. The plan is to eventually use the visibility map to skip the
> heap access altogether, but we'll need to make the visibility map 100%
> reliable first. We'll still need a HeapFetch node to perform the
> visibility check, but it could perform it against the visibility map
> instead of the heap.
>
>> I also think that the use of the term index-only quals is misleading.
>> It seemed good when you first posted to -hackers about it, but looking
>> at the patch, it means we have both index quals and index-only quals,
>> and it seems like that might not be the greatest naming. Maybe we
>> could call them index-tuple quals or index-evaluated quals or
>> something.
>
> Hmm, I'm not too fond of the naming either. Not sure I like those
> alternatives any better, though.

Maybe someone else will come up with a better idea. :-)

...Robert

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:	2009-09-21 00:43:45
Message-ID:	603c8f070909201743r1fcc3405w1980962d2bd30455@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Sep 15, 2009 at 7:53 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> Consider A IJ B, with
>>> the scan over A implemented as an index scan. It seems to me that if
>>> the join selectivity is < 1, then assuming there's a choice, we
>>> probably want to join A to B and then do the heap fetches against A
>>> afterwards. But if the join selectivity is > 1 (consider, for
>>> example, a cross join), we probably want to do the heap fetches first.
>>
>> Hmm, good point. I didn't consider that join selectivity can be > 1.
>>
>> A more common scenario is that there's an additional filter condition on
>> the HeapFetch, with a selectivity < 1. It can then cheaper to perform
>> the heap fetches first and only join the remaining rows that satisfy the
>> filter condition.
>
> Well, again, it seems to me that it entirely depends on whether the IJ
> increases or decreases the number of rows. You want to do the heap
> fetches at the point where there are the fewest of them to do, and you
> can't know that a priori. When you start talking about "more common"
> scenarios, what you really mean is "more common in the queries I
> normally do", and that's not the same as what other people's queries
> do. (See, for example, previous discussions on -performance, where it
> turns out that my suggested "fix" for a particular kind of planner
> problem is the exact opposite of Kevin Grittner's fix for a problem
> with the same code; the existing code bounds a certain value from
> below at 1 - I suggested raising it to 2, he suggested lowering it to
> 0.)

I've been mulling over this problem all week. I haven't gotten all
that far, but here are my thoughts.

Suppose we're planning some joinrel within a large join nest. We have
two partial paths A and B with identical pathkeys. Path A does not
involve an index scan, so we know its exact cost. Path B involves an
index scan, so a heap fetch will have to be done at some point, but
since we can't yet know where the best place to do that heap fetch is,
so we can't know the exact cost of B. Basically, the decision we face
here is whether to keep both plans A and B or to discard one of them
as clearly inferior to the other. As a preliminary observation, this
is the logic that is performed by add_path(), which gets called A LOT
in planning large join nests, so the performance impact of what
happens there needs to be measured carefully.

We can, however, put some bounds on the cost of B. We know what the
cost of B is disregarding the heap fetch (or heap fetches) that will
eventually need to be performed. This cost is also the minimum total
cost of B, in the case where some yet-to-be-performed join is
estimated to return no rows, and thus the heap fetch is estimated to
not actually need to fetch anything. We can also bound the maximum
cost of B: it certainly can't be any higher than the cost of doing all
the heap fetches immediately above the index fetches. We could
probably compute a tighter upper bound by considering various possible
positions for the heap fetches at or below the level of the joinrel,
but I doubt that it's worth the effort.

Instead, what we can do is compare the low-estimate for B to the
estimate for A. If it's higher, discard B. Otherwise, compare the
high-estimate for B to the estimate for A. If it's lower, discard A.
Otherwise, keep both paths. More generally, we can conceive of each
path as having low and high estimates, and we can say that for two
paths with identical pathkeys, P dominates P' if the high-estimate for
P is less than the low-estimate for P'.

In this view of the world, we don't actually need to represent the
heap-fetches in the path trees during joinrel planning. Instead, we
build up a set of possible paths for the whole joinrel, and then at
the end, we go back and look at any remaining paths that require heap
fetches to be inserted and figure out the best place to put them. The
major problem I see with this approach is that it may slow down
add_path() too much, but if that turns out to be the case I don't have
another idea short of attacking the problem using some kind of
heuristics that will probably be less accurate than an analysis of the
type described above.

Thoughts?

...Robert

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:	2009-09-21 01:25:17
Message-ID:	603c8f070909201825i7c411b79la8fa1e6fd7562467@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I've taken a quick look through the rest of this patch and there's
obviously some hackery that needs to be cleaned up before this is
committable, but you knew that already. I don't see a lot to object
to at a design level, but OTOH I'm not terribly familiar with the
executor, which is where most of the non-planner changes are to be
found, so I can't really offer an opinion with any certainty.

One other random thought: I notice that you copied a (completely
inscrutable, to me) comment beginning with "Check if we are evaluating
PlanQual..." from BitmapHeapNext to HeapFetchNext, so maybe this
hasn't escaped you either - but couldn't the heap fetches for a bitmap
index scan be postponed until higher up in the join tree just as
you're proposing to do for a regular index scan? It even seems
conceivable that it could be right to do a regular index scan (on,
say, the inner side of a nested loop), and then accumulate all the
results and do a single bitmap heap scan on the result. This might be
too marginal a case to worry about... especially for version one...
just throwing it out there.

I wonder if HeapFetchNext should be called just HeapNext for parity
with BitmapHeapNext, and the Heap Fetch node called a Heap Scan for
parity with Bitmap Heap Scan.

Since you previously stated that you were going to put this patch
aside to work on HS and SR[1], I'm going to move this to Returned with
Feedback for now. Hope that's OK, and that the feedback is sufficient
and useful.

...Robert

[1] http://archives.postgresql.org/message-id/4AB1DD0B.1080300@enterprisedb.com

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:	2009-09-21 14:46:06
Message-ID:	4AB791AE.7060409@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> Since you previously stated that you were going to put this patch
> aside to work on HS and SR[1], I'm going to move this to Returned with
> Feedback for now. Hope that's OK, and that the feedback is sufficient
> and useful.

Yes, on both counts. Thank you!

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date:	2010-02-23 16:52:27
Message-ID:	201002231652.o1NGqR105126@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Is this a TODO?

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> Robert Haas wrote:
> > Since you previously stated that you were going to put this patch
> > aside to work on HS and SR[1], I'm going to move this to Returned with
> > Feedback for now. Hope that's OK, and that the feedback is sufficient
> > and useful.
>
> Yes, on both counts. Thank you!
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +