Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Resjunk sort columns, Heikki's index-only quals patch, and bug #5000
Date: 2009-09-15 09:47:37
Message-ID: 4AAF62B9.2000003@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> Hi, I'm reviewing this patch for the 2009-09 CommitFest.

Thank you!

> It doesn't seem to compile.

Looks like the patch has bitrotted, sorry about that. Attached is an
updated version. I've also pushed this to my git repository at
git://git.postgresql.org/git/users/heikki/postgres.git, branch "heapfetch".

> Actually, before I even tried compiling this, I was looking through
> the joinpath.c changes, since that is an area of the code with which I
> have some familiarity. As I'm sure you're aware, the lack of
> commenting makes it quite difficult to understand what this is trying
> to do, and the functions are poorly named. It isn't self-explanatory
> what "bubbling up" means, even in the limited context of joinpath.c.

Yeah, understood :-(. The bubbling up refers to moving HeapFetch nodes
above a join, which I explained earlier in this thread:
http://archives.postgresql.org/message-id/4A90448F.4060601@enterprisedb.com,

> Leaving that aside, I think that the approach here is likely wrong;
> the decision about when to perform a heap fetch doesn't seem to be
> based on cost, which I think it needs to be.

Right, cost estimation and making choices based on it still missing.

> Consider A IJ B, with
> the scan over A implemented as an index scan. It seems to me that if
> the join selectivity is < 1, then assuming there's a choice, we
> probably want to join A to B and then do the heap fetches against A
> afterwards. But if the join selectivity is > 1 (consider, for
> example, a cross join), we probably want to do the heap fetches first.

Hmm, good point. I didn't consider that join selectivity can be > 1.

A more common scenario is that there's an additional filter condition on
the HeapFetch, with a selectivity < 1. It can then cheaper to perform
the heap fetches first and only join the remaining rows that satisfy the
filter condition.

It could also be cheaper to perform HeapFetch first if there's a lot of
dead tuples in the table, as the heap fetch weeds them out. I'm not sure
if we can estimate that in a meaningful way.

> Am I right to think that the heap fetch node is not optional? i.e.
> even if only columns in the index are ever used, we need to make sure
> that a heap fetch node gets inserted to check visibility. Is that
> right?

Correct. The plan is to eventually use the visibility map to skip the
heap access altogether, but we'll need to make the visibility map 100%
reliable first. We'll still need a HeapFetch node to perform the
visibility check, but it could perform it against the visibility map
instead of the heap.

> I also think that the use of the term index-only quals is misleading.
> It seemed good when you first posted to -hackers about it, but looking
> at the patch, it means we have both index quals and index-only quals,
> and it seems like that might not be the greatest naming. Maybe we
> could call them index-tuple quals or index-evaluated quals or
> something.

Hmm, I'm not too fond of the naming either. Not sure I like those
alternatives any better, though.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
indexam-api-changes-3.patch text/x-diff 23.2 KB
heapfetch-2.patch text/x-diff 118.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Meskes 2009-09-15 10:06:01 Re: clang's static checker report.
Previous Message std pik 2009-09-15 09:34:29 hardware information