Re: Visibility map thoughts

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Visibility map thoughts
Date: 2007-11-06 14:18:21
Message-ID: 473077AD.30101@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote:
> I'm thinking that looking in the visibility map will have a cost also,
> so how will we know whether to bother looking? I'm assuming that we
> won't want to do that lookup in all cases, since it could easily just
> add pathlength and contention in the normal OLTP case.

Yeah, I believe that's true. There's a non-zero cost to return tuples
from the index as well. Now that we do b-tree scans page at a time, if
we're going to return data from the index, we need to copy all the
matching data to backend-private memory when we step to a page.

> Presumably there
> would be a test in the planner to see if an index-only plan was
> possible?

Yes, you can check if all the columns you need are in the index. You
won't know for sure how much of the visibility map is set until
execution time, however.

I'm thinking that we're going to have an entry in pg_stats on the
fraction of set bits in the map, and factor that in the cost estimate of
index scans.

> ISTM that it would make most sense to do it during BitmapIndex scans.
> Specifically, as an intermediate step between BitmapIndex scan and
> BitmapHeap scan. That would be fairly likely to be a win in most cases
> because the bitmaps should compare easily and the amortised cost per row
> is likely to be very small, even if no heap lookups are avoided.

That only helps with COUNT(*), because until the BitmapHeap scan all you
have is a bitmap of tuples. You can't return any values from just a
bitmap. (Seems easy to do, so it's still probably worth doing just for
COUNT(*), though)

A major design question is how where and how the executor for index-only
scans should work. The required planner changes will follow from that.
The simplest solution is to just modify nodeIndexScan.c a little bit to
check the visibility map before fetching the heap tuple. If the bit in
the visibility map is set, skip the heap fetch and return the required
values from the index tuple instead. That would give most of the benefit
already.

The indexam API needs to be modified as well, because there's currently
no API to return index tuples from an index.

There's some more fancy stuff we could do if we separated the index scan
and heap fetch to two different executor nodes. You could for example do
an index-only scan and join, and check visibility only after the join,
avoiding unnecessarily accessing the heap for tuples that don't satisfy
the join condition. In fact we could do that without the visibility map
as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Mielke 2007-11-06 14:21:00 Re: Visibility map thoughts
Previous Message Simon Riggs 2007-11-06 14:08:37 Re: Visibility map thoughts