Re: Index-only scans

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Index-only scans
Date: 2009-07-14 08:23:46
Message-ID: 4A5C4092.6060607@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote:
> On Mon, 2009-07-13 at 10:16 +0300, Heikki Linnakangas wrote:
>
>> Implementing index-only scans requires a few changes:
>
> I would like to see a clear exposition of the use cases and an an
> analysis of the costs and benefits of doing this. It sounds cool, but I
> want to know it is cool before we spend time solving all of the juicy
> problems. Perhaps a glue-and-string patch would help.

There's a working prototype at in my git repository at git.postgresql.org.

> Extra buffer accesses for vismap, crash-safe vismap sound like
> performance issues, as well as planner time, not to mention all the
> tuits needed. Will it damage the general case?

It does add some work to the planner, but I don't think it's noticeable.
The visibility map accesses are only needed when we're doing an
index-only scan, not in the general case, so the impact of those come
down to how well we can estimate the cost of index-only scans, so that
an index-only scan is not chosen when not beneficial.

> The single SQL example mentioned already has at least two mechanisms for
> improving performance of that type of query. We probably don't need
> another, or at least we need a good analysis of why.

Well, another class of queries where index-only scans are beneficial is
when you fetch a range of rows from index, where the heap fetches result
in a lot of random I/O. Clustering helps with that, but you can only
cluster a table on one column. A classic example where that's a problem
is a many-to-many relationship:

CREATE TABLE a (aid integer, ...);
CREATE TABLE b (bid integer, ...);
CREATE TABLE manytomany (aid integer, bid integer);
CREATE INDEX a_b ON manytomany (aid, bid);
CREATE INDEX b_a ON manytomany (bid, aid);

If you need to query the many-to-many relationship in "both directions", ie:
SELECT bid FROm manytomany WHERE aid = ?
SELECT aid FROM manytomany WHERE bid = ?

You have to choose which index you cluster the table on, which will be
fast, and the other query will be slow.

> The benefit that occurs to me most is covered indexes, i.e. it opens up
> new and interesting indexing strategies. Covered indexes are also one
> kind of materialized view. It may be better to implement mat views and
> gain wider benefits too.

Materialized view sure would be nice, but doesn't address quite the same
use cases. Doesn't help with the many-to-many example above, for
example. We should have both.

> Or maybe index-only scans are mat views, via
> some cunning plan?

Heh, no :-).

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2009-07-14 08:38:50 Re: Index-only scans
Previous Message Jaime Casanova 2009-07-14 07:58:48 Re: COPY WITH CSV FORCE QUOTE *