Re: the big picture for index-only scans

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: the big picture for index-only scans
Date: 2011-05-10 17:25:34
Message-ID: 4DC92EBE020000250003D4ED@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
>> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>>> This topic has been discussed many times, yet I have never seen
>>> an assessment that explains WHY we would want to do index-only
>>> scans.
>
>> In databases with this feature, it's not too unusual for a query
>> which uses just an index to run one or more orders of magnitude
>> faster than a query which has to randomly access the heap for
>> each index entry. That seems like enough evidence of its
>> possible value in PostgreSQL to proceed to the point where
>> benchmarks become possible. I'm assuming that, like all other
>> features added as performance optimizations, it won't be
>> committed until there are benchmarks showing the net benefit.
>
>> As a thought experiment, picture the relative costs of scanning a
>> portion of an index in index sequence, and being done, versus
>> scanning a portion of an index in index sequence and jumping to a
>> random heap access for each index entry as you go.
>
> It's already the case that we'll flip over to a bitmap indexscan,
> and thus get rid of most/all of the "random" page accesses, in
> situations where this is likely to be a big win. Pointing to the
> performance difference in databases that don't do that is
> therefore not too convincing.

Sure. Of course, if you're only accessing twenty thousand rows from
a table containing fifty million rows, bitmap index scans could come
out pretty close to random access times; but on the whole I agree
that the scale of benefit in PostgreSQL won't tend to match what
people see in other products. Note that my words were "enough
evidence of its possible value in PostgreSQL to proceed to the point
where benchmarks become possible."

In particular, we might want to somehow try to make clear to people
that the very wide indexes they are accustomed to creating to allow
this optimization in other products might be inefficient compared to
creating several one-column indexes which would enable bitmap
logical operations.

> I'm inclined to agree that index-only scans might be worth the
> amount of work that's involved

So we agree there.

> ... but I share Simon's desire to see some proof before anything
> gets committed.

And we agree there. In fact, I can't think of anyone in the
community who doesn't want to see that for *any* purported
performance enhancement.

My overall gut feel is that there will be some circumstances where
the "covering index" optmization is much faster, and some where
people expect it to be, but it isn't. The trickiest part of this
might be developing a costing model which allows us to make the
right choice most of the time. And even if we get it perfect, we
can expect questions about why the covering index wasn't used, and
requests for hints so they can force it. :-(

-Kevin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2011-05-10 17:32:25 Re: hint bit cache v5
Previous Message Heikki Linnakangas 2011-05-10 17:22:20 Re: collateral benefits of a crash-safe visibility map