Re: COUNT(*) and index-only scans

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Greg Stark <stark(at)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: COUNT(*) and index-only scans
Date: 2011-11-21 18:43:09
Message-ID: CA+Tgmoa_b=PnmF-qcNx03ufFTjVRGMuts9TTJKJh-P-P2kK9GQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Nov 19, 2011 at 11:22 AM, Thom Brown <thom(at)linux(dot)com> wrote:
> While I accept that maybe adapting the existing bitmap index scan
> functionality isn't necessarily desirable, would it be feasible to
> create a corresponding bitmap index-only scan method.

I've been thinking about this a bit more; I wonder whether we could
create yet another type of index scan - let's call it a Streaming
Index Only Scan. A streaming index only scan uses a buffer, which
holds index tuples and the corresponding CTIDs, and it has some
maximum size, probably based on work_mem. Tuples in the buffer are
kept sorted by CTID. The scan happens in two alternating phases:
buffer fill, and buffer drain.

In buffer fill mode, we scan the index and add matching tuples and
their CTIDs to the buffer. When the buffer is full or the index AM
reports that there are no more tuples in the scan, we switch to buffer
drain mode.

In buffer drain mode, we repeatedly select a heap block number and
attempt to return all buffered tuples on that page. We maintain a
counter, LastBlockNumber, initially zero, which tracks the last heap
block number so selected. To select the next block number, we choose
the first block number greater than or equal to LastBlockNumber
referenced by any CTID in the buffer (it should be possible to design
the buffer so that this value can be computed quickly); if there are
none, we instead choose the first block number referenced by any CTID
in the buffer, period. Having selected the block number, we check
whether the page is all-visible. If so, we can return all the index
tuples from that page without further ado. Otherwise, we fetch the
heap block, check visibility for each tuple, and return only those
index tuples for which the corresponding heap tuples are visible to
the scan. If there's now enough buffer space available to be certain
that the next index tuple will fit, we switch back to buffer fill
mode; otherwise, we remain in buffer drain mode.

As compared with a bitmap index scan, this doesn't have the advantage
of being able to combine multiple indexes effectively; I don't really
see any way to make that work with the index-only scan concept in
general, except perhaps for the special case of a zero-argument
aggregate. It also doesn't have the advantage that each heap page
will be guaranteed to be visited only once. But in practice duplicate
visits to the same page should be uncommon; they'll be avoided when
either work_mem is sufficient to buffer the whole scan, or when
there's some degree of table clustering with respect to the index.

While I'm building castles in the sky, a further idea would be to try
to optimize the case where there's a LIMIT node above the scan. If we
could pass down a hint indicating how many rows are thought likely to
be needed, we could enter buffer drain mode after about that many
tuples, instead of waiting until the buffer was full. If the hint is
right, we win (and if it's wrong, we can still go back and fetch some
more tuples, at a cost of possible performance loss).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-11-21 18:45:18 Re: psql \ir filename normalization
Previous Message Alexander Shulgin 2011-11-21 18:35:16 Notes on implementing URI syntax for libpq