Re: plans for bitmap indexes?

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Mark Kirkwood" <markir(at)coretech(dot)co(dot)nz>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, "Chris Browne" <cbbrowne(at)acm(dot)org>
Subject: Re: plans for bitmap indexes?
Date: 2004-10-19 23:05:05
Message-ID: 27879.1098227105@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> I was thinking about this recently, then realised that building the bitmap
> would not be as easily, since PostgreSQL doesn't index null values.

As Alvaro already pointed out, this statement is bogus; and I'm not sure
what it has to do with the topic anyway. All you care about is the rows
that the index fingers as matching your scan condition. If the scan
condition is strict (which it usually is) it does not matter whether the
index stores entries for nulls or not.

> How would you dynamically build the bit maps from the indexes?

> Or would you:
> - copy aside and sort the indexes on CTID
> - merge join them all to find matching CTIDs
> - probe into the main table

I've been taking "bitmap" to be a rather handwavy way of saying "a
compact representation of sets of CTIDs that is readily amenable to
being ANDed and ORed with other sets". I don't think it'll be a pure
bitmap with no other superstructure; at the very least you'd want to
apply some sort of sparse-bitmap and/or compression techniques. I do
suspect a bitmappy kind of representation will be more effective than
sorting arrays of CTIDs per se, although in principle you could do it
that way too.

But yeah, the basic idea is to scan an index and build some sort of
in-memory set of CTIDs of selected rows; possibly AND or OR this with
other sets built from other indexes; and then scan the set and probe
into the heap at the indicated places. One huge advantage is that the
actual heap visiting becomes efficient, eg you never visit the same page
more than once. (What you lose is the ability to retrieve data in
index order, so this isn't a replacement for existing indexscan methods,
just another plan type to consider.)

One interesting thought is that the bitmappy representation could be
lossy. For instance, once you get to the point of needing to examine
most of the rows on a particular page, it's probably not worth
remembering exactly which rows; you could just remember that that whole
page is a target, and sequentially scan all the rows on it when you do
visit the heap. (ANDing and ORing still works.) This can scale up to
visiting consecutive ranges of pages; in the limit the operation
degenerates to a seqscan. With this idea you can guarantee that the
in-memory bitmaps never get impracticably large. (Obviously if they get
so large as to push the system into swapping, or even run the backend
out of memory completely, you lose, so this is a real nice guarantee to
be able to make.) The whole thing starts to look like a self-adaptive
interpolation between our present indexscan and seqscan techniques,
which takes a lot of pressure off the planner to correctly guess the
number of matching rows in advance.

I remember batting these ideas around with people at the 2001 "OSDB
summit" conference ... I didn't think it would take us this long to get
around to doing it ...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2004-10-19 23:23:32 Re: plans for bitmap indexes?
Previous Message Mark Kirkwood 2004-10-19 22:52:24 Re: plans for bitmap indexes?