Re: plans for bitmap indexes?

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <markir(at)coretech(dot)co(dot)nz>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, Chris Browne <cbbrowne(at)acm(dot)org>
Subject: Re: plans for bitmap indexes?
Date: 2004-11-04 03:57:41
Message-ID: 200411040357.iA43vf925972@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


Updated TODO:

* Allow the creation of bitmap indexes which can be quickly combined
with other bitmap indexes

Bitmap indexes index single columns that can be combined with other bitmap
indexes to dynamically create a composite index to match a specific query.
Each index is a bitmap, and the bitmaps are bitwise AND'ed or OR'ed to be
combined. Such indexes could be more compact if there are few unique
value. Also, perhaps they can be lossy requiring a scan of the heap page
to find matching rows.

* Allow non-bitmap indexes to be combined

Do lookups on non-bitmap indexes and create bitmaps in memory that can be
combined with other indexes.

---------------------------------------------------------------------------

Simon Riggs wrote:
> > Tom Lane
> > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
>
> > > How would you dynamically build the bit maps from the indexes?
> >
> > > Or would you:
> > > - copy aside and sort the indexes on CTID
> > > - merge join them all to find matching CTIDs
> > > - probe into the main table
> >
> > I've been taking "bitmap" to be a rather handwavy way of saying "a
> > compact representation of sets of CTIDs that is readily amenable to
> > being ANDed and ORed with other sets". I don't think it'll be a pure
> > bitmap with no other superstructure; at the very least you'd want to
> > apply some sort of sparse-bitmap and/or compression techniques. I do
> > suspect a bitmappy kind of representation will be more effective than
> > sorting arrays of CTIDs per se, although in principle you could do it
> > that way too.
> >
>
> OK. You seemed to be implying that.
>
> > (What you lose is the ability to retrieve data in
> > index order, so this isn't a replacement for existing indexscan methods,
> > just another plan type to consider.)
>
> Never seen an application that required a bitmap plan and sorted output.
> Have you? Mostly count(*), often sum() or avg(), but never sorted, surely.
>
> Considering there would always be >1 index, which index order did we want
> anyhow?
>
> > One interesting thought is that the bitmappy representation could be
> > lossy. For instance, once you get to the point of needing to examine
> > most of the rows on a particular page, it's probably not worth
> > remembering exactly which rows; you could just remember that that whole
> > page is a target, and sequentially scan all the rows on it when you do
> > visit the heap. (ANDing and ORing still works.) This can scale up to
> > visiting consecutive ranges of pages; in the limit the operation
> > degenerates to a seqscan. With this idea you can guarantee that the
> > in-memory bitmaps never get impracticably large. (Obviously if they get
> > so large as to push the system into swapping, or even run the backend
> > out of memory completely, you lose, so this is a real nice guarantee to
> > be able to make.) The whole thing starts to look like a self-adaptive
> > interpolation between our present indexscan and seqscan techniques,
> > which takes a lot of pressure off the planner to correctly guess the
> > number of matching rows in advance.
>
> Well, thats the best one yet. That's the solution, if ever I heard it.
>
> The reduction in bitmap size makes their use much safer. Size matters, since
> we're likely to start using these techniques on very large databases, which
> imply obviously have very large CTID lists. The problem with guessing the
> number of rows is you're never too sure whether its worth the startup
> overhead of using the bitmap technique. ....my next question was going to
> be, so how will you know when to use the technique?
>
> Hmmm....think....you'd need to be clear that the cost of scanning a block
> didn't make the whole thing impractical. Generally, since we're using this
> technique to access infrequent row combinations, we'd be looking at no more
> than one row per block usually anyway. So the technique is still I/O bound -
> a bit extra post I/O cpu work won't hurt much. OK, cool.
>
> > I remember batting these ideas around with people at the 2001 "OSDB
> > summit" conference ... I didn't think it would take us this long to get
> > around to doing it ...
>
> ...as if you haven't been busy... ;-)
>
> Best Regards, Simon Riggs
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2004-11-04 05:07:02 Re: plans for bitmap indexes?
Previous Message Robert Treat 2004-11-04 03:32:36 Re: pg_autovacuum is nice ... but ...