Re: [PATCHES] WIP: bitmap indexes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCHES] WIP: bitmap indexes
Date: 2006-08-15 03:37:54
Message-ID: 27950.1155613074@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> One of the main reasons for the uglification of the executor in Jie's
> original patch was that she wanted to avoid the inefficiency of
> translating the on disk bitmap representation to the TID bitmap
> representation.

Offhand that seems like micro-optimization, at least as stated -- you
want to think about number of page fetches rather than exactly how a
bitmap is represented. I've still not got round to reading the patch
in detail, but after thinking about it in the shower on a few mornings,
it seems to me that the key concept we'd like to implement is that a
bitmap index can provide its data in a page-by-page fashion without the
overhead of fabricating the bitmap in-memory as btree forces us to do.
That is, the current implementation involves doing the whole index scan
and building a bitmap in memory, which we then read out page-wise for
the heap scan. The weak spot of this approach is that the in-memory
bitmap can get unreasonably large. With a bitmap index, you can pass
data back in a page-by-page fashion: here are the TID indexes to hit on
this page, then here are the TID indexes to hit on the next page, etc.
Correct me if I'm wrong, but isn't the patch's present hacking on the
executor intended to make it happen like this?

The main architectural idea I have at the moment is that this should all
be private between tidbitmap.c and the bitmap index AM. I think we
could perhaps have a flavor of tidbitmap that is "serial access" as
opposed to the present random-access, ie, instead of keeping the whole
bitmap in memory, you have a function pointer that you can call to
demand the next bitmap page in sequence. tidbitmap.c will need to
manage both kinds of bitmaps and be prepared to do ANDs and ORs across
both types, but just at the moment I see no showstopper reason why this
can't work.

Or is that exactly what you said already? It's late here and I'm a
bit tired...

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Darcy Buskermolen 2006-08-15 04:09:49 New beginings
Previous Message Gavin Sherry 2006-08-15 02:36:13 Re: [PATCHES] WIP: bitmap indexes

Browse pgsql-patches by date

  From Date Subject
Next Message Gavin Sherry 2006-08-15 04:49:41 Re: [PATCHES] WIP: bitmap indexes
Previous Message Gavin Sherry 2006-08-15 02:36:13 Re: [PATCHES] WIP: bitmap indexes