using new bitmap scans to index bit columns?

Lists: pgsql-general
From: TJ O'Donnell <tjo(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: using new bitmap scans to index bit columns?
Date: 2005-11-09 18:18:35
Message-ID: 43723D7B.8070906@acm.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

I like the new bitmap scans and I'm wondering if there is any way
I can utilize them for my bitmask column, defined as bit(1024).
I use this column as a sort of fingerprint to quickly scan my tables.
But it is a scan, not an index. I have not figured out a way to
index the bitmask column. Is there some way it can be used as
an index now that there are bitmap scans in 8.1?

Currently I do this:
Select * from mytable where contains(bitmask, fingerprint(user_data))
and matches(datacolumn, user_data);

user_data is a string, like a regexp but with different semantics for chemical data.
bitmask is precomputed/stored as bit(1024) = fingerprint(datacolumn)
contains(a,b) returns bool as 'select b=(a&b);'

This works well because matches() is an expensive functions.
But it would work better if bitmask could be indexed, no?

TJ O'Donnell


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "TJ O'Donnell" <tjo(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: using new bitmap scans to index bit columns?
Date: 2005-11-09 18:45:46
Message-ID: Pine.GSO.4.63.0511092145140.29329@ra.sai.msu.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, 9 Nov 2005, TJ O'Donnell wrote:

> I like the new bitmap scans and I'm wondering if there is any way
> I can utilize them for my bitmask column, defined as bit(1024).
> I use this column as a sort of fingerprint to quickly scan my tables.
> But it is a scan, not an index. I have not figured out a way to
> index the bitmask column. Is there some way it can be used as
> an index now that there are bitmap scans in 8.1?
>
> Currently I do this:
> Select * from mytable where contains(bitmask, fingerprint(user_data))
> and matches(datacolumn, user_data);
>
> user_data is a string, like a regexp but with different semantics for
> chemical data.
> bitmask is precomputed/stored as bit(1024) = fingerprint(datacolumn)
> contains(a,b) returns bool as 'select b=(a&b);'
>
> This works well because matches() is an expensive functions.
> But it would work better if bitmask could be indexed, no?

You can use GiST to do that.

>
> TJ O'Donnell
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: TJ O'Donnell <tjo(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: using new bitmap scans to index bit columns?
Date: 2005-11-09 18:52:04
Message-ID: 20051109185204.GB713@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Nov 09, 2005 at 10:18:35AM -0800, TJ O'Donnell wrote:
> I like the new bitmap scans and I'm wondering if there is any way
> I can utilize them for my bitmask column, defined as bit(1024).
> I use this column as a sort of fingerprint to quickly scan my tables.
> But it is a scan, not an index. I have not figured out a way to
> index the bitmask column. Is there some way it can be used as
> an index now that there are bitmap scans in 8.1?

Note: the fact that they are called bitmaps indexes doesn't imply
anything about the types used. The fact that you have a field already
as a bitmap doesn't actually help. The operation still needs to be an
indexable.

However, bitmaps indexes does mean that a single query can use multiple
indexes. So if you can split your contains into different parts of the
string, the optimozier can combine them. Whether this is more
efficient, who knows...

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.