Re: State of the on-disk bitmap index

From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Daniel Bausch *EXTERN*" <bausch(at)dvs(dot)tu-darmstadt(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: State of the on-disk bitmap index
Date: 2012-08-20 07:40:19
Message-ID: D960CB61B694CF459DCFB4B0128514C2084EEFA5@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Daniel Bausch wrote:
> Hello Jonah, Simon, and the hackers,
>
> I am going to implement a simple kind of "encoded bitmap indexes"
(EBI).
> That is an index type where the bitmap columns may not only contain
> only a single '1' in the set of bits belonging to a tuple. Instead,
an
> additional mapping table translates the distinct values of the table
> column into a unique encoding. To select for a given value all bitmap
> columns must be compared instead of only one. Queries that match
> multiple different values (like IN lists or range queries) simplify to
> less than the full set of bitmaps that needs to be compared because of
> boolean logic. The total number of bitmaps required to represent
unique
> encodings for all different values is ceil(ld(n)), where n is the
number
> of distinct values. Compared to normal bitmap indexes this solves the
> problem of high-cardinality columns. It is targetet at data
warehousing
> scenarios with insert only data.
>
> The respective scientific paper can be found at
> http://www.dvs.tu-darmstadt.de/publications/pdf/ebi_a4.pdf

I cannot answer your questions, but I read the paper and have some
questions myself.

1) As you mention, a WHERE clause that checks for only one value
will be more expensive with an encoded bitmap index than with
a regular bitmap index. If you want to implement encoded bitmap
indexes, wouldn't it be good to also implement regular bitmap
indexes so that the user has a choice?

2) The paper mentions that finding a good encoding and simplifying
bitmap access for a certain query are nontrivial problems.
Moreover, an encoding is good or bad only with respect to
certain queries, which the system does not know at index
creation time.
Do you have any ideas how to approach that?
If not, the paper suggests that, with enough values to check for,
even a non-optimized encoded bitmap index should perform
much better than a normal bitmap index, so maybe that's the way
to go (maybe only encode the NULL value as all zeros).

Yours,
Laurenz Albe

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2012-08-20 08:45:36 Re: [PATCH] Docs: Make notes on sequences and rollback more obvious
Previous Message Rushabh Lathia 2012-08-20 06:50:52 Primary Key Constraint on inheritance table not getting route to child tables