Re: State of the on-disk bitmap index

From: Daniel Bausch <bausch(at)dvs(dot)tu-darmstadt(dot)de>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: State of the on-disk bitmap index
Date: 2012-08-20 09:44:42
Message-ID: 5032070A.302@dvs.tu-darmstadt.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Am 20.08.2012 09:40, schrieb Albe Laurenz:
> Daniel Bausch wrote:
>> Hello Jonah, Simon, and the hackers,
>>
>> I am going to implement a simple kind of "encoded bitmap indexes"
> (EBI).
>> That is an index type where the bitmap columns may not only contain
>> only a single '1' in the set of bits belonging to a tuple. Instead,
> an
>> additional mapping table translates the distinct values of the table
>> column into a unique encoding. To select for a given value all bitmap
>> columns must be compared instead of only one. Queries that match
>> multiple different values (like IN lists or range queries) simplify to
>> less than the full set of bitmaps that needs to be compared because of
>> boolean logic. The total number of bitmaps required to represent
> unique
>> encodings for all different values is ceil(ld(n)), where n is the
> number
>> of distinct values. Compared to normal bitmap indexes this solves the
>> problem of high-cardinality columns. It is targetet at data
> warehousing
>> scenarios with insert only data.
>>
>> The respective scientific paper can be found at
>> http://www.dvs.tu-darmstadt.de/publications/pdf/ebi_a4.pdf
>
> I cannot answer your questions, but I read the paper and have some
> questions myself.
>
> 1) As you mention, a WHERE clause that checks for only one value
> will be more expensive with an encoded bitmap index than with
> a regular bitmap index. If you want to implement encoded bitmap
> indexes, wouldn't it be good to also implement regular bitmap
> indexes so that the user has a choice?

Sorry if that one was not clear: The first thing, I am going to do, is
to work on the normal bitmap indexes (the one based on the Bizgres
patch). I want to port it to master HEAD and give it back to the
community. After that I want to base my EBI implementation on that.
Eventually, I will publish that implementation, too. (After doing
tuning, experiments, and make sure it works well.)

> 2) The paper mentions that finding a good encoding and simplifying
> bitmap access for a certain query are nontrivial problems.
> Moreover, an encoding is good or bad only with respect to
> certain queries, which the system does not know at index
> creation time.

Actually, I was not involved in writing that paper. I want to use that
idea to show something different. I know of a follow up work by Golam
Rabilul Alam et al. that uses the query history and data mining on that
to optimize for the most common cases. There may be others. A more
detailed discussion of EBI can also be found in:

http://www-old.dvs.informatik.tu-darmstadt.de/staff/wu/query.TR.ps.gz

> Do you have any ideas how to approach that?
> If not, the paper suggests that, with enough values to check for,
> even a non-optimized encoded bitmap index should perform
> much better than a normal bitmap index, so maybe that's the way
> to go (maybe only encode the NULL value as all zeros).

Actually "all zeros" is reserved for "non-existent" (a.k.a. "deleted" or
"invisible").

The thing with the "enough values" is a bit problematic, indeed, because
even a DBA cannot influence how the queries of the user or the user
application look like. You will not use encoded bitmap indexes or
normal bitmap indexes for a column that is usually point accessed like
the ID column. For that you will stick to hash or tree indexes.

Kind regards,
Daniel

--
Daniel Bausch
Wissenschaftlicher Mitarbeiter
Technische Universität Darmstadt
Fachbereich Informatik
Fachgebiet Datenbanken und Verteilte Systeme

Hochschulstraße 10
64289 Darmstadt
Germany

Tel.: +49 6151 16 6706
Fax: +49 6151 16 6229

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Bausch 2012-08-20 10:33:19 Re: State of the on-disk bitmap index
Previous Message Craig Ringer 2012-08-20 08:45:36 Re: [PATCH] Docs: Make notes on sequences and rollback more obvious