On-disk bitmap index patch

Lists: pgsql-hackers

From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: On-disk bitmap index patch
Date: 2006-07-18 06:28:29
Message-ID: C0E1CD9D.8DC0%jzhang@greenplum.com
Lists: pgsql-hackers

Hi,

I have posted a patch to the CVS head for on-disk bitmap index to
pgsql-patches. If this can get in 8.2, that would be great. Any comments and
suggestions are welcome.

I still need to add several items:

(1) README file in src/backend/access/bitmap.
(2) Bitmap index documentation.
(3) Hiding the internal btree.

Also, I have disabled the multi-column index support because there is a
known problem. Assume that there is a bitmap index on a and b. When a query
predicate has only a, the current code may generate a wrong result. That's
because the current code assumes that b is null. The underlying problem is
that the search code only handles one bitmap vector now. I need a fix to
support manipulating multiple bitmap vectors.
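
For illustration, a sketch of the failing case (the index AM name 'bitmap'
is assumed from the patch; the table and columns here are made up):

create table t (a int, b int);
create index t_ab_bitmap on t using bitmap (a, b);

-- A predicate on the leading column only:
select * from t where a = 1;
-- The disabled code would consult only the bitmap vector for
-- (a = 1, b is null), so rows with a = 1 and a non-null b could be missed.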

If you find any other problems, please let me know.

Thanks,
Jie


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Jie Zhang <jzhang(at)greenplum(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-23 21:21:33
Message-ID: Pine.LNX.4.58.0607240720180.19004@linuxworld.com.au
Lists: pgsql-hackers

On Mon, 17 Jul 2006, Jie Zhang wrote:

> Hi,
>
> I have posted a patch to the CVS head for on-disk bitmap index to
> pgsql-patches. If this can get in 8.2, that would be great. Any comments and
> suggestions are welcome.
>
> I still need to add several items:
>
> (1) README file in src/backend/access/bitmap.
> (2) Bitmap index documentation.
> (3) Hiding the internal btree.
>
> Also, I have disabled the multi-column index support because there is a
> known problem. Assume that there is a bitmap index on a and b. When a query
> predicate has only a, the current code may generate a wrong result. That's
> because the current code assumes that b is null. The underlying problem is
> that the search code only handles one bitmap vector now. I need a fix to
> support manipulating multiple bitmap vectors.
>
> If you find any other problems, please let me know.

Is anyone else looking at this patch? I am reviewing it with Jie, tidying
it up and trying to solve some of the problems mentioned above. If anyone
wants to take a look at it, please ask for an updated patch.

Thanks,

Gavin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-23 23:43:35
Message-ID: 28525.1153698215@sss.pgh.pa.us
Lists: pgsql-hackers

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> Is anyone else looking at this patch?

It's on my list of things to look at, but so are a lot of other patches ;-)

A couple of comments after a *very* fast scan through the patch:

* The xlog routines need help; they seem to not be updated for recent
changes in the API for xlog recovery code.

* The hacks on vacuum.c (where it tests the AM name) are utterly
unacceptable. If you need the main code to do something special for a
particular index AM, define a bool flag column for it in pg_am.

* The interface to the existing executor bitmap scan code is mighty
messy --- seems like there's a lot of almost duplicate code, a lot
of rather invasive hacking, etc. This needs to be rethought and
refactored.

* Why in the world is it introducing duplicate copies of a lot
of btree comparison functions? Use the ones that are there.

* The patch itself is a mess: it introduces .orig and .rej files,
changes around $PostgreSQL$ lines, etc.

However, the main problem I've got with this is that a new index AM is a
pretty large burden, and no one's made the slightest effort to sell
pghackers on taking this on. What's the use-case, what's the
performance like, where is it really an improvement over what we've got?

regards, tom lane


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-24 00:06:23
Message-ID: Pine.LNX.4.58.0607240952250.19854@linuxworld.com.au
Lists: pgsql-hackers

On Sun, 23 Jul 2006, Tom Lane wrote:

> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > Is anyone else looking at this patch?
>
> It's on my list of things to look at, but so are a lot of other patches ;-)
>
> A couple of comments after a *very* fast scan through the patch:
>
> * The xlog routines need help; they seem to not be updated for recent
> changes in the API for xlog recovery code.

Yep. The patch was actually against 8.1 and was hastily brought up to 8.2.
I think Jie's intention was to simply let everyone know that this was
going on.

>
> * The hacks on vacuum.c (where it tests the AM name) are utterly
> unacceptable. If you need the main code to do something special for a
> particular index AM, define a bool flag column for it in pg_am.

Yes.

>
> * The interface to the existing executor bitmap scan code is mighty
> messy --- seems like there's a lot of almost duplicate code, a lot
> of rather invasive hacking, etc. This needs to be rethought and
> refactored.

I agree.

>
> * Why in the world is it introducing duplicate copies of a lot
> of btree comparison functions? Use the ones that are there.

Yes, I raised this with Jie and she has fixed it. One thought is, we may
want to rename those comparison functions prefixed with 'bm' to make their
naming less confusing. They'll be used by the btree, GIN and bitmap index
methods. Anyway, that's a separate patch.

>
> * The patch itself is a mess: it introduces .orig and .rej files,
> changes around $PostgreSQL$ lines, etc.
>

Right, not to mention patches to configure and a lot of style which needs
to be knocked into shape.

> However, the main problem I've got with this is that a new index AM is a
> pretty large burden, and no one's made the slightest effort to sell
> pghackers on taking this on. What's the use-case, what's the
> performance like, where is it really an improvement over what we've got?

For low cardinality sets, bitmaps greatly outperform btree. Jie and
others at Greenplum tested this implementation (and others) against very
large, low cardinality sets. I wasn't involved in it. I urge Jie and/or
Luke to present those results.

Thanks,

Gavin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-24 00:25:18
Message-ID: 28750.1153700718@sss.pgh.pa.us
Lists: pgsql-hackers

Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> On Sun, 23 Jul 2006, Tom Lane wrote:
>> However, the main problem I've got with this is that a new index AM is a
>> pretty large burden, and no one's made the slightest effort to sell
>> pghackers on taking this on.

> For low cardinality sets, bitmaps greatly outperform btree.

If the column is sufficiently low cardinality, you might as well just do
a seqscan --- you'll be hitting most of the heap's pages anyway. I'm
still waiting to be convinced that there's a sweet spot wide enough to
justify supporting another index AM. (I'm also wondering whether this
doesn't overlap the use-case for GIN.)

regards, tom lane


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Gavin Sherry" <swm(at)alcove(dot)com(dot)au>, "Jie Zhang" <jzhang(at)greenplum(dot)com>, "Ayush Parashar" <aparashar(at)greenplum(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-24 00:35:37
Message-ID: C0E963E9.2AA43%llonergan@greenplum.com
Lists: pgsql-hackers

Tom,

On 7/23/06 5:25 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> If the column is sufficiently low cardinality, you might as well just do
> a seqscan --- you'll be hitting most of the heap's pages anyway. I'm
> still waiting to be convinced that there's a sweet spot wide enough to
> justify supporting another index AM. (I'm also wondering whether this
> doesn't overlap the use-case for GIN.)

We presented them at the Postgres Anniversary summit talk (Bruce M. was
there) and the reaction was let's get this into 8.2 because many people
there (more than 5) really wanted to use it as a standard feature.

A version of the slides with only the bitmap index results is located here:
http://intranet.greenplum.com/bitmap-index-perf-ayush.ppt.

- Luke


From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-24 06:23:48
Message-ID: C0E9B584.8F43%jzhang@greenplum.com
Lists: pgsql-hackers


Thanks Tom and Gavin for your comments!

Yes, this patch was generated against 8.2 in a short time. Once the
code was working, I posted the patch to get some comments and help.

>>
>> * The xlog routines need help; they seem to not be updated for recent
>> changes in the API for xlog recovery code.
>
> Yep. The patch was actually against 8.1 and was hastily brought up to 8.2.
> I think Jie's intention was to simply let everyone know that this was
> going on.

Thanks for pointing this out. I didn't notice that these had changed in 8.2.

>
>>
>> * The hacks on vacuum.c (where it tests the AM name) are utterly
>> unacceptable. If you need the main code to do something special for a
>> particular index AM, define a bool flag column for it in pg_am.
>
> Yes.

Sounds good.

>
>>
>> * The interface to the existing executor bitmap scan code is mighty
>> messy --- seems like there's a lot of almost duplicate code, a lot
>> of rather invasive hacking, etc. This needs to be rethought and
>> refactored.
>
> I agree.

I will think about this more.

>
>>
>> * Why in the world is it introducing duplicate copies of a lot
>> of btree comparison functions? Use the ones that are there.
>
> Yes, I raised this with Jie and she has fixed it. One thought is, we may
> want to rename those comparison functions prefixed with 'bm' to make their
> naming less confusing. They'll be used by the btree, GIN and bitmap index
> methods. Anyway, that's a separate patch.

Yeah, the main reason I hesitated to use btree's comparison functions was
that their names start with 'bt'. Since Gavin told me that GIN is using
those functions as well, I have changed the code to use them. Renaming them
would be good.

>
>>
>> * The patch itself is a mess: it introduces .orig and .rej files,
>> changes around $PostgreSQL$ lines, etc.
>>
>
> Right, not to mention patches to configure and a lot of style which needs
> to be knocked into shape.
>

The way I generate a patch is kind of clumsy. I need to find a better way to
do that.

I will start fixing these.

Thanks,
Jie


From: Hannu Krosing <hannu(at)skype(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-24 13:59:35
Message-ID: 1153749575.2909.3.camel@localhost.localdomain
Lists: pgsql-hackers

On Sunday, 2006-07-23 at 20:25, Tom Lane wrote:
> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
> > On Sun, 23 Jul 2006, Tom Lane wrote:
> >> However, the main problem I've got with this is that a new index AM is a
> >> pretty large burden, and no one's made the slightest effort to sell
> >> pghackers on taking this on.
>
> > For low cardinality sets, bitmaps greatly out perform btree.
>
> If the column is sufficiently low cardinality, you might as well just do
> a seqscan --- you'll be hitting most of the heap's pages anyway. I'm
> still waiting to be convinced that there's a sweet spot wide enough to
> justify supporting another index AM. (I'm also wondering whether this
> doesn't overlap the use-case for GIN.)

IIRC they quoted the cardinality of 10000 as something that is still
faster than btree for several use cases.

And also for AND-s of several indexes, where indexes are BIG, your btree
indexes may be almost as big as tables but the resulting set of pages is
small.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Hannu Krosing" <hannu(at)skype(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 00:47:59
Message-ID: C0EAB84F.8FB4%jzhang@greenplum.com
Lists: pgsql-hackers

On 7/24/06 6:59 AM, "Hannu Krosing" <hannu(at)skype(dot)net> wrote:

> On Sunday, 2006-07-23 at 20:25, Tom Lane wrote:
>> Gavin Sherry <swm(at)linuxworld(dot)com(dot)au> writes:
>>> On Sun, 23 Jul 2006, Tom Lane wrote:
>>>> However, the main problem I've got with this is that a new index AM is a
>>>> pretty large burden, and no one's made the slightest effort to sell
>>>> pghackers on taking this on.
>>
>>> For low cardinality sets, bitmaps greatly outperform btree.
>>
>> If the column is sufficiently low cardinality, you might as well just do
>> a seqscan --- you'll be hitting most of the heap's pages anyway. I'm
>> still waiting to be convinced that there's a sweet spot wide enough to
>> justify supporting another index AM. (I'm also wondering whether this
>> doesn't overlap the use-case for GIN.)
>
> IIRC they quoted the cardinality of 10000 as something that is still
> faster than btree for several use cases.
>
> And also for AND-s of several indexes, where indexes are BIG, your btree
> indexes may be almost as big as tables but the resulting set of pages is
> small.

Yeah, Hannu points it out very well -- the bitmap index works very well when
columns have low cardinalities and AND operations will produce a small number
of results.

Also, the bitmap index is very small in low cardinality cases, where the
btree tends to take up at least 10 times more space.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jie Zhang <jzhang(at)greenplum(dot)com>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 01:04:28
Message-ID: 200607250104.k6P14St25388@momjian.us
Lists: pgsql-hackers

Jie Zhang wrote:
> > IIRC they quoted the cardinality of 10000 as something that is still
> > faster than btree for several use cases.
> >
> > And also for AND-s of several indexes, where indexes are BIG, your btree
> > indexes may be almost as big as tables but the resulting set of pages is
> > small.
>
> Yeah, Hannu points it out very well -- the bitmap index works very well when
> columns have low cardinalities and AND operations will produce a small number
> of results.

What operations on columns of low cardinality produce a small number of
results? That seems contradictory.

> Also, the bitmap index is very small in low cardinality cases, where the
> btree tends to take up at least 10 times more space.

Also, is adding/changing rows more expensive with bitmaps than
btrees?

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "Hannu Krosing" <hannu(at)skype(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 01:49:12
Message-ID: C0EAC6A8.8FC4%jzhang@greenplum.com
Lists: pgsql-hackers

On 7/24/06 6:04 PM, "Bruce Momjian" <bruce(at)momjian(dot)us> wrote:

> Jie Zhang wrote:
>>> IIRC they quoted the cardinality of 10000 as something that is still
>>> faster than btree for several use cases.
>>>
>>> And also for AND-s of several indexes, where indexes are BIG, your btree
>>> indexes may be almost as big as tables but the resulting set of pages is
>>> small.
>>
>> Yeah, Hannu points it out very well -- the bitmap index works very well when
>> columns have low cardinalities and AND operations will produce a small number
>> of results.
>
> What operations on columns of low cardinality produce a small number of
> results? That seems contradictory.

Let's see an example. Table 'orders' includes two columns, 'priority' and
'status'. The column 'priority' has 100 distinct values, say p1-p100, and
the column 'status' has 20 distinct values, say s1-s20. The query

select * from orders where priority = 'p1' and status = 's1';

may produce a small number of results. Also, if these related rows are
clustered together, that would be even better.

>
>> Also, the bitmap index is very small in low cardinality cases, where the
>> btree tends to take up at least 10 times more space.
>
> Also, is adding/changing rows more expensive with bitmaps than
> btrees?

Inserting a row will only affect the last word (at most the last several
words) of a bitmap vector, so this should not be very expensive: 3-4 IOs.
When a row is updated and the new row is inserted in the middle of the heap,
currently the code will update the bit in place -- where the bit should
be. Searching for the page which includes the bit to be updated is not very
efficient now, but this can be fixed. Currently, we have to scan the pages
for a bitmap vector one by one until we hit the right page. Since the bitmap
vector is compressed, updating a bit in the middle may cause its page to
overflow. In this case, we create a new page to accommodate those extra
bits, and insert this new page right after the original page.

Overall, inserting a row or updating a row can be done efficiently. But it
is true that the bitmap index does not perform well if there are lots of
inserts and updates, especially updates.


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Jie Zhang <jzhang(at)greenplum(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 01:51:05
Message-ID: Pine.LNX.4.58.0607251149250.445@linuxworld.com.au
Lists: pgsql-hackers

On Mon, 24 Jul 2006, Bruce Momjian wrote:

> Jie Zhang wrote:
> > > IIRC they quoted the cardinality of 10000 as something that is still
> > > faster than btree for several use cases.
> > >
> > > And also for AND-s of several indexes, where indexes are BIG, your btree
> > > indexes may be almost as big as tables but the resulting set of pages is
> > > small.
> >
> > Yeah, Hannu points it out very well -- the bitmap index works very well when
> > columns have low cardinalities and AND operations will produce a small number
> > of results.
>
> What operations on columns of low cardinality produce a small number of
> results? That seems contradictory.

WHERE a = 1 and b = 2

a = 1 may be 5% of the table and b = 2 may be 5% of the table but their
intersection may be .001%.
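
Under independence alone the arithmetic gives

P(a = 1 and b = 2) = 0.05 * 0.05 = 0.0025, i.e. 0.25% of the table,

and correlation between the columns can push the intersection far lower
still, as in the .001% figure above.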

Luke: the URL you sent to the bitmap slides was internal to Greenplum.
Would you be able to put them on a public site?

Thanks,

Gavin


From: mark(at)mark(dot)mielke(dot)cc
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Jie Zhang <jzhang(at)greenplum(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 01:58:05
Message-ID: 20060725015805.GA30087@mark.mielke.cc
Lists: pgsql-hackers

On Mon, Jul 24, 2006 at 09:04:28PM -0400, Bruce Momjian wrote:
> Jie Zhang wrote:
> > > IIRC they quoted the cardinality of 10000 as something that is still
> > > faster than btree for several use cases.
> > >
> > > And also for AND-s of several indexes, where indexes are BIG, your btree
> > > indexes may be almost as big as tables but the resulting set of pages is
> > > small.
> > Yeah, Hannu points it out very well -- the bitmap index works very well
> > when columns have low cardinalities and AND operations will produce a small
> > number of results.
> What operations on columns of low cardinality produce a small number of
> results? That seems contradictory.

Not necessarily. A cardinality of 2 means each value matches about 1/2 of
the table; 3 means 1/3; 4 means 1/4. Etc.

Reading 1/4, for a larger table, has a good chance of being faster than
reading 4/4 of the table. :-)

No opinion on whether such tables exist in the real world - but the
concept itself seems sound.

> > Also, the bitmap index is very small in low cardinality cases, where the
> > btree tends to take up at least 10 times more space.
> Also, is adding/changing rows more expensive with bitmaps than
> btrees?

Without looking at the code, but having read the Oracle docs, I would
guess yes. Every vacuum/delete would need to clear all of the bits for
the row. Every insert/update would need to allocate a bit in at least
one bitmap vector for the row inserted/updated. Seems like more pages
may need to be written. Although perhaps some clever optimization
would limit this. :-)

It seems interesting though. We won't really know until we see the
benchmarks. I'm seeing it as a form of working directly with the
intermediate form of the bitmap index scanner. If the existing index
scanner, building the bitmaps on the fly, can outperform the regular
index scanner, I see potential in a pre-built bitmap.

Obviously, it isn't the answer to everything. The use I see for it is
a few of my 1:N object attribute tables. The table has an object
identifier, and a column indicating the attribute type that the value
is for. If I have 20 or more attribute type values, and most
objects include rows for most attribute types, my regular index ends
up repeating the attribute type for every row. If I want to scan the
table for all rows that have a particular attribute type with a
particular value, it's a seqscan right now. With the bitmap scanner,
the knowledge of which rows to skip to is immediately available.
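
For concreteness, a sketch of such a table (all names here are
hypothetical):

create table object_attr (
    object_id  integer,
    attr_type  integer,  -- 20 or more distinct values
    attr_value text
);

-- Currently a seqscan; a bitmap index on attr_type would make the
-- matching rows directly addressable:
select * from object_attr where attr_type = 7 and attr_value = 'x';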

Will it be worth it or not? I won't know until I try it. :-)

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Jie Zhang <jzhang(at)greenplum(dot)com>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 03:54:48
Message-ID: 44C59608.6030207@paradise.net.nz
Lists: pgsql-hackers

Jie Zhang wrote:
>
> On 7/24/06 6:59 AM, "Hannu Krosing" <hannu(at)skype(dot)net> wrote:
>
>>
>>
>> And also for AND-s of several indexes, where indexes are BIG, your btree
>> indexes may be almost as big as tables but the resulting set of pages is
>> small.
>
> Yeah, Hannu points it out very well -- the bitmap index works very well when
> columns have low cardinalities and AND operations will produce a small number
> of results.
>
> Also, the bitmap index is very small in low cardinality cases, where the
> btree tends to take up at least 10 times more space.
>
>

The smallness of the bitmap index also means that some queries will
require much less work_mem to achieve good performance, e.g. consider:

TPCH dataset with scale factor 10 on my usual PIII HW, query -
select count(*) from lineitem where l_linenumber=1;

This executes in about 100 seconds with work_mem = 20M if there is a
bitmap index on l_linenumber. It takes 3832 seconds (!) if there is a
btree index on the same column. Obviously cranking up work_mem will even
up the difference (200M gets the btree to about 110 seconds), but being
able to get good performance with less memory is a good thing!
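
For reference, a sketch of that comparison (the index AM name 'bitmap' is
assumed from the patch; work_mem is given in KB):

set work_mem = 20480;  -- 20M
create index lineitem_l_linenumber on lineitem using bitmap (l_linenumber);
select count(*) from lineitem where l_linenumber=1;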

Cheers

Mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: mark(at)mark(dot)mielke(dot)cc
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Jie Zhang <jzhang(at)greenplum(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 04:36:42
Message-ID: 5085.1153802202@sss.pgh.pa.us
Lists: pgsql-hackers

mark(at)mark(dot)mielke(dot)cc writes:
> Reading 1/4, for a larger table, has a good chance of being faster than
> reading 4/4 of the table. :-)

Really?

If you have to hit one tuple out of four, it's pretty much guaranteed
that you will need to fetch every heap page. So using an index provides
zero I/O savings on the heap side, and any fetches needed to read the
index are pure cost. Now you have to demonstrate that the CPU costs
involved in processing the index are significantly cheaper than the cost
of just testing the WHERE qual at every heap tuple --- not a bet that's
likely to win at a one-in-four ratio.

> Will it be worth it or not? I won't know until I try it. :-)

Agreed.

regards, tom lane


From: mark(at)mark(dot)mielke(dot)cc
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Jie Zhang <jzhang(at)greenplum(dot)com>, Hannu Krosing <hannu(at)skype(dot)net>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 05:08:55
Message-ID: 20060725050855.GA31414@mark.mielke.cc
Lists: pgsql-hackers

On Tue, Jul 25, 2006 at 12:36:42AM -0400, Tom Lane wrote:
> mark(at)mark(dot)mielke(dot)cc writes:
> > Reading 1/4, for a larger table, has a good chance of being faster than
> > reading 4/4 of the table. :-)
> Really?
>
> If you have to hit one tuple out of four, it's pretty much guaranteed
> that you will need to fetch every heap page. So using an index provides
> zero I/O savings on the heap side, and any fetches needed to read the
> index are pure cost. Now you have to demonstrate that the CPU costs
> involved in processing the index are significantly cheaper than the cost
> of just testing the WHERE qual at every heap tuple --- not a bet that's
> likely to win at a one-in-four ratio.

Haha. Of course - but that's assuming uniform spread of the values.
Next I would try clustering the table on the bitmap index... :-)

My databases aren't as large as many of yours. Most or all of them
will fit in 1 GB of RAM. The I/O cost isn't substantial for these,
but the WHERE clause might be.

But yeah - we don't know. Waste of code or performance boost.

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)alcove(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, Ayush Parashar <aparashar(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 17:49:18
Message-ID: 20060725174918.GZ83250@pervasive.com
Lists: pgsql-hackers

On Sun, Jul 23, 2006 at 05:35:37PM -0700, Luke Lonergan wrote:
> We presented them at the Postgres Anniversary summit talk (Bruce M. was
> there) and the reaction was let's get this into 8.2 because many people
> there (more than 5) really wanted to use it as a standard feature.
>
> A version of the slides with only the bitmap index results is located here:
> http://intranet.greenplum.com/bitmap-index-perf-ayush.ppt.

404
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Hannu Krosing <hannu(at)skype(dot)net>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Luke Lonergan <llonergan(at)greenplum(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gavin Sherry <swm(at)alcove(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, Ayush Parashar <aparashar(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 18:19:35
Message-ID: 1153851576.2924.0.camel@localhost.localdomain
Lists: pgsql-hackers

On Tuesday, 2006-07-25 at 12:49, Jim C. Nasby wrote:
> On Sun, Jul 23, 2006 at 05:35:37PM -0700, Luke Lonergan wrote:
> > We presented them at the Postgres Anniversary summit talk (Bruce M. was
> > there) and the reaction was let's get this into 8.2 because many people
> > there (more than 5) really wanted to use it as a standard feature.
> >
> > A version of the slides with only the bitmap index results is located here:
> > http://intranet.greenplum.com/bitmap-index-perf-ayush.ppt.
>
> 404

Strange. I can download it from both work and home.

--
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me: callto:hkrosing
Get Skype for free: http://www.skype.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 18:38:36
Message-ID: 44C6652C.1020203@agliodbs.com
Lists: pgsql-hackers

Tom,

> (I'm also wondering whether this
> doesn't overlap the use-case for GIN.)

It does not. GIN is strictly for multi-value fields. I can think of
applications where either GIN or Bitmaps would be an option, but for the
majority, they wouldn't.

One particularly compelling situation for on-disk bitmaps is for terabyte
tables where a btree index would not fit into memory. Index
performance for an index which is 10x or more the size of RAM really
sucks ... I can come up with some test results if you doubt that.

Also note that "low cardinality" is relative. For a 1 billion row
table, a column with 10,000 values is "low-cardinality", having around
100,000 rows per value ... but that's still 0.01% of the table per
value, making index use still applicable.

--Josh


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 20:46:29
Message-ID: 24309.1153860389@sss.pgh.pa.us
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> One particular compelling situation for on-disk bitmaps is for terabyte
> tables where a btree index would not fit into memory. Index
> performance for an index which is 10x or more the size of RAM really
> sucks ... I can come up with some test results if you doubt that.

Sure...

> Also note that "low cardinality" is relative. For a 1 billion row
> table, a column with 10,000 values is "low-cardinality", having around
> 100,000 rows per value ... but that's still 0.01% of the table per
> value, making index use still applicable.

And your point is? Assuming a reasonable datatype like int4, a btree
index on this table would occupy say 20GB (16 bytes per entry plus
fillfactor overhead). The bitmap version would require 10,000 bitmaps
of 1G bits apiece, or 1250GB (not even counting overhead). You need
some wildly optimistic assumptions about the compressibility of the
bitmaps to get even within hailing distance of the btree, let alone
fit in RAM. A realistic assumption for the numbers you mention is
that the bitmaps have 1-bits about 100 bits apart, which means you
could get maybe 3-to-1 compression using the runlength scheme that's
in there ... leaving the bitmap a factor of 20 bigger than the btree.
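
Spelling that arithmetic out:

btree:  10^9 entries * 16 bytes = 16 GB (say 20 GB with fillfactor overhead)
bitmap: 10^4 vectors * 10^9 bits / 8 = 1.25 * 10^12 bytes = 1250 GB
at 3-to-1 compression: 1250 GB / 3 = ~417 GB, about 20 times the btree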

AFAICS "low cardinality" has to mean just that, a few dozen distinct
values at most, for this scheme to have any hope.

regards, tom lane


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 21:53:29
Message-ID: 44C692D9.3050406@cheapcomplexdevices.com
Lists: pgsql-hackers

Tom Lane wrote:
> mark(at)mark(dot)mielke(dot)cc writes:
>> Reading 1/4, for a larger table, has a good chance of being faster than
>> reading 4/4 of the table. :-)
>
> Really?
>
> If you have to hit one tuple out of four, it's pretty much guaranteed
> that you will need to fetch every heap page.

I think it's not uncommon to have data that is more clustered than expected.

In my case it shows up often in GIS data, where a query
that only accesses 1/4 of the rows in the table really will
only access about 1/4 of the pages in the table -- for example
when analyzing Texas or the Southwest US.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-25 22:16:51
Message-ID: 24886.1153865811@sss.pgh.pa.us
Lists: pgsql-hackers

Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
> Tom Lane wrote:
>> If you have to hit one tuple out of four, it's pretty much guaranteed
>> that you will need to fetch every heap page.

> I think it's not uncommon to have data that is more clustered than expected.

Sure, if the data is fairly static ... and now that there's a fillfactor
option for heaps, you can even tolerate (some low rate of) updates
without losing the clusteredness. However, that benefit accrues to
all index types, not only bitmap.

regards, tom lane


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-26 01:43:39
Message-ID: 44C6C8CB.9030802@paradise.net.nz
Lists: pgsql-hackers

Tom Lane wrote:

>
> And your point is? Assuming a reasonable datatype like int4, a btree
> index on this table would occupy say 20GB (16 bytes per entry plus
> fillfactor overhead). The bitmap version would require 10,000 bitmaps
> of 1G bits apiece, or 1250GB (not even counting overhead). You need
> some wildly optimistic assumptions about the compressibility of the
> bitmaps to get even within hailing distance of the btree, let alone
> fit in RAM. A realistic assumption for the numbers you mention is
> that the bitmaps have 1-bits about 100 bits apart, which means you
> could get maybe 3-to-1 compression using the runlength scheme that's
> in there ... leaving the bitmap a factor of 20 bigger than the btree.
>
> AFAICS "low cardinality" has to mean just that, a few dozen distinct
> values at most, for this scheme to have any hope.
>

I did a little testing of this when Jie first submitted the patch - for
a basically Zipfian distribution of int4's the bitmap is still clearly
smaller even at 1000 distinct values (see below). It is twice as big at
10000, so the crossover point is somewhere in the 1000-10000 range (for
this test - however the results seem to be reasonably typical).

In DSS, distinct values < 1000 are very common (days of week, months of
year, lineitems of an order, sex of animal...), so the applicability of this
range is perhaps larger than it seems at first sight.

Cheers

Mark
------------------------------------------------------------------------

bitmap=# \d bitmaptest
    Table "public.bitmaptest"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer | not null
 val0   | integer |
 val1   | integer |
 val2   | integer |
 val3   | integer |
 fil    | text    |

bitmap=# select count(distinct val0),
                count(distinct val1),
                count(distinct val2),
                count(distinct val3)
         from bitmaptest;
 count | count | count | count
-------+-------+-------+-------
    10 |   100 |  1000 | 10000

bitmap=# \i relsizes.sql (BTREE)
     relname     | relpages
-----------------+----------
 bitmaptest      |    93458
 bitmaptest_val0 |    21899
 bitmaptest_val1 |    21899
 bitmaptest_val2 |    21899
 bitmaptest_val3 |    21899

bitmap=# \i relsizes.sql (BITMAP)
     relname     | relpages
-----------------+----------
 bitmaptest      |    93458
 bitmaptest_val0 |     1286
 bitmaptest_val1 |     2313
 bitmaptest_val2 |     5726
 bitmaptest_val3 |    41292

For the interested, the data generator, schema, index files are here:
http://homepages.paradise.net.nz/markir/download/bizgres/bitmaptest.tar.gz


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-26 14:26:01
Message-ID: 1593.1153923961@sss.pgh.pa.us
Lists: pgsql-hackers

I wrote:
>> ... A realistic assumption for the numbers you mention is
>> that the bitmaps have 1-bits about 100 bits apart, which means you
>> could get maybe 3-to-1 compression using the runlength scheme that's
>> in there ... leaving the bitmap a factor of 20 bigger than the btree.

I'm surprised no one caught me making this bogus computation. I
realized this morning it's wrong: if there are 10000 distinct values
then on the average the 1-bits would be about 10000 bits apart, not 100.
At that rate there *is* substantial compression. The representation
used in the bitmap patch isn't amazingly efficient for this (I wonder
whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs) but it
still manages to get down to something like 11 bytes per 1-bit. Since
each 1-bit corresponds to a btree entry, this means the bitmap does come
out somewhat smaller than a btree (16 bytes per entry ignoring overhead)
--- at least if overhead such as wasted space on a page doesn't kill it.
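
One way to arrive at that figure, assuming each compressed 8-bit fill word
covers at most 2^7*8 = 1024 bits (a figure Jie confirms downthread):

run of ~10^4 zero bits between 1-bits: ceil(10^4 / 1024) = 10 fill bytes
plus roughly one literal byte holding the 1-bit itself
=> ~11 bytes per 1-bit, vs. 16 bytes per btree entry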

Still, this isn't going to make the difference between fitting in RAM or
not. For small numbers of distinct values, like less than 100, the
bitmap representation looks more plausible. In that range you're down
to a byte or two per entry and so there is (very roughly) a 10x storage
savings over btree. I don't believe the 100x numbers that have been
bandied around in this discussion, but 10x is plenty enough to be
interesting.

regards, tom lane


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-27 02:00:18
Message-ID: 44C81E32.8090500@paradise.net.nz
Lists: pgsql-hackers

Tom Lane wrote:
>
>
> I'm surprised no one caught me making this bogus computation. I
> realized this morning it's wrong: if there are 10000 distinct values
> then on the average the 1-bits would be about 10000 bits apart, not 100.

Right - I didn't think 10000 was *that* bad, but was too sleepy to try
working it out :-).

>
>
> I don't believe the 100x numbers that have been
> bandied around in this discussion, but 10x is plenty enough to be
> interesting.
>

Yep - I have not managed to get 100x in any of my tests. However, I do
see ratios of about half that for the TPCH scale 10 dataset:

tpch=# \i relsizes.sql (BTREE)
        relname         | relpages
------------------------+----------
 customer               |    41019
 customer_c_custkey     |     3288
 customer_c_mktsegment  |     5779
 customer_c_nationkey   |     3288
 lineitem               |  1535724
 lineitem_l_linenumber  |   131347
 lineitem_l_orderkey    |   131347
 orders                 |   307567
 orders_o_custkey       |    32847
 orders_o_orderpriority |    65876
 orders_o_orderstatus   |    41131

tpch=# \i relsizes.sql (MAINLY BITMAP)
        relname         | relpages
------------------------+----------
 customer               |    41019
 customer_c_custkey     |     3288
 customer_c_mktsegment  |      157
 customer_c_nationkey   |      336
 lineitem               |  1535724
 lineitem_l_linenumber  |     7571
 lineitem_l_orderkey    |   131347
 orders                 |   307567
 orders_o_custkey       |    32847
 orders_o_orderpriority |     1427
 orders_o_orderstatus   |      717

The orders_o_orderpriority and orders_o_orderstatus bitmap indexes are
46 and 57 times smaller than their btree counterparts (hmm...might we
see even better compression for larger scale factors?).

An obvious deduction is that the TPCH dataset is much more amenable to
run compression than my synthetic Zipfian data was. The interesting
question is how well "real" datasets are run-compressible; I suspect
"better than my Zipfian data" is a safe assumption!

Cheers

Mark


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, "Jie Zhang" <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-27 03:55:38
Message-ID: C0ED874A.2AFBB%llonergan@greenplum.com
Lists: pgsql-hackers

Tom,

On 7/26/06 7:26 AM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I wonder
> whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs

We tried variations from 8-bit to 32-bit and came to the conclusion that
8-bit was a better match for the more general case. For the moment I forget
exactly why (maybe Jie or Ayush will recall). It's a #define I believe, so
you can play with it to find out.

There's a lot more to be done with this - I think we've got some big gains
ahead.

BTW - lots of excitement here at OSCON about bitmap index, and there's a
fellow who is using the current Bizgres version, but wants *higher*
cardinality support, so I discussed Jie's thoughts about implementing
binning on values, basically combining bit vectors into value bins as the
cardinality increases.

I am still puzzled why our index creation time increases so dramatically as
we approach cardinality 10,000. I know that we found that index page
traversal for append began to take a lot of time as we started to increase
the number of bitvector pages - we had talked about aggregating the append
operations into groups before applying them, to sequentialize the
access.

- Luke


From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Luke Lonergan" <llonergan(at)greenplum(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-27 05:09:34
Message-ID: C0ED989E.90E0%jzhang@greenplum.com
Lists: pgsql-hackers


On 7/26/06 8:55 PM, "Luke Lonergan" <llonergan(at)greenplum(dot)com> wrote:

> Tom,
>
> On 7/26/06 7:26 AM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> I wonder
>> whether they oughtn't use 16-bit instead of 8-bit HRL_WORDs
>
> We tried variations from 8-bit to 32-bit and came to the conclusion that
> 8-bit was a better match for the more general case. For the moment I forget
> exactly why (maybe Jie or Ayush will recall). It's a #define I believe, so
> you can play with it to find out.
>
> There's a lot more to be done with this - I think we've got some big gains
> ahead.

For low-cardinality cases, an 8-bit word size is usually better than 16-bit
and 32-bit. The reason is that in this case, the compression is better with
8-bit words than with 16-bit or 32-bit ones. But for high-cardinality cases,
like 10,000, 16-bit is better. That's because a compressed 8-bit word can
only represent at most 2^7*8=1024 bits, so every 10,000 bits requires about
10 bytes. However, a compressed 16-bit word can represent 2^15*16=524288
bits, so we only need 2 bytes.
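
In symbols, assuming a fill word spends one bit on the fill value and w-1
bits on the run count (this layout is inferred from the figures above):

max bits per fill word = 2^(w-1) * w
w = 8:  2^7 * 8   = 1024 bits   -> a run of 10,000 bits costs ~10 bytes
w = 16: 2^15 * 16 = 524288 bits -> the same run fits in one 2-byte word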

I have been thinking about this recently -- maybe we can support different
word sizes for different cardinalities.

>
> BTW - lots of excitement here at OSCON about bitmap index, and there's a
> fellow who is using the current Bizgres version, but wants *higher*
> cardinality support, so I discussed Jie's thoughts about implementing
> binning on values, basically combining bit vectors into value bins as the
> cardinality increases.

Yes, that's the basic idea. Essentially we want to limit the number of bins
to an ideal value so that (1) the size of the bitmap index can be small;
(2) this will still guarantee good query performance. The details still need
to be worked out.

>
> I am still puzzled why our index creation time increases so dramatically as
> we approach cardinality 10,000. I know that we found that index page
> traversal for append began to take a lot of time as we started to increase
> the number of bitvector pages - we had talked about aggregating the append
> operations into groups before they were implemented to sequentialize the
> access.
>

I believe that this is because of the 8-bit word size. We can try to increase
it to 16-bit, and see how the result looks.

Thanks,
Jie


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, Jie Zhang <jzhang(at)greenplum(dot)com>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-27 05:14:43
Message-ID: 29908.1153977283@sss.pgh.pa.us
Lists: pgsql-hackers

Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> writes:
> An obvious deduction is that the TPCH dataset is much more amenable to
> run compression than my synthetic Zipfian data was. The interesting
> question is how well "real" datasets are run-compressible,

Yeah --- the back-of-the-envelope calculations I was making presupposed
uniform random distribution, and we know that's often not realistic for
real datasets. A nonuniform distribution would probably mean that some
of the bitmaps compress better-than-expected and others worse. I have
no idea how to model that and guess what the overall result is ...

regards, tom lane


From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-27 06:21:26
Message-ID: C0EDA976.90E7%jzhang@greenplum.com
Lists: pgsql-hackers

On 7/26/06 10:14 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Mark Kirkwood <markir(at)paradise(dot)net(dot)nz> writes:
>> An obvious deduction is that the TPCH dataset is much more amenable to
>> run compression than my synthetic Zipfian data was. The interesting
>> question is how well "real" datasets are run-compressible,
>
> Yeah --- the back-of-the-envelope calculations I was making presupposed
> uniform random distribution, and we know that's often not realistic for
> real datasets. A nonuniform distribution would probably mean that some
> of the bitmaps compress better-than-expected and others worse. I have
> no idea how to model that and guess what the overall result is ...
>

The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng
Wu et al gave an approximate answer for this question. Assume that there are
c distinct values, the i-th value has a probability of p_i, the number of
rows is N, and the word size is w. Then the total size of the compressed
bitmap index is about (N/(w-1))(c - \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)),
where in both \sum's, i runs from 1 to c.

The constraint for this equation is \sum(p_i)=1. Therefore, when all p_i are
equal, i.e. the attribute has uniformly distributed values, the size of the
bitmap index is the largest.
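
Typeset, with N rows, c distinct values, value probabilities p_i, and word
size w, the estimate reads:

size \approx \frac{N}{w-1} \left( c - \sum_{i=1}^{c} (1-p_i)^{2w-2}
             - \sum_{i=1}^{c} p_i^{2w-2} \right), \qquad \sum_{i=1}^{c} p_i = 1.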


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jie Zhang" <jzhang(at)greenplum(dot)com>
Cc: "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-27 06:50:35
Message-ID: 748.1153983035@sss.pgh.pa.us
Lists: pgsql-hackers

"Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
> On 7/26/06 10:14 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> ... A nonuniform distribution would probably mean that some
>> of the bitmaps compress better-than-expected and others worse. I have
>> no idea how to model that and guess what the overall result is ...

> The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng
> Wu et al gave an approximate answer for this question. Assume that there are
> c distinct values, the i-th value has a probability of p_i, the number
> of rows is N, and the word size is w. Then the total size of the compressed
> bitmap index is about (N/(w-1))(c - \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)),
> where in both \sum's, i runs from 1 to c.

Hm, but that's still begging the question no? It's still assuming that
any one value is uniformly distributed. ISTM the cases that would break
my simplistic calculation involve clustering of particular values, such
that some areas of the bitmap are all-zero while other areas have lots
of ones.

regards, tom lane


From: "Jie Zhang" <jzhang(at)greenplum(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, "Luke Lonergan" <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-27 16:13:21
Message-ID: C0EE3431.9116%jzhang@greenplum.com
Lists: pgsql-hackers

On 7/26/06 11:50 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> "Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
>> On 7/26/06 10:14 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> ... A nonuniform distribution would probably mean that some
>>> of the bitmaps compress better-than-expected and others worse. I have
>>> no idea how to model that and guess what the overall result is ...
>
>> The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng
>> Wu et al gave an approximate answer for this question. Assume that there are
>> c distinct values. Let the i-th value has a probability of p_i, the number
>> of rows r, and the word size w. then the total size of the compressed bitmap
>> index is about (N/(w-1))(c- \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)), where in
>> both \sum's, i is from 1 to c.
>
> Hm, but that's still begging the question no? It's still assuming that
> any one value is uniformly distributed. ISTM the cases that would break
> my simplistic calculation involve clustering of particular values, such
> that some areas of the bitmap are all-zero while other areas have lots
> of ones.

Yes, you are right -- each value is still uniformly distributed. But this
will be the worst case in terms of the size of a bitmap vector. As for how
to model the size of a bitmap vector for a non-uniformly distributed value,
that's a good question. I don't really know. But we do know the best case
and the worst case.


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Jie Zhang <jzhang(at)greenplum(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org, Luke Lonergan <LLonergan(at)greenplum(dot)com>
Subject: Re: On-disk bitmap index patch
Date: 2006-07-28 17:17:14
Message-ID: 20060728171714.GR66525@pervasive.com
Lists: pgsql-hackers

On Thu, Jul 27, 2006 at 09:13:21AM -0700, Jie Zhang wrote:
> On 7/26/06 11:50 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > "Jie Zhang" <jzhang(at)greenplum(dot)com> writes:
> >> On 7/26/06 10:14 PM, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> ... A nonuniform distribution would probably mean that some
> >>> of the bitmaps compress better-than-expected and others worse. I have
> >>> no idea how to model that and guess what the overall result is ...
> >
> >> The paper "Optimizing Bitmap Indices With Efficient Compression" by Kesheng
> >> Wu et al gave an approximate answer for this question. Assume that there are
> >> c distinct values, the i-th value has a probability of p_i, the number
> >> of rows is N, and the word size is w. Then the total size of the compressed
> >> bitmap index is about (N/(w-1))(c - \sum(1-p_i)^(2w-2) - \sum(p_i)^(2w-2)),
> >> where in both \sum's, i runs from 1 to c.
> >
> > Hm, but that's still begging the question no? It's still assuming that
> > any one value is uniformly distributed. ISTM the cases that would break
> > my simplistic calculation involve clustering of particular values, such
> > that some areas of the bitmap are all-zero while other areas have lots
> > of ones.
>
> Yes, you are right -- each value is still uniformly distributed. But this
> will be the worst case in terms of the size of a bitmap vector. As for how
> to model the size of a bitmap vector for a non-uniformly distributed value,
> that's a good question. I don't really know. But we do know the best case
> and the worst case.

If the usefulness of bitmap indexes is still in doubt, could someone at
Greenplum provide data from actual data warehouses from actual
customers?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, "Jie Zhang" <jzhang(at)greenplum(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-28 19:17:53
Message-ID: C0EFB0F2.2B299%llonergan@greenplum.com
Lists: pgsql-hackers

Jim,

On 7/28/06 10:17 AM, "Jim C. Nasby" <jnasby(at)pervasive(dot)com> wrote:

> If the usefulness of bitmap indexes is still in doubt, could someone at
> Greenplum provide data from actual data warehouses from actual
> customers?

First, is anyone in doubt?

- Luke


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Jie Zhang <jzhang(at)greenplum(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>, Josh Berkus <josh(at)agliodbs(dot)com>, Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: On-disk bitmap index patch
Date: 2006-07-28 19:25:53
Message-ID: 200607281925.k6SJPr929778@momjian.us
Lists: pgsql-hackers

Luke Lonergan wrote:
> Jim,
>
> On 7/28/06 10:17 AM, "Jim C. Nasby" <jnasby(at)pervasive(dot)com> wrote:
>
> > If the usefulness of bitmap indexes is still in doubt, could someone at
> > Greenplum provide data from actual data warehouses from actual
> > customers?
>
> First, is anyone in doubt?

Sure. I think we are going to have to see the final patch and have
users test it with their workload to find the useful range.

--
Bruce Momjian bruce(at)momjian(dot)us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +