bitmap indexes - performance

Lists: pgsql-hackers
From: Leonardo F <m_lists(at)yahoo(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: bitmap indexes - performance
Date: 2010-07-01 13:23:49
Message-ID: 800923.27831.qm@web29010.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Using as a starting point the old bitmap patch in:

http://archives.postgresql.org/message-id/20081101000154.GO27872@fune

I re-applied and re-worked the patch to see what kind of improvements over
btrees bitmaps actually provided.

Using a 20M rows table of 10/100/1000 random values, I've found that:

1) bulk index creation time is roughly 6 times better
2) index size is 6-15 times smaller (depending on column cardinality)
3) there's almost no difference in query times (but I have to make more
tests)
4) I can't say anything about the insertion performance, but I guess
bitmap will perform way worse than btree

Are these improvements (index creation time, index size) worth enough
to keep on working on this?

I mean: given that bitmaps don't give any benefits in query times, but
only benefits related to disk size and bulk index creation times, and
will have horrible performance for insertions/deletions: would this job be
worthed?

In case it is: I will try to clean up the patch and post it...

As a side note: I guess that most of the bitmap indexes performance
improvements in the SELECT area are already implemented in postgres
in the bitmapand/or and bitmap scan stuff? I couldn't find any docs that
say that bitmap indexes are faster for selects, unless of course they
are ANDed/ORed together (which is something postgres already does
for regular btree indexes)


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bitmap indexes - performance
Date: 2010-07-01 14:16:09
Message-ID: 201007011416.o61EG9l02763@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Leonardo F wrote:
> Using as a starting point the old bitmap patch in:
>
> http://archives.postgresql.org/message-id/20081101000154.GO27872@fune
>
>
> I re-applied and re-worked the patch to see what kind of improvements over
> btrees bitmaps actually provided.
>
> Using a 20M rows table of 10/100/1000 random values, I've found that:
>
> 1) bulk index creation time is roughly 6 times better
> 2) index size is 6-15 times smaller (depending on column cardinality)
> 3) there's almost no difference in query times (but I have to make more
> tests)
> 4) I can't say anything about the insertion performance, but I guess
> bitmap will perform way worse than btree
>
> Are these improvements (index creation time, index size) worth enough
> to keep on working on this?
>
> I mean: given that bitmaps don't give any benefits in query times, but
> only benefits related to disk size and bulk index creation times, and
> will have horrible performance for insertions/deletions: would this job be
> worthed?
>
> In case it is: I will try to clean up the patch and post it...
>
>
> As a side note: I guess that most of the bitmap indexes performance
> improvements in the SELECT area are already implemented in postgres
> in the bitmapand/or and bitmap scan stuff? I couldn't find any docs that
> say that bitmap indexes are faster for selects, unless of course they
> are ANDed/ORed together (which is something postgres already does
> for regular btree indexes)

Great report, thanks. The other big problem with on-disk bitmap indexes
is removing expired values via vacuum.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bitmap indexes - performance
Date: 2010-07-01 15:04:30
Message-ID: AANLkTilqa1c3OlUcZWfV1sRmkxIllWZ393u289x4OaEd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 1, 2010 at 9:23 AM, Leonardo F <m_lists(at)yahoo(dot)it> wrote:
> Using as a starting point the old bitmap patch in:
>
> http://archives.postgresql.org/message-id/20081101000154.GO27872@fune
>
>
> I re-applied and re-worked the patch to see what kind of improvements over
> btrees bitmaps actually provided.
>
> Using a 20M rows table of 10/100/1000 random values, I've found that:
>
> 1) bulk index creation time is roughly 6 times better
> 2) index size is 6-15 times smaller (depending on column cardinality)
> 3) there's almost no difference in query times (but I have to make more
> tests)
> 4) I can't say anything about the insertion performance, but I guess
> bitmap will perform way worse than btree
>
> Are these improvements (index creation time, index size) worth enough
> to keep on working on this?
>
> I mean: given that bitmaps don't give any benefits in query times, but
> only benefits related to disk size and bulk index creation times, and
> will have horrible performance for insertions/deletions: would this job be
> worthed?
>
> In case it is: I will try to clean up the patch and post it...
>
>
> As a side note: I guess that most of the bitmap indexes performance
> improvements in the SELECT area are already implemented in postgres
> in the bitmapand/or and bitmap scan stuff? I couldn't find any docs that
> say that bitmap indexes are faster for selects, unless of course they
> are ANDed/ORed together (which is something postgres already does
> for regular btree indexes)

Hmm... no performance improvement? That's not encouraging.

The index being smaller ought to by itself provide some performance
improvement if, say, the smaller index can fit in cache and the larger
one can't. With a 6-15x size difference, that's presumably not an
implausible scenario. But I guess the real point is to be able to AND
and OR bitmap indices on multiple columns. Not sure if this
implementation supports that or not (I haven't read the patch) and how
the performance compares to doing Bitmap Heap Scan -> BitmapAnd ->
Bitmap Index Scan with btree indices.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Leonardo F <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: bitmap indexes - performance
Date: 2010-07-01 15:21:26
Message-ID: 5147.1277997686@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Hmm... no performance improvement? That's not encouraging.

> The index being smaller ought to by itself provide some performance
> improvement if, say, the smaller index can fit in cache and the larger
> one can't. With a 6-15x size difference, that's presumably not an
> implausible scenario. But I guess the real point is to be able to AND
> and OR bitmap indices on multiple columns. Not sure if this
> implementation supports that or not (I haven't read the patch) and how
> the performance compares to doing Bitmap Heap Scan -> BitmapAnd ->
> Bitmap Index Scan with btree indices.

In principle a bitmap index scan should be significantly faster if the
index can return the bitmap more or less "natively" rather than having
to construct it. My recollection though is that a significant amount of
work is needed to make that happen, and that there is no existing patch
that tackled the problem. So I'm not sure that this report should be
taken as indicating that there's no chance of a SELECT performance
improvement. What it does say is that we have to do that work if we
want to make bitmap indexes useful.

In particular, I recall some discussions about developing a "streaming
API" whereby an index AM could return a bitmap page-by-page or so,
rather than having to construct the whole thing in-memory before
anything could happen. This would be a huge win for AND/OR cases,
and even for a simple indexscan it would eliminate the existing startup
cost penalty for a bitmap scan. Streaming like this would also
eliminate the problem of having to lossify large bitmaps in order to
not overrun memory.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Leonardo F <m_lists(at)yahoo(dot)it>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: bitmap indexes - performance
Date: 2010-07-01 15:40:22
Message-ID: AANLkTikbFpbFTySsUtzCDZOXfQaQJ05r9NuiNCHzWZ6C@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 1, 2010 at 11:21 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> In particular, I recall some discussions about developing a "streaming
> API" whereby an index AM could return a bitmap page-by-page or so,
> rather than having to construct the whole thing in-memory before
> anything could happen.  This would be a huge win for AND/OR cases,
> and even for a simple indexscan it would eliminate the existing startup
> cost penalty for a bitmap scan.  Streaming like this would also
> eliminate the problem of having to lossify large bitmaps in order to
> not overrun memory.

Now that would be cool. The existing startup penalty for a bitmap
scan is the pits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Leonardo F <m_lists(at)yahoo(dot)it>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bitmap indexes - performance
Date: 2010-07-01 16:30:39
Message-ID: 896669.40438.qm@web29004.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> In
> principle a bitmap index scan should be significantly faster if the
> index can
> return the bitmap more or less "natively" rather than having
> to construct
> it.

The problem I'm seeing is that even on a 20M rows table, doing a

select * from t where c1=10 and c2=1

where c1 and c2 are low cardinality columns, leads to a *very*
fast bitmap index scan, even with btree indexes (200ms per index
on my PC).
The rest of the time is spent in actually retrieving heap rows; and
of course no index type is going to help with that.

Now: if an index search on such a big table takes so little time,
what kind of improvement are we trying to get?
The btree indexes on c1 and c2 are about 340MB eaxh: maybe
I'm experiencing some caching weirdness? Or it's normal that an
index search on such a big table is that fast (again, not counting
the heap scan step, which will be required no matter the index
type)? I'll try to re-test it...

> In particular, I recall some discussions about developing
> a "streaming
> API" whereby an index AM could return a bitmap page-by-page or
> so,
> rather than having to construct the whole thing in-memory
> before
> anything could happen. This would be a huge win for AND/OR
> cases,
> and even for a simple indexscan it would eliminate the existing
> startup
> cost penalty for a bitmap scan. Streaming like this would
> also
> eliminate the problem of having to lossify large bitmaps in order
> to
> not overrun memory.

One of the improvements I was going to try was to avoid calling

tid_set_bit (or whatever is the function, I don't remember now) for
every row, and call something like tid_set_bits_in_page where
a whole page was passed in: this would remove a lot of the hash_*
calls that are made in each and every tid_set_bit call (now that's
something btree can't do, but bitmap indexes can do "easily").
But I stopped before implementing it, because, as I said, I don't
think the improvement would still be worth it (even calling
tid_set_bit 1/20th of the needed times didn't help that much; we're
still talking about going from 200ms to 180ms on a query that
takes seconds to execute). But I'm going to give more "tested"
numbers...

Talking about bitmap indexes I don't think we should mention
memory... I mean: bitmap indexes are supposed to be used on
huge tables, and I don't think that 100MB (which holds a lot of
rows in a tbm...) to spare as work_mem would be a big problem...
As for the "startup cost": again, I wouldn't see that as a big
improvement, as we're talking mostly OLAP scenarios, where
most likely there will be some other "blocking" operator (group by,
sort, sub select etc) that will "remove" any improvements in
startup time...

To sum up: IMHO nor improvements in memory usage nor
in startup time would be good reasons to switch to bitmap
indexes... but bulk index creation time (10 minutes to index
what it takes 60 minutes with btree... and maybe more if tables
are bigger) and (maybe) index disk space might be...
but I'm not 100% convinced...

I'm trying to find more docs that explain the "improvements" of
bitmap indexes in other products... but most of what I've found
talks about bitmapAND/OR.... which is something that is very
cool, but that postgres already does even with btree indexes...
or index creation time/size, which are, for the moment, the only
things that I'm pretty confident the patch would actually provide.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: bitmap indexes - performance
Date: 2010-07-02 01:31:32
Message-ID: 201007020131.o621VWK08371@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Leonardo F wrote:
> I'm trying to find more docs that explain the "improvements" of
> bitmap indexes in other products... but most of what I've found
> talks about bitmapAND/OR.... which is something that is very
> cool, but that postgres already does even with btree indexes...
> or index creation time/size, which are, for the moment, the only
> things that I'm pretty confident the patch would actually provide.

I think a real limitation of on-disk bitmap indexes is that they are
only feable for low cardinality columns, while btree handles all column
types.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bitmap indexes - performance
Date: 2010-07-02 08:30:09
Message-ID: 4C2DA391.5010502@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/07/10 13:31, Bruce Momjian wrote:
> Leonardo F wrote:
>
>> I'm trying to find more docs that explain the "improvements" of
>> bitmap indexes in other products... but most of what I've found
>> talks about bitmapAND/OR.... which is something that is very
>> cool, but that postgres already does even with btree indexes...
>> or index creation time/size, which are, for the moment, the only
>> things that I'm pretty confident the patch would actually provide.
>>
> I think a real limitation of on-disk bitmap indexes is that they are
> only feable for low cardinality columns, while btree handles all column
> types.
>
>

I recall that for (some/most? of) those low cardinality cases, (on disk)
bitmap indexes would perform better too. I think the size saving alone
is a huge win for serious data warehousing situations. On the other hand
problems I recall are possibly reduced UPDATE/DELETE performance and
issues with CREATE INDEX CONCURRENTLY and also complications with VACUUM
(altho these last two may have been sorted - I've lost touch with what
was in the most recent patches).

regards

Mark


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-hackers(at)postgresql(dot)org
Cc: m_lists(at)yahoo(dot)it
Subject: Re: bitmap indexes - performance
Date: 2010-07-03 02:00:26
Message-ID: 4C2E99BA.3080009@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/07/10 20:30, Mark Kirkwood wrote:
>
> I recall that for (some/most? of) those low cardinality cases, (on
> disk) bitmap indexes would perform better too. I think the size saving
> alone is a huge win for serious data warehousing situations. On the
> other hand problems I recall are possibly reduced UPDATE/DELETE
> performance and issues with CREATE INDEX CONCURRENTLY and also
> complications with VACUUM (altho these last two may have been sorted -
> I've lost touch with what was in the most recent patches).
>
>

Sorry, missed the message earlier where Bruce mentioned VACUUM.

Re Performance, I definitely recall some pretty serious performance
improvements on some of the TPC D (or H) queries when the dataset was
large . However I am wondering if most of the improvement might have
been because the bitmap index(es) fitted in memory and the corresponding
btree ones did not.

Leonardo - maybe try larger datasets (20M rows probably means table and
btree indexes can all fit in memory). Also might be worth experimenting
with the TPC D,H dataset and query generator and seeing if any of those
queries tickle any bitmap sweet spot.

Cheers

Mark


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bitmap indexes - performance
Date: 2010-07-08 05:09:58
Message-ID: 4C355DA6.5040305@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>> Are these improvements (index creation time, index size) worth enough
>> to keep on working on this?
>>
>> I mean: given that bitmaps don't give any benefits in query times, but
>> only benefits related to disk size and bulk index creation times, and
>> will have horrible performance for insertions/deletions: would this job be
>> worthed?
>>
>> In case it is: I will try to clean up the patch and post it...

Well, if you can fix the more basic missing stuff, I think we could live
with the performance issues. Bitmaps would still be a huge win for
relatively static tables with lots of low-cardinality columns (basic
data warehouse case).

If I recall correctly, the old patch was still missing both WAL and
VACUUM support. These would be required before tradeoffs of space vs.
update performance would be worth talking about.

>> As a side note: I guess that most of the bitmap indexes performance
>> improvements in the SELECT area are already implemented in postgres
>> in the bitmapand/or and bitmap scan stuff? I couldn't find any docs that
>> say that bitmap indexes are faster for selects, unless of course they
>> are ANDed/ORed together (which is something postgres already does
>> for regular btree indexes)

Have you tested this? The bitmap AND/OR for btrees in current postgres
isn't exactly cost-free, especially the recheck. It seems like there
could be room for better performance with bitmap indexes.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Leonardo F <m_lists(at)yahoo(dot)it>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: bitmap indexes - performance
Date: 2010-07-22 10:54:29
Message-ID: 1279796069.1795.54.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2010-07-01 at 16:30 +0000, Leonardo F wrote:

> To sum up: IMHO nor improvements in memory usage nor
> in startup time would be good reasons to switch to bitmap
> indexes... but bulk index creation time (10 minutes to index
> what it takes 60 minutes with btree... and maybe more if tables
> are bigger) and (maybe) index disk space might be...
> but I'm not 100% convinced...

Are you intending to work on this for 9.1?

--
Simon Riggs www.2ndQuadrant.com