Re: Bitmap indexes etc.

Lists: pgsql-performance
From: Ivan Voras <ivoras(at)fer(dot)hr>
To: pgsql-performance(at)postgresql(dot)org
Subject: Bitmap indexes etc.
Date: 2005-12-26 21:32:13
Message-ID: 20051226222915.G87624@geri.cc.fer.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi!

This is not actually a question about performance, but an inquiry to help
me understand what is going on. Below this text are two EXPLAIN ANALYZE
outputs, before and after VACUUM ANALYZE was run. I have several questions
about the proposed plans (mostly the first one). There is only one table
in the query, "layout", containing ~10k rows. In it, for each "page_id"
there are several (1-10) rows (i.e. there are around 10000/5 unique
page_id values). There's an index on "page_id" and I've upped statistics
collection on it to 150 at table creation time because sometimes the
planner didn't use the index at all.
This is PostgreSQL 8.1.0.

- what does "Bitmap Heap Scan" phase do?
- what is "Recheck condition" and why is it needed?
- why are proposed "width" fields in the plan different between the two
plans?
(actually, a nice explanation what exactly are those widths would also
be nice :) )
- I thought "Bitmap Index Scan" was only used when there are two or more
applicable indexes in the plan, so I don't understand why is it used
now?

cw2=> explain analyze select * from layout where page_id=10060;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on layout (cost=2.15..64.21 rows=43 width=60) (actual
time=0.043..0.053 rows=4 loops=1)
Recheck Cond: (page_id = 10060)
-> Bitmap Index Scan on layout_page_id (cost=0.00..2.15 rows=43
width=0) (actual time=0.034..0.034 rows=4 loops=1)
Index Cond: (page_id = 10060)
Total runtime: 0.112 ms
(5 rows)

cw2> VACUUM ANALYZE;

cw2=> explain analyze select * from layout where page_id=10060;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Index Scan using layout_page_id on layout (cost=0.00..12.14 rows=4
width=42) (actual time=0.014..0.025 rows=4 loops=1)
Index Cond: (page_id = 10060)
Total runtime: 0.076 ms
(3 rows)

--
Preserve wildlife -- pickle a squirrel today!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ivan Voras <ivoras(at)fer(dot)hr>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bitmap indexes etc.
Date: 2005-12-26 21:57:11
Message-ID: 12553.1135634231@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Ivan Voras <ivoras(at)fer(dot)hr> writes:
> This is PostgreSQL 8.1.0.

> - what does "Bitmap Heap Scan" phase do?

A plain indexscan fetches one tuple-pointer at a time from the index,
and immediately visits that tuple in the table. A bitmap scan fetches
all the tuple-pointers from the index in one go, sorts them using an
in-memory "bitmap" data structure, and then visits the table tuples in
physical tuple-location order. The bitmap scan improves locality of
reference to the table at the cost of more bookkeeping overhead to
manage the "bitmap" data structure --- and at the cost that the data
is no longer retrieved in index order, which doesn't matter for your
query but would matter if you said ORDER BY.

> - what is "Recheck condition" and why is it needed?

If the bitmap gets too large we convert it to "lossy" style, in which we
only remember which pages contain matching tuples instead of remembering
each tuple individually. When that happens, the table-visiting phase
has to examine each tuple on the page and recheck the scan condition to
see which tuples to return.

> - why are proposed "width" fields in the plan different between the two
> plans?

Updated statistics about average column widths, presumably.

> (actually, a nice explanation what exactly are those widths would also
> be nice :) )

Sum of the average widths of the columns being fetched from the table.

> - I thought "Bitmap Index Scan" was only used when there are two or more
> applicable indexes in the plan, so I don't understand why is it used
> now?

True, we can combine multiple bitmaps via AND/OR operations to merge
results from multiple indexes before visiting the table ... but it's
still potentially worthwhile even for one index. A rule of thumb is
that plain indexscan wins for fetching a small number of tuples, bitmap
scan wins for a somewhat larger number of tuples, and seqscan wins if
you're fetching a large percentage of the whole table.

regards, tom lane


From: Ivan Voras <ivoras(at)fer(dot)hr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bitmap indexes etc.
Date: 2005-12-27 13:19:55
Message-ID: 20051227141850.U5310@geri.cc.fer.hr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, 26 Dec 2005, Tom Lane wrote:
> ...snip...

Thanks, it's a very good explanation!

--
Preserve wildlife -- pickle a squirrel today!