Re: Are bitmap index scans slow to start?

From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: "'Nikolas Everett'" <nik9000(at)gmail(dot)com>
Cc: "'Marc Mamin'" <M(dot)Mamin(at)intershop(dot)de>, "'Jeff Janes'" <jeff(dot)janes(at)gmail(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Are bitmap index scans slow to start?
Date: 2013-02-22 20:05:34
Message-ID: 009001ce1137$f9c31bf0$ed4953d0$@sympatico.ca
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

>> Also, you might could try clustering newly created tables on session_id
and setting the fillfactor down so rows with the same session id will stick
together on disk.

<<

My understanding of PG's cluster is that this is a one-time command that
creates a re-ordered table and doesn't maintain the clustered order until
the command is issued again. During the CLUSTER, the table is read and write
locked. So, in order for me to use this I would need to set up a timed event
to CLUSTER occasionally.

>> I can't really help, but I can make it more clear why postgres is
choosing a _bitmap_ index scan rather than a regular index scan

<<

The EXPLAIN ANALYZE is showing it is taking a long time to prepare the
bitmap (i.e.-> Bitmap Index Scan on log_2013_01_session_idx
(cost=0.00..63186.52

rows=2947664 width=0) (actual time=32611.918..32611.918 rows=2772042
loops=1)" Index Cond: (session_id = 27)" the bitmap scan is actually very
fast. Jeff sasys that the bitmap is not cached, so I will assume the PG
general caches being created are of general use.

I think what I need to do is figure out is:

1) Why does it take 36 seconds to set up the general index caches?

2) What can I do about it (what stats do I need to look at)?

3) How can I force these caches to expire so I can tell if the strategy
worked?

From: Nikolas Everett [mailto:nik9000(at)gmail(dot)com]
Sent: February 22, 2013 2:05 PM
To: Carlo Stonebanks
Cc: Marc Mamin; Jeff Janes; pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Are bitmap index scans slow to start?

I can't really help, but I can make it more clear why postgres is choosing a
_bitmap_ index scan rather than a regular index scan. With a regular index
scan it pumps the index for the locations of the rows that it points to and
loads those rows as it finds them. This works great if the rows in the
index are sorta sorted - that way it isn't jumping around the table
randomly. Random io is slow. In a bitmap index scan pg pumps the index and
buffers the by shoving them in a big bitmap. Then, it walks the bitmap in
order to produce in order io. PG makes the choice based on a measure of the
index's correlation.

The problem comes down to you inserting the sessions concurrently with one
another. My instinct would be to lower the FILLFACTOR on newly created
indecies so they can keep their entries more in order. I'm not sure why I
have that instinct but it feels right. Also, you might could try clustering
newly created tables on session_id and setting the fillfactor down so rows
with the same session id will stick together on disk.

Now that I look stuff up on the internet I'm not sure where I saw that pg
tries to maintain a cluster using empty space from FILLFACTOR but I _think_
it does. I'm not sure what is going on with my google foo today.

Nik

On Fri, Feb 22, 2013 at 12:50 PM, Carlo Stonebanks
<stonec(dot)register(at)sympatico(dot)ca> wrote:

A cool idea, but if I understand it correctly very specific and fussy. New
DB's are spawned on this model, and all the developers would have to be
aware of this non-standard behaviour, and DBA"s would have to create these
indexes every month, for every DB (as the log tables are created every
month). There are 89 session_id values in the January log (log_2013_01) so
this would quickly get out of control. But - like I said - an interesting
idea for more specific challenges.

From: Marc Mamin [mailto:M(dot)Mamin(at)intershop(dot)de]
Sent: February 21, 2013 2:41 PM
To: Jeff Janes; Carlo Stonebanks
Cc: pgsql-performance(at)postgresql(dot)org
Subject: AW: [PERFORM] Are bitmap index scans slow to start?

>Rebuilding the index might help, as it would put all the leaf pages holding
values for session_id=27 adjacent to each other, so they would read from
disk faster. But with a name like >"session_id", I don't know how long such
clustering would last though.

>If I'm right about the index disk-read time, then switching to a plain
index scan rather than a bitmap index scan would make no difference--either
way the data has to come off the disk.

>>I'd prefer a
>>strategy that allowed fast performance the first time, rather than slow
the
>>first time and extremely fast subsequently.

Hello,

if the index is only used to locate rows for single session_id, you may
consider split it in a set of partial indexes.

e.g.
create index i_0 on foo where session_id%4 =0;
create index i_1 on foo where session_id%4 =1;
create index i_2 on foo where session_id%4 =2;
create index i_3 on foo where session_id%4 =3;

(can be built in parallel using separate threads)

Then you will have to ensure that all your WHERE clauses also contain the
index condition:

WHERE session_id = 27 AND session_id%4 =27%4

regards,

Marc Mamin

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Gavin Flower 2013-02-22 20:07:20 Re: Are bitmap index scans slow to start?
Previous Message Maciek Sakrejda 2013-02-22 19:06:08 Re: Avoiding Recheck Cond when using Select Distinct