Re: Are bitmap index scans slow to start?

From: Nikolas Everett <nik9000(at)gmail(dot)com>
To: Carlo Stonebanks <stonec(dot)register(at)sympatico(dot)ca>
Cc: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Are bitmap index scans slow to start?
Date: 2013-02-22 19:05:02
Message-ID: CAPmjWd1On_uAp6suqYoHuh3ksKD0xn6tOhOM_Z2vaweO-vexSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I can't really help, but I can make it more clear why postgres is choosing
a _bitmap_ index scan rather than a regular index scan. With a regular
index scan it pumps the index for the locations of the rows that it points
to and loads those rows as it finds them. This works great if the rows in
the index are sorta sorted - that way it isn't jumping around the table
randomly. Random io is slow. In a bitmap index scan pg pumps the index
and buffers the by shoving them in a big bitmap. Then, it walks the bitmap
in order to produce in order io. PG makes the choice based on a measure of
the index's correlation.

The problem comes down to you inserting the sessions concurrently with one
another. My instinct would be to lower the FILLFACTOR on newly created
indecies so they can keep their entries more in order. I'm not sure why I
have that instinct but it feels right. Also, you might could try
clustering newly created tables on session_id and setting the fillfactor
down so rows with the same session id will stick together on disk.

Now that I look stuff up on the internet I'm not sure where I saw that pg
tries to maintain a cluster using empty space from FILLFACTOR but I _think_
it does. I'm not sure what is going on with my google foo today.

Nik

On Fri, Feb 22, 2013 at 12:50 PM, Carlo Stonebanks <
stonec(dot)register(at)sympatico(dot)ca> wrote:

> A cool idea, but if I understand it correctly very specific and fussy. New
> DB’s are spawned on this model, and all the developers would have to be
> aware of this non-standard behaviour, and DBA”s would have to create these
> indexes every month, for every DB (as the log tables are created every
> month). There are 89 session_id values in the January log (log_2013_01) so
> this would quickly get out of control. But – like I said – an interesting
> idea for more specific challenges.****
>
> ** **
>
> *From:* Marc Mamin [mailto:M(dot)Mamin(at)intershop(dot)de]
> *Sent:* February 21, 2013 2:41 PM
> *To:* Jeff Janes; Carlo Stonebanks
> *Cc:* pgsql-performance(at)postgresql(dot)org
> *Subject:* AW: [PERFORM] Are bitmap index scans slow to start?****
>
> ** **
>
> ** **
>
> >Rebuilding the index might help, as it would put all the leaf pages
> holding values for session_id=27 adjacent to each other, so they would read
> from disk faster. But with a name like >"session_id", I don't know how
> long such clustering would last though.
>
> >If I'm right about the index disk-read time, then switching to a plain
> index scan rather than a bitmap index scan would make no difference--either
> way the data has to come off the disk.
>
>
> ****
>
> >>I'd prefer a
> >>strategy that allowed fast performance the first time, rather than slow
> the
> >>first time and extremely fast subsequently.****
>
> Hello,
>
> if the index is only used to locate rows for single session_id, you may
> consider split it in a set of partial indexes.
>
> e.g.
> create index i_0 on foo where session_id%4 =0;
> create index i_1 on foo where session_id%4 =1;
> create index i_2 on foo where session_id%4 =2;
> create index i_3 on foo where session_id%4 =3;
>
> (can be built in parallel using separate threads)
>
> Then you will have to ensure that all your WHERE clauses also contain the
> index condition:
>
> WHERE session_id = 27 AND session_id%4 =27%4
>
> regards,
>
> Marc Mamin****
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Maciek Sakrejda 2013-02-22 19:06:08 Re: Avoiding Recheck Cond when using Select Distinct
Previous Message Alexander Staubo 2013-02-22 18:22:48 Bad query plan with high-cardinality column