Re: Avoiding Recheck Cond when using Select Distinct

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: jackrg <jack(at)groundbreakingsoftware(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Avoiding Recheck Cond when using Select Distinct
Date: 2013-02-22 17:19:27
Message-ID: CAMkU=1zvVrQRMqL=8byb+bfidrNTpN4Ayt4PgU0PBJbjRfAXyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Fri, Feb 22, 2013 at 8:36 AM, jackrg <jack(at)groundbreakingsoftware(dot)com>wrote:

> The following query produces a Recheck Cond and a costly Bitmap Heap Scan
> even though I have a composite index that covers both columns being
> filtered
> and selected.

Can you show us the definition of that index?

> I believe this is because the initial bitmap scan produces
> 2912 rows, which is too many for the available bitmap space. I've tried
> rewriting the command as "Select ... group by" but it still uses the BHS.
> Is
> there a way to rewrite this command that would improve performance by
> avoiding the costly Bitmap Heap Scan?
>

How do you know that the bitmap heap scan is costly, since you haven't
gotten it to use an alternative to compare it to? As a temporary
experimental measure, you could at least set enable_bitmapscan TO off, to
see what happens.

>
>
> SELECT distinct store_id, book_id FROM "sales_points" WHERE
> "sales_points"."store_id" IN (1, 2, 3, 4, 5, 6, 199, 201, 202) AND
> "sales_points"."book_id" IN (421, 422, 419, 420)
>
> Here is the explain/analyze output:
>
>
> "HashAggregate (cost=5938.72..5939.01 rows=97 width=8) (actual
> time=10.837..10.854 rows=32 loops=1)"
> " -> Bitmap Heap Scan on sales_points (cost=47.03..5936.53 rows=2191
> width=8) (actual time=0.547..5.296 rows=4233 loops=1)"
> " Recheck Cond: (book_id = ANY ('{421,422,419,420}'::integer[]))"
> " Filter: (store_id = ANY ('{1,2,3,4,5,6,199,201,202}'::integer[]))"
> " -> Bitmap Index Scan on index_sales_points_on_book_id
> (cost=0.00..46.92 rows=4430 width=0) (actual time=0.469..0.469 rows=4233
> loops=1)"
> " Index Cond: (book_id = ANY
> ('{421,422,419,420}'::integer[]))"
> "Total runtime: 10.935 ms"
>
>
> Actual runtime is more like 15ms when tested against a development database
> (which gave est. total runtime of 6ms).

I don't understand the parenthetical. In the explain plan you show, where
is 6ms coming from?

> Under load in production, the
> command takes 10,158 ms.

Do you mean 10.158 ms rather than 10,158 ms? If the production environment
really takes 1000 times longer than the environment in which you gathered
the EXPLAIN, then I would seriously doubt how useful that EXPLAIN could
possibly be.

Cheers,

Jeff

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Carlo Stonebanks 2013-02-22 17:50:59 Re: Are bitmap index scans slow to start?
Previous Message jackrg 2013-02-22 16:36:16 Avoiding Recheck Cond when using Select Distinct