Re: select count(idn) is slow (Seq Scan) instead of Bitmap Heap.. why?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Adam PAPAI <wooh(at)wooh(dot)hu>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: select count(idn) is slow (Seq Scan) instead of Bitmap Heap.. why?
Date: 2009-08-04 13:46:33
Message-ID: 603c8f070908040646k7221da34u7abb2c0cc7cf3da0@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

2009/8/2 Adam PAPAI <wooh(at)wooh(dot)hu>:
> Hello,
>
>
> I have a problem with an inner join + count().
>
> my query is:
>
> explain analyze select
> k.idn,k.kerdes_subject,k.kerdes_text,u.vezeteknev,u.keresztnev,u.idn as
> user_id, kg.kategoria_neve, count(v.idn)
>
> FROM kategoriak as kg
>
> INNER JOIN kerdesek as k on kg.idn = k.kategoria_id
> INNER JOIN users as u ON k.user_id = u.idn
> INNER JOIN valaszok as v ON k.idn = v.kerdes_id
>
> where kg.idn=15 group by k.idn, k.kerdes_subject,k.kerdes_text,
> u.idn,u.vezeteknev,u.keresztnev,kg.kategoria_neve
>
> The problem is with the count(v.idn).
>
> This column has a relation with: v.kerdes_id = k.idn => k.kategoria_id =
>  kg.idn
>
> and the WHERE says: kg.idn = 15.
>
> Why does it run through all lines in v?
>
> the explain sais:
>
>  GroupAggregate  (cost=103238.59..103602.66 rows=10402 width=1382) (actual
> time=8531.405..8536.633 rows=73 loops=1)
>   ->  Sort  (cost=103238.59..103264.59 rows=10402 width=1382) (actual
> time=8531.339..8533.199 rows=1203 loops=1)
>         Sort Key: k.idn, k.kerdes_subject, k.kerdes_text, u.idn,
> u.vezeteknev, u.keresztnev, kg.kategoria_neve
>         ->  Hash Join  (cost=3827.79..89951.54 rows=10402 width=1382)
> (actual time=1778.590..8523.015 rows=1203 loops=1)
>               Hash Cond: (v.kerdes_id = k.idn)
>               ->  Seq Scan on valaszok v  (cost=0.00..78215.98 rows=2080998
> width=8) (actual time=59.714..5009.171 rows=2080998 loops=1)
>               ->  Hash  (cost=3823.42..3823.42 rows=350 width=1378) (actual
> time=12.553..12.553 rows=74 loops=1)
>                     ->  Nested Loop  (cost=14.98..3823.42 rows=350
> width=1378) (actual time=0.714..12.253 rows=74 loops=1)
>                           ->  Nested Loop  (cost=14.98..1056.38 rows=350
> width=830) (actual time=0.498..5.952 rows=117 loops=1)
>                                 ->  Seq Scan on kategoriak kg
> (cost=0.00..1.30 rows=1 width=278) (actual time=0.066..0.076 rows=1 loops=1)
>                                       Filter: (idn = 15)
>                                 ->  Bitmap Heap Scan on kerdesek k
> (cost=14.98..1051.58 rows=350 width=560) (actual time=0.374..5.430 rows=117
> loops=1)
>                                       Recheck Cond: (15 = kategoria_id)
>                                       ->  Bitmap Index Scan on
> kategoria_id_id_idx  (cost=0.00..14.89 rows=350 width=0) (actual
> time=0.212..0.212 rows=117 loops=1)
>                                             Index Cond: (15 = kategoria_id)
>                           ->  Index Scan using users_pkey on users u
> (cost=0.00..7.89 rows=1 width=552) (actual time=0.047..0.048 rows=1
> loops=117)
>                                 Index Cond: (k.user_id = u.idn)
>  Total runtime: 8536.936 ms
>
>
>
> So it run through more than 2 mill lines... but why? It should only count
> those lines which has the category_id = 15...
>
> What am I doing wrong?

Well, I'm not sure if you're doing anything wrong, but you're
definitely thinking about it wrong. There's no way to skip the lines
in v that have kg.idn != 15 just by looking at v, because the idn
column is in kg, not in v. Obviously you have to look through kg
first and find the lines where kg.idn = 15. Or since kg.idn =
k.kategoria_id, you can alternatively start by scanning k for
kategoria_id = 15, which is what the planner chose to do here. Once
you know which lines from k you need, then you can go through v and
look for lines that have a match in k based on the join condition
k.idn = v.kerdes_id.

Do you have an index on valaszok (kerdes_id)? Might be worth investigating.

...Robert

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Alvaro Herrera 2009-08-04 15:30:25 Re: PostgreSQL 8.4 performance tuning questions
Previous Message Merlin Moncure 2009-08-04 13:28:55 Re: PostgreSQL 8.4 performance tuning questions