Re: Crazy looking actual row count from explain analyze

Lists: pgsql-general
From: Gordon Shannon <gordo169(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Crazy looking actual row count from explain analyze
Date: 2010-05-10 22:08:32
Message-ID: 28517643.post@talk.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


Running 8.4.3, I have a table with 43 million rows. Two of the columns are
(topic_id int not null) and (status message_status_enum not null), where
message_status_enum is defined as
CREATE TYPE message_status_enum AS ENUM ( 'V', 'X', 'S', 'R', 'U', 'D' );

Among the indexes there is this:
"m_20100201_topic_multi" btree (topic_id, status, source_category_id,
alg_ci_rank_rollup)

..see that topic_id and status are the leading edge of the index.
Fact: there are no rows with status 'S' or 'X'
Fact: there are no rows with topic_id = 1

Consider, then...

explain analyze select count(*) from m_20100201 where status in ('S','X');
QUERY
PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=987810.75..987810.76 rows=1 width=0) (actual
time=2340.193..2340.194 rows=1 loops=1)
-> Bitmap Heap Scan on m_20100201 (cost=987806.75..987810.75 rows=1
width=0) (actual time=2340.191..2340.191 rows=0 loops=1)
Recheck Cond: (status = ANY ('{S,X}'::message_status_enum[]))
-> Bitmap Index Scan on m_20100201_topic_multi
(cost=0.00..987806.75 rows=1 width=0) (actual time=2334.371..2334.371
rows=126336 loops=1)
Index Cond: (status = ANY ('{S,X}'::message_status_enum[]))

What I don't understand is the "actual rows" of 126,336 in the bitmap index
scan. I would expect it to have to scan every index entry, but doesn't this
output mean that it's *returning* 126K rows from that scan? Whereas I think
it should return zero.

I have already fixed this query by adding a better index. But the point of
this post is simply to understand this explain analyze output. Thanks!

--gordon

--
View this message in context: http://old.nabble.com/Crazy-looking-actual-row-count-from-explain-analyze-tp28517643p28517643.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gordon Shannon <gordo169(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Crazy looking actual row count from explain analyze
Date: 2010-05-10 23:47:00
Message-ID: 20386.1273535220@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Gordon Shannon <gordo169(at)gmail(dot)com> writes:
> -> Bitmap Heap Scan on m_20100201 (cost=987806.75..987810.75 rows=1
> width=0) (actual time=2340.191..2340.191 rows=0 loops=1)
> Recheck Cond: (status = ANY ('{S,X}'::message_status_enum[]))
> -> Bitmap Index Scan on m_20100201_topic_multi
> (cost=0.00..987806.75 rows=1 width=0) (actual time=2334.371..2334.371
> rows=126336 loops=1)
> Index Cond: (status = ANY ('{S,X}'::message_status_enum[]))

> What I don't understand is the "actual rows" of 126,336 in the bitmap index
> scan. I would expect it to have to scan every index entry, but doesn't this
> output mean that it's *returning* 126K rows from that scan? Whereas I think
> it should return zero.

Well, it does return zero rows from the actual heapscan. What the above
is telling you is that a whole lot of rows are being returned by the
index and then filtered out at the table scan stage. My first suspicion
is that those are unvacuumed dead rows ... what's your vacuuming policy
on this database?

> I have already fixed this query by adding a better index.

I think the new index might have "fixed" things largely by not bothering
to index already-dead rows.

regards, tom lane


From: Gordon Shannon <gordo169(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Crazy looking actual row count from explain analyze
Date: 2010-05-11 01:07:43
Message-ID: 28518862.post@talk.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Tom Lane-2 wrote:
>
> My first suspicion
> is that those are unvacuumed dead rows ... what's your vacuuming policy
> on this database?
>

Ah, I didn't know that number included dead tuples. That probably explains
it. pg_stat_user_tables says the table has 370,269 dead tuples. On this
table, I have autovacuum_vacuum_scale_factor set to 0.02, so I believe the
table will have to have 869K dead tuples before vacuum will kick in.

> I have already fixed this query by adding a better index.

Tom Lane-2 wrote:
>
> I think the new index might have "fixed" things largely by not bothering
> to index already-dead rows.
>

Actually, I put a partial index on status, where != 'V'. That fits our
usage pattern of 99% of the records being 'V', so it's a tiny index and
satisifies this type of query very quickly.

Thanks,

--gordon

--
View this message in context: http://old.nabble.com/Crazy-looking-actual-row-count-from-explain-analyze-tp28517643p28518862.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.