Re: strange query runtime

Lists: pgsql-general
From: Olivier Sirven <olivier(at)elma(dot)fr>
To: pgsql-general(at)postgresql(dot)org
Subject: strange query runtime
Date: 2006-02-07 10:16:58
Message-ID: 200602071116.59204.olivier@elma.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi,

I am running a query which makes a join on 3 tables:
- generals (contains 200 000 rows)
- category_generals (contains 200 000 rows)
- generals_topics (contains 15 000 000 rows)

The query is wrote this way:
SELECT gt.id_topic
FROM generals g,
category_generals cs,
generals_topics gt
WHERE g.media = 't'
AND g.id_general = cg.id_general
AND gt.id_general = g.id_general
AND cg.id_category = 15
ORDER BY gt.id_topic DESC
OFFSET 0
LIMIT 20;

The column id_general of table generals is a primary key. All the other
columns used to make the joins are indexes.
The query is slow but it works fine as it completes in less than 1 second.
The problem is that if I change the filter value of id_category from 15 to 3
the query will take more than 7 minutes to complete! The only difference
between id_category 3 and 15 is that there is about 4000 rows in the first
one (id_category = 3) and 2000 rows in the second one (id_category = 15).
An explain give me the following result:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..9677.68 rows=20 width=4)
-> Nested Loop (cost=0.00..61006657.19 rows=126077 width=4)
-> Nested Loop (cost=0.00..59991538.61 rows=252145 width=12)
-> Index Scan Backward using generals_topics_pkey on
generals_topics gt (cost=0.00..615679.86 rows=14750423 width=8)
-> Index Scan using ix_category_generals_id_general on
category_generals cs (cost=0.00..4.01 rows=1 width=4)
Index Cond: ("outer".id_general = cs.id_general)
Filter: (id_category = 3)
-> Index Scan using generals_id_topic_key on generals g
(cost=0.00..4.01 rows=1 width=4)
Index Cond: (g.id_general = "outer".id_general)
Filter: media

As you can see, every rows of generals_topics table is scanned and I don't
understand why? How can I do to make postgresql to work only with the tuples
resulting from the join conditions? Is it a configuration problem ?

Thanks in advance for any help.

--
Olivier Sirven

Elma Ingénierie Informatique
3, rue d'Uzès
F-75002 - Paris - France
http://www.elma.fr
Tel: +33-1-44882744
Fax: +33-1-44882747
Email: olivier(at)elma(dot)fr


From: Richard Huxton <dev(at)archonet(dot)com>
To: Olivier Sirven <olivier(at)elma(dot)fr>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: strange query runtime
Date: 2006-02-07 10:39:05
Message-ID: 43E878C9.9050906@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Olivier Sirven wrote:
> The query is slow but it works fine as it completes in less than 1 second.
> The problem is that if I change the filter value of id_category from 15 to 3
> the query will take more than 7 minutes to complete! The only difference
> between id_category 3 and 15 is that there is about 4000 rows in the first
> one (id_category = 3) and 2000 rows in the second one (id_category = 15).
> An explain give me the following result:

EXPLAIN ANALYSE would be more useful - it'll show whether the row
estimates are actually accurate.

> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..9677.68 rows=20 width=4)
> -> Nested Loop (cost=0.00..61006657.19 rows=126077 width=4)
> -> Nested Loop (cost=0.00..59991538.61 rows=252145 width=12)
> -> Index Scan Backward using generals_topics_pkey on
> generals_topics gt (cost=0.00..615679.86 rows=14750423 width=8)
> -> Index Scan using ix_category_generals_id_general on
> category_generals cs (cost=0.00..4.01 rows=1 width=4)
> Index Cond: ("outer".id_general = cs.id_general)
> Filter: (id_category = 3)
> -> Index Scan using generals_id_topic_key on generals g
> (cost=0.00..4.01 rows=1 width=4)
> Index Cond: (g.id_general = "outer".id_general)
> Filter: media
>
> As you can see, every rows of generals_topics table is scanned and I don't
> understand why? How can I do to make postgresql to work only with the tuples
> resulting from the join conditions? Is it a configuration problem ?

It thinks you're going to get 126077 rows back at the top level. VACUUM
your table(s), ANALYSE them and then let's look at the EXPLAIN ANALYSE
for this query. It might be then that we need to increase the statistics
on one or more columns.

--
Richard Huxton
Archonet Ltd


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Olivier Sirven <olivier(at)elma(dot)fr>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: strange query runtime
Date: 2006-02-07 15:26:42
Message-ID: 7126.1139326002@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Olivier Sirven <olivier(at)elma(dot)fr> writes:
> Limit (cost=0.00..9677.68 rows=20 width=4)
> -> Nested Loop (cost=0.00..61006657.19 rows=126077 width=4)
> -> Nested Loop (cost=0.00..59991538.61 rows=252145 width=12)
> -> Index Scan Backward using generals_topics_pkey on
> generals_topics gt (cost=0.00..615679.86 rows=14750423 width=8)
> -> Index Scan using ix_category_generals_id_general on
> category_generals cs (cost=0.00..4.01 rows=1 width=4)
> Index Cond: ("outer".id_general = cs.id_general)
> Filter: (id_category = 3)
> -> Index Scan using generals_id_topic_key on generals g
> (cost=0.00..4.01 rows=1 width=4)
> Index Cond: (g.id_general = "outer".id_general)
> Filter: media

> As you can see, every rows of generals_topics table is scanned

No, they aren't, because of the LIMIT. The estimate for a plan node is
the number of rows it would return *if scanned to completion* ... but
the LIMIT will terminate execution as soon as it's gotten back 20 rows.
Hence the LIMIT cost estimate is only 20/126077 of the estimated cost
for the full nestloop.

Your problem probably has something to do with irregular distribution of
the id_category values in the id_topic sort order, but it's hard to tell
with no more data than you've provided.

regards, tom lane