hashed subplan 5000x slower than two sequential operations

Lists: pgsql-performance
From: Bryce Nesbitt <bryce2(at)obviously(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: hashed subplan 5000x slower than two sequential operations
Date: 2010-12-08 18:53:58
Message-ID: 4CFFD446.6040609@obviously.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Can you help me understand how to optimize the following. There's a
subplan which in this case returns 3 rows,
but it is really expensive:

=========================================================================
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=83054.41..443755.45 rows=261077 width=4) (actual
time=4362.143..6002.808 rows=28 loops=1)
Hash Cond: (articles.context_key = contexts.context_key)
-> Seq Scan on articles (cost=0.00..345661.91 rows=522136 width=4)
(actual time=0.558..3953.002 rows=517356 loops=1)
Filter: indexed
-> Hash (cost=69921.25..69921.25 rows=800493 width=4) (actual
time=829.501..829.501 rows=31 loops=1)
-> Seq Scan on contexts (cost=14.31..69921.25 rows=800493
width=4) (actual time=1.641..829.339 rows=31 loops=1)
Filter: ((parent_key = 392210) OR (hashed subplan))
SubPlan
-> Index Scan using collection_data_context_key_index
on collection_data (cost=0.00..14.30 rows=6 width=4) (actual
time=0.018..0.023 rows=3 loops=1)
Index Cond: (collection_context_key = 392210)
Total runtime: 6002.976 ms
(11 rows)

=========================================================================
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))
AND articles.indexed
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=14.35..1863.85 rows=94 width=4) (actual
time=0.098..1.038 rows=57 loops=1)
-> Bitmap Heap Scan on contexts (cost=14.35..572.57 rows=288
width=4) (actual time=0.079..0.274 rows=59 loops=1)
Recheck Cond: ((parent_key = 392210) OR (parent_key = ANY
('{392210,392210,395073,1304250}'::integer[])))
-> BitmapOr (cost=14.35..14.35 rows=288 width=0) (actual
time=0.066..0.066 rows=0 loops=1)
-> Bitmap Index Scan on parent_key_idx
(cost=0.00..3.07 rows=58 width=0) (actual time=0.028..0.028 rows=28 loops=1)
Index Cond: (parent_key = 392210)
-> Bitmap Index Scan on parent_key_idx
(cost=0.00..11.13 rows=231 width=0) (actual time=0.035..0.035 rows=87
loops=1)
Index Cond: (parent_key = ANY
('{392210,392210,395073,1304250}'::integer[]))
-> Index Scan using article_key_idx on articles (cost=0.00..4.47
rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=59)
Index Cond: (articles.context_key = contexts.context_key)
Filter: articles.indexed
Total runtime: 1.166 ms
(12 rows)

production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;

=========================================================================
# select version();
PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)


From: Shrirang Chitnis <Shrirang(dot)Chitnis(at)hovservices(dot)com>
To: Bryce Nesbitt <bryce2(at)obviously(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: hashed subplan 5000x slower than two sequential operations
Date: 2010-12-08 19:05:48
Message-ID: E6DB850FDAD49A459E3C217442489C9225FB9256AB@HOV-MAIL.hovservices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Bryce,

The two queries are different:

You are looking for contexts.context_key in first query

WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)

but second query has context.parent_key

WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))

Is the contexts.context_key an indexed field? contexts.parent_key certainly seems to be.

HTH,

Shrirang Chitnis
Sr. Manager, Applications Development
HOV Services

Office: (866) 808-0935 Ext: 39210
shrirang(dot)chitnis(at)hovservices(dot)com
www.hovservices.com

The information contained in this message, including any attachments, is attorney privileged and/or confidential information intended only for the use of the individual or entity named as addressee. The review, dissemination, distribution or copying of this communication by or to anyone other than the intended addressee is strictly prohibited. If you have received this communication in error, please immediately notify the sender by replying to the message and destroy all copies of the original message.

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Bryce Nesbitt
Sent: Thursday, December 09, 2010 12:24 AM
To: pgsql-performance(at)postgresql(dot)org
Subject: [PERFORM] hashed subplan 5000x slower than two sequential operations

Can you help me understand how to optimize the following. There's a
subplan which in this case returns 3 rows,
but it is really expensive:

=========================================================================
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=83054.41..443755.45 rows=261077 width=4) (actual
time=4362.143..6002.808 rows=28 loops=1)
Hash Cond: (articles.context_key = contexts.context_key)
-> Seq Scan on articles (cost=0.00..345661.91 rows=522136 width=4)
(actual time=0.558..3953.002 rows=517356 loops=1)
Filter: indexed
-> Hash (cost=69921.25..69921.25 rows=800493 width=4) (actual
time=829.501..829.501 rows=31 loops=1)
-> Seq Scan on contexts (cost=14.31..69921.25 rows=800493
width=4) (actual time=1.641..829.339 rows=31 loops=1)
Filter: ((parent_key = 392210) OR (hashed subplan))
SubPlan
-> Index Scan using collection_data_context_key_index
on collection_data (cost=0.00..14.30 rows=6 width=4) (actual
time=0.018..0.023 rows=3 loops=1)
Index Cond: (collection_context_key = 392210)
Total runtime: 6002.976 ms
(11 rows)

=========================================================================
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))
AND articles.indexed
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=14.35..1863.85 rows=94 width=4) (actual
time=0.098..1.038 rows=57 loops=1)
-> Bitmap Heap Scan on contexts (cost=14.35..572.57 rows=288
width=4) (actual time=0.079..0.274 rows=59 loops=1)
Recheck Cond: ((parent_key = 392210) OR (parent_key = ANY
('{392210,392210,395073,1304250}'::integer[])))
-> BitmapOr (cost=14.35..14.35 rows=288 width=0) (actual
time=0.066..0.066 rows=0 loops=1)
-> Bitmap Index Scan on parent_key_idx
(cost=0.00..3.07 rows=58 width=0) (actual time=0.028..0.028 rows=28 loops=1)
Index Cond: (parent_key = 392210)
-> Bitmap Index Scan on parent_key_idx
(cost=0.00..11.13 rows=231 width=0) (actual time=0.035..0.035 rows=87
loops=1)
Index Cond: (parent_key = ANY
('{392210,392210,395073,1304250}'::integer[]))
-> Index Scan using article_key_idx on articles (cost=0.00..4.47
rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=59)
Index Cond: (articles.context_key = contexts.context_key)
Filter: articles.indexed
Total runtime: 1.166 ms
(12 rows)

production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;

=========================================================================
# select version();
PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


From: Bryce Nesbitt <bryce2(at)obviously(dot)com>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Cc: Shrirang Chitnis <Shrirang(dot)Chitnis(at)hovservices(dot)com>
Subject: Re: hashed subplan 5000x slower than two sequential operations
Date: 2010-12-08 20:05:26
Message-ID: 4CFFE506.2010606@obviously.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Shrirang Chitnis wrote:
> Bryce,
> The two queries are different:
>
Ah, due to a mistake. The first version with the hashed subplan is from
production.
The second version should have read:

====================================================================================
production=> SELECT collection_data.context_key FROM collection_data
WHERE collection_data.collection_context_key = 392210;
392210
395073
1304250
production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.context_key IN
(392210,395073,1304250))
AND articles.indexed
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=12.32..414.41 rows=20 width=4) (actual
time=0.112..0.533 rows=28 loops=1)
-> Bitmap Heap Scan on contexts (cost=12.32..135.13 rows=62
width=4) (actual time=0.079..0.152 rows=31 loops=1)
Recheck Cond: ((parent_key = 392210) OR (context_key = ANY
('{392210,392210,395073,1304250}'::integer[])))
-> BitmapOr (cost=12.32..12.32 rows=62 width=0) (actual
time=0.070..0.070 rows=0 loops=1)
-> Bitmap Index Scan on parent_key_idx
(cost=0.00..3.07 rows=58 width=0) (actual time=0.029..0.029 rows=28 loops=1)
Index Cond: (parent_key = 392210)
-> Bitmap Index Scan on contexts_pkey (cost=0.00..9.22
rows=4 width=0) (actual time=0.037..0.037 rows=4 loops=1)
Index Cond: (context_key = ANY
('{392210,392210,395073,1304250}'::integer[]))
-> Index Scan using article_key_idx on articles (cost=0.00..4.49
rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=31)
Index Cond: (articles.context_key = contexts.context_key)
Filter: articles.indexed
Total runtime: 0.614 ms
(12 rows)

====================================================================================
production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=83054.41..443755.45 rows=261077 width=4) (actual
time=3415.609..6737.863 rows=28 loops=1)
Hash Cond: (articles.context_key = contexts.context_key)
-> Seq Scan on articles (cost=0.00..345661.91 rows=522136 width=4)
(actual time=0.038..4587.914 rows=517416 loops=1)
Filter: indexed
-> Hash (cost=69921.25..69921.25 rows=800493 width=4) (actual
time=926.965..926.965 rows=31 loops=1)
-> Seq Scan on contexts (cost=14.31..69921.25 rows=800493
width=4) (actual time=2.113..926.794 rows=31 loops=1)
Filter: ((parent_key = 392210) OR (hashed subplan))
SubPlan
-> Index Scan using collection_data_context_key_index
on collection_data (cost=0.00..14.30 rows=6 width=4) (actual
time=0.084..0.088 rows=3 loops=1)
Index Cond: (collection_context_key = 392210)
Total runtime: 6738.042 ms
(11 rows)


From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: "Shrirang Chitnis" <Shrirang(dot)Chitnis(at)hovservices(dot)com>, "Bryce Nesbitt" <bryce2(at)obviously(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: hashed subplan 5000x slower than two sequential operations
Date: 2010-12-08 20:06:12
Message-ID: C4DAC901169B624F933534A26ED7DF3103E915FA@JENMAIL01.ad.intershop.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hello,

are the table freshly analyzed, with a sufficient default_statistics_target ?

You may try to get a better plan while rewriting the query as an UNION to get rid of the OR clause.
Something like (not tested):

SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE contexts.parent_key = 392210
AND articles.indexed

UNION
SELECT context_key
FROM
(
SELECT contexts.context_key
FROM contexts JOIN collection_data ON ( contexts.context_key = collection_data .context_key)
WHERE collection_data.collection_context_key = 392210)
) foo JOIN articles ON (foo.context_key=contexts.context_key)
WHERE articles.indexed
;

I've had one similar problem where there was no way for the planner to notice that the query would systematically return very few rows. Here, my last resort was to disable some planner methods within the given transaction.

regards,

Marc Mamin

-----Ursprüngliche Nachricht-----
Von: pgsql-performance-owner(at)postgresql(dot)org im Auftrag von Shrirang Chitnis
Gesendet: Mi 12/8/2010 8:05
An: Bryce Nesbitt; pgsql-performance(at)postgresql(dot)org
Betreff: Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

Bryce,

The two queries are different:

You are looking for contexts.context_key in first query

WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)

but second query has context.parent_key

WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))

Is the contexts.context_key an indexed field? contexts.parent_key certainly seems to be.

HTH,

Shrirang Chitnis
Sr. Manager, Applications Development
HOV Services

Office: (866) 808-0935 Ext: 39210
shrirang(dot)chitnis(at)hovservices(dot)com
www.hovservices.com

The information contained in this message, including any attachments, is attorney privileged and/or confidential information intended only for the use of the individual or entity named as addressee. The review, dissemination, distribution or copying of this communication by or to anyone other than the intended addressee is strictly prohibited. If you have received this communication in error, please immediately notify the sender by replying to the message and destroy all copies of the original message.

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Bryce Nesbitt
Sent: Thursday, December 09, 2010 12:24 AM
To: pgsql-performance(at)postgresql(dot)org
Subject: [PERFORM] hashed subplan 5000x slower than two sequential operations

Can you help me understand how to optimize the following. There's a
subplan which in this case returns 3 rows,
but it is really expensive:

=========================================================================
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=83054.41..443755.45 rows=261077 width=4) (actual
time=4362.143..6002.808 rows=28 loops=1)
Hash Cond: (articles.context_key = contexts.context_key)
-> Seq Scan on articles (cost=0.00..345661.91 rows=522136 width=4)
(actual time=0.558..3953.002 rows=517356 loops=1)
Filter: indexed
-> Hash (cost=69921.25..69921.25 rows=800493 width=4) (actual
time=829.501..829.501 rows=31 loops=1)
-> Seq Scan on contexts (cost=14.31..69921.25 rows=800493
width=4) (actual time=1.641..829.339 rows=31 loops=1)
Filter: ((parent_key = 392210) OR (hashed subplan))
SubPlan
-> Index Scan using collection_data_context_key_index
on collection_data (cost=0.00..14.30 rows=6 width=4) (actual
time=0.018..0.023 rows=3 loops=1)
Index Cond: (collection_context_key = 392210)
Total runtime: 6002.976 ms
(11 rows)

=========================================================================
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))
AND articles.indexed
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=14.35..1863.85 rows=94 width=4) (actual
time=0.098..1.038 rows=57 loops=1)
-> Bitmap Heap Scan on contexts (cost=14.35..572.57 rows=288
width=4) (actual time=0.079..0.274 rows=59 loops=1)
Recheck Cond: ((parent_key = 392210) OR (parent_key = ANY
('{392210,392210,395073,1304250}'::integer[])))
-> BitmapOr (cost=14.35..14.35 rows=288 width=0) (actual
time=0.066..0.066 rows=0 loops=1)
-> Bitmap Index Scan on parent_key_idx
(cost=0.00..3.07 rows=58 width=0) (actual time=0.028..0.028 rows=28 loops=1)
Index Cond: (parent_key = 392210)
-> Bitmap Index Scan on parent_key_idx
(cost=0.00..11.13 rows=231 width=0) (actual time=0.035..0.035 rows=87
loops=1)
Index Cond: (parent_key = ANY
('{392210,392210,395073,1304250}'::integer[]))
-> Index Scan using article_key_idx on articles (cost=0.00..4.47
rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=59)
Index Cond: (articles.context_key = contexts.context_key)
Filter: articles.indexed
Total runtime: 1.166 ms
(12 rows)

production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;

=========================================================================
# select version();
PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


From: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>
To: "Marc Mamin" <M(dot)Mamin(at)intershop(dot)de>, "Shrirang Chitnis" <Shrirang(dot)Chitnis(at)hovservices(dot)com>, "Bryce Nesbitt" <bryce2(at)obviously(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: hashed subplan 5000x slower than two sequential operations
Date: 2010-12-08 20:12:23
Message-ID: C4DAC901169B624F933534A26ED7DF3103E915FC@JENMAIL01.ad.intershop.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


Another point: would a conditionl index help ?

on articles (context_key) where indexed

regards,

-----Ursprüngliche Nachricht-----
Von: pgsql-performance-owner(at)postgresql(dot)org im Auftrag von Marc Mamin
Gesendet: Mi 12/8/2010 9:06
An: Shrirang Chitnis; Bryce Nesbitt; pgsql-performance(at)postgresql(dot)org
Betreff: Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

Hello,

are the table freshly analyzed, with a sufficient default_statistics_target ?

You may try to get a better plan while rewriting the query as an UNION to get rid of the OR clause.
Something like (not tested):

SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE contexts.parent_key = 392210
AND articles.indexed

UNION
SELECT context_key
FROM
(
SELECT contexts.context_key
FROM contexts JOIN collection_data ON ( contexts.context_key = collection_data .context_key)
WHERE collection_data.collection_context_key = 392210)
) foo JOIN articles ON (foo.context_key=contexts.context_key)
WHERE articles.indexed
;

I've had one similar problem where there was no way for the planner to notice that the query would systematically return very few rows. Here, my last resort was to disable some planner methods within the given transaction.

regards,

Marc Mamin

-----Ursprüngliche Nachricht-----
Von: pgsql-performance-owner(at)postgresql(dot)org im Auftrag von Shrirang Chitnis
Gesendet: Mi 12/8/2010 8:05
An: Bryce Nesbitt; pgsql-performance(at)postgresql(dot)org
Betreff: Re: [PERFORM] hashed subplan 5000x slower than two sequential operations

Bryce,

The two queries are different:

You are looking for contexts.context_key in first query

WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)

but second query has context.parent_key

WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))

Is the contexts.context_key an indexed field? contexts.parent_key certainly seems to be.

HTH,

Shrirang Chitnis
Sr. Manager, Applications Development
HOV Services

Office: (866) 808-0935 Ext: 39210
shrirang(dot)chitnis(at)hovservices(dot)com
www.hovservices.com

The information contained in this message, including any attachments, is attorney privileged and/or confidential information intended only for the use of the individual or entity named as addressee. The review, dissemination, distribution or copying of this communication by or to anyone other than the intended addressee is strictly prohibited. If you have received this communication in error, please immediately notify the sender by replying to the message and destroy all copies of the original message.

-----Original Message-----
From: pgsql-performance-owner(at)postgresql(dot)org [mailto:pgsql-performance-owner(at)postgresql(dot)org] On Behalf Of Bryce Nesbitt
Sent: Thursday, December 09, 2010 12:24 AM
To: pgsql-performance(at)postgresql(dot)org
Subject: [PERFORM] hashed subplan 5000x slower than two sequential operations

Can you help me understand how to optimize the following. There's a
subplan which in this case returns 3 rows,
but it is really expensive:

=========================================================================
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=83054.41..443755.45 rows=261077 width=4) (actual
time=4362.143..6002.808 rows=28 loops=1)
Hash Cond: (articles.context_key = contexts.context_key)
-> Seq Scan on articles (cost=0.00..345661.91 rows=522136 width=4)
(actual time=0.558..3953.002 rows=517356 loops=1)
Filter: indexed
-> Hash (cost=69921.25..69921.25 rows=800493 width=4) (actual
time=829.501..829.501 rows=31 loops=1)
-> Seq Scan on contexts (cost=14.31..69921.25 rows=800493
width=4) (actual time=1.641..829.339 rows=31 loops=1)
Filter: ((parent_key = 392210) OR (hashed subplan))
SubPlan
-> Index Scan using collection_data_context_key_index
on collection_data (cost=0.00..14.30 rows=6 width=4) (actual
time=0.018..0.023 rows=3 loops=1)
Index Cond: (collection_context_key = 392210)
Total runtime: 6002.976 ms
(11 rows)

=========================================================================
explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210 OR contexts.parent_key IN
(392210,392210,395073,1304250))
AND articles.indexed
;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=14.35..1863.85 rows=94 width=4) (actual
time=0.098..1.038 rows=57 loops=1)
-> Bitmap Heap Scan on contexts (cost=14.35..572.57 rows=288
width=4) (actual time=0.079..0.274 rows=59 loops=1)
Recheck Cond: ((parent_key = 392210) OR (parent_key = ANY
('{392210,392210,395073,1304250}'::integer[])))
-> BitmapOr (cost=14.35..14.35 rows=288 width=0) (actual
time=0.066..0.066 rows=0 loops=1)
-> Bitmap Index Scan on parent_key_idx
(cost=0.00..3.07 rows=58 width=0) (actual time=0.028..0.028 rows=28 loops=1)
Index Cond: (parent_key = 392210)
-> Bitmap Index Scan on parent_key_idx
(cost=0.00..11.13 rows=231 width=0) (actual time=0.035..0.035 rows=87
loops=1)
Index Cond: (parent_key = ANY
('{392210,392210,395073,1304250}'::integer[]))
-> Index Scan using article_key_idx on articles (cost=0.00..4.47
rows=1 width=4) (actual time=0.007..0.008 rows=1 loops=59)
Index Cond: (articles.context_key = contexts.context_key)
Filter: articles.indexed
Total runtime: 1.166 ms
(12 rows)

production=> explain analyze SELECT contexts.context_key
FROM contexts
JOIN articles
ON (articles.context_key=contexts.context_key)
WHERE (contexts.parent_key = 392210
OR contexts.context_key IN
(SELECT collection_data.context_key
FROM collection_data
WHERE collection_data.collection_context_key = 392210)
)
AND articles.indexed
;

=========================================================================
# select version();
PostgreSQL 8.3.4 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.1.2
20061115 (prerelease) (Debian 4.1.1-21)

--
Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Shrirang Chitnis <Shrirang(dot)Chitnis(at)hovservices(dot)com>
Cc: Bryce Nesbitt <bryce2(at)obviously(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: hashed subplan 5000x slower than two sequential operations
Date: 2010-12-08 20:12:26
Message-ID: 20170.1291839146@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Shrirang Chitnis <Shrirang(dot)Chitnis(at)hovservices(dot)com> writes:
> Bryce,
> The two queries are different:

I suspect the second one is a typo and not what he really wanted.

> WHERE (contexts.parent_key = 392210
> OR contexts.context_key IN
> (SELECT collection_data.context_key
> FROM collection_data
> WHERE collection_data.collection_context_key = 392210)

The only really effective way the planner knows to optimize an
"IN (sub-SELECT)" is to turn it into a semi-join, which is not possible
here because of the unrelated OR clause. You might consider replacing
this with a UNION of two scans of "contexts". (And yes, I know it'd be
nicer if the planner did that for you.)

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Shrirang Chitnis <Shrirang(dot)Chitnis(at)hovservices(dot)com>, Bryce Nesbitt <bryce2(at)obviously(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: hashed subplan 5000x slower than two sequential operations
Date: 2010-12-08 20:25:40
Message-ID: AANLkTikQgD_ugqQOF4hQ0KgXXVgqfoAb7NjmZk+v0KfG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

2010/12/8 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Shrirang Chitnis <Shrirang(dot)Chitnis(at)hovservices(dot)com> writes:
>> Bryce,
>> The two queries are different:
>
> I suspect the second one is a typo and not what he really wanted.
>
>> WHERE (contexts.parent_key = 392210
>>       OR contexts.context_key IN
>>          (SELECT collection_data.context_key
>>          FROM collection_data
>>           WHERE collection_data.collection_context_key = 392210)
>
> The only really effective way the planner knows to optimize an
> "IN (sub-SELECT)" is to turn it into a semi-join, which is not possible
> here because of the unrelated OR clause.  You might consider replacing
> this with a UNION of two scans of "contexts".  (And yes, I know it'd be
> nicer if the planner did that for you.)

I remeber a similar case - 9 years ago.

slow variant:

WHERE pk = C1 OR pk IN (SELECT .. FROM .. WHERE some = C2)

I had to rewrite to form

WHERE pk IN (SELECT .. FROM WHERE some = C2 UNION ALL SELECT C1)

Regards

Pavel Stehule

>
>                        regards, tom lane
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


From: Bryce Nesbitt <bryce2(at)obviously(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>, Shrirang Chitnis <Shrirang(dot)Chitnis(at)hovservices(dot)com>
Subject: Re: hashed subplan 5000x slower than two sequential operations
Date: 2010-12-08 20:31:53
Message-ID: 4CFFEB39.1010601@obviously.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html; charset=ISO-8859-1"
http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
<tt>Marc Mamin wrote:</tt>
<blockquote
cite="mid:C4DAC901169B624F933534A26ED7DF3103E915FA(at)JENMAIL01(dot)ad(dot)intershop(dot)net"
type="cite">
<meta http-equiv="Content-Type"
content="text/html; charset=ISO-8859-1">
<meta name="Generator"
content="MS Exchange Server version 6.5.7651.59">
<title>AW: [PERFORM] hashed subplan 5000x slower than two sequential
operations</title>
<tt><br>
</tt>
<p><tt><font size="2">Hello,<br>
are the table freshly analyzed, with a sufficient
default_statistics_target ?<br>
</font></tt></p>
</blockquote>
<tt><br>
autovacuum = on&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp; # Enable autovacuum
subprocess?&nbsp; 'on' <br>
autovacuum_naptime = 5min&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # time between autovacuum runs<br>
default_statistics_target = 150&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # range 1-1000<br>
<br>
<br>
</tt>
<blockquote
cite="mid:C4DAC901169B624F933534A26ED7DF3103E915FA(at)JENMAIL01(dot)ad(dot)intershop(dot)net"
type="cite">
<p><tt><font size="2"><br>
You may try to get a better plan while rewriting the query as an UNION
to get rid of the OR clause.<br>
Something like (not tested):<br>
</font></tt></p>
</blockquote>
<tt>It is way better<br>
<br>
<br>
EXPLAIN ANALYZE SELECT contexts.context_key<br>
FROM contexts<br>
&nbsp;&nbsp;&nbsp; JOIN articles<br>
&nbsp;&nbsp;&nbsp; ON (articles.context_key=contexts.context_key)<br>
WHERE (contexts.parent_key =
392210)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br>
AND articles.indexed<br>
<br>
UNION<br>
SELECT collection_data.context_key<br>
FROM collection_data<br>
JOIN articles ON (articles.context_key=collection_data.context_key)<br>
WHERE collection_data.collection_context_key = 392210<br>
AND articles.indexed;<br>
<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
QUERY
PLAN&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
<br>
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------<br>
&nbsp;Unique&nbsp; (cost=418.50..418.61 rows=22 width=4) (actual
time=0.582..0.671 rows=28 loops=1)<br>
&nbsp;&nbsp; -&gt;&nbsp; Sort&nbsp; (cost=418.50..418.55 rows=22 width=4) (actual
time=0.579..0.608 rows=28 loops=1)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Sort Key: contexts.context_key<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Sort Method:&nbsp; quicksort&nbsp; Memory: 26kB<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; Append&nbsp; (cost=0.00..418.01 rows=22 width=4) (actual
time=0.042..0.524 rows=28 loops=1)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; Nested Loop&nbsp; (cost=0.00..376.46 rows=19 width=4)
(actual time=0.040..0.423 rows=28 loops=1)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; Index Scan using parent_key_idx on
contexts&nbsp; (cost=0.00..115.20 rows=58 width=4) (actual time=0.021..0.082
rows=28 loops=1)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Index Cond: (parent_key = 392210)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; Index Scan using article_key_idx on
articles&nbsp; (cost=0.00..4.49 rows=1 width=4) (actual time=0.007..0.008
rows=1 loops=28)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Index Cond: (public.articles.context_key =
contexts.context_key)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Filter: public.articles.indexed<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; Nested Loop&nbsp; (cost=0.00..41.32 rows=3 width=4)
(actual time=0.043..0.043 rows=0 loops=1)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; Index Scan using
collection_data_context_key_index on collection_data&nbsp; (cost=0.00..14.30
rows=6 width=4) (actual time=0.012..0.015 rows=3 loops=1)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Index Cond: (collection_context_key = 392210)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -&gt;&nbsp; Index Scan using article_key_idx on
articles&nbsp; (cost=0.00..4.49 rows=1 width=4) (actual time=0.006..0.006
rows=0 loops=3)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Index Cond: (public.articles.context_key =
collection_data.context_key)<br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Filter: public.articles.indexed<br>
&nbsp;Total runtime: 0.812 ms<br>
<br>
<br>
<br>
</tt>
</body>
</html>

Attachment Content-Type Size
unknown_filename text/html 6.7 KB

From: Bryce Nesbitt <bryce2(at)obviously(dot)com>
To: Marc Mamin <M(dot)Mamin(at)intershop(dot)de>, pgsql-performance(at)postgresql(dot)org
Cc: Shrirang Chitnis <Shrirang(dot)Chitnis(at)hovservices(dot)com>
Subject: Re: hashed subplan 5000x slower than two sequential operations
Date: 2010-12-08 20:33:57
Message-ID: 4CFFEBB5.8090401@obviously.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Marc Mamin wrote:
>
> Another point: would a conditionl index help ?
> on articles (context_key) where indexed
>
no.

production=> select count(*),indexed from articles group by indexed;
count | indexed
--------+---------
517433 | t
695814 | f


From: masterchief <esimon(at)theiqgroup(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: hashed subplan 5000x slower than two sequential operations
Date: 2011-01-18 18:56:59
Message-ID: 1295377019385-3346652.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


> Tom Lane wrote:
>
> The only really effective way the planner knows to optimize an
> "IN (sub-SELECT)" is to turn it into a semi-join, which is not possible
> here because of the unrelated OR clause. You might consider replacing
> this with a UNION of two scans of "contexts". (And yes, I know it'd be
> nicer if the planner did that for you.)

In moving our application from Oracle to Postgres, we've discovered that a
large number of our reports fall into this category. If we rewrite them as
a UNION of two scans, it would be quite a big undertaking. Is there a way
to tell the planner explicitly to use a semi-join (I may not grasp the
concepts here)? If not, would your advice be to hunker down and rewrite the
queries?
--
View this message in context: http://postgresql.1045698.n5.nabble.com/hashed-subplan-5000x-slower-than-two-sequential-operations-tp3297790p3346652.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


From: Віталій Тимчишин <tivv00(at)gmail(dot)com>
To: masterchief <esimon(at)theiqgroup(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: hashed subplan 5000x slower than two sequential operations
Date: 2011-01-18 21:29:40
Message-ID: AANLkTik4apwA1Dwbp+PYoz7OOiYY6TDniC+ECqaCwBHt@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

2011/1/18 masterchief <esimon(at)theiqgroup(dot)com>

>
> > Tom Lane wrote:
> >
> > The only really effective way the planner knows to optimize an
> > "IN (sub-SELECT)" is to turn it into a semi-join, which is not possible
> > here because of the unrelated OR clause. You might consider replacing
> > this with a UNION of two scans of "contexts". (And yes, I know it'd be
> > nicer if the planner did that for you.)
>
> In moving our application from Oracle to Postgres, we've discovered that a
> large number of our reports fall into this category. If we rewrite them as
> a UNION of two scans, it would be quite a big undertaking. Is there a way
> to tell the planner explicitly to use a semi-join (I may not grasp the
> concepts here)? If not, would your advice be to hunker down and rewrite
> the
> queries?
>
>
You can try "exists" instead of "in". Postgresql likes exists better.
Alternatively, you can do something like "set enable_seqscan=false". Note
that such set is more like a hammer, so should be avoided. If it is the only
thing that helps, it can be set right before calling query and reset to
default afterwards.
--

Best regards,
Vitalii Tymchyshyn