Very bad plan when using VIEW and IN (SELECT...*)

Lists: pgsql-performance
From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: pgsql-performance(at)postgresql(dot)org
Subject: Very bad plan when using VIEW and IN (SELECT...*)
Date: 2010-08-12 21:47:30
Message-ID: i41q52$28rb$1@news.hub.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Ref these two queries against a view:

-- QUERY 1, executes < 0.5 secs
SELECT *
FROM mdx_core.vw_provider AS p
WHERE provider_id IN (13083101)

-- QUERY 2, executes > 13.5 secs
SELECT *
FROM mdx_core.vw_provider AS p
WHERE provider_id IN (SELECT 13083101)

I am using the simple IN (SELECT n) in QUERY 2 to simplify the problem. I
noticed the oddity of the behaviour when I used a proper "IN (SELECT myId
FROM myTable)" but the planner shows the same behaviour even if not
selecting from a table - just the SELECT keyword is enough.

Plans are below. The view has an internal UNION.
Any explanation as to why this happens?

The actualt view is listed at the very bottom, if relevant.

Carlo

QUERY 1 PLAN
"Unique (cost=25.48..25.69 rows=2 width=417) (actual time=0.180..0.190
rows=2 loops=1)"
" -> Sort (cost=25.48..25.48 rows=2 width=417) (actual time=0.179..0.180
rows=2 loops=1)"
" Sort Key: "*SELECT* 1".provider_id, (NULL::integer), "*SELECT*
1".master_id, "*SELECT* 1".client_ids, "*SELECT* 1".upin, "*SELECT*
1".medical_education_number, "*SELECT* 1".abmsuid, "*SELECT* 1".npi,
"*SELECT* 1".npi_status_code, "*SELECT* 1".cc_id, "*SELECT* 1".aoa_id,
"*SELECT* 1".last_name, "*SELECT* 1".first_name, "*SELECT* 1".middle_name,
"*SELECT* 1".suffix, "*SELECT* 1".display_name, "*SELECT* 1".display_title,
"*SELECT* 1".nickname, "*SELECT* 1".familiar_name, "*SELECT* 1".pubmed_name,
"*SELECT* 1".master_name, "*SELECT* 1".display_name_orig, (NULL::text),
"*SELECT* 1".gender, "*SELECT* 1".birth_year, "*SELECT* 1".birth_month,
"*SELECT* 1".birth_day, "*SELECT* 1".clinical_interest, "*SELECT*
1".research_interest, "*SELECT* 1".summary, "*SELECT* 1".comments, "*SELECT*
1".degree_types, "*SELECT* 1".provider_type_ids, "*SELECT*
1".provider_status_code, "*SELECT* 1".provider_status_year, "*SELECT*
1".created, "*SELECT* 1".unique_flag, "*SELECT* 1".is_locked, "*SELECT*
1".provider_standing_code, "*SELECT* 1".impt_source_date, "*SELECT*
1".input_resource_id, "*SELECT* 1".input_source_ids"
" Sort Method: quicksort Memory: 27kB"
" -> Append (cost=0.00..25.47 rows=2 width=417) (actual
time=0.078..0.143 rows=2 loops=1)"
" -> Subquery Scan "*SELECT* 1" (cost=0.00..8.59 rows=1
width=408) (actual time=0.078..0.079 rows=1 loops=1)"
" -> Index Scan using provider_provider_id_idx on
provider p (cost=0.00..8.58 rows=1 width=408) (actual time=0.076..0.077
rows=1 loops=1)"
" Index Cond: (provider_id = 13083101)"
" -> Subquery Scan "*SELECT* 2" (cost=0.00..16.87 rows=1
width=417) (actual time=0.061..0.062 rows=1 loops=1)"
" -> Nested Loop (cost=0.00..16.86 rows=1 width=417)
(actual time=0.055..0.056 rows=1 loops=1)"
" -> Index Scan using
provider_name_pid_rec_stat_idx on provider_alias pa (cost=0.00..8.27 rows=1
width=32) (actual time=0.047..0.047 rows=1 loops=1)"
" Index Cond: (provider_id = 13083101)"
" -> Index Scan using provider_provider_id_idx on
provider p (cost=0.00..8.58 rows=1 width=389) (actual time=0.005..0.006
rows=1 loops=1)"
" Index Cond: (p.provider_id = 13083101)"
"Total runtime: 0.371 ms"

QUERY 2 PLAN
"Merge IN Join (cost=2421241.80..3142039.99 rows=30011 width=2032) (actual
time=13778.400..13778.411 rows=2 loops=1)"
" Merge Cond: ("*SELECT* 1".provider_id = (13083101))"
" -> Unique (cost=2421241.77..3066486.33 rows=6002275 width=417) (actual
time=13778.119..13778.372 rows=110 loops=1)"
" -> Sort (cost=2421241.77..2436247.46 rows=6002275 width=417)
(actual time=13778.118..13778.163 rows=110 loops=1)"
" Sort Key: "*SELECT* 1".provider_id, (NULL::integer),
"*SELECT* 1".master_id, "*SELECT* 1".client_ids, "*SELECT* 1".upin,
"*SELECT* 1".medical_education_number, "*SELECT* 1".abmsuid, "*SELECT*
1".npi, "*SELECT* 1".npi_status_code, "*SELECT* 1".cc_id, "*SELECT*
1".aoa_id, "*SELECT* 1".last_name, "*SELECT* 1".first_name, "*SELECT*
1".middle_name, "*SELECT* 1".suffix, "*SELECT* 1".display_name, "*SELECT*
1".display_title, "*SELECT* 1".nickname, "*SELECT* 1".familiar_name,
"*SELECT* 1".pubmed_name, "*SELECT* 1".master_name, "*SELECT*
1".display_name_orig, (NULL::text), "*SELECT* 1".gender, "*SELECT*
1".birth_year, "*SELECT* 1".birth_month, "*SELECT* 1".birth_day, "*SELECT*
1".clinical_interest, "*SELECT* 1".research_interest, "*SELECT* 1".summary,
"*SELECT* 1".comments, "*SELECT* 1".degree_types, "*SELECT*
1".provider_type_ids, "*SELECT* 1".provider_status_code, "*SELECT*
1".provider_status_year, "*SELECT* 1".created, "*SELECT* 1".unique_flag,
"*SELECT* 1".is_locked, "*SELECT* 1".provider_standing_code, "*SELECT*
1".impt_source_date, "*SELECT* 1".input_resource_id, "*SELECT*
1".input_source_ids"
" Sort Method: external merge Disk: 423352kB"
" -> Append (cost=0.00..596598.30 rows=6002275 width=417)
(actual time=0.039..7879.715 rows=1312637 loops=1)"
" -> Subquery Scan "*SELECT* 1" (cost=0.00..543238.96
rows=5994998 width=408) (actual time=0.039..7473.664 rows=1305360 loops=1)"
" -> Seq Scan on provider p (cost=0.00..483288.98
rows=5994998 width=408) (actual time=0.037..6215.112 rows=1305360 loops=1)"
" -> Subquery Scan "*SELECT* 2" (cost=0.00..53359.34
rows=7277 width=417) (actual time=0.049..186.643 rows=7277 loops=1)"
" -> Nested Loop (cost=0.00..53286.57 rows=7277
width=417) (actual time=0.043..176.134 rows=7277 loops=1)"
" -> Seq Scan on provider_alias pa
(cost=0.00..157.77 rows=7277 width=32) (actual time=0.018..3.134 rows=7277
loops=1)"
" -> Index Scan using
provider_provider_id_idx on provider p (cost=0.00..7.29 rows=1 width=389)
(actual time=0.021..0.021 rows=1 loops=7277)"
" Index Cond: (p.provider_id =
pa.provider_id)"
" -> Sort (cost=0.03..0.04 rows=1 width=4) (actual time=0.014..0.014
rows=1 loops=1)"
" Sort Key: (13083101)"
" Sort Method: quicksort Memory: 25kB"
" -> Result (cost=0.00..0.01 rows=1 width=0) (actual
time=0.001..0.001 rows=1 loops=1)"
"Total runtime: 13959.905 ms"

REATE OR REPLACE VIEW mdx_core.vw_provider AS
SELECT
p.provider_id,
NULL AS provider_alias_id,
p.master_id,
p.client_ids,
p.upin,
p.medical_education_number,
p.abmsuid,
p.npi,
p.npi_status_code,
p.cc_id,
p.aoa_id,
p.last_name,
p.first_name,
p.middle_name,
p.suffix,
p.display_name,
p.display_title,
p.nickname,
p.familiar_name,
p.pubmed_name,
p.master_name,
p.display_name_orig,
NULL::text AS is_primary,
p.gender,
p.birth_year,
p.birth_month,
p.birth_day,
p.clinical_interest,
p.research_interest,
p.summary,
p.comments,
p.degree_types,
p.provider_type_ids,
p.provider_status_code,
p.provider_status_year,
p.created,
p.unique_flag,
p.is_locked,
p.provider_standing_code,
p.impt_source_date,
p.input_resource_id,
p.input_source_ids
FROM mdx_core.provider AS p

UNION SELECT
p.provider_id,
pa.provider_alias_id,
p.master_id,
p.client_ids,
p.upin,
p.medical_education_number,
p.abmsuid,
p.npi,
p.npi_status_code,
p.cc_id,
p.aoa_id,
pa.last_name,
pa.first_name,
pa.middle_name,
pa.suffix,
p.display_name,
p.display_title,
p.nickname,
p.familiar_name,
p.pubmed_name,
p.master_name,
p.display_name_orig,
pa.is_primary,
p.gender,
p.birth_year,
p.birth_month,
p.birth_day,
p.clinical_interest,
p.research_interest,
p.summary,
p.comments,
p.degree_types,
p.provider_type_ids,
p.provider_status_code,
p.provider_status_year,
p.created,
p.unique_flag,
p.is_locked,
p.provider_standing_code,
p.impt_source_date,
p.input_resource_id,
p.input_source_ids
FROM mdx_core.provider_alias AS pa
JOIN mdx_core.provider AS p USING (provider_id);


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-performance(at)postgresql(dot)org>, "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
Subject: Re: Very bad plan when using VIEW and IN (SELECT...*)
Date: 2010-08-13 13:28:47
Message-ID: 4C65023F020000250003457D@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

"Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca> wrote:

> SELECT *
> FROM mdx_core.vw_provider AS p
> WHERE provider_id IN (SELECT 13083101)
>
> I am using the simple IN (SELECT n) in QUERY 2 to simplify the
> problem. I noticed the oddity of the behaviour when I used a
> proper "IN (SELECT myId FROM myTable)"

Did you try?:

SELECT *
FROM mdx_core.vw_provider AS p
WHERE EXISTS (SELECT * FROM myTable WHERE myId = provider_id)

For any follow-up you should probably mention what version of
PostgreSQL this is and how it's configured.

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin


From: "Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca>
To: "'Kevin Grittner'" <Kevin(dot)Grittner(at)wicourts(dot)gov>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Very bad plan when using VIEW and IN (SELECT...*)
Date: 2010-08-13 14:51:00
Message-ID: BLU0-SMTP8D0D2024451B31C3D268696980@phx.gbl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Unfortunately I had double-posted this - originally in General.

Tom Lane pointed out (in PG-GENERAL) that the planner will take any IN
(SELECT...) statement and do a JOIN, which is what is causing the planner
problem - even though the SELECT was just returning a constant. Obviously,
the real query this was testing was something more real-world.

SO, I took my original query and turned it to this:

SELECT *
FROM mdx_core.vw_provider AS p
WHERE provider_id = ANY array(
SELECT provider_id
FROM mdx_core.provider_alias
)

BLISTERINGLY fast!

PG version is 8.3 - as for configuration, I didn't want to throw too much
info as my concern was actually whether views were as klunky as other DB
platforms.

Carlo

-----Original Message-----
From: Kevin Grittner [mailto:Kevin(dot)Grittner(at)wicourts(dot)gov]
Sent: August 13, 2010 9:29 AM
To: pgsql-performance(at)postgresql(dot)org; Carlo Stonebanks
Subject: Re: [PERFORM] Very bad plan when using VIEW and IN (SELECT...*)

"Carlo Stonebanks" <stonec(dot)register(at)sympatico(dot)ca> wrote:

> SELECT *
> FROM mdx_core.vw_provider AS p
> WHERE provider_id IN (SELECT 13083101)
>
> I am using the simple IN (SELECT n) in QUERY 2 to simplify the
> problem. I noticed the oddity of the behaviour when I used a
> proper "IN (SELECT myId FROM myTable)"

Did you try?:

SELECT *
FROM mdx_core.vw_provider AS p
WHERE EXISTS (SELECT * FROM myTable WHERE myId = provider_id)

For any follow-up you should probably mention what version of
PostgreSQL this is and how it's configured.

http://wiki.postgresql.org/wiki/SlowQueryQuestions

-Kevin