Re: Optimizer regression

Lists: pgsql-hackers
From: Jim Nasby <jim(at)nasby(dot)net>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Optimizer regression
Date: 2012-10-13 15:54:46
Message-ID: E7DB4EF9-89E7-4BE3-9C28-44386068DC27@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Just upgraded to 8.4 (I know, I know…) and ran across this. Unfortunately I have no way to test this on 9.x, so I don't know if it's been fixed or not. I'm hoping that someone *cough*Tom*cough* would quickly recognize whether this push into subquery issue has been fixed or not, so I haven't included full details or a test case. I have a work-around so I don't care about this in 8.4, but if this regression still exists it would be nice if it were fixed.

CREATE VIEW loans.payday AS SELECT * FROM loans WHERE loan_type_cd IN ( 'payday', 'other' );
CREATE VIEW loans.payday_defaulted AS SELECT * FROM loans.payday p WHERE EXISTS( SELECT * FROM loan_statuses ls WHERE ls.loan_id = p.id AND ls.status_cd = 'in_default' );

This query is fast:

SELECT defaulted_then_paid_loans
, ( SELECT count(*)
FROM loans.payday
WHERE ROW( customer_id, status_cd ) = ROW( d.customer_id, d.status_cd )
AND id > coalesce( max_defaulted_loan_id, 0 )
) AS number_of_loans_since_last_default
FROM ( SELECT p.customer_id, p.status_cd, count( d.id ) AS defaulted_then_paid_loans
, max( d.id ) AS max_defaulted_loan_id
FROM loans.payday p
LEFT JOIN loans.payday_defaulted d USING( id )
WHERE d.customer_id = ?
GROUP BY p.customer_id, p.status_cd
) d
WHERE status_cd = 'paid_off';

This query is not (but was fine on 8.3):
SELECT defaulted_then_paid_loans
, ( SELECT count(*)
FROM loans.payday
WHERE ROW( customer_id, status_cd ) = ROW( d.customer_id, d.status_cd )
AND id > coalesce( max_defaulted_loan_id, 0 )
) AS number_of_loans_since_last_default
FROM ( SELECT p.customer_id, p.status_cd, count( d.id ) AS defaulted_then_paid_loans
, max( d.id ) AS max_defaulted_loan_id
FROM loans.payday p
LEFT JOIN loans.payday_defaulted d USING( id )
GROUP BY p.customer_id, p.status_cd
) d
WHERE status_cd = 'paid_off'
AND customer_id = ?
;

Plan from the "bad" query on 8.3:
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
Subquery Scan d (cost=0.00..438.00 rows=2 width=162) (actual time=4883.286..4883.286 rows=1 loops=1)
-> GroupAggregate (cost=0.00..421.91 rows=2 width=17) (actual time=4883.181..4883.181 rows=1 loops=1)
-> Nested Loop Left Join (cost=0.00..421.75 rows=13 width=17) (actual time=314.426..4883.082 rows=31 loops=1)
-> Index Scan using loans_m13 on loans (cost=0.00..36.72 rows=13 width=17) (actual time=52.209..561.240 rows=31 loops=1)
Index Cond: ((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text))
Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
-> Index Scan using loans_pkey on loans (cost=0.00..29.61 rows=1 width=4) (actual time=139.410..139.410 rows=0 loops=31)
Index Cond: (cnu.loans.id = cnu.loans.id)
Filter: (((cnu.loans.loan_type_cd)::text = ANY ('{payday,cso}'::text[])) AND (subplan))
SubPlan
-> Index Scan using loan_status_u1 on loan_statuses ls (cost=0.00..23.43 rows=1 width=88) (actual time=109.521..109.521 rows=0 loops=31)
Index Cond: (loan_id = $3)
Filter: ((status_cd)::text = 'in_default'::text)
SubPlan
-> Aggregate (cost=8.03..8.04 rows=1 width=0) (actual time=0.100..0.100 rows=1 loops=1)
-> Index Scan using loans_m13 on loans (cost=0.00..8.02 rows=1 width=0) (actual time=0.041..0.084 rows=31 loops=1)
Index Cond: ((customer_id = $0) AND ((status_cd)::text = ($1)::text))
Filter: (((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) AND (id > COALESCE($2, 0)))
Total runtime: 4883.439 ms
(19 rows)

And from 8.4…
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Subquery Scan d (cost=3003014.53..3027074.69 rows=2 width=162)
-> GroupAggregate (cost=3003014.53..3027059.89 rows=2 width=17)
-> Hash Left Join (cost=3003014.53..3027059.73 rows=13 width=17)
Hash Cond: (cnu.loans.id = cnu.loans.id)
-> Index Scan using loans_m13 on loans (cost=0.00..36.01 rows=13 width=17)
Index Cond: ((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text))
Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
-> Hash (cost=2902187.44..2902187.44 rows=6145607 width=4)
-> Hash Join (cost=2027941.10..2902187.44 rows=6145607 width=4)
Hash Cond: (cnu.loans.id = ls.loan_id)
-> Seq Scan on loans (cost=0.00..688340.03 rows=10783881 width=4)
Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
-> Hash (cost=2015760.83..2015760.83 rows=974422 width=4)
-> HashAggregate (cost=2006016.61..2015760.83 rows=974422 width=4)
-> Seq Scan on loan_statuses ls (cost=0.00..1984621.11 rows=8558199 width=4)
Filter: ((status_cd)::text = 'in_default'::text)
SubPlan 1
-> Aggregate (cost=7.38..7.39 rows=1 width=0)
-> Index Scan using loans_m13 on loans (cost=0.00..7.37 rows=1 width=0)
Index Cond: ((customer_id = $0) AND ((status_cd)::text = ($1)::text))
Filter: (((loan_type_cd)::text = ANY ('{payday,cso}'::text[])) AND (id > COALESCE($2, 0)))
(21 rows)

--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizer regression
Date: 2012-10-13 19:45:47
Message-ID: 18903.1350157547@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Nasby <jim(at)nasby(dot)net> writes:
> Just upgraded to 8.4 (I know, I know) and ran across this. Unfortunately I have no way to test this on 9.x, so I don't know if it's been fixed or not. I'm hoping that someone *cough*Tom*cough* would quickly recognize whether this push into subquery issue has been fixed or not, so I haven't included full details or a test case. I have a work-around so I don't care about this in 8.4, but if this regression still exists it would be nice if it were fixed.

It's hard to be sure with such an incomplete example, but I think 8.4 is
flattening the EXISTS to a semijoin and then getting trapped by join
order constraints into doing something less than optimal for this
particular use-case. It was this type of example that motivated the
"parameterized path" stuff I've been working on for the past couple
of years.

In short, 9.2 should produce at least as good a plan as 8.3 for this
example, but 8.4 through 9.1 might not.

BTW, your workaround looks wrong --- you need to constrain the outside
of the left join not the inside, no?

regards, tom lane


From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizer regression
Date: 2012-10-13 19:54:28
Message-ID: 5079C6F4.6090700@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/13/12 2:45 PM, Tom Lane wrote:
> BTW, your workaround looks wrong --- you need to constrain the outside
> of the left join not the inside, no?

Ugh, yes, you're correct. :(
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizer regression
Date: 2012-10-13 20:15:08
Message-ID: 5079CBCC.8060003@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/13/12 2:45 PM, Tom Lane wrote:
> Jim Nasby <jim(at)nasby(dot)net> writes:
>> Just upgraded to 8.4 (I know, I know…) and ran across this. Unfortunately I have no way to test this on 9.x, so I don't know if it's been fixed or not. I'm hoping that someone *cough*Tom*cough* would quickly recognize whether this push into subquery issue has been fixed or not, so I haven't included full details or a test case. I have a work-around so I don't care about this in 8.4, but if this regression still exists it would be nice if it were fixed.
>
> It's hard to be sure with such an incomplete example, but I think 8.4 is
> flattening the EXISTS to a semijoin and then getting trapped by join
> order constraints into doing something less than optimal for this
> particular use-case. It was this type of example that motivated the
> "parameterized path" stuff I've been working on for the past couple
> of years.
>
> In short, 9.2 should produce at least as good a plan as 8.3 for this
> example, but 8.4 through 9.1 might not.

FWIW, it's definitely an issue of not being able to push down past the GROUP BY:

cnuapp_prod(at)postgres10(dot)obr=# explain WITH default_stats AS (select customer_id, status_cd, count(*), max(id) from loans.payday_defaulted group by customer_id, status_cd) SELECT * FROM default_stats where customer_id=10287151;
QUERY PLAN
------------------------------------------------------------------------------------------------------------
CTE Scan on default_stats (cost=2980046.56..3004313.73 rows=5393 width=162)
Filter: (customer_id = 10287151)
CTE default_stats
-> HashAggregate (cost=2963868.44..2980046.56 rows=1078541 width=17)
-> Hash Join (cost=2028045.22..2902409.22 rows=6145922 width=17)
Hash Cond: (loans.id = ls.loan_id)
-> Seq Scan on loans (cost=0.00..688437.25 rows=10785404 width=17)
Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
-> Hash (cost=2015864.33..2015864.33 rows=974471 width=4)
-> HashAggregate (cost=2006119.62..2015864.33 rows=974471 width=4)
-> Seq Scan on loan_statuses ls (cost=0.00..1984723.02 rows=8558638 width=4)
Filter: ((status_cd)::text = 'in_default'::text)
(12 rows)

cnuapp_prod(at)postgres10(dot)obr=# explain analyze select customer_id, status_cd, count(*), max(id) from loans.payday_defaulted where customer_id=10287151 group by customer_id, status_cd;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
HashAggregate (cost=202.16..202.19 rows=2 width=17) (actual time=0.422..0.422 rows=0 loops=1)
-> Nested Loop Semi Join (cost=0.00..202.07 rows=9 width=17) (actual time=0.422..0.422 rows=0 loops=1)
-> Index Scan using loans_m12 on loans (cost=0.00..41.48 rows=16 width=17) (actual time=0.028..0.121 rows=31 loops=1)
Index Cond: (customer_id = 10287151)
Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
-> Index Scan using loan_statuses__loan_id__status on loan_statuses ls (cost=0.00..10.17 rows=1 width=4) (actual time=0.009..0.009 rows=0 loops=31)
Index Cond: ((ls.loan_id = loans.id) AND ((ls.status_cd)::text = 'in_default'::text))
Total runtime: 0.510 ms
(8 rows)

cnuapp_prod(at)postgres10(dot)obr=#

I hope that we'll have 9.2 stood up before the year is out, so we'll check this then and see if it's fixed.
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizer regression
Date: 2012-10-13 20:28:21
Message-ID: 19873.1350160101@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Nasby <jim(at)nasby(dot)net> writes:
> FWIW, it's definitely an issue of not being able to push down past the GROUP BY:

I think it's not that so much as the EXISTS inside a LEFT JOIN.

regards, tom lane


From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizer regression
Date: 2012-10-13 20:33:36
Message-ID: 5079D020.8020208@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/13/12 3:15 PM, Jim Nasby wrote:
> FWIW, it's definitely an issue of not being able to push down past the GROUP BY:

I take that back... GROUP BY doesn't matter. It's an issue of having the EXISTS in the inner query. I realize the examples have gotten a bit silly, but this seems to break it down to the simplest case of what's happening.

FAST:

explain analyze SELECT p.customer_id, p.status_cd, EXISTS( SELECT * FROM loan_statuses ls WHERE ls.loan_id = p.id AND ls.status_cd = 'in_default' ) FROM loans p LEFT JOIN

( SELECT * FROM loans p WHERE loan_type_cd IN ( 'payday', 'cso' ) ) d USING( id ) WHERE p.customer_id = 10287151 AND p.status_cd = 'paid_off'

AND p.loan_type_cd IN ( 'payday', 'cso' ) ;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop Left Join (cost=0.00..234.87 rows=13 width=17) (actual time=0.085..0.861 rows=31 loops=1)
-> Index Scan using loans_m13 on loans p (cost=0.00..36.01 rows=13 width=17) (actual time=0.045..0.137 rows=31 loops=1)
Index Cond: ((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text))
Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
-> Index Scan using loans_pkey on loans p (cost=0.00..5.12 rows=1 width=4) (actual time=0.011..0.011 rows=1 loops=31)
Index Cond: (p.id = p.id)
Filter: ((p.loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
SubPlan 1
-> Index Scan using loan_statuses__loan_id__status on loan_statuses ls (cost=0.00..10.17 rows=1 width=0) (actual time=0.011..0.011 rows=0 loops=31)
Index Cond: ((loan_id = $0) AND ((status_cd)::text = 'in_default'::text))
Total runtime: 0.950 ms
(11 rows)

SLOW:

cnuapp_prod(at)postgres10(dot)obr=# explain SELECT p.customer_id, p.status_cd FROM loans p LEFT JOIN

( SELECT * FROM loans p WHERE loan_type_cd IN ( 'payday', 'cso' ) AND EXISTS( SELECT * FROM loan_statuses ls WHERE ls.loan_id = p.id AND ls.status_cd = 'in_default' ) ) d USING( id ) WHERE p.customer_id = 10287151 AND p.status_cd =
'paid_of
f' AND p.loan_type_cd IN ( 'payday', 'cso' ) ;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
Hash Left Join (cost=3003251.16..3027297.36 rows=13 width=13)
Hash Cond: (p.id = p.id)
-> Index Scan using loans_m13 on loans p (cost=0.00..36.01 rows=13 width=17)
Index Cond: ((customer_id = 10287151) AND ((status_cd)::text = 'paid_off'::text))
Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
-> Hash (cost=2902419.07..2902419.07 rows=6145927 width=4)
-> Hash Join (cost=2028047.07..2902419.07 rows=6145927 width=4)
Hash Cond: (p.id = ls.loan_id)
-> Seq Scan on loans p (cost=0.00..688444.00 rows=10785509 width=4)
Filter: ((loan_type_cd)::text = ANY ('{payday,cso}'::text[]))
-> Hash (cost=2015866.17..2015866.17 rows=974472 width=4)
-> HashAggregate (cost=2006121.45..2015866.17 rows=974472 width=4)
-> Seq Scan on loan_statuses ls (cost=0.00..1984724.84 rows=8558646 width=4)
Filter: ((status_cd)::text = 'in_default'::text)
(14 rows)

cnuapp_prod(at)postgres10(dot)obr=#

--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net