Re: BUG #3494: may be Query Error: subplan does not executed

Lists: pgsql-bugs
From: "Sergey Burladyan" <eshkinkot(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #3494: may be Query Error: subplan does not executed
Date: 2007-07-27 10:38:39
Message-ID: 200707271038.l6RAcduY034807@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 3494
Logged by: Sergey Burladyan
Email address: eshkinkot(at)gmail(dot)com
PostgreSQL version: 8.1.9
Operating system: CentOS release 5 (Final)
Description: may be Query Error: subplan does not executed
Details:

i have query with join of two table and 'where' filter it result by subplan
which have references to join result, but this subplan not executed and
result is incorrect. This subplan also not exist in explain analyze output.

test schema:
create table test_1 ( name char(10), ku numeric(4) , ku_1 numeric(4) );
insert into test_1 (name,ku,ku_1) values ('Petrov', 1, 0);
insert into test_1 (name,ku,ku_1) values ('Ivanov', 2, 0);
insert into test_1 (name,ku,ku_1) values ('Sidorov', 3, 0);

create table test_2 (kh numeric(13), ku numeric(4) , d_s timestamp );
insert into test_2 (kh,ku,d_s) values (1, 1, '2007-01-01');
insert into test_2 (kh,ku,d_s) values (1, 2, '2007-01-01');
insert into test_2 (kh,ku,d_s) values (1, 3, '2007-01-01');

problem query:
select *
from test_1 mt1,
test_2 mt2
where mt2.kh = 1 and
mt2.ku between 1 and 100 and
mt1.ku = mt2.ku and
mt1.ku = (select min(t1.ku)
from test_1 t1,test_2 t2
where t1.ku_1 = mt1.ku_1 and
t2.kh = mt2.kh and
t2.d_s = mt2.d_s and
t1.ku = t2.ku )

QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------
Merge Join (cost=2.13..2.19 rows=3 width=60) (actual time=0.062..0.078
rows=3 loops=1)
Merge Cond: ("outer".ku = "inner".ku)
-> Sort (cost=1.05..1.06 rows=3 width=32) (actual time=0.026..0.029
rows=3 loops=1)
Sort Key: mt1.ku
-> Seq Scan on test_1 mt1 (cost=0.00..1.03 rows=3 width=32)
(actual time=0.007..0.011 rows=3 loops=1)
-> Sort (cost=1.08..1.08 rows=3 width=28) (actual time=0.028..0.030
rows=3 loops=1)
Sort Key: mt2.ku
-> Seq Scan on test_2 mt2 (cost=0.00..1.05 rows=3 width=28)
(actual time=0.011..0.018 rows=3 loops=1)
Filter: ((kh = 1::numeric) AND (ku >= 1::numeric) AND (ku <=
100::numeric))

But, when i comment out some 'where' condition in subplan because it always
true (i think) - subplan show up and query work ok:
select *
from test_1 mt1,
test_2 mt2
where mt2.kh = 1 and
mt2.ku between 1 and 100 and
mt1.ku = mt2.ku and
mt1.ku = (select min(t1.ku)
from test_1 t1,test_2 t2
where /* t1.ku_1 = mt1.ku_1 and */
t2.kh = mt2.kh and
t2.d_s = mt2.d_s and
t1.ku = t2.ku )
QUERY PLAN

----------------------------------------------------------------------------
-------------------------------------------------
-------
Nested Loop (cost=0.00..8.74 rows=1 width=60) (actual time=0.125..0.248
rows=1 loops=1)
Join Filter: ("inner".ku = "outer".ku)
-> Seq Scan on test_2 mt2 (cost=0.00..7.67 rows=1 width=28) (actual
time=0.114..0.228 rows=1 loops=1)
Filter: ((kh = 1::numeric) AND (ku >= 1::numeric) AND (ku <=
100::numeric) AND ((subplan) = ku))
SubPlan
-> Aggregate (cost=2.19..2.20 rows=1 width=10) (actual
time=0.063..0.064 rows=1 loops=3)
-> Merge Join (cost=2.12..2.18 rows=3 width=10) (actual
time=0.039..0.054 rows=3 loops=3)
Merge Cond: ("outer".ku = "inner".ku)
-> Sort (cost=1.05..1.06 rows=3 width=10) (actual
time=0.009..0.011 rows=3 loops=3)
Sort Key: t1.ku
-> Seq Scan on test_1 t1 (cost=0.00..1.03
rows=3 width=10) (actual time=0.004..0.009 rows=3 loops=1)
-> Sort (cost=1.07..1.08 rows=3 width=10) (actual
time=0.024..0.026 rows=3 loops=3)
Sort Key: t2.ku
-> Seq Scan on test_2 t2 (cost=0.00..1.04
rows=3 width=10) (actual time=0.006..0.013 rows=3 loops=3)
Filter: ((kh = $0) AND (d_s = $1))
-> Seq Scan on test_1 mt1 (cost=0.00..1.03 rows=3 width=32) (actual
time=0.003..0.006 rows=3 loops=1)

i am not sure, is this my incompetence or may be problem in planer ?


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Sergey Burladyan <eshkinkot(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3494: may be Query Error: subplan does not executed
Date: 2007-07-27 13:58:20
Message-ID: 46A9F9FC.6090603@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Sergey Burladyan wrote:
> i have query with join of two table and 'where' filter it result by subplan
> which have references to join result, but this subplan not executed and
> result is incorrect. This subplan also not exist in explain analyze output.

I can reproduce this on 8.1 and 8.2 branch heads, but not on 8.3 HEAD.
I've simplified test case down to this:

CREATE TABLE test1 (id int);
CREATE TABLE test2 (id int);

INSERT INTO test1 VALUES (1);
INSERT INTO test2 VALUES (1);

INSERT INTO test2 VALUES (1);

SELECT * FROM test1, test2
WHERE test1.id = test2.id
AND test1.id = (SELECT 123 WHERE test1.id = 100 AND test2.id = 100);

The query incorrectly returns one row, instead of none, with this plan:

QUERY PLAN
-----------------------------------------------------------
Nested Loop (cost=0.00..2.03 rows=1 width=8)
Join Filter: (test1.id = test2.id)
-> Seq Scan on test1 (cost=0.00..1.01 rows=1 width=4)
-> Seq Scan on test2 (cost=0.00..1.01 rows=1 width=4)
(4 rows)

I don't have the time to dig deeper right now, but I hope this helps
someone else to get started...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Sergey Burladyan <eshkinkot(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3494: may be Query Error: subplan does not executed
Date: 2007-07-28 15:06:28
Message-ID: 11772.1185635188@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> Sergey Burladyan wrote:
>> i have query with join of two table and 'where' filter it result by subplan
>> which have references to join result, but this subplan not executed and
>> result is incorrect. This subplan also not exist in explain analyze output.

> I can reproduce this on 8.1 and 8.2 branch heads, but not on 8.3 HEAD.
> I've simplified test case down to this:

You don't even need the sub-select:

regression=# explain SELECT * FROM test1, test2
WHERE test1.id = test2.id
AND test1.id = test1.id*test2.id;
QUERY PLAN
---------------------------------------------------------------------
Merge Join (cost=299.56..653.73 rows=22898 width=8)
Merge Cond: (test1.id = test2.id)
-> Sort (cost=149.78..155.13 rows=2140 width=4)
Sort Key: test1.id
-> Seq Scan on test1 (cost=0.00..31.40 rows=2140 width=4)
-> Sort (cost=149.78..155.13 rows=2140 width=4)
Sort Key: test2.id
-> Seq Scan on test2 (cost=0.00..31.40 rows=2140 width=4)
(8 rows)

It appears that join_clause_is_redundant() is rejecting the clause as
redundant. I suppose some part of that machinery gets confused by the
fact that the RHS of the clause references both relations. The
EquivalenceClass rewrite cleaned this whole area up greatly, so no
surprise that the bug is gone in HEAD. No time to look at it more now.

regards, tom lane


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sergey Burladyan <eshkinkot(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3494: may be Query Error: subplan does not executed
Date: 2007-07-31 09:10:14
Message-ID: 46AEFC76.4050001@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:
> It appears that join_clause_is_redundant() is rejecting the clause as
> redundant. I suppose some part of that machinery gets confused by the
> fact that the RHS of the clause references both relations. The
> EquivalenceClass rewrite cleaned this whole area up greatly, so no
> surprise that the bug is gone in HEAD. No time to look at it more now.

Yeah, that's it.

I don't understand that code very well. Why does it think that the right
pathkeys of "test1.id = test2.id" and "test1.id = test1.id+test2.id" are
equal?

Checking for overlapping left and right relation sets in
join_clause_is_redundant seems to fix the problem, though I'm not sure
if there's some other cases that that doesn't catch.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
fix-3493.patch text/x-diff 2.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Sergey Burladyan <eshkinkot(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3494: may be Query Error: subplan does not executed
Date: 2007-07-31 14:17:21
Message-ID: 18557.1185891441@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> I don't understand that code very well. Why does it think that the right
> pathkeys of "test1.id = test2.id" and "test1.id = test1.id+test2.id" are
> equal?

They *will* be equal ... after the join (if correctly implemented :-().
The problem is the code is missing the possibility that one side or the
other can't have been checked yet because it involves variables from
both sides of the current join. The case it is thinking about is like

where a.x1 = b.y1 and a.x1 = b.z1

In this case the x1 = z1 condition need not be checked at the join
because the implied equality b.y1 = b.z1 will have been enforced at the
scan of b. In general the rule is that all conditions involving a set
of mutually equated expressions should be checked as early as possible,
which usually means that only one test is needed at a join point. But
there's no way to constrain test1.id+test2.id at any earlier point than
the join, so that clause isn't redundant.

I believe HEAD gets this right, because it's explicitly recognizing
"newly computable" members of an equivalence class:

* First, scan the EC to identify member values that are computable
* at the outer rel, at the inner rel, or at this relation but not in
* either input rel. The outer-rel members should already be enforced
* equal, likewise for the inner-rel members. We'll need to create
* clauses to enforce that any newly computable members are all equal
* to each other as well as to at least one input member, plus enforce
* at least one outer-rel member equal to at least one inner-rel member.

However the old ad-hoc code didn't have this much structure ...

> Checking for overlapping left and right relation sets in
> join_clause_is_redundant seems to fix the problem, though I'm not sure
> if there's some other cases that that doesn't catch.

No, that doesn't work (at least not in cases involving more than 2
rels). What we have to look for is left or right relids that overlap
both sides of the proposed join, which is a bit of a nuisance because
this subroutine isn't told exactly what the proposed join is. We'll
have to pass down the left/right join relids from a level or two higher.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Sergey Burladyan" <eshkinkot(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3494: may be Query Error: subplan does not executed
Date: 2007-07-31 19:55:31
Message-ID: 5846.1185911731@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Sergey Burladyan" <eshkinkot(at)gmail(dot)com> writes:
> Description: may be Query Error: subplan does not executed

I've applied a patch for this; it'll be in the next set of releases.

regards, tom lane