Inconsistant SQL results - Suspected error with query planing or query optimisation.

Lists: pgsql-bugspgsql-hackers
From: adam terrey <a(dot)terrey(at)mackillop(dot)acu(dot)edu(dot)au>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Inconsistant SQL results - Suspected error with query planing or query optimisation.
Date: 2007-05-22 08:45:34
Message-ID: OF0C2A0BDA.4907776C-ONCA2572E3.002AD50A-CA2572E3.00301D71@mackillop.acu.edu.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hay all :)

Firstly, just wanted to say how much I appreceate all the hard work that
has gorn into postgres. It really is a fine system.

Anyway, mybug: I have a test SELECT statement (Listing A - see "sql
listings.txt") wich produces different results under two simular setups
(Listing B and Listing C). Each setup should product the same result for
the given SELECT statement.

In the first setup (Listing B) the table "items" has 10,000 records id'ed
from 1 to 10,000. The field "number" for every record except two records
(id:500 and 600) has the value NULL., the two exceptions (id: 500 and 600)
have the value 1.
This first setup produces the correct results for the given SELECT
statement.

The SELECT statement is essentlually a dubble negitive - using two LEFT
JOINS.
- The most inner nested query selects WHERE number = 1
- Then next most inner selects everything but, and
- The most outer selects every thing but everything but WHERE number = 1

The second setup (Listing C) is identicle to the first execpt that the
table "items" has an extra field and a primary key index. The goal of this
setup is to produce a cirtian query plan that I beleive is broken, where
it seems that the "Nested Loop Left Join" has forced the filter for "WHERE
number = 1" outside or (perhaps after) a join one of the more nested joins
causeing that more nested join to cancel it self out.

Well.... that's what I make of it. I really hope you are able to
reproduce this, it took me ages to find a setup that would be reproduce
what I was observing in my developement system.

See the attached output.txt for the results and the EXPLAIN statement for
the SELECT statement under each setup.

Postgres:
"PostgreSQL 8.2.3 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)"

Hardware:
AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
2GB Mem

OS:
Kernel: Linux version 2.6.18-4-amd64 (Debian 2.6.18.dfsg.1-12)
OS: GNU/Linux Debian "Etch" - AMD64

Kind regards,
Adam Terrey

Data Systems - Marketing Strategy and Planning
University Relations Directorate
Australian Catholic University Limited (ABN 15 050 192 660)

42 Edward Street, North Sydney NSW 2060
PO Box 968 North Sydney NSW 2059
Telephone 02 9739 2517
Facsimile 02 9739 2960
Mobile 0421 430 824
Email: adam(dot)terrey(at)acu(dot)edu(dot)au
http://www.acu.edu.au

Australian Catholic University and the courses offered by the University
are
registered on the Commonwealth Register of Institutions and Courses for
Overseas Students (CRICOS). Provider registration codes: 00004G, 0012C,

Attachment Content-Type Size
sql listings.txt text/plain 5.4 KB
output.txt text/plain 1.8 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: adam terrey <a(dot)terrey(at)mackillop(dot)acu(dot)edu(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Inconsistant SQL results - Suspected error with query planing or query optimisation.
Date: 2007-05-22 15:12:26
Message-ID: 20070522151226.GC5193@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

adam terrey wrote:

> The second setup (Listing C) is identicle to the first execpt that the
> table "items" has an extra field and a primary key index. The goal of this
> setup is to produce a cirtian query plan that I beleive is broken, where
> it seems that the "Nested Loop Left Join" has forced the filter for "WHERE
> number = 1" outside or (perhaps after) a join one of the more nested joins
> causeing that more nested join to cancel it self out.

It's easy to confirm that the nested loop is the culprit here: if you
SET enable_nestloop to off, the query returns different results (the
expected two tuples).

8.1 seems to work OK, but both 8.2 and HEAD don't.

alvherre=# set enable_nestloop to off;
SET
alvherre=# SELECT items.id
FROM items
LEFT JOIN (
-- Query i.
SELECT items.id
FROM items
LEFT JOIN (
-- Query ii.
SELECT id FROM items WHERE number = 1
) AS moded_items USING (id)
WHERE moded_items.id IS NULL
) AS sub_items USING (id)
WHERE sub_items.id IS NULL;
id
-----
500
600
(2 rows)

alvherre=# set enable_nestloop to on;
SET
alvherre=# SELECT items.id
FROM items
LEFT JOIN (
-- Query i.
SELECT items.id
FROM items
LEFT JOIN (
-- Query ii.
SELECT id FROM items WHERE number = 1
) AS moded_items USING (id)
WHERE moded_items.id IS NULL
) AS sub_items USING (id)
WHERE sub_items.id IS NULL;
id
----
(0 rows)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: adam terrey <a(dot)terrey(at)mackillop(dot)acu(dot)edu(dot)au>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Inconsistant SQL results - Suspected error with query planing or query optimisation.
Date: 2007-05-22 15:53:06
Message-ID: 13885.1179849186@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

adam terrey <a(dot)terrey(at)mackillop(dot)acu(dot)edu(dot)au> writes at
http://archives.postgresql.org/pgsql-bugs/2007-05/msg00187.php

> Anyway, mybug: I have a test SELECT statement (Listing A - see "sql
> listings.txt") wich produces different results under two simular setups
> (Listing B and Listing C). Each setup should product the same result for
> the given SELECT statement.

The problem here is that 8.2 is incorrectly concluding that it can
rearrange the order of the two LEFT JOIN steps in the query:

SELECT a.id
FROM items a
LEFT JOIN (
SELECT b.id
FROM items b
LEFT JOIN (
SELECT c.id FROM items c WHERE number = 1
) AS moded_items USING (id)
WHERE moded_items.id IS NULL
) AS sub_items USING (id)
WHERE sub_items.id IS NULL;

The plan it comes up with is:

Nested Loop Left Join (cost=469.00..1063.39 rows=1 width=4) (actual time=288.962..288.962 rows=0 loops=1)
Filter: (c.id IS NULL)
-> Hash Left Join (cost=469.00..1063.00 rows=1 width=8) (actual time=288.946..288.946 rows=0 loops=1)
Hash Cond: (a.id = b.id)
Filter: (b.id IS NULL)
-> Seq Scan on items a (cost=0.00..344.00 rows=10000 width=4) (actual time=0.080..50.973 rows=10000 loops=1)
-> Hash (cost=344.00..344.00 rows=10000 width=4) (actual time=140.880..140.880 rows=10000 loops=1)
-> Seq Scan on items b (cost=0.00..344.00 rows=10000 width=4) (actual time=0.046..69.395 rows=10000 loops=1)
-> Index Scan using items_pkey on items c (cost=0.00..0.38 rows=1 width=4) (never executed)
Index Cond: (b.id = c.id)
Filter: (c.number = 1)

After reducing join_collapse_limit to 1, we get the right join order and
the right answers:

Hash Left Join (cost=750.54..1132.05 rows=1 width=4) (actual time=409.712..409.740 rows=2 loops=1)
Hash Cond: (a.id = b.id)
Filter: (b.id IS NULL)
-> Seq Scan on items a (cost=0.00..344.00 rows=10000 width=4) (actual time=0.100..51.052 rows=10000 loops=1)
-> Hash (cost=750.52..750.52 rows=1 width=4) (actual time=264.978..264.978 rows=9998 loops=1)
-> Hash Left Join (cost=369.01..750.52 rows=1 width=4) (actual time=30.074..192.023 rows=9998 loops=1)
Hash Cond: (b.id = c.id)
Filter: (c.id IS NULL)
-> Seq Scan on items b (cost=0.00..344.00 rows=10000 width=4) (actual time=0.030..50.913 rows=10000 loops=1)
-> Hash (cost=369.00..369.00 rows=1 width=4) (actual time=29.976..29.976 rows=2 loops=1)
-> Seq Scan on items c (cost=0.00..369.00 rows=1 width=4) (actual time=29.896..29.916 rows=2 loops=1)
Filter: (number = 1)

So there is something wrong with the rule used for deciding whether two
LEFT JOINs can commute. Per the planner README:

: The planner's treatment of outer join reordering is based on the following
: identities:
:
: 1. (A leftjoin B on (Pab)) innerjoin C on (Pac)
: = (A innerjoin C on (Pac)) leftjoin B on (Pab)
:
: where Pac is a predicate referencing A and C, etc (in this case, clearly
: Pac cannot reference B, or the transformation is nonsensical).
:
: 2. (A leftjoin B on (Pab)) leftjoin C on (Pac)
: = (A leftjoin C on (Pac)) leftjoin B on (Pab)
:
: 3. (A leftjoin B on (Pab)) leftjoin C on (Pbc)
: = A leftjoin (B leftjoin C on (Pbc)) on (Pab)
:
: Identity 3 only holds if predicate Pbc must fail for all-null B rows
: (that is, Pbc is strict for at least one column of B). If Pbc is not
: strict, the first form might produce some rows with nonnull C columns
: where the second form would make those entries null.

What we have here is an invocation of rule 3 in a situation where it's
not appropriate. The difficulty is that the code is only paying
attention to the syntactical JOIN/ON clauses and has neglected the
intermediate-level WHERE clause.

After a bit of reflection it seems that a WHERE that is semantically
just below a left-join's right side can be treated as if it were part of
that left-join's ON clause. It will have the same effect as if it had
been written there: any rows rejected by the WHERE will fail to be
joined to the left side and will contribute nothing to the result.
Had we been following this rule, we'd have concluded that c.id IS NULL
is part of the upper join qual, and therefore that it has a predicate
Pabc not just Pab and cannot be commuted with the lower join.

Teaching initsplan.c to do things this way seems possible but less than
trivial. Before I start worrying about that, does anyone see any flaws
in the reasoning at this level of detail?

regards, tom lane