Bogus rescan logic in ExecReScanCteScan

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Adam Mackler <AdamMackler(at)gmail(dot)com>
Subject: Bogus rescan logic in ExecReScanCteScan
Date: 2012-08-15 20:02:04
Message-ID: 13458.1345060924@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I looked into the misbehavior reported by Adam Mackler in
http://archives.postgresql.org/pgsql-bugs/2012-08/msg00073.php
What it seems to boil down to is a design error in ExecReScanCteScan.
Given the query

WITH RECURSIVE
tab(id_key,link) AS ( VALUES (1,17), (2,17), (3,17), (4,17), (6,17), (5,17) ),
iter (id_key, row_type, link) AS (
SELECT 0, 'base', 17
UNION(
WITH remaining(id_key, row_type, link, min) AS (
SELECT tab.id_key, 'true'::text, iter.link, MIN(tab.id_key) OVER ()
FROM tab INNER JOIN iter USING (link)
WHERE tab.id_key > iter.id_key
),
first_remaining AS (
SELECT id_key, row_type, link
FROM remaining
WHERE id_key=min
),
effect AS (
SELECT tab.id_key, 'new'::text, tab.link
FROM first_remaining e INNER JOIN tab ON e.id_key=tab.id_key
/* Try changing this WHERE clause to other false expressions */
WHERE e.row_type='false'
)
SELECT * FROM first_remaining
/* Try uncommenting the next line */
UNION SELECT * FROM effect
)
)
SELECT * FROM iter;

we get a plan like this:

CTE Scan on iter (cost=9.06..9.48 rows=21 width=40)
CTE tab
-> Values Scan on "*VALUES*" (cost=0.00..0.08 rows=6 width=8)
CTE iter
-> Recursive Union (cost=0.00..8.98 rows=21 width=40)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Unique (cost=0.84..0.86 rows=2 width=40)
CTE remaining
-> WindowAgg (cost=0.20..0.53 rows=2 width=8)
-> Hash Join (cost=0.20..0.51 rows=2 width=8)
Hash Cond: (iter.link = tab.link)
Join Filter: (tab.id_key > iter.id_key)
-> WorkTable Scan on iter (cost=0.00..0.20 rows=10 width=8)
-> Hash (cost=0.12..0.12 rows=6 width=8)
-> CTE Scan on tab (cost=0.00..0.12 rows=6 width=8)
CTE first_remaining
-> CTE Scan on remaining (cost=0.00..0.04 rows=1 width=44)
Filter: (id_key = min)
CTE effect
-> Hash Join (cost=0.04..0.19 rows=1 width=8)
Hash Cond: (tab.id_key = e.id_key)
-> CTE Scan on tab (cost=0.00..0.12 rows=6 width=8)
-> Hash (cost=0.02..0.02 rows=1 width=4)
-----> -> CTE Scan on first_remaining e (cost=0.00..0.02 rows=1 width=4)
Filter: (row_type = 'false'::text)
-> Sort (cost=0.07..0.08 rows=2 width=40)
Sort Key: first_remaining.id_key, first_remaining.row_type, first_remaining.link
-> Append (cost=0.00..0.06 rows=2 width=40)
-----> -> CTE Scan on first_remaining (cost=0.00..0.02 rows=1 width=40)
-> CTE Scan on effect (cost=0.00..0.02 rows=1 width=40)

where there are two CTEScan nodes (marked for effect) on the output of
the "first_remaining" CTE. Now the first of these (the one inside the
"effect" CTE) is initialized first and thus becomes the "leader" of the
group of CTEScans reading this CTE. However, because node ReScans
happen basically in execution order, the one under the Append is the one
that actually gets a rescan call first; in fact, that one will get run
to completion before "effect" ever gets rescanned. So what happens
while re-executing the right-hand side of the RecursiveUnion is that
the second scan of first_remaining just regurgitates what
first_remaining already output on the previous cycle, because its
rescan only rewound its own tuple pointer and didn't change the
tuplestore contents. That leads to totally bogus results of course.

Another way to say this is that ExecReScanCteScan only works right if
the CTE nodes of a group are ReScanned in the same order they were
initialized in, which might be true some of the time but is not to be
relied on. It's a bit surprising that we've not had any reports about
this in the nearly 4 years that code has been in there.

After reflecting on this for awhile I think that ExecReScanCteScan
is just overly complicated, and there's no reason not to let the first
arrival reset the tuplestore. The attached patch seems to fix Adam's
problem (at least, variants of his query that seem like they should give
the same answer do), and it passes our regression tests.

Comments?

regards, tom lane

Attachment Content-Type Size
cte-rescan.patch text/x-patch 1.7 KB

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2012-08-15 20:10:43 Re: CREATE SCHEMA IF NOT EXISTS
Previous Message Fabrízio de Royes Mello 2012-08-15 18:31:27 Re: CREATE SCHEMA IF NOT EXISTS