From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
Cc: | andrew(at)tao11(dot)riddles(dot)org(dot)uk, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: WITH RECUSIVE patches 0723 |
Date: | 2008-07-28 14:56:31 |
Message-ID: | 19882.1217256991@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers pgsql-patches |
I spent some time reading the SQL spec over the weekend, and I believe
I've identified a fairly serious problem in the WITH patch. SQL99
7.12 <query expression> General Rule 1 is
1) If a non-recursive <with clause> is specified, then:
a) For every <with list element> WLE, let WQN be the <query
name> immediately contained in WLE. Let WQE be the <query
expression> immediately contained in WLE. Let WLT be the
table resulting from evaluation of WQE, with each column name
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
replaced by the corresponding element of the <with column
list>, if any, immediately contained in WLE.
b) Every <table reference> contained in <query expression> that
specifies WQN identifies WLT.
I think what this is saying is that the subquery defined by a WITH
clause is to be evaluated only once, even if it is referenced in
multiple places in the upper query. This is sensible because if there
is no such rule, then there really is no semantic difference between
non-recursive WITH and ordinary subqueries; and the SQL committee is not
known for inventing duplicate syntax. It is a useful property for users
because (1) it lets them prevent duplicate evaluations of an expensive
subquery, and (2) it lets them prevent multiple evaluations of volatile
functions in a subquery. (Right now we tell people to use OFFSET 0 as
an optimization fence, but that's an unportable hack, and it doesn't
cover all cases anyway.) Another thing in the back of my head is that
having these semantics could enable using INSERT ... RETURNING etc
as WITH subexpressions, whereas we can't really allow them as arbitrary
subqueries because of the lack of guarantees about one-time execution.
That's something for later, though.
I think this is a "must fix" because of the point about volatile
functions --- changing it later will result in user-visible semantics
changes, so we have to get it right the first time.
This isn't going to be a particularly simple fix :-(. The basic
implementation clearly ought to be to dump the result of the subquery
into a tuplestore and then have the upper level read out from that.
However, we don't have any infrastructure for having multiple
upper-level RTEs reference the same tuplestore. (Perhaps the InitPlan
infrastructure could be enhanced in that direction, but it's not ready
for non-scalar outputs today.) Also, I think we'd have to teach
tuplestore how to support multiple readout cursors. For example,
consider
WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ...
If the planner chooses to do the join as a nested loop then each
Scan node needs to keep track of its own place in the tuplestore,
concurrently with the other node having a different place.
regards, tom lane
From | Date | Subject | |
---|---|---|---|
Next Message | Asko Oja | 2008-07-28 15:11:38 | Re: Do we really want to migrate plproxy and citext into PG core distribution? |
Previous Message | Tom Lane | 2008-07-28 14:16:21 | Re: Review: DTrace probes (merged version) ver_03 |
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Gierth | 2008-07-28 15:20:13 | Re: WITH RECUSIVE patches 0723 |
Previous Message | Andrew Gierth | 2008-07-28 14:06:31 | Re: WITH RECUSIVE patches 0723 |