Rethinking top plan representation (or, Query vs. UPDATE RETURNING)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Rethinking top plan representation (or, Query vs. UPDATE RETURNING)
Date: 2006-08-10 23:50:11
Message-ID: 2861.1155253811@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been reviewing Jonah's INSERT/UPDATE RETURNING patch, and I've run
into a nasty problem with UPDATEs across inheritance trees: we really
need a separate instance of the RETURNING targetlist for each child
table. Consider

CREATE TABLE p (f1 int);
CREATE TABLE c (fc int) INHERITS (p);
ALTER TABLE p ADD COLUMN f2 int;

UPDATE p SET f2 = f2 + 1 RETURNING f1, f2;

In p, the RETURNING list refers to Vars (physical columns) 1 and 2,
but for c it's got to refer to columns 1 and 3, because f2 is not at
the same physical column number in the child (fc got there first).

The planner already contains sufficient mechanism to deal with this
translation (that's why it works now without RETURNING) ... the problem
is just how are we going to pass the translated RETURNING targetlists
over to the executor? The patch-as-submitted assumes it can use the
original (parent-relative) list that's in the Query node, but that
obviously doesn't work for the children.

One place we could put the converted RETURNING lists is into the plan
nodes that scan the individual child tables, but it looks to me like
this would require adding support for a second targetlist to *every*
plan node type, which seems like a lot of usually-useless overhead.
I agree with Jonah's original design decision that we want to restrict
the executor's knowledge of RETURNING to the top level in execMain.
So that seems to mean we need to have a list of RETURNING lists, one
per result relation, somewhere outside the plan tree; and now we have to
figure out exactly how the planner is going to pass this list over to
the executor.

What I'm going to do as a stopgap solution to get the patch committed
is to add an additional field to Query, similar to the resultRelations
list, but this seems fairly bletcherous as a long-term representation.

I'm thinking it might be time to adopt the "TopPlan" concept that Vadim
foresaw years ago (there are still a few comments about it in the source
code). That is, invent a specialized struct type that carries all the
info the planner needs to pass to the executor, and stop passing struct
Query to the executor at all. A quick pass through the sources says
that the fields needed would be

Plan tree root
rtable list
into, intoOptions, intoOnCommit, intoTableSpaceName
(probably should combine these into a new sub-node type)
resultRelation, resultRelations (possibly fold together?)
rowMarks
new list of RETURNING targetlists

I'm not sure I like the name TopPlan, because that seems to imply that
the node type "IsA" Plan node, which it isn't. Maybe PlannerResult?
Alternatively we could make the existing QueryDesc struct serve this
purpose, but QueryDesc carries a number of fields that are determined
at executor start time not plan time, so that doesn't seem quite right
either.

Aside from fixing the immediate ugliness with RETURNING, this would be
one more small step towards restructuring the planner to not modify its
input data structures, which is something I've had a bee in my bonnet
about for a long time. There are way too many places that currently
have to work around that scribble-on-the-input habit by brute-force
means like copying entire query trees. That won't get fixed for 8.2,
for sure, but putting a TopPlan node in place could get done.

Thoughts, comments, objections, better ideas?

regards, tom lane

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2006-08-11 00:01:55 Re: Coding style for emacs
Previous Message Alvaro Herrera 2006-08-10 23:20:33 Re: 8.2 features status