Re: WIP patch for multiple column assignment in UPDATE

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP patch for multiple column assignment in UPDATE
Date: 2014-06-19 07:02:34
Message-ID: CAFj8pRBJko7YNNLNzk2MCQ50XJJzNf+1ODcgno4FuRRts9KOyg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello

I did some tests and It looks so it allows only some form of nested loop.

postgres=# explain (analyze, timing off, buffers) update a1 set b = (select
b from a2 where a1.a = a2.a);
QUERY
PLAN
---------------------------------------------------------------------------------------------------------------
Update on a1 (cost=0.00..8456925.00 rows=1000000 width=10) (actual rows=0
loops=1)
Buffers: shared hit=9017134 read=14376 dirtied=58170 written=1014
-> Seq Scan on a1 (cost=0.00..8456925.00 rows=1000000 width=10)
(actual rows=1000000 loops=1)
Buffers: shared hit=4005465 read=4424 written=971
SubPlan 1
-> Index Scan using a2_a_idx on a2 (cost=0.42..8.44 rows=1
width=4) (actual rows=1 loops=1000000)
Index Cond: (a1.a = a)
Buffers: shared hit=4005464
Planning time: 0.212 ms
Execution time: 30114.101 ms
(10 rows)

do you plan some sophisticated mechanism - like MERGE or some similar?

Regards

Pavel

2014-06-16 17:17 GMT+02:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Attached is a very-much-WIP patch for supporting
> UPDATE foo SET ..., (a,b,c) = (select x,y,z from ...), ...
>
> It lacks documentation, ruleutils.c support, or any solution for the
> rule NEW.* expansion problem I mentioned Saturday. The reason I'm
> posting it now is to get feedback about an implementation choice that
> feels a bit klugy to me; but I don't see any clearly better idea.
>
> The post-parse-analysis representation I've chosen here is that the
> output columns of the sub-select are represented by PARAM_MULTIEXEC
> Params, and the sub-select itself appears in a resjunk entry at the end
> of the targetlist; that is, the UPDATE tlist is more or less like
>
> $1, -- to be assigned to a
> $2, -- to be assigned to b
> $3, -- to be assigned to c
> (select x,y,z from ...), -- resjunk entry, value will be discarded
>
> If the sub-select is uncorrelated with the outer query, the planner
> turns it into an InitPlan, replacing the resjunk tlist item with a
> NULL constant, and then everything happens normally at execution.
>
> But more usually, the sub-select will be correlated with the outer
> query. In this case, the subquery turns into a SubPlan tree that
> stays in the resjunk item. At the start of execution, the ParamExecData
> structs for each of its output Params are marked with execPlan pointers
> to the subplan, just as would happen for an InitPlan. This causes the
> subplan to get executed when the first of the output Params is evaluated;
> it loads the ParamExecData structs for all its output Params, and then
> the later Params just take data from the structs. When execution reaches
> the MULTIEXEC SubPlan in the resjunk tlist item, no evaluation of the
> subplan is needed; but instead we re-mark all the output ParamExecData
> structs with non-null execPlan pointers, so that a fresh execution of
> the subplan will happen in the next evaluation of the targetlist.
>
> The klugy aspect of this is that it assumes that the SubPlan item will
> appear later in the tlist than all the Params referencing it. This is
> true at the moment because resjunk tlist items always appear after
> non-resjunk ones. There are a few other places that already depend on
> this ordering, but we've mostly tried to avoid introducing new
> dependencies on it.
>
> The alternative that I'd originally had in mind, before put-it-in-a-
> resjunk-item occurred to me, was to invent a new "secondary tlist"
> field of Query and of ModifyTable plan nodes, as I sketched back in
> http://www.postgresql.org/message-id/1783.1399054541@sss.pgh.pa.us
> We'd put the MULTIEXEC SubPlans in this secondary tlist and expect
> the executor to evaluate it just before evaluating the main tlist.
> However, that way isn't terribly pretty either, because it extends
> knowledge of this feature to a *whole lot* of places that don't have
> to know about it in the attached approach; in particular, just about
> every place that manipulates targetlist contents would have to also
> manipulate the secondary tlist.
>
> Another approach is to explicitly identify which of the Params will
> be evaluated first and replace it with a node tree that evaluates
> the subplan (and sets output Params for the remaining columns).
> This is a bit messy because the first-to-be-evaluated is dependent
> on the targetlist reordering that the planner does; so we don't want
> parse analysis to try to do it. (If we allowed parse analysis to
> know that the planner will sort the tlist items into physical column
> order, we could do it like that; but then it would break if we ever
> get around to allowing ALTER TABLE to change the physical order.)
> We could safely have setrefs.c determine the first-to-be-evaluated
> Param, though, since it will traverse the tlist in final order.
> So if we went with this approach I'd have setrefs.c replace the first
> Param with a SubPlan node. That seems a bit of a kluge too though.
>
> Preferences, comments, better ideas?
>
> regards, tom lane
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2014-06-19 07:29:35 Re: Removing dependency to wsock32.lib when compiling code on WIndows
Previous Message Amit Langote 2014-06-19 05:09:00 Re: GIN pending list pages not recycled promptly (was Re: GIN improvements part 1: additional information)