Re: Support UPDATE table SET(*)=...

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Marko Tiikkaja <marko(at)joh(dot)to>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support UPDATE table SET(*)=...
Date: 2015-04-08 15:44:17
Message-ID: 87vbh6oe1h.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> One example that comes up occasionally (and that I've had to do
>> myself more than once) is this: given a table "foo" and another with
>> identical schema "reference_foo", apply appropriate inserts, updates
>> and deletes to table "foo" to make the content of the two tables
>> identical. This can be done these days with wCTEs:

>> with
>> t_diff as (select o.id as o_id, n.id as n_id, o, n
>> from foo o full outer join reference_foo n on (o.id=n.id)
>> where (o.*) is distinct from (n.*)),
>> ins as (insert into foo select (n).* from t_diff where o_id is null),
>> del as (delete from foo
>> where id in (select o_id from t_diff where n_id is null)),
>> upd as (update foo
>> set (col1,col2,...) = ((n).col1,(n).col2,...) -- XXX
>> from t_diff
>> where foo.id = n_id and o_id = n_id)
>> select count(*) filter (where o_id is null) as num_ins,
>> count(*) filter (where o_id = n_id) as num_upd,
>> count(*) filter (where n_id is null) as num_del
>> from t_diff;

Tom> While I agree that the UPDATE part of that desperately needs
Tom> improvement, I don't agree that the INSERT part is entirely fine.
Tom> You're still relying on a parse-time expansion of the (n).*
Tom> notation, which is inefficient

Not in my experience a huge deal given what else is going on...

Tom> and not at all robust against schema changes (the same problem as
Tom> with the patch's approach to UPDATE).

Now this I think is wrong; I think it's just as robust against schema
changes as using the composite value directly would be. Consider the
case where foo and reference_foo match with the exception of dropped
columns; the code as it is should just work, while a variant that used
the composite values would have to explicitly deal with that.

(When I've used this kind of operation in practice, reference_foo has
just been created using CREATE TEMP TABLE reference_foo (LIKE foo); and
then populated via COPY from an external data source. Even if
reference_foo were a non-temp table, the logic of the situation requires
it to have the same schema as foo as far as SQL statements are
concerned.)

Tom> So if we're taking this as a motivating example, I'd want to see a
Tom> fix that allows both INSERT and UPDATE directly from a composite
Tom> value of proper rowtype, without any expansion to individual
Tom> columns at all.

I would argue that this is a case of the perfect being the enemy of the
good.

Tom> Perhaps we could adopt some syntax like
Tom> INSERT INTO table (*) values-or-select
Tom> to represent the case that the values-or-select delivers a single
Tom> composite column of the appropriate type.

We could, but I think in all practical cases it'll be nothing more than
a small performance optimization rather than something that really
benefits people in terms of enhanced functionality.

>> Other examples arise from things one might want to do in plpgsql; for
>> example to update a record from an hstore or json value, one can use
>> [json_]populate_record to construct a record variable, but then it's
>> back to naming all the columns in order to actually perform the update
>> statement.

Tom> Sure, but the patch as given didn't work very well for that
Tom> either,

Partly that's a limitation resulting from how much can be done with the
existing SET (...) = syntax and implementation without ripping it all
out and starting over. An incremental improvement seemed to be a more
readily achievable goal.

In practice it would indeed probably look like:

declare
new_id integer;
new_values hstore;
begin
/* do stuff */
update foo
set (*) = (select * from populate_record(foo, new_values))
where foo.id = new_id;

A agree that it would be nicer to do

update foo
set (*) = populate_record(foo, new_values)
where foo.id = new_id;

but that would be a substantially larger project. The alternative of

set * = populate_record(foo, new_values)

is less satisfactory since it introduces inconsistencies with the case
where you _do_ want to specify explicit columns, unless you also allow

set (a,b) = row_value

which is required by the spec for T641 but which we don't currently
have.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Donald Stufft 2015-04-08 15:48:11 Gracefully Reload SSL Certificates
Previous Message Alvaro Herrera 2015-04-08 15:34:13 Re: rare avl shutdown slowness (related to signal handling)