Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Date: 2014-09-24 02:15:57
Message-ID: CAM3SWZRvkCKc=1Y6_Wn8mk97_Vi8+j-aX-RY-=msrJVU-Ec-qw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Aug 27, 2014 at 7:43 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> Omission
> =======
>
> The patch currently lacks a way of referencing datums rejected for
> insertion when updating.

Attached revision of the patch set (which I'll call v1.2) adds this
capability in a separate commit. It now becomes possible to add a
CONFLICTING expression within the ON CONFLICT UPDATE targetlist or
predicate. Example use:

"""
postgres=# CREATE TABLE upsert(key int4 PRIMARY KEY, val text);
CREATE TABLE
postgres=# INSERT INTO upsert VALUES(1, 'Giraffe');
INSERT 0 1
postgres=# SELECT * FROM upsert;
key | val
-----+---------
1 | Giraffe
(1 row)

postgres=# INSERT INTO upsert VALUES(1, 'Bear'), (2, 'Lion') ON
CONFLICT UPDATE SET val = CONFLICTING(val);
INSERT 0 1
postgres=# SELECT * FROM upsert;
key | val
-----+------
1 | Bear
2 | Lion
(2 rows)

"""

Note that the effects of BEFORE INSERT triggers are carried here,
which I slightly favor over the alternative of not having it work that
way.

I've also expanded upon my explanation for the structure of the query
tree and plan within (revised/rebased versions of) earlier commits. I
am clearer on why there is a special subquery planning step for the
auxiliary UPDATE, rather than making the UPDATE directly accessible as
a subquery within the post-parse-analysis query tree. Basically, the
optimizer has no basis for understanding that a DML sublink isn't
optimizable. It'll try to pull-up the subquery and so on, which of
course does not and cannot work. Whereas treating it as an
independently planned subquery of the top-level query, kind of like a
data-modifying CTE makes sense (with such CTEs, the executor is
prepared for the possibility that not all rows will be pulled up - so
there too, the executor drives execution more directly than makes
sense when not dealing with DML: it finishes off the data-modifying
CTE's DML for any still-unconsumed tuples, within
ExecPostprocessPlan()).

It's certainly possible that a more unified representation makes sense
(i.e. one ModifyTable plan, likely still having seperate INSERT/UPDATE
representations at earlier stages of query processing), but that would
require serious refactoring of the representation of ModifyTable
operations -- just for example, consider the need for a
unified-though-separate targetlist, one for the INSERT part, the other
for the UPDATE part. For now, I continue to find it very convenient to
represent the UPDATE as a selectively executed, auxiliary, distinct
ModifyTable plan, rather than adding a subquery rangetable directly
during parse analysis.

There is another significant change. In this revision, I am at least
"honest" about the plan representation within EXPLAIN:

"""
postgres=# EXPLAIN ANALYZE INSERT INTO upsert VALUES(1, 'Bear'), (2,
'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Insert on upsert (cost=0.00..0.03 rows=2 width=36) (actual
time=0.115..0.115 rows=0 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=36)
(actual time=0.003..0.005 rows=2 loops=1)
-> Conflict Update on upsert (cost=0.00..22.30 rows=1230
width=36) (actual time=0.042..0.051 rows=0 loops=1)
-> Seq Scan on upsert (cost=0.00..22.30 rows=1230 width=36)
(never executed)
Planning time: 0.065 ms
Execution time: 0.158 ms
(6 rows)

postgres=# EXPLAIN ANALYZE INSERT INTO upsert VALUES(1, 'Bear'), (2,
'Lion') ON CONFLICT UPDATE SET val = CONFLICTING(val) where key = 2;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Insert on upsert (cost=0.00..0.03 rows=2 width=36) (actual
time=0.075..0.075 rows=0 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=36)
(actual time=0.001..0.002 rows=2 loops=1)
-> Conflict Update on upsert (cost=4.16..8.17 rows=1 width=36)
(actual time=0.012..0.026 rows=0 loops=1)
-> Bitmap Heap Scan on upsert (cost=4.16..8.17 rows=1
width=36) (never executed)
Recheck Cond: (key = 2)
-> Bitmap Index Scan on upsert_pkey (cost=0.00..4.16
rows=1 width=0) (never executed)
Index Cond: (key = 2)
Planning time: 0.090 ms
Execution time: 0.125 ms
(9 rows)

"""

The second query gets a bitmap scan because plain index scans have
been disabled for the UPDATE (a temporary kludge), since index-only
scans can break things - IndexOnlyRecheck() throws an error. Not quite
sure why the optimizer doesn't care about resjunk for the UPDATE,
which is presumably why in general regular updates never use
index-only scans. Since I think the actual auxiliary plan generation
needs work, so as to not have uselessly complicated plans, I didn't
try too hard to figure that out. This plan structure is not
acceptable, of course, but maybe almost the same thing would be
acceptable if the auxiliary plan shown here wasn't unnecessarily
complex - if we forced a simple pseudo-scan placeholder, without
wasting optimizer cycles, somewhat in the style of WHERE CURRENT OF.
This is something discussed in newly expanded comments within
planner.c. I would have made the optimizer produce a suitably simple
plan myself, but I don't have a good enough understanding of it to
figure out how (at least in a reasonable amount of time). Pointers on
how this might be accomplished are very welcome.

With this addition, the feature is functionally complete. That just
leaves the small matter of how it has been implemented. :-)

This is still clearly a work in progress implementation, with design
trade-offs that are very much in need of fairly high level discussion.
--
Peter Geoghegan

Attachment Content-Type Size
0001-Make-UPDATE-privileges-distinct-from-INSERT-privileg.patch.gz application/x-gzip 4.8 KB
0005-Internal-documentation-for-INSERT-.-ON-CONFLICT-UPDA.patch.gz application/x-gzip 6.9 KB
0004-Tests-for-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patch.gz application/x-gzip 7.0 KB
0003-CONFLICTING-expressions-within-ON-CONFLICT-UPDATE.patch.gz application/x-gzip 8.2 KB
0002-Support-INSERT-.-ON-CONFLICT-UPDATE-IGNORE.patch.gz application/x-gzip 41.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2014-09-24 02:22:15 Re: LIMIT for UPDATE and DELETE
Previous Message Gregory Smith 2014-09-24 02:11:58 Re: proposal: rounding up time value less than its unit.