Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Date: 2014-10-24 01:43:15
Message-ID: CAM3SWZRSH2DqsbPxW0ch7rQbaKQW=OS1z6i=gzWmBKZ=Lk-r+w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

V 1.3 is attached. Like Simon, I think that it's premature to commit
to one particular value locking implementation. Personally, I think
that approach #2 is what we'll end up using, but it makes no sense to
not maintain both at once, since it requires relatively little effort
to do so. At the very least it's a useful tool for reviewers, who
would otherwise be denied the opportunity to test whether any given
concurrency problem was attributable to the value locking
implementation, or something else. So there are two variants of V 1.3
attached - one uses an unchanged value locking implementation #1,
while the other an unchanged implementation #2.

Highlights
========

* No more "WITHIN unique_index_name". There is a new syntax that
supersedes it. Unique index inference based on columns (or
expressions) is *mandatory* for the ON CONFLICT UPDATE variant. It
remains optional for the IGNORE variant, because I think someone could
very reasonably not care which unique index was implicated. As
discussed, this implies that partial unique indexes are no longer
supported (but expression indexes work just fine). This may be
revisited, but I suggest doing so in a later release. Example of
merging on the unique index on column "name":

postgres=# explain insert into capitals(name, population, altitude)
values ('Riga', 1000, 5) on conflict (name) update set altitude =
excluded(altitude);
QUERY PLAN
----------------------------------------------------------------------
Insert on capitals (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Conflict Update on capitals (cost=0.00..1.01 rows=1 width=52)
(3 rows)

When we cannot infer a unique index, it looks like this:

postgres=# explain insert into capitals(name, population, altitude)
values ('Riga', 1000, 5) on conflict (population) update set altitude
= excluded(altitude);
ERROR: 42P10: could not infer which unique index to use from
expressions/columns provided for ON CONFLICT
LINE 1: ...e, population, altitude) values ('Riga', 1000, 5) on conflic...
^
HINT: Partial unique indexes are not supported.
LOCATION: transformConflictClause, parse_clause.c:2407

* No more planner kludges. Index paths are never created for the
auxiliary UPDATE plan to begin with. To be tidy, TID scan paths are
never added either. Unless the UPDATE will never proceed due to a
tautological predicate like "WHERE false", we're guaranteed to have a
"sequential scan" in a fairly principled way (as before, we just use
this with EvalPlanQual(); it's just an implementation detail). I am
not entirely qualified to say so, but I think that this makes my
modifications to the optimizer look quite reasonable. The kludge of
enforcing various restrictions (e.g. on subqueries appearing in the
UPDATE) in the optimizer is also removed. We prefer to enforce
everything during parse analysis. This also gets us better error
messages, which is nice.

* Sane (although limited) support for table inheritance and updatable
views. However, in general user-defined rules are unsupported - I
cannot see how that could be made sane. The IGNORE variant works for
updatable views, and for inheritance relations with children (provided
that there is no inference required, which effectively makes the
UPDATE variant unsupported). However, both IGNORE and UPDATE variants
work for relations that happen to be in an inheritance hierarchy,
provided they have no children. I think it's fine to only support the
IGNORE variant for relations with inheritance children, because even
when users have the "partitioning pattern" use-case, there is no
principled way of telling the difference between a vanilla INSERT, and
an INSERT with an ON CONFLICT UPDATE clause from within the custom
redirection trigger function. Also, unique indexes already only work
at the relation level with inheritance - that's a long-standing
limitation. And so, in general INSERTs better have the right
inheritance child as their target - this is no more and no less true
when there is an ON CONFLICT UPDATE clause (note that I'm talking
about the "object orientated" inheritance use-case here, and not the
"partitioning pattern" use-case - with the latter, it's all up to the
trigger function to do the right thing). There may currently be an
"UPDATE ONLY", but there is no "INSERT ONLY" - why should I add one?

* Both INSERT and UPDATE sets of statement level triggers fire for
INSERT with ON CONFLICT UPDATE. The number of rows affected doesn't
matter, nor does it matter how they were or were not affected. This
certainly seems like the correct behavior. Per-row triggers work the
same as before, since far more thought went into their behavior
earlier. This incorporates feedback from Kevin.

* CONFLICTING() is renamed to EXCLUDED(). I know that some people
wanted me to go a different way with this. I think that there are very
good practical reasons not to [1], as well as good reasons related to
design, but I still accept that CONFLICTING() isn't very descriptive.
This spelling seems a lot better.

Clean-up
=======

If you take a look at the EXPLAIN output above, you'll see that the
sequential scan node does not appear. Basically, I'm back to
suppressing the implementation detail of that never-executed
"sequential scan", but the new approach is far better than earlier
approaches. Certain things actually associated with the sequential
scan are now attributed to the parent (auxiliary) ModifyTable UPDATE
node. Example (incidentally, note that "key" is used to infer a unique
index to take as an arbiter index):

postgres=# explain insert into upsert values (1000, 'Plucky') on
conflict (key) update set val = excluded(val) where key = 5;
QUERY PLAN
---------------------------------------------------------------------
Insert on upsert (cost=0.00..0.01 rows=1 width=0)
-> Result (cost=0.00..0.01 rows=1 width=0)
-> Conflict Update on upsert (cost=0.00..25.38 rows=6 width=36)
Filter: (key = 5)
(4 rows)

This seems reasonable to me. Including the "(never executed)"
sequential scan would be very confusing to users.

Inference
=======

Unique index inference (i.e. the way we figure out *which* unique
index to use) occurs during parse analysis. I think it would be
inappropriate, and certainly inconvenient to do it during planning. I
maintain that ON CONFLICT DML statements have a legitimate semantic
dependence on particular unique indexes, which makes this appropriate.
I don't know about others, but my only problem with naming unique
indexes directly was that it is unmaintainable, and very ugly. In
principle, I think this semantic dependence is quite reasonable, and
reflects the reality of how we all expect this to work. There are
comments on the implications for plan caching and how that relates to
where and when we perform unique index inference.

Documentation
===========

The documentation has been updated, incorporating feedback. I also
made the cardinality violation error a lot clearer than before, since
Craig said that was unclear.

Tests
====

Many tests were added. I've added a couple of new isolation tests.
insert-conflict-update-3 should be of particular interest. That test
illustrates the visibility issues with the WHERE clause that I've
already highlighted as a possible concern [2]. Unique index inference
tests will also give you a fair idea of how flexible it is.

Remaining open items
=================

Apart from the obvious issue of value locking (i.e. verifying the
correctness of its implementation in general), the only open items
are:

* RLS needs to be considered. I have yet to give it any real thought.

* I could probably do better at postgres_fdw support. That seems like
something that could be followed up on later, because it's clearly
just about a Simple Matter of Programming.

In summary, I was able to remove a lot of TODO/FIXME items here -
almost all of them. I'm pretty happy about that. I'll have to edit the
UPSERT wiki page, to strike out many open items...

[1] http://www.postgresql.org/message-id/CAM3SWZQhiXQi1osT14V7spjQrUpmcnRtbXJe846-EB1bC+9i1g@mail.gmail.com
[2] https://wiki.postgresql.org/wiki/UPSERT#Visibility_issues_and_the_proposed_syntax_.28WHERE_clause.2Fpredicate_stuff.29
--
Peter Geoghegan

Attachment Content-Type Size
v1.3.vallock1.tar.gz application/x-gzip 83.5 KB
v1.3.vallock2.tar.gz application/x-gzip 70.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2014-10-24 02:04:08 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Previous Message Alvaro Herrera 2014-10-23 23:55:46 Re: [PATCH] add ssl_protocols configuration option