Re: Making joins involving ctid work for the benefit of UPSERT

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Making joins involving ctid work for the benefit of UPSERT
Date: 2014-07-24 01:44:10
Message-ID: CAM3SWZSa62EjAFwk0R7GLf1k=p2JSwkSatdazp5+Aqbdy1cJ1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 23, 2014 at 9:58 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I'd suggest something like:
>
> UPSERT table SET col = value [, ...] WHERE predicate;

I think I see another problem with this. The UPDATE must provide a
WHERE clause Var on a unique indexed column (let's say it's
constrained to provide a "(Var [unique-indexed opclass support
function 3 op] Const)" qual during parse analysis because you asked
for UPSERT. But it can also affect 0 rows for other reasons, since
this UPDATE qual can have arbitrary other expressions. So in general
you have any number of reasons for not updating, which implies that
you must insert, which might not be possible because there might even
still be duplicate key violation in a not-yet-visible-to-you row (even
just in the unique index you intended to merge on). Whereas, when
inserting, there is exactly one reason (per row) to go and update - a
conflict (i.e. a would-be duplicate violation). And this is a conflict
that is meaningful to every transaction, regardless of their snapshot,
since it represents an objective fact about the physical presence of
an index tuple.

So, do you make the UPDATE differentiate between different reasons for
the UPDATE failing, only inserting when an UPSERT would have happened
had you omitted the extra stuff in your UPSERT predicate? Can you
really differentiate anyway? And isn't the choice to do the insert
based on what your MVCC snapshot happens to see - rather than
something that there is necessarily objective truth on, the physical
presence of a duplicate tuple in an index - rather arbitrary? It's a
different situation to my implementation, where you do an insert-like
thing, and then find a conflict row to lock and then update, because
at that point the row is successfully locked, and the WHERE clause may
be evaluated against rejects and the *most recent version* of the
locked, conflict-causing row. The most recent, locked version is not
arbitrary at all. That's the version you ought to evaluate a predicate
against before finally deciding to UPDATE.

I assume you agree with my view that UPSERT should always insert or
update. This kind of business sounds closer to SQL MERGE, where an
insert may not drive things, and you accept these kinds of anomalies
in exchange for a lot more flexibility in not having inserts drive
things. Did you happen to read my post on that?
--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Charlie Holleran 2014-07-24 01:49:00 date-time library
Previous Message Tom Lane 2014-07-24 00:45:44 Re: [COMMITTERS] pgsql: Diagnose incompatible OpenLDAP versions during build and test.