Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Craig Ringer <craig(at)2ndquadrant(dot)com>
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE}
Date: 2014-10-07 11:27:45
Message-ID: CABRT9RD6zriK+t6mnqQOqaozZ5z1bUaKh+kNY=O9ZqBZFoAuBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 4, 2014 at 12:13 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Wed, Sep 3, 2014 at 9:51 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> INSERT INTO upsert(key, val) VALUES(1, 'insert') ON CONFLICT WITHIN
>>> upsert_pkey UPDATE SET val = 'update';
>>
>> It seems to me that it would be better to specify a conflicting column
>> set rather than a conflicting index name.
>
> I'm open to pursuing that, provided there is a possible implementation
> that's robust against things like BEFORE triggers that modify
> constrained attributes. It must also work well with partial unique
> indexes. So I imagine we'd have to determine a way of looking up the
> unique index only after BEFORE triggers fire. Unless you're
> comfortable with punting on some of these cases by throwing an error,
> then all of this is actually surprisingly ticklish.

Speaking of this, I really don't like the proposed behavior of firing
BEFORE INSERT triggers even before we've decided whether to insert or
update. In the "classical" upsert pattern, changes by a BEFORE INSERT
trigger would get rolled back on conflict, but the new approach seems
surprising: changes from BEFORE INSERT get persisted in the database,
but AFTER INSERT is not fired.

I haven't found any discussion about alternative triggers semantics
for upsert. If there has been any, can you point me to it?

----
How about this: use the original VALUES results for acquiring a value
lock; if indeed the row didn't conflict, *then* fire BEFORE INSERT
triggers, and throw an error if the trigger changed any columns of the
(specified?) unique key.

Advantages of this approach:
1. Would solve the above conundrum about specifying a unique index via columns.
2. In the UPDATE case we can skip evaluating INSERT triggers and
DEFAULT expressions for columns
3. If I'm not missing anything, this approach may also let us get rid
of the CONFLICTING() construct
4. Possibly be closer to MySQL's syntax?

Point (2) is actually a major consideration IMO: if your query is
mostly performing UPDATEs, on a table with SERIAL keys, and you're
using a different key to perform the updates, then you waste sequence
values unnecessarily. I believe this is a very common pattern, for
example:

create table evt_type (id serial primary key, name text unique, evt_count int);
prepare upsert(text) as INSERT into evt_type (name, evt_count) values ($1, 1)
on conflict within evt_type_name_key UPDATE set evt_count=evt_count+1;

execute upsert('foo');
execute upsert('foo');
execute upsert('bar');

# table evt_type;
id | name | evt_count
----+------+-----------
1 | foo | 2
3 | bar | 1 <-- id could very well be "2"

Regards,
Marti

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-10-07 12:03:04 Lets delete src/test/performance
Previous Message Thom Brown 2014-10-07 10:44:21 RLS - permissive vs restrictive