Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Thom Brown <thom(at)linux(dot)com>
Subject: Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0
Date: 2015-02-16 02:29:36
Message-ID: CAM3SWZR=KDcwEOMMg6r8MCjNDw8ERmJXM8oWqAxRbs3YQ0w7oA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Feb 13, 2015 at 7:22 PM, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com> wrote:
> In patch 1, "sepgsql is also affected by this commit. Note that this commit
> necessitates an initdb, since stored ACLs are broken."
>
> Doesn't that warrant bumping catversion?

Yes, but traditionally that is left until the last minute, when the
patch is committed. That's why it's called out in the commit message
(it isn't otherwise obvious; this isn't a common catversion-necessitating
change like an addition to a system catalog).

> Patch 2
> + * When killing a speculatively-inserted tuple, we set xmin to invalid
> and
> +if (!(xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE))
>
> When doing this, should we also set the HEAP_XMIN_INVALID hint bit?
>
> <reads more of patch>
>
> Ok, I see we're not doing this because the check for a super deleted tuple
> is already cheap. Probably worth mentioning that in the comment...

See my remarks to Heikki on this. I don't think it adds much.

> ExecInsert():
> + * We don't suppress the effects (or, perhaps, side-effects) of
> + * BEFORE ROW INSERT triggers. This isn't ideal, but then we
> + * cannot proceed with even considering uniqueness violations until
> + * these triggers fire on the one hand, but on the other hand they
> + * have the ability to execute arbitrary user-defined code which
> + * may perform operations entirely outside the system's ability to
> + * nullify.
>
> I'm a bit confused as to why we're calling BEFORE triggers out here...
> hasn't this always been true for both BEFORE *and* AFTER triggers? The
> comment makes me wonder if there's some magic that happens with AFTER...

Yes, but the difference here is that the UPDATE path could be taken
(which is sort of like a before row insert trigger returning NULL).
What I'm calling out is the dependency on firing before row insert
triggers to decide whether the alternative path must be taken. Roughly
speaking, we must perform "part" of the INSERT (the firing of before
row insert triggers) in order to decide whether or not to do an INSERT
at all. This is, as I said, similar to when those triggers return NULL,
and won't matter with idiomatic patterns of before trigger usage. It
still feels worth calling out, because users do sometimes foolishly
write before triggers with many external side effects.
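To make the ordering concrete, here is a minimal C sketch of the control
flow described above. None of these names exist in PostgreSQL; the
counter simply stands in for arbitrary user-defined trigger code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for arbitrary user code run by a BEFORE ROW
 * INSERT trigger (e.g. writing to an external system). */
static int trigger_side_effects = 0;

static void
before_row_insert_trigger(void)
{
    trigger_side_effects++;
}

typedef enum { PATH_INSERT, PATH_UPDATE } UpsertPath;

/* The trigger must fire before we can even check for a uniqueness
 * conflict, so its side effects happen on both paths. */
static UpsertPath
exec_upsert(bool key_already_exists)
{
    before_row_insert_trigger();
    return key_already_exists ? PATH_UPDATE : PATH_INSERT;
}
```

The point of the sketch: even when the UPDATE path ends up being taken,
the trigger's side effects have already occurred and cannot be nullified.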

> ExecLockUpdateTuple():
> + * Try to lock tuple for update as part of speculative insertion. If
> + * a qual originating from ON CONFLICT UPDATE is satisfied, update
> + * (but still lock row, even though it may not satisfy estate's
> + * snapshot).
>
> I find this confusing... which row is being locked? The previously inserted
> one? Perhaps a better wording would be "satisfied, update. Lock the original
> row even if it doesn't satisfy estate's snapshot."

Take a look at the executor README. We're locking the only row that
can be locked when an UPSERT tentatively decides to take the UPDATE
path: the row that was found during our pre-check. We can only UPDATE
once we've found such a tuple and then locked it without finding a
row-level conflict.
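A minimal sketch of that shape, with made-up names (not the real
executor API), showing that the pre-checked row gets locked whether or
not the ON CONFLICT UPDATE qual passes:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tuple; "locked" models a row-level lock being taken. */
typedef struct { int key; int value; bool locked; } Tuple;

/* Lock the tuple found by the pre-check unconditionally; UPDATE it
 * only if the ON CONFLICT UPDATE qual is satisfied. */
static bool
exec_lock_update_tuple(Tuple *t, int new_value, bool qual_satisfied)
{
    t->locked = true;              /* row is locked either way */
    if (qual_satisfied)
        t->value = new_value;      /* take the UPDATE path */
    return qual_satisfied;
}
```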

> infer_unique_index():
> +/*
> + * We need not lock the relation since it was already locked, either by
> + * the rewriter or when expand_inherited_rtentry() added it to the query's
> + * rangetable.
> + */
> +relationObjectId = rt_fetch(parse->resultRelation, parse->rtable)->relid;
> +
> +relation = heap_open(relationObjectId, NoLock);
>
> Seems like there should be an Assert here. Also, the comment should probably
> go before the heap_open call.

An Assert() of what? Note that the similar function
get_relation_info() does about the same thing here.

> heapam_xlog.h:
> +/* reuse xl_heap_multi_insert-only bit for xl_heap_delete */
> I wish this would mention why it's safe to do this. Also, the comment
> mentions xl_heap_delete when the #define is for
> XLOG_HEAP_KILLED_SPECULATIVE_TUPLE; presumably that's wrong. Perhaps:
> /*
> * reuse XLOG_HEAP_LAST_MULTI_INSERT bit for
> * XLOG_HEAP_KILLED_SPECULATIVE_TUPLE. This is safe because we never do
> * multi-inserts for INSERT ON CONFLICT.
> */

It's safe, as the comment indicates, because the former is only used
for xl_heap_multi_insert records, while the latter is only used for
xl_heap_delete records. There is no ambiguity, because naturally we're
always able to establish what type of record is under consideration
before checking whether the bit is set.

The XLOG_HEAP_* format is used for other flags there, despite the fact
that other flags (like XLOG_HEAP_PREFIX_FROM_OLD) can only appear in
certain context-appropriate xl_heap_* records. AFAICT, all that I've
done that's new here is rely on that; a bunch of those bits were used
up in the last year or two, so the need to even consider bit reuse
here is a new problem.
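To illustrate why the reuse is unambiguous, here is a self-contained C
sketch. The bit positions and helper are illustrative only; the real
flag definitions live in heapam_xlog.h:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative values only: the same bit value can serve two purposes
 * because each flag is only ever tested against one WAL record type. */
#define XLOG_HEAP_LAST_MULTI_INSERT         (1 << 7) /* xl_heap_multi_insert only */
#define XLOG_HEAP_KILLED_SPECULATIVE_TUPLE  (1 << 7) /* xl_heap_delete only */

typedef enum { REC_MULTI_INSERT, REC_DELETE } HeapRecordType;

/* Redo always knows the record type before it inspects the flags, so
 * the shared bit value is never ambiguous. */
static int
killed_speculative(HeapRecordType type, uint8_t flags)
{
    return type == REC_DELETE &&
           (flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE) != 0;
}
```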

> I'll review the remaining patches later.

Thanks.

--
Peter Geoghegan
