Re: Making joins involving ctid work for the benefit of UPSERT

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Making joins involving ctid work for the benefit of UPSERT
Date: 2014-07-28 17:43:18
Message-ID: CAM3SWZR=24A3ARrGakTs5JZrwdyDLuysB5wfQ=aTJmo80We5pQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jul 28, 2014 at 8:37 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> AFAIUI, this is because your implementation uses lwlocks in a way that
> Andres and I both find unacceptable.

That's not the case. My implementation uses page-level heavyweight
locks. The nbtree AM used to use them for other stuff. Plenty of other
systems use index level locks managed by a heavyweight lock manager.

>>> Here you seem to be suggesting that I intended to propose your
>>> existing design rather than something else, which I didn't. In this
>>> design, you find the conflict (at most one) by scanning for the tuple
>>> to be updated.
>>
>> Yes, but what if you don't see a conflict because it isn't visible to
>> your snapshot, and then you insert, and only then (step 5), presumably
>> with a dirty snapshot, you find a conflict? How does the loop
>> terminate if that brings you back to step 1 with the same MVCC
>> snapshot feeding the update?
>
> Good point. Maybe the syntax should be something like:
>
> UPSERT table (keycol [, keycol] ...) { VALUES (val [, val] ...) [,
> ...] | select_query }
>
> That would address both the concern about being able to pipe multiple
> tuples through it and the point you just raised. We look for a row
> that matches each given tuple on the key columns; if one is found, we
> update it; if none is found, we insert.
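
For clarity, that proposed per-tuple behavior can be restated as a minimal
sketch (Python stand-in for the executor's decision; the dict-keyed "table"
and the function name are purely illustrative, not anything in Postgres):

```python
def upsert(table, key_cols, rows):
    """Sketch of the proposed UPSERT semantics: for each input row,
    look for an existing row matching on the key columns; update it
    if found, otherwise insert.  'table' is modeled as a dict keyed
    by the tuple of key-column values -- purely illustrative."""
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        if key in table:
            table[key].update(row)   # matching row found: UPDATE
        else:
            table[key] = dict(row)   # no match: INSERT

# Example: upsert on key column "id"
t = {(1,): {"id": 1, "val": "old"}}
upsert(t, ["id"], [{"id": 1, "val": "new"}, {"id": 2, "val": "x"}])
```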

That basically is my design, except that (tangentially) yours risks
bloat in exchange for not having to use a real locking mechanism, and
has a different syntax. The parts of insertion into indexes that I
stagger include an initial part that is more or less just an index
scan. With this design you'll have to set things up so that all
indexes can be directly accessed in the manner of ExecInsert() (get a
list of them from the relcache, open them in an order that avoids
possible deadlocks, etc.). Why not just use ExecInsert()? I don't
think I'm alone in seeing things that way.
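
The deadlock-avoiding ordering mentioned above can be illustrated with a toy
sketch (sorting by OID here is just an example of a consistent global order;
the function is invented for illustration):

```python
def lock_order(index_oids):
    """Toy illustration of deadlock avoidance by consistent lock
    ordering: if every session acquires its index locks in ascending
    OID order, no two sessions can each hold a lock the other one
    wants next, so lock-order deadlocks between them are ruled out."""
    return sorted(index_oids)

# Two sessions that list the same indexes in different orders still
# acquire the locks in the same global order:
session_a = lock_order([3002, 1001, 2005])
session_b = lock_order([2005, 3002, 1001])
assert session_a == session_b == [1001, 2005, 3002]
```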

On a mostly unrelated note, I'll remind you of the reason that I felt
it was best to lock indexes. It wasn't so much about avoiding bloat as
it was about avoiding deadlocks. When I highlighted the issue,
Heikki's prototype, which inserted optimistically rather than
locking, was made to go and physically "super delete" the
conflicting heap tuple (inserted optimistically before its index
tuples) before going to lock a row, in order to avoid unprincipled
deadlocking. In contrast, my design just used a callback that
released page-level heavyweight locks before going to lock a row.
Heikki's prototype involved making it so that *even someone else's
dirty snapshot* didn't see our dead speculatively-inserted heap
tuple.
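
Schematically, the two loop shapes just described look like this (pseudocode;
every function name here is invented for illustration, none is a real
PostgreSQL API):

```
upsert_with_value_locks(tup):           # my design
    locks = acquire_page_hw_locks(tup)  # page-level heavyweight locks
    conflict = find_conflict(tup)
    if conflict is none:
        insert_heap_and_index_tuples(tup)
        release(locks)
    else:
        release(locks)                  # callback: drop locks first...
        lock_row_for_update(conflict)   # ...then lock the row

upsert_optimistic(tup):                 # Heikki's prototype
    heap = insert_heap_speculatively(tup)
    conflict = insert_index_tuples(heap)
    if conflict is not none:
        super_delete(heap)              # hide it even from dirty snapshots
        lock_row_for_update(conflict)   # then lock the row, and retry
```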

Anyway, making all that happen is fairly invasive to a bunch of places
that are just as critical as the nbtree code. I'm not saying it can't
be done, or even that it definitely shouldn't be, but an approach
that produces bloat, rather than locking values the way other systems
do (and, to a limited extent, Postgres already does), is at least far
more invasive than it first appears. I ask you to consider that.

--
Peter Geoghegan
