Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2014-01-10 03:36:24
Message-ID: CAM3SWZQBhS0JriD6EfeW3MoTXy1eK-8Wdr6FvFFR0AyCDgCBvA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 7, 2014 at 8:46 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> I've worked on a simple set of tests, written quickly in bash, that I
> think exercise interesting cases:
>
> https://github.com/petergeoghegan/upsert
>
> Perhaps most notably, these tests make comparisons between the
> performance of ordinary inserts with a serial primary key table, and
> effectively equivalent upserts that always insert.

While I realize that everyone is busy, I'm concerned about the lack of
discussion here. It's been 6 full days since I posted my benchmark,
which I expected would quickly clear some things up, or at least garner
interest, and yet no one has commented since.

Here is a summary of the situation, at least as I understand it:

* My patch has been shown to perform much better than the alternative
"promise tuples" proposal. The previously published benchmark, referred
to above, makes this evident for workloads with lots of contention [1].

Now, to cover everything, I've gone on to benchmark inserts into a
table foo(serial, int4, text) that lock the row using the new
infrastructure. The SERIAL column is the primary key. I'm trying to
characterize the overhead of the extended value locking here, by
showing the same case (a worst case) with and without the overhead.
Here are the results:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/vs-vanilla-insert/

(asynchronous commits, logged table)

With both extremes covered, the data suggests that my patch performs
very well by *any* standard. And if we consider how things compare to
the alternative proposal, all indications are that performance is far
superior (at least for representative cases without too many unique
indexes, not that I suspect things are much different with many).
Previous concerns about the cost of extended leaf page locking ought
to be almost put to rest by this benchmark, because inserting
successive integer values into a btree index is a particularly bad
case, and yet in practice the implementation does very well. (With my
patch, we're testing the same statement with an ON DUPLICATE KEY LOCK
FOR UPDATE part, but there are naturally no conflicts on the SERIAL PK
- on master we're testing the same INSERT statement without that,
inserting sequence values just as before, only without the worst-case
value locking overhead.)
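
To make this concrete, here is roughly what each side of that
benchmark executes (a sketch only - the table is foo(serial, int4,
text) as described, but the column names here are made up for
illustration):

  -- Illustrative schema: SERIAL primary key, plus an int4 and a text
  -- column (names are made up).
  CREATE TABLE foo (
      a serial PRIMARY KEY,
      b int4,
      c text
  );

  -- On master: a plain insert, drawing the PK value from the sequence.
  INSERT INTO foo (b, c) VALUES (37, 'fj');

  -- With the patch: the same insert plus the ON DUPLICATE KEY LOCK FOR
  -- UPDATE part. The sequence-generated PK means there are never any
  -- actual conflicts, so this isolates the extended value locking
  -- overhead.
  INSERT INTO foo (b, c) VALUES (37, 'fj')
      ON DUPLICATE KEY LOCK FOR UPDATE;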

* The alternative exclusion* patch still deadlocks in an unprincipled
fashion when simple, idiomatic usage encounters contention. Heikki
intends to produce a revision that fixes the problem, though having
considered it carefully myself, I don't know what mechanism he has in
mind, and frankly I'm skeptical. More importantly, I have to question
whether we should continue to pursue that alternative approach, given
what we now know about its performance characteristics. It could be
improved, but not by terribly much, particularly for the case where
there is plenty of update contention, which was shown in [1] to be
approximately 2-3 times slower than extended page locking (*and*
that's with it already looking for would-be duplicates *first*). I'm
trying to be as fair as possible, and yet the difference is huge. It's
going to be really hard to beat an approach where the decision to
insert or update is made as late as possible, based on strong
indicators of the likely outcome, and where the cost of making the
wrong choice is very low. With shared buffer locks held while calling
_bt_check_unique(), we still lock out concurrent would-be duplicate
insertion, and so don't need to restart from scratch (in order to
insert instead) in the way the alternative proposal's largely AM-naive
approach does.
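
For reference, the simple, idiomatic usage I have in mind is the wCTE
upsert pattern built on the proposed syntax (a sketch only - the table
and column names are made up, and I'm spelling the rejects-projecting
clause as the RETURNING REJECTS variant from my patch):

  -- Two sessions running this concurrently against overlapping keys
  -- is exactly the kind of simple usage that must not deadlock in an
  -- unprincipled fashion.
  WITH rej AS (
      INSERT INTO tab (key, val)
      VALUES (1, 'inserted')
      ON DUPLICATE KEY LOCK FOR UPDATE
      RETURNING REJECTS *
  )
  UPDATE tab SET val = 'updated'
  FROM rej
  WHERE tab.key = rej.key;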

* As far as I'm aware, no one considers there to be any remaining open
items; I've addressed all of those raised so far. Progress here is
entirely blocked on waiting for review feedback.

With the new priorConflict lock strength optimization, my patch is in
some ways similar to what Heikki proposed (in the exclusion* patch).
It's as if the first phase, the locking operation, is an index scan
with an identity crisis. It can decide to continue to be an "index
scan" (albeit an oddball one with an insertion scankey that, by
holding shared buffer locks, prevents concurrent duplicate insertion,
giving very efficient uniqueness checks), or it can decide to actually
insert, at the last possible moment. The second phase picks up with
much of the work already complete from the first, so the amount of
wasted work is very close to zero in all cases. How can anything beat
that?

If the main argument for the exclusion approach is that it works with
exclusion constraints, then I can still go and make what I've done
work there too (for the IGNORE case, which I maintain is the only
exclusion constraint variant of this that is useful to users). In any
case, I think making anything work for exclusion constraints should be
a relatively low priority.
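
To be concrete about what that IGNORE case might look like (a sketch
only - the table and constraint are made up, btree_gist is assumed for
the int4 operator class, and the spelling of the syntax for the
exclusion constraint case is obviously not settled):

  -- A table constrained by an exclusion constraint rather than a
  -- unique index.
  CREATE EXTENSION IF NOT EXISTS btree_gist;
  CREATE TABLE reservation (
      room   int4,
      during tsrange,
      EXCLUDE USING gist (room WITH =, during WITH &&)
  );

  -- Skip, rather than error on, any row that would conflict with an
  -- existing reservation.
  INSERT INTO reservation (room, during)
  VALUES (101, '[2014-01-10 10:00, 2014-01-10 11:00)')
      ON DUPLICATE KEY IGNORE;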

I'd like to hear more opinions on what I've done here, if anyone has
bandwidth to spare. I doubt I need to remind anybody that this is a
feature of considerable strategic importance. We need this, and we've
been unnecessarily at a disadvantage relative to other systems by not
having it all these years. Every application developer wants this
feature - its absence is a *very* frequent complaint.

[1] http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp-2/
--
Peter Geoghegan
