Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2014-01-03 22:55:24
Message-ID: CAM3SWZQZTAN1fDiq4o2umGOaczbpemyQoM-6OxgUFBzi+dQzkg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 3, 2014 at 7:39 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> This patch doesn't apply anymore.

Yes, there was some bit-rot. I previous deferred dealing with a
shift/reduce conflict implied by commit
1b4f7f93b4693858cb983af3cd557f6097dab67b. I've fixed that problem now
using non operator precedence, and performed a clean rebase on master.
I've also fixed the basis of your much earlier complaint about
breakage of ecpg's regression tests (without adding support for the
feature to ecpg). All make check-world tests pass. Patch is attached.
I have yet to figure out how to make REJECTS a non-reserved keyword,
or even just a type_func_name_keyword, though intuitively I have a
sense that the latter ought to be possible.

This is the same basic patch as benchmarked above, with various tricks
to avoid stronger lock acquisition when that's likely profitable (we
can even do _bt_check_unique() with only a shared lock and no hwlock
much of the time, on the well-informed suspicion that it won't be
necessary to insert, but only to return a TID). There has also been
some clean-up to aspects of serializable behavior, but that needs
further attention and scrutiny from a subject matter expert, hopefully
Heikki. Though it's probably also true that I should find time to
think about transaction isolation some more.

I've since had another idea relating to performance optimization,
which was to hint that the last attempt to insert a key was
unsuccessful, so the next one (after the conflicting transaction's
commit/abort) of that same value will very likely conflict too, making
lock avoidance profitable on average. This appears to be much more
effective than the previous woolly heuristic (never published, just
benchmarked), which I've left in as an additional reason to avoid
heavyweight locking, if only for discussion. This benchmark now shows
my approach winning convincingly with this additional "priorConflict"
optimization:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/upsert-cmp-2/

If someone had time to independently recreate the benchmark I have
here, or perhaps to benchmark the patch in some other way, that would
be useful (for full details see my recent e-mail about the prior
benchmark, where the exact details are described - this is the same,
but with one more run for the priorConflict optimization).

Subtleties of visibility also obviously deserve closer inspection, but
perhaps I shouldn't be so hasty: No consensus on the way forward looks
even close to emerging. How do people feel about my approach now?

--
Peter Geoghegan

Attachment Content-Type Size
btreelock_insert_on_dup.v6.2014_01_03.patch.gz application/x-gzip 37.5 KB

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2014-01-03 23:21:20 Re: [ANNOUNCE] IMCS: In Memory Columnar Store for PostgreSQL
Previous Message Tom Lane 2014-01-03 22:50:55 Re: costing of hash join