Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Date: 2014-01-08 04:46:26
Message-ID: CAM3SWZRi1bZVLfUk6CQKcJXXVd0c-F5rzmy-5uNeRBRFTN24UQ@mail.gmail.com
Lists: pgsql-hackers

I've worked on a simple set of tests, written quickly in bash, that I
think exercise interesting cases:

https://github.com/petergeoghegan/upsert
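For readers following along at home, the general shape of the statements the tests exercise looks roughly like the following (table and column names here are hypothetical, and the exact spelling comes from the patch as it stands, so it may well change):

```sql
-- Hypothetical table; syntax is from the proposed patch and may change.
-- "Always insert" style: attempt the insert, lock any conflicting row,
-- and hand the rejected tuple(s) to the wCTE for the update.
WITH rej AS (
    INSERT INTO tab (key, val)
    VALUES (1, 'foo')
    ON DUPLICATE KEY LOCK FOR UPDATE
    RETURNING REJECTS *
)
UPDATE tab SET val = rej.val
FROM rej
WHERE tab.key = rej.key;
```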

Perhaps most notably, these tests compare the performance of ordinary
inserts into a table with a SERIAL primary key against effectively
equivalent upserts that always insert. Even though a SERIAL primary
key is used, which you might imagine to be a bad case for the extended
heavyweight leaf page locks, performance seems almost as good as
regular insertion (though I should add that there was still only one
unique index). Upserts that only update are significantly slower,
ending up at very roughly 2/3 the performance of equivalent updates (a
figure I arrived at by running the tests on my laptop, which is not
terribly rigorous, so take it with a grain of salt). That the
update-only case is slower is hardly surprising, since it requires an
"extra" index scan to re-find the conflict tuple. Using ctid
projection to avoid that index scan doesn't work, at least not without
extra in-application handling, because a tid scan is never used if you
write a typical wCTE upsert statement - the planner forces a seqscan.
It would have been interesting to see where a tid scan for the
Update/ModifyTable node's nestloop join left things, but I didn't get
that far. I think that if we had a really representative test (with no
"extra" index scan), the performance of upserts that only update would
similarly be almost as good as that of regular updates in many
representative cases.
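For reference, the sort of "typical wCTE upsert statement" I have in mind is the well-known pattern along these lines (hypothetical table again):

```sql
-- Hypothetical table. The classic writable-CTE upsert: try the UPDATE
-- first, and INSERT only if no row was updated. Projecting ctid from
-- the UPDATE and feeding it back in does not help here, because the
-- planner won't use a tid scan for a statement written this way.
WITH upsert AS (
    UPDATE tab SET val = 'foo'
    WHERE key = 1
    RETURNING *
)
INSERT INTO tab (key, val)
SELECT 1, 'foo'
WHERE NOT EXISTS (SELECT 1 FROM upsert);
```

Note that this pattern is also not actually safe under concurrency, which is much of the motivation for the patch in the first place.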

The upsert tests also provide cursory smoke testing for cases of
interest. I suggest comparing the test cases, and their performance
and behavior, between the exclusion* and btree* patches.

A new revision of my patch is attached. The changes are mostly
improvements that are equally applicable to promise tuples, so this
should not be interpreted as a rejection of promise tuples so much as
a way of using my time efficiently while I wait to hear how others
feel my approach compares. I'm still rather curious about what people
think in light of recent developments. :-)

Changes include:

* Another ecpg fix.

* Misc polishing and refactoring of the btree code. Most notably, I
now cache btree insertion scankeys across phases (as well as
continuing to cache each IndexTuple).

* contrib/pageinspect notes when btree leaf pages have the locked flag bit set.

* Better lock/unlock ordering. Commit
a1dfaef6c6e2da34dbc808f205665a6c96362149 added strict ordering of
indexes because of exclusive heavyweight locks held on indexes. Back
then (~2000), heavyweight page locks were also acquired on btree pages
[1], and in general it was expected that heavyweight locks could be
held on indexes across complex operations (I don't think this is
relied upon anymore, but the assumption that it could happen remains).
I've extended this so that the relcache ensures that we always insert
into primary key indexes first (and, in the case of my patch, lock
values in PKs last, to minimize the locking window). Getting primary
key insertion out of the way first seems sensible as a general bloat
avoidance measure: when a slot is responsible for a unique violation,
that slot's index tuples bloat only unique indexes, whereas simply
ordering by OID can leave dead index tuples in previously inserted
non-unique indexes as well.

* I believe I've fixed the bug in the modifications made to
HeapTupleSatisfiesMVCC(), though I'd like confirmation that I have the
details right. What do you think, Andres?

* I stop higher isolation levels from availing of the aforementioned
modifications to HeapTupleSatisfiesMVCC(), since that would certainly
be inconsistent with user expectations. I'm not sure which "read
phenomenon" described by the standard this violates, but it's
obviously inconsistent with the spirit of the higher isolation levels
to be able to see a row committed by a transaction that is
conceptually still in progress. It's bad enough to do this at READ
COMMITTED, but I think a consensus may be emerging that that's the
least-bad option (correct me if I'm mistaken).

* The "priorConflict" optimization, which was previously shown to
really help performance [2], has been slightly improved to also
remember row locking conflicts.

[1] https://github.com/postgres/postgres/blob/a1dfaef6c6e2da34dbc808f205665a6c96362149/src/backend/access/nbtree/nbtpage.c#L314

[2] http://www.postgresql.org/message-id/CAM3SWZQZTAN1fDiq4o2umGOaczbpemyQoM-6OxgUFBzi+dQzkg@mail.gmail.com

--
Peter Geoghegan

Attachment Content-Type Size
btreelock_insert_on_dup.v7.2014_01_07.patch.gz application/x-gzip 44.6 KB
