From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY IGNORE
Date: 2013-08-31 18:34:26
Message-ID: 20130831183426.GA6989@awork2.anarazel.de

Hi,

On 2013-08-30 19:38:24 -0700, Peter Geoghegan wrote:
> On Fri, Aug 30, 2013 at 5:47 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> > While I ponder on it some more, could you explain whether I understood
> > something correctly? Consider the following scenario:
> >
> > CREATE TABLE foo(a int UNIQUE, b int UNIQUE);
> >
> > S1: INSERT INTO foo VALUES(0, 0);
> > S1: BEGIN;
> > S1: INSERT INTO foo VALUES(1, 1);
> > S1: SELECT pg_sleep(3600);
> > S2: BEGIN;
> > S2: INSERT INTO foo VALUES(2, 1);
> > S3: SELECT * FROM foo WHERE a = 0;
> >
> > Won't S2 in this case exclusively lock a page in foo_a_key (the one
> > where 2 will reside) for 3600s while it waits for S1 to finish during
> > the speculative insertion into foo_b_key?
>
> Yes. The way the patch currently handles that case is unacceptable. It
> needs to be more careful about holding an exclusive lock on
> earlier-locked indexes when locking the second and subsequent indexes,
> iff it blocks on another transaction in the course of doing so.

After some thought, I don't think any solution primarily based on
holding page-level locks across other index operations is going to
scale acceptably.

What I had in mind when playing around with this is the following trick:
just as in your patch, split index insertion into two phases, and perform
the first before inserting the heap tuple. In what you called
"speculative insertion", don't stop at locking the page: actually
insert an index tuple into the index. As we haven't yet inserted the
heap tuple, we obviously can't insert a complete index tuple. Instead,
set a flag on the item and reuse the item pointer to store the xid of
the inserting process. I coined those items "insertion promises"...
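
To make the encoding concrete, here is a rough sketch in C (all names
and the layout are invented for illustration, not actual backend code).
An item pointer is six bytes - a four-byte block number plus a two-byte
offset - so a four-byte xid fits inside it, with a flag bit marking the
item as a promise:

    /* Illustrative sketch only - hypothetical names, not backend code. */
    #include <stdint.h>
    #include <string.h>

    typedef uint32_t TransactionId;

    /* An index line pointer normally addresses a heap tuple (4-byte
     * block number plus 2-byte offset). For a promise we reuse those
     * six bytes to carry the inserting backend's xid instead. */
    typedef struct PromiseItem
    {
        uint8_t bytes[6];
        uint8_t flags;      /* hypothetical; the real flag bit would
                             * live in an existing item header field */
    } PromiseItem;

    #define INDEX_PROMISE 0x01

    static void
    promise_set(PromiseItem *item, TransactionId xid)
    {
        memcpy(item->bytes, &xid, sizeof(xid)); /* 4 of the 6 bytes */
        item->flags |= INDEX_PROMISE;
    }

    static TransactionId
    promise_get_xid(const PromiseItem *item)
    {
        TransactionId xid;
        memcpy(&xid, item->bytes, sizeof(xid));
        return xid;
    }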

Index searches will ignore promises they see, but concurrent index
insertions will notice it's a promise and wait for the xid to complete
(or not, if it has already aborted). That fits pretty well into the
existing btree code. Once the insertion of an index promise has
succeeded for all unique indexes and the heap tuple has been inserted,
convert those promises into actual item pointers pointing to the heap
tuple. That probably requires a second search and some cleverness to
find the correct pointer because of concurrent splits, but that's not
too bad (possibly you can do away with the second search if we continue
to hold a pin).
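
Continuing the sketch above (the helper names here again stand in for
the backend's transaction-status and lock-wait machinery, they're not
real functions), a unique-index inserter that runs into a promise might
behave like this:

    /* Illustrative sketch only - hypothetical helpers. */
    typedef enum { XACT_IN_PROGRESS, XACT_COMMITTED, XACT_ABORTED }
        XactStatus;

    extern XactStatus xact_status(TransactionId xid);      /* assumed */
    extern void wait_for_xact_end(TransactionId xid);      /* assumed */

    /* Returns 1 if the slot can be reused, 0 if it's a live duplicate. */
    static int
    resolve_promise_conflict(TransactionId promise_xid)
    {
        for (;;)
        {
            switch (xact_status(promise_xid))
            {
                case XACT_ABORTED:
                    return 1;   /* dead promise: reuse the slot */
                case XACT_COMMITTED:
                    return 0;   /* the promise was completed: a real
                                 * duplicate, so unique violation */
                case XACT_IN_PROGRESS:
                    wait_for_xact_end(promise_xid);
                    break;      /* woke up: recheck from the top */
            }
        }
    }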

The fact that xids can now appear in the index creates some slight
complications, because they need to be vacuumed away - but that's
relatively easy to fit into the bulkdelete API, and I don't think the
contortions to the vacuum code will be too bad.
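
In that bulkdelete pass, deciding whether a promise is garbage could be
as simple as this (same invented names and caveats as the sketches
above):

    /* Illustrative sketch only. If the xact committed, the promise
     * was converted into a real item pointer before commit, so any
     * item still marked as a promise whose xid has aborted (including
     * by a crash) will never be completed and can be removed. */
    static int
    promise_is_reclaimable(TransactionId promise_xid)
    {
        return xact_status(promise_xid) == XACT_ABORTED;
    }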

Imo that solution is fairly elegant: it doesn't cause any heap bloat,
and it causes only the same amount of index bloat that
"insert-into-heap-first" solutions cause. It also has pretty good
concurrency behaviour, because you never need to retain a page lock for
longer than a normal index insertion takes.
It's certainly a bit more complex than your solution, though...

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
