Re: INSERT...ON DUPLICATE KEY IGNORE

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT...ON DUPLICATE KEY IGNORE
Date: 2013-09-04 22:01:57
Message-ID: CAM3SWZRLu80UtKfdS4V0Sdt1Qv3-_7Ey2bhOCayjLYt7oTmkHQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 4, 2013 at 1:26 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Concurrent readers will block in a non-interruptible wait if they try
> to access a buffer, and that's a situation that will be intolerable
> if, for example, it can persist across a disk I/O. And I don't see
> any way to avoid that.

Then I have some bad news for you - that's already quite possible.
_bt_insertonpg() is called with the very same buffer exclusive locked,
and is where we do btree page splits. The first thing that _bt_split
does is this:

/* Acquire a new page to split into */
rbuf = _bt_getbuf(rel, P_NEW, BT_WRITE);

(Obviously this may ultimately result in the storage manager extending
the index relation).

Plus the split is WAL-logged immediately afterwards, which could
result in us blocking on someone else's I/O under contention (granted,
the XLogInsert scaling patch has now considerably ameliorated that
general problem). All the while holding an exclusive lock on the same
buffer. Note also that _bt_insertonpg() is called again recursively
after a page split. And of course we WAL-log btree index tuple
insertion proper all the time.

Let me be clear about something, though: I am not at all dismissive of
Andres' concerns. I was concerned about many of the same things before
I posted the patch.

I think that Andres and I ought to re-frame this discussion a little
bit. Right now, the presumption we seem to be making, perhaps without
even realizing it, is this is about providing functionality equivalent
to MySQL's INSERT IGNORE; insert tuples proposed for insertion where
possible, otherwise do not. However, Andres and I, not to mention
almost every Postgres user, are actually much more interested in
something like INSERT...ON DUPLICATE KEY LOCK FOR UPDATE.

That's what this mechanism has to support, even if it is *technically*
possible to commit just INSERT...ON DUPLICATE KEY IGNORE and punt on
these other questions. It was silly of me not to do that up-front. So
I'm going to try and produce a patch that does this as well for my
next revision.

Maybe this will enable Andres to refute my position that the buffer
locking approach to speculative insertion/value locking may actually
be acceptable. If that gets us closer to having the feature committed
in some form, then I welcome it. I fully expect to be held to this new
standard - it would be insane to do anything less. I don't want to
throw out an old IGNORE value locking mechanism and invent a whole new
one for upserting a little bit down the line.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message arthernan 2013-09-04 22:14:02 De-normalization optimizer research project
Previous Message Peter Eisentraut 2013-09-04 21:11:39 Re: Improving avg performance for numeric