Re: ask for review of MERGE

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-26 12:10:38
Message-ID: 1288095038.1587.230.camel@ebony
Lists: pgsql-hackers

On Mon, 2010-10-25 at 16:58 -0400, Robert Haas wrote:
> On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> > On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> Now, as Greg says, that might be what some people want, but it's
> >> certainly monumentally unserializable.
> >
> > To be clear when I said it's what people want what I meant was that in
> > the common cases it's doing exactly what people want. As opposed to
> > getting closer to what people want in general but not quite hitting
> > the mark in the common cases.
> >
> > Just as an example I think it's important that in the simplest case,
> > upsert of a single record, it be 100% guaranteed to do the naive
> > upsert. If two users are doing the merge of a single key at the same
> > time one of them had better insert and one of them had better update
> > or else users are going to be monumentally surprised.
>
> Hmm, so let's think about that case.
>
> The first merge comes along and finds no match so it fires the NOT
> MATCHED rule, which inserts a tuple. The second merge comes along and
> finds no match, so it also fires the NOT MATCHED rule and tries to
> insert a tuple. But upon consulting the PRIMARY KEY index it finds
> that an in-doubt tuple exists so it goes to sleep waiting for the
> first transaction to commit or abort. If the first transaction
> commits it then decides that the jig is up and fails. We could
> (maybe) fix this by doing something similar to what EPQ does for
> updates: when the first transaction commits, instead of redoing the
> insert, we back up and recheck whether the new tuple would have
> matched the join clause and, if so, we instead fire the MATCHED action
> on the updated tuple. If not, we fire NOT MATCHED anyway. I'm not
> sure how hard that would be, or whether it would introduce any other
> nasty anomalies in more complex cases.
>
> Alternatively, we could introduce an UPSERT or REPLACE statement
> intended to handle exactly this case and leave MERGE for more complex
> situations. It's pretty easy to imagine what the coding of that
> should look like: if we encounter an in-doubt tuple, we wait on its
> xmin. If the transaction aborts, we insert. If it commits, and we're
> in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ
> or SERIALIZABLE mode, we abort with a serialization error. That's a
> lot simpler to understand and reason about than MERGE in its full
> generality.
>
> I think it's pretty much hopeless to think that MERGE is going to work
> in complex concurrent scenarios without creating serialization
> anomalies, or at least rollbacks. I think that's baked into the
> nature of what the statement does. To simulate MERGE, you need to
> read from the target table and then do writes that depend on what you
> read. If you do that with the commands that are available today,
> you're going to get serialization anomalies and/or rollbacks under
> concurrency. The mere fact of that logic being inside the database
> rather than outside isn't going to make that go away. Now sometimes,
> as with exclusion constraints, you can play games with dirty snapshots
> to get the semantics you want, but whether that's possible in a
> particular case depends on the details of the operation being
> performed, and here I think it can't be done. Some operations are
> *fundamentally* unserializable.

I agree with your analysis. Let me review...

There is a case that locking alone won't resolve, however that locking
is implemented. The transaction history for that is:

T1: takes snapshot
T2: INSERT row1
T2: COMMIT;
T1: attempts to determine if MATCHED or NOT MATCHED.

The answer is neither of those two. If we say it is NOT MATCHED then
the INSERT will simply fail with a duplicate-key error; if we say it is
MATCHED then technically we can't see the row, so the UPDATE should
fail. The COMMIT of T2 releases any locks that would have helped
resolve this, and note that even if T1 attempts to lock that row as
early as possible, only a table-level lock would prevent T2 from
performing its action.
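To make that timeline concrete, here is a two-session sketch (the table
and column names are made up for illustration):

```sql
-- Hypothetical schema.
CREATE TABLE target (id int PRIMARY KEY, val text);

-- T1: takes its snapshot before T2's insert
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT count(*) FROM target WHERE id = 1;  -- snapshot taken; sees no row

-- T2 (separate session): insert and commit
INSERT INTO target VALUES (1, 'row1');     -- commits immediately

-- Back in T1, neither MERGE branch behaves sensibly:
INSERT INTO target VALUES (1, 'x');        -- NOT MATCHED path: duplicate key error
UPDATE target SET val = 'x' WHERE id = 1;  -- MATCHED path: row invisible to
                                           -- T1's snapshot, so 0 rows updated
```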

Two options for resolution are

1) Throw SERIALIZABLE error

2) Implement something similar to EvalPlanQual
As you say, we already resolve this situation for concurrent updates by
following the update chain from a row that is visible to the latest row.
For MERGE the semantics need to be subtly different here: we need to
follow the update chain to the latest row, but starting from a row that
we *can't* see.

MERGE is still very useful without 2), though it fails in some cases
under concurrent use. The failure rate would increase with the number
of concurrent MERGErs and/or the number of rows in the source table.
Those errors are no more serious than what is possible now.
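For comparison, applications already cope with comparable failures
today: the usual loop-until-done workaround (essentially the merge_db
example from the PL/pgSQL docs, shown here against a hypothetical table
"db") retries on unique_violation in much the same way a client would
retry on a serialization error under option 1):

```sql
CREATE FUNCTION merge_db(k int, v text) RETURNS void AS $$
BEGIN
    LOOP
        -- Try the UPDATE first.
        UPDATE db SET b = v WHERE a = k;
        IF found THEN
            RETURN;
        END IF;
        -- Row not there: try to insert; another session may beat us to it.
        BEGIN
            INSERT INTO db (a, b) VALUES (k, v);
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- Someone else inserted concurrently; loop and UPDATE again.
        END;
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```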

So IMHO we should implement 1) now and come back later to implement 2).
Given that there may be other open issues with 2) right now, it seems
unsafe to rely on the assumption that we'll fix them by the end of the
release cycle.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services
