Re: ask for review of MERGE

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-18 00:20:14
Message-ID: 4CBB92BE.2030703@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Boxuan Zhai wrote:
> Yes, it should be NULL instead of NIL.

OK, I applied that patch to my local copy and pushed it out to github:

http://github.com/greg2ndQuadrant/postgres/commit/9013ba9e81490e3623add1b029760817021297c0

That represents what I tested against. However, when I tried to merge
against HEAD once I was finished, I discovered this patch has bit-rotted
significantly. If you have a local copy that works for you, I would not
recommend pulling in the PostgreSQL repo updates done in the last couple
of weeks yet. Friday's "Allow WITH clauses to be attached to INSERT,
UPDATE, DELETE statements" commit in particular conflicts quite a bit
with your changes. Attached is a rebased patch that applies to HEAD now
after a round of fixes to resolve those. But it doesn't work yet,
because of recent change to the ExecUpdate and ExecDelete functions
you're calling from src/backend/executor/nodeModifyTable.c inside
ExecMerge. If you can continue working on the patch without merging
recent repo work, I can hack away at fixing that once I figure out what
got changed there recently. It's taken some painful git work to sort
out what I've done so far, there's more left to do, and I know that's
not an area you specialize in.

Back to the feature review...I dove into how I expected this to work,
relative to what it actually does at the moment. That didn't really go
too well so far, but I don't know that this represents any fundamental
issue with the patch. Just situations the existing code didn't really
anticipate we have to flush out. As a general community FYI here, while
it's taken me a while to get up to speed on this whole feature, I expect
to keep chugging away on this regardless of the CommitFest boundaries.
This feature is both too big and too important to just stop working on
it because a date has passed.

Onto the test cases. The examples that Boxuan has been working with,
and that the regression tests included with the patch exercise, all
involve two tables being joined together using MERGE. The use case I
decided to test instead was when you're trying to simulate an UPSERT
where only a single table is involved. I couldn't get to this to work
correctly. Maybe I'm just using MERGE wrong here, but I tried so many
different variations without success (including one that's basically
copied from Simon's original regression test set suggestions) that I
suspect there may be a subtle problem with the implementation instead.

To replicate the most straightforward variations of what I ran into, you
can start with the same data that's populated by the regression test set:

CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock;

item_id | balance
---------+---------
10 | 2200
20 | 1900

If you now execute the following:

MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=10) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;

This works fine, and updates the matching row:

item_id | balance
---------+---------
20 | 1900
10 | 2201

But if I give it a key that doesn't exist instead:

MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=30) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;

This doesn't execute the NOT MATCHED case and INSERT the way I expected
it to. It just gives back "MERGE 0".

Since I wasn't sure if the whole "subquery in the USING clause" case was
really implemented fully, I then tried to do this with something more
like the working regression test examples. I expected this to do the
same thing as the first example:

MERGE INTO Stock t
USING Stock s
ON s.item_id=10 AND s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;

But it gives back this:

ERROR: duplicate key value violates unique constraint "stock_item_id_key"
DETAIL: Key (item_id)=(10) already exists.

Can't tell from that whether it's hitting the MATCHED or NOT MATCHED
side of things to generate that. But it doesn't work any better if you
give it an example that doesn't exist:

MERGE INTO Stock t
USING Stock s
ON s.item_id=30 AND s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;

ERROR: duplicate key value violates unique constraint "stock_item_id_key"
DETAIL: Key (item_id)=(30) already exists.

Now that one is really weird, because that key value certainly doesn't
exist yet in the table. There seem to be a couple of issues in the area
of joining a table with itself here. Feel free to tell me I just don't
know how to use MERGE if that's the case in any of these.

The other thing I noticed that may take some work to sort out is that I
haven't had any luck getting MERGE to execute from within a plpgsql
function. I was hoping I could use this to update the pgbench tables:

CREATE OR REPLACE FUNCTION merge_account_balance(key INT, delta NUMERIC)
RETURNS VOID AS
$$
BEGIN
MERGE INTO pgbench_accounts t USING (SELECT * FROM pgbench_accounts
WHERE aid = key) AS s ON t.aid=s.aid WHEN MATCHED THEN UPDATE SET
abalance = s.abalance + delta WHEN NOT MATCHED THEN INSERT VALUES
(key,1+(key / 100000)::integer,delta,'');
END;
$$
LANGUAGE plpgsql;

But I just get this instead:

ERROR: "pgbench_accounts" is not a known variable
LINE 4: MERGE INTO pgbench_accounts t USING (SELECT * FROM p...

The other way I wrote the MERGE statement above (not using a subquery)
does the same thing. I know that error messages is coming from the
changes introduced in
http://archives.postgresql.org/pgsql-committers/2010-01/msg00161.php but
I'm not familiar enough with the whole grammar implementation to know
what that means yet.

That's what I found so far in my second pass over this. Once the first
problem here is sorted out, I've already worked out how to test the
performance of the code using pgbench. Have all the scripts ready to go
once the correct MERGE statement is plugged into them, just ran into
this same class of problems when I tried them. So far I was only able
to see how fast the UPDATE path worked though, which isn't very helpful
yet. My hope here is to test the MERGE implementation vs. the classic
pl/pgsql implementation of UPSERT, calling both within a function so
it's a fair comparison, and see how that goes. This may flush out
concurrency bugs that are in the MERGE code as well.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us

Attachment Content-Type Size
merge-v203-rebase.patch text/x-patch 108.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-10-18 00:56:56 Re: gincostestimate
Previous Message Joshua D. Drake 2010-10-17 22:58:21 Re: Floating-point timestamps versus Range Types