Re: ask for review of MERGE

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-21 18:36:17
Message-ID: 4CC08821.3020102@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> I think the right way to write UPSERT is something
> along the lines of:
>
> MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
> s.item_id = t.item_id ...
>

That led in the right direction, after a bit more fiddling I was finally
able to get something that does what I wanted: a single table UPSERT
implemented with this MERGE implementation. Here's a log of a test
session, suitable for eventual inclusion in the regression tests:

CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock ORDER BY item_id;

item_id | balance
---------+---------
10 | 2200
20 | 1900

MERGE INTO Stock t
USING (VALUES(10,100)) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;

MERGE 1

SELECT * FROM Stock ORDER BY item_id;
item_id | balance
---------+---------
10 | 2300
20 | 1900

MERGE INTO Stock t
USING (VALUES(30,2000)) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;

MERGE 1
SELECT * FROM Stock ORDER BY item_id;
item_id | balance
---------+---------
10 | 2300
20 | 1900
30 | 2000

I'm still a little uncertain as to whether any of my other examples
should have worked under the spec but just didn't work here, but I'll
worry about that later.

Here's what the query plan looks like on a MATCH:

Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166
rows=0 loops=1)
Action 1: Update When Matched
Action 2: Insert When Not Mactched
MainPlan:
-> Nested Loop Left Join (cost=0.00..8.29 rows=1 width=22) (actual
time=0.050..0.061 rows=1 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=8)
(actual time=0.009..0.010 rows=1 loops=1)
-> Index Scan using stock_item_id_key on stock t
(cost=0.00..8.27 rows=1 width=14) (actual time=0.026..0.030 rows=1 loops=1)
Index Cond: ("*VALUES*".column1 = item_id)
Total runtime: 0.370 ms

And here's a miss:

Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.145..0.145
rows=0 loops=1)
Action 1: Update When Matched
Action 2: Insert When Not Mactched
MainPlan:
-> Nested Loop Left Join (cost=0.00..8.29 rows=1 width=22) (actual
time=0.028..0.033 rows=1 loops=1)
-> Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=8)
(actual time=0.004..0.005 rows=1 loops=1)
-> Index Scan using stock_item_id_key on stock t
(cost=0.00..8.27 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
Index Cond: ("*VALUES*".column1 = item_id)
Total runtime: 0.255 ms

Next steps here:
1) Performance/concurrency tests against trigger-based UPSERT approach.
2) Finish bit rot cleanup against HEAD.
3) Work out more complicated test cases to try and fine more unexpected
behavior edge cases and general bugs.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2010-10-21 18:37:50 Re: Serializable snapshot isolation patch
Previous Message Alvaro Herrera 2010-10-21 18:32:53 Re: Bug in plpython's Python Generators