Re: ask for review of MERGE

From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: ask for review of MERGE
Date: 2010-09-23 10:31:22
Message-ID: AANLkTimO76gBRQ789Scs8GM9yMtDZS63QiyYrS4LsVdJ@mail.gmail.com
Lists: pgsql-hackers

Dear All,

I have just generated a new patch for the MERGE command.

One main change in this edition is the removal of the RAISE ERROR action
from MERGE, because its semantics are not well defined yet.

I also rewrote the regression test file merge.sql to make it consistent
with the examples used on my wiki page.

Some small bugs (errors and warnings) are also fixed.

In this patch, the main features of MERGE (sublinks, EXPLAIN, rules and
triggers, inheritance) are unchanged, and so is the DO NOTHING action.

I do hope the MERGE command can be added to PostgreSQL 9.1, and I wonder
what improvements should be made to the current edition.

Could you please review this patch, if you have the time and interest?

Your feedback will be highly appreciated.

Thanks

Yours, Boxuan

Attachment Content-Type Size
merge_v202.patch application/octet-stream 104.8 KB

From: Thom Brown <thom(at)linux(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-09-23 10:59:19
Message-ID: AANLkTikAD+B91bGP9KHjM6q-GeeuwGxo_hjSeVCfD0Zw@mail.gmail.com
Lists: pgsql-hackers

On 23 September 2010 11:31, Boxuan Zhai <bxzhai2010(at)gmail(dot)com> wrote:
> Dear All,
> I have just generated a new patch for the MERGE command.
> One main change in this edition is the removal of the RAISE ERROR action
> from MERGE, because its semantics are not well defined yet.
> I also rewrote the regression test file merge.sql to make it consistent
> with the examples used on my wiki page.
> Some small bugs (errors and warnings) are also fixed.
> In this patch, the main features of MERGE (sublinks, EXPLAIN, rules and
> triggers, inheritance) are unchanged, and so is the DO NOTHING action.
> I do hope the MERGE command can be added to PostgreSQL 9.1, and I wonder
> what improvements should be made to the current edition.
> Could you please review this patch, if you have the time and interest?
> Your feedback will be highly appreciated.
> Thanks
>
> Yours, Boxuan

A few corrections:

in src/backend/executor/nodeModifyTable.c:
s/excute/execute/
s/orignial/original/

in src/backend/optimizer/plan/planner.c
s/expreesions/expressions/
s/So,we/So, we/
s/comand/command/
s/fileds/fields/

in src/backend/optimizer/prep/preptlist.c:
s/aggresive/aggressive/

in src/backend/optimizer/util/var.c
s/targe/target/ -- this appears twice
s/sourse/source/

in src/backend/parser/analyze.c
s/takend/taken/
s/relaion/relation/
s/targe/target/ -- this appears twice
s/consitency/consistency/
s/commond/command/
s/seperate/separate/

in src/backend/rewrite/rewriteHandler.c
s/acton/action/

in src/include/nodes/execnodes.h
s/meger/merge/

in src/include/nodes/parsenodes.h
s/proecess/process/
s/allwo/allow/
s/elments/elements/

in src/test/regress/expected/merge.out
s/qulifications/qualifications/ -- this appears twice
s/suceeds/succeeds/ -- this appears twice

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935


From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-09-23 11:55:29
Message-ID: 4C9B4031.4030400@cs.helsinki.fi
Lists: pgsql-hackers

On 2010-09-23 1:31 PM +0300, Boxuan Zhai wrote:
> I have just generated a new patch for the MERGE command.

I haven't followed the discussion very closely, but this part in the
regression tests caught my attention:

+-- we now have a duplicate key in Buy, so when we join to
+-- Stock we will generate 2 matching rows, not one.
+-- According to standard this command should fail.
+-- But it suceeds in PostgreSQL implementation by simply ignoring the second

It doesn't seem like a very good idea to go against the standard here.
The "second" row is not well defined in this case so the results are
unpredictable.

The patch is also missing a (trivial) change to explain.c.

Regards,
Marko Tiikkaja


From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-09-23 13:48:57
Message-ID: AANLkTikgGZcntov1UuhD3MbHB3ZxvcvqCW6JORoeh4Pv@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 23, 2010 at 7:55 PM, Marko Tiikkaja <
marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi> wrote:

> On 2010-09-23 1:31 PM +0300, Boxuan Zhai wrote:
>
>> I have just generated a new patch for the MERGE command.
>>
>
> I haven't followed the discussion very closely, but this part in the
> regression tests caught my attention:
>
> +-- we now have a duplicate key in Buy, so when we join to
> +-- Stock we will generate 2 matching rows, not one.
> +-- According to standard this command should fail.
> +-- But it suceeds in PostgreSQL implementation by simply ignoring the second
>
> It doesn't seem like a very good idea to go against the standard here. The
> "second" row is not well defined in this case so the results are
> unpredictable.
>
>
Yes, the result is uncertain. It depends on which row is scanned first,
which is largely out of the users' control.

But in Postgres, this is what the system does for UPDATE.

For example, consider a simple update query like the following:

CREATE TABLE target (id int, val int);
INSERT INTO target VALUES (1, 10);

CREATE TABLE source (id int, add int);
INSERT INTO source VALUES (1, 100);
INSERT INTO source VALUES (1, 100000);

-- Do the update with a source table that has multiple matching rows

UPDATE target SET val = val + add FROM source
WHERE source.id = target.id;

t=# SELECT * FROM target;
id | val
----+-----
1 | 110
(1 row)

The target tuple has two matching source tuples, but it is updated only
once. And yet this query is not forbidden by Postgres; its result is also
uncertain.
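
One way to make the result deterministic, when that matters, is to
aggregate the source rows before joining; a sketch against the tables above:

-- Sum all matching source rows first, so each target row is updated
-- exactly once with a well-defined value.
UPDATE target SET val = val + s.total
FROM (SELECT id, sum(add) AS total FROM source GROUP BY id) AS s
WHERE s.id = target.id;

Starting from the fresh data above, this leaves target holding (1, 100110)
instead of an arbitrary one of (1, 110) or (1, 100010).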

> The patch is also missing a (trivial) change to explain.c.
>
>
Sorry, I messed up the files. Here is the new patch file, with the EXPLAIN
change in it.

>
> Regards,
> Marko Tiikkaja
>

Attachment Content-Type Size
merge_v203.patch application/octet-stream 108.5 KB

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-09-24 08:13:19
Message-ID: 4C9C5D9F.8000008@2ndquadrant.com
Lists: pgsql-hackers

Finding time for a review as large as this one is a bit tough, but I've
managed to set aside a couple of days for it over the next week. I'm
delivering a large project tonight and intend to start in on the review
work tomorrow once that's cleared up. If you're ever not sure who is
working on your patch and what state they feel it's in, check
https://commitfest.postgresql.org/action/commitfest_view?id=7 for an
update; that's where we keep track of all that information.

Did you ever end up keeping a current version of this patch in an
alternate repository location, such as github? I thought I saw a
suggestion from you about that, but after looking through the history
here all I see are the diff patches you've been sending to the list.
That's fine, just trying to confirm where everything is at.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-09-29 06:44:35
Message-ID: 4CA2E053.2090603@2ndquadrant.com
Lists: pgsql-hackers

Started looking at the latest MERGE patch from Boxuan here tonight. The
regression tests pass for me here, a good starting sign. I expect to move
on to trying to break it more creatively next, then on to performance
tests. Nothing much more exciting than that to report so far.

It had suffered some bit rot, I think because of the security label
changes. Attached is a rebased version against the new git HEAD so
nobody else has to duplicate that to apply the patch. Also, to provide
an alternate interface for anyone who wants to do testing/browsing of
this patch, I've made a Github fork with a merge branch in it. I plan to
commit intermediate stuff to there that keeps up to date with review
changes: http://github.com/greg2ndQuadrant/postgres/tree/merge

It's probably easier to read
http://github.com/greg2ndQuadrant/postgres/compare/merge than most local
patch viewers, so I gzipped the attached updated patch to save some bytes.

One compiler warning I noticed that needs to get resolved:

src/backend/commands/explain.c:

explain.c: In function ‘ExplainMergeActions’:
explain.c:1528: warning: comparison of distinct pointer types lacks a cast

That is complaining about this section:

if (mt_planstate->operation != CMD_MERGE ||
mt_planstate->mt_mergeActPstates == NIL)
return;

So presumably that comparison with NIL needs a cast. Once I get more
familiar with the code I'll fix that myself if Boxuan doesn't offer a
suggestion first.

The rest of the compiler warnings I saw didn't look related to his code,
maybe stuff my picky Ubuntu compiler is noticing that was done recently
to HEAD. I haven't checked HEAD without this patch yet to confirm, and
am done for the night now. Here's the list if anyone is interested:

Warning in src/backend/parser/scan.c:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I../../../src/include -D_GNU_SOURCE -c -o index.o index.c
-MMD -MP -MF .deps/index.Po
In file included from gram.y:12172:
scan.c: In function ‘yy_try_NUL_trans’:
scan.c:16256: warning: unused variable ‘yyg’

Warning in src/backend/utils/error/elog.c:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I../../../../src/include -D_GNU_SOURCE -c -o ts_cache.o
ts_cache.c -MMD -MP -MF .deps/ts_cache.Po
elog.c: In function ‘write_console’:
elog.c:1698: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result
elog.c: In function ‘write_pipe_chunks’:
elog.c:2388: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result
elog.c:2397: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result

Warning in src/bin/scripts/common.c:

gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
-fwrapv -g -I. -I. -I../../../src/interfaces/libpq
-I../../../src/bin/pg_dump -I../../../src/include -D_GNU_SOURCE -c -o
input.o input.c -MMD -MP -MF .deps/input.Po
common.c: In function ‘handle_sigint’:
common.c:247: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result
common.c:250: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result
common.c:251: warning: ignoring return value of ‘write’, declared with
attribute warn_unused_result

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
Author, "PostgreSQL 9.0 High Performance" Pre-ordering at:
https://www.packtpub.com/postgresql-9-0-high-performance/book

Attachment Content-Type Size
merge_v203-rebase.patch.gz application/x-gzip 27.3 KB

From: Josh Kupershmidt <schmiddy(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-09-29 13:13:31
Message-ID: AANLkTimHA7AXEu7y3K=1jhp3Vf8roQd9WLYU47=YaMoh@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

> The rest of the compiler warnings I saw didn't look related to his code,
> maybe stuff my picky Ubuntu compiler is noticing that was done recently to
> HEAD. I haven't checked HEAD without this patch yet to confirm, and am done
> for the night now. Here's the list if anyone is interested:
>
> Warning in src/backend/parser/scan.c:
>
> gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing -fwrapv -g
> -I../../../src/include -D_GNU_SOURCE -c -o index.o index.c -MMD -MP -MF
> .deps/index.Po
> In file included from gram.y:12172:
> scan.c: In function ‘yy_try_NUL_trans’:
> scan.c:16256: warning: unused variable ‘yyg’

Known problem: http://archives.postgresql.org/pgsql-hackers/2009-07/msg00657.php

I'm pretty sure I've seen the warn_unused_result warnings on HEAD as well.

Josh


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-09-29 13:16:38
Message-ID: AANLkTikpirN5GxpSgDwj=M4rv=oaq2n-iv1awXCQDYA6@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> One compiler warning I noticed that needs to get resolved:
>
> src/backend/commands/explain.c:
>
> explain.c: In function ‘ExplainMergeActions’:
> explain.c:1528: warning: comparison of distinct pointer types lacks a cast
>
> That is complaining about this section:
>
> if (mt_planstate->operation != CMD_MERGE ||
> mt_planstate->mt_mergeActPstates == NIL)
> return;
>
> So presumably that comparison with NIL needs a cast. Once I get more
> familiar with the code I'll fix that myself if Boxuan doesn't offer a
> suggestion first.

Possibly NULL was meant instead of NIL. NIL is specifically for a List.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-09-29 13:29:09
Message-ID: AANLkTiky7H-faSMPx-=nTxH6HoG=U3t8nQR3Ek=XC+Am@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 29, 2010 at 9:16 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> > One compiler warning I noticed that needs to get resolved:
> >
> > src/backend/commands/explain.c:
> >
> > explain.c: In function ‘ExplainMergeActions’:
> > explain.c:1528: warning: comparison of distinct pointer types lacks a
> cast
> >
> > That is complaining about this section:
> >
> > if (mt_planstate->operation != CMD_MERGE ||
> > mt_planstate->mt_mergeActPstates == NIL)
> > return;
> >
> > So presumably that comparison with NIL needs a cast. Once I get more
> > familiar with the code I'll fix that myself if Boxuan doesn't offer a
> > suggestion first.
>
> Possibly NULL was meant instead of NIL. NIL is specifically for a List.
>
>
Yes, it should be NULL instead of NIL.

At first, I designed this field as a List, but I changed it into an array
in the last editions. That is why there is a mismatched comparison here.
Sorry for that.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-14 01:27:28
Message-ID: AANLkTikYKcOrfgW3Lkf+fCx_xv5sU3VxfWaHUWw388dy@mail.gmail.com
Lists: pgsql-hackers

On Wed, Sep 29, 2010 at 2:44 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Started looking at the latest MERGE patch from Boxuan here tonight. The
> regression tests pass for me here, a good starting sign. I expect to move on
> to trying to break it more creatively next, then on to performance tests.
> Nothing much more exciting than that to report so far.

Greg, are you still working on a review of this patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-14 14:23:24
Message-ID: 4CB7125C.5070604@2ndquadrant.com
Lists: pgsql-hackers

Robert Haas wrote:
> Greg, are you still working on a review of this patch?
>

Yes, I just had more distractions while coming up to speed on this area
than I'd hoped. I'll get a second round of looking at this done by the
weekend.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-18 00:20:14
Message-ID: 4CBB92BE.2030703@2ndquadrant.com
Lists: pgsql-hackers

Boxuan Zhai wrote:
> Yes, it should be NULL instead of NIL.

OK, I applied that patch to my local copy and pushed it out to github:

http://github.com/greg2ndQuadrant/postgres/commit/9013ba9e81490e3623add1b029760817021297c0

That represents what I tested against. However, when I tried to merge
against HEAD once I was finished, I discovered this patch has bit-rotted
significantly. If you have a local copy that works for you, I would not
recommend pulling in the PostgreSQL repo updates done in the last couple
of weeks yet. Friday's "Allow WITH clauses to be attached to INSERT,
UPDATE, DELETE statements" commit in particular conflicts quite a bit
with your changes. Attached is a rebased patch that applies to HEAD now
after a round of fixes to resolve those. But it doesn't work yet,
because of recent changes to the ExecUpdate and ExecDelete functions
you're calling from src/backend/executor/nodeModifyTable.c inside
ExecMerge. If you can continue working on the patch without merging
recent repo work, I can hack away at fixing that once I figure out what
got changed there recently. It's taken some painful git work to sort
out what I've done so far, there's more left to do, and I know that's
not an area you specialize in.

Back to the feature review...I dove into how I expected this to work,
relative to what it actually does at the moment. That didn't really go
too well so far, but I don't know that this represents any fundamental
issue with the patch, just situations the existing code didn't anticipate
that we have to flush out. As a general community FYI here, while
it's taken me a while to get up to speed on this whole feature, I expect
to keep chugging away on this regardless of the CommitFest boundaries.
This feature is both too big and too important to just stop working on
it because a date has passed.

On to the test cases. The examples that Boxuan has been working with,
and that the regression tests included with the patch exercise, all
involve two tables being joined together using MERGE. The use case I
decided to test instead was when you're trying to simulate an UPSERT
where only a single table is involved. I couldn't get to this to work
correctly. Maybe I'm just using MERGE wrong here, but I tried so many
different variations without success (including one that's basically
copied from Simon's original regression test set suggestions) that I
suspect there may be a subtle problem with the implementation instead.

To replicate the most straightforward variations of what I ran into, you
can start with the same data that's populated by the regression test set:

CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock;

item_id | balance
---------+---------
10 | 2200
20 | 1900

If you now execute the following:

MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=10) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;

This works fine, and updates the matching row:

item_id | balance
---------+---------
20 | 1900
10 | 2201

But if I give it a key that doesn't exist instead:

MERGE INTO Stock t
USING (SELECT * FROM Stock WHERE item_id=30) AS s
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;

This doesn't execute the NOT MATCHED case and INSERT the way I expected
it to. It just gives back "MERGE 0".

Since I wasn't sure if the whole "subquery in the USING clause" case was
really implemented fully, I then tried to do this with something more
like the working regression test examples. I expected this to do the
same thing as the first example:

MERGE INTO Stock t
USING Stock s
ON s.item_id=10 AND s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (10,1)
;

But it gives back this:

ERROR: duplicate key value violates unique constraint "stock_item_id_key"
DETAIL: Key (item_id)=(10) already exists.

Can't tell from that whether it's hitting the MATCHED or NOT MATCHED
side of things to generate that. But it doesn't work any better if you
give it an example that doesn't exist:

MERGE INTO Stock t
USING Stock s
ON s.item_id=30 AND s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
WHEN NOT MATCHED THEN INSERT VALUES (30,1)
;

ERROR: duplicate key value violates unique constraint "stock_item_id_key"
DETAIL: Key (item_id)=(30) already exists.

Now that one is really weird, because that key value certainly doesn't
exist yet in the table. There seem to be a couple of issues in the area
of joining a table with itself here. Feel free to tell me I just don't
know how to use MERGE if that's the case in any of these.

The other thing I noticed that may take some work to sort out is that I
haven't had any luck getting MERGE to execute from within a plpgsql
function. I was hoping I could use this to update the pgbench tables:

CREATE OR REPLACE FUNCTION merge_account_balance(key INT, delta NUMERIC)
RETURNS VOID AS
$$
BEGIN
    MERGE INTO pgbench_accounts t USING (SELECT * FROM pgbench_accounts
        WHERE aid = key) AS s ON t.aid = s.aid
    WHEN MATCHED THEN UPDATE SET abalance = s.abalance + delta
    WHEN NOT MATCHED THEN INSERT VALUES (key, 1 + (key / 100000)::integer, delta, '');
END;
$$
LANGUAGE plpgsql;

But I just get this instead:

ERROR: "pgbench_accounts" is not a known variable
LINE 4: MERGE INTO pgbench_accounts t USING (SELECT * FROM p...

The other way I wrote the MERGE statement above (not using a subquery)
does the same thing. I know that error message is coming from the
changes introduced in
http://archives.postgresql.org/pgsql-committers/2010-01/msg00161.php but
I'm not familiar enough with the whole grammar implementation to know
what that means yet.
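
One workaround worth trying (untested here, and assuming the backend
accepts the MERGE syntax at execution time) is to route the statement
through EXECUTE, since plpgsql does not try to parse dynamic SQL when the
function is defined:

CREATE OR REPLACE FUNCTION merge_account_balance(key INT, delta NUMERIC)
RETURNS VOID AS
$$
BEGIN
    -- EXECUTE ... USING hands the statement to the main parser at run
    -- time, sidestepping plpgsql's own statement recognition.
    EXECUTE 'MERGE INTO pgbench_accounts t '
         || 'USING (VALUES ($1, $2)) AS s(aid, delta) ON t.aid = s.aid '
         || 'WHEN MATCHED THEN UPDATE SET abalance = t.abalance + s.delta '
         || 'WHEN NOT MATCHED THEN INSERT VALUES '
         || '(s.aid, 1 + (s.aid / 100000)::integer, s.delta, '''')'
        USING key, delta;
END;
$$
LANGUAGE plpgsql;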

That's what I found so far in my second pass over this. Once the first
problem here is sorted out, I've already worked out how to test the
performance of the code using pgbench. I have all the scripts ready to go
once the correct MERGE statement is plugged into them; I just ran into
this same class of problems when I tried them. So far I was only able
to see how fast the UPDATE path worked, though, which isn't very helpful
yet. My hope here is to test the MERGE implementation vs. the classic
pl/pgsql implementation of UPSERT, calling both within a function so
it's a fair comparison, and see how that goes. This may flush out
concurrency bugs in the MERGE code as well.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us

Attachment Content-Type Size
merge-v203-rebase.patch text/x-patch 108.8 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-18 13:54:41
Message-ID: AANLkTinB==t8cqZh5026gQ5==Y0ULQhoZkXr3SLbXJ+m@mail.gmail.com
Lists: pgsql-hackers

I think that MERGE is supposed to apply one action for each row in the
source data. So:

On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> MERGE INTO Stock t
>  USING (SELECT * FROM Stock WHERE item_id=10) AS s
>  ON s.item_id=t.item_id
>  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>  WHEN NOT MATCHED THEN INSERT VALUES (10,1)
>  ;
>
> This works fine, and updates the matching row:
>
> item_id | balance
> ---------+---------
>     20 |    1900
>     10 |    2201

Here you have one row of source data, and you got one action (the WHEN
MATCHED case).

> But if I give it a key that doesn't exist instead:
>
> MERGE INTO Stock t
>  USING (SELECT * FROM Stock WHERE item_id=30) AS s
>  ON s.item_id=t.item_id
>  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>  WHEN NOT MATCHED THEN INSERT VALUES (30,1)
>  ;
>
> This doesn't execute the NOT MATCHED case and INSERT the way I expected it
> to.  It just gives back "MERGE 0".

Here you have no rows of source data (the USING (SELECT ...) doesn't
return anything, since no rows exist) so nothing happens.

> Since I wasn't sure if the whole "subquery in the USING clause" case was
> really implemented fully, I then tried to do this with something more like
> the working regression test examples.  I expected this to do the same thing
> as the first example:
>
> MERGE INTO Stock t
>  USING Stock s
>  ON s.item_id=10 AND s.item_id=t.item_id
>  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>  WHEN NOT MATCHED THEN INSERT VALUES (10,1)
>  ;
>
> But it gives back this:
>
> ERROR:  duplicate key value violates unique constraint "stock_item_id_key"
> DETAIL:  Key (item_id)=(10) already exists.

Here you have two rows of source data. The ON clause represents the
join condition. The item_id=10 row matches - so you get an update,
presumably, though we can't see that as things turn out - and the
item_id=20 row doesn't match - so you try to insert (10, 1), which is
a duplicate key, thus the error.
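
To see those two outcomes directly, you can run the underlying join by
itself; a sketch against the same test data:

SELECT s.item_id AS source_id, t.item_id AS target_id
FROM Stock s
LEFT JOIN Stock t ON s.item_id=10 AND s.item_id=t.item_id;

This should return (10, 10) and (20, NULL): the first row drives the
UPDATE, and the NULL target side of the second row drives the INSERT of
(10, 1) that collides with the existing key.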

> Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side of
> things to generate that.  But it doesn't work any better if you give it an
> example that doesn't exist:
>
> MERGE INTO Stock t
>  USING Stock s
>  ON s.item_id=30 AND s.item_id=t.item_id
>  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>  WHEN NOT MATCHED THEN INSERT VALUES (30,1)
>  ;
>
> ERROR:  duplicate key value violates unique constraint "stock_item_id_key"
> DETAIL:  Key (item_id)=(30) already exists.

In this case neither row of the source data matches the join condition
(s.item_id=30 might as well be constant FALSE as far as the test data
is concerned) so you attempt to execute the NOT MATCHED side twice.
So this one also looks correct to me.

> The other thing I noticed that may take some work to sort out is that I
> haven't had any luck getting MERGE to execute from within a plpgsql
> function.  I was hoping I could use this to update the pgbench tables:

Good catch. Considering the size of this patch, I have no problem
leaving this to the eventual committer to fix, or to a subsequent
commit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-18 14:09:30
Message-ID: AANLkTikB+=d0Ns2sCc1H-qP_2fZDNJ_ZwA3MQoYNbySY@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I think that MERGE is supposed to apply one action for each row in the
> source data. So:
>
> On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> > MERGE INTO Stock t
> > USING (SELECT * FROM Stock WHERE item_id=10) AS s
> > ON s.item_id=t.item_id
> > WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> > WHEN NOT MATCHED THEN INSERT VALUES (10,1)
> > ;
> >
> > This works fine, and updates the matching row:
> >
> > item_id | balance
> > ---------+---------
> > 20 | 1900
> > 10 | 2201
>
> Here you have one row of source data, and you got one action (the WHEN
> MATCHED case).
>
> > But if I give it a key that doesn't exist instead:
> >
> > MERGE INTO Stock t
> > USING (SELECT * FROM Stock WHERE item_id=30) AS s
> > ON s.item_id=t.item_id
> > WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> > WHEN NOT MATCHED THEN INSERT VALUES (30,1)
> > ;
> >
> > This doesn't execute the NOT MATCHED case and INSERT the way I expected
> it
> > to. It just gives back "MERGE 0".
>
> Here you have no rows of source data (the USING (SELECT ...) doesn't
> return anything, since no rows exist) so nothing happens.
>
>
Yes. The MERGE process is based on a left join between the source table
and the target table. Since the source table here is empty, the join
produces no rows, and thus no MERGE action is taken.

But is that logically correct? I mean, should we insert some rows in the
above example rather than doing nothing?
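
For illustration, the driving join of the empty-source example reduces to
something like this (a sketch using the Stock table from upthread):

-- The subquery returns no rows, so the left join over it also produces
-- no rows, and MERGE has nothing to match or not-match against.
SELECT s.item_id, t.item_id
FROM (SELECT * FROM Stock WHERE item_id=30) AS s
LEFT JOIN Stock t ON s.item_id = t.item_id;
-- (0 rows)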

> > Since I wasn't sure if the whole "subquery in the USING clause" case was
> > really implemented fully, I then tried to do this with something more
> like
> > the working regression test examples. I expected this to do the same
> thing
> > as the first example:
> >
> > MERGE INTO Stock t
> > USING Stock s
> > ON s.item_id=10 AND s.item_id=t.item_id
> > WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> > WHEN NOT MATCHED THEN INSERT VALUES (10,1)
> > ;
> >
> > But it gives back this:
> >
> > ERROR: duplicate key value violates unique constraint
> "stock_item_id_key"
> > DETAIL: Key (item_id)=(10) already exists.
>
> Here you have two rows of source data. The ON clause represents the
> join condition. The item_id=10 row matches - so you get an update,
> presumably, though we can't see that as things turn out - and the
> item_id=20 row doesn't match - so you try to insert (10, 1), which is
> a duplicate key, thus the error.
>
> > Can't tell from that whether it's hitting the MATCHED or NOT MATCHED side
> of
> > things to generate that. But it doesn't work any better if you give it
> an
> > example that doesn't exist:
> >
> > MERGE INTO Stock t
> > USING Stock s
> > ON s.item_id=30 AND s.item_id=t.item_id
> > WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
> > WHEN NOT MATCHED THEN INSERT VALUES (30,1)
> > ;
> >
> > ERROR: duplicate key value violates unique constraint
> "stock_item_id_key"
> > DETAIL: Key (item_id)=(30) already exists.
>
> In this case neither row of the source data matches the join condition
> (s.item_id=30 might as well be constant FALSE as far as the test data
> is concerned) so you attempt to execute the NOT MATCHED side twice.
> So this one also looks correct to me.
>
>
Yes, that is what happened in the above two examples.

> > The other thing I noticed that may take some work to sort out is that I
> > haven't had any luck getting MERGE to execute from within a plpgsql
> > function. I was hoping I could use this to update the pgbench tables:
>
> Good catch. Considering the size of this patch, I have no problem
> leaving this to the eventual committer to fix, or to a subsequent
> commit.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-18 14:17:37
Message-ID: AANLkTi=psFqBqhi0=Hyk8gxgAA7A__=D6nc-qvJc0bDs@mail.gmail.com
Lists: pgsql-hackers

On Mon, Oct 18, 2010 at 10:09 AM, Boxuan Zhai <bxzhai2010(at)gmail(dot)com> wrote:
>
>
> On Mon, Oct 18, 2010 at 9:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> I think that MERGE is supposed to apply one action for each row in the
>> source data. So:
>>
>> On Sun, Oct 17, 2010 at 8:20 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
>> > MERGE INTO Stock t
>> >  USING (SELECT * FROM Stock WHERE item_id=10) AS s
>> >  ON s.item_id=t.item_id
>> >  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>> >  WHEN NOT MATCHED THEN INSERT VALUES (10,1)
>> >  ;
>> >
>> > This works fine, and updates the matching row:
>> >
>> > item_id | balance
>> > ---------+---------
>> >     20 |    1900
>> >     10 |    2201
>>
>> Here you have one row of source data, and you got one action (the WHEN
>> MATCHED case).
>>
>> > But if I give it a key that doesn't exist instead:
>> >
>> > MERGE INTO Stock t
>> >  USING (SELECT * FROM Stock WHERE item_id=30) AS s
>> >  ON s.item_id=t.item_id
>> >  WHEN MATCHED THEN UPDATE SET balance=s.balance + 1
>> >  WHEN NOT MATCHED THEN INSERT VALUES (30,1)
>> >  ;
>> >
>> > This doesn't execute the NOT MATCHED case and INSERT the way I expected
>> > it
>> > to.  It just gives back "MERGE 0".
>>
>> Here you have no rows of source data (the USING (SELECT ...) doesn't
>> return anything, since no rows exist) so nothing happens.
>>
>
> Yes. The MERGE process is based on a left join between the source table
> and the target table. Since the source table here is empty, the join
> produces no rows, and thus no MERGE action is taken.
> But is that logically correct? I mean, should we insert some rows in the
> above example rather than doing nothing?

I don't think so. I think the right way to write UPSERT is something
along the lines of:

MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
s.item_id = t.item_id ...

(untested)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-20 01:20:13
Message-ID: AANLkTikg1jSBAY+tQQi=xP1fBbwLgHms9HawsH6wZ0_M@mail.gmail.com
Lists: pgsql-hackers

Hi,

I considered the empty source table situation again. I think it is correct
to upsert nothing in this case.

Going back to the original logic of the MERGE command, its main purpose is
to add supplementary data from the source table into the target table.
Thus, an empty source table means no input data is available, so no upsert
is needed in the target table.

Regards,
Boxuan


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-21 18:36:17
Message-ID: 4CC08821.3020102@2ndquadrant.com
Lists: pgsql-hackers

Robert Haas wrote:
> I think the right way to write UPSERT is something
> along the lines of:
>
> MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
> s.item_id = t.item_id ...
>

That led in the right direction; after a bit more fiddling I was finally
able to get something that does what I wanted: a single-table UPSERT
implemented with this MERGE implementation. Here's a log of a test
session, suitable for eventual inclusion in the regression tests:

CREATE TABLE Stock(item_id int UNIQUE, balance int);
INSERT INTO Stock VALUES (10, 2200);
INSERT INTO Stock VALUES (20, 1900);
SELECT * FROM Stock ORDER BY item_id;

item_id | balance
---------+---------
10 | 2200
20 | 1900

MERGE INTO Stock t
USING (VALUES(10,100)) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;

MERGE 1

SELECT * FROM Stock ORDER BY item_id;
item_id | balance
---------+---------
10 | 2300
20 | 1900

MERGE INTO Stock t
USING (VALUES(30,2000)) AS s(item_id,balance)
ON s.item_id=t.item_id
WHEN MATCHED THEN UPDATE SET balance=t.balance + s.balance
WHEN NOT MATCHED THEN INSERT VALUES(s.item_id,s.balance)
;

MERGE 1
SELECT * FROM Stock ORDER BY item_id;
item_id | balance
---------+---------
10 | 2300
20 | 1900
30 | 2000

I'm still a little uncertain as to whether any of my other examples
should have worked under the spec but just didn't work here, but I'll
worry about that later.

Here's what the query plan looks like on a MATCH:

Merge  (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 rows=0 loops=1)
  Action 1: Update When Matched
  Action 2: Insert When Not Mactched
  MainPlan:
  ->  Nested Loop Left Join  (cost=0.00..8.29 rows=1 width=22) (actual time=0.050..0.061 rows=1 loops=1)
        ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=1)
        ->  Index Scan using stock_item_id_key on stock t  (cost=0.00..8.27 rows=1 width=14) (actual time=0.026..0.030 rows=1 loops=1)
              Index Cond: ("*VALUES*".column1 = item_id)
Total runtime: 0.370 ms

And here's a miss:

Merge  (cost=0.00..8.29 rows=1 width=22) (actual time=0.145..0.145 rows=0 loops=1)
  Action 1: Update When Matched
  Action 2: Insert When Not Mactched
  MainPlan:
  ->  Nested Loop Left Join  (cost=0.00..8.29 rows=1 width=22) (actual time=0.028..0.033 rows=1 loops=1)
        ->  Values Scan on "*VALUES*"  (cost=0.00..0.01 rows=1 width=8) (actual time=0.004..0.005 rows=1 loops=1)
        ->  Index Scan using stock_item_id_key on stock t  (cost=0.00..8.27 rows=1 width=14) (actual time=0.015..0.015 rows=0 loops=1)
              Index Cond: ("*VALUES*".column1 = item_id)
Total runtime: 0.255 ms

Next steps here:
1) Performance/concurrency tests against trigger-based UPSERT approach.
2) Finish bit rot cleanup against HEAD.
3) Work out more complicated test cases to try to find more
unexpected-behavior edge cases and general bugs.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-22 15:35:19
Message-ID: 4CC1AF37.9070704@kaltenbrunner.cc
Lists: pgsql-hackers

On 10/21/2010 08:36 PM, Greg Smith wrote:
> Robert Haas wrote:
>> I think the right way to write UPSERT is something
>> along the lines of:
>>
>> MERGE INTO Stock t USING (VALUES (10, 1)) s(item_id, balance) ON
>> s.item_id = t.item_id ...

[...]

> Here's what the query plan looks like on a MATCH:
>
> Merge (cost=0.00..8.29 rows=1 width=22) (actual time=0.166..0.166 rows=0 loops=1)
> Action 1: Update When Matched
> Action 2: Insert When Not Mactched

"Mactched"? - is this a c&p error or the actual output of EXPLAIN? :)

lg

Stefan


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To:
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-23 05:34:58
Message-ID: 4CC27402.2000907@2ndquadrant.com
Lists: pgsql-hackers

There are now two branches of MERGE code in review progress available.
http://github.com/greg2ndQuadrant/postgres/tree/merge-unstable has the
bit-rotted version that doesn't quite work against HEAD yet, while
http://github.com/greg2ndQuadrant/postgres/tree/merge is what I'm still
testing against until I get that sorted out.

Attached is a tar file containing an initial concurrent MERGE test
case. What it does is create a pgbench database with a scale of 1.
Then it runs a custom test that does an UPSERT using MERGE against
pgbench_accounts, telling pgbench the database scale is actually 2.
This means that approximately half of the statements executed will hit
the MATCHED side of the merge and update existing rows, while half will
hit the NOT MATCHED one and create new account records. I did some small
tests using pgbench's debug mode, where you see all the statements it
executes, and all the output looked correct. Successive test runs are
not deterministically perfect performance comparisons, but with enough
transactions I hope that averages out.
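
The custom statement is along these lines; a sketch of the test's shape
rather than the exact script in the tarball, with the ranges assuming
scale 2 (200000 accounts):

\setrandom aid 1 200000
\setrandom delta -5000 5000
MERGE INTO pgbench_accounts t
USING (VALUES (:aid, :delta)) AS s(aid, delta)
ON t.aid = s.aid
WHEN MATCHED THEN UPDATE SET abalance = t.abalance + s.delta
WHEN NOT MATCHED THEN INSERT VALUES
    (:aid, 1 + (:aid / 100000)::integer, :delta, '');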

For comparison's sake, there's an almost identical test case in there that
does the same thing using the pl/pgsql example function for UPSERT from
the PostgreSQL manual instead. You just run test-merge.sh
or test-function.sh and it runs the whole test, presuming you built and
installed pgbench and don't mind that it will blow away any database
named pgbench you already have.
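
For readers following along, the manual's function-based UPSERT pattern
looks roughly like this when adapted to pgbench_accounts (a sketch, not
the exact contents of test-function.sh):

CREATE OR REPLACE FUNCTION upsert_account(key INT, delta INT) RETURNS VOID AS
$$
BEGIN
    LOOP
        -- first try to update an existing account row
        UPDATE pgbench_accounts SET abalance = abalance + delta
            WHERE aid = key;
        IF found THEN
            RETURN;
        END IF;
        -- not there, so try to insert; a concurrent insert of the same
        -- aid raises unique_violation, and we loop back to retry
        BEGIN
            INSERT INTO pgbench_accounts (aid, bid, abalance, filler)
                VALUES (key, 1 + (key / 100000)::integer, delta, '');
            RETURN;
        EXCEPTION WHEN unique_violation THEN
            -- do nothing; loop around and try the UPDATE again
        END;
    END LOOP;
END;
$$
LANGUAGE plpgsql;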

Since the current MERGE implementation is known not to be optimized for
concurrency, my main goal here wasn't to see how fast it was. That
number is interesting though. With the sole postgresql.conf change of:

checkpoint_segments=32

With a debug/assertion build and 8 concurrent clients, I got 1607 TPS
of UPSERT out of the trigger approach @ 6MB/s of writes to disk, while
the MERGE one delivered 988 TPS @ 4.5MB/s of writes. Will explore this
more later.

This did seem to find a bug in the implementation after running for a while:

TRAP: FailedAssertion("!(epqstate->origslot != ((void *)0))", File:
"execMain.c", Line: 1732)

Line number there is relative to what you can see at
http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c
and that's a null pointer check at the beginning of
EvalPlanQualFetchRowMarks. Haven't explored why this happened or how
repeatable this is, but since Boxuan was looking for some bugs to chase
I wanted to deliver him one to chew on.

While the performance doesn't need to be great in V1, there needs to be
at least some minimal protection against concurrency issues before this
is commit quality. Will continue to shake this code out looking for
them now that I have some basic testing that works for doing so.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us

Attachment Content-Type Size
merge-test-v1.tar application/x-tar 10.0 KB

From: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-23 12:50:01
Message-ID: 4CC2D9F9.2080002@cs.helsinki.fi
Lists: pgsql-hackers

On 2010-10-23 8:34 AM +0300, Greg Smith wrote:
> While the performance doesn't need to be great in V1, there needs to be
> at least some minimal protection against concurrency issues before this
> is commit quality.

What's been bothering me is that so far there has not been an agreement
on whether we need to protect against concurrency issues or not. In
fact, there has been almost no discussion about the concurrency issues
which AIUI have been the biggest single reason we don't have MERGE
already. Right now, this patch fails in even the simplest scenario:

=# create table foo(a int unique);
NOTICE: CREATE TABLE / UNIQUE will create implicit index "foo_a_key"
for table "foo"
CREATE TABLE

T1=# begin;
BEGIN
T1=# merge into foo using (values (1)) s(i) on s.i = foo.a when matched
then update set a=a+1 when not matched then insert values (s.i);
MERGE 1

T2=# merge into foo using (values (1)) s(i) on s.i = foo.a when matched
then update set a=a+1 when not matched then insert values (s.i);
-- blocks

T1=# commit;
COMMIT

.. and T2 gets a unique constraint violation.

As far as I'm concerned, the reason people want MERGE is to avoid these
problems; the nicer syntax is just a bonus. Having to LOCK the target
table makes this feature a lot less useful, even though there are a few
use cases where locking the table would be OK.
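
The locking workaround mentioned above would look something like this; a
sketch only, relying on the fact that SHARE ROW EXCLUSIVE conflicts with
itself, so two sessions running it serialize:

BEGIN;
-- Keep other sessions from running the same MERGE concurrently.
LOCK TABLE foo IN SHARE ROW EXCLUSIVE MODE;
MERGE INTO foo USING (VALUES (1)) s(i) ON s.i = foo.a
    WHEN MATCHED THEN UPDATE SET a = a + 1
    WHEN NOT MATCHED THEN INSERT VALUES (s.i);
COMMIT;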

Regards,
Marko Tiikkaja


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-23 14:31:05
Message-ID: 4CC2F1A9.7050503@2ndquadrant.com
Lists: pgsql-hackers

Marko Tiikkaja wrote:
> What's been bothering me is that so far there has not been an
> agreement on whether we need to protect against concurrency issues or
> not. In fact, there has been almost no discussion about the
> concurrency issues which AIUI have been the biggest single reason we
> don't have MERGE already.

Sure there were, they just happened a long time ago. I tried to
summarize the previous discussion at
http://wiki.postgresql.org/wiki/SQL_MERGE with the following:

"PostgreSQL doesn't have a good way to lock access to a key value that
doesn't exist yet--what other databases call key range
locking...Improvements to the index implementation are needed to allow
this feature."

You can sift through the other archive links in there, the discussion
leading up to that conclusion is in there somewhere. And we had a
discussion of this at the last developer's meeting where this was
identified as a key issue to sort out before this was done; see the
table at the end of
http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting and note
the warning associated with this feature.

The deadlock on developing this feature has been that there's little
benefit to building key range locking on its own, or even a good way to
prove that it works, without a visible feature that uses it. It's much
easier to justify development time on if the rest of MERGE is known to
work, and just needs that plugged in to improve concurrency. And since
Boxuan appeared at the right time to do the syntax and implementation
part first as well, that's the order it's ended up happening in. We're
only making the concurrency part a second priority right now in order to
break the development into manageable chunks.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-23 16:12:44
Message-ID: 13452.1287850364@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> Marko Tiikkaja wrote:
>> What's been bothering me is that so far there has not been an
>> agreement on whether we need to protect against concurrency issues or
>> not. In fact, there has been almost no discussion about the
>> concurrency issues which AIUI have been the biggest single reason we
>> don't have MERGE already.

> Sure there were, they just happened a long time ago. I tried to
> summarize the previous discussion at
> http://wiki.postgresql.org/wiki/SQL_MERGE with the following:

> "PostgreSQL doesn't have a good way to lock access to a key value that
> doesn't exist yet--what other databases call key range
> locking...Improvements to the index implementation are needed to allow
> this feature."

This seems to be presuming there's a consensus that that's the way to
implement it, which is news to me. Why wouldn't we try the INSERT
first, and if we get a unique-key failure, do UPDATE instead? I am not
at all in agreement that we ought to build key range locking, nor that
it'd be particularly helpful for this problem even if we had it.
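
In plpgsql terms, that would be roughly the following inside a function
body (a sketch, reusing the Stock example from upthread):

BEGIN
    INSERT INTO Stock VALUES (30, 1);
EXCEPTION WHEN unique_violation THEN
    -- the key already exists, so someone got there first; update instead
    UPDATE Stock SET balance = balance + 1 WHERE item_id = 30;
END;

Note the fallback UPDATE can itself find nothing if the conflicting row is
deleted in the meantime, which is why the example in the manual wraps this
pattern in a retry loop.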

regards, tom lane


From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-23 20:30:10
Message-ID: AANLkTikssVJWqDQMJfhRo1FhEyfeNdwyCKVSMAuyS1Hc@mail.gmail.com
Lists: pgsql-hackers

>
>
> This did seem to find a bug in the implementation after running for a
> while:
>
> TRAP: FailedAssertion("!(epqstate->origslot != ((void *)0))", File:
> "execMain.c", Line: 1732)
>
> Line number there is relative to what you can see at
> http://github.com/greg2ndQuadrant/postgres/blob/merge/src/backend/executor/execMain.c
> and that's a null pointer check at the beginning of
> EvalPlanQualFetchRowMarks. Haven't explored why this happened or how
> repeatable this is, but since Boxuan was looking for some bugs to chase I
> wanted to deliver him one to chew on.
>
>
I am not sure how this bug arises. I will try to find out the reason for it.


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-23 22:59:13
Message-ID: AANLkTikp2EZkQhKMC0Bhn2B-HTnGzRkF=C6u2c5XL+87@mail.gmail.com
Lists: pgsql-hackers

On Sat, Oct 23, 2010 at 9:12 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> "PostgreSQL doesn't have a good way to lock access to a key value that
>> doesn't exist yet--what other databases call key range
>> locking...Improvements to the index implementation are needed to allow
>> this feature."
>
> This seems to be presuming there's a consensus that that's the way to
> implement it, which is news to me.  Why wouldn't we try the INSERT
> first, and if we get a unique-key failure, do UPDATE instead?  I am not
> at all in agreement that we ought to build key range locking, nor that
> it'd be particularly helpful for this problem even if we had it.

Except we *do* have range locking for the special case of inserting
equal values into a unique btree index. In that case the btree insert
locks the leaf page containing the conflicting key value --
effectively locking out anyone else from inserting the same key value.
That lock is held until the index entry pointing to the newly
inserted value is finished. That's what prevents two inserters from
inserting the same key value.

So the trick for MERGE on equality would be to refactor the btree api
so that you could find the matching leaf page and *keep* that page
locked. Then do an update against the conflicting row found there if
any without ever releasing that page lock. Someone else can come along
and delete the row (or have already deleted it) before the update
locks it but if they do you can insert your row without worrying about
anyone else inserting a conflicting entry since you're still holding a
lock on the page they'll need to insert the btree entry on and have
been holding it continuously for the whole operation.

I'm not sure how the new exclusion constraints stuff affects this. I
think they would work the same way except instead of holding a page
lock on the leaf page they would store the value being edited in the
in-memory array -- effectively another form of key range locking.

I think the blocker with MERGE has always been that the standard is
*so* general that it could apply to conditions that *aren't* covered
by a unique constraint or exclusion constraint. Personally, I'm fine
saying that those cases will perform poorly, locking the table, or
aren't allowed and print a warning explaining exactly what exclusion
constraint or btree unique index would be necessary to allow it. I
think virtually every case where people will want to use it will be on
a primary key anyways.

--
greg


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-23 23:03:36
Message-ID: 4CC369C8.8080104@agliodbs.com
Lists: pgsql-hackers


> So the trick for MERGE on equality would be to refactor the btree api
> so that you could find the matching leaf page and *keep* that page
> locked. Then do an update against the conflicting row found there if
> any without ever releasing that page lock. Someone else can come along
> and delete the row (or have already deleted it) before the update
> locks it but if they do you can insert your row without worrying about
> anyone else inserting a conflicting entry since you're still holding a
> lock on the page they'll need to insert the btree entry on and have
> been holding it continuously for the whole operation.

I think that such a lock would also be useful for improving the FK
deadlock issues we have.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-24 07:32:13
Message-ID: AANLkTik1vLn9kLfnPpKmQ-N+CGvR8xkEDiYB9fV3bDeu@mail.gmail.com
Lists: pgsql-hackers

On Sat, Oct 23, 2010 at 4:03 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> I think that such a lock would also be useful for improving the FK deadlock
> issues we have.

I don't see how. I think the problem you're referring to occurs when
different plans update rows in different orders and the resulting
locks on foreign key targets are taken in different orders. In which
case the problem isn't that we're unable to lock the resources --
they're locked using regular row locks -- but rather that there's
nothing controlling the ordering of the locks.

I don't think it would be acceptable to hold low-level btree page
locks across multiple independent row operations on different rows, and
I don't see how they would be any better than the row locks we have
now. Worse, the resulting deadlock would no longer be detectable (one
of the reasons it wouldn't be acceptable to hold the lock for so
long).

That does point out a problem with the logic I sketched. If you go to
do an update and find there's an uncommitted update pending you have
to wait on it. You can't do that while holding the index page lock. I
assume then you release it and wait on the uncommitted transaction and
when it's finished you start over doing the btree lookup and
reacquiring the lock. I haven't thought it through in detail though.

--
greg


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-24 09:50:09
Message-ID: 20101024095009.GA4695@svana.org
Lists: pgsql-hackers

On Sat, Oct 23, 2010 at 03:59:13PM -0700, Greg Stark wrote:
> I think the blocker with MERGE has always been that the standard is
> *so* general that it could apply to conditions that *aren't* covered
> by a unique constraint or exclusion constraint. Personally, I'm fine
> saying that those cases will perform poorly, locking the table, or
> aren't allowed and print a warning explaining exactly what exclusion
> constraint or btree unique index would be necessary to allow it. I
> think virtually every case where people will want to use it will be on
> a primary key anyways.

Can we please not get MERGE hung up on trying to make it atomic. The
standard requires no such thing, and there are plenty of uses of MERGE
that don't involve high-concurrency updates of the same row by
different processes. If we want an atomic UPSERT, make that a separate
command; MERGE without atomicity is useful enough already.

Simply, if the row is visible in the current snapshot you do an update,
otherwise an insert. If you get an error, raise it. Icing can be added
later.
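
A minimal non-atomic sketch of that rule, assuming a target table
t(k, v) (names mine, not from the patch):

update t set v = 'new' where k = 1;
insert into t (k, v)
    select 1, 'new'
    where not exists (select 1 from t where k = 1);

Under concurrency the INSERT can still fail with a duplicate-key error,
which, per the above, is simply raised to the user.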

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-24 12:05:29
Message-ID: AANLkTimZk5ywU7QbfWrPyTaaqpYvbnmXogHJuOwiq6gE@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 24, 2010 at 5:50 AM, Martijn van Oosterhout
<kleptog(at)svana(dot)org> wrote:
> On Sat, Oct 23, 2010 at 03:59:13PM -0700, Greg Stark wrote:
>> I think the blocker with MERGE has always been that the standard is
>> *so* general that it could apply to conditions that *aren't* covered
>> by a unique constraint or exclusion constraint. Personally, I'm fine
>> saying that those cases will perform poorly, locking the table, or
>> aren't allowed and print a warning explaining exactly what exclusion
>> constraint or btree unique index would be necessary to allow it. I
>> think virtually every case where people will want to use it will be on
>> a primary key anyways.
>
> Can we please not get MERGE hung up on trying to make it atomic. The
> standard requires no such thing, and there are plenty of uses of MERGE
> that don't involve high-concurrency updates of the same row by
> different processes. If we want an atomic UPSERT, make that a separate
> command; MERGE without atomicity is useful enough already.
>
> Simply, if the row is visible in the current snapshot you do an update,
> otherwise an insert. If you get an error, raise it. Icing can be added
> later.

Yeah, I couldn't have said it better. It's an annoying implementation
restriction but only that. I think it's a perfectly acceptable
restriction for an initial commit. It's no secret that our process
has trouble absorbing large patches.

I am also wondering exactly what the semantics are supposed to be
under concurrency. We can't assume that it's OK to treat WHEN NOT
MATCHED as WHEN MATCHED if a unique-key violation is encountered. The
join condition could be something else entirely. I guess we could
scan the target table with a dirty snapshot and block on any in-doubt
tuples, similar to what EXCLUDE constraints do internally, but
throwing MVCC out the window doesn't seem right either.
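
For reference, exclusion constraints already do that dance internally:
the insert scans with a dirty snapshot and blocks on in-doubt tuples. A
minimal example of the 9.0 feature (table and column names mine):

create table circles (
    c circle,
    exclude using gist (c with &&)
);

If two sessions insert overlapping circles concurrently, the second
blocks on the first's in-doubt tuple and then either errors out or
proceeds, depending on whether the first commits or aborts.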

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-24 16:15:54
Message-ID: 4CC45BBA.9080400@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> I am also wondering exactly what the semantics are supposed to be
> under concurrency. We can't assume that it's OK to treat WHEN NOT
> MATCHED as WHEN MATCHED if a unique-key violation is encountered. The
> join condition could be something else entirely. I guess we could
> scan the target table with a dirty snapshot and block on any in-doubt
> tuples, similar to what EXCLUDE constraints do internally, but
> throwing MVCC out the window doesn't seem right either.
>

You've answered Tom's question of why not to just INSERT and trap the
error here. Presuming that a unique key violation is what will kick back
in this situation, and then just following the other branch, doesn't
cover all the ways this can get used.

I'm now thinking of this problem as being like the one that arises when
a concurrent UPDATE happens. To quote from the docs here:

"a target row might have already been updated (or deleted or locked) by
another concurrent transaction by the time it is found. In this case,
the would-be updater will wait for the first updating transaction to
commit or roll back (if it is still in progress). If the first updater
rolls back, then its effects are negated and the second updater can
proceed with updating the originally found row. If the first updater
commits, the second updater will ignore the row if the first updater
deleted it, otherwise it will attempt to apply its operation to the
updated version of the row. The search condition of the command (the
WHERE clause) is re-evaluated to see if the updated version of the row
still matches the search condition. If so, the second updater proceeds
with its operation using the updated version of the row."

What I think we can do with adding a key reservation is apply the same
sort of logic to INSERTs too, given a way to lock an index entry before
the row itself is complete. If MERGE hits a WHERE condition that finds
a key lock entry in the index, it has to sit and wait for that other
command to finish. There isn't any other sensible behavior I can see in
that situation, just like there isn't one for UPDATE right now.

If the transaction that was locking the index entry commits, it has to
start over with re-evaluating the WHEN MATCHED part (the equivalent of
the WHERE in the UPDATE case). But it shouldn't need to do a new JOIN
again, because that condition was proven to be met already by the lock
squatting on that index key, without even having the rest of the row
there yet to check its data. If the other entry commits, it would then
follow the WHEN MATCHED path and in the UPSERT case do an UPDATE. If
the transaction that had the key reservation rolls back, the
re-evaluation of the MERGE WHEN MATCHED will fail, at which point it
follows the WHEN NOT MATCHED INSERT path. As it will now be the locker
of the key reservation it had been waiting on earlier, it will be free
to do the INSERT with no concerns about race conditions or retries. In
the SERIALIZABLE case, again just try to follow the same slightly
different rules that exist for concurrent UPDATE commands.

There may be a tricky point here: we will need to warn people that
UPSERT implementations must reference only the columns of the primary
key in the MERGE conditions for this to work as expected, because
otherwise they might get back a unique key violation error anyway. I
don't know how other databases deal with that particular
problem. I have considered following the somewhat distasteful path of
installing SQL Server or Oracle here just to see how they handle some of
the pathological test cases I'm now thinking about for this command.

This particular weak spot in MVCC is already there for UPDATE, and this
approach seems to me to be the most logical way to extend the same basic
implementation to also eliminate some concurrent MERGE race conditions,
including the most common one the UPSERT case is stuck with. But I have
no intention of halting work on the basic MERGE to wait for this problem
to be solved even if I'm completely wrong about what I've outlined
above. That style of thinking--don't even start until a perfect
solution for every possible problem is known--is something I consider a
problem in this community's development model, one that has contributed
to why no work has been done on this feature until now. This one splits
nicely into "get the implementation working" and "improve the
concurrency" parts, and I haven't heard a good reason yet why those need
to be coupled together.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-24 17:43:35
Message-ID: AANLkTineR-rDFWENeddLg=GrkT+epMHk2j9X0YqpiTY8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 24, 2010 at 12:15 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Robert Haas wrote:
>> I am also wondering exactly what the semantics are supposed to be
>> under concurrency.  We can't assume that it's OK to treat WHEN NOT
>> MATCHED as WHEN MATCHED if a unique-key violation is encountered.  The
>> join condition could be something else entirely.  I guess we could
>> scan the target table with a dirty snapshot and block on any in-doubt
>> tuples, similar to what EXCLUDE constraints do internally, but
>> throwing MVCC out the window doesn't seem right either.
>
> [discussion of EPQ behavior for UPDATE statements]
>
> What I think we can do with adding a key reservation is apply the same sort
> of logic to INSERTs too, given a way to lock an index entry before the row
> itself is complete.  If MERGE hits a WHERE condition that finds a key lock
> entry in the index, it has to sit and wait for that other command to finish.
>  There isn't any other sensible behavior I can see in that situation, just
> like there isn't one for UPDATE right now.

Well, there's no guarantee that any index at all exists on the target
table, so any solution based on index locks can't be fully general.

But let's back up and talk about MVCC for a minute. Suppose we have
three source tuples, (1), (2), and (3); and the target table contains
tuples (1) and (2), of which only (1) is visible to our MVCC snapshot;
suppose also an equijoin. Clearly, source tuple (1) should fire the
MATCHED rule and source tuple (3) should fire the NOT MATCHED rule,
but what in the world should source tuple (2) do? AFAICS, the only
sensible behavior is to throw a serialization error, because no matter
what you do the results won't be equivalent to a serial execution of
the transaction that committed target tuple (2) and the transaction
that contains the MERGE.
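
Spelled out as a sketch (table and column names mine):

create table target (x integer);
create table source (x integer);
insert into target values (1);             -- visible to our snapshot
insert into source values (1), (2), (3);
-- another transaction inserts (2) into target and commits after our
-- snapshot is taken, e.g. under REPEATABLE READ:
--   insert into target values (2);

With an equijoin on x, source (1) is MATCHED and (3) is NOT MATCHED,
but (2) joins to a committed row our snapshot cannot see.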

Even with predicate locks, it's not obvious to me how to solve this
problem. Target tuple (2) may already be there, and its transaction
already committed, by the time the MERGE statement gets around to
looking at the source data. In fact, even taking an
AccessExclusiveLock on the target table doesn't fix this, because the
snapshot would be taken before the lock.

> been done on this feature until now.  This one splits nicely into "get the
> implementation working" and "improve the concurrency" parts, and I haven't
> heard a good reason yet why those need to be coupled together.

Sounds like we're all on the same page.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-24 21:39:12
Message-ID: 4CC4A780.10301@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> Well, there's no guarantee that any index at all exists on the target
> table, so any solution based on index locks can't be fully general.
>

Sure, but in the most common use case I think we're targeting at first,
no indexes means there are also no unique constraints either. I don't
think people can reasonably expect MERGE or anything else to
guarantee they won't accidentally create two rows that conflict via the
terms of some non-database-enforced key.

I am too brain-dead this particular Sunday to think of exactly how to
deal with the 3-tuple case you described, except to say that I think
it's OK for complicated situations to give up and throw a serialization
error. I'm already collecting a list of pathological tests and will try
to add something based on your problem case, then see what we can do
with it later.

--
Greg Smith, 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-24 23:11:51
Message-ID: AANLkTimyoPqJ58Nz=aCvj2GAJURLVexU4LEwqytD61wQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 24, 2010 at 2:50 AM, Martijn van Oosterhout
<kleptog(at)svana(dot)org> wrote:
> Can we please not get MERGE hung up on trying to make it atomic. The
> standard requires no such thing, and there are plenty of uses of MERGE
> that don't involve high-concurrency updates of the same row by
> different processes. If we want an atomic UPSERT, make that a separate
> command; MERGE without atomicity is useful enough already.

Really? I don't really see any point in a non-atomic MERGE. Nor in a
non-atomic UPDATE or INSERT or any other operation. The A in ACID is
as important as any other letter.

For that matter, if you don't care about atomicity then this is a
problem already easily solvable in the application. Why bother
providing a special command for it? The whole reason to want a
special command is precisely because the database can implement it
atomically more easily and more efficiently than any application
implementation.

Moreover having a case which is non-atomic and allows inconsistent
results or errors in the face of concurrent updates is a foot-gun.
Someone will come along and try to use it and it will appear to work
in their application but introduce nasty hidden race conditions.

--
greg


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-24 23:16:56
Message-ID: AANLkTi=kmFsD5zdN6FFwXv-i6zPcj=o8SK1Vq07kz+5Z@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 24, 2010 at 2:39 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Sure, but in the most common use case I think we're targeting at first, no
> indexes means there are also no unique constraints either.  I don't think
> people can reasonably expect MERGE or anything else to guarantee they
> won't accidentally create two rows that conflict via the terms of some
> non-database-enforced key.

I'm fine with either a) requiring the merge condition to match exactly
either an exclusion constraint or a unique index, throwing an error
otherwise, or b) locking the table if you perform a merge against a
table on a non-indexed condition, like foreign keys do if you're
missing the relevant index.

Either way I expect this to be a rare use case where users are either
doing data loads, in which case locking the table against concurrent
updates is fine, or they will immediately realize the error of their
ways and create the corresponding unique index or exclusion
constraint.

Other than bulk data loads I don't see any useful use case that
doesn't have the corresponding exclusion constraint or unique index as
a hard requirement anyways. It'll be necessary for both correctness
and performance.

--
greg


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-25 03:19:26
Message-ID: AANLkTikkdCbRkG3W58sg7EdNUFbhBc_N4oHVZikiDXE-@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 24, 2010 at 7:11 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Sun, Oct 24, 2010 at 2:50 AM, Martijn van Oosterhout
> <kleptog(at)svana(dot)org> wrote:
>> Can we please not get MERGE hung up on trying to make it atomic. The
>> standard requires no such thing, and there are plenty of uses of MERGE
>> that don't involve high-concurrency updates of the same row by
>> different processes. If we want an atomic UPSERT, make that a separate
>> command; MERGE without atomicity is useful enough already.
>
> Really? I don't really see any point in a non-atomic MERGE. Nor in a
> non-atomic UPDATE or INSERT or any other operation. The A in ACID is
> as important as any other letter.

I think we're really discussing the "I" in ACID, not the "A". There's
no question that the MERGE transaction will either commit or fail.
What we're talking about is what happens when there are concurrent
table modifications in progress; and the answer is that you might get
serialization anomalies. But we have serialization anomalies without
MERGE, too - see the discussions around Kevin Grittner's SSI patch
which, come to think of it, might be useful for this case, too.

I posted an example upthread which I think demonstrates very clearly
that MERGE will result in unavoidable serialization failures - so if
the standard is that we mustn't have any serialization failures then
the standard can never be met. The best we can hope for is that we'll
detect them and roll back if they occur, rather than going on to
commit but perhaps with some anomaly. And I'm pretty sure that's what
KG's SSI patch is going to give us. So I'm not sure there's really
anything to get worked up about here in terms of concurrency issues.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-25 17:42:35
Message-ID: AANLkTimyX=bpfU1_1VaTBszGF5AEN3pZA38BWoULpdwM@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Oct 24, 2010 at 10:43 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> But let's back up and talk about MVCC for a minute.  Suppose we have
> three source tuples, (1), (2), and (3); and the target table contains
> tuples (1) and (2), of which only (1) is visible to our MVCC snapshot;
> suppose also an equijoin.  Clearly, source tuple (1) should fire the
> MATCHED rule and source tuple (3) should fire the NOT MATCHED rule,
> but what in the world should source tuple (2) do?  AFAICS, the only
> sensible behavior is to throw a serialization error, because no matter
> what you do the results won't be equivalent to a serial execution of
> the transaction that committed target tuple (2) and the transaction
> that contains the MERGE.

So the behaviour we get with UPDATE in this situation is that we
update (2), so I would expect this to execute the MATCHED rule. The key
distinction is that, since we're not returning data that the user sees,
we want to update the most recent version, and it's "almost" as if we
ran "after" all the other transactions. It's not really serializable,
and I think in serializable mode we throw a serialization failure
instead, but in most usage patterns it's precisely what the user wants.

Here "bbb" contained two records when we began, with values "1" and "2",
but the "2" was inserted in a transaction which hadn't committed yet.
It committed after the update.

postgres=> begin;
BEGIN
postgres=> select * from bbb;
i
---
1
(1 row)

postgres=> update bbb set i = i+1;
UPDATE 2
postgres=> commit;
COMMIT
postgres=> select * from bbb;
i
---
2
3
(2 rows)

--
greg


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Greg Stark" <gsstark(at)mit(dot)edu>
Cc: "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Marko Tiikkaja" <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, "Boxuan Zhai" <bxzhai2010(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-25 18:07:52
Message-ID: 4CC581280200002500036DC5@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> But let's back up and talk about MVCC for a minute. Suppose we
>> have three source tuples, (1), (2), and (3); and the target table
>> contains tuples (1) and (2), of which only (1) is visible to our
>> MVCC snapshot; suppose also an equijoin. Clearly, source tuple
>> (1) should fire the MATCHED rule and source tuple (3) should fire
>> the NOT MATCHED rule, but what in the world should source tuple
>> (2) do? AFAICS, the only sensible behavior is to throw a
>> serialization error, because no matter what you do the results
>> won't be equivalent to a serial execution of the transaction that
>> committed target tuple (2) and the transaction that contains the
>> MERGE.
>
> So the behaviour we get with UPDATE in this situation is that we
> update (2) so I would expect this to execute the MATCHED rule.

Certainly that would be consistent with the behavior of READ
COMMITTED -- wait for commit or rollback of the concurrent
transaction, and then proceed with whatever data is there after
completion of the other transaction. With REPEATABLE READ or
SERIALIZABLE you would block until commit of the other transaction
and terminate with a write conflict -- a form of serialization
failure. If the other transaction rolls back you INSERT.

At least, that would be the least surprising behavior to me.

-Kevin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ask for review of MERGE
Date: 2010-10-25 18:09:54
Message-ID: AANLkTinbxfdaSYQan1fhT0UOKGD20z4xN6XGrc7sOAYu@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 1:42 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Sun, Oct 24, 2010 at 10:43 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> But let's back up and talk about MVCC for a minute.  Suppose we have
>> three source tuples, (1), (2), and (3); and the target table contains
>> tuples (1) and (2), of which only (1) is visible to our MVCC snapshot;
>> suppose also an equijoin.  Clearly, source tuple (1) should fire the
>> MATCHED rule and source tuple (3) should fire the NOT MATCHED rule,
>> but what in the world should source tuple (2) do?  AFAICS, the only
>> sensible behavior is to throw a serialization error, because no matter
>> what you do the results won't be equivalent to a serial execution of
>> the transaction that committed target tuple (2) and the transaction
>> that contains the MERGE.
>
> So the behaviour we get with UPDATE in this situation is that we
> update (2) so I would expect this to execute the MATCHED rule.

Not exactly. Consider this example:

rhaas=# create table concurrent (x integer primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"concurrent_pkey" for table "concurrent"
CREATE TABLE
rhaas=# insert into concurrent values (1);
rhaas=# begin;
BEGIN
rhaas=# insert into concurrent values (2);
INSERT 0 1

<switch to a different window>

rhaas=# update concurrent set x=x where x=2;
UPDATE 0

> The key
> distinction is that, since we're not returning data that the user sees,
> we want to update the most recent version, and it's "almost" as if we
> ran "after" all the other transactions. It's not really serializable,
> and I think in serializable mode we throw a serialization failure
> instead, but in most usage patterns it's precisely what the user wants.

I think it would be perfectly reasonable to have a transaction
isolation level that does not use a snapshot at all and instead runs
everything relative to SnapshotNow, and people could use it with MERGE
if they were so inclined. I think this would correspond more or less
to the READ COMMITTED isolation level specified in the standard; what
we now call READ COMMITTED is actually better than READ COMMITTED but
not quite as good as REPEATABLE READ. That, combined with an
exclusive lock on the table (or, perhaps, some kind of predicate
locking mechanism) would be sufficient to prevent serialization
anomalies.
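
The table-lock half of that is already expressible today, at the cost
of concurrency. A sketch under READ COMMITTED, where each statement
takes a fresh snapshot after the lock is acquired (table t(k, v) is
mine):

begin;
lock table t in share row exclusive mode;  -- keeps out all other writers
update t set v = 'new' where k = 1;
insert into t (k, v)
    select 1, 'new'
    where not exists (select 1 from t where k = 1);
commit;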

However, I don't think that implementing those semantics for just this
one command (or part of it) makes a whole lot of sense. The EPQ
behavior of our current default isolation level is really pretty
strange, and adding a random wart that the target table (but not the
source table) in a MERGE query gets scanned using SnapshotNow would be
one more piece of strangeness atop the strangeness we already have.
And, as we just saw with the enum stuff, SnapshotNow can lead to some
awfully strange behavior - you could end up processing half of the
data from a concurrent transaction and missing the other half.

> Here "bbb" contained two records when we began with values "1" and "2"
> but the "2" was inserted in a transaction which hadn't committed yet.
> It commited after the update.
>
> postgres=> begin;
> BEGIN
> postgres=> select * from bbb;
>  i
> ---
>  1
> (1 row)
>
> postgres=> update bbb set i = i+1;
> UPDATE 2
> postgres=> commit;
> COMMIT
> postgres=> select * from bbb;
>  i
> ---
>  2
>  3
> (2 rows)

Well, at least on my system, if the transaction inserting (2) hasn't
committed yet, that UPDATE statement will block until it does, because
trying to change i from 1 to 2 causes the update of the unique index
to block, since there's an in-doubt tuple with (2) already. Then it
will continue on as you've shown here, due to EPQ. But if you do the
same statement with i = i + 10 instead of + 1, then it doesn't block,
and only updates the one row that's visible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Marko Tiikkaja" <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, "Boxuan Zhai" <bxzhai2010(at)gmail(dot)com>, "Greg Stark" <gsstark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-25 19:15:14
Message-ID: 4CC590F20200002500036DCF@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> rhaas=# create table concurrent (x integer primary key);
> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
> "concurrent_pkey" for table "concurrent"
> CREATE TABLE
> rhaas=# insert into concurrent values (1);
> rhaas=# begin;
> BEGIN
> rhaas=# insert into concurrent values (2);
> INSERT 0 1
>
> <switch to a different window>
>
> rhaas=# update concurrent set x=x where x=2;
> UPDATE 0

That surprised me. I would have thought that the INSERT would have
created an "in doubt" tuple which would block the UPDATE. What is
the reason for not doing so?

FWIW I did a quick test and REPEATABLE READ also lets this pass but
with the SSI patch SERIALIZABLE seems to cover this correctly,
generating a serialization failure where such access is involved in
write skew:

test=# create table concurrent (x integer primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"concurrent_pkey" for table "concurrent"
CREATE TABLE
test=# insert into concurrent select generate_series(1, 20000);
INSERT 0 20000
test=# begin isolation level serializable;
BEGIN
test=# insert into concurrent values (0);
INSERT 0 1
test=# update concurrent set x = 30001 where x = 30000;
UPDATE 0

<different session>

test=# begin isolation level serializable;
BEGIN
test=# insert into concurrent values (30000);
INSERT 0 1
test=# update concurrent set x = -1 where x = 0;
UPDATE 0
test=# commit;
ERROR: could not serialize access due to read/write dependencies
among transactions
HINT: The transaction might succeed if retried.

I'll need to add a test to cover this, because it might have broken
with one of the optimizations on my list had you not pointed out this
behavior.

On the other hand:

<session 1>

test=# drop table concurrent ;
DROP TABLE
test=# create table concurrent (x integer primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"concurrent_pkey" for table "concurrent"
CREATE TABLE
test=# insert into concurrent select generate_series(1, 20000);
INSERT 0 20000
test=# begin isolation level serializable;
BEGIN
test=# insert into concurrent values (0);
INSERT 0 1

<session 2>

test=# begin isolation level serializable;
BEGIN
test=# select * from concurrent where x = 0;
x
---
(0 rows)

test=# insert into concurrent values (0);
<blocks>

<session 1>

test=# commit;
COMMIT

<session 2>

ERROR: duplicate key value violates unique constraint
"concurrent_pkey"
DETAIL: Key (x)=(0) already exists.

Anyway, I thought this might be of interest in terms of the MERGE
patch concurrency issues, since the SSI patch has been mentioned.

-Kevin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-25 19:40:07
Message-ID: AANLkTikF4JH9=xu51XLBeWDPqXYpRLn1YFt6gvQBVCqf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 3:15 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> rhaas=# create table concurrent (x integer primary key);
>> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
>> "concurrent_pkey" for table "concurrent"
>> CREATE TABLE
>> rhaas=# insert into concurrent values (1);
>> rhaas=# begin;
>> BEGIN
>> rhaas=# insert into concurrent values (2);
>> INSERT 0 1
>>
>> <switch to a different window>
>>
>> rhaas=# update concurrent set x=x where x=2;
>> UPDATE 0
>
> That surprised me.  I would have thought that the INSERT would have
> created an "in doubt" tuple which would block the UPDATE.  What is
> the reason for not doing so?

This is just standard MVCC - readers don't block writers, nor writers
readers. You might also think about what would happen if the UPDATE
were run before the INSERT of (2). There's no serialization anomaly
here, because either concurrent case is equivalent to the serial
schedule where the update precedes the insert.

In the case of a MERGE that matches a just-inserted invisible tuple
but no visible tuple, things are a bit stickier. Let's suppose we're
trying to use MERGE to get UPSERT semantics. If the MERGE command has
the obvious behavior of ignoring the invisible tuple just as UPDATE or
DELETE would do, then clearly any equivalent serial schedule must run
the MERGE before the INSERT (because if it were run after the INSERT,
it would fire the MATCHED action rather than the NOT MATCHED action).
But if the merge were run before the INSERT, then the INSERT would
have failed with a unique key violation; instead, the merge fails with
a unique key violation. On the other hand, if the MERGE sees the
invisible tuple, essentially using SnapshotNow semantics, as Greg
Stark proposed, you get a different (and probably worse) class of
serialization anomalies. For example, suppose the table has rows
1-100 and you do an update adding 1000 to each value concurrently with
merging in the values 51-100. You might get something like this:

- MERGE scans rows 1-75, firing MATCHED for rows 51-75.
- UPDATE commits.
- MERGE scans rows 76-100, firing NOT MATCHED for each.

Now, as Greg says, that might be what some people want, but it's
certainly monumentally unserializable. In a serial execution
schedule, the MERGE will either run before the UPDATE, in which case
MATCHED will fire for rows 51-100, or else the UPDATE will run before
the MERGE, in which case NOT MATCHED will fire for rows 51-100. No
serial schedule is going to fire MATCHED for some rows and NOT MATCHED
for others.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-25 20:10:48
Message-ID: AANLkTi=a1+AWn0wpmWXNLHWt1AsmU+q3pajPuSdN4SfF@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Now, as Greg says, that might be what some people want, but it's
> certainly monumentally unserializable.

To be clear, when I said it's what people want, what I meant was that
in the common cases it's doing exactly what people want, as opposed to
getting closer to what people want in general but not quite hitting
the mark in the common cases.

Just as an example I think it's important that in the simplest case,
upsert of a single record, it be 100% guaranteed to do the naive
upsert. If two users are doing the merge of a single key at the same
time one of them had better insert and one of them had better update
or else users are going to be monumentally surprised.

I guess I hadn't considered all the cases, and I agree it's important
that our behaviour make some kind of sense and be consistent with how
we handle updates of existing in-doubt tuples. I wasn't trying to
introduce a whole new mode of operation, just to work by analogy from
the way UPDATE works. It's clear that even with our existing semantics
there are strange corner cases once you get to multiple updates
happening in a single transaction. But we get the simple cases right,
and even in the more complex cases, while it's not truly serializable,
we should be able to come up with some basic smell tests that we pass.

My understanding is that currently we generally treat DML in one of
two ways depending on whether it's returning data to the user or
updating data in the table (including SELECT FOR SHARE). If it's
returning data to the user we use a snapshot to give the user a
consistent view of the database. If it's altering data in the database
we use the snapshot to get a consistent set of records and then apply
the updates to the most recent version.

The anomaly you showed with update and the problem with MERGE are both
because the operation was simultaneously doing a "read" -- the WHERE
clause and the uniqueness check in the MERGE -- and a write. This is
already the kind of case where we do weird things -- what kind of
behaviour would be consistent with our existing, somewhat weird,
behaviour?

--
greg


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Greg Smith" <greg(at)2ndquadrant(dot)com>, "Marko Tiikkaja" <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, "Boxuan Zhai" <bxzhai2010(at)gmail(dot)com>, "Greg Stark" <gsstark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-25 20:17:16
Message-ID: 4CC59F7C0200002500036DE5@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

>> I would have thought that the INSERT would have
>> created an "in doubt" tuple which would block the UPDATE.

> This is just standard MVCC - readers don't block writers, nor
> writers readers.

Sure, but I tend to think of both INSERT and UPDATE as writes. ;-)

> You might also think about what would happen if the UPDATE
> were run before the INSERT

> either concurrent case is equivalent to the serial
> schedule where the update precedes the insert.

I guess that's persuasive enough. It feels funny, but the argument
looks sound, so I guess it's just a case of my intuition being
faulty.

> In the case of a MERGE that matches a just-inserted invisible
> tuple but no visible tuple, things are a bit stickier.

Well, more generally it can lead to anomalies in a more complex
combination of actions, since it creates, as you imply above, a
rw-dependency from the transaction doing the UPDATE to the
transaction doing the INSERT, so the combination can form part of a
cycle in apparent order of execution which can produce an anomaly.
The more I look at it, the clearer it is that the current behavior is
correct, and what its implications are. I've just missed that detail
until now, wrongly assuming that it would be a write conflict.

> [example of MERGE which can not serialize with a concurrent
> transaction, and possible outcomes if there is no serialization
> failure]

> Now, as Greg says, that might be what some people want, but it's
> certainly monumentally unserializable.

Yeah. MERGE should probably be sensitive to the transaction
isolation level, at least to the extent that MERGE in a SERIALIZABLE
transaction plays nice with other SERIALIZABLE transactions. That
would be necessary to allow business rules enforced through triggers
to be able to guarantee data integrity. It would mean that a MERGE
involving tables which were the target of modifications from
concurrent SERIALIZABLE transactions would be likely to fail and/or
to cause other transactions to fail.

-Kevin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-25 20:58:15
Message-ID: AANLkTikdP6c_Mw_1mZUtSzidXyWt2aT_5HL5scU=KHCq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Now, as Greg says, that might be what some people want, but it's
>> certainly monumentally unserializable.
>
> To be clear, when I said it's what people want, what I meant was that
> in the common cases it's doing exactly what people want, as opposed to
> getting closer to what people want in general but not quite hitting
> the mark in the common cases.
>
> Just as an example I think it's important that in the simplest case,
> upsert of a single record, it be 100% guaranteed to do the naive
> upsert. If two users are doing the merge of a single key at the same
> time one of them had better insert and one of them had better update
> or else users are going to be monumentally surprised.

Hmm, so let's think about that case.

The first merge comes along and finds no match so it fires the NOT
MATCHED rule, which inserts a tuple. The second merge comes along and
finds no match, so it also fires the NOT MATCHED rule and tries to
insert a tuple. But upon consulting the PRIMARY KEY index it finds
that an in-doubt tuple exists so it goes to sleep waiting for the
first transaction to commit or abort. If the first transaction
commits it then decides that the jig is up and fails. We could
(maybe) fix this by doing something similar to what EPQ does for
updates: when the first transaction commits, instead of redoing the
insert, we back up and recheck whether the new tuple would have
matched the join clause and, if so, we instead fire the MATCHED action
on the updated tuple. If not, we fire NOT MATCHED anyway. I'm not
sure how hard that would be, or whether it would introduce any other
nasty anomalies in more complex cases.

Alternatively, we could introduce an UPSERT or REPLACE statement
intended to handle exactly this case and leave MERGE for more complex
situations. It's pretty easy to imagine what the coding of that
should look like: if we encounter an in-doubt tuple, we wait on its
xmin. If the transaction aborts, we insert. If it commits, and we're
in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ
or SERIALIZABLE mode, we abort with a serialization error. That's a
lot simpler to understand and reason about than MERGE in its full
generality.
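
That is more or less what applications have to hand-code today anyway.
A PL/pgSQL sketch of the READ COMMITTED flavor, assuming a table
t(k integer primary key, v text) (all names mine):

create function upsert_t(key integer, val text) returns void as $$
begin
    loop
        update t set v = val where k = key;
        if found then
            return;
        end if;
        begin
            insert into t (k, v) values (key, val);
            return;
        exception when unique_violation then
            -- a concurrent insert beat us; loop and try the update again
        end;
    end loop;
end;
$$ language plpgsql;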

I think it's pretty much hopeless to think that MERGE is going to work
in complex concurrent scenarios without creating serialization
anomalies, or at least rollbacks. I think that's baked into the
nature of what the statement does. To simulate MERGE, you need to
read from the target table and then do writes that depend on what you
read. If you do that with the commands that are available today,
you're going to get serialization anomalies and/or rollbacks under
concurrency. The mere fact of that logic being inside the database
rather than outside isn't going to make that go away. Now sometimes,
as with exclusion constraints, you can play games with dirty snapshots
to get the semantics you want, but whether that's possible in a
particular case depends on the details of the operation being
performed, and here I think it can't be done. Some operations are
*fundamentally* unserializable.

A very simple example of this is a sequence that is guaranteed not to
have gaps (a feature we've occasionally been requested to provide).
If N processes request a sequence number simultaneously, you have to
hand out a value to the first guy and wait and see whether he commits
or aborts before deciding what number to give the second guy. That
sucks, so usually we just design our applications not to require that
sequences be gap-free. Similarly here.
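
The gap-free case is instructive because the only way to get it is to
funnel every caller through a single row lock. A sketch (names mine):

create table gapless_seq (name text primary key, last_value bigint not null);
insert into gapless_seq values ('invoice', 0);

-- each caller, inside its own transaction:
update gapless_seq set last_value = last_value + 1
    where name = 'invoice'
    returning last_value;

The second caller blocks on the row lock until the first commits or
aborts -- exactly the wait-and-see behavior described above.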

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-26 12:10:38
Message-ID: 1288095038.1587.230.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2010-10-25 at 16:58 -0400, Robert Haas wrote:
> On Mon, Oct 25, 2010 at 4:10 PM, Greg Stark <gsstark(at)mit(dot)edu> wrote:
> > On Mon, Oct 25, 2010 at 12:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> Now, as Greg says, that might be what some people want, but it's
> >> certainly monumentally unserializable.
> >
> > To be clear, when I said it's what people want, what I meant was that
> > in the common cases it's doing exactly what people want, as opposed to
> > getting closer to what people want in general but not quite hitting
> > the mark in the common cases.
> >
> > Just as an example I think it's important that in the simplest case,
> > upsert of a single record, it be 100% guaranteed to do the naive
> > upsert. If two users are doing the merge of a single key at the same
> > time one of them had better insert and one of them had better update
> > or else users are going to be monumentally surprised.
>
> Hmm, so let's think about that case.
>
> The first merge comes along and finds no match so it fires the NOT
> MATCHED rule, which inserts a tuple. The second merge comes along and
> finds no match, so it also fires the NOT MATCHED rule and tries to
> insert a tuple. But upon consulting the PRIMARY KEY index it finds
> that an in-doubt tuple exists so it goes to sleep waiting for the
> first transaction to commit or abort. If the first transaction
> commits it then decides that the jig is up and fails. We could
> (maybe) fix this by doing something similar to what EPQ does for
> updates: when the first transaction commits, instead of redoing the
> insert, we back up and recheck whether the new tuple would have
> matched the join clause and, if so, we instead fire the MATCHED action
> on the updated tuple. If not, we fire NOT MATCHED anyway. I'm not
> sure how hard that would be, or whether it would introduce any other
> nasty anomalies in more complex cases.
>
> Alternatively, we could introduce an UPSERT or REPLACE statement
> intended to handle exactly this case and leave MERGE for more complex
> situations. It's pretty easy to imagine what the coding of that
> should look like: if we encounter an in-doubt tuple, we wait on its
> xmin. If the transaction aborts, we insert. If it commits, and we're
> in READ COMMITTED mode, we update it; but if we're in REPEATABLE READ
> or SERIALIZABLE mode, we abort with a serialization error. That's a
> lot simpler to understand and reason about than MERGE in its full
> generality.
>
> I think it's pretty much hopeless to think that MERGE is going to work
> in complex concurrent scenarios without creating serialization
> anomalies, or at least rollbacks. I think that's baked into the
> nature of what the statement does. To simulate MERGE, you need to
> read from the target table and then do writes that depend on what you
> read. If you do that with the commands that are available today,
> you're going to get serialization anomalies and/or rollbacks under
> concurrency. The mere fact of that logic being inside the database
> rather than outside isn't going to make that go away. Now sometimes,
> as with exclusion constraints, you can play games with dirty snapshots
> to get the semantics you want, but whether that's possible in a
> particular case depends on the details of the operation being
> performed, and here I think it can't be done. Some operations are
> *fundamentally* unserializable.

I agree with your analysis. Let me review...

There is a case that locking alone won't resolve, however that locking
works. The transaction history for that is:

T1: takes snapshot
T2: INSERT row1
T2: COMMIT;
T1: attempts to determine if MATCHED or NOT MATCHED.

The answer is neither of those two answers. If we say it is NOT MATCHED
then we will just fail on any INSERT, while if we say it is MATCHED then
technically we can't see the row, so the UPDATE should fail. The COMMIT
of T2 releases any locks that would have helped resolve this, and note
that even if T1 attempts to lock that row as early as possible, only a
table-level lock would prevent T2 from taking that action.
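
As a concrete sketch of that history (assuming target(x integer
primary key); names mine):

-- T1
begin isolation level repeatable read;
select count(*) from target;    -- T1's snapshot is taken here

-- T2
insert into target values (2);
commit;

-- T1, continued: a MERGE joining on x = 2 now meets a row that exists
-- (so an INSERT hits the unique key) but that its snapshot cannot see
-- (so there is nothing for an UPDATE to apply to).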

Two options for resolution are

1) Throw SERIALIZABLE error

2) Implement something similar to EvalPlanQual
As you say, we already resolve this situation for concurrent updates by
following the update chain from a row that is visible to the latest row.
For MERGE the semantics need to be subtly different here: we need to
follow the update chain to the latest row, but from a row that we
*can't* see.

MERGE is still very useful without 2), though it fails in some cases
under concurrent use. The failure rate would increase with the number
of concurrent MERGEs and/or the number of rows in the source table.
Those errors are no more serious than are possible now.

So IMHO we should implement 1) now and come back later to implement 2).
Given that right now there may be other issues with 2), it seems unsafe
to rely on the assumption that we'll fix them by the end of the release.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-26 20:08:32
Message-ID: AANLkTimXJe+XwjjVL+xuyNYvVJYxzF5MSACabMgNA3wH@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Oct 26, 2010 at 8:10 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> I agree with your analysis. Let me review...
> [review]

Sounds like we're on the same page.

> Two options for resolution are
>
> 1) Throw SERIALIZABLE error
>
> 2) Implement something similar to EvalPlanQual
> As you say, we already resolve this situation for concurrent updates by
> following the update chain from a row that is visible to the latest row.
> For MERGE the semantics need to be subtly different here: we need to
> follow the update chain to the latest row, but from a row that we
> *can't* see.
>
> MERGE is still very useful without 2), though it fails in some cases
> under concurrent use. The failure rate would increase with the number
> of concurrent MERGEs and/or the number of rows in the source table.
> Those errors are no more serious than are possible now.
>
> So IMHO we should implement 1) now and come back later to implement 2).
> Given that right now there may be other issues with 2), it seems unsafe
> to rely on the assumption that we'll fix them by the end of the release.

Yeah. In fact, I'm not sure we're ever going to want to implement #2
- I think that needs more study to determine whether there's even
something there that makes sense to implement at all. But certainly I
wouldn't count on it happening for 9.1.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-10-26 21:00:16
Message-ID: 1288126816.1587.407.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2010-10-26 at 16:08 -0400, Robert Haas wrote:
> On Tue, Oct 26, 2010 at 8:10 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > I agree with your analysis. Let me review...
> > [review]
>
> Sounds like we're on the same page.
>
> > Two options for resolution are
> >
> > 1) Throw SERIALIZABLE error
> >
> > 2) Implement something similar to EvalPlanQual
> > As you say, we already resolve this situation for concurrent updates by
> > following the update chain from a row that is visible to the latest row.
> > For MERGE the semantics need to be subtly different here: we need to
> > follow the update chain to the latest row, but from a row that we
> > *can't* see.
> >
> > MERGE is still very useful without 2), though it fails in some cases
> > under concurrent use. The failure rate would increase with the number
> > of concurrent MERGEs and/or the number of rows in the source table.
> > Those errors are no more serious than are possible now.
> >
> > So IMHO we should implement 1) now and come back later to implement 2).
> > Given that right now there may be other issues with 2), it seems unsafe
> > to rely on the assumption that we'll fix them by the end of the release.
>
> Yeah. In fact, I'm not sure we're ever going to want to implement #2
> - I think that needs more study to determine whether there's even
> something there that makes sense to implement at all. But certainly I
> wouldn't count on it happening for 9.1.

2) sounds weird, until you realise it is exactly how you would need to
code a PL/pgSQL procedure to do the equivalent of MERGE. Not doing it
just makes no sense in the longer term. I agree it will take a while to
think it through in sufficient detail.

In the meantime it's a good argument for the ELSE construct at the end
of the WHEN clauses, so we can do something other than skip a row or
throw an ERROR.
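
Purely hypothetical syntax, to make the idea concrete (no ELSE exists
in the submitted patch, and the fallthrough action shown is invented):

merge into target t
using source s on t.x = s.x
when matched then
    update set v = s.v
when not matched then
    insert values (s.x, s.v)
else
    do nothing;   -- hypothetical: rows no WHEN clause takes end up here,
                  -- with the user picking the action instead of the row
                  -- being silently skipped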

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Marko Tiikkaja <marko(dot)tiikkaja(at)cs(dot)helsinki(dot)fi>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: ask for review of MERGE
Date: 2010-11-12 18:04:03
Message-ID: 201011121804.oACI43o14633@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> > rhaas=# create table concurrent (x integer primary key);
> > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
> > "concurrent_pkey" for table "concurrent"
> > CREATE TABLE
> > rhaas=# insert into concurrent values (1);
> > rhaas=# begin;
> > BEGIN
> > rhaas=# insert into concurrent values (2);
> > INSERT 0 1
> >
> > <switch to a different window>
> >
> > rhaas=# update concurrent set x=x where x=2;
> > UPDATE 0
>
> That surprised me. I would have thought that the INSERT would have
> created an "in doubt" tuple which would block the UPDATE. What is
> the reason for not doing so?

When Kevin gets surprised, I get worried. LOL

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +