MERGE SQL Statement

Lists: pgsql-hackers
From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: MERGE SQL Statement
Date: 2008-04-16 18:58:58
Message-ID: 1208372338.4259.202.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


I've analysed various flavours of MERGE command to understand and
propose what we should use for PostgreSQL.

The results aren't what you'd expect from a quick flick through the
standard, so lets look at my main concerns:

1. The simplest syntax is for SQL:2003. The syntax for DB2, SQL Server
and Oracle is more complex, with SQL:2008(final draft) being very
similar to DB2 and SQL Server, so unlikely to be a point of contention
in the standard. I suggest we go with the latter, but yes, its still in
draft (yawn).

2. MySQL and Teradata have their own syntax for the row-oriented Upsert
operation. Both of those are more useful (IMHO) than MERGE for OLTP
apps, while MERGE is very useful for bulk data loads. I'm open to the
idea that we do something like this in addition to MERGE.

3. The way MERGE works is to define a left outer join between source and
target, then specify a series of WHEN clauses that may or may not apply.
It **isn't** just a simple Update/Insert and so much of what we have
discussed previously goes straight in the trash. AFAICS the way it is
specified to work it would be fairly straightforward to cause race
conditions and failures when using multiple concurrent MERGE statements.

General Example of the recommended syntax for PostgreSQL

MERGE INTO Stock S /* target */

USING DailySales DS /* source table */

ON S.Item = DS.Item /* left outer join source to target */

WHEN MATCHED AND (QtyOnHand - QtySold = 0) THEN

/* delete item if no stock remaining */
DELETE

WHEN MATCHED THEN /* No AND clause, so drop thru */

/* update value if some remains */
UPDATE SET QtyOnHand = QtyOnHand - QtySold

WHEN NOT MATCHED THEN

/* insert a row if the stock is new */
INSERT VALUES (Item, QtySold)
;

So lets look at the syntaxes and then review how it might work.

SYNTAX
======
SQL:2003
--------
MERGE INTO target [AS correlation-name]
USING [table-ref | subquery]
ON <search-condition>
[WHEN MATCHED THEN MergeUpdate]
[WHEN NOT MATCHED THEN MergeInsert]

Oracle 11g
----------
MERGE INTO target [AS correlation-name]
USING [table-ref | subquery]
ON <search-condition>
[WHEN MATCHED THEN MergeUpdate
WHERE <where-clause> DELETE WHERE <where-clause>]
[WHEN NOT MATCHED THEN MergeInsert
WHERE <where-clause>]

Differences from SQL:2003 are
* Update and Insert have WHERE clauses on them
* Oracle allows multiple WHEN ... WHERE clauses
* Oracle allows an error logging clause also
* optional DELETE statement as part of the UPDATE, so you can only
DELETE what you update (yeh, really)
* WHEN MATCHED/WHEN NOT MATCHED must be in fixed order, only

IBM DB2
-------
MERGE INTO target [AS correlation-name]
USING [table-ref | subquery]
ON <search-condition>
[WHEN MATCHED [AND <where-clause>] THEN MergeUpdate | MergeDelete]
[WHEN NOT MATCHED [AND <where-clause>] THEN MergeInsert | SignalClause]
[ELSE IGNORE]

Differences from SQL:2003 are
* Update and Insert have AND clauses on them (like WHERE...)
* DB2 allows multiple WHEN ... AND clauses
* DELETE is also a full-strength option, not part of the MergeUpdate
clause as it is in Oracle
* DB2 allows a SIGNAL statement, similar to RAISE
* ELSE IGNORE is an optional syntax, which does nothing

SQL Server 2008
---------------

MERGE [INTO] target [AS correlation-name]
USING [table-ref | subquery]
ON <search-condition>
[WHEN MATCHED [AND <where-clause>] THEN MergeUpdate | MergeDelete]
[WHEN NOT MATCHED [AND <where-clause>] THEN MergeInsert]

Differences from SQL:2003 are
* Update and Insert have AND clauses on them (like WHERE...)
* DB2 allows multiple WHEN ... AND clauses
* DELETE is also a full-strength option, not part of the MergeUpdate
clause as it is in Oracle

SQL:2008
--------

MERGE INTO target [AS correlation-name]
USING [table-ref | subquery]
ON <search-condition>
[WHEN MATCHED [AND <where-clause>] THEN MergeUpdate]
[WHEN NOT MATCHED [AND <where-clause>] THEN MergeInsert]

Differences from SQL:2003 are
* Update and Insert have AND clauses on them (like WHERE...)
* Allows multiple WHEN ... AND clauses

Alternate Syntax
----------------

MySQL supports
* REPLACE INTO
* INSERT ... ON DUPLICATE KEY UPDATE ...

Teradata supports
* UPDATE ... ELSE INSERT ...
* MERGE with an additional error logging clause

The Teradata and Oracle error logging clauses are very cute and I
suggest we do something similar for COPY, at least.

Proposed Syntax for PostgreSQL
==============================

MERGE INTO table [[AS] alias]
USING [table-ref | query]
ON join-condition
[WHEN MATCHED [AND condition] THEN MergeUpdate | DELETE]
[WHEN NOT MATCHED [AND condition] THEN MergeInsert]

MergeUpdate is
UPDATE SET { column = { expression | DEFAULT } |
( column [, ...] ) = ( { expression | DEFAULT } [, ...] ) }
[, ...]
(yes, there is no WHERE clause here)

MergeInsert is
INSERT [ ( column [, ...] ) ]
{ DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] )
[, ...]}
(no subquery allowed)

Notes and behaviours
--------------------

* It is possible for concurrent MERGE statements to cause duplicate
INSERT violations because of a race condition between when we check
whether the row is matching/not matching and when we apply the
appropriate WHEN clause, if any. This is just the same as what we do now
with try-UPDATE-then-INSERT logic. (This seems to end the discussion
about whether we do inserts/updates first because the matching test is
always performed before we take the action; or perhaps it means we don't
like MERGE as much as we did before and would prefer alternate
syntaxes...). Maybe we could avoid some problems by applying
heap_lock_tuple() to each matched row, so we know it will stay matched
while we evaluate the WHEN clauses? Maybe not.

* USING query can be a VALUES clause if we wish to do single/few row
operations, so MERGE can be used for bulk-loading and OLTP

* There is no RETURNING option for MERGE, nor for any INSERT/UPDATE
sub-clauses
* WHERE CURRENT OF cursor is not supported anywhere
* The join can't be recursive, so no WITH support (common expressions,
i.e. non-recursive WITH are supported by SQLServer 2008)
* conditions would not allow sub-selects

* MERGE would work on base tables only, just like COPY
* Changes are made only to that single table
* Cannot update a column mentioned in the ON clause cos that would make
my head hurt too much

* MERGE will perform a left outer join between source on left and target
on right. There must be no more than 1 row from table-ref for each row
in the table. Each row in the table can only be updated once during each
MERGE statement. Each non-matching row in the table-ref will result in
one INSERT into table.

* WHEN clauses are tested in the order specified. If the AND condition
returns false then we skip onto the next WHEN clause. We stop once a
WHEN clause activates, so only *one* action is ever activated for each
row.

* AND clauses need not form a complete set, i.e. it is possible that no
action will result. It is also possible that some WHEN clauses will
never activate because of the execution order; we would not try to
prevent this, just document it as a possible user error.

* MERGE will respect Triggers, but not Rules since the rules behaviour
is roughly orthogonal to the WHEN clauses
* MERGE fires UPDATE and INSERT triggers according to which WHEN clause
is activated (if any)

* It's unclear whether MERGE should activate statement-level triggers,
or not. Few of the above sources are explicit either way on this point.
DB2 always runs both UPDATE and INSERT statement-level triggers, whether
or not rows have been changed; I would suggest we do that also for
before triggers. For after statement triggers I would suggest that we
track how many updates and inserts are caused and if updates > 0 then we
activate the after statement for update triggers, and if inserts > 0
then we activate the after statement for insert triggers. If a statement
level trigger is activated by both update and insert then it would be
possible for both TRIGGER_FIRED_BY_UPDATE() and
TRIGGER_FIRED_BY_DELETE() to be true (for statement level triggers
only), which would be a change from what we do now, even if the old
behaviour was not explicitly mutually exclusive. In both cases I suggest
we run Update triggers before Insert triggers consistently for both
before and after statement triggers.

* The number of rows changed should be (inserts + updates) which should
be < number of rows returned by table-ref. It would be good to get
access to the number of rows inserted and updated, so I propose that we
return a NOTICE statement with this information.

* The way MERGE is specified, the internals design seems to fall out
fairly clearly:
- set up a left outer join between source and target, with a junk
attribute that might be NULL to indicate NOT MATCHED. Let planner
optimise that as it wishes, nothing special needed
- in ExecutePlan have a new operation type of CMD_MERGE, which then
allows us to apply the WHEN clauses to decide what kind of final
operation will result for that tuple (U, D or I)

I'm planning to agree the syntax and write regression tests first, so we
all agree the behaviour we are aiming towards.

Thoughts? (Apart from jokes about having a WHY clause...)

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: MERGE SQL Statement
Date: 2008-04-17 02:18:53
Message-ID: 200804162218.53963.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wednesday 16 April 2008 14:58, Simon Riggs wrote:
> I've analysed various flavours of MERGE command to understand and
> propose what we should use for PostgreSQL.
>
> The results aren't what you'd expect from a quick flick through the
> standard, so lets look at my main concerns:
>
> 1. The simplest syntax is for SQL:2003. The syntax for DB2, SQL Server
> and Oracle is more complex, with SQL:2008(final draft) being very
> similar to DB2 and SQL Server, so unlikely to be a point of contention
> in the standard. I suggest we go with the latter, but yes, its still in
> draft (yawn).
>
> 2. MySQL and Teradata have their own syntax for the row-oriented Upsert
> operation. Both of those are more useful (IMHO) than MERGE for OLTP
> apps, while MERGE is very useful for bulk data loads. I'm open to the
> idea that we do something like this in addition to MERGE.
>
> 3. The way MERGE works is to define a left outer join between source and
> target, then specify a series of WHEN clauses that may or may not apply.
> It **isn't** just a simple Update/Insert and so much of what we have
> discussed previously goes straight in the trash. AFAICS the way it is
> specified to work it would be fairly straightforward to cause race
> conditions and failures when using multiple concurrent MERGE statements.
>
> General Example of the recommended syntax for PostgreSQL
>
> MERGE INTO Stock S /* target */
>
> USING DailySales DS /* source table */
>
> ON S.Item = DS.Item /* left outer join source to target */
>
> WHEN MATCHED AND (QtyOnHand - QtySold = 0) THEN
>
> /* delete item if no stock remaining */
> DELETE
>
> WHEN MATCHED THEN /* No AND clause, so drop thru */
>
> /* update value if some remains */
> UPDATE SET QtyOnHand = QtyOnHand - QtySold
>
> WHEN NOT MATCHED THEN
>
> /* insert a row if the stock is new */
> INSERT VALUES (Item, QtySold)
> ;
>
> So lets look at the syntaxes and then review how it might work.
>
> SYNTAX
> ======
> SQL:2003
> --------
> MERGE INTO target [AS correlation-name]
> USING [table-ref | subquery]
> ON <search-condition>
> [WHEN MATCHED THEN MergeUpdate]
> [WHEN NOT MATCHED THEN MergeInsert]
>
> Oracle 11g
> ----------
> MERGE INTO target [AS correlation-name]
> USING [table-ref | subquery]
> ON <search-condition>
> [WHEN MATCHED THEN MergeUpdate
> WHERE <where-clause> DELETE WHERE <where-clause>]
> [WHEN NOT MATCHED THEN MergeInsert
> WHERE <where-clause>]
>
> Differences from SQL:2003 are
> * Update and Insert have WHERE clauses on them
> * Oracle allows multiple WHEN ... WHERE clauses
> * Oracle allows an error logging clause also
> * optional DELETE statement as part of the UPDATE, so you can only
> DELETE what you update (yeh, really)
> * WHEN MATCHED/WHEN NOT MATCHED must be in fixed order, only
>
> IBM DB2
> -------
> MERGE INTO target [AS correlation-name]
> USING [table-ref | subquery]
> ON <search-condition>
> [WHEN MATCHED [AND <where-clause>] THEN MergeUpdate | MergeDelete]
> [WHEN NOT MATCHED [AND <where-clause>] THEN MergeInsert | SignalClause]
> [ELSE IGNORE]
>
> Differences from SQL:2003 are
> * Update and Insert have AND clauses on them (like WHERE...)
> * DB2 allows multiple WHEN ... AND clauses
> * DELETE is also a full-strength option, not part of the MergeUpdate
> clause as it is in Oracle
> * DB2 allows a SIGNAL statement, similar to RAISE
> * ELSE IGNORE is an optional syntax, which does nothing
>
> SQL Server 2008
> ---------------
>
> MERGE [INTO] target [AS correlation-name]
> USING [table-ref | subquery]
> ON <search-condition>
> [WHEN MATCHED [AND <where-clause>] THEN MergeUpdate | MergeDelete]
> [WHEN NOT MATCHED [AND <where-clause>] THEN MergeInsert]
>
> Differences from SQL:2003 are
> * Update and Insert have AND clauses on them (like WHERE...)
> * DB2 allows multiple WHEN ... AND clauses
> * DELETE is also a full-strength option, not part of the MergeUpdate
> clause as it is in Oracle
>
> SQL:2008
> --------
>
> MERGE INTO target [AS correlation-name]
> USING [table-ref | subquery]
> ON <search-condition>
> [WHEN MATCHED [AND <where-clause>] THEN MergeUpdate]
> [WHEN NOT MATCHED [AND <where-clause>] THEN MergeInsert]
>
> Differences from SQL:2003 are
> * Update and Insert have AND clauses on them (like WHERE...)
> * Allows multiple WHEN ... AND clauses
>
> Alternate Syntax
> ----------------
>
> MySQL supports
> * REPLACE INTO
> * INSERT ... ON DUPLICATE KEY UPDATE ...
>
> Teradata supports
> * UPDATE ... ELSE INSERT ...
> * MERGE with an additional error logging clause
>
> The Teradata and Oracle error logging clauses are very cute and I
> suggest we do something similar for COPY, at least.
>
> Proposed Syntax for PostgreSQL
> ==============================
>
> MERGE INTO table [[AS] alias]
> USING [table-ref | query]
> ON join-condition
> [WHEN MATCHED [AND condition] THEN MergeUpdate | DELETE]
> [WHEN NOT MATCHED [AND condition] THEN MergeInsert]
>
> MergeUpdate is
> UPDATE SET { column = { expression | DEFAULT } |
> ( column [, ...] ) = ( { expression | DEFAULT } [, ...] ) }
> [, ...]
> (yes, there is no WHERE clause here)
>
> MergeInsert is
> INSERT [ ( column [, ...] ) ]
> { DEFAULT VALUES | VALUES ( { expression | DEFAULT } [, ...] )
> [, ...]}
> (no subquery allowed)
>
>
> Notes and behaviours
> --------------------
>
> * It is possible for concurrent MERGE statements to cause duplicate
> INSERT violations because of a race condition between when we check
> whether the row is matching/not matching and when we apply the
> appropriate WHEN clause, if any. This is just the same as what we do now
> with try-UPDATE-then-INSERT logic. (This seems to end the discussion
> about whether we do inserts/updates first because the matching test is
> always performed before we take the action; or perhaps it means we don't
> like MERGE as much as we did before and would prefer alternate
> syntaxes...). Maybe we could avoid some problems by applying
> heap_lock_tuple() to each matched row, so we know it will stay matched
> while we evaluate the WHEN clauses? Maybe not.
>

I would have thought that MVCC + the command taking a row lock on each tuple
that matched a update/delete case would protect us from many race
scenarios... at least as protected as we are now. ISTM you can't protect well
for INSERT operations.

> * USING query can be a VALUES clause if we wish to do single/few row
> operations, so MERGE can be used for bulk-loading and OLTP
>
> * There is no RETURNING option for MERGE, nor for any INSERT/UPDATE
> sub-clauses

Is there a reason for this? A returning for any insert/updated rows would be
great, especially if your doing single row merges via the values clause.

> * WHERE CURRENT OF cursor is not supported anywhere
> * The join can't be recursive, so no WITH support (common expressions,
> i.e. non-recursive WITH are supported by SQLServer 2008)
> * conditions would not allow sub-selects
>

same question on both of these, is there some technical reason for this? I'd
imagine people would like both of these options.

> * MERGE would work on base tables only, just like COPY
> * Changes are made only to that single table
> * Cannot update a column mentioned in the ON clause cos that would make
> my head hurt too much
>

Hmm... if the matching test is always done first, istm you could then update
the columns that matched on, since you don't really care about the value of
the on column once you have the row.

> * MERGE will perform a left outer join between source on left and target
> on right. There must be no more than 1 row from table-ref for each row
> in the table. Each row in the table can only be updated once during each
> MERGE statement. Each non-matching row in the table-ref will result in
> one INSERT into table.
>
> * WHEN clauses are tested in the order specified. If the AND condition
> returns false then we skip onto the next WHEN clause. We stop once a
> WHEN clause activates, so only *one* action is ever activated for each
> row.
>
> * AND clauses need not form a complete set, i.e. it is possible that no
> action will result. It is also possible that some WHEN clauses will
> never activate because of the execution order; we would not try to
> prevent this, just document it as a possible user error.
>

Just curious if any of these behaviors come from the spec? or maybe from
other databases? they don't seem unreasonable in general though.

> * MERGE will respect Triggers, but not Rules since the rules behaviour
> is roughly orthogonal to the WHEN clauses

Should there be a new rule option? ie. ON MERGE rules ?

> * MERGE fires UPDATE and INSERT triggers according to which WHEN clause
> is activated (if any)
>
> * It's unclear whether MERGE should activate statement-level triggers,
> or not. Few of the above sources are explicit either way on this point.
> DB2 always runs both UPDATE and INSERT statement-level triggers, whether
> or not rows have been changed; I would suggest we do that also for
> before triggers. For after statement triggers I would suggest that we
> track how many updates and inserts are caused and if updates > 0 then we
> activate the after statement for update triggers, and if inserts > 0
> then we activate the after statement for insert triggers. If a statement
> level trigger is activated by both update and insert then it would be
> possible for both TRIGGER_FIRED_BY_UPDATE() and
> TRIGGER_FIRED_BY_DELETE() to be true (for statement level triggers
> only), which would be a change from what we do now, even if the old
> behaviour was not explicitly mutually exclusive. In both cases I suggest
> we run Update triggers before Insert triggers consistently for both
> before and after statement triggers.
>

It would seem wierd that a before update statement level trigger would fire
and not the matching after update statement level trigger. I can probably
live with this behavior though, but seems like a note worth documenting.

> * The number of rows changed should be (inserts + updates) which should
> be < number of rows returned by table-ref. It would be good to get
> access to the number of rows inserted and updated, so I propose that we
> return a NOTICE statement with this information.
>

How do you handle deletes? IE. If I merge two tables and I end up with no
updates or inserts but 100 deletes, is the number of affected rows 0 ?

> * The way MERGE is specified, the internals design seems to fall out
> fairly clearly:
> - set up a left outer join between source and target, with a junk
> attribute that might be NULL to indicate NOT MATCHED. Let planner
> optimise that as it wishes, nothing special needed
> - in ExecutePlan have a new operation type of CMD_MERGE, which then
> allows us to apply the WHEN clauses to decide what kind of final
> operation will result for that tuple (U, D or I)
>
> I'm planning to agree the syntax and write regression tests first, so we
> all agree the behaviour we are aiming towards.
>
> Thoughts? (Apart from jokes about having a WHY clause...)
>

Nice work, hope my comments will be helpful.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MERGE SQL Statement
Date: 2008-04-17 06:31:32
Message-ID: 1208413892.4259.244.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2008-04-16 at 22:18 -0400, Robert Treat wrote:

> > * USING query can be a VALUES clause if we wish to do single/few row
> > operations, so MERGE can be used for bulk-loading and OLTP
> >
> > * There is no RETURNING option for MERGE, nor for any INSERT/UPDATE
> > sub-clauses
>
> Is there a reason for this? A returning for any insert/updated rows would be
> great, especially if your doing single row merges via the values clause.

It's non-standard, is the first. Second would be complexity and third
would be difficulty in doing this usefully since you'd probably also
want to know which WHEN clause fired and whether the result for that row
was an I, U or D. Perhaps a later extension, but certainly not first
shot.

> > * WHERE CURRENT OF cursor is not supported anywhere
> > * The join can't be recursive, so no WITH support (common expressions,
> > i.e. non-recursive WITH are supported by SQLServer 2008)
> > * conditions would not allow sub-selects
> >
>
> same question on both of these, is there some technical reason for this? I'd
> imagine people would like both of these options.
>
> > * MERGE would work on base tables only, just like COPY
> > * Changes are made only to that single table
> > * Cannot update a column mentioned in the ON clause cos that would make
> > my head hurt too much
> >
>
> Hmm... if the matching test is always done first, istm you could then update
> the columns that matched on, since you don't really care about the value of
> the on column once you have the row.

I think you can update all columns without difficulty.

> > * MERGE will perform a left outer join between source on left and target
> > on right. There must be no more than 1 row from table-ref for each row
> > in the table. Each row in the table can only be updated once during each
> > MERGE statement. Each non-matching row in the table-ref will result in
> > one INSERT into table.
> >
> > * WHEN clauses are tested in the order specified. If the AND condition
> > returns false then we skip onto the next WHEN clause. We stop once a
> > WHEN clause activates, so only *one* action is ever activated for each
> > row.
> >
> > * AND clauses need not form a complete set, i.e. it is possible that no
> > action will result. It is also possible that some WHEN clauses will
> > never activate because of the execution order; we would not try to
> > prevent this, just document it as a possible user error.
> >
>
> Just curious if any of these behaviors come from the spec? or maybe from
> other databases? they don't seem unreasonable in general though.

First two come from spec following clarifications from other
implementations. The last point about the AND clauses not necessarily
covering 100% of cases follows from the other points.

> > * MERGE will respect Triggers, but not Rules since the rules behaviour
> > is roughly orthogonal to the WHEN clauses
>
> Should there be a new rule option? ie. ON MERGE rules ?

Maybe, but not as part of this project.

> > * MERGE fires UPDATE and INSERT triggers according to which WHEN clause
> > is activated (if any)
> >
> > * It's unclear whether MERGE should activate statement-level triggers,
> > or not. Few of the above sources are explicit either way on this point.
> > DB2 always runs both UPDATE and INSERT statement-level triggers, whether
> > or not rows have been changed; I would suggest we do that also for
> > before triggers. For after statement triggers I would suggest that we
> > track how many updates and inserts are caused and if updates > 0 then we
> > activate the after statement for update triggers, and if inserts > 0
> > then we activate the after statement for insert triggers. If a statement
> > level trigger is activated by both update and insert then it would be
> > possible for both TRIGGER_FIRED_BY_UPDATE() and
> > TRIGGER_FIRED_BY_DELETE() to be true (for statement level triggers
> > only), which would be a change from what we do now, even if the old
> > behaviour was not explicitly mutually exclusive. In both cases I suggest
> > we run Update triggers before Insert triggers consistently for both
> > before and after statement triggers.
> >
>
> It would seem wierd that a before update statement level trigger would fire
> and not the matching after update statement level trigger. I can probably
> live with this behavior though, but seems like a note worth documenting.

Agreed. I think this aspect can be re-visited when we see all of the
test cases.

> > * The number of rows changed should be (inserts + updates) which should
> > be < number of rows returned by table-ref. It would be good to get
> > access to the number of rows inserted and updated, so I propose that we
> > return a NOTICE statement with this information.
> >
>
> How do you handle deletes? IE. If I merge two tables and I end up with no
> updates or inserts but 100 deletes, is the number of affected rows 0 ?

No, 100. I meant: add those as well.

> Nice work, hope my comments will be helpful.

Thanks; yes.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MERGE SQL Statement
Date: 2008-04-17 14:15:21
Message-ID: 1208441721.4259.304.camel@ebony.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 2008-04-17 at 07:31 +0100, Simon Riggs wrote:

> > > * MERGE will perform a left outer join between source on left and target
> > > on right. There must be no more than 1 row from table-ref for each row
> > > in the table. Each row in the table can only be updated once during each
> > > MERGE statement. Each non-matching row in the table-ref will result in
> > > one INSERT into table.
> > >
> > > * WHEN clauses are tested in the order specified. If the AND condition
> > > returns false then we skip onto the next WHEN clause. We stop once a
> > > WHEN clause activates, so only *one* action is ever activated for each
> > > row.
> > >
> > > * AND clauses need not form a complete set, i.e. it is possible that no
> > > action will result. It is also possible that some WHEN clauses will
> > > never activate because of the execution order; we would not try to
> > > prevent this, just document it as a possible user error.
> > >
> >
> > Just curious if any of these behaviors come from the spec? or maybe from
> > other databases? they don't seem unreasonable in general though.
>
> First two come from spec following clarifications from other
> implementations. The last point about the AND clauses not necessarily
> covering 100% of cases follows from the other points.

No, it looks like I missed one line in the standard.

I've spent all day analysing this and writing up test cases, so this
next bit isn't really a reply to you Robert, just lots of additional
detail on this particular area.

In summary, the standard requires us to

1. If MATCHED to more than one source row we are supposed to throw an
error "cardinality violation". However, if there is no target row, yet
would have generated multiple rows had there been one, we do *not* throw
an error.

2. If the source row doesn't match any target row AND there is no WHEN
NOT MATCHING clause that applies, we are then supposed to implicitly
perform an INSERT with DEFAULT VALUES.

Neither of these seem particularly useful behaviours in *all* cases.

For 1, it implies we would need to sort and de-duplicate the output so
that we can throw an error in *all* cases. We would need to do this
because its fairly hard to determine that the set formed from the union
of all WHEN clauses does not cover the universal set. (i.e. its possible
to write a statement that doesn't have a WHEN clause that applies).

If we try to update a row that just got updated we need to throw an
error, otherwise we may allow the Update Halloween problem. So we will
effectively detect this error in most cases anyway.

So for 1, I suggest we don't throw the explicit error as mentioned in
the standard and allow the secondary update error to provide 99% of
required errors. In the case where there was a matching row but no WHEN
clause tat applies, we simply ignore that matching row.

For 2, DB2 allows an additional statement ELSE IGNORE. I suggest we
support the standard for (2), yet also provide an additional option
WHEN [NOT] MATCHING THEN
DO NOTHING

which would simply drop the matching or non-matching row and continue
with the next tuple.

My first attempt at a MERGE test case, with simulated output (no, I
haven't written it yet) is attached.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

Attachment Content-Type Size
merge.out text/x-vhdl 6.1 KB

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: MERGE SQL Statement
Date: 2008-04-17 17:38:54
Message-ID: 60od88qt3l.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

simon(at)2ndquadrant(dot)com (Simon Riggs) writes:
>> Should there be a new rule option? ie. ON MERGE rules ?
>
> Maybe, but not as part of this project.

That seems to warrant a bit of elaboration...

If we're running a MERGE, and it performs an INSERT or UPDATE of a
particular tuple in(to) a particular table, will it fire the ON
INSERT/ON UPDATE trigger? I'd hope so...
--
let name="cbbrowne" and tld="cbbrowne.com" in String.concat "@" [name;tld];;
http://linuxfinances.info/info/multiplexor.html
Canada, Mexico, and Australia form the Axis of Nations That Are
Actually Quite Nice But Secretly Have Nasty Thoughts About America