Re: MERGE Specification

From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MERGE Specification
Date: 2010-08-15 07:31:02
Message-ID: AANLkTi=6QTheKcBPuMzPm2KHMeH6pxP6xmY6PdTSzAsd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Aug 15, 2010 at 4:05 AM, Heikki Linnakangas <
heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> On 11/08/10 11:11, Boxuan Zhai wrote:
>
>> The new patch is done. I named it as merge_v102. (1 means it is the
>> non-inheritance merge command, 02 means this is the second time of fixing
>> reported bugs)
>>
>
> Thanks. I went through this, fixing all the spurious end-of-line whitespace
> first with "git apply --whitespace=fix", and then manually fixing comment
> and whitespace style, typos, and doing some other small comment editing.
> Resulting patch attached. It is also available at my git repository at
> git://git.postgresql.org/git/users/heikki/postgres.git, branch
> 'mergestmt'. Please base any further patch versions on this patch, so that
> we don't need to redo this cleanup.
>
>
Thanks for the cleanup. The codes looks better now. My problem is that I
have done some more modifications after merge_v102. Is there any way to
apply the cleanup patch without erasing my new codes?

> I'll continue reviewing this sometime next week, but here's few
> miscellaneous issues for starters;
>
> * Explain output of actions needs some work:
>
> > Merge (cost=246.37..272.96 rows=1770 width=38)
> > ACTION: UPDATE WHEN NOT MATCHED
> > ACTION: INSERT WHEN NOT MATCHED
> > -> Merge Left Join (cost=246.37..272.96 rows=1770 width=38)
>
> Should print out more details of the action, like for normal
> updates/inserts/deletes.

Well, I think, in normal UPDATE/INSERT/DELETE explanation, there are no more
details than what we have here except the costs, rows and width, which is
print out at the MERGE command line.

For example:

Explain
update buy set volume = volume + 1;
QUERY PLAN
--------------------------------------------------------------
Update (cost=0.00..36.75 rows=2140 width=14)
-> Seq Scan on buy (cost=0.00..36.75 rows=2140 width=14)
(2 rows)

For the explanation of an UPDATE command, we only have a Update title
followed by the costs, then, after the arrow -> there comes the plan tree.
In a MERGE command, no action has its real private plan. They all share the
main plan. Thus, the costs for a merge command is that of the main plan.
What is useful for a merge action is only the target list and quals. So my
design is to show the merge command costs in first line. Then print out the
actions and their qualifications in a list, followed by the main plan tree.

Is there any other thing you suggest to print out for each action?

> And all uppercase doesn't fit the surrounding style.
>

This will be fixed.

> * Is the behavior now SQL-compliant? We had long discussions about the
> default action being insert or do nothing or raise error, but I never got a
> clear picture of what the SQL spec says and whether we're compliant.
>
> * What does the "one tuple is error" notice mean? We'll have to do
> something about that.. I don't think we've properly thought through the
> meaning of RAISE ERROR. Let's cut it out from the patch until then. It's
> only a few dozen lines to put back when we know what to do about it.
>
I find that we have not reached an agreement on MERGE's syntax yet.
Personally, I support Simon's idea, that is the default action should be
RAISE ERROR. However, I am no sure what RAISE ERROR should do when we
get a missing tuple. Here I just use a simple elog(NOTICE, "a tuple is
error"); for this situation.

I leave this for further extension when a more specific design for RAISE
ERROR is available.

Well, I have to say the current RAISE ERROR elog is a little bit ugly. Do
you want me to chage the default action back to DO NOTHING? Or any other
suggetions? (In fact, my personal thinking is to add a non-omissible "ELSE"
clause as the end of the action list which forces the user to specify the
default action for merge).

> * Do you need the MergeInsert/MergeUpdate/MergeDelete alias nodes? Would it
> be simpler to just use InsertStmt/UpdateStmt/DeleteStmt directly?
>
>

I need one flag in these statement to differentiate them from normal
InsertStmt/UpdateStmt/DeleteStmt. There are slight difference for the
process of these two kinds of statements in the transfromStmt() function. I
define a set of alias nodes which have different nodetags. This can make
the code simpler.

> * Do you need the out-function for DeleteStmt? Why not for UpdateStmt and
> InsertStmt?
>

Ah, I add this function because I want to have a look of the content of
DeleteStmt. I did this long ago, just after I started the project. If you
don't want this function, I will remove it.

> * I wonder if it would be simpler to check the fact that you can only have
> UPDATE/DELETE as a WHEN MATCHED action and only INSERT as a WHEN NOT MATCHED
> action directly in the grammar?
>
>
We can do this, but, I think it is better to keep it as it is now.

> * Regarding this speculation in the MergeStmt grammar rule:
> > /*although we have only one USING table,
> > we still make it a list, maybe in future
> > we will allow multiple USING tables.*/
>
> I wonder what the semantics of having multiple USING tables would be? If it
> would be like "USING (SELECT * FROM a UNION ALL SELECT * FROM b)", then we
> don't ever need it because you can already achieve it with that subquery. If
> it's something like "USING (SELECT * FROM a,b WHERE ...)", then we again
> don't need it because you can write that instead. So I think we should give
> up on the notion that source can be a list of tables, and simplify the code
> everywhere accordingly.
>
>
Yes, we should give up it. I was considering to allow the user input
multiple source tables as a short way of "USING (SELECT * FROM a,b WHERE
...)". Now, this idea seems not interesting at all.

> * Instead of scanning the list of actions in ExecBS/ExecASMergeTriggers
> every time, should set flags at plan time to mark what kind of actions there
> is in the statement.

First of all, I need to say that the functions of ExecBSMergeTriggers() and
ExecASMergeTriggers() are for per-statement triggers, and are invoked only
once in the whole process of a MERGE command. Setting flags at plan time can
save the scanning. I may add it to next patch. But don't expect it save
much execution time.

> Or should we defer firing the statement triggers until we hit the first
> matching row and execute the action for the first time?
>

No, we cannot do this. These are Per-statement triggers. They should be
fired before/after the main plan is executed. The statement triggers are
fired even no tuple is really matched with the actions.

> * Do we need the 'replaced' field? Could you just replace the action with a
> DO NOTHING action instead.
>
>
Yes, I think it is a good idea to replace them with DO NOTHING. The replaced
field was there before DO NOTHING is added to the system. But since we
have DO NOTING action now, we can turn all the actions replaced by
INSTEAD rules into DO NOTINGs.

> * This crashes:
>
> postgres=# CREATE TABLE target AS (SELECT 1 AS id);
> SELECT 1
> postgres=# MERGE into target t
> USING (select 1 AS id) AS s
>
> ON t.id = s.id
> WHEN MATCHED THEN
> UPDATE SET id = (SELECT COUNT(*) FROM generate_series(1,10))
> ;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
>
>
>

Oh, a really serious bug. I have not considered this case (user series
generator in actions) before. I will see what I can do for it.

>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2010-08-15 07:56:50 Re: MERGE Specification
Previous Message Alvaro Herrera 2010-08-15 03:00:00 Re: MERGE Specification