Re: MERGE Specification

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MERGE Specification
Date: 2010-08-15 07:56:50
Message-ID: 4C679DC2.2020508@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 15/08/10 10:31, Boxuan Zhai wrote:
> On Sun, Aug 15, 2010 at 4:05 AM, Heikki Linnakangas<
> heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> Thanks. I went through this, fixing all the spurious end-of-line whitespace
>> first with "git apply --whitespace=fix", and then manually fixing comment
>> and whitespace style, typos, and doing some other small comment editing.
>> Resulting patch attached. It is also available at my git repository at
>> git://git.postgresql.org/git/users/heikki/postgres.git, branch
>> 'mergestmt'. Please base any further patch versions on this patch, so that
>> we don't need to redo this cleanup.
>>
> Thanks for the cleanup. The codes looks better now. My problem is that I
> have done some more modifications after merge_v102. Is there any way to
> apply the cleanup patch without erasing my new codes?

Yeah, there's multiple ways. You can create a patch of what you have
now, and run interdiff against your previous patch that I worked
against. That gives you a diff of changes since the last patch you sent
to the list. You can then apply that patch to the codetree from my git
repository.

Or you can just generate a new patch of what you have as usual, and I'll
incorporate the changes to the cleaned-up patch.

>> I'll continue reviewing this sometime next week, but here's few
>> miscellaneous issues for starters;
>>
>> * Explain output of actions needs some work:
>>
>>> Merge (cost=246.37..272.96 rows=1770 width=38)
>>> ACTION: UPDATE WHEN NOT MATCHED
>>> ACTION: INSERT WHEN NOT MATCHED
>>> -> Merge Left Join (cost=246.37..272.96 rows=1770 width=38)
>>
>> Should print out more details of the action, like for normal
>> updates/inserts/deletes.
>
>
> Well, I think, in normal UPDATE/INSERT/DELETE explanation, there are no more
> details than what we have here except the costs, rows and width, which is
> print out at the MERGE command line.
>
> For example:
>
> Explain
> update buy set volume = volume + 1;
> QUERY PLAN
> --------------------------------------------------------------
> Update (cost=0.00..36.75 rows=2140 width=14)
> -> Seq Scan on buy (cost=0.00..36.75 rows=2140 width=14)
> (2 rows)
>
> For the explanation of an UPDATE command, we only have a Update title
> followed by the costs, then, after the arrow -> there comes the plan tree.
> In a MERGE command, no action has its real private plan. They all share the
> main plan. Thus, the costs for a merge command is that of the main plan.
> What is useful for a merge action is only the target list and quals. So my
> design is to show the merge command costs in first line. Then print out the
> actions and their qualifications in a list, followed by the main plan tree.

It's more more interesting with more complex statement:

postgres=# explain UPDATE target SET id = (SELECT COUNT(*) FROM
generate_series(1,10));
QUERY PLAN

--------------------------------------------------------------------------------------
Update (cost=12.51..52.52 rows=2400 width=6)
InitPlan 1 (returns $0)
-> Aggregate (cost=12.50..12.51 rows=1 width=0)
-> Function Scan on generate_series (cost=0.00..10.00
rows=1000 width=0)
-> Seq Scan on target (cost=0.00..40.00 rows=2400 width=6)
(5 rows)

> Is there any other thing you suggest to print out for each action?

It should match the output of a normal Update/Insert as closely as possible.

>> * Is the behavior now SQL-compliant? We had long discussions about the
>> default action being insert or do nothing or raise error, but I never got a
>> clear picture of what the SQL spec says and whether we're compliant.
>>
>> * What does the "one tuple is error" notice mean? We'll have to do
>> something about that.. I don't think we've properly thought through the
>> meaning of RAISE ERROR. Let's cut it out from the patch until then. It's
>> only a few dozen lines to put back when we know what to do about it.
>>
> I find that we have not reached an agreement on MERGE's syntax yet.
> Personally, I support Simon's idea, that is the default action should be
> RAISE ERROR. However, I am no sure what RAISE ERROR should do when we
> get a missing tuple. Here I just use a simple elog(NOTICE, "a tuple is
> error"); for this situation.
>
> I leave this for further extension when a more specific design for RAISE
> ERROR is available.
>
> Well, I have to say the current RAISE ERROR elog is a little bit ugly. Do
> you want me to chage the default action back to DO NOTHING? Or any other
> suggetions? (In fact, my personal thinking is to add a non-omissible "ELSE"
> clause as the end of the action list which forces the user to specify the
> default action for merge).

Whatever the spec says is what we should do.

>> * Do you need the MergeInsert/MergeUpdate/MergeDelete alias nodes? Would it
>> be simpler to just use InsertStmt/UpdateStmt/DeleteStmt directly?
>
> I need one flag in these statement to differentiate them from normal
> InsertStmt/UpdateStmt/DeleteStmt. There are slight difference for the
> process of these two kinds of statements in the transfromStmt() function. I
> define a set of alias nodes which have different nodetags. This can make
> the code simpler.

Ok. You might also consider just adding a "isMergeAction" boolean to
InsertStmt/UpdateStmt/DeleteStmt instead, or if the difference between
the regular statements and merge actions grow big, create a completely
separate node struct for them.

>> * Do you need the out-function for DeleteStmt? Why not for UpdateStmt and
>> InsertStmt?
>
> Ah, I add this function because I want to have a look of the content of
> DeleteStmt. I did this long ago, just after I started the project. If you
> don't want this function, I will remove it.

Ok.

>> * Instead of scanning the list of actions in ExecBS/ExecASMergeTriggers
>> every time, should set flags at plan time to mark what kind of actions there
>> is in the statement.
>
> First of all, I need to say that the functions of ExecBSMergeTriggers() and
> ExecASMergeTriggers() are for per-statement triggers, and are invoked only
> once in the whole process of a MERGE command. Setting flags at plan time can
> save the scanning. I may add it to next patch. But don't expect it save
> much execution time.

Yeah, I'm not so much concerned about performance, it just seems like it
might make the code slightly simpler.

>> Or should we defer firing the statement triggers until we hit the first
>> matching row and execute the action for the first time?
>
> No, we cannot do this. These are Per-statement triggers. They should be
> fired before/after the main plan is executed. The statement triggers are
> fired even no tuple is really matched with the actions.

Ok.

>> * This crashes:
>>
>> postgres=# CREATE TABLE target AS (SELECT 1 AS id);
>> SELECT 1
>> postgres=# MERGE into target t
>> USING (select 1 AS id) AS s
>>
>> ON t.id = s.id
>> WHEN MATCHED THEN
>> UPDATE SET id = (SELECT COUNT(*) FROM generate_series(1,10))
>> ;
>> server closed the connection unexpectedly
>> This probably means the server terminated abnormally
>> before or while processing the request.
>> The connection to the server was lost. Attempting reset: Failed.
>
> Oh, a really serious bug. I have not considered this case (user series
> generator in actions) before. I will see what I can do for it.

It's not specific to generate_series() but subqueries in general that
are the problem.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Hitoshi Harada 2010-08-15 11:37:04 Re: Status report on writeable CTEs
Previous Message Boxuan Zhai 2010-08-15 07:31:02 Re: MERGE Specification