Re: merge command - GSoC progress

Lists: pgsql-hackers
From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: merge command - GSoC progress
Date: 2010-07-27 05:04:05
Message-ID: AANLkTimXv-5axnsQhHM-wZazd+dOtGD84TzzF5D5MP84@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

---------- Forwarded message ----------
From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Date: 2010/7/26
Subject: Re: GSoC progress
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>

Hi,

I have get a edition that the merge command can run. It accept the standard
merge command and can do UPDATE, INSERT and DELETE actions now. But we
cannot put additional qualification for actions. There are some bugs when we
try to evaluate the quals which make the system quit. I will fix it soon.

The merge actions are packed in ModifyTable nodes now. This can benefit the
coding if we want to implement MERGE ... RETURNING ...

Please find the patch file in attachment.

2010/7/24 Robert Haas <robertmhaas(at)gmail(dot)com>

On Fri, Jul 23, 2010 at 12:21 PM, Boxuan Zhai <bxzhai2010(at)gmail(dot)com> wrote:
> > I my current implementation, the overall merge command plan is still a
> > single ModifyTable node. And the main query of merge command is in the
> field
> > of ModifyTable.plans. I add a new "List * mergeActPstates" field in the
> > structure of ModifyTableState. When apply the ExecInitModifyTable()
> function
> > on a ModifyTable node of a merge command, the function will do an extra
> work
> > to init all the merge actions and put the result into the
> > list of mergeActPstates. (in previous design, I put the list in
> > PlannedStmt, but that makes the code a little bit ugly).
>
> Hmm, that might work.
>
> > I now put all merge-command-related functions in the file of
> > nodeModifyTable.c. and I have fixed something that I missed in the init
> > process for merge command, such as set the junkfiler attribute for ctid.
> > However, I still cannot run ExecProject over the tuple slot returned
> by main
> > query.
> >
> > In fact, I am not sure how ExecProject () works exactly. It totally
> depends
> > on the correctness of the input ProjectionInfo, in which we have a
> > "ExprContext *pi_exprContext" parameter. The expression context should
> > contains all the tuple slot needed for the projection. I
> > init ProjectionInfo correctly, but the expression context is empty when
> > initialized. Could you tell me WHEN does the system fill the tuple slots
> > (the ecxt_scantuple, ecxt_innertuple, ecxt_outertuple) in the expression
> > context of the ProjectionInfo structure?
>
> I'm not real familiar with this part of the code; that would be a good
> question to ask on pgsql-hackers. But I believe the answer is that
> the individual node types initialize the expression context at the
> appropriate time. For example, if you grep for ecxt_innertuple,
> you'll find that it's set in the hash, merge, and nestloop nodes;
> whereas the scan tuple is set by the scan, bitmap heap, and index scan
> nodes.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>

Attachment Content-Type Size
merge_run_no_qual.patch text/plain 61.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-07-27 17:06:50
Message-ID: AANLkTin4EXaz-x1+VYqZ3pm=o4YcTemRdzRMD-jpcH=G@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 27, 2010 at 1:04 AM, Boxuan Zhai <bxzhai2010(at)gmail(dot)com> wrote:
> I have get a edition that the merge command can run. It accept the standard
> merge command and can do UPDATE, INSERT and DELETE actions now. But we
> cannot put additional qualification for actions. There are some bugs when we
> try to evaluate the quals which make the system quit. I will fix it soon.

This patch doesn't compile. You're using zbxprint() from a bunch of
places where it's not defined. I get compile warnings for all of
those files and then a link failure at the end. You might find it
useful to create src/Makefile.custom in your local tree and put
COPT=-Werror in there; it tends to prevent problems of this kind.

Undefined symbols:
"_zbxprint", referenced from:
_transformStmt in analyze.o
_ExecInitMergeAction in nodeModifyTable.o
_ExecModifyTable in nodeModifyTable.o
_ExecInitModifyTable in nodeModifyTable.o
_merge_action_planner in planner.o

Not that it's as high-priority as getting this fully working, but you
should revert the useless changes in this patch - e.g. the one-line
change to heaptuple.c is obvious debugging leftovers, and all of the
changes to execQual.c and execUtil.c are whitespace-only. You should
also try to make your code and comments conform to project style
guidelines. In general, you'll find it easier to keep track of your
changes (and you'll have fewer spurious changes) if you use git diff
master...yourbranch instead of marking comments, etc. with ZBX or
similar.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-07-28 10:08:48
Message-ID: AANLkTima00ryh7OBzVkZDoSB-SeDX4HLhcKQNROungAv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/28 Robert Haas <robertmhaas(at)gmail(dot)com>

> On Tue, Jul 27, 2010 at 1:04 AM, Boxuan Zhai <bxzhai2010(at)gmail(dot)com> wrote:
> > I have get a edition that the merge command can run. It accept the
> standard
> > merge command and can do UPDATE, INSERT and DELETE actions now. But we
> > cannot put additional qualification for actions. There are some bugs when
> we
> > try to evaluate the quals which make the system quit. I will fix it soon.
>
> This patch doesn't compile. You're using zbxprint() from a bunch of
> places where it's not defined. I get compile warnings for all of
> those files and then a link failure at the end. You might find it
> useful to create src/Makefile.custom in your local tree and put
> COPT=-Werror in there; it tends to prevent problems of this kind.
>
> Undefined symbols:
> "_zbxprint", referenced from:
> _transformStmt in analyze.o
> _ExecInitMergeAction in nodeModifyTable.o
> _ExecModifyTable in nodeModifyTable.o
> _ExecInitModifyTable in nodeModifyTable.o
> _merge_action_planner in planner.o
>
> Sorry, this is a debug function defined by me. It may not be included in
the patch. I add a line of "#define zbxprint printf" somewhere in the
system.

> Not that it's as high-priority as getting this fully working, but you
> should revert the useless changes in this patch - e.g. the one-line
> change to heaptuple.c is obvious debugging leftovers, and all of the
> changes to execQual.c and execUtil.c are whitespace-only. You should
> also try to make your code and comments conform to project style
> guidelines. In general, you'll find it easier to keep track of your
> changes (and you'll have fewer spurious changes) if you use git diff
> master...yourbranch instead of marking comments, etc. with ZBX or
> similar.
>
>

I will clean all these in my next patch.

I am now very confused with the failure of action qualification. I look
through the whole process of a query, from parser to executor. In my
opinion, the qualification transformed in analyzer, will be processed by
prepsocess_qual_condition() in planner, and then by the ExecInitExpr()
function in excutor start phase (in InitPlan() function). Then the qual is
ready to be used in ExecQual(). Am I correct?

I have done these on the merge action qual, but when I pass them into
ExecQual(), the server just closed abnormally. I don't know if I missed any
steps on preparing the qual expressions.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-07-28 10:28:32
Message-ID: AANLkTim2bk+-1M3mE+axD7Samw4nydoBDusQx-0BRHkD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 28, 2010 at 6:08 AM, Boxuan Zhai <bxzhai2010(at)gmail(dot)com> wrote:
>> On Tue, Jul 27, 2010 at 1:04 AM, Boxuan Zhai <bxzhai2010(at)gmail(dot)com> wrote:
>> > I have get a edition that the merge command can run. It accept the
>> > standard
>> > merge command and can do UPDATE, INSERT and DELETE actions now. But we
>> > cannot put additional qualification for actions. There are some bugs
>> > when we
>> > try to evaluate the quals which make the system quit. I will fix it
>> > soon.
>>
>> This patch doesn't compile.  You're using zbxprint() from a bunch of
>> places where it's not defined.  I get compile warnings for all of
>> those files and then a link failure at the end.  You might find it
>> useful to create src/Makefile.custom in your local tree and put
>> COPT=-Werror in there; it tends to prevent problems of this kind.
>>
>> Undefined symbols:
>>  "_zbxprint", referenced from:
>>      _transformStmt in analyze.o
>>      _ExecInitMergeAction in nodeModifyTable.o
>>      _ExecModifyTable in nodeModifyTable.o
>>      _ExecInitModifyTable in nodeModifyTable.o
>>      _merge_action_planner in planner.o
>>
> Sorry, this is a debug function defined by me. It may not be included in the
> patch. I add a line of "#define zbxprint printf" somewhere in the system.

Yeah, but it's not included in all the places that are needed to make
everything compile. You might move this to postgres.h or something.

>> Not that it's as high-priority as getting this fully working, but you
>> should revert the useless changes in this patch - e.g. the one-line
>> change to heaptuple.c is obvious debugging leftovers, and all of the
>> changes to execQual.c and execUtil.c are whitespace-only.  You should
>> also try to make your code and comments conform to project style
>> guidelines.  In general, you'll find it easier to keep track of your
>> changes (and you'll have fewer spurious changes) if you use git diff
>> master...yourbranch instead of marking comments, etc. with ZBX or
>> similar.
>
> I will clean all these in my next patch.
>
> I am now very confused with the failure of action qualification. I look
> through the whole process of a query, from parser to executor. In my
> opinion, the qualification transformed in analyzer, will be processed by
> prepsocess_qual_condition() in planner, and then by the ExecInitExpr()
> function in excutor start phase (in InitPlan() function). Then the qual is
> ready to be used in ExecQual(). Am I correct?

I'm not sure, sorry.

> I have done these on the merge action qual, but when I pass them into
> ExecQual(), the server just closed abnormally. I don't know if I missed any
> steps on preparing  the qual expressions.

Have you tried attaching a debugger? Try "SELECT pg_backend_pid()"
and then use "gdb -p the_pid" from another window. Hit "continue".
Then do whatever it is that's crashing. That way you can get a stack
backtrace, and poke around at the data structures.

Using pprint() on node-type data structures, either in debugging code
or actually straight from the debugger via "call", is also very
helpful, often-times.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-07-28 15:51:38
Message-ID: AANLkTimALSojoq405DeSC_Kmt6sVw3F3=-CuEf4RCjPa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

I have fixed the action qual problem. Now the system can run merge command,
with quals.

I create a clean patch file (no debug clauses). See the attachment.

Please try this new command if you have interest.

Boxuan

Attachment Content-Type Size
merge_no_debug.patch text/plain 52.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-07-29 20:51:22
Message-ID: AANLkTi=x0MyPP_6fqRHW7LHXyMrUBWdsK7XB7AbUnf84@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 28, 2010 at 11:51 AM, Boxuan Zhai <bxzhai2010(at)gmail(dot)com> wrote:
> I have fixed the action qual problem. Now the system can run merge command,
> with quals.
>
> I create a clean patch file (no debug clauses). See the attachment.
>
> Please try this new command if you have interest.

So, I tried this today, but:

- I got some compiler warnings in analyze.c, and
- when tried to run 'make check' with the patch applied, initdb failed.

So you still need to do some more bug-squashing on this...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-07-30 16:21:49
Message-ID: 4C52FC1D.80604@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Boxuan Zhai wrote:
> I create a clean patch file (no debug clauses). See the attachment.

Some general coding feedback for you on this.

It's not obvious to people who might want to try this out what exactly
they are supposed to do. Particularly for complicated patches like
this, where only a subset of the final feature might actually be
implemented, some sort of reviewer guide suggesting what should and
shouldn't work would be extremely helpful. I recall there was some sort
of patch design guide in an earlier version of this patch; it doesn't
seem to be there anymore. Don't remember if that had any examples in it.

Ultimately, the examples of working behavior for this patch will need to
be put into code. The way that happens is by adding working examples
into the regression tests for the database. If you had those done for
this patch, I wouldn't have to ask for code examples; I could just look
at the source to the regression output and see how to use it against the
standard database the regression samples create, and then execute
against. This at least lets you avoid having to generate some test
data, because there will already be some in the regression database for
you to use. There is an intro this topic at
http://wiki.postgresql.org/wiki/Regression_test_authoring Another
common way to generate test data is to run pgbench which creates 4
tables and populates them with data.

As far as the patch itself goes, you have some work left on cleaning
that up still you'll need to do eventually. What I would suggest is
actually reading the patch itself; don't just generate it and send it,
read through the whole thing like someone new to it would do. One way
you can improve what you've done already is to find places where you
have introduced changes to the code structure just through editing.
Here's an example of what I'm talking about, from line 499 of your patch:

- break;
+ break;

This happened because you added two invisible tabs to the end of this
line. This makes the patch larger for no good reason and tends to
infuriate people who work on the code. There's quite a bit of extra
lines added in here too that should go. You should consider reducing
the ultimate size of the patch in terms of lines a worthwhile use of
your time, even if it doesn't change how things work. There's lots of
examples in this one where you put two or three lines between two
statements when a single one would match the look of the code in that
file. A round of reading the diff looking for that sort of problem
would be useful.

Another thing you should do is limit how long each line is when
practical. You have lots of seriously wide comment lines here right now
in particular. While there are some longer lines in the PostgreSQL code
right now, that's code, not comments. And when you have a long line and
a long comment, don't tack the comment onto the end. Put it on the line
above instead. Also, don't use "//" in the middle of comments the way
you've done in a few places here.

Getting some examples sorted out and starting on regression tests is
more important than the coding style parts I think, just wanted to pass
those along while I noticed them reading the patch, so you could start
looking out for them more as you continue to work on it.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-07-30 16:31:20
Message-ID: AANLkTi=-V_W2wq52qjpYO3yBBhzehs3SRoE8fy9dbDbb@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 30, 2010 at 12:21 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> If you had those done for this patch, I
> wouldn't have to ask for code examples; I could just look at the source to
> the regression output and see how to use it against the standard database
> the regression samples create, and then execute against.

I agree. While not every feature needs regression tests, something
this complex certainly does. Also, running the regression tests
(frequently!) can help you realize when you've broken the existing
code before too much time goes by and it's no longer easy to figure
out which change is responsible for the breakage.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: merge command - GSoC progress
Date: 2010-07-30 16:33:37
Message-ID: 201007301833.37702.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Friday 30 July 2010 18:21:49 Greg Smith wrote:
> - break;
> + break;
>
> This happened because you added two invisible tabs to the end of this
> line. This makes the patch larger for no good reason and tends to
If you want to remove changes like this using:
"git checkout -p HEAD"
is very useful if youre using git. It allows you to selectively revert hunks
of not-checked-in changes...

Andres


From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-01 05:03:15
Message-ID: AANLkTi=cDEZnf2AgwgLL4-aoux72HECg3Njj9C8RMGgR@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/31 Greg Smith <greg(at)2ndquadrant(dot)com>

> Boxuan Zhai wrote:
>
>> I create a clean patch file (no debug clauses). See the attachment.
>>
>
> Some general coding feedback for you on this.
>
> Thanks for your consideration!

> It's not obvious to people who might want to try this out what exactly they
> are supposed to do. Particularly for complicated patches like this, where
> only a subset of the final feature might actually be implemented, some sort
> of reviewer guide suggesting what should and shouldn't work would be
> extremely helpful. I recall there was some sort of patch design guide in an
> earlier version of this patch; it doesn't seem to be there anymore. Don't
> remember if that had any examples in it.
>
> I am now working on fixing a bug which makes the system unable to initdb. I
will update my page in postgres Wiki with a detailed instruction of my
implementation and testing examples soon, with my next patch file.

> Ultimately, the examples of working behavior for this patch will need to be
> put into code. The way that happens is by adding working examples into the
> regression tests for the database. If you had those done for this patch, I
> wouldn't have to ask for code examples; I could just look at the source to
> the regression output and see how to use it against the standard database
> the regression samples create, and then execute against. This at least lets
> you avoid having to generate some test data, because there will already be
> some in the regression database for you to use. There is an intro this
> topic at http://wiki.postgresql.org/wiki/Regression_test_authoring Another common way to generate test data is to run pgbench which creates 4
> tables and populates them with data.
>
> I will try to add my testing examples to the gregression folder.

> As far as the patch itself goes, you have some work left on cleaning that
> up still you'll need to do eventually. What I would suggest is actually
> reading the patch itself; don't just generate it and send it, read through
> the whole thing like someone new to it would do. One way you can improve
> what you've done already is to find places where you have introduced changes
> to the code structure just through editing. Here's an example of what I'm
> talking about, from line 499 of your patch:
>
> - break;
> + break;
> This happened because you added two invisible tabs to the end of this line.
> This makes the patch larger for no good reason and tends to infuriate
> people who work on the code. There's quite a bit of extra lines added in
> here too that should go. You should consider reducing the ultimate size of
> the patch in terms of lines a worthwhile use of your time, even if it
> doesn't change how things work. There's lots of examples in this one where
> you put two or three lines between two statements when a single one would
> match the look of the code in that file. A round of reading the diff
> looking for that sort of problem would be useful.
>
> Another thing you should do is limit how long each line is when practical.
> You have lots of seriously wide comment lines here right now in particular.
> While there are some longer lines in the PostgreSQL code right now, that's
> code, not comments. And when you have a long line and a long comment, don't
> tack the comment onto the end. Put it on the line above instead. Also,
> don't use "//" in the middle of comments the way you've done in a few places
> here.
>
> Sorry for these mistakes, again. I promise that the same thing will not
happen in my next patch.

> Getting some examples sorted out and starting on regression tests is more
> important than the coding style parts I think, just wanted to pass those
> along while I noticed them reading the patch, so you could start looking out
> for them more as you continue to work on it.
>
> --
> Greg Smith 2ndQuadrant US Baltimore, MD
> PostgreSQL Training, Services and Support
> greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us <http://www.2ndquadrant.us/>
>
>


From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-03 20:57:05
Message-ID: AANLkTimDPtmM4+iYBFq=m-6mHYpfOjGhntS3dmYYS70W@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dear Robert,

I have finished the tested edition of my code.

I think there are no redundant lines in this time's patch file.

And, I have tested the running of MERGE command with different situations. I
am sorry that I didn't create regression test files, because I am not sure
how to add new files in the git package. But, I have written web pages in
Postgres Wiki. I explain the details of my implementation and a set of
testing examples.

Please refer to the following pages if you are interested.
https://wiki.postgresql.org/wiki/Add_MERGE_command_GSoC_2010

https://wiki.postgresql.org/wiki/Implementation_detalis
https://wiki.postgresql.org/wiki/Test_examples

Thanks!

Yours Boxuan

Attachment Content-Type Size
tested_merge.tar application/x-tar 70.0 KB

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-03 22:14:02
Message-ID: 4C5894AA.9010007@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> And, I have tested the running of MERGE command with different
> situations. I am sorry that I didn't create regression test files,
> because I am not sure how to add new files in the git package. But, I
> have written web pages in Postgres Wiki. I explain the details of my
> implementation and a set of testing examples.

Can someone help Boxuan with how to write regression tests?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-03 22:23:15
Message-ID: 4C5896D3.3080601@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

From: David Fetter <david(at)fetter(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-03 22:45:11
Message-ID: 20100803224511.GT5082@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 03, 2010 at 03:14:02PM -0700, Josh Berkus wrote:
>
> > And, I have tested the running of MERGE command with different
> > situations. I am sorry that I didn't create regression test files,
> > because I am not sure how to add new files in the git package.
> > But, I have written web pages in Postgres Wiki. I explain the
> > details of my implementation and a set of testing examples.
>
> Can someone help Boxuan with how to write regression tests?

Happy to. I'll start this evening PDT or tomorrow :)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-03 23:19:05
Message-ID: 4C58A3E9.5080408@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Boxuan Zhai wrote:
> I think there are no redundant lines in this time's patch file.

It is much better. There are still more blank likes around the new code
you've added than are needed in many places, but that doesn't interfere
with reading the patch.

The main code formatting issue left you'll need to address eventually
are all the really long comments in there.

> And, I have tested the running of MERGE command with different
> situations. I am sorry that I didn't create regression test files,
> because I am not sure how to add new files in the git package.

git add <filename> ?

The tests you've put in there are the right general sort of things to
try out. The one example you gave does show an UPSERT being emulated by
MERGE, which is the #1 thing people are looking for initially.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-04 04:55:51
Message-ID: AANLkTi=GNn7THsnWeokMyE+ZJCky1O528Pb=7FvGtA-D@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/8/4 Greg Smith <greg(at)2ndquadrant(dot)com>

> Boxuan Zhai wrote:
>
>> I think there are no redundant lines in this time's patch file.
>>
>
> It is much better. There are still more blank likes around the new code
> you've added than are needed in many places, but that doesn't interfere with
> reading the patch.
>
> Sorry, it is my personal habit of leaving blanks around codes. I will chage
this if it doesn't follow the pgsql coding style.

> The main code formatting issue left you'll need to address eventually are
> all the really long comments in there.
>
> I will correct the long comments in the next patch.

>
>
> And, I have tested the running of MERGE command with different situations.
>> I am sorry that I didn't create regression test files, because I am not sure
>> how to add new files in the git package.
>>
>
> git add <filename> ?
>
> The tests you've put in there are the right general sort of things to try
> out. The one example you gave does show an UPSERT being emulated by MERGE,
> which is the #1 thing people are looking for initially.
>
> In fact, I have created a merge.sql with simple merge example. I put it in
the folder of /src/test/regress/sql/ and modified the serial_schedule file
to add a line of : test: merge
Is this correct?

But, I don't know how to run regress to test this sql file. My "make check"
fails when install the db. I think this is because I do it under a MinGW
environment and some parameters are not matched with the default setting of
postgres.
I can configure, make install , initdb and do sql query in psql successfully
in my machine. So the source code itself should be correct.

I put my merge.sql in attachment, in case anyone want to have a look.

>
> --
> Greg Smith 2ndQuadrant US Baltimore, MD
> PostgreSQL Training, Services and Support
> greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us <http://www.2ndquadrant.us/>
>
>

Attachment Content-Type Size
merge.sql application/octet-stream 1.1 KB

From: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-04 09:23:12
Message-ID: AANLkTi=o_c2m=f-1mFERyBW89V0rO5JnSu5gOOt8hin=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dear Robert,

I am just considering that there may be some logical mistakes for my rule
rewriting strategy of MERGE actions.

In my current design, if we find that an action type, say UPDATE, is
replaced by INSTEAD rules, we will remove all the actions of this type from
the MERGE command, as if they are not be specified by user from the
beginning. See the test example in my pages for this situation.
https://wiki.postgresql.org/wiki/MergeTestExamples#With_INSTEAD_rules

Now,I am thinking that maybe we should keep the replaced actions in action
list, and just mark them to be "invalid". If one join tuple from the main
plan fits the condition of this action, we will do nothing on it.

This strategy is a little bit different with the current one. If we delete
an action, the tuples that meet it condition will be caught by other
actions. If we keep it, the tuples that match it will be skipped.

I think the new design is more logical, and I wonder your opinion on this
problem.

Yours Boxuan


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-04 12:09:51
Message-ID: 4C59588F.3030809@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/08/10 07:55, Boxuan Zhai wrote:
> 2010/8/4 Greg Smith<greg(at)2ndquadrant(dot)com>
>> Boxuan Zhai wrote:
>>> And, I have tested the running of MERGE command with different situations.
>>> I am sorry that I didn't create regression test files, because I am not sure
>>> how to add new files in the git package.
>>
>> git add<filename> ?

Yep. I believe Boxuan is using git in a simplistic way, doing just "git
diff" to create patches. For adding new files, you need to do "git add
<filename>", but note that this adds the new file to "staging area". To
view all changes in the staging area, use "git diff --cached", but that
won't show any modifications to existing files that you haven't also
"git add"ed. So to generate a patch you need to "git add" all modified
and added files ("git add -u" will add all modified files
automatically), and then use "git diff --cached" to generate the diff.

> In fact, I have created a merge.sql with simple merge example. I put it in
> the folder of /src/test/regress/sql/ and modified the serial_schedule file
> to add a line of : test: merge
> Is this correct?

Yep. You also need to add it to parallel_schedule, "make check" and
"make installcheck-parallel" use parallel_schedule and "make
installcheck" uses serial_schedule.

> But, I don't know how to run regress to test this sql file. My "make check"
> fails when install the db. I think this is because I do it under a MinGW
> environment and some parameters are not matched with the default setting of
> postgres.
> I can configure, make install , initdb and do sql query in psql successfully
> in my machine. So the source code itself should be correct.

Hmm, I don't know much about MinGW, but "make check" should work. But
since you can install and run postgres normally, you can use "make
installcheck" instead. That runs the regression suite against an already
running instance of postgres.

> I put my merge.sql in attachment, in case anyone want to have a look.

Thanks, that helps a lot.

Here's a combined patch which includes the latest code patch from
"tested_merge.tar", with the merge.sql test case, the corresponding
expected output file, and the required changes to the schedule files.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
merge-command-2010-08-03.patch text/x-diff 62.7 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-04 12:23:36
Message-ID: 4C595BC8.1040101@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 04/08/10 12:23, Boxuan Zhai wrote:
> I am just considering that there may be some logical mistakes for my rule
> rewriting strategy of MERGE actions.
>
> In my current design, if we find that an action type, say UPDATE, is
> replaced by INSTEAD rules, we will remove all the actions of this type from
> the MERGE command, as if they are not be specified by user from the
> beginning. See the test example in my pages for this situation.
> https://wiki.postgresql.org/wiki/MergeTestExamples#With_INSTEAD_rules
>
> Now,I am thinking that maybe we should keep the replaced actions in action
> list, and just mark them to be "invalid". If one join tuple from the main
> plan fits the condition of this action, we will do nothing on it.
>
> This strategy is a little bit different with the current one. If we delete
> an action, the tuples that meet it condition will be caught by other
> actions. If we keep it, the tuples that match it will be skipped.
>
> I think the new design is more logical, and I wonder your opinion on this
> problem.

So if I understood correctly, in the instead rule example you have at
the wiki page, the stock table should contain one row, with the same
balance it had before running the MERGE? Yeah, agreed, that's much more
logical.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Boxuan Zhai <bxzhai2010(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: merge command - GSoC progress
Date: 2010-08-04 13:02:39
Message-ID: 201008041502.43810.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wednesday 04 August 2010 14:09:51 Heikki Linnakangas wrote:
> Yep. I believe Boxuan is using git in a simplistic way, doing just "git
> diff" to create patches. For adding new files, you need to do "git add
> <filename>", but note that this adds the new file to "staging area". To
> view all changes in the staging area, use "git diff --cached", but that
> won't show any modifications to existing files that you haven't also
> "git add"ed. So to generate a patch you need to "git add" all modified
> and added files ("git add -u" will add all modified files
> automatically), and then use "git diff --cached" to generate the diff.
Or use git add --intent--to-add (or -N). That adds the file but not the actual
changes.

Andres


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-04 14:36:44
Message-ID: 1280932604.1838.91.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2010-08-04 at 17:23 +0800, Boxuan Zhai wrote:
> Dear Robert,
>
> I am just considering that there may be some logical mistakes for my
> rule rewriting strategy of MERGE actions.
>
> In my current design, if we find that an action type, say UPDATE, is
> replaced by INSTEAD rules, we will remove all the actions of this type
> from the MERGE command, as if they are not be specified by user from
> the beginning. See the test example in my pages for this situation.
> https://wiki.postgresql.org/wiki/MergeTestExamples#With_INSTEAD_rules

It seems sensible to use the test files that I wrote for MERGE in 2008,
published to -hackers at that time.

The tests were a complete output from a MERGE test script.

Developing new tests when we already have code makes little sense, plus
its a good way of objectively testing that the spec has been implemented
correctly in these patches.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Boxuan Zhai <bxzhai2010(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: merge command - GSoC progress
Date: 2010-08-04 15:26:30
Message-ID: 1280935591.1838.126.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2010-08-04 at 15:36 +0100, Simon Riggs wrote:
> On Wed, 2010-08-04 at 17:23 +0800, Boxuan Zhai wrote:
> > Dear Robert,
> >
> > I am just considering that there may be some logical mistakes for my
> > rule rewriting strategy of MERGE actions.
> >
> > In my current design, if we find that an action type, say UPDATE, is
> > replaced by INSTEAD rules, we will remove all the actions of this type
> > from the MERGE command, as if they are not be specified by user from
> > the beginning. See the test example in my pages for this situation.
> > https://wiki.postgresql.org/wiki/MergeTestExamples#With_INSTEAD_rules
>
> It seems sensible to use the test files that I wrote for MERGE in 2008,
> published to -hackers at that time.

Even more sensible for me to include it as a patch, with the files in
the right places and the schedules updated.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services

Attachment Content-Type Size
merge_tests.patch text/x-patch 13.1 KB