Re: (PATCH) Adding CORRESPONDING to Set Operations

Lists: pgsql-hackers
From: Kerem Kat <keremkat(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: (PATCH) Adding CORRESPONDING to Set Operations
Date: 2011-10-19 13:01:09
Message-ID: CAJZSWkWN3YwQ01C3+cq0+eyZ1DmK=69_6vryrsVGMC=+fWrSZA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Adding CORRESPONDING to Set Operations
Initial patch, filename: corresponding_clause_v2.patch

This patch adds CORRESPONDING clause to set operations according to
SQL20nn standard draft as Feature F301, "CORRESPONDING in query
expressions"

Corresponding clause either contains a BY(...) clause or not. If it
doesn't have a BY(...) clause the usage is as follows.

SELECT 1 a, 2 b, 3 c UNION CORRESPONDING 4 b, 5 d, 6 c, 7 f;

with output:
b c
-----
2 3
4 6

i.e. matching column names are filtered and are only output from the
whole set operation clause.

If we introduce a BY(...) clause, then column names are further
intersected with that BY clause:

SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) 4 b, 5 d, 6 c, 7 f;

with output:

b
--
2
4

This patch compiles and tests successfully with master branch.

It has been tested only on Pardus Linux i686 ( Kernel 2.6.37.6 #1 SMP
i686 i686 i386 GNU/Linux)

This patch includes documentation and add one regression file.

This patch addresses the following TODO item:
SQL Commands: Add CORRESPONDING BY to UNION/INTERSECT/EXCEPT

Best Regards,

Kerem KAT

Attachment Content-Type Size
corresponding_clause_v2.patch text/x-patch 37.0 KB

From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Kerem Kat" <keremkat(at)gmail(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING to Set Operations
Date: 2011-10-24 17:52:31
Message-ID: cd505d419919eaab83d943fc2d242c9a.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, October 19, 2011 15:01, Kerem Kat wrote:
> Adding CORRESPONDING to Set Operations
> Initial patch, filename: corresponding_clause_v2.patch

I had a quick look at the behaviour of this patch.

Btw, the examples in your email were typoed (one select is missing):

> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING 4 b, 5 d, 6 c, 7 f;
should be:
SELECT 1 a, 2 b, 3 c UNION CORRESPONDING select 4 b, 5 d, 6 c, 7 f;

and

> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) 4 b, 5 d, 6 c, 7 f;
should be:
SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) select 4 b, 5 d, 6 c, 7 f;
>

But there is also a small bug, I think: the order in the CORRESPONDING BY list should be followed,
according to the standard (foundation, p. 408):

"2) If <corresponding column list> is specified, then let SL be a <select list> of those <column
name>s explicitly appearing in the <corresponding column list> in the order that these
<column name>s appear in the <corresponding column list>. Every <column name> in the
<corresponding column list> shall be a <column name> of both T1 and T2."

That would make this wrong, I think:

SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c,b) select 5 d, 6 c, 7 f, 4 b ;

b | c
---+---
2 | 3
4 | 6
(2 rows)

i.e., I think it should show columns in the order c, b (and not b, c); the order of the
CORRESPONDING BY phrase.

(but maybe I'm misreading the text of the standard; I find it often difficult to follow)

Thanks,

Erik Rijkers


From: Kerem Kat <keremkat(at)gmail(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING to Set Operations
Date: 2011-10-25 17:49:14
Message-ID: CAJZSWkWk-FVTsO4TgX=Hvm28UQZturKHU8_-0jMFNO7S83Aesg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 24, 2011 at 20:52, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
> On Wed, October 19, 2011 15:01, Kerem Kat wrote:
>> Adding CORRESPONDING to Set Operations
>> Initial patch, filename: corresponding_clause_v2.patch
>
> I had a quick look at the behaviour of this patch.
>
> Btw, the examples in your email were typoed (one select is missing):
>
>> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING 4 b, 5 d, 6 c, 7 f;
> should be:
>  SELECT 1 a, 2 b, 3 c UNION CORRESPONDING select 4 b, 5 d, 6 c, 7 f;
>
> and
>
>> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) 4 b, 5 d, 6 c, 7 f;
> should be:
>  SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) select 4 b, 5 d, 6 c, 7 f;
>>

Yes you are correct, mea culpa.

>
>
>
> But there is also a small bug, I think: the order in the CORRESPONDING BY list should be followed,
> according to the standard (foundation, p. 408):
>
> "2) If <corresponding column list> is specified, then let SL be a <select list> of those <column
> name>s explicitly appearing in the <corresponding column list> in the order that these
> <column name>s appear in the <corresponding column list>. Every <column name> in the
> <corresponding column list> shall be a <column name> of both T1 and T2."
>
> That would make this wrong, I think:
>
> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c,b) select 5 d, 6 c, 7 f, 4 b ;
>
>  b | c
> ---+---
>  2 | 3
>  4 | 6
> (2 rows)
>
> i.e., I think it should show columns in the order c, b (and not b, c); the order of the
> CORRESPONDING BY phrase.
>
> (but maybe I'm misreading the text of the standard; I find it often difficult to follow)
>

It wasn't a misread, I checked the draft, in my version same
explanation is at p.410.
I have corrected the ordering of the targetlists of subqueries. And
added 12 regression
tests for column list ordering. Can you confirm that the order has
changed for you?

>
> Thanks,
>
>
> Erik Rijkers
>
>

Regards,

Kerem KAT

Attachment Content-Type Size
corresponding_clause_v3.patch text/x-patch 39.6 KB

From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Kerem Kat" <keremkat(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING to Set Operations
Date: 2011-10-25 18:09:51
Message-ID: 90d32620940905c1c2ef608fa653941d.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, October 25, 2011 19:49, Kerem Kat wrote:
> On Mon, Oct 24, 2011 at 20:52, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
>> On Wed, October 19, 2011 15:01, Kerem Kat wrote:
>>> Adding CORRESPONDING to Set Operations
> I have corrected the ordering of the targetlists of subqueries. And
> added 12 regression
> tests for column list ordering. Can you confirm that the order has
> changed for you?
>

Yes, this one is OK.

thanks,

Erik Rijkers


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Kerem Kat" <keremkat(at)gmail(dot)com>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING (NULL error)
Date: 2011-10-27 12:45:24
Message-ID: cc632458cd3518d43c164ec4ea87aff4.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

(pgsql 9.2devel (25 oct) with your latest CORRESPONDING patch;
linux x86_64 GNU/Linux 2.6.18-274.3.1.el5)

Hi,

here is another peculiarity, which I think is a bug:

-- first without CORRESPONDING:

$ psql -Xaf null.sql
select 1 a , 2 b
union all
select null a, 4 b ;
a | b
---+---
1 | 2
| 4
(2 rows)

-- then with CORRESPONDING:

select 1 a , 2 b
union all
corresponding
select null a, 4 b ;
psql:null.sql:9: ERROR: failed to find conversion function from unknown to integer

If the null value is in a table column the error does not occur:

drop table if exists t1; create table t1 (a int, b int); insert into t1 values (1,2);
drop table if exists t2; create table t2 (a int, b int); insert into t2 values (null,2);
select a,b from t1
union all
corresponding
select a,b from t2 ;
a | b
---+---
1 | 2
| 2
(2 rows)

I'm not sure it is actually a bug; but it seems an unneccessary error.

thanks,

Erik Rijkers


From: Kerem Kat <keremkat(at)gmail(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING (NULL error)
Date: 2011-10-27 19:21:41
Message-ID: CAJZSWkVJ2Rr9BFDj77EHUOo9CYpDj=rJ7dtrF5i7V+g7q2CwEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Union with NULL error persists without the corresponding patch. Here
is the output from postgres without the patch:

SELECT a FROM (SELECT 1 a) foo
UNION
SELECT a FROM (SELECT NULL a) foo2;

ERROR: failed to find conversion function from unknown to integer

It is thrown from parse_coerce.c:coerce_type method. I will try to dig
deep on it.

Regards,

Kerem KAT

On Thu, Oct 27, 2011 at 15:45, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
> (pgsql 9.2devel (25 oct) with your latest CORRESPONDING patch;
> linux x86_64 GNU/Linux 2.6.18-274.3.1.el5)
>
> Hi,
>
> here is another peculiarity, which I think is a bug:
>
> -- first without CORRESPONDING:
>
> $ psql -Xaf null.sql
>            select 1 a   , 2 b
> union all
>            select null a, 4 b ;
>  a | b
> ---+---
>  1 | 2
>   | 4
> (2 rows)
>
> -- then with CORRESPONDING:
>
>            select 1 a   , 2 b
> union all
> corresponding
>            select null a, 4 b ;
> psql:null.sql:9: ERROR:  failed to find conversion function from unknown to integer
>
>
> If the null value is in a table column the error does not occur:
>
> drop table if exists t1; create table t1 (a int, b int); insert into t1 values (1,2);
> drop table if exists t2; create table t2 (a int, b int); insert into t2 values (null,2);
>                select a,b from t1
> union all
> corresponding
>                select a,b from t2 ;
>  a | b
> ---+---
>  1 | 2
>   | 2
> (2 rows)
>
>
> I'm not sure it is actually a bug; but it seems an unneccessary error.
>
>
> thanks,
>
> Erik Rijkers
>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kerem Kat <keremkat(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING (NULL error)
Date: 2011-10-27 19:55:57
Message-ID: 4746.1319745357@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kerem Kat <keremkat(at)gmail(dot)com> writes:
> Union with NULL error persists without the corresponding patch. Here
> is the output from postgres without the patch:

> SELECT a FROM (SELECT 1 a) foo
> UNION
> SELECT a FROM (SELECT NULL a) foo2;

> ERROR: failed to find conversion function from unknown to integer

Yeah, this is a longstanding issue that is not simple to fix without
introducing other unpleasantnesses. It is not something you should
try to deal with at the same time as implementing CORRESPONDING.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kerem Kat <keremkat(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING (NULL error)
Date: 2011-10-27 20:20:55
Message-ID: 5162.1319746855@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Kerem Kat <keremkat(at)gmail(dot)com> writes:
>> Union with NULL error persists without the corresponding patch. Here
>> is the output from postgres without the patch:

>> SELECT a FROM (SELECT 1 a) foo
>> UNION
>> SELECT a FROM (SELECT NULL a) foo2;

>> ERROR: failed to find conversion function from unknown to integer

> Yeah, this is a longstanding issue that is not simple to fix without
> introducing other unpleasantnesses. It is not something you should
> try to deal with at the same time as implementing CORRESPONDING.

BTW, just to clarify: although that case fails, the case Erik was
complaining of does work in unmodified Postgres:

regression=# select 1 a , 2 b
union all
select null a, 4 b ;
a | b
---+---
1 | 2
| 4
(2 rows)

and I agree with him that it should still work with CORRESPONDING.
Even though the behavior of unlabeled NULLs is less than perfect,
we definitely don't want to break cases that work now. I suspect
the failure means that you tried to postpone too much work to plan
time. You do have to match up the columns honestly at parse time
and do the necessary type coercions on them then.

regards, tom lane


From: Kerem Kat <keremkat(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING (NULL error)
Date: 2011-10-27 20:34:21
Message-ID: CAJZSWkWfRDp5dt+TCAis7=d-==vCx9U3v0xV8R9dLR=9=SxbzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Oct 27, 2011 at 23:20, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> Kerem Kat <keremkat(at)gmail(dot)com> writes:
>>> Union with NULL error persists without the corresponding patch. Here
>>> is the output from postgres without the patch:
>
>>> SELECT a FROM (SELECT 1 a) foo
>>> UNION
>>> SELECT a FROM (SELECT NULL a) foo2;
>
>>> ERROR:  failed to find conversion function from unknown to integer
>
>> Yeah, this is a longstanding issue that is not simple to fix without
>> introducing other unpleasantnesses.  It is not something you should
>> try to deal with at the same time as implementing CORRESPONDING.
>
> BTW, just to clarify: although that case fails, the case Erik was
> complaining of does work in unmodified Postgres:
>
> regression=# select 1 a   , 2 b
> union all
>            select null a, 4 b ;
>  a | b
> ---+---
>  1 | 2
>   | 4
> (2 rows)
>
> and I agree with him that it should still work with CORRESPONDING.
> Even though the behavior of unlabeled NULLs is less than perfect,
> we definitely don't want to break cases that work now.  I suspect
> the failure means that you tried to postpone too much work to plan
> time.  You do have to match up the columns honestly at parse time
> and do the necessary type coercions on them then.
>
>                        regards, tom lane
>

That is by design, because CORRESPONDING is implemented as subqueries:

select 1 a , 2 b
union all
corresponding
select null a, 4 b ;

is equivalent to

SELECT a, b FROM ( SELECT 1 a, 2 b ) foo
UNION ALL
SELECT a, b FROM ( SELECT null a, 4 b ) foo2;

which gives the same error in unpatched postgres.

Regards,

Kerem KAT


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kerem Kat <keremkat(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING (NULL error)
Date: 2011-10-27 21:04:55
Message-ID: 5792.1319749495@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kerem Kat <keremkat(at)gmail(dot)com> writes:
> On Thu, Oct 27, 2011 at 23:20, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, just to clarify: although that case fails, the case Erik was
>> complaining of does work in unmodified Postgres:
>> ...
>> and I agree with him that it should still work with CORRESPONDING.

> That is by design, because CORRESPONDING is implemented as subqueries:

Well, this appears to me to be a counterexample sufficient to refute
that implementation decision. You can inject subqueries at plan time,
if that helps you make things match up, but you can't rearrange things
that way at parse time, as I gather you're doing or else you would not
be seeing this problem. In any case, I already pointed out to you that
rearranging the parse tree that way is problematic for reverse-listing
the parse tree. We don't want to see subqueries injected in the results
of printing parse trees with ruleutils.c.

regards, tom lane


From: Thom Brown <thom(at)linux(dot)com>
To: Kerem Kat <keremkat(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING to Set Operations
Date: 2011-11-14 10:56:44
Message-ID: CAA-aLv6MU_xpRrD9LFiSzCUo9BHWTiFp9OAHQSxh9-1PMZ6FXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 25 October 2011 18:49, Kerem Kat <keremkat(at)gmail(dot)com> wrote:
> On Mon, Oct 24, 2011 at 20:52, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
>> On Wed, October 19, 2011 15:01, Kerem Kat wrote:
>>> Adding CORRESPONDING to Set Operations
>>> Initial patch, filename: corresponding_clause_v2.patch
>>
>> I had a quick look at the behaviour of this patch.
>>
>> Btw, the examples in your email were typoed (one select is missing):
>>
>>> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING 4 b, 5 d, 6 c, 7 f;
>> should be:
>>  SELECT 1 a, 2 b, 3 c UNION CORRESPONDING select 4 b, 5 d, 6 c, 7 f;
>>
>> and
>>
>>> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) 4 b, 5 d, 6 c, 7 f;
>> should be:
>>  SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(b) select 4 b, 5 d, 6 c, 7 f;
>>>
>
> Yes you are correct, mea culpa.
>
>>
>>
>>
>> But there is also a small bug, I think: the order in the CORRESPONDING BY list should be followed,
>> according to the standard (foundation, p. 408):
>>
>> "2) If <corresponding column list> is specified, then let SL be a <select list> of those <column
>> name>s explicitly appearing in the <corresponding column list> in the order that these
>> <column name>s appear in the <corresponding column list>. Every <column name> in the
>> <corresponding column list> shall be a <column name> of both T1 and T2."
>>
>> That would make this wrong, I think:
>>
>> SELECT 1 a, 2 b, 3 c UNION CORRESPONDING BY(c,b) select 5 d, 6 c, 7 f, 4 b ;
>>
>>  b | c
>> ---+---
>>  2 | 3
>>  4 | 6
>> (2 rows)
>>
>> i.e., I think it should show columns in the order c, b (and not b, c); the order of the
>> CORRESPONDING BY phrase.
>>
>> (but maybe I'm misreading the text of the standard; I find it often difficult to follow)
>>
>
> It wasn't a misread, I checked the draft, in my version same
> explanation is at p.410.
> I have corrected the ordering of the targetlists of subqueries. And
> added 12 regression
> tests for column list ordering. Can you confirm that the order has
> changed for you?
>
>
>>
>> Thanks,
>>
>>
>> Erik Rijkers
>>
>>
>
> Regards,
>
> Kerem KAT

This explain plan doesn't look right to me:

test=# explain select a,b,c from one intersect corresponding by (a,c)
select a,b,c from two;
QUERY PLAN
---------------------------------------------------------------------------------
HashSetOp Intersect (cost=0.00..117.00 rows=200 width=8)
-> Append (cost=0.00..97.60 rows=3880 width=8)
-> Subquery Scan on "*SELECT* 3" (cost=0.00..48.80 rows=1940 width=8)
-> Seq Scan on one (cost=0.00..29.40 rows=1940 width=8)
-> Subquery Scan on "*SELECT* 4" (cost=0.00..48.80 rows=1940 width=8)
-> Seq Scan on two (cost=0.00..29.40 rows=1940 width=8)
(6 rows)

If I do the same thing without the "corresponding...":

test=# explain select a,b,c from one intersect select a,b,c from two;
QUERY PLAN
----------------------------------------------------------------------------------
HashSetOp Intersect (cost=0.00..126.70 rows=200 width=12)
-> Append (cost=0.00..97.60 rows=3880 width=12)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..48.80
rows=1940 width=12)
-> Seq Scan on one (cost=0.00..29.40 rows=1940 width=12)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..48.80
rows=1940 width=12)
-> Seq Scan on two (cost=0.00..29.40 rows=1940 width=12)
(6 rows)

So it looks like it's now seeing the two tables as the 3rd and 4th
tables, even though there are only 2 tables in total.

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Kerem Kat <keremkat(at)gmail(dot)com>
To: Thom Brown <thom(at)linux(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING to Set Operations
Date: 2011-11-14 11:29:05
Message-ID: CAJZSWkXZeCncG-kd4BG12ArzrQze-vvnjRxOt2wqXjicLsqSpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> This explain plan doesn't look right to me:
>
> test=# explain select a,b,c from one intersect corresponding by (a,c)
> select a,b,c from two;
>                                   QUERY PLAN
> ---------------------------------------------------------------------------------
>  HashSetOp Intersect  (cost=0.00..117.00 rows=200 width=8)
>   ->  Append  (cost=0.00..97.60 rows=3880 width=8)
>         ->  Subquery Scan on "*SELECT* 3"  (cost=0.00..48.80 rows=1940 width=8)
>               ->  Seq Scan on one  (cost=0.00..29.40 rows=1940 width=8)
>         ->  Subquery Scan on "*SELECT* 4"  (cost=0.00..48.80 rows=1940 width=8)
>               ->  Seq Scan on two  (cost=0.00..29.40 rows=1940 width=8)
> (6 rows)

In the current implementation,

select a,b,c from one intersect corresponding by (a,c) select a,b,c from two;

is translated to equivalent

select a, c from (select a,b,c from one)
intersect
select a, c from (select a,b,c from two);

Methinks that's the reason for this explain output.

Corresponding is currently implemented in the parse/analyze phase. If
it were to be implemented in the planning phase, explain output would
likely be as you expect it to be.

> If I do the same thing without the "corresponding...":
>
> test=# explain select a,b,c from one intersect select a,b,c from two;
>                                    QUERY PLAN
> ----------------------------------------------------------------------------------
>  HashSetOp Intersect  (cost=0.00..126.70 rows=200 width=12)
>   ->  Append  (cost=0.00..97.60 rows=3880 width=12)
>         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..48.80
> rows=1940 width=12)
>               ->  Seq Scan on one  (cost=0.00..29.40 rows=1940 width=12)
>         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..48.80
> rows=1940 width=12)
>               ->  Seq Scan on two  (cost=0.00..29.40 rows=1940 width=12)
> (6 rows)
>
> So it looks like it's now seeing the two tables as the 3rd and 4th
> tables, even though there are only 2 tables in total.
>
> --
> Thom Brown
> Twitter: @darkixion
> IRC (freenode): dark_ixion
> Registered Linux user: #516935
>
> EnterpriseDB UK: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Regards,

Kerem KAT


From: Thom Brown <thom(at)linux(dot)com>
To: Kerem Kat <keremkat(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING to Set Operations
Date: 2011-11-14 13:22:40
Message-ID: CAA-aLv5CJbc4490f0Bbr_3FppnKgtz-egbC0FaiC8C+JLhh8SA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14 November 2011 11:29, Kerem Kat <keremkat(at)gmail(dot)com> wrote:

> > This explain plan doesn't look right to me:
> >
> > test=# explain select a,b,c from one intersect corresponding by (a,c)
> > select a,b,c from two;
> > QUERY PLAN
> >
> ---------------------------------------------------------------------------------
> > HashSetOp Intersect (cost=0.00..117.00 rows=200 width=8)
> > -> Append (cost=0.00..97.60 rows=3880 width=8)
> > -> Subquery Scan on "*SELECT* 3" (cost=0.00..48.80 rows=1940
> width=8)
> > -> Seq Scan on one (cost=0.00..29.40 rows=1940 width=8)
> > -> Subquery Scan on "*SELECT* 4" (cost=0.00..48.80 rows=1940
> width=8)
> > -> Seq Scan on two (cost=0.00..29.40 rows=1940 width=8)
> > (6 rows)
>
> In the current implementation,
>
> select a,b,c from one intersect corresponding by (a,c) select a,b,c from
> two;
>
> is translated to equivalent
>
> select a, c from (select a,b,c from one)
> intersect
> select a, c from (select a,b,c from two);
>
> Methinks that's the reason for this explain output.
>
> Corresponding is currently implemented in the parse/analyze phase. If
> it were to be implemented in the planning phase, explain output would
> likely be as you expect it to be.

I'm certainly no expert on what the right way to represent the plan is, but
I'm still uncomfortable with its current representation. And having just
tested the translated equivalent, I still don't get the same explain plan:

test=# explain select a, c from (select a,b,c from one) a
intersect
select a, c from (select a,b,c from two) b;
QUERY PLAN

---------------------------------------------------------------------------------
HashSetOp Intersect (cost=0.00..117.00 rows=200 width=8)
-> Append (cost=0.00..97.60 rows=3880 width=8)
-> Subquery Scan on "*SELECT* 1" (cost=0.00..48.80 rows=1940
width=8)
-> Seq Scan on one (cost=0.00..29.40 rows=1940 width=8)
-> Subquery Scan on "*SELECT* 2" (cost=0.00..48.80 rows=1940
width=8)
-> Seq Scan on two (cost=0.00..29.40 rows=1940 width=8)
(6 rows)

Also you probably want to update src/backend/catalog/sql_features.txt so
that F301 is marked as "YES" for supporting the standard. :)

--
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kerem Kat <keremkat(at)gmail(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING to Set Operations
Date: 2011-11-14 13:32:47
Message-ID: 24578.1321277567@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kerem Kat <keremkat(at)gmail(dot)com> writes:
> Corresponding is currently implemented in the parse/analyze phase. If
> it were to be implemented in the planning phase, explain output would
> likely be as you expect it to be.

It's already been pointed out to you that doing this at parse time is
unacceptable, because of the implications for reverse-listing of rules
(views).

regards, tom lane


From: Kerem Kat <keremkat(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thom Brown <thom(at)linux(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING to Set Operations
Date: 2011-11-14 14:09:31
Message-ID: CAJZSWkUHRWbQ4htoL4WP9xzHwjGMpa1mjphz9KZrSS4Y7e3C7Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 14, 2011 at 15:32, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Kerem Kat <keremkat(at)gmail(dot)com> writes:
>> Corresponding is currently implemented in the parse/analyze phase. If
>> it were to be implemented in the planning phase, explain output would
>> likely be as you expect it to be.
>
> It's already been pointed out to you that doing this at parse time is
> unacceptable, because of the implications for reverse-listing of rules
> (views).
>
>                        regards, tom lane
>

I am well aware of that thank you.

Regards,

Kerem KAT


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Kerem Kat <keremkat(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Thom Brown <thom(at)linux(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: (PATCH) Adding CORRESPONDING to Set Operations
Date: 2011-11-17 08:54:49
Message-ID: CAP7Qgmn3T+tSgUVBc7KGXgOCPCX5vDnNUOVeaPn06iSBSv9MoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 14, 2011 at 6:09 AM, Kerem Kat <keremkat(at)gmail(dot)com> wrote:
> On Mon, Nov 14, 2011 at 15:32, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Kerem Kat <keremkat(at)gmail(dot)com> writes:
>>> Corresponding is currently implemented in the parse/analyze phase. If
>>> it were to be implemented in the planning phase, explain output would
>>> likely be as you expect it to be.
>>
>> It's already been pointed out to you that doing this at parse time is
>> unacceptable, because of the implications for reverse-listing of rules
>> (views).
>>
>>                        regards, tom lane
>>
>
> I am well aware of that thank you.

I took a quick look at the patch and found some miscellaneous points including:

- don't use // style comment
- keep consistent in terms of space around parenthesis like if and foreach
- ereport should have error position as long as possible, especially
in syntax error
- I'm not convinced that new correspoinding_union.sql test is added. I
prefer to include new tests in union.sql
- CORRESPONDING BY should have column name list, not expression list.
It's not legal to say CORRESPONDING BY (1 + 1)
- column list rule should be presented in document, too
- I don't see why you call checkWellFormedRecursionWalker on
corresponding clause

And more than above, Tom's point is the biggest blocker. I'd suggest
to rework it so that target list of Query of RangeTblEntry on the top
of tree have less columns that match the filter. By that way, I guess
you can keep original information as well as filtered top-most target
list. Eventually you need to work on the planner, too. Though I've not
read all of the patch, the design rework should be done first. I'll
mark this as Waiting on Author.

Regards,
--
Hitoshi Harada