query_is_distinct_for does not take into account set returning functions

Lists: pgsql-hackers
From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: query_is_distinct_for does not take into account set returning functions
Date: 2014-07-08 08:27:08
Message-ID: CAApHDvrfVkH0P3FAooGcckBy7feCJ9QFanKLkX7MWsBcxY2Vcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Over here ->
http://www.postgresql.org/message-id/6351.1404663344@sss.pgh.pa.us Tom
noted that create_unique_path did not check for set returning functions.

Tom Wrote:
> I notice that create_unique_path is not paying attention to the question
> of whether the subselect's tlist contains SRFs or volatile functions.
> It's possible that that's a pre-existing bug.

I looked at this a bit and I can confirm that it does not behave as it
should do. Take the following as an example:

create table x (id int primary key);
create table y (n int not null);

insert into x values(1);
insert into y values(1);

select * from x where (id,id) in(select n,generate_series(1,2) / 10 + 1 g
from y);
id
----
1
(1 row)

select * from x where (id,id) in(select n,generate_series(1,2) / 10 + 1 g
from y group by n);
id
----
1
1
(2 rows)

The 2nd query does group by n, so query_is_distinct_for returns true,
therefore the outer query think's it's ok to perform an INNER JOIN rather
than a SEMI join, which is this case produces an extra record.

I think we should probably include the logic to test for set returning
functions into query_is_distinct_for.

The attached fixes the problem.

Regards

David Rowley

Attachment Content-Type Size
query_is_distinct_for_fix.patch application/octet-stream 846 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: query_is_distinct_for does not take into account set returning functions
Date: 2014-07-08 14:45:13
Message-ID: 7496.1404830713@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Rowley <dgrowleyml(at)gmail(dot)com> writes:
> I think we should probably include the logic to test for set returning
> functions into query_is_distinct_for.

It strikes me that there's only a problem if the SRF is in a tlist entry
that is not one of the DISTINCT or GROUP BY columns, respectively. It
may not be worth the extra complexity to figure that out, though.

regards, tom lane