regexp_matches illegally restricts rows

Lists: pgsql-bugs
From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: regexp_matches illegally restricts rows
Date: 2010-04-06 04:16:23
Message-ID: 4BBAB597.6000409@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Severity: major (data loss)
Versions Tested: 8.4.2, 9.0 HEAD
Test Case:

create table regex_test ( id serial not null primary key, myname text );

insert into regex_test ( myname )
values ( 'josh'),('joe'),('mary'),('stephen'), ('jose'),
('kelley'),('alejandro');

select id, regexp_matches(myname, $x$(j[\w]+)$x$)
from regex_test;

The above will return 4 rows, not the 7 which are in the table.

I can't see how this is anything but a bug; as far as I know, nothing in
the target list is allowed to restrict the number of rows which are
returned by the query. We should get 7 rows, 3 of which have an empty
array or a NULL in the 2nd column.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Josh Berkus <josh(at)postgresql(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: regexp_matches illegally restricts rows -- just a documentation issue?
Date: 2010-04-06 05:06:46
Message-ID: 4BBAC166.2080901@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 4/5/10 9:16 PM, Josh Berkus wrote:

> I can't see how this is anything but a bug; as far as I know, nothing in
> the target list is allowed to restrict the number of rows which are
> returned by the query. We should get 7 rows, 3 of which have an empty
> array or a NULL in the 2nd column.

Just noticed it's a SETOF[] function. Which makes it odd that I can
call it in the target list at all, but explains the row restriction.

It's still confusing behavior (three regulars on IRC thought it was a
bug too) and users should be warned in the documentation. Not sure
exactly where, though ... maybe in 9.7?

--Josh Berkus


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)postgresql(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: regexp_matches illegally restricts rows -- just a documentation issue?
Date: 2010-04-06 16:10:00
Message-ID: q2i603c8f071004060910ta4ea434eiefb338ed6f18f519@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Tue, Apr 6, 2010 at 1:06 AM, Josh Berkus <josh(at)postgresql(dot)org> wrote:
> On 4/5/10 9:16 PM, Josh Berkus wrote:
>
>> I can't see how this is anything but a bug; as far as I know, nothing in
>> the target list is allowed to restrict the number of rows which are
>> returned by the query.  We should get 7 rows, 3 of which have an empty
>> array or a NULL in the 2nd column.
>
> Just noticed it's a SETOF[] function.  Which makes it odd that I can
> call it in the target list at all, but explains the row restriction.
>
> It's still confusing behavior (three regulars on IRC thought it was a
> bug too) and users should be warned in the documentation.  Not sure
> exactly where, though ... maybe in 9.7?

While I understand why this is confusing, it's really very normal
behavior for a SRF, and I don't really think it makes sense to
document that this SRF behaves just like other SRFs...

...Robert


From: Josh Berkus <josh(at)postgresql(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: regexp_matches illegally restricts rows -- just a documentation issue?
Date: 2010-04-06 19:42:12
Message-ID: 4BBB8E94.4070102@postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


> While I understand why this is confusing, it's really very normal
> behavior for a SRF, and I don't really think it makes sense to
> document that this SRF behaves just like other SRFs...

It's likely to be used by people who do not otherwise use SRFs, and many
would not be prepared for the consequences. It's not instinctive that a
regexp function would be an SRF in any case; if someone is not looking
closely at the docs, it would be easy to miss this entirely -- as 3
experienced PG people did yesterday.

Personally, I also think that PostgreSQL is wrong to allow an SRF in the
target list to restrict the number of rows output. A subselect in the
target list does not do so. However, that's completely another discussion.

--Josh Berkus


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Josh Berkus" <josh(at)postgresql(dot)org>
Cc: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: Re: regexp_matches illegally restricts rows -- just a documentation issue?
Date: 2010-04-06 20:00:41
Message-ID: 892692f19f05e7b626392ebaac320cd3.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Tue, April 6, 2010 21:42, Josh Berkus wrote:
>
>> While I understand why this is confusing, it's really very normal
>> behavior for a SRF, and I don't really think it makes sense to
>> document that this SRF behaves just like other SRFs...
>
> It's likely to be used by people who do not otherwise use SRFs, and many
> would not be prepared for the consequences. It's not instinctive that a
> regexp function would be an SRF in any case; if someone is not looking
> closely at the docs, it would be easy to miss this entirely -- as 3
> experienced PG people did yesterday.
>
> Personally, I also think that PostgreSQL is wrong to allow an SRF in the
> target list to restrict the number of rows output. A subselect in the
> target list does not do so. However, that's completely another discussion.
>

You said:
"users should be warned in the documentation.";

The documentation has this warning:

"Currently, functions returning sets can also be called in the select list
of a query. For each row that the query generates by itself, the function
returning set is invoked, and an output row is generated for each element
of the function&#8217;s result set. Note, however, that this capability is
deprecated and might be removed in future releases."

(8.4 docs, section 34.4.7.)

Erik Rijkers