Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation

Lists: pgsql-bugs
From: "Daniele Varrazzo" <daniele(dot)varrazzo(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-24 13:16:51
Message-ID: 201005241316.o4ODGp03006023@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5469
Logged by: Daniele Varrazzo
Email address: daniele(dot)varrazzo(at)gmail(dot)com
PostgreSQL version: 8.4
Operating system: any
Description: regexp_matches() has poor behaviour and more poor
documentation
Details:

regexp_matches() has been recently discussed
(http://archives.postgresql.org/pgsql-bugs/2010-04/msg00026.php): it is a
setof function and as such it can drop results.

Unfortunately it is an useful function to newcomers who use SQL, use regexps
but don't arrive to read the chapter 34.4.7. (i.e. Extending SQL -> Query
Language (SQL) Functions -> SQL Functions Returning Sets) and are not so
enlightened to know that "setof text[]" means "if it doesn't match, it drops
the record". They just expect the function to be a LIKE on steroids.

Please describe the behavior in the documentation of the function (i.e.
table 9-6. and section 9.7.3), possibly provide a function with a saner
interface, i.e. returning a text[] of the first match or NULL on no match,
or document a workaround (suitable for an user knowing regexps but not
setof-returning functions) to make the function not dropping record (e.g. I
fixed the "bug" adding a "|" at the end of the pattern, so that the function
returns an array of NULL in case of no match: I don't think it is a trivial
workaround).


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-26 03:51:45
Message-ID: AANLkTilIkP5cqXLiVahSCOYuyU6gMu8Bdcany2yHtzOH@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, May 24, 2010 at 9:16 AM, Daniele Varrazzo
<daniele(dot)varrazzo(at)gmail(dot)com> wrote:
> regexp_matches() has been recently discussed
> (http://archives.postgresql.org/pgsql-bugs/2010-04/msg00026.php): it is a
> setof function and as such it can drop results.
>
> Unfortunately it is an useful function to newcomers who use SQL, use regexps
> but don't arrive to read the chapter 34.4.7. (i.e. Extending SQL -> Query
> Language (SQL) Functions -> SQL Functions Returning Sets) and are not so
> enlightened to know that "setof text[]" means "if it doesn't match, it drops
> the record". They just expect the function to be a LIKE on steroids.
>
> Please describe the behavior in the documentation of the function (i.e.
> table 9-6. and section 9.7.3), possibly provide a function with a saner
> interface, i.e. returning a text[] of the first match or NULL on no match,
> or document a workaround (suitable for an user knowing regexps but not
> setof-returning functions) to make the function not dropping record (e.g. I
> fixed the "bug" adding a "|" at the end of the pattern, so that the function
> returns an array of NULL in case of no match: I don't think it is a trivial
> workaround).

I'm not sure that it's very productive to refer to the behavior of our
code as insane. We do document this in section 9.7.3, pretty
explicitly:

"The regexp_matches function returns all of the captured substrings
resulting from matching a POSIX regular expression pattern. It has the
syntax regexp_matches(string, pattern [, flags ]). If there is no
match to the pattern, the function returns no rows. If there is a
match, the function returns a text array whose n'th element is the
substring matching the n'th parenthesized subexpression of the pattern
(not counting "non-capturing" parentheses; see below for details)."

I think that's pretty clear. Your mileage may vary, of course.

I'm less confident than what we have in table 9-6 (other string
functions, in section 9.4, string functions and operators) is clear on
first reading, but neither do I immediately know how to improve it.
Perhaps instead of critiquing our insanity you could provide some
specific suggestions for improvement.

Similarly, if you think we should have another function besides
regexp_matches(), rather than just complaining that we don't, it would
be more useful to suggest a name and a specific behavior and ideally
maybe even provide a patch (or just the docs portion of a patch) -
especially if you can point to a specific use-case that is hard to do
with the SRF but would be easier with a function with a different
interface.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-26 04:14:02
Message-ID: 4BFCA00A.8040604@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 26/05/10 15:51, Robert Haas wrote:
>
> I'm not sure that it's very productive to refer to the behavior of our
> code as insane.
>

Not meaning to single you out Robert, but typically folk are honest with
their impression of the code without worrying about feather ruffling too
much e.g: searching for "brain dead" in the pg-hackers archives returns
a sizeable collection of reading material.

Personally I think it is good to be blunt about code we consider not
well thought out or well behaved. Obviously in some cases such comments
may turn out to be incorrect or misleading (e.g user error or not
reading the docs), but I don't think we should try (too hard anyway) to
smother any strong criticism.

regards

Mark


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-26 11:58:15
Message-ID: AANLkTil7bJBR-pH_tDUCZqKtSTypFg_BqjTYpPC1cHL1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Wed, May 26, 2010 at 4:51 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, May 24, 2010 at 9:16 AM, Daniele Varrazzo
> <daniele(dot)varrazzo(at)gmail(dot)com> wrote:
>> regexp_matches() has been recently discussed
>> (http://archives.postgresql.org/pgsql-bugs/2010-04/msg00026.php): it is a
>> setof function and as such it can drop results.
>>
>> Unfortunately it is an useful function to newcomers who use SQL, use regexps
>> but don't arrive to read the chapter 34.4.7. (i.e. Extending SQL -> Query
>> Language (SQL) Functions -> SQL Functions Returning Sets) and are not so
>> enlightened to know that "setof text[]" means "if it doesn't match, it drops
>> the record". They just expect the function to be a LIKE on steroids.
>>
>> Please describe the behavior in the documentation of the function (i.e.
>> table 9-6. and section 9.7.3), possibly provide a function with a saner
>> interface, i.e. returning a text[] of the first match or NULL on no match,
>> or document a workaround (suitable for an user knowing regexps but not
>> setof-returning functions) to make the function not dropping record (e.g. I
>> fixed the "bug" adding a "|" at the end of the pattern, so that the function
>> returns an array of NULL in case of no match: I don't think it is a trivial
>> workaround).
>
> I'm not sure that it's very productive to refer to the behavior of our
> code as insane.  We do document this in section 9.7.3, pretty
> explicitly:
>
> "The regexp_matches function returns all of the captured substrings
> resulting from matching a POSIX regular expression pattern. It has the
> syntax regexp_matches(string, pattern  [, flags  ]). If there is no
> match to the pattern, the function returns no rows. If there is a
> match, the function returns a text array whose n'th element is the
> substring matching the n'th parenthesized subexpression of the pattern
> (not counting "non-capturing" parentheses; see below for details)."
>
> I think that's pretty clear.  Your mileage may vary, of course.

"If there is no match to the pattern, the function returns no rows" is
easily overlooked as "it returns null", or some other behaviour that
don't change the returned set. The point is, because the function is
listed in the string function, you would expect the function to
manipulate text, not the dataset. The function as it is is not safe to
be used in a construct

SELECT foo, bar, regexp_matches(bar, pattern) FROM table;

unless you really wanted:

SELECT foo, bar, regexp_matches(bar, pattern) FROM table WHERE bar
~ pattern;

otherwise you have to take measures to be able to deal with records in
which the pattern is not matched, for example:

SELECT foo, bar, regexp_matches(bar, pattern || '|') FROM table;

the latter still doesn't work when bar is NULL: in this case the
record is dropped anyway, so I don't think it can be proposed as
general solution.

The characteristics of returning a set of text[] is useful when the
user wants all the matches, not only the first one: the behaviour is
selected specifying the flag 'g' as third argument.

From this point of view, I hope it can be stated that in its current
form the regexp_matches() has not the most optimal interface. Please
accept my apology for the tone being too rude in my previous message.

> I'm less confident than what we have in table 9-6 (other string
> functions, in section 9.4, string functions and operators) is clear on
> first reading, but neither do I immediately know how to improve it.
> Perhaps instead of critiquing our insanity you could provide some
> specific suggestions for improvement.
>
> Similarly, if you think we should have another function besides
> regexp_matches(), rather than just complaining that we don't, it would
> be more useful to suggest a name and a specific behavior and ideally
> maybe even provide a patch (or just the docs portion of a patch) -
> especially if you can point to a specific use-case that is hard to do
> with the SRF but would be easier with a function with a different
> interface.

Below I assume an alternative function is provided. I have problems in
finding a name for the function, as regexp_matches() is already used.
I would call it regexp_match() in reference to the fact that it
returns a single value (being an array) and not a list of matches as
potentially regexp_matches() could. The quite similar names could be a
problem though.

Because table 9-6 is the index people look for when they have a task
related to strings, I would say wording should be:

[regexp_matches:] Return all groups of captured substrings resulting
from matching a POSIX regular expression against the string. Warning:
in case of no match, tested record is dropped. See Section 9.7.3 for
more information.
[regexp_match:] Return the first group of captured substrings
resulting from matching a POSIX regular expression against the string.
In case of no match, return NULL. See Section 9.7.3 for more
information.

In section 9.7.3, after "If there is no match to the pattern, the
function returns no rows." I would add "This means that if the
function is used in a SELECT, records where the string don't match the
pattern are discarded from the dataset. If such records are required,
use regexp_match() instead".

Reference implementation for regexp_match() is:

CREATE OR REPLACE FUNCTION regexp_match(s text, pattern text)
RETURNS text[] AS
$$
DECLARE
rv text[];
BEGIN
SELECT * INTO rv FROM regexp_matches(s, pattern);
IF FOUND THEN
RETURN rv;
ELSE
RETURN NULL;
END IF;
END;
$$
LANGUAGE 'plpgsql'
IMMUTABLE STRICT ;

The reference implementation is rather inefficient: a more efficient
one can be easily provided in C, sharing code with
regexp_{matches,split}.

A minimal test case is:

test=> SELECT regexp_match('foobarbequebaz', '(bar)(beque)');
regexp_match | {bar,beque}

test=> SELECT regexp_match('foobarbaz', '(bar)(beque)');
regexp_match |

test=> SELECT regexp_match(NULL, '(bar)(beque)');
regexp_match |

test=> SELECT regexp_match('foobarbequebaz', NULL);
regexp_match |

If the problem is acknowledged, I'd be happy to provide relevant patches.

-- Daniele


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-26 20:13:00
Message-ID: AANLkTinWJwn6osAEnwU1KG_p3fM1aFURjTeCg0Kn8F3B@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Wed, May 26, 2010 at 7:58 AM, Daniele Varrazzo
<daniele(dot)varrazzo(at)gmail(dot)com> wrote:
>> I'm not sure that it's very productive to refer to the behavior of our
>> code as insane.  We do document this in section 9.7.3, pretty
>> explicitly:
>>
>> "The regexp_matches function returns all of the captured substrings
>> resulting from matching a POSIX regular expression pattern. It has the
>> syntax regexp_matches(string, pattern  [, flags  ]). If there is no
>> match to the pattern, the function returns no rows. If there is a
>> match, the function returns a text array whose n'th element is the
>> substring matching the n'th parenthesized subexpression of the pattern
>> (not counting "non-capturing" parentheses; see below for details)."
>>
>> I think that's pretty clear.  Your mileage may vary, of course.
>
> "If there is no match to the pattern, the function returns no rows" is
> easily overlooked as "it returns null", or some other behaviour that
> don't change the returned set. The point is, because the function is
> listed in the string function, you would expect the function to
> manipulate text, not the dataset. The function as it is is not safe to
> be used in a construct
>
>    SELECT foo, bar, regexp_matches(bar, pattern) FROM table;
>
> unless you really wanted:
>
>    SELECT foo, bar, regexp_matches(bar, pattern) FROM table WHERE bar
> ~ pattern;

Well, even that's not really the same thing... if you're surprised by
getting no rows for a row in the source table, you could easily also
be surprised by getting more than one.

> otherwise you have to take measures to be able to deal with records in
> which the pattern is not matched, for example:
>
>    SELECT foo, bar, regexp_matches(bar, pattern || '|') FROM table;
>
> the latter still doesn't work when bar is NULL: in this case the
> record is dropped anyway, so I don't think it can be proposed as
> general solution.
>
> The characteristics of returning a set of text[] is useful when the
> user wants all the matches, not only the first one: the behaviour is
> selected specifying the flag 'g' as third argument.
>
> From this point of view, I hope it can be stated that in its current
> form the regexp_matches() has not the most optimal interface. Please
> accept my apology for the tone being too rude in my previous message.

No sweat. I don't agree that with the statement that regexp_matches()
is not the optimal interface, but I would agree with an alternative
statement that some people might prefer a different interface.

>> I'm less confident than what we have in table 9-6 (other string
>> functions, in section 9.4, string functions and operators) is clear on
>> first reading, but neither do I immediately know how to improve it.
>> Perhaps instead of critiquing our insanity you could provide some
>> specific suggestions for improvement.
>>
>> Similarly, if you think we should have another function besides
>> regexp_matches(), rather than just complaining that we don't, it would
>> be more useful to suggest a name and a specific behavior and ideally
>> maybe even provide a patch (or just the docs portion of a patch) -
>> especially if you can point to a specific use-case that is hard to do
>> with the SRF but would be easier with a function with a different
>> interface.
>
> Below I assume an alternative function is provided. I have problems in
> finding a name for the function, as regexp_matches() is already used.
> I would call it regexp_match() in reference to the fact that it
> returns a single value (being an array) and not a list of matches as
> potentially regexp_matches() could. The quite similar names could be a
> problem though.

Actually, I kind of like that. I think it would be reasonable to
provide regexp_match() returning text[] and regexp_matches() returning
setof text[].

> Because table 9-6 is the index people look for when they have a task
> related to strings, I would say wording should be:
>
> [regexp_matches:] Return all groups of captured substrings resulting
> from matching a POSIX regular expression against the string. Warning:
> in case of no match, tested record is dropped. See Section 9.7.3 for
> more information.
> [regexp_match:] Return the first group of captured substrings
> resulting from matching a POSIX regular expression against the string.
> In case of no match, return NULL. See Section 9.7.3 for more
> information.
>
> In section 9.7.3, after "If there is no match to the pattern, the
> function returns no rows." I would add "This means that if the
> function is used in a SELECT, records where the string don't match the
> pattern are discarded from the dataset. If such records are required,
> use regexp_match() instead".

I think that talking about rows being dropped is confusing and not
really accurate. What I would say is that regexp_matches() is a
set-returning function and can return multiple rows, or none.
Therefore, calling it in the target list may increase or decrease the
number of output rows. If this behavior is not desired, use
regexp_match() instead.

> If the problem is acknowledged, I'd be happy to provide relevant patches.

Assuming a lack of violent disagreement, I'd go for it.

http://wiki.postgresql.org/wiki/Submitting_a_Patch

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-27 07:56:16
Message-ID: htl8j0$aau$2@reversiblemaps.ath.cx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On 2010-05-26, Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> wrote:

> [regexp_matches:] Return all groups of captured substrings resulting
> from matching a POSIX regular expression against the string. Warning:
> in case of no match, tested record is dropped. See Section 9.7.3 for
> more information.

possibly like this instead.

[regexp_matches:] Return all groups of captured substrings resulting
from matching a POSIX regular expression against the string. Note:
one row is returned for each match, no matches results in no rows,
several matches results in several rows.

> In section 9.7.3, after "If there is no match to the pattern, the
> function returns no rows." I would add "This means that if the
> function is used in a SELECT, records where the string don't match the
> pattern are discarded from the dataset. If such records are required,
> use regexp_match() instead".

or use an outer join to a subquery if you want to see all the results.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-29 02:01:37
Message-ID: 201005290201.o4T21bk04708@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Daniele Varrazzo wrote:
> "If there is no match to the pattern, the function returns no rows" is
> easily overlooked as "it returns null", or some other behaviour that
> don't change the returned set. The point is, because the function is
> listed in the string function, you would expect the function to
> manipulate text, not the dataset. The function as it is is not safe to
> be used in a construct
>
> SELECT foo, bar, regexp_matches(bar, pattern) FROM table;
>
> unless you really wanted:
>
> SELECT foo, bar, regexp_matches(bar, pattern) FROM table WHERE bar
> ~ pattern;
>
> otherwise you have to take measures to be able to deal with records in
> which the pattern is not matched, for example:
>
> SELECT foo, bar, regexp_matches(bar, pattern || '|') FROM table;
>
> the latter still doesn't work when bar is NULL: in this case the
> record is dropped anyway, so I don't think it can be proposed as
> general solution.
>
> The characteristics of returning a set of text[] is useful when the
> user wants all the matches, not only the first one: the behaviour is
> selected specifying the flag 'g' as third argument.
>
> >From this point of view, I hope it can be stated that in its current
> form the regexp_matches() has not the most optimal interface. Please
> accept my apology for the tone being too rude in my previous message.

I found the description in the documentation quite confusing also. I
have created the attached documention patch which is clearer about the
behavior of regexp_matches().

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

Attachment Content-Type Size
/pgpatches/regexp text/x-diff 2.4 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-29 21:00:55
Message-ID: 201005292100.o4TL0tf27606@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


I have updated the patch, attached, to clarify that this returns text
arrays, and that you can force it to always return one row using
COALESCE() and a '|' pattern (the later suggested by Daniele Varrazzo).

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Daniele Varrazzo wrote:
> > "If there is no match to the pattern, the function returns no rows" is
> > easily overlooked as "it returns null", or some other behaviour that
> > don't change the returned set. The point is, because the function is
> > listed in the string function, you would expect the function to
> > manipulate text, not the dataset. The function as it is is not safe to
> > be used in a construct
> >
> > SELECT foo, bar, regexp_matches(bar, pattern) FROM table;
> >
> > unless you really wanted:
> >
> > SELECT foo, bar, regexp_matches(bar, pattern) FROM table WHERE bar
> > ~ pattern;
> >
> > otherwise you have to take measures to be able to deal with records in
> > which the pattern is not matched, for example:
> >
> > SELECT foo, bar, regexp_matches(bar, pattern || '|') FROM table;
> >
> > the latter still doesn't work when bar is NULL: in this case the
> > record is dropped anyway, so I don't think it can be proposed as
> > general solution.
> >
> > The characteristics of returning a set of text[] is useful when the
> > user wants all the matches, not only the first one: the behaviour is
> > selected specifying the flag 'g' as third argument.
> >
> > >From this point of view, I hope it can be stated that in its current
> > form the regexp_matches() has not the most optimal interface. Please
> > accept my apology for the tone being too rude in my previous message.
>
> I found the description in the documentation quite confusing also. I
> have created the attached documention patch which is clearer about the
> behavior of regexp_matches().

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

Attachment Content-Type Size
/pgpatches/regexp text/x-diff 3.4 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-30 03:45:45
Message-ID: AANLkTimN3m-TUfJjTwSQE-00G7vvHPdvYlLAcApuAZWW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sat, May 29, 2010 at 5:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> I have updated the patch, attached, to clarify that this returns text
> arrays, and that you can force it to always return one row using
> COALESCE() and a '|' pattern (the later suggested by Daniele Varrazzo).

I don't find this part to be something we should include in the
documentation. If we want to include a workaround, how about defining
a non-SRF that just calls the SRF and returns the first row?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-30 16:59:36
Message-ID: AANLkTingKKwwBTLIJas2PRd8coEhAW0b9mbBvMJaNNNz@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sun, May 30, 2010 at 4:45 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sat, May 29, 2010 at 5:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> I have updated the patch, attached, to clarify that this returns text
>> arrays, and that you can force it to always return one row using
>> COALESCE() and a '|' pattern (the later suggested by Daniele Varrazzo).
>
> I don't find this part to be something we should include in the
> documentation.  If we want to include a workaround, how about defining
> a non-SRF that just calls the SRF and returns the first row?

I think a documentation correction could be backported without problem
to all the currently maintained version of PostgreSQL (which would be
of good google value, as very often google searches lands you to
previous releases doc pages), whereas a easier to use function would
be a new feature and as such could only be introduced in 9.0 or even
9.1.

-- Daniele


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-31 14:11:44
Message-ID: 201005311411.o4VEBiQ11120@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Robert Haas wrote:
> On Sat, May 29, 2010 at 5:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > I have updated the patch, attached, to clarify that this returns text
> > arrays, and that you can force it to always return one row using
> > COALESCE() and a '|' pattern (the later suggested by Daniele Varrazzo).
>
> I don't find this part to be something we should include in the
> documentation. If we want to include a workaround, how about defining
> a non-SRF that just calls the SRF and returns the first row?

Remember this has to return one row for no matches, so a simple SRF will
not work. I also have not seen enough demand for another function. A
single doc mention seemed the appropriate level of detail for this.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-05-31 14:12:55
Message-ID: 201005311412.o4VECtI11351@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Daniele Varrazzo wrote:
> On Sun, May 30, 2010 at 4:45 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > On Sat, May 29, 2010 at 5:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >> I have updated the patch, attached, to clarify that this returns text
> >> arrays, and that you can force it to always return one row using
> >> COALESCE() and a '|' pattern (the later suggested by Daniele Varrazzo).
> >
> > I don't find this part to be something we should include in the
> > documentation. ?If we want to include a workaround, how about defining
> > a non-SRF that just calls the SRF and returns the first row?
>
> I think a documentation correction could be backported without problem
> to all the currently maintained version of PostgreSQL (which would be
> of good google value, as very often google searches lands you to
> previous releases doc pages), whereas a easier to use function would
> be a new feature and as such could only be introduced in 9.0 or even
> 9.1.

While you might have had this problem, it is not a common problem so not
something we are about to take tons of time addressing.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-06-01 13:58:15
Message-ID: AANLkTik3Yva4rQBChx9g52w9pkktQAGf3zW39C_AUJIj@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, May 31, 2010 at 10:11 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Robert Haas wrote:
>> On Sat, May 29, 2010 at 5:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> > I have updated the patch, attached, to clarify that this returns text
>> > arrays, and that you can force it to always return one row using
>> > COALESCE() and a '|' pattern (the later suggested by Daniele Varrazzo).
>>
>> I don't find this part to be something we should include in the
>> documentation.  If we want to include a workaround, how about defining
>> a non-SRF that just calls the SRF and returns the first row?
>
> Remember this has to return one row for no matches, so a simple SRF will
> not work.  I also have not seen enough demand for another function.  A
> single doc mention seemed the appropriate level of detail for this.

Well, we can debate later whether to add another function to core, but
what I meant was that the user having the problem could create a
user-defined function that calls regexp_matches() and returns the
first row, or NULL.

But actually here's an even simpler workaround, which is IMHO less
ugly than the original one:

SELECT foo, bar, (SELECT regexp_matches(bar, pattern)) FROM table;

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-06-01 14:26:50
Message-ID: 2452.1275402410@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> But actually here's an even simpler workaround, which is IMHO less
> ugly than the original one:

> SELECT foo, bar, (SELECT regexp_matches(bar, pattern)) FROM table;

Doesn't that blow up if the subselect returns more than one row?

I think you could make it work by wrapping regexp_matches in a
simple (non-SETOF) SQL function, but just writing out the sub-SELECT
doesn't do it. This goes back to the recent discussion of why SQL
functions can't always be inlined --- the semantics are a bit
different in some cases.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-06-01 14:31:00
Message-ID: 201006011431.o51EV0w14740@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > But actually here's an even simpler workaround, which is IMHO less
> > ugly than the original one:
>
> > SELECT foo, bar, (SELECT regexp_matches(bar, pattern)) FROM table;
>
> Doesn't that blow up if the subselect returns more than one row?
>
> I think you could make it work by wrapping regexp_matches in a
> simple (non-SETOF) SQL function, but just writing out the sub-SELECT
> doesn't do it. This goes back to the recent discussion of why SQL
> functions can't always be inlined --- the semantics are a bit
> different in some cases.

If you don't use 'g' as a third argument, it can't return more than one
row.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-06-01 14:40:45
Message-ID: 201006011440.o51EejP16631@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Robert Haas wrote:
> On Mon, May 31, 2010 at 10:11 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > Robert Haas wrote:
> >> On Sat, May 29, 2010 at 5:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> >> > I have updated the patch, attached, to clarify that this returns text
> >> > arrays, and that you can force it to always return one row using
> >> > COALESCE() and a '|' pattern (the later suggested by Daniele Varrazzo).
> >>
> >> I don't find this part to be something we should include in the
> >> documentation. ?If we want to include a workaround, how about defining
> >> a non-SRF that just calls the SRF and returns the first row?
> >
> > Remember this has to return one row for no matches, so a simple SRF will
> > not work. ?I also have not seen enough demand for another function. ?A
> > single doc mention seemed the appropriate level of detail for this.
>
> Well, we can debate later whether to add another function to core, but
> what I meant was that the user having the problem could create a
> user-defined function that calls regexp_matches() and returns the
> first row, or NULL.
>
> But actually here's an even simpler workaround, which is IMHO less
> ugly than the original one:
>
> SELECT foo, bar, (SELECT regexp_matches(bar, pattern)) FROM table;

Good idea. Simplified patch attached.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +

Attachment Content-Type Size
/pgpatches/regexp text/x-diff 3.4 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5469: regexp_matches() has poor behaviour and more poor documentation
Date: 2010-06-03 14:40:47
Message-ID: 201006031440.o53EelZ12335@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Bruce Momjian wrote:
> Robert Haas wrote:
> > On Mon, May 31, 2010 at 10:11 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > > Robert Haas wrote:
> > >> On Sat, May 29, 2010 at 5:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> > >> > I have updated the patch, attached, to clarify that this returns text
> > >> > arrays, and that you can force it to always return one row using
> > >> > COALESCE() and a '|' pattern (the later suggested by Daniele Varrazzo).
> > >>
> > >> I don't find this part to be something we should include in the
> > >> documentation. ?If we want to include a workaround, how about defining
> > >> a non-SRF that just calls the SRF and returns the first row?
> > >
> > > Remember this has to return one row for no matches, so a simple SRF will
> > > not work. ?I also have not seen enough demand for another function. ?A
> > > single doc mention seemed the appropriate level of detail for this.
> >
> > Well, we can debate later whether to add another function to core, but
> > what I meant was that the user having the problem could create a
> > user-defined function that calls regexp_matches() and returns the
> > first row, or NULL.
> >
> > But actually here's an even simpler workaround, which is IMHO less
> > ugly than the original one:
> >
> > SELECT foo, bar, (SELECT regexp_matches(bar, pattern)) FROM table;
>
> Good idea. Simplified patch attached.

Applied.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +