Re: patch adding new regexp functions

Lists: pgsql-hackerspgsql-patches
From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: writing new regexp functions
Date: 2007-02-01 21:20:18
Message-ID: Pine.BSO.4.64.0702011309240.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I am wanting to write some new C functions which leverage postgresql's
existing regexp code in an extension module. I notice that the functions
RE_compile_and_cache and RE_compile_and_execute in
src/backend/util/regexp.c contain the code necessary to connect the regexp
code in src/backend/regex with the postgresql string conversion, error
reporting, and memory management infrastructure, as well as providing
caching of regexes which would probably be a win to any regex function in
postgres. It would seem that these functions would be useful to any
C function dealing with regexp matching in postgresql, but they are static
functions, so they cannot be used outside of
src/backend/utils/adt/regexp.c. Since all of the core regexp functions
are in this file, this has been ok, but it is my opinion that these
functions should be made visible and added to a header file so that
extensions can make use of them, because any add-on functions that want to
use the regex code in postgres in some new way would need to basically
duplicate that same code in order to do so.

Is there some specific reason that these functions are static, or would it
be ok to make them non-static and add them to a header (say,
src/include/utils/regexp.h) so that extensions could use them as well? I
could put together a patch for this if desired, or it seems simple enough
that someone could just do it...

--
I can't decide whether to commit suicide or go bowling.
-- Florence Henderson


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-02 00:41:39
Message-ID: 1875.1170376899@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> Is there some specific reason that these functions are static,

Yeah: not cluttering the global namespace. I'm not excited about
exporting everything that anybody could possibly want access to;
that just makes it harder to maintain the code. When you see a
static function, you know that you don't have to look further than
the current file to understand how it's used. When you see a global
function, the difficulty of knowing how it's used is an order of
magnitude higher, maybe more. What's more, if you want to change it
then you have to worry about the possible side-effects on unknown
non-core code that might be calling it.

Is there a reason for not putting your new code itself into regexp.c?

regards, tom lane


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-02 01:11:30
Message-ID: Pine.BSO.4.64.0702011658050.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 1 Feb 2007, Tom Lane wrote:

> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > Is there some specific reason that these functions are static,
>
> Yeah: not cluttering the global namespace.

> Is there a reason for not putting your new code itself into regexp.c?

Not really, I just figured it would be cleaner/easier to write it as an
extension. I also figure that it is unlikely that every regexp function
that anyone could possibly want will be implemented in core in that one
file. If anyone writes an extension like this, they would need to
duplicate a good amount of code in order to do so, that would make more
difficulty in maintaining the code if it should need to change. It also
makes developing a new function a lot easier, no need to re-initdb to add
the function, no need to relink the postmaster and restart it every time
the function changes.

Anyway, the particular thing I was writing was a function like
substring(str FROM pattern) which instead of returning just the first
match group, would return an array of text containing all of the match
groups. I exported the functions in my sandbox, and wrote a module with a
function that does this.

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>

--
Calling J-Man Kink. Calling J-Man Kink. Hash missile sighted, target
Los Angeles. Disregard personal feelings about city and intercept.


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] writing new regexp functions
Date: 2007-02-02 03:29:35
Message-ID: Pine.BSO.4.64.0702011914480.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 1 Feb 2007, Jeremy Drake wrote:

> On Thu, 1 Feb 2007, Tom Lane wrote:
>
> > Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > > Is there some specific reason that these functions are static,
> >
> > Yeah: not cluttering the global namespace.
>
> > Is there a reason for not putting your new code itself into regexp.c?
>
> Not really, I just figured it would be cleaner/easier to write it as an
> extension. I also figure that it is unlikely that every regexp function
> that anyone could possibly want will be implemented in core in that one
> file.
<snip>

> Anyway, the particular thing I was writing was a function like
> substring(str FROM pattern) which instead of returning just the first
> match group, would return an array of text containing all of the match
> groups. I exported the functions in my sandbox, and wrote a module with a
> function that does this.

I have attached the patch I have put together, which does the following:
* Expose the previously static RE_* functions from regexp.c which wrap
the code in src/backend/regex with postgres-style errors, string
conversion, and caching of patterns.

* expose regex_flavor guc var, which is needed to know how to interpret
patterns when compiling them

* Add a couple more RE_* functions in regexp.c to provide access
to different levels of the process, which were necessary to avoid
duplicating effort elsewhere.

* Update replace_text_regexp in varlena.c to use newly exposed functions
from regexp.c instead of duplicating error handling code from there.

Also attached is the function I wrote to retrieve all of the capture
groups in a pattern match in a text[]. I also intend to put together a
function analogous to split_part which will take a string and a pattern to
split on, and return setof text.

Let me know if I should work under the assumption of the attached patch
and write the functions for contrib or pgfoundry, or to put the functions
in regexp.c and try to get them in core, or both? (it made my life a lot
easier working on the function to not have to restart the postmaster every
time I recompiled it, may be nice for the future to be able to make
extensions like this...)

--
To err is human, to forgive, beyond the scope of the Operating System.

Attachment Content-Type Size
regexp_ext.c text/plain 1.8 KB
regexp-export.patch text/plain 9.3 KB

From: David Fetter <david(at)fetter(dot)org>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-02 05:11:42
Message-ID: 20070202051142.GE3882@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Feb 01, 2007 at 05:11:30PM -0800, Jeremy Drake wrote:
> On Thu, 1 Feb 2007, Tom Lane wrote:
>
> > Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > > Is there some specific reason that these functions are static,
> >
> > Yeah: not cluttering the global namespace.
>
> > Is there a reason for not putting your new code itself into regexp.c?
>
> Not really, I just figured it would be cleaner/easier to write it as an
> extension. I also figure that it is unlikely that every regexp function
> that anyone could possibly want will be implemented in core in that one
> file. If anyone writes an extension like this, they would need to
> duplicate a good amount of code in order to do so, that would make more
> difficulty in maintaining the code if it should need to change. It also
> makes developing a new function a lot easier, no need to re-initdb to add
> the function, no need to relink the postmaster and restart it every time
> the function changes.
>
> Anyway, the particular thing I was writing was a function like
> substring(str FROM pattern) which instead of returning just the
> first match group, would return an array of text containing all of
> the match groups.

That'd be great! People who use dynamic languages like Perl would
feel much more at home having access to all the matches. While you're
at it, could you could make pre-match and post-match (optionally--I
know it's expensive) available?

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-02 06:16:54
Message-ID: Pine.BSO.4.64.0702012204460.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 1 Feb 2007, David Fetter wrote:

> On Thu, Feb 01, 2007 at 05:11:30PM -0800, Jeremy Drake wrote:
> > Anyway, the particular thing I was writing was a function like
> > substring(str FROM pattern) which instead of returning just the
> > first match group, would return an array of text containing all of
> > the match groups.

If you are subscribed to -patches, I sent my code to date there earlier
this evening. I also said that I wanted to make a function that split on
a pattern (like perl split) and returned setof text.

> That'd be great! People who use dynamic languages like Perl would
> feel much more at home having access to all the matches. While you're
> at it, could you could make pre-match and post-match (optionally--I
> know it's expensive) available?

I could, but I'm not sure how someone would go about accessing such a
thing. What I just wrote would be most like this perl:
@foo = ($str=~/pattern/);

Where would pre and post match fit into this? Are you talking about a
different function? Or sticking prematch at the beginning of the array
and postmatch at the end? I could also put the whole match somewhere
also, but I did not in this version.

The code I wrote returns a text[] which is one-dimensional, has a lower
bound of 1 (as most postgres arrays do), where if there are n capture
groups, ra[1] has the first capture group and ra[n] has the last one.
Since postgres has an option to make different lower bounds, I suppose I
could have an option to put the prematch in [-1], the entire match in [0],
and the postmatch in [n+1]. This seems to be odd to me though.

I guess I'm saying, I agree that the entire match, prematch, and postmatch
would be helpful, but how would you propose to present these to the user?

>
> Cheers,
> D
>

--
To err is human, to forgive, beyond the scope of the Operating System.


From: David Fetter <david(at)fetter(dot)org>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-02 06:55:18
Message-ID: 20070202065518.GF3882@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Feb 01, 2007 at 10:16:54PM -0800, Jeremy Drake wrote:
> On Thu, 1 Feb 2007, David Fetter wrote:
>
> > On Thu, Feb 01, 2007 at 05:11:30PM -0800, Jeremy Drake wrote:
> > > Anyway, the particular thing I was writing was a function like
> > > substring(str FROM pattern) which instead of returning just the
> > > first match group, would return an array of text containing all
> > > of the match groups.
>
> If you are subscribed to -patches, I sent my code to date there
> earlier this evening. I also said that I wanted to make a function
> that split on a pattern (like perl split) and returned setof text.
>
> > That'd be great! People who use dynamic languages like Perl would
> > feel much more at home having access to all the matches. While
> > you're at it, could you could make pre-match and post-match
> > (optionally--I know it's expensive) available?
>
> I could, but I'm not sure how someone would go about accessing such
> a thing. What I just wrote would be most like this perl: @foo =
> ($str=~/pattern/);

> Where would pre and post match fit into this? Are you talking about a
> different function?

Yes, although it might have the same name, as in regex_match(pattern
TEXT, string TEXT, return_pre_and_post BOOL).

> Or sticking prematch at the beginning of the array and postmatch at
> the end? I could also put the whole match somewhere also, but I did
> not in this version.

The data structure could be something like

TYPE matches (
prematch TEXT,
match TEXT[],
postmatch TEXT
)

> The code I wrote returns a text[] which is one-dimensional, has a lower
> bound of 1 (as most postgres arrays do), where if there are n capture
> groups, ra[1] has the first capture group and ra[n] has the last one.
> Since postgres has an option to make different lower bounds, I suppose I
> could have an option to put the prematch in [-1], the entire match in [0],
> and the postmatch in [n+1]. This seems to be odd to me though.

Odd == bad. I think the pre- and post-matches should be different in
essence, not just in index :)

> I guess I'm saying, I agree that the entire match, prematch, and postmatch
> would be helpful, but how would you propose to present these to the user?

See above :)

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-02 08:15:15
Message-ID: Pine.BSO.4.64.0702020011160.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 1 Feb 2007, David Fetter wrote:

> Yes, although it might have the same name, as in regex_match(pattern
> TEXT, string TEXT, return_pre_and_post BOOL).
>
> The data structure could be something like
>
> TYPE matches (
> prematch TEXT,
> match TEXT[],
> postmatch TEXT
> )

I just coded up for this:

CREATE FUNCTION regexp_matches(IN str text, IN pattern text) RETURNS
text[]
AS 'MODULE_PATHNAME', 'regexp_matches'
LANGUAGE C IMMUTABLE STRICT;

CREATE FUNCTION regexp_matches(
IN str text, IN pattern text, IN return_pre_and_post bool,
OUT prematch text, OUT fullmatch text, OUT matches text[], OUT
postmatch text) RETURNS record
AS 'MODULE_PATHNAME', 'regexp_matches'
LANGUAGE C IMMUTABLE STRICT;

Which works like this:

jeremyd=# \pset null '\\N'
Null display is "\N".
jeremyd=# select * from regexp_matches('foobarbequebaz',
$re$(bar)(beque)$re$);
regexp_matches
----------------
{bar,beque}
(1 row)

jeremyd=# select * from regexp_matches('foobarbequebaz',
$re$(bar)(beque)$re$, false);
prematch | fullmatch | matches | postmatch
----------+-----------+-------------+-----------
\N | \N | {bar,beque} | \N
(1 row)

jeremyd=# select * from regexp_matches('foobarbequebaz',
$re$(bar)(beque)$re$, true);
prematch | fullmatch | matches | postmatch
----------+-----------+-------------+-----------
foo | barbeque | {bar,beque} | baz
(1 row)

And then you also have this behavior in the matches array:

jeremyd=# select * from regexp_matches('foobarbequebaz',
$re$(bar)(.*)(beque)$re$);
regexp_matches
----------------
{bar,"",beque}
(1 row)

jeremyd=# select * from regexp_matches('foobarbequebaz',
$re$(bar)(.+)(beque)$re$);
regexp_matches
----------------
\N
(1 row)

jeremyd=# select * from regexp_matches('foobarbequebaz',
$re$(bar)(.+)?(beque)$re$);
regexp_matches
------------------
{bar,NULL,beque}
(1 row)

Reasonable?

--
A.A.A.A.A.:
An organization for drunks who drive


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-02 08:54:30
Message-ID: Pine.BSO.4.64.0702020048510.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, 2 Feb 2007, Jeremy Drake wrote:

> jeremyd=# select * from regexp_matches('foobarbequebaz',
> $re$(bar)(beque)$re$, false);
> prematch | fullmatch | matches | postmatch
> ----------+-----------+-------------+-----------
> \N | \N | {bar,beque} | \N
> (1 row)

I just changed this to fill in fullmatch when the bool is false, so this
one would look like:
prematch | fullmatch | matches | postmatch
----------+-----------+-------------+-----------
\N | barbeque | {bar,beque} | \N
(1 row)

I also removed my check for capture groups, since in this setup you could
get useful output without any. I am still trying to decide whether or not
to add back an error if you called the no-bool version which just returns
the array, and you do not have any capture groups. ISTM this is likely an
oversight on the query author's part, and it would be helpful to alert him
to this.

If you have no capture groups, the matches array is empty (not null). If
the match happened at the start of the string, the prematch is an empty
string, and if the match happened at the end of the string, the postmatch
is an empty string.

> Reasonable?

--
It's odd, and a little unsettling, to reflect upon the fact that
English is the only major language in which "I" is capitalized; in many
other languages "You" is capitalized and the "i" is lower case.
-- Sydney J. Harris


From: David Fetter <david(at)fetter(dot)org>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-02 17:12:27
Message-ID: 20070202171227.GA28220@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, Feb 02, 2007 at 12:54:30AM -0800, Jeremy Drake wrote:
> On Fri, 2 Feb 2007, Jeremy Drake wrote:
>
> > jeremyd=# select * from regexp_matches('foobarbequebaz',
> > $re$(bar)(beque)$re$, false);
> > prematch | fullmatch | matches | postmatch
> > ----------+-----------+-------------+-----------
> > \N | \N | {bar,beque} | \N
> > (1 row)
>
> I just changed this to fill in fullmatch when the bool is false, so this
> one would look like:
> prematch | fullmatch | matches | postmatch
> ----------+-----------+-------------+-----------
> \N | barbeque | {bar,beque} | \N
> (1 row)
>
> I also removed my check for capture groups, since in this setup you could
> get useful output without any. I am still trying to decide whether or not
> to add back an error if you called the no-bool version which just returns
> the array, and you do not have any capture groups. ISTM this is likely an
> oversight on the query author's part, and it would be helpful to alert him
> to this.
>
> If you have no capture groups, the matches array is empty (not null). If
> the match happened at the start of the string, the prematch is an empty
> string, and if the match happened at the end of the string, the postmatch
> is an empty string.
>
> > Reasonable?

This is great :)

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-03 00:59:54
Message-ID: Pine.BSO.4.64.0702021653360.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, 2 Feb 2007, Jeremy Drake wrote:

> I just coded up for this:
>
> CREATE FUNCTION regexp_matches(IN str text, IN pattern text) RETURNS
> text[]
> AS 'MODULE_PATHNAME', 'regexp_matches'
> LANGUAGE C IMMUTABLE STRICT;
>
> CREATE FUNCTION regexp_matches(
> IN str text, IN pattern text, IN return_pre_and_post bool,
> OUT prematch text, OUT fullmatch text, OUT matches text[], OUT
> postmatch text) RETURNS record
> AS 'MODULE_PATHNAME', 'regexp_matches'
> LANGUAGE C IMMUTABLE STRICT;
>

I wanted to put out there the question of what order the parameters to
these regex functions should go. ISTM most people expect them to go
(pattern, string), but I made these functions consistant with
substring(text,text) which takes (string, pattern). Now I have been
working on a regexp_split function, which takes (pattern, string), which
is what someone familiar with the function from perl would expect, but is
not consistant with substring or now with my regexp_matches function.

I want to ask, should I break with following substring's precedent, and
put the pattern first (as most people probably would expect), or should I
break with perl's precedent and put the pattern second (to behave like
substring)?

--
We cannot put the face of a person on a stamp unless said person is
deceased. My suggestion, therefore, is that you drop dead.
-- James E. Day, Postmaster General


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-03 01:56:31
Message-ID: 26162.1170467791@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> I want to ask, should I break with following substring's precedent, and
> put the pattern first (as most people probably would expect), or should I
> break with perl's precedent and put the pattern second (to behave like
> substring)?

All of SQL's pattern match operators have the pattern on the right, so
my advice is to stick with that and try not to think about Perl ;-)

regards, tom lane


From: David Fetter <david(at)fetter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: writing new regexp functions
Date: 2007-02-03 02:03:13
Message-ID: 20070203020313.GA27022@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, Feb 02, 2007 at 08:56:31PM -0500, Tom Lane wrote:
> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > I want to ask, should I break with following substring's
> > precedent, and put the pattern first (as most people probably
> > would expect), or should I break with perl's precedent and put the
> > pattern second (to behave like substring)?
>
> All of SQL's pattern match operators have the pattern on the right,
> so my advice is to stick with that and try not to think about Perl
> ;-)

Perl provides inspiration, but here, consistency would help more than
orderly imitation of how it does what it does. And besides, when
people really need Perl, they can pull it in as a PL :)

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] writing new regexp functions
Date: 2007-02-03 03:01:33
Message-ID: Pine.BSO.4.64.0702021845540.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, 2 Feb 2007, David Fetter wrote:

> On Fri, Feb 02, 2007 at 08:56:31PM -0500, Tom Lane wrote:
> >
> > All of SQL's pattern match operators have the pattern on the right,
> > so my advice is to stick with that and try not to think about Perl
> > ;-)
>
> Perl provides inspiration, but here, consistency would help more than
> orderly imitation of how it does what it does. And besides, when
> people really need Perl, they can pull it in as a PL :)

Alright, here is my code to date. I have put the pattern after the string
in the split function, as discussed above. The .tar.gz file expects to be
untarred in contrib/. I have made some regression tests that can be run
using 'make installcheck' as normal for contrib. I think they exercise
the corner cases in the code, but I may very well have missed some. It
requires the (previously submitted) attached patch to core to compile, as
it takes advantage of new exported functions from
src/backend/utils/adt/regexp.c.

Let me know if you see any bugs or issues with this code, and I am open to
suggestions for further regression tests ;)

Things that I still want to look into:
* regexp flags (a la regexp_replace).

* maybe make regexp_matches return setof whatever, if given a 'g' flag
return all matches in string.

* maybe a join function that works as an aggregate
SELECT join(',', col) FROM tbl
currently can be written as
SELECT array_to_string(ARRAY(SELECT col FROM tbl), ',')

--
It was a virgin forest, a place where the Hand of Man had never set
foot.

Attachment Content-Type Size
regexp-export.patch text/plain 9.3 KB
regexp_ext.tar.gz application/octet-stream 3.7 KB

From: David Fetter <david(at)fetter(dot)org>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] writing new regexp functions
Date: 2007-02-04 15:45:28
Message-ID: 20070204154528.GB13202@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, Feb 02, 2007 at 07:01:33PM -0800, Jeremy Drake wrote:

> Let me know if you see any bugs or issues with this code, and I am
> open to suggestions for further regression tests ;)

> Things that I still want to look into:
> * regexp flags (a la regexp_replace).

One more text field at the end is how the regexp_replace() one does
it.

> * maybe make regexp_matches return setof whatever, if given a 'g' flag
> return all matches in string.

This is doable with current machinery, albeit a little clumsily.

> * maybe a join function that works as an aggregate
> SELECT join(',', col) FROM tbl
> currently can be written as
> SELECT array_to_string(ARRAY(SELECT col FROM tbl), ',')

The array_accum() aggregate in the docs works OK for this purpose.

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] writing new regexp functions
Date: 2007-02-04 21:00:12
Message-ID: Pine.BSO.4.64.0702041254011.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 4 Feb 2007, David Fetter wrote:

> On Fri, Feb 02, 2007 at 07:01:33PM -0800, Jeremy Drake wrote:
>
> > Let me know if you see any bugs or issues with this code, and I am
> > open to suggestions for further regression tests ;)
>
> > Things that I still want to look into:
> > * regexp flags (a la regexp_replace).
>
> One more text field at the end is how the regexp_replace() one does
> it.

That's how I did it.

> > * maybe make regexp_matches return setof whatever, if given a 'g' flag
> > return all matches in string.
>
> This is doable with current machinery, albeit a little clumsily.

I have implemented this too.

> > * maybe a join function that works as an aggregate
> > SELECT join(',', col) FROM tbl
> > currently can be written as
> > SELECT array_to_string(ARRAY(SELECT col FROM tbl), ',')
>
> The array_accum() aggregate in the docs works OK for this purpose.

I have not tackled this yet, I think it may be better to stick with the
ARRAY() construct for now.

So, here is the new version of the code, and also a new version of the
patch to core, which fixes some compile warnings that I did not see at
first because I was using ICC rather than GCC.

Here is the README.regexp_ext from the tar file:

This package contains regexp functions beyond those currently provided
in core PostgreSQL, utilizing the regexp engine built into core. This
is still a work-in-progress.

The most recent version of this code can be found at
http://www.jdrake.com/postgresql/regexp/regexp_ext.tar.gz
and the prerequisite patch to PostgreSQL core, which has been submitted
for review, can be found at
http://www.jdrake.com/postgresql/regexp/regexp-export.patch

The .tar.gz file expects to be untarred in contrib/. I have made some
regression tests that can be run using 'make installcheck' as normal for
contrib. I think they exercise the corner cases in the code, but I may
very well have missed some. It requires the above mentioned patch to
core to compile, as it takes advantage of new exported functions from
src/backend/utils/adt/regexp.c.

Let me know if you see any bugs or issues with this code, and I am open to
suggestions for further regression tests ;)

Functions implemented in this module:
* regexp_split(str text, pattern text) RETURNS SETOF text
regexp_split(str text, pattern text, flags text) RETURNS SETOF text
returns each section of the string delimited by the pattern.
* regexp_matches(str text, pattern text) RETURNS text[]
returns all capture groups when matching pattern against string in an array
* regexp_matches(str text, pattern text, flags text) RETURNS SETOF
(prematch text, fullmatch text, matches text[], postmatch text)
returns all capture groups when matching pattern against string in an array.
also returns the entire match in fullmatch. if the 'g' option is given,
returns all matches in the string. if the 'r' option is given, also return
the text before and after the match in prematch and postmatch respectively.

See the regression tests for more details about usage and return values.

Recent changes:
* I have put the pattern after the string in all of the functions, as
discussed on the pgsql-hackers mailing list.

* regexp flags (a la regexp_replace).

* make regexp_matches return setof whatever, if given a 'g' flag return
all matches in string.

Things that I still want to look into:
* maybe a join function that works as an aggregate
SELECT join(',', col) FROM tbl
currently can be written as
SELECT array_to_string(ARRAY(SELECT col FROM tbl), ',')

--
Philogeny recapitulates erogeny; erogeny recapitulates philogeny.

Attachment Content-Type Size
regexp-export.patch text/plain 9.2 KB
regexp_ext.tar.gz application/octet-stream 6.1 KB

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] writing new regexp functions
Date: 2007-02-07 08:26:00
Message-ID: Pine.BSO.4.64.0702070010470.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 4 Feb 2007, David Fetter wrote:

> On Fri, Feb 02, 2007 at 07:01:33PM -0800, Jeremy Drake wrote:
>
> > Let me know if you see any bugs or issues with this code, and I am
> > open to suggestions for further regression tests ;)

I have not heard anything, so I guess at this point I should figure out
where to go next with this. I see a couple options:

* Set this up as a pgfoundry project or contrib. This would require
merging the patch to expose some functions from regexp.c outside that
file, which has raised some concerns about maintainability.

* Put together a patch to add these functions to core. I could put them
directly in regexp.c, so the support functions could stay static. My
concern here is that I don't know if there are any functions currently
in core with OUT parameters. I don't know the acceptable style for
handling this: OUT parameters, a named composite type, ...?

Does anyone have any opinions either way, as to how I should proceed
from here?

> > * maybe a join function that works as an aggregate
> > SELECT join(',', col) FROM tbl
> > currently can be written as
> > SELECT array_to_string(ARRAY(SELECT col FROM tbl), ',')
>
> The array_accum() aggregate in the docs works OK for this purpose.

I have decided not to pursue this function, I think the array construct,
or the array_accum option, is about the best possible currently. If it
should become possible in the future to write aggregates with a non-sql
state type (structs with pointers) it may be worthwhile to re-evaluate
this.

--
The cost of living hasn't affected its popularity.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] writing new regexp functions
Date: 2007-02-07 14:23:58
Message-ID: 12179.1170858238@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> * Put together a patch to add these functions to core. I could put them
> directly in regexp.c, so the support functions could stay static. My
> concern here is that I don't know if there are any functions currently
> in core with OUT parameters.

As of 8.2 there are.

If we are going to include these I would vote for core not contrib
status, exactly to avoid having to export those functions.

regards, tom lane


From: David Fetter <david(at)fetter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] writing new regexp functions
Date: 2007-02-07 19:07:30
Message-ID: 20070207190730.GB8836@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, Feb 07, 2007 at 09:23:58AM -0500, Tom Lane wrote:
> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > * Put together a patch to add these functions to core. I could put them
> > directly in regexp.c, so the support functions could stay static. My
> > concern here is that I don't know if there are any functions currently
> > in core with OUT parameters.
>
> As of 8.2 there are.
>
> If we are going to include these I would vote for core not contrib
> status, exactly to avoid having to export those functions.

+1 for core. :)

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] writing new regexp functions
Date: 2007-02-07 20:34:23
Message-ID: Pine.BSO.4.64.0702071231570.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 7 Feb 2007, Tom Lane wrote:

> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > * Put together a patch to add these functions to core. I could put them
> > directly in regexp.c, so the support functions could stay static. My
> > concern here is that I don't know if there are any functions currently
> > in core with OUT parameters.
>
> As of 8.2 there are.

Could you give me the name of one in pg_proc.h so I can see how I should
go about adding one there?

> If we are going to include these I would vote for core not contrib
> status, exactly to avoid having to export those functions.

OK, this patch will be my next project.

--
History is curious stuff
You'd think by now we had enough
Yet the fact remains I fear
They make more of it every year.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] writing new regexp functions
Date: 2007-02-07 20:37:24
Message-ID: 28289.1170880644@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> On Wed, 7 Feb 2007, Tom Lane wrote:
>> As of 8.2 there are.

> Could you give me the name of one in pg_proc.h so I can see how I should
> go about adding one there?

select * from pg_proc where proargmodes is not null;

regards, tom lane


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: WIP patch adding new regexp functions
Date: 2007-02-08 00:20:25
Message-ID: Pine.BSO.4.64.0702071611370.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 7 Feb 2007, David Fetter wrote:

> On Wed, Feb 07, 2007 at 09:23:58AM -0500, Tom Lane wrote:
> > Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > > * Put together a patch to add these functions to core. I could put them
> > > directly in regexp.c, so the support functions could stay static. My
> > > concern here is that I don't know if there are any functions currently
> > > in core with OUT parameters.
> >
> > As of 8.2 there are.
> >
> > If we are going to include these I would vote for core not contrib
> > status, exactly to avoid having to export those functions.
>
> +1 for core. :)

Here is a patch to add these functions to core. Please take a look and
let me know what you think. I had to jump through a bit of a hoop to
avoid opr_sanity test from breaking, may not have done that right.

Also, this patch allows regexp_replace to take more flags, since my two
new functions needed flags also, I split flag parsing into a seperate
function and made regexp_replace use it too. It also results that the
error message for an invalid flag in regexp_replace changes, since it is
now more generic as it is called from multiple functions.

I still need to write documentation for the new functions before I
consider the patch completed, but please take a look at the code and see
if it is acceptable as I do not have any further changes planned for it.

--
Chicken Soup, n.:
An ancient miracle drug containing equal parts of aureomycin,
cocaine, interferon, and TLC. The only ailment chicken soup
can't cure is neurotic dependence on one's mother.
-- Arthur Naiman, "Every Goy's Guide to Yiddish"

Attachment Content-Type Size
regexp-split-matches.patch text/plain 34.5 KB

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>
Cc: David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: patch adding new regexp functions
Date: 2007-02-09 03:22:58
Message-ID: Pine.BSO.4.64.0702081918491.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 7 Feb 2007, Jeremy Drake wrote:

> Here is a patch to add these functions to core. Please take a look and
> let me know what you think. I had to jump through a bit of a hoop to
> avoid opr_sanity test from breaking, may not have done that right.
>
> Also, this patch allows regexp_replace to take more flags, since my two
> new functions needed flags also, I split flag parsing into a seperate
> function and made regexp_replace use it too. It also results that the
> error message for an invalid flag in regexp_replace changes, since it is
> now more generic as it is called from multiple functions.
>
> I still need to write documentation for the new functions before I
> consider the patch completed, but please take a look at the code and see
> if it is acceptable as I do not have any further changes planned for it.

I have added documentation for the functions in this patch. At this point
I am ready to submit this patch for the review and application process.
Please let me know if you find any issues with it.

Thank you

--
The trouble with being punctual is that nobody's there to appreciate
it.
-- Franklin P. Jones

Attachment Content-Type Size
regexp-split-matches-documented.patch.gz application/octet-stream 8.6 KB

From: Neil Conway <neilc(at)samurai(dot)com>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-09 03:35:52
Message-ID: 1170992152.5454.111.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 2007-02-08 at 19:22 -0800, Jeremy Drake wrote:
> I have added documentation for the functions in this patch. At this point
> I am ready to submit this patch for the review and application process.
> Please let me know if you find any issues with it.

Barring any objections, I'll review and apply this patch in the next few
days.

BTW, unless the patch is truly large, leaving the patch uncompressed
makes it easier to eyeball it without leaving my mail client. That's
just personal preference, though...

-Neil


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-09 03:39:04
Message-ID: 20070209033904.GF24069@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Neil Conway wrote:

> BTW, unless the patch is truly large, leaving the patch uncompressed
> makes it easier to eyeball it without leaving my mail client. That's
> just personal preference, though...

FWIW, I just do a "| zcat | less", which shows it uncompressed. No big
deal. I don't know if Evolution can do that ... (it surprised me that
Thunderbird doesn't seem to be capable of searching a message based on
Message-Id).

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Jeremy Drake <pgsql(at)jdrake(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-09 03:46:09
Message-ID: 200702090346.l193k9f01434@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> Neil Conway wrote:
>
> > BTW, unless the patch is truly large, leaving the patch uncompressed
> > makes it easier to eyeball it without leaving my mail client. That's
> > just personal preference, though...
>
> FWIW, I just do a "| zcat | less", which shows it uncompressed. No big
> deal. I don't know if Evolution can do that ... (it surprised me that
> Thunderbird doesn't seem to be capable of searching a message based on
> Message-Id).

My .mailcap calls a script runs 'file' and tries to display it properly
automatically, and it worked fine for his attachment.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-09 09:08:57
Message-ID: Pine.BSO.4.64.0702090100460.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 8 Feb 2007, Neil Conway wrote:

> On Thu, 2007-02-08 at 19:22 -0800, Jeremy Drake wrote:
> > I have added documentation for the functions in this patch. At this point
> > I am ready to submit this patch for the review and application process.
> > Please let me know if you find any issues with it.
>
> Barring any objections, I'll review and apply this patch in the next few
> days.
>
> BTW, unless the patch is truly large, leaving the patch uncompressed
> makes it easier to eyeball it without leaving my mail client. That's
> just personal preference, though...

Yeah, I try to do that, but this one just barely passed my personal
compression threshold. Guess I should raise my threshold :)

Here is a new version of the patch which fixes up the documentation a
little (should have read it over again before posting). I also added
mention of the additional flags that I added support for, and linked to
the embedded options table for details, since the flags supported
correspond to those. This is my first foray into the world of Jade, so
please forgive any SGML snafus.

Also, it is uncompressed this time ;)

--
Q: How many existentialists does it take to screw in a lightbulb?
A: Two. One to screw it in and one to observe how the lightbulb
itself symbolizes a single incandescent beacon of subjective
reality in a netherworld of endless absurdity reaching out toward a
maudlin cosmos of nothingness.

Attachment Content-Type Size
regexp-split-matches-documented.patch text/plain 44.9 KB

From: Neil Conway <neilc(at)samurai(dot)com>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-09 22:46:06
Message-ID: 1171061166.5454.127.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, 2007-02-09 at 01:08 -0800, Jeremy Drake wrote:
> Yeah, I try to do that, but this one just barely passed my personal
> compression threshold. Guess I should raise my threshold :)

No, don't pay any attention to me, I'm just lazy :)

> Here is a new version of the patch which fixes up the documentation a
> little (should have read it over again before posting).

The "doing_srf" business is rather tortured, and seems an invitation for
bugs. ISTM there should be a cleaner way to implement this. For example,
would it be possible to put all the common logic into one or more
additional functions, and then have SRF vs. non-SRF cases that call into
those functions after doing the appropriate initialization?

-Neil


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-10 00:33:38
Message-ID: Pine.BSO.4.64.0702091631080.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, 9 Feb 2007, Neil Conway wrote:

> The "doing_srf" business is rather tortured, and seems an invitation for
> bugs. ISTM there should be a cleaner way to implement this. For example,
> would it be possible to put all the common logic into one or more
> additional functions, and then have SRF vs. non-SRF cases that call into
> those functions after doing the appropriate initialization?

Here is a new version of the patch which eliminates the doing_srf stuff.
It does seem a lot cleaner this way...

--
Fortune's Real-Life Courtroom Quote #18:

Q: Are you married?
A: No, I'm divorced.
Q: And what did your husband do before you divorced him?
A: A lot of things I didn't know about.

Attachment Content-Type Size
regexp-split-matches-documented.patch text/plain 44.4 KB

From: Neil Conway <neilc(at)samurai(dot)com>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-10 06:57:46
Message-ID: 1171090666.5454.184.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, 2007-02-09 at 16:33 -0800, Jeremy Drake wrote:
> Here is a new version of the patch which eliminates the doing_srf stuff.

* C89 require constant-sized stack allocated arrays, so the coding in
perform_regexp_matches() is non-portable.

* I'm not clear about the control flow in regexp_matches() and
regexp_split(). Presumably it's not possible for the call_cntr to
actually exceed max_calls, so the error message in these cases should be
elog(ERROR), not ereport (the former is for "shouldn't happen" bug
scenarios, the latter is for user-facing errors). Can you describe the
logic here (e.g. via comments) a bit more?

* The logic in regexp_split (incremented_offset, first_match, etc.) is
pretty ugly and hard to follow. The "if" condition on line 1037 is
particularly objectionable. Again, ISTM there should be a cleaner way to
do this.

* Try to keep lines to 80 characters or fewer. If the code is wandering
off the right side of the screen all the time, that might be a hint that
it needs simplification.

Attached is a cleaned up version of your patch -- various improvements
throughout, but mostly cosmetic stuff. Do you want to take a look at the
above?

-Neil

Attachment Content-Type Size
regexp-split-matches-documented_new-4.patch text/x-patch 45.3 KB

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-10 08:33:59
Message-ID: Pine.BSO.4.64.0702100021470.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sat, 10 Feb 2007, Neil Conway wrote:

> On Fri, 2007-02-09 at 16:33 -0800, Jeremy Drake wrote:
> > Here is a new version of the patch which eliminates the doing_srf stuff.
>
> * C89 require constant-sized stack allocated arrays, so the coding in
> perform_regexp_matches() is non-portable.

I thought that was the case, but I had seen some other code doing this.
Turns out it was in the fulldisjunctions pgfoundry project. I replaced
them with palloc'd memory.

> * I'm not clear about the control flow in regexp_matches() and
> regexp_split(). Presumably it's not possible for the call_cntr to
> actually exceed max_calls, so the error message in these cases should be
> elog(ERROR), not ereport (the former is for "shouldn't happen" bug
> scenarios, the latter is for user-facing errors). Can you describe the
> logic here (e.g. via comments) a bit more?

I added some comments, and changed to using elog instead of ereport.

> * The logic in regexp_split (incremented_offset, first_match, etc.) is
> pretty ugly and hard to follow. The "if" condition on line 1037 is
> particularly objectionable. Again, ISTM there should be a cleaner way to
> do this.

That if condition was very difficult to get right ;) I added a bunch of
comments, and switched the logic around with a continue so it is much more
obvious what is happening there. Incidentally, that if condition being
incorrect is what results in call_cntr exceeding max_calls :)

>
> * Try to keep lines to 80 characters or fewer. If the code is wandering
> off the right side of the screen all the time, that might be a hint that
> it needs simplification.
>
> Attached is a cleaned up version of your patch -- various improvements
> throughout, but mostly cosmetic stuff. Do you want to take a look at the
> above?
>
> -Neil
>
>

--
If God had meant for us to be naked, we would have been born that way.

Attachment Content-Type Size
regexp-split-matches-documented_new-5.patch text/plain 47.6 KB

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-10 09:26:00
Message-ID: Pine.BSO.4.64.0702100111320.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sat, 10 Feb 2007, Jeremy Drake wrote:

> On Sat, 10 Feb 2007, Neil Conway wrote:
>
> > * I'm not clear about the control flow in regexp_matches() and
> > regexp_split(). Presumably it's not possible for the call_cntr to
> > actually exceed max_calls, so the error message in these cases should be
> > elog(ERROR), not ereport (the former is for "shouldn't happen" bug
> > scenarios, the latter is for user-facing errors). Can you describe the
> > logic here (e.g. via comments) a bit more?
>
> I added some comments, and changed to using elog instead of ereport.

I fixed a couple more things in this patch. I changed the max calls limit
to the real limit, rather than the arbitrarily high limit that was
previously set (three times the length of the string in bytes). Also, I
changed the checks for offset to compare against wide_len rather than
orig_len, since in multibyte character sets orig_len is the length in
bytes of the string in whatever encoding it is in, while wide_len is the
length in characters, which is what everything else in these functions
deal with.

The calls to text_substr have me somewhat concerned now, also. I think
performance starts to look like O(n^2) in multibyte character sets. But I
think doing anything about it would require this code to know more about
the internals of text than it has any right to. I guess settle for the
correctness now, and if performance is a problem this can be addressed.
Would hate to make this code even more ugly due to premature
optimization...

--
When does summertime come to Minnesota, you ask?
Well, last year, I think it was a Tuesday.

Attachment Content-Type Size
regexp-split-matches-documented_new-6.patch text/plain 47.7 KB

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-11 00:08:33
Message-ID: Pine.BSO.4.64.0702101605450.28908@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

I noticed in the documentation that you added indexterms for the
functions, but that they (along with the existing ones) are in the SIMILAR
TO regular expression section. These functions do not take SIMILAR TO
regular expressions, but POSIX regular expressions, so I moved them to
that section in this patch, and also moved the regexp_replace indexterm
there also, since it also uses POSIX regular expressions.

--
10.0 times 0.1 is hardly ever 1.0.

Attachment Content-Type Size
regexp-split-matches-documented_new-7.patch text/plain 48.1 KB

From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 00:49:32
Message-ID: Pine.BSO.4.64.0702141647000.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

What was the status of this? Was there anything else I needed to do with
this patch, or is it ready to be applied? Let me know if there is
anything else I need to do on this...

--
What this country needs is a good five cent nickel.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Neil Conway <neilc(at)samurai(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 02:10:54
Message-ID: 200702150210.l1F2Asm11102@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> What was the status of this? Was there anything else I needed to do with
> this patch, or is it ready to be applied? Let me know if there is
> anything else I need to do on this...

I assume it is ready for application.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Neil Conway <neilc(at)samurai(dot)com>
To: Jeremy Drake <jeremyd(at)jdrake(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 05:50:08
Message-ID: 1171518608.5454.339.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 2007-02-14 at 16:49 -0800, Jeremy Drake wrote:
> What was the status of this? Was there anything else I needed to do with
> this patch, or is it ready to be applied? Let me know if there is
> anything else I need to do on this...

Will do -- I'm planning to apply this as soon as I have the free cycles
to do so, likely tomorrow or Friday.

-Neil


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-patches(at)postgresql(dot)org
Cc: Neil Conway <neilc(at)samurai(dot)com>, Jeremy Drake <jeremyd(at)jdrake(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 08:22:27
Message-ID: 200702150922.28871.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Neil Conway wrote:
> On Wed, 2007-02-14 at 16:49 -0800, Jeremy Drake wrote:
> > What was the status of this? Was there anything else I needed to
> > do with this patch, or is it ready to be applied? Let me know if
> > there is anything else I need to do on this...
>
> Will do -- I'm planning to apply this as soon as I have the free
> cycles to do so, likely tomorrow or Friday.

I don't know which patch is actually being proposed now. It would be
good to make this more explicit and maybe include a synopsis of the
functions in the email, so we know what's going on.

What confuses me about some of the functions I've seen in earlier
patches in this thread is that they return setof something. But in my
mind, regular expression matches or string splits are inherently
ordered, so an array would be the correct return type.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 08:56:06
Message-ID: Pine.BSO.4.64.0702150029160.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 15 Feb 2007, Peter Eisentraut wrote:

> Neil Conway wrote:
> > On Wed, 2007-02-14 at 16:49 -0800, Jeremy Drake wrote:
> > > What was the status of this? Was there anything else I needed to
> > > do with this patch, or is it ready to be applied? Let me know if
> > > there is anything else I need to do on this...
> >
> > Will do -- I'm planning to apply this as soon as I have the free
> > cycles to do so, likely tomorrow or Friday.
>
> I don't know which patch is actually being proposed now. It would be
> good to make this more explicit and maybe include a synopsis of the
> functions in the email, so we know what's going on.

Sorry, my intent was just to check to see if I had gotten the patch
sufficiently fixed for Neil to apply and he just hadn't gotten to it yet
(which seems to be the case), or if there was something else he still
expected me to fix that I had missed in the prior discussions. I suppose
I should have emailed him privately.

The patch in question can be seen in the archives here:
http://archives.postgresql.org/pgsql-patches/2007-02/msg00214.php

The functions added are:
* regexp_split(str text, pattern text) RETURNS SETOF text
regexp_split(str text, pattern text, flags text) RETURNS SETOF text
returns each section of the string delimited by the pattern.
* regexp_matches(str text, pattern text) RETURNS text[]
returns all capture groups when matching pattern against string in an
array
* regexp_matches(str text, pattern text, flags text) RETURNS SETOF
(prematch text, fullmatch text, matches text[], postmatch text)
returns all capture groups when matching pattern against string in an
array. also returns the entire match in fullmatch. if the 'g' option
is given, returns all matches in the string. if the 'r' option is
given, also return the text before and after the match in prematch and
postmatch respectively.

> What confuses me about some of the functions I've seen in earlier
> patches in this thread is that they return setof something. But in my
> mind, regular expression matches or string splits are inherently
> ordered, so an array would be the correct return type.

They do return SETOF. Addressing them separately:

regexp_matches uses a text[] for the match groups. If you specify the
global flag, it could return multiple matches. Couple this with the
requested feature of pre- and postmatch returns (with its own flag) and
the return would turn into some sort of nasty array of tuples, or multiple
arrays. It seems much cleaner to me to return a set of the matches found,
and I find which order the matches occur in to be much less important than
the fact that they did occur and their contents.

regexp_split returns setof text. This has, in my opinion, a much greater
case to return an array. However, there are several issues with this
approach:

# My experience with the array code leads me to believe that building up
an array is an expensive proposition. I know I could code it smarter so
that the array is only constructed in the end.

# With a set-returning function, it is possible to add a LIMIT clause, to
prevent splitting up more of the string than is necessary. It is also
immediately possible to insert them into a table, or do grouping on them,
or call a function on each value. Most of the time when I do a split, I
intend to do something like this with each split value.

# You can still get an array if you really want it:
#* SELECT ARRAY(SELECT * FROM regexp_split('first, second, third', E',\\s*'))

--
No problem is so formidable that you can't just walk away from it.
-- C. Schulz


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 09:57:45
Message-ID: 200702151057.46660.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> regexp_matches uses a text[] for the match groups. If you specify
> the global flag, it could return multiple matches. Couple this with
> the requested feature of pre- and postmatch returns (with its own
> flag) and the return would turn into some sort of nasty array of
> tuples, or multiple arrays. It seems much cleaner to me to return a
> set of the matches found, and I find which order the matches occur in
> to be much less important than the fact that they did occur and their
> contents.

Then the fact that the flag-less matches function returns an array would
be a mistake. They have to return the same category of object.

> regexp_split returns setof text. This has, in my opinion, a much
> greater case to return an array. However, there are several issues
> with this approach:

Any programming language I have ever seen returns the result of a
regular expression split as a structure with order. That in turn
implies that there are use cases for having the order, which your
proposed function could not address.

> # My experience with the array code leads me to believe that building
> up an array is an expensive proposition. I know I could code it
> smarter so that the array is only constructed in the end.

You can make any code arbitrarily fast if it doesn't have to give the
right answer.

> # With a set-returning function, it is possible to add a LIMIT
> clause, to prevent splitting up more of the string than is necessary.

You can also add such functionality to a function in form of a
parameter. In fact, relying on a LIMIT clause to do this seems pretty
fragile. We argue elsewhere that LIMIT without ORDER BY is not
well-defined, and while it's hard to imagine in the current
implementation why the result of a set returning function would come
back in arbitrary order, it is certainly possible in theory, so you
still need to order the result set if you want reliable limits, but
that is not possible of the order is lost in the result.

> It is also immediately possible to insert them into a table, or do
> grouping on them, or call a function on each value. Most of the time
> when I do a split, I intend to do something like this with each split
> value.

These sort of arguments remind me of the contrib/xml2 module, which also
has a very, uh, pragmatic API. Sure, these operations may seem useful
to you. But when we offer a function as part of the core API, it is
also important that we offer a clean design that allows other users to
combine reasonably orthogonal functionality into tools that are useful
to them.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 13:55:35
Message-ID: 20070215135535.GC4682@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:

> The functions added are:
> * regexp_split(str text, pattern text) RETURNS SETOF text
> regexp_split(str text, pattern text, flags text) RETURNS SETOF text
> returns each section of the string delimited by the pattern.
> * regexp_matches(str text, pattern text) RETURNS text[]
> returns all capture groups when matching pattern against string in an
> array
> * regexp_matches(str text, pattern text, flags text) RETURNS SETOF
> (prematch text, fullmatch text, matches text[], postmatch text)
> returns all capture groups when matching pattern against string in an
> array. also returns the entire match in fullmatch. if the 'g' option
> is given, returns all matches in the string. if the 'r' option is
> given, also return the text before and after the match in prematch and
> postmatch respectively.

I think the position the match is in could be important. I'm wondering
if you could define them like

create type re_match(match text, position int)
regexp_split(str text, pattern text) returns setof re_match

or maybe
regexp_split(str text, pattern text, OUT match text, OUT position int);
(not sure of the exact syntax for this one)

so that you would have the position for each match, automatically. Is
this information available in the regex code?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 15:34:23
Message-ID: 24844.1171553663@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> Jeremy Drake wrote:
>> # My experience with the array code leads me to believe that building
>> up an array is an expensive proposition. I know I could code it
>> smarter so that the array is only constructed in the end.

> You can make any code arbitrarily fast if it doesn't have to give the
> right answer.

Even more to the point, it's folly to suppose that the overhead of
processing a SETOF result is less than that of array construction.

I tend to agree with Peter's concern that returning a set is going to
make it hard to track which result is which.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 15:37:26
Message-ID: 24873.1171553846@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> so that you would have the position for each match, automatically. Is
> this information available in the regex code?

Certainly, that's where we got the text snippets from to begin with.
However, I'm not sure that this is important enough to justify a special
type --- for one thing, since we don't have arrays of composites, that
would foreclose responding to Peter's concern that SETOF is the wrong
thing. If you look at the Perl and Tcl APIs for regexes, they return
just the strings, not the numerical positions; and I've not heard anyone
complaining about that.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 15:56:25
Message-ID: 20070215155625.GM4682@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > so that you would have the position for each match, automatically. Is
> > this information available in the regex code?
>
> Certainly, that's where we got the text snippets from to begin with.
> However, I'm not sure that this is important enough to justify a special
> type --- for one thing, since we don't have arrays of composites, that
> would foreclose responding to Peter's concern that SETOF is the wrong
> thing.

My point is that if you want to have the order in which the matches were
found, you can do that easily by looking at the positions; no need to
create an ordered array. Which does respond to Peter's concern, since
the point was to keep the ordering of matches, which an array does; but
if we provide the positions, the SETOF way does as well.

On the other hand, I don't think it's impossible to have matches that
start earlier than others in the string, but are actually found later
(say, because they are a parentized expression that ends later). So
giving the starting positions allows one to know where are they located,
rather than where were they reported. (I don't really know if the
matches are sorted before reporting though.)

> If you look at the Perl and Tcl APIs for regexes, they return
> just the strings, not the numerical positions; and I've not heard anyone
> complaining about that.

I know, but that may be just because it would be too much extra
complexity for them (in terms of user API) to be returning the positions
along the text. I know I'd be fairly annoyed if =~ in Perl returned an
array of hashes { text => 'foo', position => 42} instead of array of
text. We don't have that problem.

In fact, I would claim that's much easier to deal with a SETOF function
than is to deal with text[].

Regarding the "nobody complains" argument, I don't find that
particularly compelling; witness how people gets used to working around
limitations in MySQL ... ;-)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeremy Drake <pgsql(at)jdrake(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 16:57:09
Message-ID: 200702151757.10732.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera wrote:
> On the other hand, I don't think it's impossible to have matches that
> start earlier than others in the string, but are actually found later
> (say, because they are a parentized expression that ends later). So
> giving the starting positions allows one to know where are they
> located, rather than where were they reported. (I don't really know
> if the matches are sorted before reporting though.)

I have no strong opinion about how matches are returned. Seeing the
definitional difficulties that you point out, it may be fine to return
them unordered. But then all "matches" functions should do that.

For the "split" functions, however, providing the order is clearly
important.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: David Fetter <david(at)fetter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeremy Drake <pgsql(at)jdrake(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 17:02:58
Message-ID: 20070215170258.GA3282@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Feb 15, 2007 at 10:37:26AM -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > so that you would have the position for each match, automatically. Is
> > this information available in the regex code?
>
> Certainly, that's where we got the text snippets from to begin with.
> However, I'm not sure that this is important enough to justify a
> special type --- for one thing, since we don't have arrays of
> composites,

This is a TODO :)

I've obviously misunderstood the scope of the TODO because it appears
that an INSERT into pg_type at creation time for compound types that
looks something like the below would do it. What have I missed?

INSERT INTO pg_type
VALUES (
'_foo', /* Generated by makeArrayTypeName */
16744, /* OID of schema */
10, /* OID of owner of the base type */
-1, /* typlen indicates varlena */
'f', /* not passed by value */
'c', /* typtype is composite */
't', /* type is already defined */
',', /* typdelim */
0, /* should this actually refer to the type? */
'foo'::regtype, /* typelem */
'array_in', /* typinput */
'array_out', /* typoutput */
'array_recv', /* typreceive */
'array_send', /* typsend */
0, /* typanalyze */
'i', /* typalign. Should this be 'd'? */
'x', /* typstorage */
'f', /* not a DOMAIN, but while we're at it, why not arrays of DOMAIN? */
0, /* base type. should this be different? */
-1, /* no typmod */
0 /* dims not specified */
);

> that would foreclose responding to Peter's concern that SETOF is the
> wrong thing. If you look at the Perl and Tcl APIs for regexes, they
> return just the strings, not the numerical positions; and I've not
> heard anyone complaining about that.

They do return them in the order in which they appear, though, which,
as far as I can tell, Jeremy's functions also do.

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: David Fetter <david(at)fetter(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 20:10:56
Message-ID: 20070215201056.GE3282@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Feb 15, 2007 at 10:57:45AM +0100, Peter Eisentraut wrote:
> Jeremy Drake wrote:

> > # With a set-returning function, it is possible to add a LIMIT
> > clause, to prevent splitting up more of the string than is
> > necessary.
>
> You can also add such functionality to a function in form of a
> parameter.

That's what things like Perl's split do :)

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 23:15:31
Message-ID: Pine.BSO.4.64.0702151506140.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 15 Feb 2007, Alvaro Herrera wrote:

> Jeremy Drake wrote:
>
> > The functions added are:
> > * regexp_split(str text, pattern text) RETURNS SETOF text
> > regexp_split(str text, pattern text, flags text) RETURNS SETOF text
> > returns each section of the string delimited by the pattern.
> > * regexp_matches(str text, pattern text) RETURNS text[]
> > returns all capture groups when matching pattern against string in an
> > array
> > * regexp_matches(str text, pattern text, flags text) RETURNS SETOF
> > (prematch text, fullmatch text, matches text[], postmatch text)
> > returns all capture groups when matching pattern against string in an
> > array. also returns the entire match in fullmatch. if the 'g' option
> > is given, returns all matches in the string. if the 'r' option is
> > given, also return the text before and after the match in prematch and
> > postmatch respectively.
>
> I think the position the match is in could be important. I'm wondering
> if you could define them like
>
> create type re_match(match text, position int)
> regexp_split(str text, pattern text) returns setof re_match

So it looks like the issues are:
1. regexp_matches without flags has a different return type than
regexp_matches with flags. I can make them return the same OUT
parameters, but should I declare it as returning SETOF when I know
for a fact that the no-flags version will never return more than one
row?

2. regexp_split does not represent the order of the results. I can do
something like:

regexp_split(str text, pattern text[, flags text], OUT result text, OUT
startpos int) RETURNS SETOF record;

It could also have the int being a simple counter to represent the
relative order, rather than the position.

Thoughts? Do these changes address the issues recently expressed?

--
I have yet to see any problem, however complicated, which, when looked
at in the right way, did not become still more complicated.
-- Poul Anderson


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Fetter <david(at)fetter(dot)org>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeremy Drake <pgsql(at)jdrake(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 00:35:46
Message-ID: 5214.1171586146@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

David Fetter <david(at)fetter(dot)org> writes:
> I've obviously misunderstood the scope of the TODO because it appears
> that an INSERT into pg_type at creation time for compound types that
> looks something like the below would do it. What have I missed?

There are a couple of issues. One is that we probably don't want two
pg_type entries for every single table. Will you be satisfied if only
CREATE TYPE AS ... makes an array type? The other is that, at least at
the time they were written, the array support routines couldn't handle
composite array values. Things might or might not be easier today;
I don't think we had record_in and record_out in their current form
then.

regards, tom lane


From: David Fetter <david(at)fetter(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeremy Drake <pgsql(at)jdrake(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 00:59:15
Message-ID: 20070216005915.GH3282@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, Feb 15, 2007 at 07:35:46PM -0500, Tom Lane wrote:
> David Fetter <david(at)fetter(dot)org> writes:
> > I've obviously misunderstood the scope of the TODO because it appears
> > that an INSERT into pg_type at creation time for compound types that
> > looks something like the below would do it. What have I missed?
>
> There are a couple of issues. One is that we probably don't want
> two pg_type entries for every single table.

Now that you mention it, I would want that if that's what it takes to
get arrays for them. The long-term goal here is to make all of
PostgreSQL's types play nicely together. I'm guessing that SETOF
will eventually be a way to describe a collection because MULTISET is
in SQL:2003.

> Will you be satisfied if only CREATE TYPE AS ... makes an array
> type? The other is that, at least at the time they were written,
> the array support routines couldn't handle composite array values.
> Things might or might not be easier today; I don't think we had
> record_in and record_out in their current form then.

OK. What about pg_depend?

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Jeremy Drake <pgsql(at)jdrake(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 01:20:52
Message-ID: 45D506F4.3060005@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> David Fetter <david(at)fetter(dot)org> writes:
>
>> I've obviously misunderstood the scope of the TODO because it appears
>> that an INSERT into pg_type at creation time for compound types that
>> looks something like the below would do it. What have I missed?
>>
>
> There are a couple of issues. One is that we probably don't want two
> pg_type entries for every single table. Will you be satisfied if only
> CREATE TYPE AS ... makes an array type?
>

There should be some better way to create the array type for tables than
directly mangling pg_type, though. Maybe a builtin function that took an
oid?

cheers

andrew


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 07:02:33
Message-ID: Pine.BSO.4.64.0702152300520.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 15 Feb 2007, Peter Eisentraut wrote:

> I have no strong opinion about how matches are returned. Seeing the
> definitional difficulties that you point out, it may be fine to return
> them unordered. But then all "matches" functions should do that.
>
> For the "split" functions, however, providing the order is clearly
> important.

Does this version sufficiently address your concerns?

--
Finagle's Creed:
Science is true. Don't be misled by facts.

Attachment Content-Type Size
regexp-split-matches-documented_newapi.patch text/plain 50.8 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 12:19:55
Message-ID: 200702161319.56700.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Am Freitag, 16. Februar 2007 08:02 schrieb Jeremy Drake:
> On Thu, 15 Feb 2007, Peter Eisentraut wrote:
> > I have no strong opinion about how matches are returned. Seeing the
> > definitional difficulties that you point out, it may be fine to return
> > them unordered. But then all "matches" functions should do that.
> >
> > For the "split" functions, however, providing the order is clearly
> > important.
>
> Does this version sufficiently address your concerns?

I don't think anyone asked for the start position and length in the result of
regexp_split(). The result should be an array of text. That is what Perl et
al. do.

As for the regexp_matches() function, it seems to me that it returns too much
information at once. What is the use case for getting all of prematch,
fullmatch, matches, and postmatch in one call?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: David Fetter <david(at)fetter(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 16:11:21
Message-ID: 20070216161121.GD21379@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, Feb 16, 2007 at 01:19:55PM +0100, Peter Eisentraut wrote:
> Am Freitag, 16. Februar 2007 08:02 schrieb Jeremy Drake:
> > On Thu, 15 Feb 2007, Peter Eisentraut wrote:
> > > I have no strong opinion about how matches are returned. Seeing
> > > the definitional difficulties that you point out, it may be fine
> > > to return them unordered. But then all "matches" functions
> > > should do that.
> > >
> > > For the "split" functions, however, providing the order is
> > > clearly important.
> >
> > Does this version sufficiently address your concerns?
>
> I don't think anyone asked for the start position and length in the
> result of regexp_split(). The result should be an array of text.
> That is what Perl et al. do.
>
> As for the regexp_matches() function, it seems to me that it returns
> too much information at once. What is the use case for getting all
> of prematch, fullmatch, matches, and postmatch in one call?

If not in one call, how would you get it? Perl, for example, makes
these available to any regex match in the form of variables it sets.

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 16:54:47
Message-ID: 200702161754.49301.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Am Freitag, 16. Februar 2007 17:11 schrieb David Fetter:
> > As for the regexp_matches() function, it seems to me that it returns
> > too much information at once. What is the use case for getting all
> > of prematch, fullmatch, matches, and postmatch in one call?
>
> If not in one call, how would you get it? Perl, for example, makes
> these available to any regex match in the form of variables it sets.

The question is, what is the use case? If there is one in Perl, can this
proposed function API support it?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: David Fetter <david(at)fetter(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 17:49:36
Message-ID: 20070216174936.GE21379@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, Feb 16, 2007 at 05:54:47PM +0100, Peter Eisentraut wrote:
> Am Freitag, 16. Februar 2007 17:11 schrieb David Fetter:
> > > As for the regexp_matches() function, it seems to me that it
> > > returns too much information at once. What is the use case for
> > > getting all of prematch, fullmatch, matches, and postmatch in
> > > one call?
> >
> > If not in one call, how would you get it? Perl, for example,
> > makes these available to any regex match in the form of variables
> > it sets.
>
> The question is, what is the use case? If there is one in Perl, can
> this proposed function API support it?

Perl makes the following variables available in any regex match,
although it optimizes some cases for when they're not there:

$1, ... $n (captured matches in parentheses)
$` (pre-match)
$' (post-match)
$& (whole match)

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeremy Drake <pgsql(at)jdrake(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 18:03:32
Message-ID: 45D5F1F4.7090502@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

David Fetter wrote:
>>
>> The question is, what is the use case? If there is one in Perl, can
>> this proposed function API support it?
>>
>
> Perl makes the following variables available in any regex match,
> although it optimizes some cases for when they're not there:
>
> $1, ... $n (captured matches in parentheses)
> $` (pre-match)
> $' (post-match)
> $& (whole match)
>
>

Use of any of these is notoriously costly, BTW, especially pre-match and
post-match. See perlre man page.

cheers

andrew


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 18:19:54
Message-ID: 200702161919.56500.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

David Fetter wrote:
> > The question is, what is the use case? If there is one in Perl,
> > can this proposed function API support it?
>
> Perl makes the following variables available in any regex match,

That is not a use case.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Jeremy Drake <pgsql(at)jdrake(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 18:28:39
Message-ID: 20070216182839.GG21379@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, Feb 16, 2007 at 01:03:32PM -0500, Andrew Dunstan wrote:
> David Fetter wrote:
> >>
> >>The question is, what is the use case? If there is one in Perl, can
> >>this proposed function API support it?
> >>
> >
> >Perl makes the following variables available in any regex match,
> >although it optimizes some cases for when they're not there:
> >
> >$1, ... $n (captured matches in parentheses)
> >$` (pre-match)
> >$' (post-match)
> >$& (whole match)
> >
>
> Use of any of these is notoriously costly, BTW, especially pre-match
> and post-match. See perlre man page.

This is why they only appear when called for :)

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-16 23:18:15
Message-ID: Pine.BSO.4.64.0702161502280.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Fri, 16 Feb 2007, Peter Eisentraut wrote:

> Am Freitag, 16. Februar 2007 08:02 schrieb Jeremy Drake:
> > Does this version sufficiently address your concerns?
>
> I don't think anyone asked for the start position and length in the result of
> regexp_split(). The result should be an array of text. That is what Perl et
> al. do.

The length is not returned, I simply call length() on the string result to
make sure that no whitespace gets in.

The start position was suggested in these two messages from Alvaro Herrera:
http://archives.postgresql.org/pgsql-patches/2007-02/msg00276.php
http://archives.postgresql.org/pgsql-patches/2007-02/msg00281.php

This was meant to address your concern about the order not necessarily
being preserved of the split results when being returned as SETOF. This
gives the benefits of returning SETOF while still allowing the order to be
preserved if desired (simply add ORDER BY startpos to guarantee the
correct order).

In case you haven't noticed, I am rather averse to making this return
text[] because it is much easier in my experience to use the results when
returned in SETOF rather than text[], and in all of the code that I have
experience with where this would be useful I would end up using
information_schema._pg_expandarray (a function that, AFAIK, is
documented nowhere) to convert it into SETOF text. While, if you really
really wanted a text[], you could use the (fully documented) ARRAY(select
resultstr from regexp_split(...) order by startpos) construct.

> As for the regexp_matches() function, it seems to me that it returns too much
> information at once. What is the use case for getting all of prematch,
> fullmatch, matches, and postmatch in one call?

It was requested by David Fetter:
http://archives.postgresql.org/pgsql-hackers/2007-02/msg00056.php

It was not horribly difficult to provide, and it seemed reasonable to me.
I have no need for them personally.

--
Some people in this department wouldn't recognize subtlety if it hit
them on the head.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-17 08:02:24
Message-ID: 200702170902.26138.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> In case you haven't noticed, I am rather averse to making this return
> text[] because it is much easier in my experience to use the results
> when returned in SETOF rather than text[],

The primary use case I know for string splitting is parsing
comma/pipe/whatever separated fields into a row structure, and the way
I see it your API proposal makes that exceptionally difficult.

I don't know what your use case is, though. All of this is missing
actual use cases.

> While, if you
> really really wanted a text[], you could use the (fully documented)
> ARRAY(select resultstr from regexp_split(...) order by startpos)
> construct.

I think, however, that we should be providing simple primitives that can
be combined into complex expressions rather than complex primitives
that have to be dissected apart to get simple results.

> > As for the regexp_matches() function, it seems to me that it
> > returns too much information at once. What is the use case for
> > getting all of prematch, fullmatch, matches, and postmatch in one
> > call?
>
> It was requested by David Fetter:
> http://archives.postgresql.org/pgsql-hackers/2007-02/msg00056.php
>
> It was not horribly difficult to provide, and it seemed reasonable to
> me. I have no need for them personally.

David Fetter has also repeated failed to offer a use case for this, so I
hesitate to accept this.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: David Fetter <david(at)fetter(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-17 08:16:06
Message-ID: 20070217081606.GE9182@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sat, Feb 17, 2007 at 09:02:24AM +0100, Peter Eisentraut wrote:
> Jeremy Drake wrote:
> > In case you haven't noticed, I am rather averse to making this return
> > text[] because it is much easier in my experience to use the results
> > when returned in SETOF rather than text[],
>
> The primary use case I know for string splitting is parsing
> comma/pipe/whatever separated fields into a row structure, and the way
> I see it your API proposal makes that exceptionally difficult.
>
> I don't know what your use case is, though. All of this is missing
> actual use cases.
>
> > While, if you
> > really really wanted a text[], you could use the (fully documented)
> > ARRAY(select resultstr from regexp_split(...) order by startpos)
> > construct.
>
> I think, however, that we should be providing simple primitives that can
> be combined into complex expressions rather than complex primitives
> that have to be dissected apart to get simple results.
>
> > > As for the regexp_matches() function, it seems to me that it
> > > returns too much information at once. What is the use case for
> > > getting all of prematch, fullmatch, matches, and postmatch in one
> > > call?
> >
> > It was requested by David Fetter:
> > http://archives.postgresql.org/pgsql-hackers/2007-02/msg00056.php
> >
> > It was not horribly difficult to provide, and it seemed reasonable
> > to me. I have no need for them personally.
>
> David Fetter has also repeated failed to offer a use case for this,
> so I hesitate to accept this.

What is it about having the whole match, pre-match and post-match
available that you're objecting to? Are you saying there aren't
common uses for any or all of these? Regular expression users use
them all over the place, and adding this capability to SQL seems like
a reasonable next step :)

Cheers,
D
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-17 08:23:17
Message-ID: Pine.BSO.4.64.0702170005560.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sat, 17 Feb 2007, Peter Eisentraut wrote:

> Jeremy Drake wrote:
> > In case you haven't noticed, I am rather averse to making this return
> > text[] because it is much easier in my experience to use the results
> > when returned in SETOF rather than text[],
>
> The primary use case I know for string splitting is parsing
> comma/pipe/whatever separated fields into a row structure, and the way
> I see it your API proposal makes that exceptionally difficult.

For this case see string_to_array:
http://developer.postgresql.org/pgdocs/postgres/functions-array.html
select string_to_array('a|b|c', '|');
string_to_array
-----------------
{a,b,c}
(1 row)

> I don't know what your use case is, though. All of this is missing
> actual use cases.

The particular use case I had for this function was at a previous
employer, and I am not sure exactly how much detail is appropriate to
divulge. Basically, the project was doing some text processing inside of
postgres, and getting all of the words from a string into a table with
some processing (excluding stopwords and so forth) as efficiently as
possible was a big concern.

The regexp_split function code was based on some code that a friend of
mine wrote which used PCRE rather than postgres' internal regexp support.
I don't know exactly what his use-case was, but he probably had
one because he wrote the function and had it returning SETOF text ;)
Perhaps he can share a general idea of what it was (nudge nudge)?

> > While, if you
> > really really wanted a text[], you could use the (fully documented)
> > ARRAY(select resultstr from regexp_split(...) order by startpos)
> > construct.
>
> I think, however, that we should be providing simple primitives that can
> be combined into complex expressions rather than complex primitives
> that have to be dissected apart to get simple results.

The most simple primitive is string_to_array(text, text) returns text[],
but it was not sufficient for our needs.

> > > As for the regexp_matches() function, it seems to me that it
> > > returns too much information at once. What is the use case for
> > > getting all of prematch, fullmatch, matches, and postmatch in one
> > > call?
> >
> > It was requested by David Fetter:
> > http://archives.postgresql.org/pgsql-hackers/2007-02/msg00056.php
> >
> > It was not horribly difficult to provide, and it seemed reasonable to
> > me. I have no need for them personally.
>
> David Fetter has also repeated failed to offer a use case for this, so I
> hesitate to accept this.

I have no strong opinion either way, so I will let those who do argue it
out and wait for the dust to settle ;)

--
The Law, in its majestic equality, forbids the rich, as well as the
poor, to sleep under the bridges, to beg in the streets, and to steal
bread.
-- Anatole France


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-17 09:20:08
Message-ID: 200702171020.09991.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

David Fetter wrote:
> What is it about having the whole match, pre-match and post-match
> available that you're objecting to? Are you saying there aren't
> common uses for any or all of these? Regular expression users use
> them all over the place,

You keep saying that, and I keep saying please show a use case.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: David Fetter <david(at)fetter(dot)org>, Jeremy Drake <pgsql(at)jdrake(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-17 16:19:11
Message-ID: 19417.1171729151@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> David Fetter wrote:
>> What is it about having the whole match, pre-match and post-match
>> available that you're objecting to? Are you saying there aren't
>> common uses for any or all of these? Regular expression users use
>> them all over the place,

> You keep saying that, and I keep saying please show a use case.

Maybe I'm missing something, but ISTM that given the ability to return
multiple match substrings there is no need for such features. Given
an arbitrary regex 'xyz', write '^(.*)(xyz)(.*)$' and you'll get back
the pre-match, whole match, and post-match in addition to any
parenthesized substrings that 'xyz' contains. If you only need some
of them, you can leave out the corresponding parts of this pattern.

So I'd vote against complicating the API in order to make special
provision for these results. I claim that all we need is a function
taking (string text, pattern text, flags text) and returning either
array of text or setof text containing the matched substrings in
whatever order is standard (order by left-parenthesis position,
I think). In the degenerate case where there are no parenthesized
subpatterns, just return the whole match as a single element.

As for the argument about array vs setof, I could see doing both to
end the argument of which one is really superior for any particular
problem.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-17 17:12:54
Message-ID: 200702171812.55703.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> For this case see string_to_array:

That only works with fixed delimiters, and I was hoping that
regexp_split would be an alternative with regexp delimiters. It's
certainly another argument that all the split functions in PostgreSQL
should offer analogous interfaces. There is no reason being offered
why splitting on a fixed string should result in an array and splitting
on a regexp should result in a table.

> The particular use case I had for this function was at a previous
> employer, and I am not sure exactly how much detail is appropriate to
> divulge. Basically, the project was doing some text processing
> inside of postgres, and getting all of the words from a string into a
> table with some processing (excluding stopwords and so forth) as
> efficiently as possible was a big concern.

We already have tsearch for that.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Mark Dilger <pgsql(at)markdilger(dot)com>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-17 19:27:06
Message-ID: 45D7570A.2050502@markdilger.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> The regexp_split function code was based on some code that a friend of
> mine wrote which used PCRE rather than postgres' internal regexp support.
> I don't know exactly what his use-case was, but he probably had
> one because he wrote the function and had it returning SETOF text ;)
> Perhaps he can share a general idea of what it was (nudge nudge)?

db=# CREATE OR REPLACE FUNCTION split(p TEXT, t TEXT) RETURNS SETOF TEXT AS $$
db$# my ($p, $t) = @_;
db$# return [ split(/$p/,$t) ];
db$# $$ LANGUAGE plperl;
CREATE FUNCTION
Time: 1.254 ms
db=# select distinct word from (select * from split('\\W+','mary had a little
lamb, whose fleece was black as soot') as word) as ss;
word
--------
a
as
black
fleece
had
lamb
little
mary
soot
was
whose
(11 rows)

Time: 30.517 ms

As you can see, this can easily be done with a plperl function. Some people may
not want to install plperl, or may not want to allow arbitrary patterns to be
handed to perl in this fashion. That was not my concern. I was simply trying
to see if I could make it faster in a C-language coded function.

In the end I dropped the project because the plperl function works fast enough
for me and I don't have any objection to plperl from a security standpoint, etc.

mark


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-17 19:32:58
Message-ID: Pine.BSO.4.64.0702171124190.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Finally the voice of reason :)
On Sat, 17 Feb 2007, Tom Lane wrote:

> So I'd vote against complicating the API in order to make special
> provision for these results. I claim that all we need is a function
> taking (string text, pattern text, flags text) and returning either
> array of text or setof text

For this function, it would be setof array of text, as the capture groups
would definitely go in an array, but if you asked for global in the flags,
there could be more than one match in the string.

> containing the matched substrings in
> whatever order is standard (order by left-parenthesis position,
> I think). In the degenerate case where there are no parenthesized
> subpatterns, just return the whole match as a single element.

Good idea, that did not occur to me. I was planning to throw an error in
that case.

> As for the argument about array vs setof, I could see doing both to
> end the argument of which one is really superior for any particular
> problem.

The array vs setof argument was on the split function. I will work on
doing both. Any idea how the name and/or arguments should differ? I
think that an optional limit argument for the array version like perl has
would be reasonable, but a name difference in the functions is probably
necessary to avoid confusion.

--
Speaking of Godzilla and other things that convey horror:

With a purposeful grimace and a Mongo-like flair
He throws the spinning disk drives in the air!
And he picks up a Vax and he throws it back down
As he wades through the lab making terrible sounds!
Helpless users with projects due
Scream "My God!" as he stomps on the tape drives, too!

Oh, no! He says Unix runs too slow! Go, go, DECzilla!
Oh, yes! He's gonna bring up VMS! Go, go, DECzilla!"

* VMS is a trademark of Digital Equipment Corporation
* DECzilla is a trademark of Hollow Chocolate Bunnies of Death, Inc.
-- Curtis Jackson


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-17 19:43:14
Message-ID: 23596.1171741394@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> On Sat, 17 Feb 2007, Tom Lane wrote:
>> So I'd vote against complicating the API in order to make special
>> provision for these results. I claim that all we need is a function
>> taking (string text, pattern text, flags text) and returning either
>> array of text or setof text

> For this function, it would be setof array of text, as the capture groups
> would definitely go in an array, but if you asked for global in the flags,
> there could be more than one match in the string.

Oh, right. And you could do a 2-D array if you wanted it all in one
blob (or a guarantee of order). So no need for record-returning functions?

regards, tom lane


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-18 00:10:20
Message-ID: Pine.BSO.4.64.0702171602390.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sat, 17 Feb 2007, Tom Lane wrote:

> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > On Sat, 17 Feb 2007, Tom Lane wrote:
> >> So I'd vote against complicating the API in order to make special
> >> provision for these results. I claim that all we need is a function
> >> taking (string text, pattern text, flags text) and returning either
> >> array of text or setof text
>
> > For this function, it would be setof array of text, as the capture groups
> > would definitely go in an array, but if you asked for global in the flags,
> > there could be more than one match in the string.
>
> Oh, right. And you could do a 2-D array if you wanted it all in one
> blob (or a guarantee of order).

I don't think that there is much of an argument for having this one return
a 2d array, in this particular case, even perl requires you to build a
loop, IIRC.

my $str = 'foobarbecuebazilbarf';
while($str=~/(b[^b]+)(b[^b]+)/g) {
print $1, "\t", length($1), "\n";
print $2, "\t", length($2), "\n";
print "---\n";
}

bar 3
becue 5
---
bazil 5
barf 4
---

> So no need for record-returning functions?
No.

--
Go placidly amid the noise and waste, and remember what value there may
be in owning a piece thereof.
-- National Lampoon, "Deteriorata"


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-18 06:15:43
Message-ID: Pine.BSO.4.64.0702172207530.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Attached, please find a new version of the patch which follows Tom's
advice (except regexp_matches returing a 2-D array, which I am not sure if
it is required/desired). Hopefully this version makes everyone happy ;)

On Sat, 17 Feb 2007, Tom Lane wrote:

> So I'd vote against complicating the API in order to make special
> provision for these results. I claim that all we need is a function
> taking (string text, pattern text, flags text) and returning either
> array of text or setof text containing the matched substrings in
> whatever order is standard (order by left-parenthesis position,
> I think). In the degenerate case where there are no parenthesized
> subpatterns, just return the whole match as a single element.

The signature is

regexp_matches(string text, pattern text[, flags text]) returns setof
text[]

and behaves as described.

>
> As for the argument about array vs setof, I could see doing both to
> end the argument of which one is really superior for any particular
> problem.

regexp_split(string text, pattern text[, flags text]) returns setof text

regexp_split_array(string text, pattern text[. flags text[, limit int]])
returns text[]

--
When I was in school, I cheated on my metaphysics exam:
I looked into the soul of the boy sitting next to me.
-- Woody Allen

Attachment Content-Type Size
regexp-split-matches-documented_witharray.patch text/plain 45.6 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-18 17:32:41
Message-ID: 200702181832.43132.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> > As for the argument about array vs setof, I could see doing both to
> > end the argument of which one is really superior for any particular
> > problem.
>
> regexp_split(string text, pattern text[, flags text]) returns setof
> text
>
> regexp_split_array(string text, pattern text[. flags text[, limit
> int]]) returns text[]

Since you are not splitting an array but returning an array, I would
think that "regexp_split_to_array" would be better, and the other
should then be "regexp_split_to_table".

But why does the second one have a limit and the first one doesn't? Is
this because you rely on the LIMIT clause to do the same? Is there a
guarantee that LIMIT on a table function makes a consistent order?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-18 20:03:35
Message-ID: Pine.BSO.4.64.0702181202050.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 18 Feb 2007, Peter Eisentraut wrote:

> Jeremy Drake wrote:
> > > As for the argument about array vs setof, I could see doing both to
> > > end the argument of which one is really superior for any particular
> > > problem.
> >
> > regexp_split(string text, pattern text[, flags text]) returns setof
> > text
> >
> > regexp_split_array(string text, pattern text[. flags text[, limit
> > int]]) returns text[]
>
> Since you are not splitting an array but returning an array, I would
> think that "regexp_split_to_array" would be better, and the other
> should then be "regexp_split_to_table".

OK

>
> But why does the second one have a limit and the first one doesn't? Is
> this because you rely on the LIMIT clause to do the same?

Yes

> Is there a
> guarantee that LIMIT on a table function makes a consistent order?

Why wouldn't it?

--
When you are in it up to your ears, keep your mouth shut.


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-18 23:10:51
Message-ID: Pine.BSO.4.64.0702181508280.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 18 Feb 2007, Jeremy Drake wrote:

> On Sun, 18 Feb 2007, Peter Eisentraut wrote:
>
> > > regexp_split(string text, pattern text[, flags text]) returns setof
> > > text
> > >
> > > regexp_split_array(string text, pattern text[. flags text[, limit
> > > int]]) returns text[]
> >
> > Since you are not splitting an array but returning an array, I would
> > think that "regexp_split_to_array" would be better, and the other
> > should then be "regexp_split_to_table".
>
> OK
>
> > But why does the second one have a limit and the first one doesn't?

I will rename the functions regexp_split_to_(table|array) and I will add
an optional limit parameter to the regexp_split_to_table function, for
consistency and to avoid ordering concerns with LIMIT.

--
Sometimes I worry about being a success in a mediocre world.
-- Lily Tomlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-18 23:27:11
Message-ID: 15229.1171841231@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> I will rename the functions regexp_split_to_(table|array) and I will add
> an optional limit parameter to the regexp_split_to_table function, for
> consistency and to avoid ordering concerns with LIMIT.

I'd go the other way: get rid of the limit option all 'round. It seems
like fairly useless complexity.

regards, tom lane


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-18 23:56:08
Message-ID: Pine.BSO.4.64.0702181553300.18849@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 18 Feb 2007, Tom Lane wrote:

> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > I will rename the functions regexp_split_to_(table|array) and I will add
> > an optional limit parameter to the regexp_split_to_table function, for
> > consistency and to avoid ordering concerns with LIMIT.
>
> I'd go the other way: get rid of the limit option all 'round. It seems
> like fairly useless complexity.

OK, here is what is hopefully the final version of this patch. I have
renamed the functions as above, and removed the optional limit parameter.
If there are no further complaints, this should be ready to apply.

--
It is the business of little minds to shrink.
-- Carl Sandburg

Attachment Content-Type Size
regexp-split-matches-documented-renamed.patch text/plain 44.7 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-02-20 22:18:22
Message-ID: 200702202218.l1KMIMN13820@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Jeremy Drake wrote:
> On Sun, 18 Feb 2007, Tom Lane wrote:
>
> > Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > > I will rename the functions regexp_split_to_(table|array) and I will add
> > > an optional limit parameter to the regexp_split_to_table function, for
> > > consistency and to avoid ordering concerns with LIMIT.
> >
> > I'd go the other way: get rid of the limit option all 'round. It seems
> > like fairly useless complexity.
>
> OK, here is what is hopefully the final version of this patch. I have
> renamed the functions as above, and removed the optional limit parameter.
> If there are no further complaints, this should be ready to apply.
>
> --
> It is the business of little minds to shrink.
> -- Carl Sandburg
Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-18 05:47:07
Message-ID: Pine.BSO.4.64.0703172243020.19070@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Sun, 18 Feb 2007, Jeremy Drake wrote:

> OK, here is what is hopefully the final version of this patch. I have
> renamed the functions as above, and removed the optional limit parameter.
> If there are no further complaints, this should be ready to apply.

The patch has been sitting in the unapplied patches queue for a while, and
the inevitable bitrot finally caught up with it. This version of the
patch is exactly the same as the one in the patches queue, except for
using new, non-conflicting oids for the functions.

--
The gods gave man fire and he invented fire engines. They gave him
love and he invented marriage.

Attachment Content-Type Size
regexp-split-matches-documented-newoids.patch text/plain 44.7 KB

From: Neil Conway <neilc(at)samurai(dot)com>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-18 05:49:30
Message-ID: 45FCD2EA.5030600@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> The patch has been sitting in the unapplied patches queue for a while
Barring any objections, I'll apply this tomorrow.

-Neil


From: Neil Conway <neilc(at)samurai(dot)com>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-20 21:19:05
Message-ID: 46004FC9.3000204@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> The patch has been sitting in the unapplied patches queue for a while, and
> the inevitable bitrot finally caught up with it. This version of the
> patch is exactly the same as the one in the patches queue, except for
> using new, non-conflicting oids for the functions.
>
Applied -- thanks for the patch. BTW, a few cosmetic comments:

* #includes should be sorted alphabetically (unless another #include
sort order rules take precedence, like including "postgres.h" first).

* patch included some trailing CR line endings

* IMHO using "++var" in the increment component of a for loop is bad
style in C (if only because it is needlessly inconsistent; good C++
style is not necessarily good C style)

* Comments like "/* get text type oid, too lazy to do it some other way
*/" do not exactly inspire confidence :)

-Neil


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-21 05:50:06
Message-ID: 22492.1174456206@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Neil Conway <neilc(at)samurai(dot)com> writes:
> * Comments like "/* get text type oid, too lazy to do it some other way
> */" do not exactly inspire confidence :)

Surely the code was just using the TEXTOID macro? If so, it does not
require a comment. If not, it should be fixed ...

regards, tom lane


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-21 06:12:55
Message-ID: Pine.BSO.4.64.0703202308390.19070@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 21 Mar 2007, Tom Lane wrote:

> Neil Conway <neilc(at)samurai(dot)com> writes:
> > * Comments like "/* get text type oid, too lazy to do it some other way
> > */" do not exactly inspire confidence :)
>
> Surely the code was just using the TEXTOID macro? If so, it does not
> require a comment. If not, it should be fixed ...

Nope, it does get_fn_expr_argtype(fcinfo->flinfo, 0); The too lazy remark
was that I thought there may be a better way (like the macro you
mentioned) but did not go looking for it because I already had something
that worked that I found in the manual. If you like, I can put together
another patch that removes the param_type members from the structs and
hardcodes TEXTOID in the proper places. BTW, should I be calling
get_typlenbyvalalign on TEXTOID or are there macros for those also?

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>

--
We're only in it for the volume.
-- Black Sabbath


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Jeremy Drake" <pgsql(at)jdrake(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Neil Conway" <neilc(at)samurai(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "David Fetter" <david(at)fetter(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "PostgreSQL Patches" <pgsql-patches(at)postgresql(dot)org>, "Mark Dilger" <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-21 12:34:12
Message-ID: 87r6rihj23.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


"Jeremy Drake" <pgsql(at)jdrake(dot)com> writes:

> BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros
> for those also?

Hardcoding -1 for typlen of varlenas is one of the few (the only?) magic
constants used throughout the source code. I'm surprised there isn't a macro
for it though.

Do you need the alignment? If so I want to check the code against the packed
varlena patch. Just in case.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Jeremy Drake" <pgsql(at)jdrake(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Neil Conway" <neilc(at)samurai(dot)com>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "David Fetter" <david(at)fetter(dot)org>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "PostgreSQL Patches" <pgsql-patches(at)postgresql(dot)org>, "Mark Dilger" <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-21 12:51:38
Message-ID: 87k5xahi91.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


"Gregory Stark" <stark(at)enterprisedb(dot)com> writes:

> "Jeremy Drake" <pgsql(at)jdrake(dot)com> writes:
>
>> BTW, should I be calling get_typlenbyvalalign on TEXTOID or are there macros
>> for those also?
>
> Hardcoding -1 for typlen of varlenas is one of the few (the only?) magic
> constants used throughout the source code. I'm surprised there isn't a macro
> for it though.
>
> Do you need the alignment? If so I want to check the code against the packed
> varlena patch. Just in case.

Ah, it's just to construct an array, that's not a concern at all. And you're
detoasting the text data types before using or storing them so that's fine.

The only thing I would say is that this should maybe be a TextGetDatum() just
for code hygiene. It should be exactly equivalent though:

+ PointerGetDatum(matchctx->orig_str),

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-21 13:45:48
Message-ID: 27337.1174484748@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> BTW, should I be calling
> get_typlenbyvalalign on TEXTOID or are there macros for those also?

By and large we tend to hard-wire those properties too, eg there are
plenty of places that do things like this:

/* XXX: this hardcodes assumptions about the regtype type */
result = construct_array(tmp_ary, num_params, REGTYPEOID, 4, true, 'i');

Some are better commented than others ...

regards, tom lane


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-22 00:09:22
Message-ID: Pine.BSO.4.64.0703211649240.19070@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 21 Mar 2007, Tom Lane wrote:

> Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> > BTW, should I be calling
> > get_typlenbyvalalign on TEXTOID or are there macros for those also?
>
> By and large we tend to hard-wire those properties too, eg there are
> plenty of places that do things like this:
>
> /* XXX: this hardcodes assumptions about the regtype type */
> result = construct_array(tmp_ary, num_params, REGTYPEOID, 4, true, 'i');
>
> Some are better commented than others ...

So, what action (if any) should be taken? Should all (or some) of these
values be hardcoded, or should the current calls to determine them be left
in place?

>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>

--
Fortune's Real-Life Courtroom Quote #19:

Q: Doctor, how many autopsies have you performed on dead people?
A: All my autopsies have been performed on dead people.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-22 04:42:23
Message-ID: 25643.1174538543@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> On Wed, 21 Mar 2007, Tom Lane wrote:
>> By and large we tend to hard-wire those properties too, eg there are
>> plenty of places that do things like this:

> So, what action (if any) should be taken? Should all (or some) of these
> values be hardcoded, or should the current calls to determine them be left
> in place?

I'd vote for making this new code look like the rest of it, to wit
hardwire the values. As-is, you're expending code space and cycles
on flexibility that's entirely illusory --- if we ever decided to
change these properties of TEXT, we'd have way more work to do than
just fixing the new regexp functions.

regards, tom lane


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-22 05:17:03
Message-ID: Pine.BSO.4.64.0703212215250.19070@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Wed, 21 Mar 2007, Gregory Stark wrote:

> The only thing I would say is that this should maybe be a TextGetDatum() just
> for code hygiene. It should be exactly equivalent though:

I could not find such a thing using ctags, nor TextPGetDatum(). I looked
at PG_RETURN_TEXT_P and it just uses PG_RETURN_POINTER, which in turn uses
PointerGetDatum. If there is such a thing, it is well camoflaged...

--
If you think the problem is bad now, just wait until we've solved it.
-- Arthur Kasspe


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-22 05:21:28
Message-ID: Pine.BSO.4.64.0703212218450.19070@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 22 Mar 2007, Tom Lane wrote:

> I'd vote for making this new code look like the rest of it, to wit
> hardwire the values.

Attached please find a patch which does this.

--
Although written many years ago, Lady Chatterley's Lover has just been
reissued by the Grove Press, and this pictorial account of the
day-to-day life of an English gamekeeper is full of considerable
interest to outdoor minded readers, as it contains many passages on
pheasant-raising, the apprehending of poachers, ways to control vermin,
and other chores and duties of the professional gamekeeper.
Unfortunately, one is obliged to wade through many pages of extraneous
material in order to discover and savour those sidelights on the
management of a midland shooting estate, and in this reviewer's opinion
the book cannot take the place of J. R. Miller's "Practical
Gamekeeping."
-- Ed Zern, "Field and Stream" (Nov. 1959)

Attachment Content-Type Size
regexp-funcs-hardcode-texttype.patch text/plain 2.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-22 05:32:53
Message-ID: 26219.1174541573@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake <pgsql(at)jdrake(dot)com> writes:
> I could not find such a thing using ctags, nor TextPGetDatum(). I looked
> at PG_RETURN_TEXT_P and it just uses PG_RETURN_POINTER, which in turn uses
> PointerGetDatum. If there is such a thing, it is well camoflaged...

AFAIR, the reason there's no TextPGetDatum (and ditto for lots of other
datatypes) is lack of obvious usefulness. A function dealing with a
"text *" doesn't normally have reason to convert that to a Datum until
it returns --- and at that point PG_RETURN_TEXT_P is the thing to use.
Do you have a counterexample, or does this just suggest that the regexp
function patch needs some refactoring?

regards, tom lane


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-22 05:47:44
Message-ID: Pine.BSO.4.64.0703212241550.19070@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Thu, 22 Mar 2007, Tom Lane wrote:

> AFAIR, the reason there's no TextPGetDatum (and ditto for lots of other
> datatypes) is lack of obvious usefulness. A function dealing with a
> "text *" doesn't normally have reason to convert that to a Datum until
> it returns --- and at that point PG_RETURN_TEXT_P is the thing to use.
> Do you have a counterexample, or does this just suggest that the regexp
> function patch needs some refactoring?

If you are asking why I have reason to convert text * to a Datum in cases
other than PG_RETURN_TEXT_P, it is used for calling text_substr functions
using DirectFunctionCallN. BTW, this usage of text_substr using
PointerGetDatum was copied from the pre-existing textregexsubstr function.

--
Malek's Law:
Any simple idea will be worded in the most complicated way.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-27 00:52:47
Message-ID: 200703270052.l2R0qlx14787@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> On Thu, 22 Mar 2007, Tom Lane wrote:
>
> > AFAIR, the reason there's no TextPGetDatum (and ditto for lots of other
> > datatypes) is lack of obvious usefulness. A function dealing with a
> > "text *" doesn't normally have reason to convert that to a Datum until
> > it returns --- and at that point PG_RETURN_TEXT_P is the thing to use.
> > Do you have a counterexample, or does this just suggest that the regexp
> > function patch needs some refactoring?
>
> If you are asking why I have reason to convert text * to a Datum in cases
> other than PG_RETURN_TEXT_P, it is used for calling text_substr functions
> using DirectFunctionCallN. BTW, this usage of text_substr using
> PointerGetDatum was copied from the pre-existing textregexsubstr function.

Is there a follup patch based on this discussion?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-27 02:03:01
Message-ID: 9846.1174960981@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Jeremy Drake wrote:
>> On Thu, 22 Mar 2007, Tom Lane wrote:
>>> AFAIR, the reason there's no TextPGetDatum (and ditto for lots of other
>>> datatypes) is lack of obvious usefulness.

>> If you are asking why I have reason to convert text * to a Datum in cases
>> other than PG_RETURN_TEXT_P, it is used for calling text_substr functions
>> using DirectFunctionCallN. BTW, this usage of text_substr using
>> PointerGetDatum was copied from the pre-existing textregexsubstr function.

> Is there a follup patch based on this discussion?

Not at the moment. I suppose someone could run around and replace
PointerGetDatum by (exactly-equivalent) TextPGetDatum etc, but it seems
like mostly make-work. I definitely don't want to spend time on such
a project for 8.3.

Or were you speaking to the question of whether to adjust the regexp
patch to conform more nearly to the coding practices found elsewhere?
I agree with that, but I thought there was already a submitted patch
for it.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Gregory Stark <stark(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-27 02:09:00
Message-ID: 200703270209.l2R290l09647@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Jeremy Drake wrote:
> >> On Thu, 22 Mar 2007, Tom Lane wrote:
> >>> AFAIR, the reason there's no TextPGetDatum (and ditto for lots of other
> >>> datatypes) is lack of obvious usefulness.
>
> >> If you are asking why I have reason to convert text * to a Datum in cases
> >> other than PG_RETURN_TEXT_P, it is used for calling text_substr functions
> >> using DirectFunctionCallN. BTW, this usage of text_substr using
> >> PointerGetDatum was copied from the pre-existing textregexsubstr function.
>
> > Is there a follup patch based on this discussion?
>
> Not at the moment. I suppose someone could run around and replace
> PointerGetDatum by (exactly-equivalent) TextPGetDatum etc, but it seems
> like mostly make-work. I definitely don't want to spend time on such
> a project for 8.3.
>
> Or were you speaking to the question of whether to adjust the regexp
> patch to conform more nearly to the coding practices found elsewhere?
> I agree with that, but I thought there was already a submitted patch
> for it.

Yes, regex patch adjustment, and I have not seen a patch which makes
such adjustments.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-27 02:18:38
Message-ID: Pine.BSO.4.64.0703261916010.685@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

On Mon, 26 Mar 2007, Bruce Momjian wrote:

> Tom Lane wrote:
> > Or were you speaking to the question of whether to adjust the regexp
> > patch to conform more nearly to the coding practices found elsewhere?
> > I agree with that, but I thought there was already a submitted patch
> > for it.
>
> Yes, regex patch adjustment, and I have not seen a patch which makes
> such adjustments.

http://archives.postgresql.org/pgsql-patches/2007-03/msg00285.php

--
Pope Goestheveezl was the shortest reigning pope in the history of the
Church, reigning for two hours and six minutes on 1 April 1866. The
white smoke had hardly faded into the blue of the Vatican skies before
it dawned on the assembled multitudes in St. Peter's Square that his
name had hilarious possibilities. The crowds fell about, helpless with
laughter, singing

Half a pound of tuppenny rice
Half a pound of treacle
That's the way the chimney smokes
Pope Goestheveezl

The square was finally cleared by armed carabineri with tears of
laughter streaming down their faces. The event set a record for
hilarious civic functions, smashing the previous record set when Baron
Hans Neizant B"ompzidaize was elected Landburgher of K"oln in 1653.
-- Mike Harding, "The Armchair Anarchist's Almanac"


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Gregory Stark <stark(at)enterprisedb(dot)com>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-27 13:56:04
Message-ID: 200703271356.l2RDu4T04370@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> On Mon, 26 Mar 2007, Bruce Momjian wrote:
>
> > Tom Lane wrote:
> > > Or were you speaking to the question of whether to adjust the regexp
> > > patch to conform more nearly to the coding practices found elsewhere?
> > > I agree with that, but I thought there was already a submitted patch
> > > for it.
> >
> > Yes, regex patch adjustment, and I have not seen a patch which makes
> > such adjustments.
>
> http://archives.postgresql.org/pgsql-patches/2007-03/msg00285.php

Ah, yes, I still have that in my mailbox, but didn't think it was
related because there was discussion _after_ the patch was posted. Will
add the patch to the queue now. Thanks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-27 13:56:12
Message-ID: 200703271356.l2RDuCO04629@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Jeremy Drake wrote:
> On Thu, 22 Mar 2007, Tom Lane wrote:
>
> > I'd vote for making this new code look like the rest of it, to wit
> > hardwire the values.
>
> Attached please find a patch which does this.
>
>
>
> --
> Although written many years ago, Lady Chatterley's Lover has just been
> reissued by the Grove Press, and this pictorial account of the
> day-to-day life of an English gamekeeper is full of considerable
> interest to outdoor minded readers, as it contains many passages on
> pheasant-raising, the apprehending of poachers, ways to control vermin,
> and other chores and duties of the professional gamekeeper.
> Unfortunately, one is obliged to wade through many pages of extraneous
> material in order to discover and savour those sidelights on the
> management of a midland shooting estate, and in this reviewer's opinion
> the book cannot take the place of J. R. Miller's "Practical
> Gamekeeping."
> -- Ed Zern, "Field and Stream" (Nov. 1959)
Content-Description:

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Jeremy Drake <pgsql(at)jdrake(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-28 19:00:22
Message-ID: Pine.BSO.4.64.0703281154500.685@resin.csoft.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

> Jeremy Drake wrote:
> > On Thu, 22 Mar 2007, Tom Lane wrote:
> >
> > > I'd vote for making this new code look like the rest of it, to wit
> > > hardwire the values.
> >
> > Attached please find a patch which does this.

I just realized that the last patch removed all usage of fcinfo in the
setup_regexp_matches function, so this version of the patch also removes
it as a parameter to that function.

--
Think of it! With VLSI we can pack 100 ENIACs in 1 sq. cm.!

Attachment Content-Type Size
regexp-funcs-hardcode-texttype-2.patch text/plain 4.7 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Neil Conway <neilc(at)samurai(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-28 21:09:37
Message-ID: 200703282109.l2SL9bH17362@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

Jeremy Drake wrote:
> > Jeremy Drake wrote:
> > > On Thu, 22 Mar 2007, Tom Lane wrote:
> > >
> > > > I'd vote for making this new code look like the rest of it, to wit
> > > > hardwire the values.
> > >
> > > Attached please find a patch which does this.
>
> I just realized that the last patch removed all usage of fcinfo in the
> setup_regexp_matches function, so this version of the patch also removes
> it as a parameter to that function.
>
> --
> Think of it! With VLSI we can pack 100 ENIACs in 1 sq. cm.!
Content-Description:

[ Attachment, skipping... ]

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Neil Conway <neilc(at)samurai(dot)com>
To: Jeremy Drake <pgsql(at)jdrake(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, David Fetter <david(at)fetter(dot)org>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Mark Dilger <pgsql(at)markdilger(dot)com>
Subject: Re: patch adding new regexp functions
Date: 2007-03-28 23:00:32
Message-ID: 460AF390.3060303@samurai.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers pgsql-patches

Jeremy Drake wrote:
> I just realized that the last patch removed all usage of fcinfo in the
> setup_regexp_matches function, so this version of the patch also removes
> it as a parameter to that function.
Applied, thanks.

-Neil