Re: patch adding new regexp functions

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jeremy Drake <pgsql(at)jdrake(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>, Neil Conway <neilc(at)samurai(dot)com>, David Fetter <david(at)fetter(dot)org>
Subject: Re: patch adding new regexp functions
Date: 2007-02-15 15:56:25
Message-ID: 20070215155625.GM4682@alvh.no-ip.org
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > so that you would have the position for each match, automatically. Is
> > this information available in the regex code?
>
> Certainly, that's where we got the text snippets from to begin with.
> However, I'm not sure that this is important enough to justify a special
> type --- for one thing, since we don't have arrays of composites, that
> would foreclose responding to Peter's concern that SETOF is the wrong
> thing.

My point is that if you want to have the order in which the matches were
found, you can do that easily by looking at the positions; no need to
create an ordered array. Which does respond to Peter's concern, since
the point was to keep the ordering of matches, which an array does; but
if we provide the positions, the SETOF way does as well.

On the other hand, I don't think it's impossible to have matches that
start earlier than others in the string, but are actually found later
(say, because they are a parenthesized expression that ends later). So
giving the starting positions allows one to know where they are located,
rather than where they were reported. (I don't really know if the
matches are sorted before reporting though.)
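
To make the point about positions concrete, here is a minimal sketch. It uses Python's re module purely as a stand-in, not the PostgreSQL regex code or the functions in the patch, and the helper name matches_with_positions is made up for illustration. It shows that if each match carries its start position, the original order can be recovered from an unordered set of rows, and that a parenthesized subexpression can start earlier than another yet end later.

```python
import re


def matches_with_positions(pattern, text):
    """Return (start, matched_text) pairs, one per non-overlapping match.

    Hypothetical helper: it mimics a set-returning function that reports
    the position alongside each matched snippet.
    """
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, text)]


rows = matches_with_positions(r"[a-z]+", "foo bar baz")

# Simulate a result set whose row order is not guaranteed.
scrambled = [rows[2], rows[0], rows[1]]

# Sorting on the position column restores the order in the string.
recovered = sorted(scrambled)

# Nested parentheses: the outer group starts earlier and ends later
# than the inner one.
m = re.search(r"(a(b)c)", "xabc")
outer_span = m.span(1)  # (start, end) of "abc"
inner_span = m.span(2)  # (start, end) of "b"
```

With the position available, the caller decides whether to order by where a match sits in the string or to ignore ordering altogether; nothing depends on the order in which the rows happen to be returned.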

> If you look at the Perl and Tcl APIs for regexes, they return
> just the strings, not the numerical positions; and I've not heard anyone
> complaining about that.

I know, but that may be just because it would be too much extra
complexity for them (in terms of user API) to return the positions
along with the text. I know I'd be fairly annoyed if =~ in Perl returned an
array of hashes { text => 'foo', position => 42 } instead of an array of
text. We don't have that problem.

In fact, I would claim that it's much easier to deal with a SETOF function
than it is to deal with text[].

Regarding the "nobody complains" argument, I don't find that
particularly compelling; witness how people get used to working around
limitations in MySQL ... ;-)

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
