Re: REGEXP_MATCHES() strange behavior with '^' and '$' pattern

From: Jeevan Chalke <jeevan(dot)chalke(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: REGEXP_MATCHES() strange behavior with '^' and '$' pattern
Date: 2013-07-31 12:41:54
Message-ID: CAM2+6=WtoQkTv_vyTNrDw-PC=XheFz8KK1Ng6udS+nK8JAfMCg@mail.gmail.com
Lists: pgsql-hackers

Oops, I forgot the patch.

It is attached now.

On Wed, Jul 31, 2013 at 6:03 PM, Jeevan Chalke <
jeevan(dot)chalke(at)enterprisedb(dot)com> wrote:

> Hi,
>
> While playing with regular expressions, I found some strange behavior in
> the regexp_matches() function.
>
> Consider the following SQL query and its output:
>
> postgres=# select regexp_matches('1' || chr(10) || '2' || chr(10) || '3'
> || chr(10) || '4', '^', 'mg');
> regexp_matches
> ----------------
> {""}
> {""}
> {""}
> {""}
> {""}
> {""}
> {""}
> (7 rows)
>
> It is supposed to return 4 rows, not 7. Similar behavior occurs with the
> pattern '$'.
>
> It seems that these start and end anchors are not being matched
> correctly; or rather, they are matching twice.
>
> To find the root cause, I put elog(INFO,..) calls into the
> setup_regexp_matches() function, where we copy matches into the struct,
> and found the following values.
>
>
> postgres=# select regexp_matches('1' || chr(10) || '2' || chr(10) || '3'
> || chr(10) || '4', '^', 'mg');
> INFO: start_search: 0 rm_so: 0 rm_eo: 0
> INFO: updated start_search: 1
> INFO: start_search: 1 rm_so: 2 rm_eo: 2
> INFO: updated start_search: 2
> INFO: start_search: 2 rm_so: 2 rm_eo: 2
> INFO: updated start_search: 3
> INFO: start_search: 3 rm_so: 4 rm_eo: 4
> INFO: updated start_search: 4
> INFO: start_search: 4 rm_so: 4 rm_eo: 4
> INFO: updated start_search: 5
> INFO: start_search: 5 rm_so: 6 rm_eo: 6
> INFO: updated start_search: 6
> INFO: start_search: 6 rm_so: 6 rm_eo: 6
> INFO: updated start_search: 7
>
> After the second pass, the updated start_search should clearly be 3, not
> 2, because the last match was at offset 2 and was of zero length
> (rm_so = rm_eo).
>
> I have modified that logic to resemble that of the replace_text_regexp()
> function, since regexp_replace works well.
>
> Attached is a patch with a test case. Please have a look and let me know
> if I have assumed something wrong.
>
> Thanks
>
> --
> Jeevan B Chalke
>
>
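
The trace quoted above can be reproduced outside the server with a small
simulation. This is only a sketch, not the PostgreSQL source and not the
attached patch. It uses Python's re module, assuming that '^' under
re.MULTILINE behaves like the 'm' flag for this input. The helper names
zero_length_matches, advance_as_traced and advance_past_empty_match are
hypothetical. The first advance rule is inferred from the INFO output; the
second moves one character past a zero-length match.

```python
import re


def zero_length_matches(text, pattern, advance):
    """Return (start_search, rm_so, rm_eo) for each match found by repeatedly
    searching from start_search and moving it with the given advance rule."""
    regex = re.compile(pattern, re.MULTILINE)
    trace = []
    start_search = 0
    while start_search <= len(text):
        m = regex.search(text, start_search)
        if m is None:
            break
        trace.append((start_search, m.start(), m.end()))
        start_search = advance(start_search, m.start(), m.end())
    return trace


def advance_as_traced(start_search, rm_so, rm_eo):
    # Assumption inferred from the INFO lines: jump to the match end if it
    # lies ahead of start_search, otherwise step forward by one character.
    return rm_eo if start_search < rm_eo else start_search + 1


def advance_past_empty_match(start_search, rm_so, rm_eo):
    # Resume at the match end, but skip one extra character after a
    # zero-length match so the same empty match is not found again.
    return rm_eo + 1 if rm_so == rm_eo else rm_eo


text = "1\n2\n3\n4"
traced = zero_length_matches(text, "^", advance_as_traced)
fixed = zero_length_matches(text, "^", advance_past_empty_match)

print(len(traced), [so for _, so, _ in traced])
print(len(fixed), [so for _, so, _ in fixed])
```

With the rule inferred from the trace, the simulation finds 7 matches (at
offsets 0, 2, 2, 4, 4, 6, 6), the same start_search/rm_so/rm_eo values as
the INFO lines. With the second rule it finds 4 (at offsets 0, 2, 4, 6).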

--
Jeevan B Chalke

Attachment: regexp_matches_bug_with_zero_match_string.patch
Content-Type: application/octet-stream
Size: 2.2 KB
