BUG #3645: regular expression back references seem broken

Lists: pgsql-bugs
From: "Eric Haszlakiewicz" <erh+pgsql(at)swapsimple(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #3645: regular expression back references seem broken
Date: 2007-10-01 00:43:49
Message-ID: 200710010043.l910hn3H021900@wwwmaster.postgresql.org


The following bug has been logged online:

Bug reference: 3645
Logged by: Eric Haszlakiewicz
Email address: erh+pgsql(at)swapsimple(dot)com
PostgreSQL version: 8.2.5
Operating system: NetBSD
Description: regular expression back references seem broken
Details:

I was attempting to create a simple regular expression that uses back
references and I noticed some very odd behaviour. This regexp is supposed
to match a string where all the characters are the same:

^(.)\1*$

If I try it, it doesn't work. I would expect this to return false:

template1=# select 'xyz' ~ E'^(.)\\1*$';
?column?
----------
t
(1 row)

But adding some extra parens makes it return false, as expected:

template1=# select 'xyz' ~ E'^(.)(\\1)*$';
?column?
----------
f
(1 row)

As does changing the "." to an "x":

template1=# select 'xyz' ~ E'^(x)\\1*$';
?column?
----------
f
(1 row)

As does forcing it to be an extended regular expression:

template1=# select 'xyz' ~ E'(?e)^(.)\\1*$';
?column?
----------
f
(1 row)

The docs claim: "A single non-zero digit, not followed by another digit, is
always taken as a back reference." (The note at the end of 9.7.3.3)

It's relatively easy to work around the problem, but it certainly led to a
fair bit of head scratching while trying to debug some code. :)
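
For reference, the semantics the report expects can be sketched outside PostgreSQL. The snippet below uses Python's `re` module, which is a different engine from the Tcl-derived library PostgreSQL uses, so it only illustrates what the pattern is meant to accept; it does not reproduce the bug. The helper name `all_same_char` is hypothetical and introduced purely for illustration. It also checks the extra-parens variant from the report, which should agree with the original pattern in an engine that handles back references correctly.

```python
import re

# Pattern from the report: capture one character, then allow zero or more
# repeats of that same captured character, anchored at both ends.
SAME_CHAR = re.compile(r'^(.)\1*$')

# Variant from the report with the back reference wrapped in its own group.
SAME_CHAR_PARENS = re.compile(r'^(.)(\1)*$')


def all_same_char(s: str, pattern: re.Pattern = SAME_CHAR) -> bool:
    """Return True when the whole string matches the given pattern."""
    return pattern.fullmatch(s) is not None


if __name__ == "__main__":
    for sample in ["xyz", "xxx", "x", ""]:
        # Both patterns should give the same answer for every sample.
        print(repr(sample),
              all_same_char(sample),
              all_same_char(sample, SAME_CHAR_PARENS))
```

Under these assumptions 'xyz' is rejected by both patterns, which is the result the report expects from `'xyz' ~ E'^(.)\\1*$'`.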


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Eric Haszlakiewicz" <erh+pgsql(at)swapsimple(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3645: regular expression back references seem broken
Date: 2007-10-01 15:34:54
Message-ID: 7405.1191252894@sss.pgh.pa.us
Lists: pgsql-bugs

"Eric Haszlakiewicz" <erh+pgsql(at)swapsimple(dot)com> writes:
> I would expect this to return false:

> template1=# select 'xyz' ~ E'^(.)\\1*$';
> ?column?
> ----------
> t
> (1 row)

Seems to be a bug in the Tcl regexp library we use. It's already
reported upstream:
https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894

regards, tom lane


From: Eric Haszlakiewicz <erh+pgsql(at)swapsimple(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3645: regular expression back references seem broken
Date: 2007-10-02 08:07:05
Message-ID: 4701FC29.50609@swapsimple.com
Lists: pgsql-bugs

Tom Lane wrote:
> "Eric Haszlakiewicz" <erh+pgsql(at)swapsimple(dot)com> writes:
>> I would expect this to return false:
>
>> template1=# select 'xyz' ~ E'^(.)\\1*$';
>> ?column?
>> ----------
>> t
>> (1 row)
>
> Seems to be a bug in the Tcl regexp library we use. It's already
> reported upstream:
> https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894
>
> regards, tom lane

er.. it's been languishing there for over 2 years. That doesn't sound
very promising for getting it fixed. :(

eric


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Eric Haszlakiewicz <erh+pgsql(at)swapsimple(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3645: regular expression back references seem broken
Date: 2008-03-25 00:00:35
Message-ID: 200803250000.m2P00Z109617@momjian.us
Lists: pgsql-bugs


Added to TODO:

* Fix regular expression bug when using complex back-references

http://archives.postgresql.org/pgsql-bugs/2007-10/msg00000.php

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +