Re: regexp_split_to_array hangs backend

Lists: pgsql-hackers
From: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: regexp_split_to_array hangs backend
Date: 2007-08-10 11:26:40
Message-ID: 162867790708100426p73df8f32n47d08ff17822133e@mail.gmail.com

Hello,

I found a small bug:

regexp_split_to_array('123456','1');
regexp_split_to_array('123456','6');
regexp_split_to_array('123456','.');

These parameters hang the backend.

The following patch corrects it.

Regards
Pavel Stehule

./regexp.c
*** ./regexp.c.orig 2007-08-10 14:17:15.000000000 +0200
--- ./regexp.c 2007-08-10 14:19:36.000000000 +0200
***************
*** 1048,1053 ****
--- 1048,1056 ----
{
int length = splitctx->match.rm_so - startpos + 1;

+ /* set the offset to the end of this match for next time */
+ splitctx->offset = pmatch->rm_eo;
+
/*
* If we are trying to match at the beginning of the string and
* we got a zero-length match, or if we just matched where we
***************
*** 1063,1070 ****
Int32GetDatum(startpos),
Int32GetDatum(length));

- /* set the offset to the end of this match for next time */
- splitctx->offset = pmatch->rm_eo;

return result;
}
--- 1066,1071 ----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: regexp_split_to_array hangs backend
Date: 2007-08-10 22:52:36
Message-ID: 15028.1186786356@sss.pgh.pa.us
Lists: pgsql-hackers

"Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> writes:
> I found a small bug:

> regexp_split_to_array('123456','1');
> regexp_split_to_array('123456','6');
> regexp_split_to_array('123456','.');

> These parameters hang the backend.

This code's got more problems than that :-(

The one that's bothering me right now is that regexp_match() and
regexp_split() cache a compiled regex on first entry to the function,
and then blithely assume it will still be there on repeated calls.

I think probably the best thing to do is do all the matching on the
first call, and have the saved state include an array of character
positions of matches; then repeat calls to the SRF just iterate through
the array.
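
A minimal standalone sketch of that approach, for illustration only. It is not the backend code: it assumes POSIX <regex.h> instead of PostgreSQL's own regex engine, and the names SplitState, split_init, split_next and split_free are hypothetical. As a simplification it skips every zero-length match, which may differ from what the real function should do. The point is that the compiled regex lives only inside the first call, and later calls just walk the saved match positions, so they always make progress.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical saved state: all match positions are found up front. */
typedef struct
{
    const char *input;      /* caller must keep this string alive */
    size_t      input_len;
    regmatch_t *matches;    /* absolute byte offsets of each match */
    size_t      nmatches;
    size_t      next;       /* index of the next match to consume */
    size_t      start;      /* where the next returned piece begins */
    int         done;       /* set once the trailing piece is returned */
} SplitState;

/* "First call": compile, collect every match, free the regex. 0 on success. */
static int
split_init(SplitState *st, const char *input, const char *pattern)
{
    regex_t     re;
    size_t      cap = 8;
    size_t      pos = 0;

    assert(input != NULL && pattern != NULL);
    memset(st, 0, sizeof(*st));
    st->input = input;
    st->input_len = strlen(input);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    st->matches = malloc(cap * sizeof(regmatch_t));
    if (st->matches == NULL)
    {
        regfree(&re);
        return -1;
    }

    while (pos <= st->input_len)
    {
        regmatch_t  m;

        /* past the start of the string, '^' must not match */
        if (regexec(&re, input + pos, 1, &m, pos > 0 ? REG_NOTBOL : 0) != 0)
            break;
        m.rm_so += (regoff_t) pos;      /* make offsets absolute */
        m.rm_eo += (regoff_t) pos;

        if (m.rm_eo == m.rm_so)
        {
            /* zero-length match: skip it, but always advance */
            pos = (size_t) m.rm_eo + 1;
            continue;
        }
        if (st->nmatches == cap)
        {
            regmatch_t *tmp = realloc(st->matches, 2 * cap * sizeof(*tmp));

            if (tmp == NULL)
            {
                free(st->matches);
                st->matches = NULL;
                regfree(&re);
                return -1;
            }
            st->matches = tmp;
            cap *= 2;
        }
        st->matches[st->nmatches++] = m;
        pos = (size_t) m.rm_eo;         /* resume after this match */
    }
    regfree(&re);                       /* nothing compiled is kept */
    return 0;
}

/* "Repeat call": copy the next piece into buf. 1 if produced, 0 if exhausted. */
static int
split_next(SplitState *st, char *buf, size_t bufsize)
{
    size_t      end;
    size_t      n;

    assert(bufsize > 0);
    if (st->done)
        return 0;
    if (st->next < st->nmatches)
        end = (size_t) st->matches[st->next].rm_so;
    else
    {
        end = st->input_len;            /* trailing piece after last match */
        st->done = 1;
    }
    n = end - st->start;
    if (n >= bufsize)
        n = bufsize - 1;                /* truncate rather than overflow */
    memcpy(buf, st->input + st->start, n);
    buf[n] = '\0';
    if (!st->done)
        st->start = (size_t) st->matches[st->next++].rm_eo;
    return 1;
}

static void
split_free(SplitState *st)
{
    free(st->matches);
    st->matches = NULL;
}
```

With this sketch, splitting '123456' on '1' yields an empty piece followed by '23456', splitting on '6' yields '12345' followed by an empty piece, and splitting on '.' yields seven empty pieces; in every case split_next eventually returns 0.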

It seems a bit short of comments too. Working on it now.

regards, tom lane