Lists: | pgsql-hackers |
---|
From: | "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> |
---|---|
To: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | regexp_split_to_array hangs backend |
Date: | 2007-08-10 11:26:40 |
Message-ID: | 162867790708100426p73df8f32n47d08ff17822133e@mail.gmail.com |
Hello,
I found a small bug:
regexp_split_to_array('123456','1');
regexp_split_to_array('123456','6');
regexp_split_to_array('123456','.');
Each of these calls hangs the backend.
The following patch corrects it.
Regards
Pavel Stehule
./regexp.c
*** ./regexp.c.orig 2007-08-10 14:17:15.000000000 +0200
--- ./regexp.c 2007-08-10 14:19:36.000000000 +0200
***************
*** 1048,1053 ****
--- 1048,1056 ----
{
int length = splitctx->match.rm_so - startpos + 1;
+ /* set the offset to the end of this match for next time */
+ splitctx->offset = pmatch->rm_eo;
+
/*
* If we are trying to match at the beginning of the string and
* we got a zero-length match, or if we just matched where we
***************
*** 1063,1070 ****
Int32GetDatum(startpos),
Int32GetDatum(length));
- /* set the offset to the end of this match for next time */
- splitctx->offset = pmatch->rm_eo;
return result;
}
--- 1066,1071 ----
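The patch moves the offset update ahead of the "skip this match" check, so every pass through the loop makes forward progress. A minimal sketch of that idea follows. It is not the regexp.c code: find_sep() and count_pieces() are hypothetical names, a single separator character stands in for the regex search, and empty pieces are simply dropped for brevity, which is not necessarily what the real function does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a regex search: look for sep at or after
 * offset `from`.  Returns 1 and sets *so / *eo (match start / end,
 * measured from the start of s) on success, 0 if there is no match.
 */
static int
find_sep(const char *s, size_t from, char sep, size_t *so, size_t *eo)
{
    const char *p = strchr(s + from, sep);

    if (p == NULL)
        return 0;
    *so = (size_t) (p - s);
    *eo = *so + 1;
    return 1;
}

/* Count the non-empty pieces of s when split on sep (sep must not be NUL). */
size_t
count_pieces(const char *s, char sep)
{
    size_t len, offset = 0, pieces = 0;

    assert(s != NULL && sep != '\0');
    len = strlen(s);

    while (offset < len)
    {
        size_t start = offset, so, eo;

        if (!find_sep(s, offset, sep, &so, &eo))
        {
            pieces++;           /* trailing piece after the last match */
            break;
        }

        /* Advance first, so the skip path below cannot loop forever. */
        offset = eo;

        if (so == start)
            continue;           /* empty piece: skip it, offset already moved */

        pieces++;
    }
    return pieces;
}
```

With this ordering, count_pieces("123456", '1') and count_pieces("123456", '6') both terminate and return 1. If the `offset = eo` assignment were placed after the `continue`, a match at the current offset would be retried at the same position indefinitely.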
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: regexp_split_to_array hangs backend |
Date: | 2007-08-10 22:52:36 |
Message-ID: | 15028.1186786356@sss.pgh.pa.us |
"Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com> writes:
> I found small bug
> regexp_split_to_array('123456','1');
> regexp_split_to_array('123456','6');
> regexp_split_to_array('123456','.');
> these parameters hangs backend.
This code's got more problems than that :-(
The one that's bothering me right now is that regexp_match() and
regexp_split() cache a compiled regex on first entry to the function,
and then blithely assume it will still be there on repeated calls.
I think probably the best thing to do is do all the matching on the
first call, and have the saved state include an array of character
positions of matches; then repeat calls to the SRF just iterate through
the array.
It seems a bit short of comments too. Working on it now.
regards, tom lane
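The approach proposed above (do all the matching on the first call, save the match positions, then have later calls walk the array) could look roughly like the sketch below. It is an illustration only: it uses POSIX regcomp()/regexec() rather than the backend's own regex routines, plain malloc-style allocation rather than memory contexts, and MatchSpan, MatchState, collect_matches(), next_match() and free_matches() are hypothetical names.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

typedef struct { size_t so, eo; } MatchSpan;    /* one match: [so, eo) */

typedef struct
{
    MatchSpan *spans;       /* all matches, found up front */
    size_t     nspans;
    size_t     next;        /* index the next call will return */
} MatchState;

/* First call: run every match once.  Returns 0 on success, -1 on error. */
int
collect_matches(const char *pattern, const char *text, MatchState *st)
{
    regex_t    re;
    regmatch_t m;
    size_t     len, pos = 0, cap = 0;

    assert(pattern != NULL && text != NULL && st != NULL);
    len = strlen(text);
    st->spans = NULL;
    st->nspans = 0;
    st->next = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* REG_NOTBOL keeps '^' from matching in the middle of the text. */
    while (pos <= len &&
           regexec(&re, text + pos, 1, &m, pos > 0 ? REG_NOTBOL : 0) == 0)
    {
        if (st->nspans == cap)
        {
            size_t     ncap = cap ? cap * 2 : 8;
            MatchSpan *tmp = realloc(st->spans, ncap * sizeof *tmp);

            if (tmp == NULL)
            {
                free(st->spans);
                st->spans = NULL;
                st->nspans = 0;
                regfree(&re);
                return -1;
            }
            st->spans = tmp;
            cap = ncap;
        }
        st->spans[st->nspans].so = pos + (size_t) m.rm_so;
        st->spans[st->nspans].eo = pos + (size_t) m.rm_eo;
        st->nspans++;

        /* Always move forward; step one extra char past an empty match. */
        pos += (size_t) m.rm_eo + (m.rm_eo == m.rm_so ? 1 : 0);
    }

    regfree(&re);           /* the compiled regex is not needed afterwards */
    return 0;
}

/* Later calls: just walk the saved array; NULL means "no more matches". */
const MatchSpan *
next_match(MatchState *st)
{
    return st->next < st->nspans ? &st->spans[st->next++] : NULL;
}

void
free_matches(MatchState *st)
{
    free(st->spans);
    st->spans = NULL;
    st->nspans = st->next = 0;
}
```

Because the compiled regex is freed before collect_matches() returns, repeat calls never depend on it still being cached, and termination is decided once, in a single loop, instead of being spread across calls.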