string_to_array() is confused by ambiguous field separator

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-bugs(at)postgreSQL(dot)org
Subject: string_to_array() is confused by ambiguous field separator
Date: 2006-10-06 21:12:10
Message-ID: 6008.1160169130@sss.pgh.pa.us
Lists: pgsql-bugs

Good:

regression=# select string_to_array('123xx456xx789', 'xx');
string_to_array
-----------------
{123,456,789}
(1 row)

Not so good:

regression=# select string_to_array('123xx456xxx789', 'xx');
ERROR: negative substring length not allowed

The proximate problem is that in the inner loop in text_position(),
if it finds a match but hasn't yet found matchnum of them, it advances
only one character instead of advancing over the whole match. This
means it can report overlapping successive matches, which leads to an
invalid subscript calculation in text_to_array(). I think the correct
approach is to ignore overlapping matches, so that the result in the
second case would be
{123,456,x789}
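
The overlap problem can be reproduced outside the backend. The following is a minimal Python sketch, not the backend's C code: find_matches, split_fields, and the skip_whole_match flag are hypothetical names that only mimic the behaviour described above, and the separator is assumed to be non-empty.

```python
def find_matches(text, sep, skip_whole_match):
    """Return the start offsets of sep in text, scanning left to right.

    skip_whole_match=False mimics the reported behaviour (advance one
    character after a hit, so successive matches may overlap).
    skip_whole_match=True advances past the whole match instead.
    Assumes sep is non-empty.
    """
    starts = []
    pos = 0
    while pos + len(sep) <= len(text):
        if text.startswith(sep, pos):
            starts.append(pos)
            pos += len(sep) if skip_whole_match else 1
        else:
            pos += 1
    return starts


def split_fields(text, sep, skip_whole_match=True):
    """Cut text into fields at the match offsets reported by find_matches."""
    fields = []
    field_start = 0
    for match_start in find_matches(text, sep, skip_whole_match):
        # An overlapping match starts before the previous separator ended,
        # which makes the computed field length negative.
        if match_start - field_start < 0:
            raise ValueError("negative substring length not allowed")
        fields.append(text[field_start:match_start])
        field_start = match_start + len(sep)
    fields.append(text[field_start:])
    return fields
```

With skip_whole_match=False the matches in '123xx456xxx789' are reported at offsets 3, 8 and 9; the match at 9 begins inside the separator that ended the second field, so the third field length comes out as -1. With skip_whole_match=True only offsets 3 and 8 are reported and the fields are 123, 456 and x789.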

There's another problem here, which is that the API of text_position()
is poorly chosen anyway: as defined, parsing a string of N fields
requires O(N^2) work. It'd be better to pass it a starting character
number for the search instead of a field number to find, and to break
out the setup step so that we don't have to repeat the conversion to
pg_wchar format for each field.
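
As a rough illustration of the proposed API shape, here is a minimal Python sketch under stated assumptions: SeparatorSearch, next_match, and split_with_offsets are hypothetical names, and the constructor merely stands in for the one-time setup step (the pg_wchar conversion in the real code). It is not the backend implementation.

```python
class SeparatorSearch:
    """Hypothetical search state: built once, then queried once per field."""

    def __init__(self, text, sep):
        if not sep:
            raise ValueError("separator must be non-empty")
        # Stand-in for the one-time setup (e.g. an encoding conversion)
        # that should not be repeated for every field.
        self.text = text
        self.sep = sep

    def next_match(self, start):
        """Return the offset of the first match at or after start, or -1."""
        return self.text.find(self.sep, start)


def split_with_offsets(text, sep):
    """Split text by resuming each search where the previous match ended."""
    state = SeparatorSearch(text, sep)
    fields = []
    start = 0
    while True:
        match_start = state.next_match(start)
        if match_start < 0:
            # No further separator: the remainder is the last field.
            fields.append(text[start:])
            return fields
        fields.append(text[start:match_start])
        # Resume after the whole match, so matches never overlap and
        # earlier fields are never rescanned.
        start = match_start + len(sep)
```

Because the caller passes a starting offset rather than a field number, each search picks up where the last one stopped instead of recounting matches from the beginning of the string, which is what makes the field-number interface quadratic in the number of fields.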

Any objections?

regards, tom lane
