Re: [v9.2] make_greater_string() does not return a string in some cases

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [v9.2] make_greater_string() does not return a string in some cases
Date: 2011-09-23 13:11:59
Message-ID: CA+TgmoavQP28OY0QRkXSQiS131u2sEFc7e72yU3x=pjr0BPU=w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Fri, Sep 23, 2011 at 8:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Sep 22, 2011 at 10:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Anyway, I won't stand in the way of the patch as long as it's modified
>>> to limit the number of values considered for any one character position
>>> to something reasonably small.
>
>> I think that limit in both the old and new code is 1, except that the
>> new code does it more efficiently.
>
>> Am I confused?
>
> Yes, or else I am.  Consider a 4-byte UTF8 character at the end of the
> string.  The existing code increments the last byte up to 255 (rejecting
> everything past 0xBF), then gives up and truncates that character away.
> So the maximum number of tries for that character position is between 0
> and 127 depending on what the original character was (with at most 63 of
> the incremented values getting past the verifymbstr test).
>
> The proposed patch is going to iterate through all Unicode code points
> up to U+7FFFFF before giving up.  Since it's possible that we need to
> increment something further left to succeed at all, this doesn't seem
> like a good plan.

I think you're misreading the code. It does this:

while (len > 0)
{
boring stuff;
if (charincfunc(lastchar, charlen))
{
more boring stuff;
if (we made a greater string)
return it;
cleanup;
}
truncate away last character;
}

I don't see how that's ever going to try more than one character in
the same position.

What may be confusing you is that the old code has two loops: an outer
loop that tests whether we've made a greater string, and an inner loop
that tests whether we've made a validly encoded string at all. In the
new code, at least in the UTF-8 case, the inner loop is GONE
altogether. Instead of iterating until we construct a valid
character, we just use our mad UTF-8 skillz to assemble one, and
return it.

Or else I need to go drink a few cups of tea and look at this again.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Magnus Hagander 2011-09-23 13:39:46 Re: Re: [BUGS] BUG #6189: libpq: sslmode=require verifies server certificate if root.crt is present
Previous Message Lou Picciano 2011-09-23 13:01:37 Re: Re: [BUGS] BUG #6189: libpq: sslmode=require verifies server certificate if root.crt is present

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-09-23 13:29:01 Re: new createuser option for replication role
Previous Message Lou Picciano 2011-09-23 13:01:37 Re: Re: [BUGS] BUG #6189: libpq: sslmode=require verifies server certificate if root.crt is present