Re: [v9.2] make_greater_string() does not return a string in some cases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [v9.2] make_greater_string() does not return a string in some cases
Date: 2011-09-22 15:46:43
Message-ID: 21348.1316706403@sss.pgh.pa.us
Lists: pgsql-bugs pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> One thing I was thinking about is that it would be useful to have some
> metric for judging how well any given algorithm that we might pick
> here actually works.

Well, the metric that we were indirectly using earlier was the
number of characters in a given locale for which the algorithm
fails to find a greater one (excluding whichever character is "last",
I guess, or you could just recognize there's always at least one).
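
As a rough illustration of that metric (a sketch only, not the PostgreSQL
code): assume the locale's sort order is available as a plain list of
characters, and that an incrementer is any function returning a candidate
character or None. The names failure_rate and next_code_point, and the
six-character "locale", are made up for the example.

```python
def failure_rate(sort_order, incrementer):
    """Fraction of characters for which `incrementer` finds no greater one.

    sort_order  -- characters listed from lowest to highest in the locale
    incrementer -- callable returning a candidate character, or None
    The last character is excluded, since nothing can sort above it.
    """
    rank = {ch: i for i, ch in enumerate(sort_order)}
    candidates = sort_order[:-1]
    failures = []
    for ch in candidates:
        out = incrementer(ch)
        # A result outside the locale, or one that does not sort higher,
        # counts as a failure.
        if out is None or out not in rank or rank[out] <= rank[ch]:
            failures.append(ch)
    return len(failures) / len(candidates), failures


def next_code_point(ch):
    """Hypothetical incrementer: just add one to the code point."""
    return chr(ord(ch) + 1)


# Toy locale in which lower case sorts just before upper case.
order = list("aAbBcC")
rate, failed = failure_rate(order, next_code_point)
print(rate, failed)
```

Here only 'c' fails, because 'd' is not part of the toy locale, and 'C' is
left out of the count as the last character.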

> For example, if we were to try all possible
> three character strings in some encoding and run make_greater_string()
> on each one of them, we could then measure the failure percentage. Or
> if that's too many cases to crank through then we could limit it some
> way -

Even in UTF8 there are only about 1.1 million possible code points (and
far fewer assigned ones), so for test purposes anyway it seems feasible
to crank through them all. Also, in many cases you could probably figure
it out by analysis instead of brute-force testing every case.
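
A minimal sketch of both approaches, under stated assumptions: the
incrementer below is a toy that only bumps the final byte of a single UTF-8
character until the result decodes again. It is not make_greater_string(),
and all function names are invented for the example. For this toy, analysis
says the failures are exactly the characters whose last byte is 0x7F or
0xBF, and the brute-force count can be checked against that.

```python
def increment_last_byte(ch):
    """Toy incrementer: bump the last UTF-8 byte until it decodes again."""
    raw = bytearray(ch.encode("utf-8"))
    while raw[-1] < 0xFF:
        raw[-1] += 1
        try:
            out = raw.decode("utf-8")
        except UnicodeDecodeError:
            continue  # not valid UTF-8, try the next byte value
        if len(out) == 1:
            return out
    return None  # ran out of byte values: failure


def scalar_values(start=0, stop=0x110000):
    """Code points in [start, stop), skipping the surrogate range."""
    return (cp for cp in range(start, stop) if not 0xD800 <= cp <= 0xDFFF)


def brute_force_failures(code_points):
    """Run the toy incrementer on every character and count failures."""
    return sum(1 for cp in code_points
               if increment_last_byte(chr(cp)) is None)


def predicted_failures(code_points):
    """Analysis for the toy: it fails iff the last byte is 0x7F or 0xBF."""
    return sum(1 for cp in code_points
               if chr(cp).encode("utf-8")[-1] in (0x7F, 0xBF))


# Small range for a quick check; scalar_values() with no arguments
# walks all 1,112,064 Unicode scalar values.
brute = brute_force_failures(scalar_values(0, 0x3000))
predicted = predicted_failures(scalar_values(0, 0x3000))
print(brute, predicted)
```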

A more reasonable objection might be that a whole lot of those code
points are things nobody cares about, and so we need to weight the
results somehow by the actual popularity of the character. Not sure
how to take that into account.
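
One possible way to weight the results, offered only as an assumption:
take character frequencies from some sample text and compute the share of
character occurrences, rather than of distinct characters, that hit a
failure. The failure test below is a stand-in (last UTF-8 byte is 0x7F or
0xBF), and toy_fails and failure_rates are hypothetical names.

```python
from collections import Counter


def toy_fails(ch):
    """Stand-in failure test (an assumption, not the real algorithm)."""
    return ch.encode("utf-8")[-1] in (0x7F, 0xBF)


def failure_rates(sample_text, fails=toy_fails):
    """Return (unweighted, weighted) failure rates for a text sample.

    unweighted -- failing distinct characters / distinct characters
    weighted   -- failing occurrences / all occurrences
    """
    freq = Counter(sample_text)
    total = sum(freq.values())
    unweighted = sum(1 for ch in freq if fails(ch)) / len(freq)
    weighted = sum(n for ch, n in freq.items() if fails(ch)) / total
    return unweighted, weighted


# 'a' three times, 'b' once, and U+00BF once (its UTF-8 form ends in 0xBF).
unweighted, weighted = failure_rates("aaab\u00bf")
print(unweighted, weighted)
```

In this sample one of three distinct characters fails, but it accounts for
only one of five occurrences, so the weighted rate is lower. How
representative any given sample text is remains an open question.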

Another issue here is that we need to consider not just whether we find
a greater character, but "how much greater" it is. This would apply to
my suggestion of incrementing the top byte without considering
lower-order bytes --- we'd be skipping quite a lot of code space for
each increment, and it's conceivable that that would be quite hurtful in
some cases. Not sure how to account for that either. An extreme
example here is an "incrementer" that just immediately returns the last
character in the sort order for any lesser input.
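
To make "how much greater" concrete, one could count how many characters
of the sort order are skipped between the input and the result. The sketch
below assumes code point order over the two-byte UTF-8 range and compares
three made-up incrementers: an exact successor, a lead-byte bump that
leaves the lower byte alone, and the extreme "jump to last" case. None of
these names come from the PostgreSQL source.

```python
def mean_skipped(order, incrementer):
    """Average number of characters skipped, over successful increments."""
    rank = {ch: i for i, ch in enumerate(order)}
    gaps = []
    for ch in order[:-1]:  # the last character has nothing above it
        out = incrementer(ch)
        if out in rank and rank[out] > rank[ch]:
            gaps.append(rank[out] - rank[ch] - 1)
    return sum(gaps) / len(gaps) if gaps else None


def successor(ch):
    """Ideal case: the very next code point."""
    return chr(ord(ch) + 1)


def bump_lead_byte(ch):
    """Increment only the first UTF-8 byte, keeping the rest unchanged."""
    raw = bytearray(ch.encode("utf-8"))
    if raw[0] == 0xFF:
        return None
    raw[0] += 1
    try:
        out = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return out if len(out) == 1 else None


# All characters whose UTF-8 form is two bytes long, in code point order.
order = [chr(cp) for cp in range(0x80, 0x800)]


def jump_to_last(ch):
    """Extreme case: always return the last character in the order."""
    return order[-1]


print(mean_skipped(order, successor))       # exact successor
print(mean_skipped(order, bump_lead_byte))  # lead-byte bump
print(mean_skipped(order, jump_to_last))    # extreme incrementer
```

The lead-byte bump never fails to find something greater for lead bytes
below 0xDF, yet it skips 63 characters each time, and the extreme
incrementer skips hundreds on average while having a perfect score on the
"finds a greater character" metric.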

regards, tom lane
