Re: multibyte-character aware support for function "downcase_truncate_identifier()"

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Rajanikant Chirmade <rajanikant(dot)chirmade(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: multibyte-character aware support for function "downcase_truncate_identifier()"
Date: 2010-11-23 17:12:49
Message-ID: 11120.1290532369@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, that's why there's been no movement on this since 2004 :-(. The
>> amount of work needed for a better solution seems far out of proportion
>> to the benefits.

> We could extend the existing logic to handle multi-byte characters
> though, couldn't we? It's not going to fix all the problems but at
> least it'll do something sane.

Not easily, cheaply, or portably. The closest you could get in that
line would be to use towlower(), which doesn't exist everywhere
(though I grant that most platforms probably have it by now). The much, much
bigger problem, though, is that we don't know what character representation
towlower() deals in. We recently kluged the regex code to assume that
the wchar_t representation for UTF8 locales is the standardized Unicode
code point. I haven't heard of that breaking, but 9.0 hasn't been out
that long. In other multibyte encodings we have no idea how to use that
function, short of invoking mbstowcs/wcstombs or a local equivalent, which
is expensive and doesn't readily allow a short-circuit for ASCII.
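To make that cost concrete, here is a minimal C sketch of the towlower() route, assuming a POSIX-style libc and that the identifier is already in the encoding of the current LC_CTYPE locale. The name `downcase_multibyte` is hypothetical; this is not PostgreSQL's downcase_truncate_identifier(), and identifier-length truncation is left out. It shows the two costs mentioned above: the ASCII short-circuit needs a full up-front scan, and any single non-ASCII byte forces an mbstowcs/towlower/wcstombs round trip with extra allocations. Sizing buffers by passing NULL as the destination is a POSIX extension, and towlower() here folds one wchar_t at a time, whatever that unit happens to mean on the platform.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper (not PostgreSQL code): downcase a NUL-terminated
 * string assumed to be in the current LC_CTYPE encoding.  Returns a
 * malloc'd result, or NULL if allocation or conversion fails.
 */
static char *
downcase_multibyte(const char *src)
{
    size_t len = strlen(src);
    size_t i;

    /* ASCII short-circuit: scan the whole string first, because one
     * high-bit byte anywhere forces the slow path for all of it. */
    for (i = 0; i < len; i++)
        if ((unsigned char) src[i] >= 0x80)
            break;

    if (i == len)
    {
        char *out = malloc(len + 1);

        if (out == NULL)
            return NULL;
        for (i = 0; i <= len; i++)      /* <= also copies the final NUL */
            out[i] = (src[i] >= 'A' && src[i] <= 'Z')
                ? (char) (src[i] + ('a' - 'A')) : src[i];
        return out;
    }

    /* Slow path: multibyte -> wchar_t -> towlower() -> multibyte.
     * NULL destination for sizing is a POSIX extension (assumption). */
    size_t nwc = mbstowcs(NULL, src, 0);
    if (nwc == (size_t) -1)
        return NULL;                    /* invalid sequence for this locale */

    wchar_t *wbuf = malloc((nwc + 1) * sizeof(wchar_t));
    if (wbuf == NULL)
        return NULL;
    size_t converted = mbstowcs(wbuf, src, nwc + 1);
    assert(converted == nwc);           /* same input, same count */
    (void) converted;

    for (i = 0; i < nwc; i++)
        wbuf[i] = (wchar_t) towlower((wint_t) wbuf[i]);

    size_t nbytes = wcstombs(NULL, wbuf, 0);
    char *out = (nbytes == (size_t) -1) ? NULL : malloc(nbytes + 1);
    if (out != NULL)
        wcstombs(out, wbuf, nbytes + 1);
    free(wbuf);
    return out;
}
```

Note that the slow path's output depends entirely on which locale is active when it runs, which is the issue raised in the next paragraph.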

And, after you've hacked your way through all that, you still end up
with case-folding behavior that depends on the prevailing locale.
Which is dangerous for the previously cited reasons, and arguably not
spec-compliant.

regards, tom lane
