Re: Database object names and libpq in UTF-8 locale on Windows

From: Sebastien FLAESCH <sf(at)4js(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Database object names and libpq in UTF-8 locale on Windows
Date: 2012-11-22 10:26:51
Message-ID: 50ADFDEB.4050103@4js.com
Lists: pgsql-hackers

Tom, Andrew,

We have the same issue in our product: supporting UTF-8 on Windows.

As you certainly know, the UTF-8 code page (65001) is not supported by MS
Windows when you set the locale with setlocale(). You cannot rely on standard
libc functions such as isalpha(), mbtowc(), mbstowcs(), wctomb(), wcstombs(),
strcoll(), which depend on the current locale.

You should start by centralizing all basic character-set related functions
(upper/lower, comparison, etc) in a library, to ease porting to Windows.

Then convert UTF-8 data to wide char and call wide char functions.

For example, to implement an uppercase() function:

1) Convert UTF-8 to Wide Char (algorithm can be easily found)
2) Use towupper()
3) Convert Wide Char result to UTF-8 (algorithm can be easily found)
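
The three steps above can be sketched roughly as follows. This is only a
minimal illustration, not code from PostgreSQL or from our product: the names
utf8_decode(), utf8_encode() and utf8_toupper() are made up for the example,
the decoder does not reject overlong sequences, and the conversion is done one
character at a time. What towupper() maps for non-ASCII characters still
depends on the C library and on the LC_CTYPE setting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: decode one UTF-8 sequence from a NUL-terminated
 * string. Returns the number of bytes consumed, or 0 if malformed.
 * Overlong encodings are not rejected in this sketch. */
static size_t utf8_decode(const unsigned char *s, unsigned long *cp)
{
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x0F) << 12) |
              ((unsigned long)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    }
    if ((s[0] & 0xF8) == 0xF0 && (s[1] & 0xC0) == 0x80 &&
        (s[2] & 0xC0) == 0x80 && (s[3] & 0xC0) == 0x80) {
        *cp = ((unsigned long)(s[0] & 0x07) << 18) |
              ((unsigned long)(s[1] & 0x3F) << 12) |
              ((unsigned long)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    return 0;
}

/* Hypothetical helper: encode one code point as UTF-8 into out (up to
 * 4 bytes). Returns the number of bytes written. */
static size_t utf8_encode(unsigned long cp, char *out)
{
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Uppercase a UTF-8 string into out (outsize bytes, including the NUL).
 * Returns the number of bytes written without the NUL, or (size_t)-1 on
 * malformed input or if out is too small. */
size_t utf8_toupper(const char *in, char *out, size_t outsize)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    while (*p) {
        unsigned long cp;
        char buf[4];
        size_t n = utf8_decode(p, &cp);          /* step 1 */

        if (n == 0)
            return (size_t)-1;
        p += n;
        /* step 2: code points that do not fit in wchar_t (16-bit wchar_t
         * on Windows) are passed through unchanged */
        if (cp <= (unsigned long)WCHAR_MAX)
            cp = (unsigned long)towupper((wint_t)cp);
        n = utf8_encode(cp, buf);                /* step 3 */
        if (used + n >= outsize)
            return (size_t)-1;
        memcpy(out + used, buf, n);
        used += n;
    }
    if (outsize == 0)
        return (size_t)-1;
    out[used] = '\0';
    return used;
}
```

A caller would pass a buffer large enough for the result, for example
utf8_toupper(name, buf, sizeof buf), and check for (size_t)-1.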

To compare strings:

1) Convert s1 in UTF-8 to Wide Char => wcs1
2) Convert s2 in UTF-8 to Wide Char => wcs2
3) Use wcscoll(wcs1, wcs2)
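
The comparison steps above could look something like this. Again this is
only a sketch under assumptions: utf8_to_wcs() and utf8_coll() are
hypothetical names, overlong UTF-8 sequences are not rejected, and code
points above U+FFFF are stored as surrogate pairs when wchar_t is 16 bits
wide (as on Windows). The ordering that wcscoll() produces depends on the
LC_COLLATE setting; in the default "C" locale it is typically a plain code
value comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a NUL-terminated UTF-8 string to a newly
 * allocated wide string. Returns NULL on malformed input or out of
 * memory. The caller frees the result. */
static wchar_t *utf8_to_wcs(const char *in)
{
    const unsigned char *p = (const unsigned char *)in;
    /* each input byte yields at most one wchar_t unit */
    wchar_t *w = malloc((strlen(in) + 1) * sizeof *w);
    size_t n = 0;

    if (w == NULL)
        return NULL;
    while (*p) {
        unsigned long cp;

        if (p[0] < 0x80) {
            cp = p[0];
            p += 1;
        } else if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            cp = ((unsigned long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            p += 2;
        } else if ((p[0] & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80 &&
                   (p[2] & 0xC0) == 0x80) {
            cp = ((unsigned long)(p[0] & 0x0F) << 12) |
                 ((unsigned long)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
            p += 3;
        } else if ((p[0] & 0xF8) == 0xF0 && (p[1] & 0xC0) == 0x80 &&
                   (p[2] & 0xC0) == 0x80 && (p[3] & 0xC0) == 0x80) {
            cp = ((unsigned long)(p[0] & 0x07) << 18) |
                 ((unsigned long)(p[1] & 0x3F) << 12) |
                 ((unsigned long)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
            p += 4;
        } else {
            free(w);
            return NULL;
        }
        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            /* 16-bit wchar_t: store as a UTF-16 surrogate pair */
            cp -= 0x10000;
            w[n++] = (wchar_t)(0xD800 | (cp >> 10));
            w[n++] = (wchar_t)(0xDC00 | (cp & 0x3FF));
        } else {
            w[n++] = (wchar_t)cp;
        }
    }
    w[n] = L'\0';
    return w;
}

/* Compare two UTF-8 strings with wcscoll(). Returns <0, 0 or >0 like
 * strcoll(); *ok is set to 0 if a conversion failed (result is then 0). */
int utf8_coll(const char *s1, const char *s2, int *ok)
{
    wchar_t *wcs1 = utf8_to_wcs(s1);    /* step 1 */
    wchar_t *wcs2 = utf8_to_wcs(s2);    /* step 2 */
    int r = 0;

    *ok = (wcs1 != NULL && wcs2 != NULL);
    if (*ok)
        r = wcscoll(wcs1, wcs2);        /* step 3 */
    free(wcs1);
    free(wcs2);
    return r;
}
```

The extra ok flag is there because wcscoll() itself has no error return
value, so a conversion failure has to be reported separately.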

Regards,
Seb

On 11/21/2012 06:07 PM, Tom Lane wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>> On 11/21/2012 11:11 AM, Tom Lane wrote:
>>> I'm not sure that's the only place we're doing this ...
>
>> Oh, Hmm, darn. Where else do you think we might?
>
> Dunno, but grepping for isupper and/or tolower should find any such
> places.
>
> regards, tom lane
>
