Re: locales and encodings on Windows

From: Aleksander Kmetec <aleksander(dot)kmetec(at)intera(dot)si>
To: pgsql-hackers-win32(at)postgresql(dot)org
Subject: Re: locales and encodings on Windows
Date: 2004-11-11 06:37:03
Message-ID: 4193088F.2030306@intera.si
Lists: pgsql-hackers-win32

Come on, people. This was the second time I reported this bug and also
the second time nobody responded to my report. :-(

If it is indeed not possible to initdb with a utf8 (65001) locale, then
this will cause a flood of bug reports once a large number of people
start using PG on Windows. Can somebody try and confirm this problem?
Simply try running initdb with a --locale value of german_germany.65001,
spanish_spain.65001, french_france.65001 or any other locale you think
should be supported by your system. You will need to do this from the
command line, not from the installer. Does initdb accept this value or
does it replace it with your current system locale?
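
For anyone who would rather probe the C runtime directly than run initdb,
here is a minimal sketch in Python. It only asks setlocale() whether a
locale name is accepted and then restores the previous setting. That
initdb's own check behaves the same way is an assumption on my part, and
locale_accepted is a hypothetical helper name, not anything from PG.

```python
import locale


def locale_accepted(name, category=locale.LC_COLLATE):
    """Return True if the C runtime's setlocale() accepts `name`.

    Hypothetical helper for illustration; it is not part of initdb.
    """
    saved = locale.setlocale(category)  # query the current setting
    try:
        locale.setlocale(category, name)
        return True
    except locale.Error:
        # Raised when the runtime rejects the locale name.
        return False
    finally:
        locale.setlocale(category, saved)  # always restore


if __name__ == "__main__":
    # Names taken from the message above; results depend on the system.
    for name in ("german_germany.65001",
                 "spanish_spain.65001",
                 "french_france.65001"):
        print(name, "->", locale_accepted(name))
```

A False result here would at least be consistent with initdb refusing the
value, but it does not prove where the refusal happens.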

Unless somebody can come up with a solution, my suggestion for a
work-around would be to remove unsupported encodings from the installer
or at least warn users that their database will not be fully functional
if they happen to choose one of the unsupported encodings.
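
To make "not fully functional" concrete: the original report quoted below
says UPPER(), LOWER() and ORDER BY misbehave. The Python sketch below shows
what byte-wise, locale-unaware handling of UTF-8 text looks like. It is an
illustration under the assumption that the server ends up with such a
fallback; it is not PG code, and the sample words and function names are
made up for the example.

```python
# Sample Slovenian words (illustrative data only).
words = ["zebra", "čaj", "cesta", "šola", "sir"]


def codepoint_sort(items):
    """Sort by code point, which for UTF-8 matches byte-wise order."""
    return sorted(items)


def bytewise_upper(text):
    """Uppercase only ASCII bytes, leaving multi-byte UTF-8 sequences alone."""
    return text.encode("utf-8").upper().decode("utf-8")


if __name__ == "__main__":
    # Slovenian order would be: cesta, čaj, sir, šola, zebra
    print(codepoint_sort(words))   # č and š end up after z
    print(bytewise_upper("čaj"))   # the č is left in lower case
```

In the Slovenian alphabet č follows c and š follows s, so a correct
collation would not push those words to the end of the list.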

Any comments?

Last October there was a discussion on pgsql-hackers about writing
locale support for PG, so it wouldn't depend on the system for locale
functionality any more. Is anyone still working on that?

Regards,
Aleksander

Aleksander Kmetec wrote:
> I would like to bring to your attention a problem regarding locale
> support on Windows. The description below uses UNICODE/UTF8, but the
> issue isn't limited to just this encoding.
>
> Because Postgres relies on the operating system for some string related
> functions, the OS needs to support the same encoding as the one that is
> used as the database encoding. Unfortunately, Windows does not support
> some encodings that are available as server-side encodings for PG.
>
> Here is a short example in case the previous paragraph doesn't make much
> sense: with a UNICODE database (actually UTF8) you need to use a
> compatible locale when running initdb; in my case that's "sl_SI.utf8"
> (on Linux) or "Slovenian_Slovenia.65001" (on Windows).
>
> 65001 is the Windows codepage number for UTF-8, except that it's not
> really a valid codepage. The document at
> http://www.sharmahd.com/tm/codepages.html states that: "65000 (UTF-7)
> and 65001 (UTF-8) are pseudo codepages. There are no corresponding NLS
> files. The code page IDs can only be used with WideCharToMultiByte( )
> and MultiByteToWideChar( ) API calls."
>
> This means that UPPER(), LOWER() and ORDER BY do not work correctly for
> Unicode databases. Currently it's not even possible to run initdb with
> a locale which uses the 65001 encoding. A small change to initdb enabled me
> to set LC_COLLATE to Slovenian_Slovenia.65001, but the sort order was
> still badly messed up, which makes sense considering the above quote.
>
> After some checking I came up with this list of encodings which are
> supported by PG, but not mentioned anywhere as supported by Windows:
> UTF8
> EUC_CN
> EUC_TW
> LATIN6 (ISO 8859-10/ECMA 144)
> LATIN7 (ISO 8859-13)
> LATIN8 (ISO 8859-14)
> LATIN10 (ISO 8859-16/ASRO SR 14111)
>
> Is there a solution for this, other than marking these encodings as not
> available on Windows?
>
> Regards,
> Aleksander
