setlocale() on Windows is broken

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Subject: setlocale() on Windows is broken
Date: 2011-08-31 13:05:31
Message-ID: 4E5E319B.9090505@enterprisedb.com

While looking through old emails, I bumped into this:

http://archives.postgresql.org/message-id/25219.1303306707@sss.pgh.pa.us

To recap, setlocale() on Windows is broken for locale names that contain
dots or apostrophes in the country name. That includes "Hong Kong
S.A.R.", "Macau S.A.R.", "U.A.E.", and "People's Republic of China".

In April, I put in a hack to initdb to map those problematic names to
aliases that contain neither dots nor apostrophes:

People's Republic of China -> China
Hong Kong S.A.R. -> HKG
U.A.E. -> ARE
Macau S.A.R. -> ZHM

However, Hiroshi pointed out in the thread linked above that this
doesn't completely solve the problem. If you set locale to "HKG", for
example, setlocale(LC_ALL, NULL) still returns the full name, "Hong Kong
S.A.R.", and if you feed that back to setlocale() it fails. In
particular, check_locale() uses "saved = setlocale(LC_XXX, NULL)" to get
the current value, and tries to restore it later with "setlocale(LC_XXX,
saved)".

At first, I thought I should revert my hack in initdb, since it's not
fully solving the problem anyway. But reverting wouldn't really help: you run
into the same issue if you set locale to one of those aliases manually.
And that's exactly what users will have to do if we don't map those
locales automatically.

Microsoft should fix their bug. I don't have much faith in that
happening, however. So, I think we should move the mapping from initdb
to somewhere in src/port, so that the mapping is done every time
setlocale() is called. That would fix the problem with check_locale():
even though "setlocale(LC_XXX, NULL)" returns a value that won't work,
the setlocale() call to restore it would map it to an alias that does
work again.
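
For illustration, here is a minimal sketch of what such a wrapper could look like. It is not the committed code: the names locale_aliases, map_locale_name and mapped_setlocale are made up for this example, the fixed-size buffer is an arbitrary choice, and only the first problematic name in the string is rewritten (an LC_ALL value listing several categories would need a loop).

```c
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A country name that setlocale() rejects, and an alias it accepts. */
struct locale_alias
{
    const char *bad;
    const char *alias;
};

/* The four mappings listed earlier in this message. */
static const struct locale_alias locale_aliases[] = {
    {"People's Republic of China", "China"},
    {"Hong Kong S.A.R.", "HKG"},
    {"U.A.E.", "ARE"},
    {"Macau S.A.R.", "ZHM"},
};

/*
 * Hypothetical helper: if 'name' contains one of the problematic country
 * names, write a copy with that name replaced by its alias into 'buf' and
 * return 'buf'. Return 'name' itself if nothing needs mapping, or NULL if
 * the result does not fit in 'buf'.
 */
static const char *
map_locale_name(const char *name, char *buf, size_t buflen)
{
    size_t      i;

    for (i = 0; i < sizeof(locale_aliases) / sizeof(locale_aliases[0]); i++)
    {
        const char *bad = locale_aliases[i].bad;
        const char *hit = strstr(name, bad);

        if (hit != NULL)
        {
            /* prefix before the match + alias + remainder after the match */
            int         n = snprintf(buf, buflen, "%.*s%s%s",
                                     (int) (hit - name), name,
                                     locale_aliases[i].alias,
                                     hit + strlen(bad));

            if (n < 0 || (size_t) n >= buflen)
                return NULL;
            return buf;
        }
    }
    return name;
}

/*
 * Hypothetical setlocale() replacement: map the name first, then call the
 * real setlocale(). A NULL 'locale' is a query and is passed through.
 */
char *
mapped_setlocale(int category, const char *locale)
{
    char        buf[256];
    const char *mapped;

    if (locale == NULL)
        return setlocale(category, NULL);

    mapped = map_locale_name(locale, buf, sizeof(buf));
    if (mapped == NULL)
        return NULL;
    return setlocale(category, mapped);
}
```

With this sketch, a value like "Chinese_Hong Kong S.A.R..950" coming back from setlocale(LC_XXX, NULL) would be passed to the real setlocale() as "Chinese_HKG.950" when it is restored.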

In addition to that, I think we should check the return value of
setlocale() in check_locale(), and throw a warning if restoring the old
locale fails. The session's locale will still be screwed, but at least
you'll know if it happens.
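
Roughly, the save/test/restore pattern with the extra check could look like the sketch below. This is only an illustration under stated assumptions, not the actual check_locale(): the function name check_locale_sketch is made up, and malloc() and fprintf() stand in for the backend's memory and error-reporting routines.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for check_locale(): return true if 'value' is
 * accepted by setlocale() for 'category', and put the old setting back
 * afterwards. Warn if the old setting cannot be restored.
 */
bool
check_locale_sketch(int category, const char *value)
{
    char       *cur;
    char       *save;
    bool        ok;

    cur = setlocale(category, NULL);
    if (cur == NULL)
        return false;

    /* Copy it: later setlocale() calls may overwrite the returned string. */
    save = malloc(strlen(cur) + 1);
    if (save == NULL)
        return false;
    strcpy(save, cur);

    /* Does the system accept the value being tested? */
    ok = (setlocale(category, value) != NULL);

    /* Restore the old setting, and complain if that fails. */
    if (setlocale(category, save) == NULL)
        fprintf(stderr, "WARNING: failed to restore old locale \"%s\"\n",
                save);

    free(save);
    return ok;
}
```

The warning does not repair anything; it only makes the failure visible when the restore step is the one that breaks.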

I'll go write a patch for that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Subject: Re: setlocale() on Windows is broken
Date: 2011-09-01 08:36:49
Message-ID: 4E5F4421.5030704@enterprisedb.com

On 31.08.2011 16:05, Heikki Linnakangas wrote:
> While looking through old emails, I bumped into this:
>
> http://archives.postgresql.org/message-id/25219.1303306707@sss.pgh.pa.us
>
> To recap, setlocale() on Windows is broken for locale names that contain
> dots or apostrophes in the country name. That includes "Hong Kong
> S.A.R.", "Macau S.A.R.", and "U.A.E." and "People's Republic of China".
>
> In April, I put in a hack to initdb to map those problematic names to
> aliases that don't contain dots:
>
> People's Republic of China -> China
> Hong Kong S.A.R. -> HKG
> U.A.E. -> ARE
> Macau S.A.R. -> ZHM
>
> However, Hiroshi pointed out in the thread linked above that that
> doesn't completely solve the problem. If you set locale to "HKG", for
> example, setlocale(LC_ALL, NULL) still returns the full name, "Hong Kong
> S.A.R.", and if you feed that back to setlocale() it fails. In
> particular, check_locale() uses "saved = setlocale(LC_XXX, NULL)" to get
> the current value, and tries to restore it later with "setlocale(LC_XXX,
> saved)".
>
>
> At first, I thought I should revert my hack in initdb, since it's not
> fully solving the problem anyway. But it doesn't really help - you run
> into the same issue if you set locale to one of those aliases manually.
> And that's exactly what users will have to do if we don't map those
> locales automatically.
>
> Microsoft should fix their bug. I don't have much faith in that
> happening, however. So, I think we should move the mapping from initdb
> to somewhere in src/port, so that the mapping is done every time
> setlocale() is called. That would fix the problem with check_locale():
> even though "setlocale(LC_XXX, NULL)" returns a value that won't work,
> the setlocale() call to restore it would map it to an alias that does
> work again.
>
> In addition to that, I think we should check the return value of
> setlocale() in check_locale(), and throw a warning if restoring the old
> locale fails. The session's locale will still be screwed, but at least
> you'll know if it happens.

I've committed a patch along those lines.

It turned out to be pretty difficult to reproduce user-visible buggy
behavior caused by this bug, so for the sake of the archives, here's a
recipe for that:

1. Set system locale to "Chinese_Hong Kong S.A.R..950"

2. initdb -D data --locale="Arabic_ARE"

3. Launch psql.

CREATE TABLE foo (a text);
INSERT INTO foo VALUES ('a'), ('A');

-- Verify that the order is 'a', 'A'
SELECT * FROM foo ORDER BY a;

-- This fails, as it should
CREATE DATABASE postgres WITH LC_COLLATE='C' TEMPLATE=template0;

-- This also fails, as it should
CREATE DATABASE postgres WITH LC_COLLATE='C' TEMPLATE=template0;

-- The order returned by this is now wrong: 'A', 'a'
SELECT * FROM foo ORDER BY a;

It's a bizarre-looking sequence, but that does it.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com