Re: UTF-8 encoding problem w/ libpq

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Martin Schäfer <Martin(dot)Schaefer(at)cadcorp(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF-8 encoding problem w/ libpq
Date: 2013-06-10 09:38:39
Message-ID: 51B59E9F.5020200@vmware.com
Lists: pgsql-hackers

On 04.06.2013 09:39, Martin Schäfer wrote:
>> Can't really blame Windows on that. On Windows, we don't require that the
>> encoding and LC_CTYPE's charset match. The OP used UTF-8 encoding in the
>> server, but LC_CTYPE="English_United Kingdom.1252", ie. LC_CTYPE implies
>> WIN1252 encoding. We allow that and it generally works on Windows
>> because in varstr_cmp, we use MultiByteToWideChar() followed by
>> wcscoll_l(), which doesn't care about the charset implied by LC_CTYPE.
>> But for isupper(), it matters.
>
> Does this mean that the UTF-8 messing up would disappear if the database were using a different locale for LC_CTYPE? If so, which locale should I use?
> This would be useful for a temporary workaround.

Maybe, not sure. The logical thing to do would be to set LC_CTYPE to
"English_United Kingdom.65001", which tells Windows to expect the UTF-8
charset. However, old discussions on this subject suggest that Windows
won't accept that:

http://www.postgresql.org/message-id/20071015090954.GD4653@svr2.hagander.net

It's still worth a try, I think. Things might've changed since then. If
that doesn't work, you could also try some other random codepages as a
workaround. If you're lucky, one of them might work better, even though
it would still be the wrong codepage for UTF-8.
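
To illustrate why the codepage implied by LC_CTYPE matters for isupper(),
here is a small Python simulation. It is not PostgreSQL or libpq code, only a
sketch of what can happen when each byte of a UTF-8 string is case-folded as
if it were a WIN1252 character. The function names are made up for this
example, and the assumption that case folding is applied byte by byte is mine.

```python
def lower_bytewise_cp1252(data: bytes) -> bytes:
    """Lower-case each byte as if it were a single WIN1252 character.

    Hypothetical helper: mimics a per-byte isupper()/tolower() pass under
    a 1252 locale. It is not taken from the PostgreSQL sources.
    """
    out = bytearray()
    for b in data:
        try:
            ch = bytes([b]).decode("cp1252")
        except UnicodeDecodeError:
            # A few byte values are unassigned in cp1252; pass them through.
            out.append(b)
            continue
        try:
            enc = ch.lower().encode("cp1252")
        except UnicodeEncodeError:
            enc = bytes([b])
        # Keep the original byte unless folding gives exactly one byte.
        out += enc if len(enc) == 1 else bytes([b])
    return bytes(out)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


original = "Ä".encode("utf-8")          # two bytes: 0xC3 0x84
mangled = lower_bytewise_cp1252(original)
print(original.hex(), "->", mangled.hex(), "valid UTF-8:", is_valid_utf8(mangled))
```

The lead byte 0xC3 is 'Ã' in WIN1252, so it gets folded to 0xE3 ('ã'), and the
two-byte UTF-8 sequence is no longer valid. Plain ASCII letters are unaffected,
which is why only non-ASCII text gets damaged.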

- Heikki
