Re: UTF-8 encoding problem w/ libpq

From: Martin Schäfer <Martin(dot)Schaefer(at)cadcorp(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF-8 encoding problem w/ libpq
Date: 2013-06-04 06:39:57
Message-ID: 11A8567A97B15648846060F5CD818EB8CAC2253F62@DEV001EX.Dev.cadcorp.net
Lists: pgsql-hackers

> Can't really blame Windows on that. On Windows, we don't require that the
> encoding and LC_CTYPE's charset match. The OP used UTF-8 encoding in the
> server, but LC_CTYPE="English_United Kingdom.1252", ie. LC_CTYPE implies
> WIN1252 encoding. We allow that and it generally works on Windows
> because in varstr_cmp, we use MultiByteToWideChar() followed by
> wcscoll_l(), which doesn't care about the charset implied by LC_CTYPE.
> But for isupper(), it matters.

Does this mean that the UTF-8 corruption would disappear if the database used a different locale for LC_CTYPE? If so, which locale should I use?
That would be useful as a temporary workaround.

> > We talked about this before and went off into the weeds about whether
> > it was sensible to try to use towlower() and whether that wouldn't
> > create undesirably platform-sensitive results. I wonder though if we
> > couldn't just fix this code to not do anything to high-bit-set bytes
> > in multibyte encodings.
>
> Yeah, we should do that. It makes no sense to call isupper or tolower on
> bytes belonging to multi-byte characters.
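
For reference, the "don't touch high-bit-set bytes" idea quoted above could look roughly like the sketch below. This is only an illustration, not the actual PostgreSQL code: the function name downcase_ascii_only is made up, and it assumes the identifier is a UTF-8 byte string of known length. It avoids isupper()/tolower() entirely, so the result does not depend on LC_CTYPE.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not PostgreSQL's real identifier-folding routine):
 * fold an identifier to lower case in place, touching only the 7-bit
 * ASCII letters A-Z. Bytes with the high bit set are lead or continuation
 * bytes of a multibyte UTF-8 character and are left unchanged.
 */
static void
downcase_ascii_only(char *ident, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        unsigned char ch = (unsigned char) ident[i];

        if (ch >= 'A' && ch <= 'Z')
            ident[i] = (char) (ch - 'A' + 'a');
        /* ch >= 0x80: part of a multibyte character, leave it alone */
    }
}
```

With this approach the UTF-8 bytes of 'Ä' (0xC3 0x84) survive intact, so the identifier stays valid UTF-8, but the non-ASCII letter keeps its case.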

Actually, I would expect 'create table HÄUSER (...)' to create a table named 'häuser', not one named 'hÄuser', so towlower() seems the right choice, IMHO.
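
To make the expectation concrete, here is a rough sketch of folding per character instead of per byte. It is only an illustration under some assumptions: the names fold_codepoint and downcase_utf8 are made up, the input is assumed to be UTF-8, and fold_codepoint is a tiny locale-independent stand-in for towlower() that only knows ASCII and the Latin-1 Supplement capitals (U+00C0 to U+00DE). A real implementation would decode to wchar_t and call towlower(), with the platform-dependent results mentioned in the quoted text.

```c
#include <stddef.h>

/*
 * Stand-in for towlower(), assumed for illustration only: folds ASCII
 * capitals and the Latin-1 Supplement capitals U+00C0..U+00DE.
 * U+00D7 is the multiplication sign, not a letter, so it is skipped.
 */
static unsigned int
fold_codepoint(unsigned int cp)
{
    if (cp >= 'A' && cp <= 'Z')
        return cp + 0x20;
    if (cp >= 0xC0 && cp <= 0xDE && cp != 0xD7)
        return cp + 0x20;
    return cp;
}

/*
 * Hypothetical helper: lower-case a UTF-8 identifier in place, one
 * character at a time. Only 1-byte and 2-byte sequences are folded;
 * longer or malformed sequences are left byte-for-byte unchanged.
 */
static void
downcase_utf8(char *ident, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char b0 = (unsigned char) ident[i];

        if (b0 < 0x80)
        {
            /* plain ASCII byte */
            ident[i] = (char) fold_codepoint(b0);
            i += 1;
        }
        else if ((b0 & 0xE0) == 0xC0 && i + 1 < len &&
                 ((unsigned char) ident[i + 1] & 0xC0) == 0x80)
        {
            /* 2-byte sequence: decode, fold, re-encode in the same 2 bytes */
            unsigned char b1 = (unsigned char) ident[i + 1];
            unsigned int cp = ((unsigned int) (b0 & 0x1F) << 6) | (b1 & 0x3F);

            if (cp >= 0x80)     /* ignore overlong encodings of ASCII */
                cp = fold_codepoint(cp);
            ident[i] = (char) (0xC0 | (cp >> 6));
            ident[i + 1] = (char) (0x80 | (cp & 0x3F));
            i += 2;
        }
        else
        {
            /* lead/continuation byte of a longer or malformed sequence */
            i += 1;
        }
    }
}
```

Here 'Ä' (0xC3 0x84, U+00C4) becomes 'ä' (0xC3 0xA4, U+00E4), so 'HÄUSER' folds to 'häuser' rather than 'hÄuser'.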

Martin
