Re: Locale + encoding combinations

From: Dave Page <dpage(at)postgresql(dot)org>
To: Trevor Talbot <quension(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Locale + encoding combinations
Date: 2007-10-12 14:26:00
Message-ID: 470F83F8.5020503@postgresql.org
Lists: pgsql-hackers

Trevor Talbot wrote:
> The encoding output is the one you specified.

OK.

> Keep in mind,
> underneath Windows is mostly working with Unicode, so all characters
> exist and the locale rules specify their behavior there. The encoding
> is just the byte stream it needs to force them all into after doing
> whatever it does to them. As you've seen, it uses some sort of
> best-fit mapping I don't know the details of. (It will drop accent
> marks and choose characters with similar shape where possible, by
> default.)

Right, that makes sense. The codepages used by setlocale() etc. are just
translation tables to and from the internal Unicode representation.
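
To make the "translation table plus best-fit" idea concrete, here is a rough
sketch in Python. It is not Windows' actual best-fit algorithm (whose details
nobody in this thread has to hand); it only approximates the "drop accent
marks" behaviour by decomposing a character and discarding combining marks.
The function name best_fit_encode is made up for illustration.

```python
import unicodedata


def best_fit_encode(text: str, codepage: str) -> bytes:
    """Encode text into a code page, approximating a best-fit fallback.

    Hypothetical helper: characters the code page can represent are encoded
    as-is; others are decomposed (NFKD) and stripped of combining marks, and
    anything still unrepresentable becomes '?'.
    """
    out = bytearray()
    for ch in text:
        try:
            # The code page acts as a translation table from Unicode.
            out += ch.encode(codepage)
            continue
        except UnicodeEncodeError:
            pass
        # Fallback: keep the base letter, drop the accent marks.
        base = "".join(
            c for c in unicodedata.normalize("NFKD", ch)
            if not unicodedata.combining(c)
        )
        out += base.encode(codepage, errors="replace")
    return bytes(out)
```

With this sketch, a character the target code page already contains passes
through unchanged, while an accented character it lacks is reduced to its base
letter. Real best-fit tables also map characters by similar shape, which this
does not attempt.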

> I think it's a bit more complex for input/transform cases where you
> operate on the byte stream directly without intermediate conversion to
> Unicode, which is why UTF-8 doesn't work as a codepage, but again I
> don't have the details nearby. I can try to do more digging if
> needed.

It does (sort of) work as a codepage; it just doesn't have the NLS file
to define how things like UPPER() and LOWER() should work.
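
As a loose illustration of why case mapping needs those definitions, here is a
small Python sketch. It says nothing about how Windows implements this; it
only shows that upper-casing a UTF-8 byte stream without any character tables
handles ASCII alone, whereas going through Unicode gives the expected result.
Both function names are invented for the example.

```python
def upper_bytewise(data: bytes) -> bytes:
    """Upper-case the raw bytes; only ASCII letters are affected."""
    return data.upper()


def upper_via_unicode(data: bytes, encoding: str = "utf-8") -> bytes:
    """Decode to Unicode, apply the Unicode case mapping, then re-encode."""
    return data.decode(encoding).upper().encode(encoding)


sample = "é".encode("utf-8")            # two bytes: 0xC3 0xA9
unchanged = upper_bytewise(sample)      # no rule applies to these bytes
converted = upper_via_unicode(sample)   # "É" re-encoded as UTF-8
```

The byte-wise version leaves the multibyte character untouched, which is
roughly what you would expect from a codepage with no case rules behind it.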

Regards, Dave
