From: | Magnus Hagander <magnus(at)hagander(dot)net> |
---|---|
To: | Trevor Talbot <quension(at)gmail(dot)com> |
Cc: | Dave Page <dpage(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Locale + encoding combinations |
Date: | 2007-10-12 14:45:10 |
Message-ID: | 20071012144510.GH6334@svr2.hagander.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Oct 12, 2007 at 06:03:52AM -0700, Trevor Talbot wrote:
> On 10/12/07, Dave Page <dpage(at)postgresql(dot)org> wrote:
> > Tom Lane wrote
> > > That still leaves us with the problem of how to tell whether a locale
> > > spec is bad on Windows. Judging by your example, Windows checks whether
> > > the code page is present but not whether it is sane for the base locale.
> > > What happens when there's a mismatch --- eg, what encoding do system
> > > messages come out in?
> >
> > I'm not sure how to test that specifically, but it seems that accented
> > characters simply fall back to their undecorated equivalents if the
> > encoding is not appropriate, eg:
> >
> > Dave(at)SNAKE:~$ ./setlc French_France.1252
> > Locale: French_France.1252
> > The date is: sam. 01 of août 2007
> > Dave(at)SNAKE:~$ ./setlc French_France.28597
> > Locale: French_France.28597
> > The date is: sam. 01 of aout 2007
> >
> > (the encodings used there are WIN1252 and ISO8859-7 (Greek)).
> >
> > I'm happy to test further is you can suggest how I can figure out the
> > encoding actually output.
>
> The encoding output is the one you specified. Keep in mind,
> underneath Windows is mostly working with Unicode, so all characters
> exist and the locale rules specify their behavior there. The encoding
> is just the byte stream it needs to force them all into after doing
> whatever it does to them. As you've seen, it uses some sort of
> best-fit mapping I don't know the details of. (It will drop accent
> marks and choose characters with similar shape where possible, by
> default.)
>
> I think it's a bit more complex for input/transform cases where you
> operate on the byte stream directly without intermediate conversion to
> Unicode, which is why UTF-8 doesn't work as a codepage, but again I
> don't have the details nearby. I can try to do more digging if
> needed.
Just so the non-windows-savvy people get it.. When Windows documentation or
users refer to Unicode, they mean UTF-16.
//Magnus
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2007-10-12 14:45:58 | Re: Including Snapshot Info with Indexes |
Previous Message | Magnus Hagander | 2007-10-12 14:40:40 | Re: pg_tablespace_size() |