From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
Cc: | tgl(at)sss(dot)pgh(dot)pa(dot)us, laurenz(dot)albe(at)wien(dot)gv(dot)at, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: invalidly encoded strings |
Date: | 2007-09-10 16:09:06 |
Message-ID: | 46E56C22.6090101@dunslane.net |
Lists: | pgsql-hackers pgsql-patches |
Tatsuo Ishii wrote:
>
> I don't understand the whole discussion.
>
> Why do you think that employing the Unicode code point as the chr()
> argument could avoid endianness issues? Are you going to represent
> Unicode code point as UCS-4? Then you have to specify the endianness
> anyway. (see the UCS-4 standard for more details)
>
The code point is simply a number, not a byte sequence, so it has no
byte order. The result of chr() will be a text value one char (not one
byte) wide, in the relevant database encoding. U+nnnn maps to the same
Unicode char and hence the same UTF8 encoding pattern regardless of
endianness. For example, U+00A9 is the copyright symbol on all machines.
So to get this char in a UTF8 database you could call "select chr(169)"
and get back the byte pattern \xC2A9.
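
To illustrate the point, here is a rough Python sketch (not PostgreSQL
code; the helper name utf8_from_code_point is made up for this example).
It derives the UTF8 bytes from the code point using arithmetic only, so
the result cannot depend on the machine's byte order. A fixed-width
UCS-4 serialization of the same number, shown for contrast, does.

```python
import struct


def utf8_from_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (hypothetical helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: {cp!r}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


# The copyright sign, U+00A9, is code point 169 on every machine.
copyright_utf8 = utf8_from_code_point(169)
print(copyright_utf8.hex())  # c2a9

# A UCS-4 serialization of the same number does depend on byte order.
ucs4_big = struct.pack(">I", 169)
ucs4_little = struct.pack("<I", 169)
print(ucs4_big.hex(), ucs4_little.hex())
```

The byte order question only arises once the number is written out as a
multi-byte integer, which chr() with a numeric argument never does.
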
> Or are you going to represent the Unicode code point as a character
> string such as 'U+0259'? Then representing any encoding as a string
> could avoid endianness issues anyway, and I don't see that a Unicode
> code point is any better than the others.
>
The argument will be a number, as now.
> Also I'd like to point out that every encoding has its own code point
> system as far as I know. For example, EUC-JP has its corresponding
> code point systems, ASCII, JIS X 0208 and JIS X 0212. So I don't see
> why we can't use a "code point" as chr()'s argument for other
> encodings (of course we would need an optional parameter specifying
> which character set is assumed).
>
Where can I find the tables that map code points (as opposed to
encodings) to characters for these others?
cheers
andrew
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2007-09-10 16:09:54 | Re: invalidly encoded strings |
Previous Message | Martijn van Oosterhout | 2007-09-10 16:08:05 | Re: invalidly encoded strings |