Wrong charset mappings

From: Thomas O'Dowd <tom(at)nooper(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Wrong charset mappings
Date: 2003-02-07 04:05:36
Message-ID: 1044590736.14765.494.camel@beast.uwillsee.com
Lists: pgsql-hackers pgsql-jdbc

Hi all,

One Japanese character has been making my head swim lately. I've
finally traced the problem to both Java 1.3 and PostgreSQL.

The problem character is the WAVE DASH:
utf-16: 0x301C
utf-8: 0xE3809C
SJIS: 0x8160
EUC_JP: 0xA1C1

The confusion stems from a very similar character, the FULLWIDTH TILDE:
0xFF5E (utf-16) or 0xEFBD9E (utf-8).
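
As a quick check of the values above, here is a minimal Python sketch
that prints the Unicode name and UTF-8 bytes of both code points. Python
and the helper name `describe` are chosen only for illustration; nothing
here comes from PostgreSQL or Java.

```python
import unicodedata

# The two easily confused code points discussed above.
WAVE_DASH = "\u301c"
FULLWIDTH_TILDE = "\uff5e"

def describe(ch):
    """Return (Unicode name, UTF-8 bytes as upper-case hex) for one character."""
    return unicodedata.name(ch), ch.encode("utf-8").hex().upper()

for ch in (WAVE_DASH, FULLWIDTH_TILDE):
    name, utf8_hex = describe(ch)
    print(f"U+{ord(ch):04X} {name}: utf-8 0x{utf8_hex}")
```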

Java only recently (1.4.1) fixed its mappings so that 0x301C converts
to the correct character in both SJIS and EUC-JP. Previously (at least
in 1.3.1) it mapped the SJIS character to 0xFF5E and the EUC character
to 0x301C, causing all sorts of trouble.

PostgreSQL at least picked one of the two characters consistently,
namely 0xFF5E, so conversions in and out of the database to and from
sjis/euc appear to work. The problem shows up when you view the data as
utf-8, or when you read it into Java (utf-16) and try to convert it to
euc or sjis from there.
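
The failure mode can be reproduced outside the database. The sketch
below uses Python's codecs purely as stand-ins (an assumption for
illustration, not PostgreSQL's tables or Java's converters): `cp932`
decodes SJIS 0x8160 to 0xFF5E, like the mapping described above, while
`shift_jis` decodes it to 0x301C. Converting the 0xFF5E result back with
the stricter `shift_jis` codec then finds no mapping. What `euc_jp` does
with 0xFF5E is printed rather than assumed.

```python
def try_encode(text, codec):
    """Encode text with codec; return None if a character has no mapping."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

sjis_bytes = b"\x81\x60"

# Stand-in for a converter that maps SJIS 0x8160 to FULLWIDTH TILDE.
stored = sjis_bytes.decode("cp932")

# Stand-in for a converter that maps SJIS 0x8160 to WAVE DASH.
expected = sjis_bytes.decode("shift_jis")

for label, text in (("tilde mapping", stored), ("wave dash mapping", expected)):
    for codec in ("shift_jis", "euc_jp"):
        result = try_encode(text, codec)
        shown = "0x" + result.hex().upper() if result else "no mapping"
        print(f"{label}: U+{ord(text):04X} -> {codec}: {shown}")
```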

Anyway, I think PostgreSQL needs to be fixed for this character. In my
opinion, the mappings should be changed to:

euc-jp -> utf-8    -> euc-jp
======    ========    ======
0xA1C1 -> 0xE3809C -> 0xA1C1

sjis   -> utf-8    -> sjis
======    ========    ======
0x8160 -> 0xE3809C -> 0x8160

What should happen to the current mapping of 0xEFBD9E (utf-8)? It
should probably be removed, although the mapping from it back to the
sjis/euc characters could be kept for backward compatibility. I'm not
sure what the correct approach is there.
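
To make the proposal concrete, here is a small Python sketch of the
suggested behaviour: the legacy bytes round-trip through 0xE3809C, and
an optional one-way entry still converts an already stored 0xEFBD9E back
to the legacy bytes. The table and function names are hypothetical and
only model the idea; they are not PostgreSQL's actual map files or
conversion code.

```python
# Hypothetical tables modelling the proposal; not PostgreSQL's map files.
TO_UTF8 = {
    ("EUC_JP", bytes.fromhex("A1C1")): bytes.fromhex("E3809C"),
    ("SJIS", bytes.fromhex("8160")): bytes.fromhex("E3809C"),
}

# Reverse direction derived from the forward table, so round trips hold.
FROM_UTF8 = {(enc, utf8): legacy for (enc, legacy), utf8 in TO_UTF8.items()}

# Optional one-way compatibility entries: FULLWIDTH TILDE that is already
# stored as utf-8 still converts back to the legacy bytes.
COMPAT_FROM_UTF8 = {
    ("EUC_JP", bytes.fromhex("EFBD9E")): bytes.fromhex("A1C1"),
    ("SJIS", bytes.fromhex("EFBD9E")): bytes.fromhex("8160"),
}

def to_utf8(encoding, legacy):
    """Convert legacy bytes to utf-8 using the proposed forward table."""
    return TO_UTF8[(encoding, legacy)]

def from_utf8(encoding, utf8, compat=True):
    """Convert utf-8 bytes back; compat=True also accepts 0xEFBD9E."""
    key = (encoding, utf8)
    if key in FROM_UTF8:
        return FROM_UTF8[key]
    if compat and key in COMPAT_FROM_UTF8:
        return COMPAT_FROM_UTF8[key]
    raise KeyError(key)

# Every forward entry must round-trip to the bytes it started from.
for encoding, legacy in TO_UTF8:
    assert from_utf8(encoding, to_utf8(encoding, legacy)) == legacy
```

Passing `compat=False` models removing the old mapping entirely, in
which case 0xEFBD9E no longer converts to sjis/euc.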

If anyone can tell me how to edit the mappings under:
src/backend/utils/mb/Unicode/

and rebuild postgres to use them, then I can test this out locally.

Looking forward to your replies.

Tom.
