Re: character conversion problem about UTF-8-->SHIFT_JIS_2004

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: bhyuan(at)gmail(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: character conversion problem about UTF-8-->SHIFT_JIS_2004
Date: 2008-02-13 14:48:05
Message-ID: 20080213.234805.84360360.t-ishii@sraoss.co.jp
Lists: pgsql-general

> hi
>
> I used Postgresql7.4.3 with php for more than 3years.
> Now I want to change my database to Postgresql8.3.
> But I occur such problem
> ----------------------------------------------------------
> ERROR: character 0xe9ab99 of encoding "UTF8" has no equivalent in "SJIS"
> ERROR: character 0xe9ab99 of encoding "UTF8" has no equivalent in
> "SHIFT_JIS_2004"
> ----------------------------------------------------------
> The database was encoded by UTF-8,
> to export data as .csv file,
> I use set client_encoding='SJIS' at client.
> When I use Postgresql7.4.3,no problem occur,
> but after I chaged to Postgresql8.3 ,the error was occured.
>
> Can I ignore the error message ?
> or any othe method to solve this problem.

First of all, you should be aware that SHIFT_JIS_2004 is a completely
different beast from SJIS. If you want to keep using the SJIS data you
had under 7.4, you must use SJIS, not SHIFT_JIS_2004, on 8.3. Or do you
have any particular reason to use SHIFT_JIS_2004?

BTW,

> ERROR: character 0xe9ab99 of encoding "UTF8" has no equivalent in "SJIS"

I don't see this error message with PostgreSQL 8.3.0 running on a
Linux box. I can store UTF-8 0xe9ab99 (== U+9AD9) and retrieve it from
the SJIS client side (0xe9ab99 corresponds to 0xfbfc). Actually we can
confirm this by looking at line 6914 in
src/backend/utils/mb/Unicode/utf8_to_sjis.map:

{0xe9ab99, 0xfbfc},

Note that the left side is the value for UTF-8, and the right side is
the value for SJIS. I recommend that you double-check your PostgreSQL
8.3 installation.
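
The byte values quoted above can also be cross-checked outside
PostgreSQL. What follows is only a minimal sketch in Python, not the
server's own conversion code. It assumes that Python's "cp932" codec
is an acceptable stand-in for the SJIS mapping for this one character;
the variable names are made up for illustration.

```python
# Cross-check of the byte values from the map entry, outside PostgreSQL.
# Assumption: Python's "cp932" codec stands in for PostgreSQL's SJIS
# mapping for this single character; it is not the server's own table.

utf8_bytes = bytes.fromhex("e9ab99")  # UTF-8 bytes from the error message
sjis_bytes = bytes.fromhex("fbfc")    # right-hand value of the map entry

# Decode each byte sequence with its own encoding.
from_utf8 = utf8_bytes.decode("utf-8")
from_sjis = sjis_bytes.decode("cp932")

# Format the Unicode code point and compare the two decoded characters.
code_point = "U+%04X" % ord(from_utf8)
same_character = from_utf8 == from_sjis

print(code_point, same_character)
```

If both byte sequences decode to the same character, the map entry is
consistent with what the client should receive, which would point back
at the installation rather than at the data.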

For your convenience, I have attached a dump containing a table
(called "t1") which has the UTF-8 character in question. You can load
and query it as follows:

$ createdb -E UTF_8 test
$ gunzip -c /tmp/t1.dump.gz|psql test
$ psql -c "set client_encoding to SJIS;select * from t1" test
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Attachment: t1.dump.gz (application/octet-stream, 377 bytes)
