UTF-8 encoding problem

Lists: pgsql-general
From: bhyuan <bhyuan(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: UTF-8 encoding problem
Date: 2007-08-16 06:40:28
Message-ID: 20070816151739.99DF.BHYUAN@gmail.com

Hi,

I use UTF-8 as the server character encoding
and SJIS as the client character encoding.
For some reason, some characters that have no SJIS equivalent were inserted into the database.
When I run
set client_encoding='SJIS';
select * from xxx;
I get this error message:
Native Error: ERROR: character 0xc2a0 of encoding "UTF8" has no equivalent in "SJIS"

I just want to skip the non-SJIS characters and continue without any
errors.
I use PostgreSQL 8.1, and it seems that PostgreSQL is supposed to report an error in this case:
-----------------------------
If the conversion of a particular character is not possible -- suppose you chose EUC_JP for the server and LATIN1 for the client, then some Japanese characters do not have a representation in LATIN1 -- then an error is reported.
-----------------------------
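The bytes quoted in the error message can be decoded by hand to see which character triggers it. The Python sketch below is only an illustration: it uses Python's own utf-8 and shift_jis codecs as a stand-in for the server's UTF8 -> SJIS conversion tables (an assumption; the tables are close but not guaranteed identical), and the helper names are hypothetical.

```python
def utf8_char(hex_bytes: str) -> str:
    """Decode the byte sequence quoted in the error message (e.g. "c2a0") as UTF-8."""
    return bytes.fromhex(hex_bytes).decode("utf-8")


def has_sjis_equivalent(ch: str) -> bool:
    """Return True if Python's shift_jis codec can encode ch.

    Python's codec stands in for PostgreSQL's UTF8 -> SJIS conversion here;
    the two mapping tables may differ in edge cases.
    """
    try:
        ch.encode("shift_jis")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    # 0xc2a0 is from the error above; 0xc3aa is from the test later in the thread.
    for hex_bytes in ("c2a0", "c3aa"):
        ch = utf8_char(hex_bytes)
        print(hex_bytes, f"U+{ord(ch):04X}", has_sjis_equivalent(ch))
```

0xc2a0 decodes to U+00A0 (NO-BREAK SPACE) and 0xc3aa to U+00EA (e with circumflex). Neither is in the JIS X 0208 set that Shift_JIS covers, so both lines should end in False.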

Can I ignore the error message through a setting in the config file?

Thanks for any idea.

bhyuan


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Cc: bhyuan <bhyuan(at)gmail(dot)com>
Subject: Re: UTF-8 encoding problem
Date: 2007-08-16 09:03:39
Message-ID: 200708161103.39420.peter_e@gmx.net

Am Donnerstag, 16. August 2007 08:40 schrieb bhyuan:
> Can I ignore the error message through a setting in the config file?

No, there is no provision for that. Some errors of this type used to be
ignored, but that led to SQL injection-like security issues, so you don't
want that.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: bhyuan <bhyuan(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UTF-8 encoding problem
Date: 2007-08-16 13:21:57
Message-ID: 20070816214048.9A0B.BHYUAN@gmail.com

Thanks for your reply.

Maybe SQL injection-like security issues could occur,
but I find that different versions of PostgreSQL give different results.

For example, with this SQL:
set client_encoding='SJIS';
select '\xc3\xaa',* from xxx;

On V7.4 @RH3 I got:
\xc3\xaa

On V8.1.2 @RH4 I got:
(blank)

On V8.1.4 @FreeBSD6 I got:
ERROR: character 0xc3aa of encoding "UTF8" has no equivalent in "SJIS"

Also, the documentation differs between versions:
Version 8.1
http://www.postgresql.org/docs/8.1/interactive/multibyte.html#AEN22591
------------------------------
If the conversion of a particular character is not possible -- suppose you chose EUC_JP for the server and LATIN1 for the client, then some Japanese characters do not have a representation in LATIN1 -- then an error is reported.
------------------------------

Version 7.4
http://www.postgresql.org/docs/7.4/interactive/multibyte.html#AEN18371
------------------------------
If the conversion of a particular character is not possible -- suppose you chose EUC_JP for the server and LATIN1 for the client, then some Japanese characters cannot be converted to LATIN1 -- it is transformed to its hexadecimal byte values in parentheses, e.g., (826C).
------------------------------

I'm confused. I just want the query to return its result even when some characters
were not encoded correctly, just like on V8.1.2 @RH4, where the bad character was ignored.
....
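The server will not skip unconvertible characters, but an application can. One workaround (not one proposed in this thread) is to leave client_encoding at UTF8, so the server never has to convert, and do the SJIS conversion in the application with an explicit policy. The Python sketch below assumes the application already receives query results as Unicode strings; the to_sjis function and the "hex_in_parens" error-handler name are hypothetical.

```python
import codecs


def _hex_in_parens(err):
    """Replace each unencodable character with its UTF-8 bytes in hex,
    wrapped in parentheses, roughly as the 7.4 docs describe, e.g. (C2A0)."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    bad = err.object[err.start:err.end]
    replacement = "".join(f"({ch.encode('utf-8').hex().upper()})" for ch in bad)
    return replacement, err.end


# Hypothetical handler name; registered once per process.
codecs.register_error("hex_in_parens", _hex_in_parens)


def to_sjis(text: str, policy: str = "strict") -> bytes:
    """Encode text (already decoded from a UTF8 connection) as Shift_JIS.

    policy:
      "strict"        raise UnicodeEncodeError, like the 8.1.4 result
      "ignore"        drop the character, like the 8.1.2 result
      "hex_in_parens" substitute its hex bytes, like the 7.4 docs
    """
    return text.encode("shift_jis", errors=policy)


if __name__ == "__main__":
    row = "a\u00a0b"  # contains U+00A0, the 0xc2a0 character from the first error
    print(to_sjis(row, "ignore"))         # b'ab'
    print(to_sjis(row, "hex_in_parens"))  # b'a(C2A0)b'
```

Keeping "strict" as the default preserves the server's current behavior, so losing data has to be requested explicitly at each call site rather than switched on globally.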

--
bhyuan <bhyuan(at)gmail(dot)com>


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: bhyuan <bhyuan(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: UTF-8 encoding problem
Date: 2007-08-16 13:54:56
Message-ID: 200708161554.56563.peter_e@gmx.net

Am Donnerstag, 16. August 2007 15:21 schrieb bhyuan:
> Maybe SQL injection-like security issues could occur,
> but I find that different versions of PostgreSQL give different results.

That just shows that some versions are more broken than others. But there was
a lot of thought put into the current behavior, so it won't be changed back
without sufficient cause.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/