Re: Invalid EUC_TW character sequence found

Lists: pgsql-bugs
From: Gene Leung <gene(at)regaltronic(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Invalid EUC_TW character sequence found
Date: 2002-06-25 09:40:42
Message-ID: 3D183A9A.5610F889@regaltronic.com

I recently installed PostgreSQL 7.2.1 on my Red Hat 6.1 server,
configured as follows:

./configure --prefix=/usr/local/pgsql --enable-multibyte=EUC_TW
--with-perl --with-python --with-tcl --enable-odbc

After the installation, I tried to restore some of my old databases
from version 7.0.2, but the restore failed because invalid characters
were found.

Then I tried to enter some Chinese characters (Big5) directly.
pgAdmin II reported the following errors:

2002-06-25 17:20:13 - SQL (AccessControl): UPDATE "site" SET "cname" =
'香 港 字' WHERE "siteid" = '001' AND "name" = 'this is HK' AND "cname"
= '香 港'

2002-06-25 17:20:13 -
*******************************************************************
2002-06-25 17:20:13 - Error
2002-06-25 17:20:13 -
*******************************************************************
2002-06-25 17:20:13 - Error in pgAdmin II:frmSQLOutput.cmdSave_Click:
-2147467259 - ERROR: Invalid EUC_TW character sequence found (0xa672)

Is there a compatibility issue between the old version and the new
one?

Best Regards
Gene Leung


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: gene(at)regaltronic(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Invalid EUC_TW character sequence found
Date: 2002-06-25 13:52:50
Message-ID: 20020625.225250.85416619.t-ishii@sra.co.jp

> -2147467259 - ERROR: Invalid EUC_TW character sequence found (0xa672)

The error message says it all. You had invalid data (maybe raw Big5
data?) in your database.

(1) If you are sure you have raw Big5 data in the old database,
convert them to EUC_TW then load them.

(2) If you have EUC_TW and Big5 mixed data, then you have a serious
problem. You probably have to fix the dump data by hand.
--
Tatsuo Ishii
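
One way to test the "raw Big5" guess is to decode the reported bytes with a Big5 codec. The following is only a sketch in Python, which is not used anywhere in this thread. The helper name decode_as_big5 is hypothetical, and the sketch assumes that Python's built-in "big5" codec is close enough to the code page the Windows client actually used.

```python
def decode_as_big5(hex_bytes):
    """Decode a hex string such as 'a672' as Big5.

    Returns the decoded text, or None if the bytes are not valid Big5.
    (Hypothetical helper, not part of PostgreSQL or pgAdmin.)
    """
    raw = bytes.fromhex(hex_bytes)
    try:
        return raw.decode("big5")
    except UnicodeDecodeError:
        return None


# 0xa672 is the sequence quoted in the error message.
print(decode_as_big5("a672"))
```

If the sequence decodes to a sensible Chinese character, that supports the idea that the client is sending Big5 bytes to a database that expects EUC_TW.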


From: Gene Leung <gene(at)regaltronic(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-bugs(at)postgresql(dot)org, gordon(at)gforce(dot)ods(dot)org
Subject: Re: Invalid EUC_TW character sequence found
Date: 2002-06-26 02:06:38
Message-ID: 3D1921AE.E61FFB71@regaltronic.com

Hi Tatsuo,

Thanks for your quick response. I actually tried both ways (1. dump and
restore, 2. creating a new database in version 7.2.1), and both failed.

The first way was to dump a database containing EUC_TW data from
version 7.0.2:

          List of databases
   Database    |  Owner   | Encoding
---------------+----------+----------
 AccessControl | postgres | EUC_TW

The old database was created with the EUC_TW encoding, and Chinese
characters work fine in version 7.0.2. However, when I followed the
upgrade instructions and restored the dump on my Red Hat 6.1 server, it
gave errors such as "Invalid EUC_TW character sequence found".

I then searched the newsgroup for "Invalid EUC_TW character sequence
found" and saw that Gordon Luk has the same problem. He is actually my
friend. Originally I thought it might be a problem with the PostgreSQL
that comes pre-installed on Red Hat 7.3, so I decided to build from the
tar file and install it on Red Hat 6.1.

The second way, to confirm that version 7.2.1 cannot accept Chinese
input, was to create a new database with the following command:

CREATE DATABASE "test" WITH ENCODING = 'EUC_TW';

Then I ran create table site (name varchar(50)); and inserted data
directly with pgAdmin II, which gave the following error:

2002-06-26 09:22:28 - SQL (test): INSERT INTO "site" ("name") VALUES
('香港字')

2002-06-26 09:22:28 -
*******************************************************************
2002-06-26 09:22:28 - Error
2002-06-26 09:22:28 -
*******************************************************************
2002-06-26 09:22:28 - Error in pgAdmin II:frmSQLOutput.cmdSave_Click:
-2147467259 - ERROR: Invalid EUC_TW character sequence found (0xa672)
2002-06-26 09:22:28 - Windows Version: Windows 2000 v5.0 build 2195 Service Pack 2
2002-06-26 09:22:28 - pgSchema Version: 1.2.0
2002-06-26 09:22:28 - MDAC Version: 2.5
2002-06-26 09:22:28 - DBMS Version: 07.02.0001 PostgreSQL 7.2.1 on
i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
2002-06-26 09:22:28 - Connection String (Master Connection):
Provider=MSDASQL.1;Extended
Properties="DRIVER={PostgreSQL};DATABASE=template1;SERVER=sql;PORT=5432;UID=harry;PWD=********;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=65536;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=1;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0"

If the coming version cannot support Chinese, it may be a big problem for
a lot of people. As database users, we do not have much knowledge about
encodings, and we have to rely on you. You have already done a lot of
good things for open source. Please keep striving for the best.

Thanks!

Best Regards
Gene Leung


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: gene(at)regaltronic(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org, gordon(at)gforce(dot)ods(dot)org
Subject: Re: Invalid EUC_TW character sequence found
Date: 2002-06-26 02:13:41
Message-ID: 20020626.111341.48530869.t-ishii@sra.co.jp

> The second way to confirm version 7.2.1 can not accept chinese input is to
> create a new database with the following command:
>
> CREATE DATABASE "test" WITH ENCODING = 'EUC_TW';
>
> then create table site (name varchar(50)); and insert data directly with
> pgAdmin II, it gives error as follows:
>
> -2147467259 - ERROR: Invalid EUC_TW character sequence found (0xa672)

0xa672 cannot be a correct EUC_TW character. Check your application.
--
Tatsuo Ishii


From: Gene Leung <gene(at)regaltronic(dot)com>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: pgsql-bugs(at)postgresql(dot)org, gordon(at)gforce(dot)ods(dot)org
Subject: Re: Invalid EUC_TW character sequence found
Date: 2002-06-26 03:30:19
Message-ID: 3D19354B.E13E40AE@regaltronic.com

Not all Chinese characters are rejected by the application, only some of
them:

2002-06-26 11:12:32 - SQL (test): INSERT INTO "site" ("name") VALUES ('美光街')
2002-06-26 11:12:47 - SQL (test): INSERT INTO "site" ("name") VALUES ('市街市')
2002-06-26 11:14:42 - SQL (test): INSERT INTO "site" ("name") VALUES ('字')

2002-06-26 11:14:42 -
*******************************************************************
2002-06-26 11:14:42 - Error
2002-06-26 11:14:42 -
*******************************************************************
2002-06-26 11:14:42 - Error in pgAdmin II:frmSQLOutput.cmdSave_Click:
-2147467259 - ERROR: Invalid EUC_TW character sequence found (0xa672)
2002-06-26 11:14:42 - Windows Version: Windows 2000 v5.0 build 2195 Service Pack 2
2002-06-26 11:14:42 - pgSchema Version: 1.2.0
2002-06-26 11:14:42 - MDAC Version: 2.5
2002-06-26 11:14:42 - DBMS Version: 07.02.0001 PostgreSQL 7.2.1 on
i686-pc-linux-gnu, compiled by GCC egcs-2.91.66
2002-06-26 11:14:42 - Connection String (Master Connection):
Provider=MSDASQL.1;Extended
Properties="DRIVER={PostgreSQL};DATABASE=template1;SERVER=sql;PORT=5432;UID=harry;PWD=********;ReadOnly=0;Protocol=6.4;FakeOidIndex=0;ShowOidColumn=0;RowVersioning=0;ShowSystemTables=0;ConnSettings=;Fetch=100;Socket=4096;UnknownSizes=0;MaxVarcharSize=254;MaxLongVarcharSize=65536;Debug=0;CommLog=0;Optimizer=1;Ksqo=1;UseDeclareFetch=0;TextAsLongVarchar=1;UnknownsAsLongVarchar=1;BoolsAsChar=1;Parse=0;CancelAsFreeStmt=0;ExtraSysTablePrefixes=dd_;LFConversion=1;UpdatableCursors=1;DisallowPremature=0;TrueIsMinus1=0"

As shown above, I inserted three rows into the table using pgAdmin II. The
first two went in without any problem, but the last one failed. I also tried
the same inputs with psql on the server side and got the same result.

To me, the third insert is a character that displays correctly in my
application, and I do not see any problem with it. I do not know how to check
that '字' is not a correct EUC_TW character. Please give me some hints for
checking. Thanks!

Best Regards
Gene Leung
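
A possible explanation for why only some characters fail: in Big5 the second byte of a character may fall in the ASCII range (below 0x80). The sketch below, in Python, scans the three inserted values for such bytes. It assumes the client sent them as raw Big5 (AC FC A5 FA B5 F3, A5 AB B5 F3 A5 AB, and A6 72, the last matching the 0xa672 in the error). The function name and the simple lead-byte test are illustrative assumptions, not anything taken from PostgreSQL.

```python
def low_trail_byte_offsets(raw):
    """Return offsets of Big5 second bytes whose 8th bit is not set.

    Assumes `raw` is Big5 text: a byte >= 0x81 starts a two-byte
    character, anything else is a single ASCII byte.
    """
    offsets = []
    i = 0
    while i < len(raw):
        if raw[i] >= 0x81 and i + 1 < len(raw):
            if raw[i + 1] < 0x80:
                offsets.append(i + 1)
            i += 2
        else:
            i += 1
    return offsets


# Assumed raw bytes of the three INSERT values from the log.
inserts = {
    "first": bytes.fromhex("acfca5fab5f3"),
    "second": bytes.fromhex("a5abb5f3a5ab"),
    "third": bytes.fromhex("a672"),
}

for name, raw in inserts.items():
    print(name, low_trail_byte_offsets(raw))
```

Under these assumptions only the third value contains a second byte below 0x80, which is consistent with the first two inserts being accepted and the third being rejected.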


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: gene(at)regaltronic(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org, gordon(at)gforce(dot)ods(dot)org
Subject: Re: Invalid EUC_TW character sequence found
Date: 2002-06-26 03:42:06
Message-ID: 20020626.124206.102120976.t-ishii@sra.co.jp

> To me, the third insert is a character that display correctly in my application,
> I do not see any problem. And I do not know and can not tell how to check that
> 'xx' is not a correct ECU_TW character. Please give me some hint for checking,
> thanks!!

Ok, here are some rules to verify EUC_TW characters:

(1) if the first byte is 0x8e, then the 8th bit of each of the
following three bytes must be set

(2) else if the first byte is 0x8f, then the 8th bit of each of the
following two bytes must be set

(3) else if the 8th bit of the first byte is set, then the 8th bit of
the following byte must be set

(4) else (that means the 8th bit of the first byte is not set) then
that must be an ASCII character.

0xa672 does not satisfy these rules: its first byte 0xa6 has the 8th
bit set, so rule (3) applies, but the 8th bit of the second byte 0x72
is not set.
--
Tatsuo Ishii
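
The four rules above can be written out as a small checker. This is a minimal sketch in Python that follows the rules exactly as stated in the message. It is not the server's actual validation code, and the function names are hypothetical.

```python
def euc_tw_sequence_length(first_byte):
    """Length in bytes of the character starting with first_byte (rules 1-4)."""
    if first_byte == 0x8E:
        return 4  # rule (1): 0x8e plus three more bytes
    if first_byte == 0x8F:
        return 3  # rule (2): 0x8f plus two more bytes
    if first_byte & 0x80:
        return 2  # rule (3): one more byte
    return 1      # rule (4): plain ASCII


def first_invalid_offset(data):
    """Return the offset of the first sequence that breaks the rules, or None."""
    i = 0
    while i < len(data):
        length = euc_tw_sequence_length(data[i])
        seq = data[i:i + length]
        # Truncated sequence, or a following byte without its 8th bit set.
        if len(seq) < length or any(not (b & 0x80) for b in seq[1:]):
            return i
        i += length
    return None


print(first_invalid_offset(bytes.fromhex("a672")))  # sequence from the error
print(first_invalid_offset(bytes.fromhex("a6f2")))  # both bytes have the 8th bit set
```

For 0xa672 the checker reports offset 0, because 0x72 does not have its 8th bit set. It returns None when every sequence in the input passes the rules.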