Migrate postgres databases from SQL_ASCII to UNICODE encoding

Lists: pgsql-adminpgsql-jdbcpgsql-sql
From: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>
To: pgsql-jdbc(at)postgresql(dot)org, pgsql-admin(at)postgresql(dot)org
Subject: Migrate postgres databases from SQL_ASCII to UNICODE encoding
Date: 2004-07-17 20:03:31
Message-ID: 40F98613.1060405@sistemat.com.ar
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-jdbc pgsql-sql


A simple question, we need to migrate many (>20) postgres databases from
SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.

With Dump/Restore , we get an error (Invalid Unicode) in any field that
has a 8 bits character coming from the SQL_ASCII , even setting the
client_encoding to WIN, ISO-8859-1, and others encodings.

I'm really worried about this problem since , we have many DBs to
migrate and even with a ODBC program we can't port tables from
Postgresql/SQL_ASCII to Postgresql/UNICODE.

Any suggestion will be *very* appreciated.

Regards,
Dario.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>
Cc: pgsql-jdbc(at)postgresql(dot)org, pgsql-admin(at)postgresql(dot)org
Subject: Re: Migrate postgres databases from SQL_ASCII to UNICODE encoding
Date: 2004-07-17 21:18:16
Message-ID: 16994.1090099096@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-jdbc pgsql-sql

"Dario V. Fassi" <software(at)sistemat(dot)com(dot)ar> writes:
> A simple question, we need to migrate many (>20) postgres databases from
> SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.

SQL_ASCII is not an encoding (it's more like the absence of knowledge
about an encoding). What is the data actually stored as?

> With Dump/Restore , we get an error (Invalid Unicode) in any field that
> has a 8 bits character coming from the SQL_ASCII , even setting the
> client_encoding to WIN, ISO-8859-1, and others encodings.

It might work to just UPDATE pg_database to set datencoding to the
correct value reflecting what you have actually stored. You might then
need to REINDEX any indexes on textual columns, but I don't think
anything else would go wrong.

If you have a mishmash of different encodings in a single database, then
of course there is no simple solution; you are in for some pain while
you try to fix the data.

regards, tom lane


From: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Oliver Jowett <oliver(at)opencloud(dot)com>, pgsql-sql(at)postgresql(dot)org, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-17 23:26:17
Message-ID: 40F9B599.6080205@sistemat.com.ar
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-jdbc pgsql-sql

Tom Lane wrote:

>"Dario V. Fassi" <software(at)sistemat(dot)com(dot)ar> writes:
>
>
>>A simple question, we need to migrate many (>20) postgres databases from
>>SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.
>>
>>
>
>SQL_ASCII is not an encoding (it's more like the absence of knowledge
>about an encoding). What is the data actually stored as?
>
>
>>With Dump/Restore , we get an error (Invalid Unicode) in any field that
>>has a 8 bits character coming from the SQL_ASCII , even setting the
>>client_encoding to WIN, ISO-8859-1, and others encodings.
>>
>>
>
>It might work to just UPDATE pg_database to set datencoding to the
>correct value reflecting what you have actually stored. You might then
>need to REINDEX any indexes on textual columns, but I don't think
>anything else would go wrong.
>
>If you have a mishmash of different encodings in a single database, then
>of course there is no simple solution; you are in for some pain while
>you try to fix the data.
>
>

Yes you are right , the original data come from a DB2 with CodePage
IBM-850 and was inserted without complains in a Postgres 7.3.6 with
SQL_ASCII.

Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx ,
isn't no one postgresql's encoding.
So when in change via pg_databases the encoding , 8 bits characters
become garbage.
More even if we accept this garbage chars and we set encoding to e.g.
ISO-8859-1 it's impossible go to a UNICODE because this garbage chars
are invalid in client's encoding , so they are reject (in translation
process as invalid unicode chars).

We are in a big problem, and the only way out I can imagine is fix the
data by hand :-! .

Dario,


From: "Andrew Lazarus" <drlaz(at)attglobal(dot)net>
To: <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-18 02:00:38
Message-ID: 075c01c46c6b$059ef320$847ba8c0@ANDYXP
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-jdbc pgsql-sql

If you pg_dump in ASCII format, could you figure out what has happened to
the 8-bit characters, if they are recognizable garbage (not truncated to
some genuine character) you then write a sed script. It by hand, but it
doesn't sound terrible.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>
Cc: Oliver Jowett <oliver(at)opencloud(dot)com>, pgsql-sql(at)postgresql(dot)org, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-18 06:29:04
Message-ID: 28179.1090132144@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-jdbc pgsql-sql

"Dario V. Fassi" <software(at)sistemat(dot)com(dot)ar> writes:
>> SQL_ASCII is not an encoding (it's more like the absence of knowledge
>> about an encoding). What is the data actually stored as?

> Yes you are right , the original data come from a DB2 with CodePage
> IBM-850 and was inserted without complains in a Postgres 7.3.6 with
> SQL_ASCII.

Ugh. You'll have to work out how to convert that codepage to one of the
encodings that PG supports. Or else add it as a supported encoding
(I'm not sure how hard that is, but it's not out of the question).

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-sql(at)postgresql(dot)org
Subject: Re: [JDBC] [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-18 08:28:06
Message-ID: 200407181028.06765.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-jdbc pgsql-sql

Dario V. Fassi wrote:
> Yes you are right , the original data come from a DB2 with CodePage
> IBM-850 and was inserted without complains in a Postgres 7.3.6 with
> SQL_ASCII.

If you have a program named "recode" installed you could try using that
to recode the dump file to a supported encoding.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/


From: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>
To: pgsql-sql(at)postgresql(dot)org
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-18 19:47:20
Message-ID: 40FAD3C8.3080503@sistemat.com.ar
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-jdbc pgsql-sql


Tom Lane wrote:

>"Dario V. Fassi" <software(at)sistemat(dot)com(dot)ar> writes:
>
>
>>>SQL_ASCII is not an encoding (it's more like the absence of knowledge
>>>about an encoding). What is the data actually stored as?
>>>
>>>
>>Yes you are right , the original data come from a DB2 with CodePage
>>IBM-850 and was inserted without complains in a Postgres 7.3.6 with
>>SQL_ASCII.
>>
>>
>
>Ugh. You'll have to work out how to convert that codepage to one of the
>encodings that PG supports. Or else add it as a supported encoding
>(I'm not sure how hard that is, but it's not out of the question).
>
> regards, tom lane
>
>
Ok, I'm interested is do that , but someone is the development group
would give me some assistence to get a start point to work with.
I program in Java, C, and less frequently in C++.

Dario.


From: Michael Stephenson <mstephenson(at)openworld(dot)org>
To: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>
Cc: pgsql-sql(at)postgresql(dot)org, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-19 10:25:49
Message-ID: 40FBA1AD.2050904@openworld.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-jdbc pgsql-sql

What we've done in the past is just run the dump file through iconv - this has
support for IBM-850.

http://www.gnu.org/software/libiconv/documentation/libiconv/iconv.1.html

Hope this helps,

Michael

Dario V. Fassi wrote:
>
> Tom Lane wrote:
>
>>"Dario V. Fassi" <software(at)sistemat(dot)com(dot)ar> writes:
>>
>>
>>>>SQL_ASCII is not an encoding (it's more like the absence of knowledge
>>>>about an encoding). What is the data actually stored as?
>>>>
>>>>
>>>Yes you are right , the original data come from a DB2 with CodePage
>>>IBM-850 and was inserted without complains in a Postgres 7.3.6 with
>>>SQL_ASCII.
>>>
>>>
>>
>>Ugh. You'll have to work out how to convert that codepage to one of the
>>encodings that PG supports. Or else add it as a supported encoding
>>(I'm not sure how hard that is, but it's not out of the question).
>>
>> regards, tom lane
>>
>>
> Ok, I'm interested is do that , but someone is the development group
> would give me some assistence to get a start point to work with.
> I program in Java, C, and less frequently in C++.
>
> Dario.
>
>

--
Web Applications Developer
Open World Ltd, 11 Riverside Court, Riverside Road, Bath, BA2 3DZ.
Tel: +44 1225 444950 Fax: +44 1225 336738 http://www.openworld.co.uk/

CONFIDENTIALITY NOTICE
The information contained in this message is confidential, intended only for
the use of the individual or the entity named as recipient. If the reader of
this message is not that recipient, you are notified that any dissemination,
distribution or copy of this message is strictly prohibited. If you have
received this message in error, please immediately notify us by telephone on
the number above. Your co-operation is appreciated.


From: Xavier Poinsard <xpoinsard(at)openpricer(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: [ADMIN] Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-19 12:46:18
Message-ID: cdgfqq$vmn$1@sea.gmane.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin pgsql-jdbc pgsql-sql

Dario V. Fassi a écrit :
>
> Tom Lane wrote:
>
>>"Dario V. Fassi" <software(at)sistemat(dot)com(dot)ar> writes:
>>
>>
>>>>SQL_ASCII is not an encoding (it's more like the absence of knowledge
>>>>about an encoding). What is the data actually stored as?
>>>>
>>>>
>>>Yes you are right , the original data come from a DB2 with CodePage
>>>IBM-850 and was inserted without complains in a Postgres 7.3.6 with
>>>SQL_ASCII.
>>>
>>>
>>
>>Ugh. You'll have to work out how to convert that codepage to one of the
>>encodings that PG supports. Or else add it as a supported encoding
>>(I'm not sure how hard that is, but it's not out of the question).
>>
did you try the recode utility ?