Re: ADMIN Migrate postgres databases from SQL_ASCII to UNICODE

Lists: pgsql-admin
From: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-17 23:34:14
Message-ID: 40F9B776.9050601@sistemat.com.ar
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Tom Lane wrote:

>"Dario V. Fassi" <software(at)sistemat(dot)com(dot)ar> writes:
>
>
>>A simple question, we need to migrate many (>20) postgres databases from
>>SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.
>>
>>
>
>SQL_ASCII is not an encoding (it's more like the absence of knowledge
>about an encoding). What is the data actually stored as?
>
>
>>With Dump/Restore , we get an error (Invalid Unicode) in any field that
>>has a 8 bits character coming from the SQL_ASCII , even setting the
>>client_encoding to WIN, ISO-8859-1, and others encodings.
>>
>>
>
>It might work to just UPDATE pg_database to set datencoding to the
>correct value reflecting what you have actually stored. You might then
>need to REINDEX any indexes on textual columns, but I don't think
>anything else would go wrong.
>
>If you have a mishmash of different encodings in a single database, then
>of course there is no simple solution; you are in for some pain while
>you try to fix the data.
>
>

Yes you are right , the original data come from a DB2 with CodePage
IBM-850 and was inserted without complains in a Postgres 7.3.6 with
SQL_ASCII.

Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx ,
isn't no one postgresql's encoding.
So when in change via pg_databases the encoding , 8 bits characters
become garbage.
More even if we accept this garbage chars and we set encoding to e.g.
ISO-8859-1 it's impossible go to a UNICODE because this garbage chars
are invalid in client's encoding , so they are reject (in translation
process as invalid unicode chars).

We are in a big problem, and the only way out I can imagine is fix the
data by hand :-! .

Dario,


From: Markus Bertheau <twanger(at)bluetwanger(dot)de>
To: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-18 10:30:52
Message-ID: 1090146652.2741.1.camel@teetnang
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

В Вск, 18.07.2004, в 01:34, Dario V. Fassi пишет:
> Tom Lane wrote:
> > "Dario V. Fassi" <software(at)sistemat(dot)com(dot)ar> writes:
> >
> > > A simple question, we need to migrate many (>20) postgres databases from
> > > SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.
> > >
> > SQL_ASCII is not an encoding (it's more like the absence of knowledge
> > about an encoding). What is the data actually stored as?
> >
> > > With Dump/Restore , we get an error (Invalid Unicode) in any field that
> > > has a 8 bits character coming from the SQL_ASCII , even setting the
> > > client_encoding to WIN, ISO-8859-1, and others encodings.
> > >
> > It might work to just UPDATE pg_database to set datencoding to the
> > correct value reflecting what you have actually stored. You might then
> > need to REINDEX any indexes on textual columns, but I don't think
> > anything else would go wrong.
> >
> > If you have a mishmash of different encodings in a single database, then
> > of course there is no simple solution; you are in for some pain while
> > you try to fix the data.
> >
>
> Yes you are right , the original data come from a DB2 with CodePage
> IBM-850 and was inserted without complains in a Postgres 7.3.6 with
> SQL_ASCII.
>
> Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx ,
> isn't no one postgresql's encoding.
> So when in change via pg_databases the encoding , 8 bits characters
> become garbage.
> More even if we accept this garbage chars and we set encoding to e.g.
> ISO-8859-1 it's impossible go to a UNICODE because this garbage
> chars are invalid in client's encoding , so they are reject (in
> translation process as invalid unicode chars).

You can dump the data, convert it with for example iconv and then load
the data again.

--
Markus Bertheau <twanger(at)bluetwanger(dot)de>


From: "Patrice Trognon" <trognon(dot)patrice(at)wanadoo(dot)fr>
To: "Markus Bertheau" <twanger(at)bluetwanger(dot)de>
Cc: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>, pgsql-admin(at)postgresql(dot)org
Subject: Re: ADMIN Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-18 11:27:52
Message-ID: 20040718132752.6475772@devptro.(none)
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin


Hi,

If you have blobs you must split your SQL_ASCII to convert it
with iconv.
I export my full database.
I create a new SQL_ASCII database with only tables that don't
contains blobs. (db_without_blobs)
I create a new SQL_ASCII database with only tables that contains
blobs. (db_with_blobs)
Export your db_without_blobs.
convert it with iconv
Create your UNICODE database.
import db_without_blobs.
import db_with_blobs.

It's done.

1) export your database.
/usr/local/postgres/bin/pg_dump -Fc -v -b -d -U YOUR_LOGIN YOUR_DATABASE > full_database.dump

2) create work database, without blobs.
/usr/local/postgres/bin/createdb db_without_blobs

3) import all datas
/usr/local/postgres/bin/pg_restore -d db_without_blobs -Fc -v -U YOUR_LOGIN full_database.dump

4) create work database, with blobs.
/usr/local/postgres/bin/createdb db_with_blobs

5) import all datas.
/usr/local/postgres/bin/pg_restore -d db_with_blobs -Fc -v -U YOUR_LOGIN full_database.dump

6) Connect to db_without_blobs and drop all tables with blobs.
6') Connect to db_with_blobs and drop all tables without blobs.

7) export datas in text mode :
/usr/local/postgres/bin/pg_dump -Fp -v -i -D -U YOUR_LOGIN db_without_blobs > db_without_blobs.dump

8) convert into UTF-8
iconv --from-code=ISO-8859-1 --to-code=UTF-8 -o db_without_blobs-utf.dump db_without_blobs.dump

9) Create unicode database.
/usr/local/postgres/bin/createdb  -E UNICODE db_unicode

10) importat UTF-8 datas
/usr/local/postgres/bin/psql db_unicode < db_without_blobs.dump

11) export datas with blobs in binary format.
12) import this dump info db_unicode .

Best Regards,

Patrice Trognon.

> В Вск, 18.07.2004, в 01:34, Dario V. Fassi пишет:
> > Tom Lane wrote:
> > > "Dario V. Fassi" <software(at)sistemat(dot)com(dot)ar> writes:
> > >
> > > > A simple question, we need to migrate many (>20) postgres databases from
> > > > SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.
> > > >
> > > SQL_ASCII is not an encoding (it's more like the absence of knowledge
> > > about an encoding). What is the data actually stored as?
> > >
> > > > With Dump/Restore , we get an error (Invalid Unicode) in any field that
> > > > has a 8 bits character coming from the SQL_ASCII , even setting the
> > > > client_encoding to WIN, ISO-8859-1, and others encodings.
> > > >
> > > It might work to just UPDATE pg_database to set datencoding to the
> > > correct value reflecting what you have actually stored. You might then
> > > need to REINDEX any indexes on textual columns, but I don't think
> > > anything else would go wrong.
> > >
> > > If you have a mishmash of different encodings in a single database, then
> > > of course there is no simple solution; you are in for some pain while
> > > you try to fix the data.
> > >
> >
> > Yes you are right , the original data come from a DB2 with CodePage
> > IBM-850 and was inserted without complains in a Postgres 7.3.6 with
> > SQL_ASCII.
> >
> > Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx ,
> > isn't no one postgresql's encoding.
> > So when in change via pg_databases the encoding , 8 bits characters
> > become garbage.
> > More even if we accept this garbage chars and we set encoding to e.g.
> > ISO-8859-1 it's impossible go to a UNICODE because this garbage
> > chars are invalid in client's encoding , so they are reject (in
> > translation process as invalid unicode chars).
>
> You can dump the data, convert it with for example iconv and then load
> the data again.
>
> --
> Markus Bertheau <twanger(at)bluetwanger(dot)de>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>


From: "Dario V(dot) Fassi" <software(at)sistemat(dot)com(dot)ar>
To: Patrice Trognon <trognon(dot)patrice(at)wanadoo(dot)fr>
Cc: Markus Bertheau <twanger(at)bluetwanger(dot)de>, pgsql-admin(at)postgresql(dot)org
Subject: Re: ADMIN Migrate postgres databases from SQL_ASCII to UNICODE
Date: 2004-07-18 20:01:08
Message-ID: 40FAD704.8@sistemat.com.ar
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Thanks very much to all.
Dario.

Patrice Trognon wrote:

>Hi,
>
>If you have blobs you must split your SQL_ASCII to convert it
>with iconv.
>I export my full database.
>I create a new SQL_ASCII database with only tables that don't
>contains blobs. (db_without_blobs)
>I create a new SQL_ASCII database with only tables that contains
>blobs. (db_with_blobs)
>Export your db_without_blobs.
>convert it with iconv
>Create your UNICODE database.
>import db_without_blobs.
>import db_with_blobs.
>
>It's done.
>
>1) export your database.
>/usr/local/postgres/bin/pg_dump -Fc -v -b -d -U YOUR_LOGIN YOUR_DATABASE > full_database.dump
>
>2) create work database, without blobs.
>/usr/local/postgres/bin/createdb db_without_blobs
>
>3) import all datas
>/usr/local/postgres/bin/pg_restore -d db_without_blobs -Fc -v -U YOUR_LOGIN full_database.dump
>
>4) create work database, with blobs.
>/usr/local/postgres/bin/createdb db_with_blobs
>
>5) import all datas.
>/usr/local/postgres/bin/pg_restore -d db_with_blobs -Fc -v -U YOUR_LOGIN full_database.dump
>
>6) Connect to db_without_blobs and drop all tables with blobs.
>6') Connect to db_with_blobs and drop all tables without blobs.
>
>7) export datas in text mode :
>/usr/local/postgres/bin/pg_dump -Fp -v -i -D -U YOUR_LOGIN db_without_blobs > db_without_blobs.dump
>
>8) convert into UTF-8
>iconv --from-code=ISO-8859-1 --to-code=UTF-8 -o db_without_blobs-utf.dump db_without_blobs.dump
>
>9) Create unicode database.
>/usr/local/postgres/bin/createdb -E UNICODE db_unicode
>
>10) importat UTF-8 datas
>/usr/local/postgres/bin/psql db_unicode < db_without_blobs.dump
>
>11) export datas with blobs in binary format.
>12) import this dump info db_unicode .
>
>Best Regards,
>
>Patrice Trognon.
>
>
>
>
>
>
>
>
>>Ð’ Ð’Ñк, 18.07.2004, в 01:34, Dario V. Fassi пишет:
>>
>>
>>>Tom Lane wrote:
>>>
>>>
>>>>"Dario V. Fassi" <software(at)sistemat(dot)com(dot)ar> writes:
>>>>
>>>>
>>>>
>>>>>A simple question, we need to migrate many (>20) postgres databases from
>>>>>SQL_ASCII encoding to UNICODE encoding, over a 7.3.6 server.
>>>>>
>>>>>
>>>>>
>>>>SQL_ASCII is not an encoding (it's more like the absence of knowledge
>>>>about an encoding). What is the data actually stored as?
>>>>
>>>>
>>>>
>>>>>With Dump/Restore , we get an error (Invalid Unicode) in any field that
>>>>>has a 8 bits character coming from the SQL_ASCII , even setting the
>>>>>client_encoding to WIN, ISO-8859-1, and others encodings.
>>>>>
>>>>>
>>>>>
>>>>It might work to just UPDATE pg_database to set datencoding to the
>>>>correct value reflecting what you have actually stored. You might then
>>>>need to REINDEX any indexes on textual columns, but I don't think
>>>>anything else would go wrong.
>>>>
>>>>If you have a mishmash of different encodings in a single database, then
>>>>of course there is no simple solution; you are in for some pain while
>>>>you try to fix the data.
>>>>
>>>>
>>>>
>>>Yes you are right , the original data come from a DB2 with CodePage
>>>IBM-850 and was inserted without complains in a Postgres 7.3.6 with
>>>SQL_ASCII.
>>>
>>>Now we are in a Jail , because IBM-850 , isn't WIN, isn't ISO-xx ,
>>>isn't no one postgresql's encoding.
>>>So when in change via pg_databases the encoding , 8 bits characters
>>>become garbage.
>>>More even if we accept this garbage chars and we set encoding to e.g.
>>>ISO-8859-1 it's impossible go to a UNICODE because this garbage
>>>chars are invalid in client's encoding , so they are reject (in
>>>translation process as invalid unicode chars).
>>>
>>>
>>You can dump the data, convert it with for example iconv and then load
>>the data again.
>>
>>--
>>Markus Bertheau <twanger(at)bluetwanger(dot)de>
>>
>>