Can I change the character encoding for a DB?

Lists: pgsql-admin
From: "Iain" <iain(at)mst(dot)co(dot)jp>
To: <pgsql-admin(at)postgresql(dot)org>
Subject: Can I change the character encoding for a DB?
Date: 2004-01-26 08:28:49
Message-ID: 000901c3e3e6$6cd00060$7201a8c0@mst1x5r347kymb
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Hi All,

I created a DB with SQL_ASCII encoding but I want to change it to EUC_JP.
Can I do this without creating a new database? I didn't find a way in the
docs, sorry.

set server_encoding = EUC_JP;

didn't work.

My understanding is that the encoding is purely information and doesn't
actually affect the data stored in the DB. Is that right?

The reason I want to do this is that my java app is corrupting Japanese
Kanji. I suspect that the JDBC driver uses this information. VB apps using
ODBS seem to be fine.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Iain" <iain(at)mst(dot)co(dot)jp>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Can I change the character encoding for a DB?
Date: 2004-01-26 17:05:55
Message-ID: 8421.1075136755@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

"Iain" <iain(at)mst(dot)co(dot)jp> writes:
> I created a DB with SQL_ASCII encoding but I want to change it to EUC_JP.
> Can I do this without creating a new database?

If you're certain that the data already in the database is in EUC_JP
encoding, it would probably work to do a quick UPDATE on the
pg_database.encoding entry for the database (and then start fresh
backend sessions, since they won't notice otherwise). If there's data in
there that is not correct per EUC_JP then this will create big problems,
so don't say I didn't warn you ...

regards, tom lane


From: "Iain" <iain(at)mst(dot)co(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Can I change the character encoding for a DB?
Date: 2004-01-27 02:37:41
Message-ID: 000c01c3e47e$8928f1c0$7201a8c0@mst1x5r347kymb
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Hi Tom,

Thanks for the info and warning. Fortunately this is just a development
database so any problems won't be too big a deal.

Does dumping and re-creating the DB avoid such a potential problem?

I read the info written by Tasuo, but I have to admit that the whole
encoding thing is still full of mystery for me. For example, if the database
has a SQL_ASCII encoding and you enter japanese text, what encoding is used?
Does it depend on the client used to enter it? I wonder if it is possible
that there is some SJIS data in my DB... My understanding is that the DB
doesn't ensure that data makes sense in the set encoding, it just allows the
db to make conversions when dealing with clients such as the ODBC and JDBC
drivers.

Anyway, in this case the base data came from a EUC_JP encoded DB, but a VB
app (and some psql scripts) has been used to enter and modify data. It uses
the ODBC driver provided for postgres, everything had been working smoothly
and I only noticed a problem when I tried getting data using java and the
jdbc driver. I created a new mini test DB with the EUC_JP encoding and the
data was returned correctly.

regards,
Iain

----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Iain" <iain(at)mst(dot)co(dot)jp>
Cc: <pgsql-admin(at)postgresql(dot)org>
Sent: Tuesday, January 27, 2004 2:05 AM
Subject: Re: [ADMIN] Can I change the character encoding for a DB?

> "Iain" <iain(at)mst(dot)co(dot)jp> writes:
> > I created a DB with SQL_ASCII encoding but I want to change it to
EUC_JP.
> > Can I do this without creating a new database?
>
> If you're certain that the data already in the database is in EUC_JP
> encoding, it would probably work to do a quick UPDATE on the
> pg_database.encoding entry for the database (and then start fresh
> backend sessions, since they won't notice otherwise). If there's data in
> there that is not correct per EUC_JP then this will create big problems,
> so don't say I didn't warn you ...
>
> regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Iain" <iain(at)mst(dot)co(dot)jp>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Can I change the character encoding for a DB?
Date: 2004-01-27 03:45:09
Message-ID: 9716.1075175109@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

"Iain" <iain(at)mst(dot)co(dot)jp> writes:
> I read the info written by Tasuo, but I have to admit that the whole
> encoding thing is still full of mystery for me. For example, if the database
> has a SQL_ASCII encoding and you enter japanese text, what encoding is used?
> Does it depend on the client used to enter it?

I'm not an expert on this stuff, but my understanding is that setting
the encoding to SQL_ASCII disables all encoding-conversion logic ---
whatever bytes the client sends are what get stored. So if you want
to change the setting from SQL_ASCII to something else, you'd better be
sure that everything every client has stored is compatible with the new
encoding specification.

> My understanding is that the DB
> doesn't ensure that data makes sense in the set encoding,

When you select a real encoding spec (not SQL_ASCII) then the DB does
check that entered strings are valid in that encoding. Also it will
attempt to do encoding conversion, if clients specify they are using
a client_encoding that is different from the database encoding (and
is not SQL_ASCII).

Hope that helps --- if not, try Tatsuo ...

regards, tom lane


From: "Iain" <iain(at)mst(dot)co(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Can I change the character encoding for a DB?
Date: 2004-01-27 04:32:51
Message-ID: 001301c3e48e$a00bc240$7201a8c0@mst1x5r347kymb
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Hi Tom,

Thanks for that. It's somewhat clearer now.

I went ahead and rebuilt the DB with no problems so I guess that whatever
data was in the DB converted OK.

Regards
Iain

> "Iain" <iain(at)mst(dot)co(dot)jp> writes:
> > I read the info written by Tasuo, but I have to admit that the whole
> > encoding thing is still full of mystery for me. For example, if the
database
> > has a SQL_ASCII encoding and you enter japanese text, what encoding is
used?
> > Does it depend on the client used to enter it?
>
> I'm not an expert on this stuff, but my understanding is that setting
> the encoding to SQL_ASCII disables all encoding-conversion logic ---
> whatever bytes the client sends are what get stored. So if you want
> to change the setting from SQL_ASCII to something else, you'd better be
> sure that everything every client has stored is compatible with the new
> encoding specification.
>
> > My understanding is that the DB
> > doesn't ensure that data makes sense in the set encoding,
>
> When you select a real encoding spec (not SQL_ASCII) then the DB does
> check that entered strings are valid in that encoding. Also it will
> attempt to do encoding conversion, if clients specify they are using
> a client_encoding that is different from the database encoding (and
> is not SQL_ASCII).
>
> Hope that helps --- if not, try Tatsuo ...
>
> regards, tom lane