Re: JDBC Latin1 problem

Lists: pgsql-jdbc
From: "J(dot) Michael Crawford" <jmichael(at)gwi(dot)net>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: JDBC Latin1 problem
Date: 2004-08-04 18:34:39
Message-ID: 6.1.1.1.2.20040804140140.0337bf10@mail.nuomo.com

We have a PostgreSQL 7.3.4 database running on Red Hat, defined as Latin1
when created, with the client connection set to Latin1. The data looks
fine when queried. When our server application runs on Windows (JVM
build 1.4.2_04-b05) with the pg73jdbc3.jar JDBC driver, we get the right
information from the rs.getString() method, and it looks fine in web
pages encoded as Latin1.

However, when we use the exact same classes and driver on the Red Hat
Linux box (JVM build 1.4.2_01-b06), we get garbage wherever there's an
accented Latin1 character. Same database, same classes, same driver.

The Linux JVM has a default character encoding of UTF8, while the
Windows JVM has a default character encoding of Cp1252.
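To confirm which default each JVM is actually using, a small check like the one below could be run on both machines. It is only a sketch: the class and method names are hypothetical, and it sticks to calls that exist in JDK 1.4 (OutputStreamWriter.getEncoding() and the file.encoding system property), which report historical names such as "UTF8" and "Cp1252".

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;

public class DefaultEncodingCheck {
    // Encoding the JVM uses when no charset is named, reported with its
    // historical name (for example "UTF8" or "Cp1252").
    static String defaultWriterEncoding() {
        OutputStreamWriter writer = new OutputStreamWriter(new ByteArrayOutputStream());
        return writer.getEncoding();
    }

    public static void main(String[] args) {
        // On these JDKs, file.encoding is the property the default is derived from.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("writer default = " + defaultWriterEncoding());
    }
}
```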

Here's what we've tried:

?charSet=

We've tried every combination of adding ?charSet= to the URL, such as
?charSet=LATIN1, ?charSet=ISO_8859_1, ?charSet=UTF_8, ?charSet=UNICODE, and
even appending the following to the URL: "?charset=" +
sun.io.ByteToCharConverter.getDefault().getCharacterEncoding(); We've
tried both dashes and underscores. We've also tried ?encoding=.

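// Attempted workaround: re-encode the String, then decode the bytes again
data = resultSet.getString("page_title_es");
byte[] text = data.getBytes("utf-8");        // String -> UTF-8 bytes
String data1 = new String(text, "LATIN1");   // bytes -> String, read as Latin1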

We've tried the above going from utf-8 to Latin1, utf-8 to utf-8, utf-8
to iso-8859-1, iso-8859-1 to utf-8, iso-8859-1 to Latin1, iso-8859-1 to
iso-8859-1, Latin1 to Latin1, Latin1 to utf-8, and Latin1 to iso-8859-1.
We may have tried others. Nothing has helped. We'll try all nine at a
time, and every one comes up with some garbled variation of
"Introducción" instead of the correct word (hopefully this message shows
"Introducción" with an accent over the o, and not gibberish).
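Re-encoding after the fact can only help in one direction. The sketch below (hypothetical class and helper names, not from this thread) shows the two ways "ó" gets mangled. UTF-8 bytes decoded as ISO-8859-1 turn into two wrong characters but lose nothing, so the damage can be reversed. ISO-8859-1 bytes decoded as UTF-8 hit an invalid byte, which the decoder replaces with U+FFFD, and no later getBytes()/new String() combination can bring the original back.

```java
import java.io.UnsupportedEncodingException;

public class MojibakeDemo {
    // Encode s with one charset, then decode the bytes with another,
    // mimicking a mismatch between whoever wrote the bytes and whoever reads them.
    static String misdecode(String s, String writtenAs, String readAs) {
        try {
            return new String(s.getBytes(writtenAs), readAs);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        String original = "Introducci\u00f3n"; // the word with an accented o

        // Case 1: UTF-8 bytes read as ISO-8859-1. The accented o (bytes C3 B3)
        // becomes two characters, but every byte survives, so it can be undone.
        String garbled = misdecode(original, "ISO-8859-1".equals("") ? "" : "UTF-8", "ISO-8859-1");
        String repaired = misdecode(garbled, "ISO-8859-1", "UTF-8");
        System.out.println("case 1 repairable: " + repaired.equals(original)); // prints true

        // Case 2: ISO-8859-1 bytes read as UTF-8. Byte F3 is not valid UTF-8 here,
        // so the decoder substitutes U+FFFD and the original character is gone.
        String lost = misdecode(original, "ISO-8859-1", "UTF-8");
        String retried = misdecode(lost, "UTF-8", "ISO-8859-1");
        System.out.println("case 2 has U+FFFD: " + (lost.indexOf('\ufffd') >= 0)); // prints true
        System.out.println("case 2 repairable: " + retried.equals(original));     // prints false
    }
}
```

If the Linux output shows two odd characters where the "ó" should be, the first case probably applies; if it shows a "?" or a replacement box, the second case probably does, and the fix would have to happen where the bytes are first decoded rather than afterwards.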

Any ideas? There's probably something very simple that will look
obvious in hindsight. I'm sure that others have retrieved Latin1
characters from a Latin1 database via JDBC on a Linux JVM with a default
character set of UTF-8. I'm just not sure we'll be able to figure this one
out on our own.

- Mike


From: smota <samuelmota(at)gmail(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC Latin1 problem
Date: 2004-08-04 19:01:10
Message-ID: a8bb739d040804120175de6c27@mail.gmail.com

How are you outputting this data?

Velocity, JSP, and other "template processors" allow you to specify
the character set; make sure it's set to a compatible one (usually
ISO-8859-1).

You can set the LC_ALL environment variable to pt_BR.ISO8859-1 (of
course, substitute your own locale code for pt_BR) and then start your
application server (AS).
For me this was the solution; setting all the character set processing
options in WebWork and Velocity was not enough.

Regards,

Samuel


From: "J(dot) Michael Crawford" <jmichael(at)gwi(dot)net>
To: smota <samuelmota(at)gmail(dot)com>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: JDBC Latin1 problem
Date: 2004-08-04 20:18:36
Message-ID: 6.1.1.1.2.20040804161557.0327cec0@pop.suscom-maine.net

<<How are you outputting this data? Velocity, JSP and other "template
processors" allow you to specify the character set>>

We're currently outputting the data from our own Java application, which
will be turned into a servlet in a few weeks. We skipped JSP and Velocity
because we had a very unusual data schema with very specific needs (it
would have taken more time to do it the "easy" way in this case). Thus,
I'm not sure which environment variable would apply here.
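For a hand-written application, one option that does not depend on LC_ALL or file.encoding is to name the charset wherever text is turned into bytes. The sketch below (hypothetical class and method names) encodes through an OutputStreamWriter with an explicit charset; in a servlet, the corresponding step would be calling response.setContentType("text/html; charset=ISO-8859-1") before getWriter().

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;

public class ExplicitCharsetOutput {
    // Turn text into bytes with a named charset rather than the JVM default,
    // so the result is the same on the Windows box and the Linux box.
    static byte[] render(String text, String charsetName) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintWriter out = new PrintWriter(new OutputStreamWriter(buffer, charsetName));
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        String title = "Introducci\u00f3n"; // 12 characters
        // ISO-8859-1 stores each of these characters in a single byte.
        System.out.println(render(title, "ISO-8859-1").length); // prints 12
        // UTF-8 needs two bytes for the accented o.
        System.out.println(render(title, "UTF-8").length);      // prints 13
    }
}
```

The byte counts differ because ISO-8859-1 stores "ó" as one byte while UTF-8 needs two; a page declared as Latin1 has to receive the first form.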

Thanks for the tip, though.

BTW, all the gibberish comes directly from the database and displays as
such with System.out.println().
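Because System.out.println() itself encodes with the JVM default charset, garbled console output does not by itself show whether the String returned by getString() is wrong or only the printing is. One way to tell, sketched here with a hypothetical helper, is to print non-ASCII characters as code points instead of glyphs:

```java
public class CodePointDump {
    // Replace each non-ASCII char with [U+XXXX] so the String's contents
    // can be inspected without trusting the console's encoding.
    static String describe(String s) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                sb.append(c);
            } else {
                String hex = Integer.toHexString(c).toUpperCase();
                while (hex.length() < 4) {
                    hex = "0" + hex; // pad to four hex digits
                }
                sb.append("[U+").append(hex).append(']');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe("Introducci\u00f3n"));       // prints Introducci[U+00F3]n
        System.out.println(describe("Introducci\u00c3\u00b3n")); // prints Introducci[U+00C3][U+00B3]n
    }
}
```

For example, printing CodePointDump.describe(resultSet.getString("page_title_es")) should show [U+00F3] if the driver decoded the value correctly; [U+00C3][U+00B3] would point to UTF-8 bytes read as Latin1, and [U+FFFD] to Latin1 bytes read as UTF-8.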

- Mike
