Strange encoding problems

From: Shilad Sen <shilad(at)sourcelight(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Strange encoding problems
Date: 2003-09-10 05:27:22
Message-ID: 20030910052721.GA5724@nokomis.shilad.com
Lists: pgsql-jdbc

Greetings,

I'm trying to track down an encoding problem I'm having.

I've created a database encoded in latin1 (createdb -E latin1). I've imported
some data. I believe this all worked, because when I use psql, the encoded
characters "look" correct.

When a query returns a non-ASCII character (in this case Latin1 233, an e with
an acute accent), what comes back looks to me like the UTF-8 encoding of the
character, with each byte treated as a separate Latin1 character.

For example, under PostgreSQL 7.2, when I set the charSet property to latin1,
I get back the Latin1 characters 195 and 169 where 233 should be. Note that
these two values are the bytes of the UTF-8 encoding of character 233.
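
The symptom can be reproduced outside the driver. The following is a minimal
sketch, assuming the text is encoded to UTF-8 at some point and those bytes
are then decoded as Latin1; where that actually happens (import, server, or
driver) is not established here. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class Mojibake {
    // Encode the text as UTF-8, then (incorrectly) decode those bytes as
    // Latin1. This is the assumed failure mode, not a confirmed diagnosis.
    static String misdecodeUtf8AsLatin1(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // e with acute accent, code point 233
        String garbled = misdecodeUtf8AsLatin1(original);
        for (char c : garbled.toCharArray()) {
            System.out.println((int) c); // prints 195, then 169
        }
    }
}
```

If the characters coming back from a query match this output, the data has
most likely been through one UTF-8 encode that was never reversed.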

Under PostgreSQL 7.3, I get back the same two characters, 195 and 169, this
time as UTF-8 characters. This probably has to do with the fact that the 7.3
server is multi-byte.

I'm using the 7.3.4 jdbc drivers.

I've poked through the source, and an even stranger thing is that under 7.2
the wire encoding for the character is 2 bytes long, while under 7.3 it is
4 bytes. Again, in both cases, the decoded value is two characters that, when
treated as bytes, form the UTF-8 encoding of 233.
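
The 2-byte and 4-byte lengths are at least consistent with one versus two
rounds of UTF-8 encoding. The sketch below only illustrates that arithmetic;
it assumes nothing about what the 7.2 or 7.3 protocol code actually does, and
the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class WireLength {
    // UTF-8 encode once: the normal encoding of the text.
    static byte[] encodeOnce(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // UTF-8 encode, misread the bytes as Latin1 characters, then UTF-8
    // encode again. This is the assumed double-encoding scenario.
    static byte[] encodeTwice(String s) {
        String misread = new String(encodeOnce(s), StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e9"; // code point 233
        System.out.println(encodeOnce(original).length);  // prints 2
        System.out.println(encodeTwice(original).length); // prints 4

        // Decoding the 4 bytes as UTF-8 yields two characters, 195 and 169.
        String decoded = new String(encodeTwice(original), StandardCharsets.UTF_8);
        System.out.println(decoded.length()); // prints 2
    }
}
```

Comparing a dump of the actual wire bytes against these two byte sequences
would show whether the stored data or a later conversion step is at fault.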

Wow - this is kind of confusing to explain.

It's possible that I totally missed the boat on some configuration step.
Anybody have any insights?

Shilad
