Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: Barry Lind <blind(at)xythos(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date: 2003-02-05 13:20:51
Message-ID: Pine.LNX.4.44.0302051116200.6193-100000@matrix.gatewaynet.com
Lists: pgsql-jdbc

On Wed, 5 Feb 2003, Achilleus Mantzios wrote:

> On Tue, 4 Feb 2003, Barry Lind wrote:
>
> > Achilleus,
> >
> > What is the character set of your database? My guess is that it is
> > SQL_ASCII, which is a 7-bit character set. If you are storing ISO-8859-7
> > data, you should have that as your database character set. All reports
>
> Yes, it is SQL_ASCII. (BTW, 8-bit chars are stored just fine.)
> If you read the code, you will see that the 7.3 driver forces a UTF-8
> client encoding whenever it talks to a 7.3 (or later) server.
>
> In AbstractJdbc1Connection.java I read:
>
> //We also set the client encoding so that the driver only needs
> //to deal with utf8. We can only do this in 7.3 because multibyte
> //support is now always included
>
> So what happens is that the database converts
> SQL_ASCII -> UTF-8 (the client encoding),
> and then the driver converts UTF-8 -> Java Unicode (line 164 in
> Encoding.java).
>
> So if you store the bytes 0xA0 0x0A in the database,
> you have a test case!
> (The Encoding.decodeUTF8 method throws the exception I reported.)
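A minimal sketch of why these two bytes trip up a UTF-8 decoder, using only the JDK's java.nio.charset API rather than the driver's own Encoding.decodeUTF8 (which is not reproduced here). 0xA0 is a UTF-8 continuation byte (binary 10xxxxxx) with no lead byte before it, so a strict decoder reports it as malformed. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class InvalidUtf8Demo {
    // Strict check: the decoder is told to REPORT (throw on) malformed input
    // instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0xA0 is 10100000 in binary: a UTF-8 continuation byte. It is only legal
        // after a lead byte, so as the first byte of a value it is malformed.
        byte[] fromDatabase = {(byte) 0xA0, (byte) 0x0A};
        System.out.println(isValidUtf8(fromDatabase)); // false

        // If 0xA0 is read as ISO-8859-1 (or ISO-8859-7), it is U+00A0 NO-BREAK SPACE;
        // this is that character followed by a newline, properly encoded as UTF-8.
        byte[] properUtf8 = {(byte) 0xC2, (byte) 0xA0, (byte) 0x0A};
        System.out.println(isValidUtf8(properUtf8)); // true
    }
}
```

The JDK decoder signals the problem with a CharacterCodingException; the 7.3.1 driver's hand-written decoder instead ended in the ArrayIndexOutOfBoundsException from the original report.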
>
> Don't be misled by my saying that I had 8-bit (Greek) chars
> in 7.2.3. (The exception problem was on pure ASCII data; the users rarely
> enter Greek data anyway.)
>
> Now the real problems are:
> a) Greek chars: mainly my fault, but still a backwards-compatibility problem.
> In 7.2.3 the server returned SQL_ASCII chars, these were interpreted
> as Greek UTF-8 chars and returned as valid Greek Java Unicode strings,
> and everybody was happy.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Excuse me, I was wrong.
What actually happened is that I inserted 8-bit ASCII chars from Java
(not Greek UTF-8), and the data were stored as SQL_ASCII.
Then my JSP simply read those ASCII chars back; because my
servlet container's encoding was ISO-8859-1, no conversion was done,
and because my page's charset was set to ISO-8859-7,
the browser displayed the Greek chars correctly.
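A minimal sketch of that round trip, assuming the JSP path behaves like plain java.lang.String conversions (the class name, method name and sample word are hypothetical). ISO-8859-1 maps every byte 0x00-0xFF to exactly one char and back, so the Greek bytes pass through the container untouched, and only the browser's ISO-8859-7 charset turns them into Greek.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1PassThroughDemo {
    static final Charset GREEK = Charset.forName("ISO-8859-7");

    // What an ISO-8859-1 servlet container does to bytes it reads and writes back:
    // decode to a String and re-encode. ISO-8859-1 maps bytes 0x00-0xFF one-to-one
    // onto chars U+0000-U+00FF, so no byte is lost or altered.
    static byte[] passThroughLatin1(byte[] raw) {
        String latin1View = new String(raw, StandardCharsets.ISO_8859_1);
        return latin1View.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The Greek word "Ellada" (Greece), written as escapes so the source
        // file encoding does not matter.
        String greek = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1";

        // The 8-bit bytes as stored in the SQL_ASCII database.
        byte[] stored = greek.getBytes(GREEK);

        // The container passes them through unchanged...
        byte[] sentToBrowser = passThroughLatin1(stored);
        System.out.println(Arrays.equals(stored, sentToBrowser)); // true

        // ...and a page declared as charset=ISO-8859-7 shows them as Greek.
        System.out.println(new String(sentToBrowser, GREEK).equals(greek)); // true
    }
}
```

The Java strings in the middle are "wrong" (Latin-1 letters, not Greek), but because nothing in the chain re-encodes them with a different charset, the browser still receives the original bytes.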

>
> Now in 7.3.1 the server tries to convert SQL_ASCII to UTF-8, hence
> the problem.
>
> b) NOT GREEK RELATED!
> With the database encoding set to SQL_ASCII, the server converts these weird
> 2 chars (0xA0 0x0A) to UTF-8, and then the driver simply fails.
>
> I think you should deal with problem b).
> Creating a test case is easy:
> create a SQL_ASCII database, insert these 2 chars into a text column
> (typing the two bytes with a hex editor such as khexedit),
> then read the column back and out.println the string.
>
>
> > of problems I have seen in this regard were because the database
> > character set didn't match the character set of the actual data. This
> > is important because the JDBC driver needs to convert the data to Java
> > Unicode, and if the database character set is incorrectly defined, it
> > cannot do this correctly.
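To illustrate the point, here is a minimal sketch (hypothetical class and sample data, plain JDK calls only, not the driver's code) that decodes the same ISO-8859-7 bytes twice: once with the charset the data really uses, and once as UTF-8, which is what the 7.3 driver assumes after setting the client encoding to UNICODE.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DeclaredCharsetDemo {
    static final Charset GREEK = Charset.forName("ISO-8859-7");

    // Decodes raw column bytes with whatever charset the client *believes* they use.
    // new String(byte[], Charset) replaces malformed input with U+FFFD.
    static String decodeAs(byte[] raw, Charset declared) {
        return new String(raw, declared);
    }

    public static void main(String[] args) {
        // The Greek word "Ellada" (Greece), written as escapes.
        String greek = "\u0395\u03bb\u03bb\u03ac\u03b4\u03b1";
        byte[] raw = greek.getBytes(GREEK); // what actually sits in the table

        String right = decodeAs(raw, GREEK);
        String wrong = decodeAs(raw, StandardCharsets.UTF_8);

        System.out.println(right.equals(greek));          // true
        // Every byte here is >= 0xC0 (a UTF-8 lead byte) and none is a continuation
        // byte, so each one decodes as malformed input.
        System.out.println(wrong.indexOf('\uFFFD') >= 0); // true
    }
}
```

The bytes are identical in both calls; only the declared charset differs, which is why a mismatched database character set cannot be repaired on the client side.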
> >
> > If this isn't your problem, please submit a test case that shows your
> > problem so that we can look into it.
> >
> > thanks,
> > --Barry
> >
> >
> > Achilleus Mantzios wrote:
> > > Hi, I encountered 2 problems with the 7.3.1 JDBC driver.
> > >
> > > 1) The new 7.3.1 driver assumes data is stored in UNICODE in the database
> > > (data which is most likely reloaded from a 7.2.x dump).
> > > For instance, in my case all text data in my 7.2.3 database were
> > > ISO-8859-7 (Greek, an 8-bit ASCII-compatible charset).
> > > I was not able to read this data correctly, since the driver
> > > assumed I stored it in UTF-8.
> > >
> > > 2) When the contents of a varchar or text field are the
> > > bytes 0xA0 0x0A (which IE, for some reason, produces),
> > > the driver throws a java.lang.ArrayIndexOutOfBoundsException:
> > >
> > > 2003-01-27 11:50:55,665 ERROR [STDERR] java.lang.ArrayIndexOutOfBoundsException
> > > 2003-01-27 11:50:55,666 ERROR [STDERR] at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:259)
> > > 2003-01-27 11:50:55,667 ERROR [STDERR] at org.postgresql.core.Encoding.decode(Encoding.java:165)
> > > 2003-01-27 11:50:55,667 ERROR [STDERR] at org.postgresql.core.Encoding.decode(Encoding.java:181)
> > > 2003-01-27 11:50:55,668 ERROR [STDERR] at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)
> > >
> > > To solve these 2 problems in my case, i.e. with no need
> > > for Unicode support, I wrote this simple patch.
> > > (Note: this patch is useful only for people who DON'T NEED
> > > multibyte support.)
> > > --------------------------cut here------------------------------
> > > *** AbstractJdbc1Connection.java.orig Tue Jan 28 09:42:54 2003
> > > --- AbstractJdbc1Connection.java Tue Jan 28 09:50:09 2003
> > > ***************
> > > *** 372,382 ****
> > > //support is now always included
> > > if (haveMinimumServerVersion("7.3"))
> > > {
> > > java.sql.ResultSet acRset =
> > > ! ExecSQL("set client_encoding = 'UNICODE'; show autocommit");
> > >
> > > //set encoding to be unicode
> > > ! encoding = Encoding.getEncoding("UNICODE", null);
> > >
> > > if (!acRset.next())
> > > {
> > > --- 372,384 ----
> > > //support is now always included
> > > if (haveMinimumServerVersion("7.3"))
> > > {
> > > + // java.sql.ResultSet acRset =
> > > + // ExecSQL("set client_encoding = 'UNICODE'; show autocommit");
> > > java.sql.ResultSet acRset =
> > > ! ExecSQL("show autocommit");
> > >
> > > //set encoding to be unicode
> > > ! // encoding = Encoding.getEncoding("UNICODE", null);
> > >
> > > if (!acRset.next())
> > > {
> > > -------------------cut here-------------------------------------------
> > > ==================================================================
> > > Achilleus Mantzios
> > > S/W Engineer
> > > IT dept
> > > Dynacom Tankers Mngmt
> > > Nikis 4, Glyfada
> > > Athens 16610
> > > Greece
> > > tel: +30-10-8981112
> > > fax: +30-10-8981877
> > > email: achill(at)matrix(dot)gatewaynet(dot)com
> > > mantzios(at)softlab(dot)ece(dot)ntua(dot)gr
> > >
> > >
> > >
> > >
> >
> >
> >
>

