Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility

Lists: pgsql-jdbc
From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date: 2003-01-28 12:58:24
Message-ID: Pine.LNX.4.44.0301281057280.14316-100000@matrix.gatewaynet.com


Hi, I encountered two problems with the 7.3.1 JDBC driver.

1) The new 7.3.1 driver assumes data is stored as UNICODE in the database
(which has most likely been reloaded from a 7.2.x dump).
For instance, in my case all text data in my 7.2.3 database was
ISO-8859-7 (Greek), an 8-bit, ASCII-compatible encoding.
I was not able to read this data correctly, since the driver
assumed I had stored it as UTF-8.

2) When the contents of a varchar or text field are the
bytes 0xA0 0x0A (which for some reason IE strangely produces),
the driver throws a java.lang.ArrayIndexOutOfBoundsException:

2003-01-27 11:50:55,665 ERROR [STDERR] java.lang.ArrayIndexOutOfBoundsException
2003-01-27 11:50:55,666 ERROR [STDERR] at org.postgresql.core.Encoding.decodeUTF8(Encoding.java:259)
2003-01-27 11:50:55,667 ERROR [STDERR] at org.postgresql.core.Encoding.decode(Encoding.java:165)
2003-01-27 11:50:55,667 ERROR [STDERR] at org.postgresql.core.Encoding.decode(Encoding.java:181)
2003-01-27 11:50:55,668 ERROR [STDERR] at org.postgresql.jdbc1.AbstractJdbc1ResultSet.getString(AbstractJdbc1ResultSet.java:97)

To solve these two problems for my case, i.e. with no need
for Unicode support, I wrote this simple patch.
(Note: this patch is useful only for people who DON'T NEED
multibyte support.)
--------------------------cut here------------------------------
*** AbstractJdbc1Connection.java.orig Tue Jan 28 09:42:54 2003
--- AbstractJdbc1Connection.java Tue Jan 28 09:50:09 2003
***************
*** 372,382 ****
//support is now always included
if (haveMinimumServerVersion("7.3"))
{
java.sql.ResultSet acRset =
! ExecSQL("set client_encoding = 'UNICODE'; show autocommit");

//set encoding to be unicode
! encoding = Encoding.getEncoding("UNICODE", null);

if (!acRset.next())
{
--- 372,384 ----
//support is now always included
if (haveMinimumServerVersion("7.3"))
{
+ // java.sql.ResultSet acRset =
+ // ExecSQL("set client_encoding = 'UNICODE'; show autocommit");
java.sql.ResultSet acRset =
! ExecSQL("show autocommit");

//set encoding to be unicode
! // encoding = Encoding.getEncoding("UNICODE", null);

if (!acRset.next())
{
-------------------cut here-------------------------------------------
==================================================================
Achilleus Mantzios
S/W Engineer
IT dept
Dynacom Tankers Mngmt
Nikis 4, Glyfada
Athens 16610
Greece
tel: +30-10-8981112
fax: +30-10-8981877
email: achill(at)matrix(dot)gatewaynet(dot)com
mantzios(at)softlab(dot)ece(dot)ntua(dot)gr


From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date: 2003-01-28 15:37:40
Message-ID: Pine.LNX.4.44.0301281334100.14676-100000@matrix.gatewaynet.com

I found another user with the same problem (problem #2),
using PostgreSQL 7.3.1 with Resin:
http://www.caucho.com/support/ejb-interest/0212/0049.html

He suggested that the garbled input was due to the "smart quotes"
that some MS products insert.
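
As a rough illustration of the smart-quote theory, the sketch below assumes the bytes come from a windows-1252 client (the thread does not confirm this). In that encoding the curly double quotes are the single bytes 0x93 and 0x94, which are not valid UTF-8 on their own. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SmartQuoteBytes {
    // Assumed input: "hi" wrapped in windows-1252 curly double quotes (0x93 ... 0x94)
    static final byte[] QUOTES = { (byte) 0x93, 'h', 'i', (byte) 0x94 };

    // Decode with the encoding the client is assumed to have used
    static String asWindows1252(byte[] data) {
        return new String(data, Charset.forName("windows-1252"));
    }

    // Strict check: true only if the whole array is well-formed UTF-8
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The quotes decode cleanly as windows-1252 ...
        System.out.println(asWindows1252(QUOTES).equals("\u201Chi\u201D")); // prints "true"
        // ... but the same bytes are rejected by a strict UTF-8 decoder
        System.out.println(isValidUtf8(QUOTES)); // prints "false"
    }
}
```

If this theory is right, any client that stores such bytes unconverted in a SQL_ASCII database would hand a UTF-8-only reader input it cannot decode.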



From: Barry Lind <blind(at)xythos(dot)com>
To: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date: 2003-02-05 01:40:56
Message-ID: 3E406BA8.6080105@xythos.com

Achilleus,

What is the character set of your database? My guess is that it is
SQL_ASCII, which is a 7-bit character set. If you are storing ISO-8859-7
data, you should have that as your database character set. All reports
of problems I have seen in this regard were because the database
character set didn't match the character set of the actual data. This
is important because the JDBC driver needs to convert the data to Java
Unicode, and if the database character set is incorrectly defined it
cannot do this correctly.
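
To make the point about mismatched character sets concrete, here is a minimal sketch in plain Java (no driver involved). It decodes the same three bytes under three different labels. The class name is hypothetical, and it assumes the JRE ships the ISO-8859-7 charset, which standard JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMismatch {
    // Greek alpha, beta, gamma as an ISO-8859-7 client would store them: one byte each
    static final byte[] GREEK = { (byte) 0xE1, (byte) 0xE2, (byte) 0xE3 };

    // Decode raw bytes under whatever character set they are declared to be in
    static String decode(byte[] data, Charset declared) {
        return new String(data, declared);
    }

    public static void main(String[] args) {
        // Correct label: the bytes become the intended Greek letters
        System.out.println(decode(GREEK, Charset.forName("ISO-8859-7")));
        // Wrong label (Latin-1): no error, but different characters come out
        System.out.println(decode(GREEK, StandardCharsets.ISO_8859_1));
        // Wrong label (UTF-8): the input is malformed, so new String(...)
        // substitutes the replacement character U+FFFD
        System.out.println(decode(GREEK, StandardCharsets.UTF_8));
    }
}
```

Only the first call recovers the original text, which is why the declared character set has to match the stored bytes before any conversion to Java strings can work.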

If this isn't your problem, please submit a test case that shows your
problem so that we can look into it.

thanks,
--Barry



From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: Barry Lind <blind(at)xythos(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date: 2003-02-05 13:11:31
Message-ID: Pine.LNX.4.44.0302051027200.1908-100000@matrix.gatewaynet.com

On Tue, 4 Feb 2003, Barry Lind wrote:

> Achilleus,
>
> What is the character set of your database? My guess is that it is
> SQLASCII which is a 7bit character set. If you are storing ISO-8859-7
> data you should have that as your database character set. All reports

Yes, it is SQL_ASCII. (BTW, 8-bit chars are stored just fine.)
If you read the code, you will see that the driver forces the UTF-8
client encoding for all 7.3 versions.

In AbstractJdbc1Connection.java I read:

//We also set the client encoding so that the driver only needs
//to deal with utf8. We can only do this in 7.3 because multibyte
//support is now always included

So what happens is that the database converts from
SQL_ASCII -> UTF-8 (the client encoding),
and then the driver converts from UTF-8 -> Unicode (at line 164 in
Encoding.java).

So, if you store the chars 0xA0 0x0A in the database,
you have a test case!
(The Encoding.decodeUTF8 method throws the indicated exception.)
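
For readers wondering why these two bytes trip a UTF-8 decoder, here is a small structural check. It is not the driver's Encoding.decodeUTF8, just an illustration with hypothetical names. 0xA0 has the bit pattern 10100000, which marks a continuation byte, so it can never begin a UTF-8 sequence.

```java
public class Utf8LeadByte {
    // Length of the UTF-8 sequence that starts with b, or -1 if b cannot start one
    static int sequenceLength(byte b) {
        int v = b & 0xFF;
        if (v < 0x80) return 1;   // 0xxxxxxx: plain ASCII
        if (v < 0xC0) return -1;  // 10xxxxxx: continuation byte, never a lead byte
        if (v < 0xE0) return 2;   // 110xxxxx
        if (v < 0xF0) return 3;   // 1110xxxx
        if (v < 0xF8) return 4;   // 11110xxx
        return -1;                // 0xF8..0xFF are not used by UTF-8
    }

    // Index of the first sequence that breaks UTF-8 structure, or -1 if none does.
    // Structural check only: it does not reject overlong forms or surrogates.
    static int firstInvalidIndex(byte[] data) {
        int i = 0;
        while (i < data.length) {
            int len = sequenceLength(data[i]);
            // Bad lead byte, or the sequence would run past the end of the array
            if (len < 0 || i + len > data.length) return i;
            for (int k = 1; k < len; k++) {
                // Every following byte must look like 10xxxxxx
                if ((data[i + k] & 0xC0) != 0x80) return i;
            }
            i += len;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] testCase = { (byte) 0xA0, 0x0A };
        System.out.println(firstInvalidIndex(testCase)); // prints "0"
    }
}
```

The bounds test `i + len > data.length` is the kind of guard that keeps a decoder from indexing past the end of its input when the bytes are not what it expects.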

Don't be misled by me saying that I had 8-bit chars (Greek)
in 7.2.3. (The exception problem was on pure ASCII data; the users rarely
enter Greek data either way.)

Now the real problems are:
a) Greek chars: mainly my fault, but a backwards compatibility problem.
In 7.2.3 the server returned SQL_ASCII chars, interpreted these
as Greek UTF8 chars and returned valid Greek Java Unicode strings,
and everybody was happy.

Now in 7.3.1 the server tries to convert SQL_ASCII to UTF-8, hence
the problem.

b) NOT GREEK RELATED!
With database_encoding set to SQL_ASCII, the server converts these weird
2 chars (0xA0 0x0A) to UTF-8, and then the driver simply fails.

I think you should deal with problem b).
Creating a test case is easy:
create a SQL_ASCII database, insert these 2 chars into a text column
(having typed them with some utility like khexedit),
and then out.println the string.



From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: Barry Lind <blind(at)xythos(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date: 2003-02-05 13:20:51
Message-ID: Pine.LNX.4.44.0302051116200.6193-100000@matrix.gatewaynet.com

On Wed, 5 Feb 2003, Achilleus Mantzios wrote:

> Now the real problems are
> a) Greek chars, mainly my fault but backwards compatibility problem.
> In 7.2.3 the server returned SQL_ASCII chars, interpreted these
> as greek UTF8 chars and returned valid greek java unicode strings
> and everybody was happy.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Excuse me, I was wrong.
What happened is that in Java I inserted 8-bit chars
(not Greek UTF8), and the data was stored as SQL_ASCII.
Then in my JSP I just read those chars back, and because my
servlet container encoding was ISO-8859-1, no conversion was done.
Finally, because my page's charset was set to ISO-8859-7,
the browser displayed the Greek chars correctly.
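
The pass-through described above can be sketched in plain Java. The assumption (mine, for illustration) is that the container decodes and re-encodes with ISO-8859-1, which maps every byte value to exactly one character and back, so the original bytes reach the browser untouched. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1PassThrough {
    static final Charset GREEK = Charset.forName("ISO-8859-7");

    // Decode and re-encode with ISO-8859-1, as a Latin-1 configured container might
    static byte[] throughLatin1(byte[] raw) {
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        return asLatin1.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Greek alpha, beta, gamma as stored by the application
        byte[] stored = "\u03B1\u03B2\u03B3".getBytes(GREEK);
        byte[] sentToBrowser = throughLatin1(stored);

        // The bytes survive the Latin-1 round trip unchanged
        System.out.println(Arrays.equals(stored, sentToBrowser)); // prints "true"
        // A page declared as ISO-8859-7 then shows them as Greek again
        System.out.println(new String(sentToBrowser, GREEK).equals("\u03B1\u03B2\u03B3")); // prints "true"
    }
}
```

The intermediate Java string is wrong text (Latin-1 accented letters), but since nothing inspected it, the error cancelled out on output.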



From: Barry Lind <blind(at)xythos(dot)com>
To: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date: 2003-02-05 16:46:00
Message-ID: 3E413FC8.90003@xythos.com

Achilleus Mantzios wrote:
> b) NOT GREEK RELATED!
> With database_encoding set to SQL_ASCII, the server converts these wierd
> 2 chars (0xA0 0x0A) to UTF-8, and then the driver simply fails.
>
> I think you should deal with problem b).
> To create a test case is easy.
> Create a SQL_ASCII database, then insert these 2 chars in a text column
> (having typed these two chars with some utility like khexedit),
> and then out.println this string.
>

Achilleus,

I want to understand what you mean by 'deal with the problem'. Since
0xA0 and 0x0A are invalid SQL_ASCII characters, the only thing I can
think of is to produce a better exception in this case. So instead of
the current ArrayIndexOutOfBounds exception, this case would throw a SQL
Exception with a message something like: "Invalid characters were
found. This is most likely caused by stored data containing characters
that are invalid for the character set the database was created in. The
most common example of this is storing 8bit data in a SQL_ASCII database."
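
One way the "better exception" idea could look, using only standard java.nio and java.sql classes: decode strictly and wrap any decoding failure in a SQLException. This is a sketch under my own assumptions, not the driver's code; the class name, method name and shortened message are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.sql.SQLException;

public class StrictDecode {
    // Decode UTF-8 strictly; report bad input as a SQLException with a readable message
    static String decodeOrFail(byte[] data) throws SQLException {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data))
                .toString();
        } catch (CharacterCodingException e) {
            // Keep the original failure as the cause for debugging
            throw new SQLException(
                "Invalid characters were found. This is most likely caused by "
                + "stored data containing characters that are invalid for the "
                + "character set the database was created in.", e);
        }
    }

    public static void main(String[] args) {
        try {
            decodeOrFail(new byte[] { (byte) 0xA0, 0x0A });
        } catch (SQLException e) {
            System.out.println("SQLException: " + e.getMessage());
        }
    }
}
```

The caller then sees a checked exception that names the likely cause, rather than an unchecked array-index error from deep inside the decoder.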

thanks,
--Barry


From: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
To: Barry Lind <blind(at)xythos(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date: 2003-02-07 11:56:29
Message-ID: Pine.LNX.4.44.0302070954100.7803-100000@matrix.gatewaynet.com

On Wed, 5 Feb 2003, Barry Lind wrote:

> I want to understand what you mean by 'deal with the problem'. Since

What I mean is simply that either we don't allow these chars
to be inserted (in the setString methods, maybe) and leave the
decodeUTF8 method as is, or we allow them to be inserted
and then convert them to the traditional '?' char.
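
Both options can be sketched with the standard java.nio.charset classes. The names below are hypothetical and this is only an illustration of the two ideas, not a proposed driver patch: one method rejects non-ASCII text before it is sent, the other decodes leniently and substitutes '?' for anything malformed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class LenientDecode {
    // Option 1: check on the way in that the text is representable as 7-bit ASCII
    static boolean isPlainAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    // Option 2: accept anything on the way out, replacing malformed bytes with '?'
    static String decodeWithQuestionMarks(byte[] data) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith("?")
                .decode(ByteBuffer.wrap(data))
                .toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE, but decode() declares the exception
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isPlainAscii("hello"));      // prints "true"
        System.out.println(isPlainAscii("\u00A0\n"));   // prints "false"
        // 0xA0 becomes '?', the newline 0x0A is kept as is
        System.out.println(decodeWithQuestionMarks(new byte[] { (byte) 0xA0, 0x0A }).equals("?\n")); // prints "true"
    }
}
```

Note that the first option only covers data written through the same code path, while the second protects the reader regardless of how the bytes got into the table.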

Thanx

> 0xA0 and 0x0A are invalid SQL_ASCII characters, the only thing I can
> think of is to produce a better exception in this case. So instead of
> the current ArrayIndexOutOfBounds exception, this case would throw a SQL
> Exception with a message something like: "Invalid characters were
> found. This is most likely caused by stored data containing characters
> that are invalid for the character set the database was created in. The
> most common example of this is storing 8bit data in a SQL_ASCII database."


From: Dave Cramer <Dave(at)micro-automation(dot)net>
To: Achilleus Mantzios <achill(at)matrix(dot)gatewaynet(dot)com>
Cc: Barry Lind <blind(at)xythos(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: 7.3.1 UTF-8 bug(?) and 7.2.x Charset compatibility
Date: 2003-02-07 12:19:59
Message-ID: 1044620399.1058.55.camel@inspiron.cramers

This doesn't really solve the problem. The driver isn't the only way to
get information into the database. The driver should be able to handle
anything that it receives gracefully, though.

Dave
--
Dave Cramer <Dave(at)micro-automation(dot)net>


From: Michael Adler <adler(at)glimpser(dot)org>
To: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Cc: Dave Cramer <Dave(at)micro-automation(dot)net>, Barry Lind <blind(at)xythos(dot)com>
Subject: emacs behave like pgjindent?
Date: 2003-02-07 13:50:04
Message-ID: Pine.NEB.4.53.0302070831420.20145@reva.sixgirls.org


Has anyone crafted a mode-hook so that emacs behaves roughly like
pgjindent? The default JDE style is quite different. If no one has done
the leg work, I may take a stab.

-Mike


From: Dave Cramer <Dave(at)micro-automation(dot)net>
To: Michael Adler <adler(at)glimpser(dot)org>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>, Barry Lind <blind(at)xythos(dot)com>
Subject: Re: emacs behave like pgjindent?
Date: 2003-02-07 14:20:03
Message-ID: 1044627603.1027.77.camel@inspiron.cramers

You're welcome to go ahead, but I'm not sure how many folks use emacs. I
confess to using JBuilder or vi.

Dave
On Fri, 2003-02-07 at 08:50, Michael Adler wrote:
> Has anyone crafted a mode-hook so that emacs behaves roughly like
> pgjindent? The default JDE style is quite different. If no one has done
> the leg work, I may take a stab.
>
> -Mike
--
Dave Cramer <Dave(at)micro-automation(dot)net>


From: Michael Adler <adler(at)glimpser(dot)org>
To: Dave Cramer <Dave(at)micro-automation(dot)net>
Cc: Michael Adler <adler(at)glimpser(dot)org>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: emacs behave like pgjindent?
Date: 2003-02-07 14:45:05
Message-ID: Pine.NEB.4.53.0302070941030.20145@reva.sixgirls.org


If anyone's interested, this does a decent job. The difference I saw was
that emacs will still let a blank line have a tab on it. pgjindent will
trim it off with entab.

(defun my-jde-mode-hook ()
  ;; attempt to match PostgreSQL's pgjindent style
  (setq tab-width 4)
  (setq indent-tabs-mode t)
  (c-set-offset 'substatement-open 0))
(add-hook 'jde-mode-hook 'my-jde-mode-hook)

- Mike Adler

On Fri, 7 Feb 2003, Dave Cramer wrote:
> Date: 07 Feb 2003 09:20:03 -0500
> From: Dave Cramer <Dave(at)micro-automation(dot)net>
> To: Michael Adler <adler(at)glimpser(dot)org>
> Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>,
> Barry Lind <blind(at)xythos(dot)com>
> Subject: Re: [JDBC] emacs behave like pgjindent?
>
> You're welcome to go ahead, but I'm not sure how many folks use emacs. I
> confess to using JBuilder, or vi
>
> Dave
> On Fri, 2003-02-07 at 08:50, Michael Adler wrote:
> > Has anyone crafted a mode-hook so that emacs behaves roughly like
> > pgjindent? The default JDE style is quite different. If no one has done
> > the leg work, I may take a stab.
> >
> > -Mike
> --
> Dave Cramer <Dave(at)micro-automation(dot)net>


From: Michael Adler <adler(at)glimpser(dot)org>
To: Dave Cramer <Dave(at)micro-automation(dot)net>
Cc: Michael Adler <adler(at)glimpser(dot)org>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>, Barry Lind <blind(at)xythos(dot)com>
Subject: new class layout to support COPY protocol
Date: 2003-02-07 15:03:46
Message-ID: Pine.NEB.4.53.0302070946020.20145@reva.sixgirls.org
Lists: pgsql-jdbc


I'm working on supporting the COPY protocol (again). Unless people are
unsatisfied with the largeobject way of accessing pg-specific
functionality, I'll adopt their way of doing things. For example:

org.postgresql.copy.CopyManager copyMgr;
copyMgr = ((org.postgresql.PGConnection)con).getCopyAPI();
copyMgr.copyOut("tablename", outputStream);
copyMgr.copyIn("tablename", inputStream);

I have working code with unit tests, but it still needs polishing. I
simply wanted to know if this class layout would be met with approval.

- Mike Adler


From: Michael Adler <adler(at)glimpser(dot)org>
To: Dave Cramer <Dave(at)micro-automation(dot)net>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>, Barry Lind <blind(at)xythos(dot)com>
Subject: patch for COPY
Date: 2003-02-07 21:56:58
Message-ID: Pine.NEB.4.53.0302071648290.23858@reva.sixgirls.org
Lists: pgsql-jdbc


I've attached a tar file that includes a context diff and two additional
files. This should provide COPY capabilities for the JDBC driver. I can
write up the Docbook documentation if the patch (or some version of it) is
to be incorporated.

I will be unable to respond to comments next week, but I can respond this
weekend and the following weekend and thereafter.

- Mike Adler

Attachment Content-Type Size
copypatch.tar.gz application/octet-stream 4.7 KB

From: Kris Jurka <books(at)ejurka(dot)com>
To: Michael Adler <adler(at)glimpser(dot)org>
Cc: Dave Cramer <Dave(at)micro-automation(dot)net>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>, Barry Lind <blind(at)xythos(dot)com>
Subject: Re: patch for COPY
Date: 2003-02-08 02:47:18
Message-ID: Pine.LNX.4.33.0302072140210.5842-100000@leary.csoft.net
Lists: pgsql-jdbc


One of the failings of the copy protocol is that on error basically the
connection is hosed. Is it possible to reset the connection state on
error for the user?

Also are there plans to support other elements of the COPY syntax? For
example NULL AS, OIDS, and column lists.

Kris Jurka

On Fri, 7 Feb 2003, Michael Adler wrote:

>
> I've attached a tar file that includes a context diff and two additional
> files. This should provide COPY capabilities for the JDBC driver. I can
> write up the Docbook documentation if the patch (or some version of it) is
> to be incorporated.
>
> I will be unable to respond to comments next week, but I can respond this
> weekend and the following weekend and thereafter.
>
> - Mike Adler


From: Michael Adler <adler(at)glimpser(dot)org>
To: Kris Jurka <books(at)ejurka(dot)com>
Cc: Dave Cramer <Dave(at)micro-automation(dot)net>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>, Barry Lind <blind(at)xythos(dot)com>
Subject: Re: patch for COPY
Date: 2003-02-08 14:57:23
Message-ID: Pine.NEB.4.53.0302080924270.3347@reva.sixgirls.org
Lists: pgsql-jdbc

On Fri, 7 Feb 2003, Kris Jurka wrote:
> One of the failings of the copy protocol is that on error basically the
> connection is hosed. Is it possible to reset the connection state on
> error for the user?

Assuming the rest of the driver can support this behavior, I'm guessing that
we should make this optional.

> Also are there plans to support other elements of the COPY syntax? For
> example NULL AS, OIDS, and column lists.

Yes. My current thinking is to provide a method that takes an arbitrary
COPY command. This also gives us backwards compatibility since the command
syntax has changed from 7.2 to 7.3.

Mike Adler


From: Kris Jurka <books(at)ejurka(dot)com>
To: Michael Adler <adler(at)glimpser(dot)org>
Cc: Dave Cramer <Dave(at)micro-automation(dot)net>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>, Barry Lind <blind(at)xythos(dot)com>
Subject: Re: patch for COPY
Date: 2003-02-09 09:21:29
Message-ID: Pine.LNX.4.33.0302090411020.29526-100000@leary.csoft.net
Lists: pgsql-jdbc

On Sat, 8 Feb 2003, Michael Adler wrote:
>
> On Fri, 7 Feb 2003, Kris Jurka wrote:
> > One of the failings of the copy protocol is that on error basically the
> > connection is hosed. Is it possible to reset the connection state on
> > error for the user?
>
> Assuming the rest of the driver can support this behavior, I'm guessing that
> we should make this optional.

That's the question. Can we reset the driver to a close enough state that
it is transparent to the user? With normal JDBC access the user will
expect to get an SQLException, call connection.rollback(), and continue as
usual. This could be tricky.

> > Also are there plans to support other elements of the COPY syntax? For
> > example NULL AS, OIDS, and column lists.
>
> Yes. My current thinking is to provide a method that takes an arbitrary
> COPY command. This also gives us backwards compatibility since the command
> syntax has changed from 7.2 to 7.3.

What is the expected use case for a copyIn? Is an InputStream a
reasonable means for input? Would defining a CopyInputSource interface
for a user's class to implement be useful? The JDBC driver could then
pull data directly from the user's representation without an intermediate
persistence to the InputStream.
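
To make the idea concrete, here is a minimal sketch of what such a pull-style interface might look like. Everything in it is an assumption for illustration: CopyInputSource, nextRow(), and writeRows() are hypothetical names, not part of the patch or of any released driver, and the ByteArrayOutputStream merely stands in for the backend connection. It assumes the default COPY text format (tab-delimited, \N for NULL, backslash escapes).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CopyInputSourceSketch {

    /** Hypothetical callback interface implemented by the user's class. */
    interface CopyInputSource {
        /** Returns the next row's column values, or null when no rows remain. */
        String[] nextRow();
    }

    /** Stand-in for the driver side: pulls rows and writes COPY text format. */
    static void writeRows(CopyInputSource source, OutputStream toBackend) throws IOException {
        String[] row;
        while ((row = source.nextRow()) != null) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    line.append('\t');
                }
                // A Java null becomes the default COPY null marker, backslash-N.
                line.append(row[i] == null ? "\\N" : escape(row[i]));
            }
            line.append('\n');
            toBackend.write(line.toString().getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Escapes backslash first, so backslashes added by later steps are not doubled. */
    static String escape(String value) {
        return value.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    /** Adapts an in-memory list to the hypothetical interface. */
    static CopyInputSource fromList(List<String[]> rows) {
        Iterator<String[]> it = rows.iterator();
        return () -> it.hasNext() ? it.next() : null;
    }

    /** Renders rows to a string so the output can be inspected without a server. */
    static String render(List<String[]> rows) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            writeRows(fromList(rows), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "alpha"},
            new String[] {"2", null});
        System.out.print(render(rows));
    }
}
```

The trade-off is that the driver, rather than the caller, becomes responsible for delimiters and escaping.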

Kris Jurka


From: Michael Adler <adler(at)glimpser(dot)org>
To: Kris Jurka <books(at)ejurka(dot)com>
Cc: Dave Cramer <Dave(at)micro-automation(dot)net>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>, Barry Lind <blind(at)xythos(dot)com>
Subject: Re: patch for COPY
Date: 2003-02-09 17:08:05
Message-ID: Pine.NEB.4.53.0302090916010.13490@reva.sixgirls.org
Lists: pgsql-jdbc

On Sun, 9 Feb 2003, Kris Jurka wrote:
>
> On Sat, 8 Feb 2003, Michael Adler wrote:
> >
> > On Fri, 7 Feb 2003, Kris Jurka wrote:
> > > One of the failings of the copy protocol is that on error basically the
> > > connection is hosed. Is it possible to reset the connection state on
> > > error for the user?
> >
> > Assuming the rest of the driver can support this behavior, I'm guessing that
> > we should make this optional.
>
> That's the question. Can we reset the driver to a close enough state that
> it is transparent to the user? With normal JDBC access the user will
> expect to get an SQLException, call connection.rollback(), and continue as
> usual. This could be tricky.
>

If we take libpq as the standard for what's practical to achieve with the
FE/BE protocol, I don't think we'll be able to maintain much. libpq simply
closes and reopens the connection. (The following test was run against a
7.2 installation.)

testdb=# set datestyle to German;
SET VARIABLE
testdb=# show datestyle;
NOTICE: DateStyle is German with European conventions
SHOW VARIABLE
testdb=# \i isf
psql:isf:1: ERROR: copy: line 1, pg_atoi: error in "T": can't parse "T"
psql:isf:1: lost synchronization with server, resetting connection
testdb=#
testdb=# show datestyle;
NOTICE: DateStyle is ISO with US (NonEuropean) conventions
SHOW VARIABLE

I wonder if the best we can do is to establish a fresh connection and
begin a transaction. If they call rollback, it will roll back nothing, but
at least it behaves outwardly in a uniform fashion.

> What is the expected use case for a copyIn? Is an InputStream a
> reasonable means for input? Would defining a CopyInputSource interface
> for a user's class to implement be useful? The JDBC driver could then
> pull data directly from the user's representation without an intermediate
> persistence to the InputStream.

For my needs, an InputStream is reasonable.

FileInputStream fis = new FileInputStream("dumpfile");
copyIn("destination_table", fis);

Whether someone else finds that insufficient is another matter.

Personally, I think that eschewing java.io would increase the complexity
of the driver without a demonstrated need for the functionality. It's
likely that I lack the imagination to see how useful such a feature would
be. I will leave the decision to someone with more experience on this
project.

If a user has particular needs and is concerned with memory footprint, I
would recommend java.io's PipedInputStream and PipedOutputStream.
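
As a rough sketch of that approach: a producer thread writes rows into a PipedOutputStream while the calling thread reads them from the connected PipedInputStream, so only the pipe buffer is held in memory at once. The consume() method below is only a stand-in (an assumption) for whatever copyIn-style method would read the stream; here it just counts lines. The class and method names are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class PipedCopySketch {

    /** Stand-in for a copyIn-style consumer: counts the lines it reads. */
    static int consume(InputStream in) throws IOException {
        BufferedReader reader =
            new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        int rows = 0;
        while (reader.readLine() != null) {
            rows++;
        }
        return rows;
    }

    /** Generates rowCount tab-delimited rows on one thread and consumes them on another. */
    static int streamRows(int rowCount) {
        try {
            PipedOutputStream producerEnd = new PipedOutputStream();
            // 8 KB pipe buffer: the producer blocks when it is full.
            PipedInputStream consumerEnd = new PipedInputStream(producerEnd, 8192);

            Thread producer = new Thread(() -> {
                // Closing the output end is what signals end-of-stream to the reader.
                try (PipedOutputStream out = producerEnd) {
                    for (int i = 0; i < rowCount; i++) {
                        out.write((i + "\trow " + i + "\n").getBytes(StandardCharsets.UTF_8));
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            producer.start();

            int rows = consume(consumerEnd);
            producer.join();
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(streamRows(1000) + " rows streamed");
    }
}
```

The two ends of a pipe need to be used from different threads, which is why the producer runs on its own thread here.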

Mike Adler


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Adler <adler(at)glimpser(dot)org>
Cc: Kris Jurka <books(at)ejurka(dot)com>, Dave Cramer <Dave(at)micro-automation(dot)net>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>, Barry Lind <blind(at)xythos(dot)com>
Subject: Re: patch for COPY
Date: 2003-02-09 17:30:06
Message-ID: 1578.1044811806@sss.pgh.pa.us
Lists: pgsql-jdbc

Michael Adler <adler(at)glimpser(dot)org> writes:
> On Fri, 7 Feb 2003, Kris Jurka wrote:
>>> One of the failings of the copy protocol is that on error basically the
>>> connection is hosed. Is it possible to reset the connection state on
>>> error for the user?

> If we take libpq as the standard for what's practical to achieve with the
> FE/BE protocol, I don't think we'll be able to maintain much. libpq simply
> closes and reopens the connection. (The following test was run against a
> 7.2 installation.)

It might be best to just leave this as an open problem until the COPY
protocol is fixed. Making COPY able to recover from errors is one of
the "must fix" items for the next FE/BE protocol revision. There had
been talk of doing this for 7.4, but given the lack of progress so far
I wouldn't want to promise results for 7.4. Maybe 7.5 though. We have
enough accumulated reasons for protocol changes that I think it's
getting to be a high-priority issue.

regards, tom lane


From: Michael Adler <adler(at)glimpser(dot)org>
To: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Cc: Kris Jurka <books(at)ejurka(dot)com>, rmischook(at)yahoo(dot)com, Dave Cramer <Dave(at)micro-automation(dot)net>, Barry Lind <blind(at)xythos(dot)com>
Subject: revised patch for COPY
Date: 2003-02-26 17:51:59
Message-ID: Pine.NEB.4.53.0302261227020.6153@reva.sixgirls.org
Lists: pgsql-jdbc


Here's another version of a patch that gives you COPY capabilities. The
difference is that in addition to the simple and default:

copyOut("tablename",outputStream);

you can also access other COPY features by supplying your own COPY query:

copyOutQuery("COPY " + tablename + " WITH OIDS TO STDOUT DELIMITERS '\t' WITH NULL AS '\\N'", outputStream);
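
On the reading side, data copied out with a tab delimiter and \N as the null string can be split back into fields fairly easily. The following is a minimal sketch, not part of the patch: CopyLineParser and its methods are hypothetical names, and it only handles the common backslash escapes (\t, \n, \r, \\), not octal sequences.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyLineParser {

    /** Splits one line of tab-delimited COPY text output into fields; \N becomes null. */
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        // Splitting on a raw tab is safe because tabs inside data arrive escaped as backslash-t.
        for (String raw : line.split("\t", -1)) {
            fields.add(raw.equals("\\N") ? null : unescape(raw));
        }
        return fields;
    }

    /** Reverses the common backslash escapes used in COPY text format. */
    static String unescape(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (c != '\\' || i + 1 == raw.length()) {
                sb.append(c);
                continue;
            }
            char next = raw.charAt(++i);
            switch (next) {
                case 't': sb.append('\t'); break;
                case 'n': sb.append('\n'); break;
                case 'r': sb.append('\r'); break;
                default:  sb.append(next); // an escaped backslash or any other escaped character
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(parseLine("1\talpha\t\\N"));
    }
}
```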

This feature speeds up my application 40x and I bet it will be useful to
others as well. I wrote it to integrate cleanly into the driver, so
please let me know if it's not appropriate for the main project.

Comments?

Mike Adler

Attachment Content-Type Size
CopyTest.java text/plain 5.2 KB
copydiff text/plain 8.1 KB
CopyManager.java text/plain 7.7 KB