Re: invalid byte sequence for encoding "UTF8": 0x00

Lists: pgsql-admin pgsql-jdbc
From: "James Im" <im-james(at)hotmail(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: invalid byte sequence for encoding "UTF8": 0x00
Date: 2007-02-20 10:48:43
Message-ID: BAY7-F17FFE0E324AB3B642C547E96890@phx.gbl

Hi,

I've got another problem. I sometimes get the following SQLException
when doing an insert:

ERROR: invalid byte sequence for encoding "UTF8": 0x00
Exception: org.postgresql.util.PSQLException
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:1525)
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1309)
org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:188)
org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:452)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:354)
org.postgresql.jdbc2.AbstractJdbc2Statement.executeUpdate(AbstractJdbc2Statement.java:308)

By the way, the insert is done with a PreparedStatement and I use only
setLong(), setString(), setTimestamp() and setInt().

I don't understand it very well. It is obviously an encoding exception,
but I don't know why it happens or what I could do to avoid it.

Any idea?

_________________________________________________________________
Create a personal blog and share your pictures on MSN Spaces:
http://spaces.msn.com/


From: Altaf Malik <mmalik_altaf(at)yahoo(dot)com>
To: James Im <im-james(at)hotmail(dot)com>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2007-02-20 10:57:07
Message-ID: 793207.12351.qm@web39108.mail.mud.yahoo.com

Try to change the encoding of your database to "Unicode".
I hope this helps.

--Altaf Malik
EnterpriseDB
www.enterprisedb.com



From: Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To: Altaf Malik <mmalik_altaf(at)yahoo(dot)com>
Cc: James Im <im-james(at)hotmail(dot)com>, Postgres JDBC <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2007-02-20 11:13:39
Message-ID: 1171970019.3101.328.camel@coppola.muc.ecircle.de

I've had the same error, and it is in fact because in Java you can
actually have a "0x0" character in your string, and that's valid
Unicode. It is translated to the byte 0x0 in UTF-8, which in
turn is not accepted because the server uses null-terminated strings,
so the only way is to make sure your strings don't contain the character
'\u0000'.

I identified the place in my code which was generating such a character
and fixed it, and I didn't have other problems after that, even though I
still think forbidding a valid character is a somewhat arbitrary
restriction.
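One way to find the offending code, as Csaba did, is to check each string right before binding it. The sketch below is only an illustration and is not part of the JDBC driver: the class `NulGuard` and its methods are hypothetical names. It fails loudly with the field name and offset, instead of letting the server reject the whole statement.

```java
import java.util.Objects;

// Hypothetical helper: rejects strings containing U+0000 before they are
// bound to a PreparedStatement, reporting where the character occurs.
final class NulGuard {
    private NulGuard() {}

    /** Returns the index of the first U+0000 in value, or -1 if there is none (or value is null). */
    static int indexOfNul(String value) {
        return value == null ? -1 : value.indexOf('\u0000');
    }

    /** Returns value unchanged, or throws naming the field and offset if it contains U+0000. */
    static String requireNoNul(String fieldName, String value) {
        int idx = indexOfNul(value);
        if (idx >= 0) {
            throw new IllegalArgumentException(
                "Field '" + Objects.requireNonNull(fieldName) + "' contains U+0000 at index " + idx);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(indexOfNul("abc"));       // -1
        System.out.println(indexOfNul("ab\u0000c")); // 2
        try {
            requireNoNul("comment", "bad\u0000value");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a PreparedStatement it would be used as, e.g., `ps.setString(2, NulGuard.requireNoNul("comment", comment));`, where the parameter index and variable are placeholders.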

HTH,
Csaba.



From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: James Im <im-james(at)hotmail(dot)com>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2007-02-20 11:50:14
Message-ID: 45DAE076.1060407@opencloud.com

James Im wrote:

> I've got another problem. I sometimes get the following SQLException
> when doing an insert:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0x00

You're trying to insert a string which contains a '\0' character. The
server can't handle strings containing embedded NULs, as it uses C-style
string termination internally.
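The error message's "0x00" can be seen directly in the bytes: standard UTF-8 encodes U+0000 as the single byte 0x00, the same byte a C-style string treats as its terminator. This small sketch (class name hypothetical) prints the UTF-8 encoding Java produces for a three-character string with an embedded NUL.

```java
import java.nio.charset.StandardCharsets;

// Shows that standard UTF-8 encodes U+0000 as the single byte 0x00,
// which is the byte the server rejects.
final class NulUtf8Bytes {
    /** Formats bytes as space-separated lowercase hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "a\u0000b";
        System.out.println(s.length());                              // 3
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8))); // 61 00 62
    }
}
```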

-O


From: Ken Johanson <pg-user(at)kensystem(dot)com>
To: Oliver Jowett <oliver(at)opencloud(dot)com>
Cc: James Im <im-james(at)hotmail(dot)com>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2007-02-22 06:37:51
Message-ID: 45DD3A3F.5050609@kensystem.com

Oliver Jowett wrote:
> James Im wrote:
>
>> I've got another problem. I sometimes get the following SQLException
>> when doing an insert:
>>
>> ERROR: invalid byte sequence for encoding "UTF8": 0x00
>
> You're trying to insert a string which contains a '\0' character. The
> server can't handle strings containing embedded NULs, as it uses C-style
> string termination internally.
>

At least on other servers/drivers, I believe NUL characters are supported (and
should be, according to some spec); the only special-meaning character is the
single quote.

I'm wondering how the binary protocol works insofar as handling the NULL
byte; does it precede it with a backslash? I'm wondering if this would
be possible for the String conversion as well, just for the sake of
consistency with other DBs (and since some APIs inevitably expect users
to send binary data through a char-sequence interface).


From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Ken Johanson <pg-user(at)kensystem(dot)com>
Cc: James Im <im-james(at)hotmail(dot)com>, pgsql-jdbc(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2007-02-22 08:54:01
Message-ID: 45DD5A29.8020400@opencloud.com

Ken Johanson wrote:

> At least on other servers/drivers I believe nulls are supported (and
> should be according to some spec) (The only special-meaning char is
> single quote).

The driver can't do anything about it; it's a server issue. I can think
of some ways the server could support it without extensive changes,
e.g. using a "modified UTF-8" representation that stores \u0000 as 0xC0
0x80 internally, but you'd have to take that up with the backend
developers.
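For comparison, Java itself uses such a "modified UTF-8" in `DataOutputStream.writeUTF` and `DataInputStream.readUTF`, where U+0000 becomes 0xC0 0x80, so no 0x00 byte appears. The sketch below only demonstrates that Java-side encoding (class name hypothetical). As far as I know, PostgreSQL's UTF8 input validation rejects 0xC0 0x80 as an invalid overlong sequence, which is why this would need backend support.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Demonstrates Java's "modified UTF-8" (DataOutputStream.writeUTF), which
// encodes U+0000 as 0xC0 0x80 instead of a single 0x00 byte.
final class ModifiedUtf8Demo {
    /** Returns the modified UTF-8 bytes for s, without writeUTF's 2-byte length prefix. */
    static byte[] encode(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bos)) {
                out.writeUTF(s); // 2-byte big-endian length, then modified UTF-8
            }
            byte[] all = bos.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes modified UTF-8 bytes, re-adding the length prefix readUTF expects. */
    static String decode(byte[] payload) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bos)) {
                out.writeShort(payload.length);
                out.write(payload);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return in.readUTF();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] enc = encode("a\u0000b");
        StringBuilder sb = new StringBuilder();
        for (byte b : enc) sb.append(String.format("%02x ", b & 0xff));
        System.out.println(sb.toString().trim());           // 61 c0 80 62
        System.out.println(decode(enc).equals("a\u0000b")); // true
    }
}
```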

> I'm wondering how the binary protocol works insofar as handling the NULL
> byte; does it precede it with a backslash?

The driver sends string parameters out-of-line without escaping (i.e.
length field, then raw utf-8 data). The error you see is generated when
the server notices that there's a \u0000 there; it rejects the string
entirely rather than silently mangling it.
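To make the "length field, then raw UTF-8 data" framing concrete: in the v3 protocol's Bind message, each parameter value is sent as an Int32 byte count followed by the bytes themselves, with -1 meaning SQL NULL. The simplified sketch below frames a single text parameter that way. The class and method names are hypothetical, it is not the driver's code, and the rest of the Bind message is omitted. It also shows that SQL NULL and a U+0000 character are unrelated: the former is a length of -1, the latter an ordinary 0x00 data byte that nothing escapes.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Simplified framing of ONE text parameter value as in a v3 Bind message:
// Int32 byte length, then the raw UTF-8 bytes, with no escaping at all.
final class ParamFraming {
    static byte[] frameTextParam(String value) {
        if (value == null) {
            // SQL NULL: length -1 and no value bytes.
            return ByteBuffer.allocate(4).putInt(-1).array();
        }
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + utf8.length)
                .putInt(utf8.length) // ByteBuffer defaults to big-endian (network order)
                .put(utf8)
                .array();
    }

    public static void main(String[] args) {
        byte[] framed = frameTextParam("a\u0000b");
        StringBuilder sb = new StringBuilder();
        for (byte b : framed) sb.append(String.format("%02x ", b & 0xff));
        System.out.println(sb.toString().trim()); // 00 00 00 03 61 00 62
    }
}
```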

-O


From: Tore Halset <halset(at)pvv(dot)ntnu(dot)no>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2007-02-23 17:31:38
Message-ID: E5CB7F50-AA37-4986-A2F7-E74F255BF4B6@pvv.ntnu.no

On Feb 22, 2007, at 07:37, Ken Johanson wrote:

> At least on other servers/drivers I believe nulls are supported
> (and should be according to some spec) (The only special-meaning
> char is single quote).

Yes. I got this error while copying data from a MS SQL Server to
PostgreSQL.

- Tore.


From: JasmineLiu <liuyuanyuangogo(at)gmail(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2014-08-27 07:43:18
Message-ID: 1409125398008-5816498.post@n5.nabble.com

the context is:
http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-quot-UTF8-quot-0x00-td2172080.html

I've also hit this problem while copying or inserting data from MS SQL Server to
PostgreSQL:
SQL Server 2008 R2, encoding: GBK
PostgreSQL 9.3.4, encoding: UTF8

Rather than modifying the column values in SQL Server,
is there any other way to solve this problem?
An example would be appreciated.
Thanks!

Yours,
Jasmine

--
View this message in context: http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-UTF8-0x00-tp2172080p5816498.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.


From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "JasmineLiu *EXTERN*" <liuyuanyuangogo(at)gmail(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2014-08-27 08:11:09
Message-ID: A737B7A37273E048B164557ADEF4A58B17D2EDBA@ntex2010i.host.magwien.gv.at

JasmineLiu wrote:
> I've also got this problem while copy or insert data from MS SQL Server to
> PostgreSQL.
> SQLServer 2008 R2, encoding :GBK
> PostgreSQL 9.3.4, encoding:UTF8
>
> Rather than modify the column value in sql server,
> are there any other ways to solve this problem?
> Better to give me an examples.

You will never be able to insert a null character into a PostgreSQL database.
You can either modify the source data or change the data in transit.

Yours,
Laurenz Albe


From: JasmineLiu <liuyuanyuangogo(at)gmail(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2014-08-27 08:22:45
Message-ID: 1409127765169-5816502.post@n5.nabble.com

Thanks, Laurenz Albe!
Can you give me an example of changing the data in transit?

Regards!

Yours,
Jasmine Liu

--
View this message in context: http://postgresql.1045698.n5.nabble.com/invalid-byte-sequence-for-encoding-UTF8-0x00-tp2172080p5816502.html
Sent from the PostgreSQL - jdbc mailing list archive at Nabble.com.


From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "JasmineLiu *EXTERN*" <liuyuanyuangogo(at)gmail(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2014-08-27 08:44:34
Message-ID: A737B7A37273E048B164557ADEF4A58B17D2EE01@ntex2010i.host.magwien.gv.at

JasmineLiu wrote:
> Can you give me an example of changing the data in transit?

You export the data to a file, modify the file (with tools like sed
or something more complicated), and load the result.
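A Java stand-in for the sed step might look like the sketch below, which copies an exported file while dropping every 0x00 byte. It assumes the export was written as UTF-8: in UTF-8 the byte 0x00 only ever encodes U+0000 and never occurs inside a multi-byte character, so removing it cannot damage other text. If the export uses another encoding, check that this property holds for it first. The class name and command-line arguments are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Copies a UTF-8 export while dropping every 0x00 byte. In UTF-8 the byte
// 0x00 only ever encodes U+0000, so no other character is affected.
final class StripNulBytes {
    /** Copies in to out without 0x00 bytes; returns how many bytes were dropped. */
    static long copyWithoutNul(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long dropped = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            int w = 0;
            for (int i = 0; i < n; i++) {
                if (buf[i] != 0) {
                    buf[w++] = buf[i]; // compact non-NUL bytes to the front of the buffer
                } else {
                    dropped++;
                }
            }
            out.write(buf, 0, w);
        }
        return dropped;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: StripNulBytes <input-file> <output-file>");
            return;
        }
        Path src = Paths.get(args[0]);
        Path dst = Paths.get(args[1]);
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            System.out.println("dropped " + copyWithoutNul(in, out) + " NUL byte(s)");
        }
    }
}
```

With Java 11 or later it can be run as, e.g., `java StripNulBytes.java export.csv export-clean.csv` (file names are placeholders), after which the cleaned file is loaded with COPY.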

Yours,
Laurenz Albe


From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2014-08-28 12:45:23
Message-ID: VisenaEmail.261.166fd0b593f014fb.1481ca37e82@tc7-on

On Wednesday, 27 August 2014 at 10:11:09, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:
> JasmineLiu wrote:
>> I've also got this problem while copy or insert data from MS SQL Server to
>> PostgreSQL.
>> SQLServer 2008 R2, encoding :GBK
>> PostgreSQL 9.3.4, encoding:UTF8
>>
>> Rather than modify the column value in sql server,
>> are there any other ways to solve this problem?
>> Better to give me an examples.
>
> You will never be able to insert a null character into a PostgreSQL database.
> You can either modify the source data or change the data in transit.

This is not 100% true, but it is true for text fields. You can insert \0 into
BYTEA columns.

Usually the \0 isn't important, so you can do this in Java before inserting
into PG:

    someString.replace('\0', ' ')

or

    someString.replaceAll("\0", "")

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas(at)visena(dot)com
www.visena.com
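Andreas's one-liners can be wrapped so they are applied consistently at bind time. In this sketch the helper names are hypothetical: `stripNul` drops U+0000 like the `replaceAll("\0", "")` variant but uses the non-regex `String.replace(CharSequence, CharSequence)`, and `nulToSpace` matches the `replace('\0', ' ')` variant, which keeps the string length unchanged. Either one silently changes the data, so it only fits cases where, as Andreas says, the \0 isn't important.

```java
// Hypothetical helpers applying the suggestion above before setString():
// drop U+0000, or replace it with a space. A null reference stays null (SQL NULL).
final class NulSanitizer {
    private NulSanitizer() {}

    /** Removes every U+0000 character. */
    static String stripNul(String s) {
        return s == null ? null : s.replace("\u0000", "");
    }

    /** Replaces every U+0000 with a space, so the length is unchanged. */
    static String nulToSpace(String s) {
        return s == null ? null : s.replace('\u0000', ' ');
    }

    public static void main(String[] args) {
        String raw = "abc\u0000def";
        System.out.println(stripNul(raw));          // abcdef
        System.out.println(nulToSpace(raw));        // abc def
        System.out.println(stripNul(raw).length()); // 6
    }
}
```

For example: `ps.setString(1, NulSanitizer.stripNul(value));`.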


From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Andreas Joseph Krogh *EXTERN*" <andreas(at)visena(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2014-08-29 09:54:35
Message-ID: A737B7A37273E048B164557ADEF4A58B17D2FA97@ntex2010i.host.magwien.gv.at

Joseph Krogh wrote:
>> You will never be able to insert a null character into a PostgreSQL database.
>> You can either modify the source data or change the data in transit.
>
> This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns.

I was talking about characters, not bytes.

Yours,
Laurenz Albe


From: Andreas Joseph Krogh <andreas(at)visena(dot)com>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2014-08-30 17:21:41
Message-ID: VisenaEmail.19.2d1a83e32f2eda79.14827ebcb25@tc7-on

On Friday, 29 August 2014 at 11:54:35, Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:
> Joseph Krogh wrote:
>>> You will never be able to insert a null character into a PostgreSQL database.
>>> You can either modify the source data or change the data in transit.
>>
>> This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns.
>
> I was talking about characters, not bytes.

'\0' is a character. I see no specification of character fields (like varchar
and text) in your answer.

--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andreas(at)visena(dot)com
www.visena.com


From: "liuyuanyuan" <liuyuanyuangogo(at)gmail(dot)com>
To: "'Albe Laurenz'" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: [JDBC] invalid byte sequence for encoding "UTF8": 0x00
Date: 2014-09-01 02:04:24
Message-ID: 006001cfc589$1102c7a0$330856e0$@gmail.com

Hi, Laurenz Albe!
Thanks for all your help! I understand now. Thank you so much!

Best Regards!
Jasmine

-----Original Message-----
From: Albe Laurenz [mailto:laurenz(dot)albe(at)wien(dot)gv(dot)at]
Sent: 2014-08-29 17:58
To: liuyuanyuan *EXTERN*
Subject: RE: [JDBC] invalid byte sequence for encoding "UTF8": 0x00

> Are there other wrongly encoded strings that I haven't caught?
> If another wrongly encoded string occurs, my program may hit the exception again.
> Is there a foolproof method that would resolve all problems caused by encoding?

To the best of my knowledge, all you need to check is that the data is valid
UTF-8 and does not contain a \u0000 character.
Then PostgreSQL should not have a problem with it, provided the server encoding
is UTF8; with another server encoding, the server may be unable to store some
valid characters.
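Laurenz's two checks can be expressed in Java for data that arrives as raw bytes (for example, read from an export). The sketch uses the JDK's UTF-8 `CharsetDecoder` with `CodingErrorAction.REPORT`, so malformed input raises `CharacterCodingException` instead of being replaced with U+FFFD, and then scans the decoded text for U+0000. The class name and the return convention (null means acceptable) are assumptions for illustration. For values that are already Java `String`s, the U+0000 scan is typically the part that matters, since the driver does the UTF-8 encoding.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Sketch of a pre-insert check: the bytes must be valid UTF-8 and the
// decoded text must not contain U+0000.
final class Utf8InsertCheck {
    /** Returns null if the bytes are acceptable, otherwise a short reason. */
    static String problem(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer chars;
        try {
            chars = decoder.decode(ByteBuffer.wrap(data));
        } catch (CharacterCodingException e) {
            return "not valid UTF-8: " + e;
        }
        for (int i = 0; i < chars.length(); i++) {
            if (chars.charAt(i) == '\u0000') {
                return "contains U+0000 at char index " + i;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(problem("h\u00e9llo".getBytes(StandardCharsets.UTF_8))); // null
        System.out.println(problem(new byte[] {0x61, 0x00}));  // contains U+0000 at char index 1
        System.out.println(problem(new byte[] {(byte) 0xFF})); // not valid UTF-8: ...
    }
}
```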

Yours,
Laurenz Albe


From: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Andreas Joseph Krogh *EXTERN*" <andreas(at)visena(dot)com>, "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8": 0x00
Date: 2014-09-01 08:41:55
Message-ID: A737B7A37273E048B164557ADEF4A58B17D305A8@ntex2010i.host.magwien.gv.at

Andreas Joseph Krogh wrote:
>>>> You will never be able to insert a null character into a PostgreSQL database.
>>>> You can either modify the source data or change the data in transit.
>>>
>>> This is not 100% true, but is true for text-fields. You can insert \0 into BYTEA columns.
>>
>> I was talking about characters, not bytes.
>
> '\0' is a character. I see no specification of character fields (like varchar and text) in your
> answer.

My definition would be:
A character is something that is normally written on paper and has
to be encoded to be stored in a computer system.
(This doesn't seem to stray too far from Wikipedia's definition.)

Characters can only occur in text fields.

An element of a bytea is not a character along these lines; hence the
type is called "BYTE Array".

But let's not split hairs; this is getting away from the problem at hand,
and I think you know what I mean and vice versa.

Yours,
Laurenz Albe