invalid byte sequence for encoding "UTF8": 0xab

Lists: pgsql-general
From: "Grand, Mark D(dot)" <mgrand(at)emory(dot)edu>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: invalid byte sequence for encoding "UTF8": 0xab
Date: 2009-06-05 11:49:09
Message-ID: EE87606F3DC6EE40BC1551A9EEA4713901432AC12032@EXCHANGE10.Enterprise.emory.net

I am having a vexing problem with a script I am writing to populate reference tables in a new database.

I am running PostgreSQL 8.3 with psql 8.3.7.
psql reads this SQL statement:
INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, META_ASSERTION)
VALUES ('Super-User Authorization',
'This allows a super-user to administer all meta-data.',
'UserID <Administer> ()');

and I get this message:
ERROR: invalid byte sequence for encoding "UTF8": 0xab
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

It is complaining about the '<' character. I do not understand why. The database is created with the commands
CREATE DATABASE mayyou
WITH OWNER=meta_auth ENCODING='UTF8';
ALTER DATABASE mayyou SET client_encoding = 'UTF8';

When I give psql the \encoding command, it replies
UTF8

Why is it complaining about this valid character code?

________________________________
This e-mail message (including any attachments) is for the sole use of
the intended recipient(s) and may contain confidential and privileged
information. If the reader of this message is not the intended
recipient, you are hereby notified that any dissemination, distribution
or copying of this message (including any attachments) is strictly
prohibited.

If you have received this message in error, please contact
the sender by reply e-mail message and destroy all copies of the
original message (including attachments).


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Grand, Mark D(dot)" <mgrand(at)emory(dot)edu>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8": 0xab
Date: 2009-06-05 13:57:51
Message-ID: 3213.1244210271@sss.pgh.pa.us
Lists: pgsql-general

"Grand, Mark D." <mgrand(at)emory(dot)edu> writes:
> ... I get this message:
> ERROR: invalid byte sequence for encoding "UTF8": 0xab
> HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

> It is complaining about the '<' character. I do not understand why.

The ASCII code for '<' is 0x3c, not 0xab. I am not sure what you are
actually typing; although it's suggestive that the LATIN1 code 0xab
corresponds to a symbol that looks approximately like '<<'. The most
likely bet is that you are typing the wrong thing and using a terminal
emulator that is not set to generate UTF8-encoded characters. You
should try to make sure that client_encoding is set to match what your
keyboard actually generates.
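
A minimal Python sketch of the byte-level point above (an illustration added for clarity, not part of the original message): the character « is the single byte 0xab in LATIN1 but the two bytes 0xc2 0xab in UTF-8, so a lone 0xab cannot be decoded as UTF-8. The variable names are hypothetical.

```python
# Compare how '<' and the guillemet are encoded in ASCII, LATIN1 and UTF-8.
less_than = "<"
guillemet = "\u00ab"  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK

ascii_hex = less_than.encode("ascii").hex()      # '3c'
latin1_hex = guillemet.encode("latin-1").hex()   # 'ab'
utf8_hex = guillemet.encode("utf-8").hex()       # 'c2ab'

# A lone 0xab is a UTF-8 continuation byte with no lead byte before it,
# so a strict UTF-8 decoder rejects it.
try:
    b"\xab".decode("utf-8")
    lone_byte_is_valid_utf8 = True
except UnicodeDecodeError:
    lone_byte_is_valid_utf8 = False

print(ascii_hex, latin1_hex, utf8_hex, lone_byte_is_valid_utf8)
# prints: 3c ab c2ab False
```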

regards, tom lane


From: Vick Khera <vivek(at)khera(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UTF8": 0xab
Date: 2009-06-05 15:10:26
Message-ID: 2968dfd60906050810j59a3bfcfn4595f28346b23de9@mail.gmail.com
Lists: pgsql-general

On Fri, Jun 5, 2009 at 9:57 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The ASCII code for '<' is 0x3c, not 0xab.  I am not sure what you are
> actually typing; although it's suggestive that the LATIN1 code 0xab
> corresponds to a symbol that looks approximately like '<<'.  The most
> likely bet is that you are typing the wrong thing and using a terminal

Must be something with your mail program, because in the version I am
reading postgres is complaining about the "approximately like '<<'"
symbol.


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Grand, Mark D(dot) *EXTERN*" <mgrand(at)emory(dot)edu>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8": 0xab
Date: 2009-06-08 09:59:06
Message-ID: D960CB61B694CF459DCFB4B0128514C202FF6634@exadv11.host.magwien.gv.at
Lists: pgsql-general

Mark D. Grand wrote:
> I am having a vexing problem with a script I am writing to
> populate reference tables in a new database.
>
> I am running postgreSQL 8.3 with psql 8.3.7.
>
> Psql reads this SQL statement:
>
> INSERT INTO META_AUTH.DOMAIN_META_ASSERTION (TITLE, DESCRIPTION, META_ASSERTION)
> VALUES ('Super-User Authorization',
> 'This allows a super-user to administer all meta-data.',
> 'UserID «Administer» ()');
>
> and I get this message:
>
> ERROR: invalid byte sequence for encoding "UTF8": 0xab
>
> HINT: This error can also happen if the byte sequence does
> not match the encoding expected by the server, which is
> controlled by "client_encoding".
>
> It is complaining about the '«' character. I do not
> understand why. The database is created the commands
>
> CREATE DATABASE mayyou
> WITH OWNER=meta_auth ENCODING='UTF8';
>
> ALTER DATABASE mayyou SET client_encoding = 'UTF8';
>
> When I give psql the \encoding command, it replies
> UTF8
>
> Why is it complaining about this valid character code?

The database stores characters in UTF-8, and the client
expects UTF-8 characters, but presumably the characters you
feed into psql are not UTF-8.

If this is some kind of UNIX, it might be instructive to
type 'echo "«" | od -t x1' on the command line.

Also knowing the current locale might help to determine the problem.
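
The same kind of check can be sketched in Python for a whole script rather than a single character. This is only an illustration: the function name find_invalid_utf8 and the sample bytes are hypothetical, and the sample assumes the statement was saved as LATIN1, which the thread does not confirm.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, byte value) pairs where strict UTF-8 decoding fails."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the rest of the data is valid UTF-8
        except UnicodeDecodeError as exc:
            bad = pos + exc.start  # exc.start is relative to the slice
            problems.append((bad, data[bad]))
            pos = bad + 1  # skip the offending byte and keep scanning
    return problems


# Hypothetical sample: the quoted string literal as LATIN1 bytes.
sample = b"'UserID \xabAdminister\xbb ()'"
for offset, value in find_invalid_utf8(sample):
    print(f"offset {offset}: 0x{value:02x}")
# prints:
# offset 8: 0xab
# offset 19: 0xbb
```

To check a real file, the bytes could be read in binary mode (for example with open(path, "rb"), where path is whatever script is being fed to psql) and passed to the same function.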

Yours,
Laurenz Albe


From: "Grand, Mark D(dot)" <mgrand(at)emory(dot)edu>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8": 0xab
Date: 2009-06-08 11:28:20
Message-ID: EE87606F3DC6EE40BC1551A9EEA4713901432D471218@EXCHANGE10.Enterprise.emory.net
Lists: pgsql-general

It turns out that my problem was that the editor I was using (emacs) does not properly support utf8 encoding.
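
For readers in the same situation, a minimal Python sketch of one possible fix, assuming the script had been saved as LATIN1 (an assumption; the thread does not state the file's actual encoding). The function name latin1_to_utf8 and the sample bytes are hypothetical.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode bytes that are assumed to be LATIN1 as UTF-8."""
    # Every byte value is a valid LATIN1 character, so this never raises;
    # it gives wrong results if the input was not really LATIN1.
    return data.decode("latin-1").encode("utf-8")


# Hypothetical sample: the string from the INSERT as LATIN1 bytes.
original = b"UserID \xabAdminister\xbb ()"
fixed = latin1_to_utf8(original)
print(fixed.hex(" "))
# The guillemets become c2 ab and c2 bb; the ASCII bytes are unchanged.
```

The alternative suggested earlier in the thread is to leave the file alone and set client_encoding to match what the file actually contains.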


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: "Grand\, Mark D(dot)" <mgrand(at)emory(dot)edu>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, "pgsql-general\(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: invalid byte sequence for encoding "UTF8": 0xab
Date: 2009-06-08 11:54:12
Message-ID: 871vpvkkkb.fsf@hi-media-techno.com
Lists: pgsql-general

"Grand, Mark D." <mgrand(at)emory(dot)edu> writes:

> It turns out that my problem was that the editor I was using (emacs)
> does not properly support utf8 encoding.

Emacs does support utf8 properly.
http://www.emacswiki.org/emacs/ChangingEncodings

It could be I'm biased because I use emacs from CVS, which is going to
be emacs23, and is as stable as emacs has always been for me.
http://emacs.orebokech.com/
http://atomized.org/wp-content/cocoa-emacs-nightly/

From within emacs, to get a ton of information about char under point,
try C-x = (one line version) or M-x describe-char (full version): <
Char: < (60, #o74, #x3c) point=1312 of 4162 (31%) <301-4163> column=66

character: < (60, #o74, #x3c)
preferred charset: ascii (ASCII (ISO646 IRV))
code point: 0x3C
syntax: . which means: punctuation
category: .:Base, a:ASCII, l:Latin, r:Roman
buffer code: #x3C
file code: #x3C (encoded by coding system utf-8-emacs)
display: by this font (glyph code)
xft:-bitstream-Bitstream Vera Sans Mono-normal-normal-normal-*-16-*-*-*-m-0-iso10646-1 (#x1F)

But I guess we're off topic now.

HTH, regards,
--
dim