Re: invalid byte sequence for encoding "UNICODE"

Lists: pgsql-general
From: AlannY <m(at)alanny(dot)ru>
To: pgsql-general(at)postgresql(dot)org
Subject: invalid byte sequence for encoding "UNICODE"
Date: 2008-07-24 18:06:17
Message-ID: 4888C499.6050109@alanny.ru

Hi there.

Many times I have run into this strange problem: invalid byte
sequence for encoding "UNICODE". So I guess PostgreSQL won't let me
use some symbols which are not part of UNICODE. But what are those
symbols?

I'm attaching a screenshot with THAT dead symbol. As you can see, it's
an unknown symbol at the end of the Cyrillic text. First of all, I checked
my data with iconv (iconv -f UTF-8 -t UTF-8 data.txt) and there were no
errors, so I guess there are no dead symbols.

So the question is: is it possible to find a *table* of forbidden
characters for encoding "UNICODE"? If I can get it, I can kill those
dead characters in my program ;-)

Thank you.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: AlannY <m(at)alanny(dot)ru>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UNICODE"
Date: 2008-07-24 19:29:46
Message-ID: 10254.1216927786@sss.pgh.pa.us

AlannY <m(at)alanny(dot)ru> writes:
> Many times I have run into this strange problem: invalid byte
> sequence for encoding "UNICODE". So I guess PostgreSQL won't let me
> use some symbols which are not part of UNICODE. But what are those
> symbols?

Doesn't it tell you? AFAICS every PG version that uses that error
message phrasing gives you the exact byte sequence it's complaining
about.

It would also be worth asking what PG version you are using anyway.
If it's not a pretty recent update then updating might help --- I
think there were some bugs in the encoding verification stuff awhile
back.

regards, tom lane


From: valgog <valgog(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: invalid byte sequence for encoding "UNICODE"
Date: 2008-07-25 07:31:06
Message-ID: a3a4f151-45b2-4dab-b0ed-7e8991d42b8f@k36g2000pri.googlegroups.com

On Jul 24, 8:06 pm, m(dot)(dot)(dot)(at)alanny(dot)ru (AlannY) wrote:
> Hi there.
>
> Many times I have run into this strange problem: invalid byte
> sequence for encoding "UNICODE". So I guess PostgreSQL won't let me
> use some symbols which are not part of UNICODE. But what are those
> symbols?
>
> I'm attaching a screenshot with THAT dead symbol. As you can see, it's
> an unknown symbol at the end of the Cyrillic text. First of all, I checked
> my data with iconv (iconv -f UTF-8 -t UTF-8 data.txt) and there were no
> errors, so I guess there are no dead symbols.
>
> So the question is: is it possible to find a *table* of forbidden
> characters for encoding "UNICODE"? If I can get it, I can kill those
> dead characters in my program ;-)
>
> Thank you.

To tell the truth, there are no characters forbidden in Unicode, since
any character you could have is already in Unicode.
UTF-8 is another matter: it encodes Unicode code points as sequences of
8-bit bytes, and that is where the errors occur.
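
Since the problem is in the bytes rather than in the characters, it may
help to look for the offending bytes directly. Below is a minimal Python
sketch, not anything from PostgreSQL itself: the function name
find_problem_bytes is made up for illustration, and it relies on Python's
strict UTF-8 decoder, which may not match the server's validation in every
detail. It also flags NUL bytes (0x00), which are valid UTF-8 and so pass
an iconv check, but which PostgreSQL does not accept in text values.

```python
def find_problem_bytes(data: bytes):
    """Return sorted (offset, hex_bytes) pairs for bytes likely to be rejected.

    Hypothetical helper: it reports sequences that Python's strict UTF-8
    decoder refuses, plus NUL bytes, which PostgreSQL text values cannot hold.
    """
    problems = []
    pos = 0
    while pos < len(data):
        try:
            # Decode the remainder; success means no further invalid sequences.
            data[pos:].decode("utf-8")
            break
        except UnicodeDecodeError as exc:
            # exc.start / exc.end are relative to the slice we decoded.
            start, end = pos + exc.start, pos + exc.end
            problems.append((start, data[start:end].hex()))
            pos = end  # resume scanning right after the bad bytes

    # NUL is well-formed UTF-8, so the loop above never reports it.
    problems.extend((i, "00") for i, b in enumerate(data) if b == 0)
    return sorted(problems)


# Example: a lone lead byte (0xd0) not followed by a continuation byte,
# and a trailing NUL byte.
sample = b"ok\xd0. ok\x00"
for offset, hex_bytes in find_problem_bytes(sample):
    print(f"offset {offset}: 0x{hex_bytes}")
```

To check a real file, read it in binary mode, for example
open("data.txt", "rb").read(), and pass the result to the function.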

What does the command:

show lc_ctype;

show?

As Tom has said, more information about your system would be really
handy...

With best regards,

-- Valentine Gogichashvili