Re: BUG #4253: to_tsvector: error with some configurations

Lists: pgsql-bugs
From: "Giorgio Valoti" <giorgio_v(at)mac(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4253: to_tsvector: error with some configurations
Date: 2008-06-18 12:37:17
Message-ID: 200806181237.m5ICbHH2055080@wwwmaster.postgresql.org


The following bug has been logged online:

Bug reference: 4253
Logged by: Giorgio Valoti
Email address: giorgio_v(at)mac(dot)com
PostgreSQL version: 8.3.3
Operating system: Mac OS X 10.5.3
Description: to_tsvector: error with some configurations
Details:

Using any configuration for a language containing the "a grave" letter (à,
UTF-8 bytes c3 a0) causes an error when the function "to_tsvector" is invoked.

test=> select to_tsvector('italian','prova');
ERROR: invalid byte sequence for encoding "UTF8": 0xc3
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".

test=> select to_tsvector('french','prova');
ERROR: invalid byte sequence for encoding "UTF8": 0xc3
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".

test=> select to_tsvector('portuguese','prova');
ERROR: invalid byte sequence for encoding "UTF8": 0xc3
HINT: This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Giorgio Valoti" <giorgio_v(at)mac(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4253: to_tsvector: error with some configurations
Date: 2008-06-18 15:16:48
Message-ID: 27306.1213802208@sss.pgh.pa.us

"Giorgio Valoti" <giorgio_v(at)mac(dot)com> writes:
> Using any configuration for a language containing the "a grave" letter (à,
> UTF-8 bytes c3 a0) causes an error when the function "to_tsvector" is invoked.

> test=> select to_tsvector('italian','prova');
> ERROR: invalid byte sequence for encoding "UTF8": 0xc3

Hmm, works for me:

z=# select to_tsvector('italian','prova');
to_tsvector
-------------
'prov':1
(1 row)

What database encoding (server_encoding) are you using? Is it possible
that the text search configuration files have been rewritten into a
non-UTF8 encoding?

regards, tom lane
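One way to act on the question of whether the configuration files are still UTF-8 is to scan them for the first invalid byte sequence. The sketch below is a minimal, hypothetical helper, not part of PostgreSQL: it assumes the stopword files live under the `tsearch_data` directory of `pg_config --sharedir` (e.g. `italian.stop`), and it uses Python's strict UTF-8 decoder as a rough stand-in for the server's own validity check. A file re-encoded to Latin-1 would show `à` as the lone byte 0xe0, while a report of 0xc3 means a UTF-8 lead byte was followed by something other than a continuation byte.

```python
import sys
from pathlib import Path


def first_invalid_utf8(data: bytes):
    """Return (offset, byte) of the first invalid UTF-8 sequence, or None if valid."""
    try:
        # Strict decoding; roughly analogous to (not identical with) the server's check.
        data.decode("utf-8")
        return None
    except UnicodeDecodeError as err:
        # err.start is the offset of the byte that begins the bad sequence.
        return err.start, data[err.start]


def check_file(path: Path) -> bool:
    """Print a server-style message for the first bad byte; return True if the file is valid."""
    result = first_invalid_utf8(path.read_bytes())
    if result is None:
        print(f"{path}: valid UTF-8")
        return True
    offset, byte = result
    print(f'{path}: invalid byte sequence for encoding "UTF8": 0x{byte:02x} at offset {offset}')
    return False


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python check_utf8.py "$(pg_config --sharedir)"/tsearch_data/*.stop
    # A list (not a generator) so every file is checked, not just up to the first failure.
    sys.exit(0 if all([check_file(Path(p)) for p in sys.argv[1:]]) else 1)
```

For example, `first_invalid_utf8(b"citt\xc3\xa0")` returns `None`, while the Latin-1 spelling `b"citt\xe0"` is reported at offset 4 with byte 0xe0.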