Better error reporting for tsearch config file problems

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Better error reporting for tsearch config file problems
Date: 2008-06-18 15:39:38
Message-ID: 27688.1213803578@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm not totally sure that bug #4253 reflects a problem in the user's
text search stopword files, but I'm suspicious. If I deliberately
introduce an encoding error into italian.stop, I get

regression=# select to_tsvector('italian','prova');
ERROR: invalid byte sequence for encoding "UTF8": 0xc30a
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

This is fairly bad, because (a) it doesn't tell me where the problem is,
and (b) the HINT is outright misleading --- it will tend to make users
look to problems on the client side.

I am thinking that we could improve this dramatically by using the
"error context" feature. I want to add a line like this:
CONTEXT: line 42 of configuration file "/path/to/italian.stop"
and maybe even include the text of the line, when available (it'd be
unsafe to do that for encoding errors, for obvious reasons). If the
context callback were active throughout the reading of each config
file, we could also get rid of the places where that info is supplied
in an ad-hoc manner, eg in dict_thesaurus.c

ereport(ERROR,
(errcode(ERRCODE_CONFIG_FILE_ERROR),
errmsg("unexpected delimiter at line %d of thesaurus file \"%s\"",
lineno, filename)));

becomes just

ereport(ERROR,
(errcode(ERRCODE_CONFIG_FILE_ERROR),
errmsg("unexpected delimiter")));

since the context callback will supply the rest.

This should be a simple, easily tested patch, and I am tempted to
propose back-patching it into 8.3, on the grounds that people need
help now --- by the time 8.4 is out they'll already have debugged
their config files. However, altering these existing error messages
would create extra work for translators. Any votes pro or con on that?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, pgtranslation-translators(at)pgfoundry(dot)org
Subject: Re: Better error reporting for tsearch config file problems
Date: 2008-06-18 17:55:16
Message-ID: 20080618175516.GH5077@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Translators: refer to
http://archives.postgresql.org/message-id/27688.1213803578%40sss.pgh.pa.us

Tom Lane wrote:

> This should be a simple, easily tested patch, and I am tempted to
> propose back-patching it into 8.3, on the grounds that people need
> help now --- by the time 8.4 is out they'll already have debugged
> their config files. However, altering these existing error messages
> would create extra work for translators. Any votes pro or con on that?

Hmm, I have on my TODO doing this sort of stuff also for pg_hba.conf
parsing (and I think other places I don't recall right now). So +1 on
the proposal itself.

On backpatching the message changes, you are right that it creates extra
work for translators. In most cases it's just removing a part of the
string, so no sweat. I think the rest of the changes is creating the
new context string, which is going to be a substring of something that's
already translated, so msgmerge is probably going to pick it up as a
fuzzy. On the whole, not a lot of work.

So my vote goes for backpatching this change to 8.3.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.