Re: A plan to improve error messages with context, hint and details.

Lists: pgsql-hackers
From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: A plan to improve error messages with context, hint and details.
Date: 2004-03-04 17:06:21
Message-ID: Pine.LNX.4.58.0403041742590.28778@sablons.cri.ensmp.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dear Hackers,

Motivation
----------

As a basic user of postgresql, I've been quite disappointed by the lack of
help provided by postgresql error messages dealing with syntax and
semantical errors, especially in long sql statements:

ERROR: syntax error at or near "(" at character 326

This makes students feel angry against the software, the computer or even
the teacher (say, me;-), for the bad time they have dealing with syntactic
issues. It makes them turn to mysql;-)

I think it is important an issue, as the interface is the first contact
between the user and the software. The internals and features may be
great, but it is not bad either to help users to deal more easily with the
beast.

There are several points on which great improvements can be obtained
without much disruption in the current code structure.

Note that this plan is for "ERROR", where the processing is somehow
stopped, and the user must chose a course of action in order to solve the
problem. However, other level of reports (such as NOTICE or WARNING) may
also benefit from it.

Also, things may not be as simple as described, but the purpose of the
mail is to describe the ultimate goal, and to outline the path that I
think should lead to it.

So here is my suggested plan:

(1) Lexical/syntax error source localisation
--------------------------------------------

An extract of the offending source must be shown if possible along syntax
error messages.

This can be achieved very simply and at low cost, since all the
information is already there, as well as most needed fields in ErrorData.

However it may be required to handle multi-line details or an additionnal
sub-detail (I would chose that) in ErrorData in order to show a cursor:

ERROR: Syntax error at or near "(" at character 14
DETAIL: CREATE TABLE (id SERIAL ...
DETAIL: ^

The only actual issue seems to be multi-byte encodings in the buffer, but
I noticed that some support functions are already available.

(2) Hints about syntax errors
-----------------------------

All generated error messages, especially from the parser, should be
assorted with a HINT to help the user, if possible. Something like:

HINT: table name expected

This requires more work as all syntax error sources need to be catched and
a relevant HINT must be provided. There is a little bit of an issue here
as yyerror call to ereport is rather simplistic.

I would suggest to have a "current_hint" (scalar or maybe stack)
maintained by the parsor, that would be used by yyerror to fill the hint
field. The yacc code may look something like:

<code>
CreateUserStmt:
Create USER { hint("user id"); }
Userid { hint("user options or WITH"); }
opt_with { hint("user options"); }
OptUserList { ... };

Create: CREATE { hint("USER|DATABASE|SCHEMA|..."); };
</code>

The changes are pretty systematic and simple, and they do not modidy the
actual grammar of the parsor. However they should affect a lot of lines in
"gram.y", if not all.

(3) About semantical errors, which may be detected later on ...
---------------------------------------------------------------

... in the processing of the command. The problem is different because it
occurs in functions that can be called from quite different contexts, and
the context is not really known to the function. Thus when the error
occurs in the function, it cannot provide a useful context. As none is
provided, the user must guess...

For this I have used in the past the following trick: a stack describes
the processing context and is updated by functions with pushes and pops.
If an error occurs, the stack provides the context information needed,
something like:

CONTEXT: parsing user query

Or

CONTEXT: in create table "foo", in constraint "bla", checking reference types ...

This is basically a user-oriented view of the call stack to help with
error messages. It's really incremental, as if nothing is done the context
will be vague, but if some key functions care to update it precisely, the
context reported will be much more helpful.

It should typically be maintained in the error logging part, and used on
errors to build a context if none is provided, or to be provided as a
separate content next to "message, detail, hint, context". Also care must
be taken to reset the stack on errors. Note that the current "context"
management in elog allows multiple context information to be provided, but
I haven't seen any "pop" facility which would be needed by functions so as
to change the current context simply, the strings are just appended one
to the other.

Two questions
-------------

My research background is in code optimisation within compilers, but now I
mostly teach computer science stuff in engineering schools. I would be
interested in giving some of my time on these issues, but:

(1) Do postgresql "Masters" think this issue is worth being pursued, or
any patch will be rejected as it is considered intrinsicly useless?

"Our users do not need hint or context information, only hard-core
engineers use postgresql, sissy guys will rather use mysql" ;-)

Indeed, I don't mind having a patch being rejected because of my
poor programming, or because the result is not fine enough, as
I can improve it and re-submit later.

However, if the issue is considered useless, it means that
I will lose my time anyway, so I would prefer not to give it try;-)

(2) Does someone has any comment about these problems or
the way I intend to try to address them?

Are they currently being addressed by someone else?
It doesn't look so from the TODO list.

If the plan make sense, it may be added to the TODO list,
and I wish to claim it or part of it.

Have a nice day,

--
Fabien Coelho - coelho(at)cri(dot)ensmp(dot)fr


From: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A plan to improve error messages with context, hint
Date: 2004-03-04 20:01:18
Message-ID: Pine.LNX.4.44.0403042046230.13979-100000@zigo.dhs.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 4 Mar 2004, Fabien COELHO wrote:

> (1) Do postgresql "Masters" think this issue is worth being pursued, or
> any patch will be rejected as it is considered intrinsicly useless?

While I can't speak for others I would anyhow guess that everyone wants
better error messages. I know I want it.

I spend a fair amount of time on irc helping people with all kinds of
postgresql problems, and it's not uncommon that people need help to
understand the error messages. It's not hard to understand them for people
who are "in the game", but for others it can be difficult.

> (2) Does someone has any comment about these problems or
> the way I intend to try to address them?

About the implementation idea with hints. I'm not sure will be so easy to
implement as you suggested. Maybe if one add hints to every construct,
and set to empty hint where it does not make sense.

What I'm afraid of is to get hints or advices that are plain wrong. If
that happens a lot it's better to not have hints at all. Like for this:

CREATE TABLE foo (bar int key);

and gettting an error

CREATE TABLE foo (bar int key);
^
HINT: table name expected

or something.

But if it works good in practice, why not having hints.

Another thing one can add to make better errors are rules that handle
typical errors, like:

stmt:
CREATE TABLE ( anything ); { generate error "missing table name" }

--
/Dennis Björklund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A plan to improve error messages with context, hint and details.
Date: 2004-03-04 21:04:05
Message-ID: 28690.1078434245@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
> (1) Lexical/syntax error source localisation

> An extract of the offending source must be shown if possible along syntax
> error messages.

You would do well to go to the archives and read some of the previous
discussion of these issues. In particular, we determined that the
appropriate place for this sort of thing is on the client side, not in
the backend. Only the client knows what is a reasonable way to mark the
error position. (Think about GUI clients vs command line clients,
different size windows, etc.) As of 7.4 we do send back a cursor offset
as one of the fields of an error message, and I think the next step is
to make the client-side code do something with that.

> All generated error messages, especially from the parser, should be
> assorted with a HINT to help the user, if possible. Something like:
> HINT: table name expected

I think the odds of doing anything very successful in the way of syntax
hints are small. The cases where you can produce a reliable hint (ie,
one that's not likely to be misleading) are not very interesting.
I don't think your idea of relying on the last parser state will work
well, because the parser frequently is off a word or two as to the
actual location of the error (since it will do its best to make sense of
what it sees, until there is no remaining possibility of a successful
parse). That will be enough to cause you to deliver the wrong hint,
and a wrong hint is worse than none.

ISTM that client-side code that pops up the syntax summary for the
command would probably accomplish far more, for far less work.

We do have some hints for semantic errors, and adding more isn't a bad
project, but you still have the same hazard of whether you can deliver
useful hints.

> ... in the processing of the command. The problem is different because it
> occurs in functions that can be called from quite different contexts, and
> the context is not really known to the function. Thus when the error
> occurs in the function, it cannot provide a useful context. As none is
> provided, the user must guess...

I am not understanding what you have against the existing error context
stack mechanism. Certainly it could be used in more places than it is,
but why do you feel a need to build a separate one?

regards, tom lane


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Dennis Bjorklund <db(at)zigo(dot)dhs(dot)org>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A plan to improve error messages with context, hint
Date: 2004-03-05 08:21:27
Message-ID: Pine.LNX.4.58.0403050844190.821@mordor.coelho.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> > (2) Does someone has any comment about these problems or
> > the way I intend to try to address them?
>
> About the implementation idea with hints. I'm not sure will be so easy to
> implement as you suggested. Maybe if one add hints to every construct,
> and set to empty hint where it does not make sense.

Sure.

> What I'm afraid of is to get hints or advices that are plain wrong.
> CREATE TABLE foo (bar int key);
> HINT: table name expected
> or something.

Sure, I agree with you.

> But if it works good in practice, why not having hints.

I agree that the actual validation is whether the result looks good,
and are most of the time useful.

> stmt: CREATE TABLE ( anything ); { generate error "missing table name" }

Yes, that's an idea, but it would change the syntax definition, and
may create conflicts, so it's an harder way to do it.

--
Fabien Coelho - coelho(at)cri(dot)ensmp(dot)fr


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A plan to improve error messages with context, hint
Date: 2004-03-05 08:22:04
Message-ID: Pine.LNX.4.58.0403050850430.821@mordor.coelho.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> > (1) Lexical/syntax error source localisation
>
> > An extract of the offending source must be shown if possible along syntax
> > error messages.
>
> You would do well to go to the archives and read some of the previous
> discussion of these issues.

I'll do that.

> In particular, we determined that the appropriate place for this sort of
> thing is on the client side, not in the backend.

The current status of clients is that none of those I use will report
anything useful.

The server can use the DETAIL field (which is not used anyway) so as to
give the details, and the client can chose to ignore it if it can do a
better job. That would be providing a basic capability common to all
clients, instead of waiting for a final implementation in all clients.

> Only the client knows what is a reasonable way to mark the
> error position.

Sure.

> > All generated error messages, especially from the parser, should be
> > assorted with a HINT to help the user, if possible. Something like:
>
> I think the odds of doing anything very successful in the way of syntax
> hints are small. The cases where you can produce a reliable hint (ie,
> one that's not likely to be misleading) are not very interesting.

This is an a-priori jugement.
I hope this is not ground to reject any patch without a try.

What I can do is take notice of all syntax errors from my students
and use them as test cases to see what hint would help;-)
I just have to turn on the database logs.

> [...] That will be enough to cause you to deliver the wrong hint,
> and a wrong hint is worse than none.

I agree that if the hint are always wrong, then no hint is better.
So in case we do not know, it is better not to say anything.

However, I think that a lot of cases can be given good hints, and
for those cases it should be done.

Also, it is better to judge on the result.

> ISTM that client-side code that pops up the syntax summary for the
> command would probably accomplish far more, for far less work.
>
> We do have some hints for semantic errors, and adding more isn't a bad
> project, but you still have the same hazard of whether you can deliver
> useful hints.

I agree.

> > ... in the processing of the command. The problem is different because it
> > occurs in functions that can be called from quite different contexts, and
> > the context is not really known to the function. Thus when the error
> > occurs in the function, it cannot provide a useful context. As none is
> > provided, the user must guess...
>
> I am not understanding what you have against the existing error context
> stack mechanism. Certainly it could be used in more places than it is,
> but why do you feel a need to build a separate one?

What I understand of the error stack is that it is for error message
recursion.

About the "context" field, I have seen in my quick browsing is how one can
add information to it, but I haven't seen (yet) how to remove some
information to put some other in place. Tell me if I'm wrong. If I'm
wrong, it means that the infractructure is already in place, and one just
need to put some more context calls.

What is really needed is indeed a context stack in the error stack, and
what I've seen is a simple string in the error stack, where contexts are
appended one to the other at the string level.

-> processing create table "hello"
-> processing constraint $1: CHECK (ind>0)
<-
-> processing constraint foo: xxx REFERENCES "bye"
<-
-> processing constraint bla: data NOT NULL
<-
<-

Have a nice day,

--
Fabien Coelho - coelho(at)cri(dot)ensmp(dot)fr


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A plan to improve error messages with context, hint and details.
Date: 2004-03-05 14:32:38
Message-ID: 6654.1078497158@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> writes:
>> In particular, we determined that the appropriate place for this sort of
>> thing is on the client side, not in the backend.

> The current status of clients is that none of those I use will report
> anything useful.

So fix the clients. We have been through this discussion before; you
are not the first to claim we should do a quick-and-dirty solution on
the backend side. That wasn't the consensus then, and I don't think
it is now.

>> I think the odds of doing anything very successful in the way of syntax
>> hints are small. The cases where you can produce a reliable hint (ie,
>> one that's not likely to be misleading) are not very interesting.

> This is an a-priori jugement.
> I hope this is not ground to reject any patch without a try.

No, but it will be a likely reason for rejecting the patch *after*
trying it ;-). Consider yourself warned that the first thing I'll
do is try to make the patch produce misleading hints.

>> I am not understanding what you have against the existing error context
>> stack mechanism.

> About the "context" field, I have seen in my quick browsing is how one can
> add information to it, but I haven't seen (yet) how to remove some
> information to put some other in place.

Why would you want to? Removing available information is never correct.

> What is really needed is indeed a context stack in the error stack, and
> what I've seen is a simple string in the error stack, where contexts are
> appended one to the other at the string level.

I think you're misinterpreting what the code does. The context hooks
are installed and removed as if on a stack, but we don't form an error
string until an error actually occurs. At that point, stringing
together the outputs of the active hooks is the correct thing to do.

regards, tom lane


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A plan to improve error messages with context, hint
Date: 2004-03-05 15:17:31
Message-ID: Pine.LNX.4.58.0403051542090.12725@sablons.cri.ensmp.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


Dear Tom,

> > I hope this is not ground to reject any patch without a try.
>
> No, but it will be a likely reason for rejecting the patch *after*
> trying it ;-). Consider yourself warned that the first thing I'll
> do is try to make the patch produce misleading hints.

With this approach, you can remove ALL hints from postgres, as I'm pretty
sure *you* can "build up" cases where any hint does not give the actual
solution.

The question is whether the hint might be useful in most/lot of cases...
if not, then I agree it should not be there. if yes, then it is better
than the current nothing.

So I'm okay for any "fair" trial;-)

Anyway I'll make a proof of concept implementation on one command and
submit as it a test case before going into an full implementation.

> > About the "context" field, I have seen in my quick browsing is how one can
> > add information to it, but I haven't seen (yet) how to remove some
> > information to put some other in place.
>
> Why would you want to? Removing available information is never correct.

I tried to explain it with little drawing in the mail.

The idea is that as the code is running, going up and down the call
stack, a "context stack" is constantly updated to reflect what is
being done for the user at the moment, when the information is
available to a function...

If no error occurs, then the context information is never used.

If an error occurs somewhere deep in the code, then the stack helps
describe the "user oriented" context of the current processing from which
the exception was raised. The deep function does not need to know
what was actually going on above to do that.

> > What is really needed is indeed a context stack in the error stack, and
> > what I've seen is a simple string in the error stack, where contexts are
> > appended one to the other at the string level.
>
> I think you're misinterpreting what the code does.

That's really possible, I only had a quick look at the source.

--
Fabien Coelho - coelho(at)cri(dot)ensmp(dot)fr


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A plan to improve error messages with context, hint
Date: 2004-03-05 15:33:30
Message-ID: 40489DCA.9020603@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Fabien COELHO wrote:

>The current status of clients is that none of those I use will report
>anything useful.
>
>
pgAdmin3's query tool will set a mark on the offending line.

Regards,
Andreas