Proposed patch for error locations

Lists: pgsql-interfacespgsql-patches
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-patches(at)postgreSQL(dot)org
Subject: Proposed patch for error locations
Date: 2006-03-12 17:37:35
Message-ID: 5114.1142185055@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches

The attached WIP patch improves the parser so that cursor positions can be
attached to many errors reported during parse analysis, such as complaints
about nonexistent columns or functions. This is along the lines proposed
in my rant here:
http://archives.postgresql.org/pgsql-patches/2006-02/msg00234.php
to wit, that we make use of bison's "locations" feature to track token
positions in the grammar, and add locations to raw parse tree nodes as
needed.

Some examples of what it can do:

regression=# select unique1,unique2,tem,hundred from tenk1 t;
ERROR: column "tem" does not exist at character 24
LINE 1: select unique1,unique2,tem,hundred from tenk1 t;
^
regression=# select unique1+unique2,ten+hundred,thousand+stringu1 from tenk1;
ERROR: operator does not exist: integer + name at character 44
HINT: No operator matches the given name and argument type(s). You may need to add explicit type casts.
LINE 1: select unique1+unique2,ten+hundred,thousand+stringu1 from te...
^

These examples are perhaps not all that exciting, but in a complex SQL
statement extending over dozens of lines I think it's really important
to be able to do this.

I haven't gone further than making it finger the locations of bogus column
references, operators, and functions --- it's probably worth improving
TypeCast and maybe a few other raw-parse-tree constructs too. Another thing
that needs to be looked at is using the mechanism to locate statement
boundaries, which was the starting-point of the thread mentioned above.

One thing I'm noticing already is that the addition of "at character N"
to a lot of these messages isn't an improvement. In psql it's certainly
redundant with the error-cursor display. As I recall, we include that string
in the primary message text to improve usability with poorer application APIs
that might only show the primary error message to the user. I'm thinking
maybe it's time to expect those client libraries to do better. Any thoughts
about that?

Any comments in general?

regards, tom lane

Attachment Content-Type Size
error-position-1.patch.gz application/octet-stream 18.0 KB

From: Michael Glaesemann <grzm(at)myrealbox(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches(at)postgreSQL(dot)org
Subject: Re: Proposed patch for error locations
Date: 2006-03-13 07:42:59
Message-ID: C3817205-17E1-4D6C-9D92-ED6045123F92@myrealbox.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches


On Mar 13, 2006, at 2:37 , Tom Lane wrote:

> to wit, that we make use of bison's "locations" feature to track token
> positions in the grammar, and add locations to raw parse tree nodes as
> needed.
>
> Some examples of what it can do:

This looks really nice.

> One thing I'm noticing already is that the addition of "at
> character N"
> to a lot of these messages isn't an improvement. In psql it's
> certainly
> redundant with the error-cursor display.

The pure character count is definitely difficult to use with larger
queries. I think it could be more useful if it were
line:char_of_line. Would others find this useful?

Michael Glaesemann
grzm myrealbox com


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: Proposed patch for error locations
Date: 2006-03-13 11:43:31
Message-ID: 20060313114331.GC6714@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches

> The attached WIP patch improves the parser so that cursor positions
> can be attached to many errors reported during parse analysis, such
> as complaints about nonexistent columns or functions. This is along
> the lines proposed in my rant here:

> http://archives.postgresql.org/pgsql-patches/2006-02/msg00234.php

> to wit, that we make use of bison's "locations" feature to track
> token positions in the grammar, and add locations to raw parse tree
> nodes as needed.

My only comment is: Wouldn't it be clearer in the long run to create a
ParseNode base structure which is used in place of NodeTag for
parsenodes, and put the "int location" in there.

Then you could create a new function called makeParseNode which works
like makeNode but takes an extra argument giving the location. Or some
variation of the two.

The thing I'm wondering is whether there will ever be any other
information we want to store, like token length so you can bracket the
expression and/or extract the exact expression with the issue. For
example, the (start,end) of a function call includes everything within
the brackets. By abstrating into a makeParseNode function, you may be
able to avoid changing a lot of code down the line.

Finally, is the count in characters or bytes (w.r.t. multibyte
encodings)?

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Proposed patch for error locations
Date: 2006-03-13 14:22:35
Message-ID: Pine.LNX.4.64.0603131505520.21383@briare.cri.ensmp.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches


Dear Tom,

> I haven't gone further than making it finger the locations of bogus column
> references, operators, and functions --- it's probably worth improving
> TypeCast and maybe a few other raw-parse-tree constructs too.

Yes. Other area would also benefit, such as type names:

psql: CREATE FUNCTION foo() RETURNS INTEGET LANGUAGE plpgsql AS ...
ERROR: type "integet" does not exist

That is a kind of "entry level" user features which are good for my
students ... as well as myself. Thus I'm looking forward to have more help
along those lines, although I cannot give a hand at the time.

--
Fabien.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Glaesemann <grzm(at)myrealbox(dot)com>
Cc: pgsql-patches(at)postgreSQL(dot)org, pgsql-interfaces(at)postgreSQL(dot)org
Subject: Re: [PATCHES] Proposed patch for error locations
Date: 2006-03-13 15:11:31
Message-ID: 7274.1142262691@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches

[ cross-posting to pgsql-interfaces in hopes of drawing more comments ]

Michael Glaesemann <grzm(at)myrealbox(dot)com> writes:
> On Mar 13, 2006, at 2:37 , Tom Lane wrote:
>> http://archives.postgresql.org/pgsql-patches/2006-03/msg00153.php

> This looks really nice.

>> One thing I'm noticing already is that the addition of "at character N"
>> to a lot of these messages isn't an improvement. In psql it's
>> certainly redundant with the error-cursor display.

> The pure character count is definitely difficult to use with larger
> queries. I think it could be more useful if it were
> line:char_of_line. Would others find this useful?

The change in behavior would actually be in libpq, because it's
PQerrorMessage that is doing the deed (assuming a reasonably modern
server and libpq). What I was considering proposing was that we migrate
the error-cursor feature out of psql and into PQerrorMessage. This
would mean that you'd get responses like

ERROR: column "foo" does not exist
LINE 1: select foo from a;
^

from all libpq-using applications not just psql. We could make this
conditional on the error verbosity --- in "terse" mode the "LINE N"
output wouldn't appear, and "at character N" still would. Applications
should already be expecting multiline outputs from PQerrorMessage if
they're in non-terse mode, so this ought to be OK. Comments?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Glaesemann <grzm(at)myrealbox(dot)com>, pgsql-patches(at)postgreSQL(dot)org, pgsql-interfaces(at)postgreSQL(dot)org
Subject: Re: Proposed p.tch for error locations
Date: 2006-03-13 15:32:09
Message-ID: 20060313153209.GA15002@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches

Tom Lane wrote:

> The change in behavior would actually be in libpq, because it's
> PQerrorMessage that is doing the deed (assuming a reasonably modern
> server and libpq). What I was considering proposing was that we migrate
> the error-cursor feature out of psql and into PQerrorMessage. This
> would mean that you'd get responses like
>
> ERROR: column "foo" does not exist
> LINE 1: select foo from a;
> ^

This doesn't work on terminal using a variable-width font, does it?
Sure, you can have all the interfaces use a monospace font, but it seems
a weird thing to do. I think the line/character position should be
returned in a separate error attribute in ereport. So for example
pgAdmin could count characters and mark it in bold or use a different
color.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Proposed patch for error locations
Date: 2006-03-13 15:44:24
Message-ID: 7550.1142264664@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> My only comment is: Wouldn't it be clearer in the long run to create a
> ParseNode base structure which is used in place of NodeTag for
> parsenodes, and put the "int location" in there.

Only if we really wanted *every* parse node to carry a location, which
I think is overkill. If we took it far enough we'd be needing location
decoration on List nodes too, and that is definitely not happening.

There's a tradeoff here between completeness of information and
performance. In some quick tests with pgbench I don't detect any
noticeable performance loss with the patch as proposed, but it'd be
very easy to cross the threshold to where it does cost performance,
if we get too excited about decorating parse nodes with more than a
minimal amount of overhead.

> Finally, is the count in characters or bytes (w.r.t. multibyte
> encodings)?

Bytes; see the comments. Converting to a character count only happens
if an actual error message has to be emitted. (This is uglier than I
would like, but I don't see any sufficiently-cheap way to get the lexer
to count multibyte characters correctly.)

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Michael Glaesemann <grzm(at)myrealbox(dot)com>, pgsql-patches(at)postgreSQL(dot)org, pgsql-interfaces(at)postgreSQL(dot)org
Subject: Re: Proposed p.tch for error locations
Date: 2006-03-13 15:57:07
Message-ID: 7693.1142265427@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> This doesn't work on terminal using a variable-width font, does it?
> Sure, you can have all the interfaces use a monospace font, but it seems
> a weird thing to do. I think the line/character position should be
> returned in a separate error attribute in ereport. So for example
> pgAdmin could count characters and mark it in bold or use a different
> color.

That information is already available to pgAdmin, and has been since
7.4; if they are failing to exploit it that's their problem not libpq's.

Basically what's at stake here is the behavior of "dumb" applications
that are just using PQerrorMessage and not doing anything smart with
error message fields. It does not seem unreasonable to me to assume
fixed-width font in that context. We haven't seen any complaints about
psql doing it have we?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Glaesemann <grzm(at)myrealbox(dot)com>, pgsql-patches(at)postgreSQL(dot)org, pgsql-interfaces(at)postgreSQL(dot)org
Subject: Re: Proposed p.tch for error locations
Date: 2006-03-13 16:03:35
Message-ID: 20060313160335.GN8274@surnet.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > This doesn't work on terminal using a variable-width font, does it?
> > Sure, you can have all the interfaces use a monospace font, but it seems
> > a weird thing to do. I think the line/character position should be
> > returned in a separate error attribute in ereport. So for example
> > pgAdmin could count characters and mark it in bold or use a different
> > color.
>
> That information is already available to pgAdmin, and has been since
> 7.4; if they are failing to exploit it that's their problem not libpq's.

Aye, that's fine. I don't really know if pgAdmin uses the info or not.
I thought the proposal to remove the "at character N" in the error
message meant that the only way to get that information would be from
the "LINE Y: ...\n[some whitespace]^" message, which would be pretty
cumbersome. But if there already is a field in the error message for
the position, then nothing is lost.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Michael Glaesemann <grzm(at)myrealbox(dot)com>, pgsql-patches(at)postgreSQL(dot)org, pgsql-interfaces(at)postgreSQL(dot)org
Subject: Re: Proposed p.tch for error locations
Date: 2006-03-13 16:08:30
Message-ID: 7837.1142266110@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Tom Lane wrote:
>> That information is already available to pgAdmin, and has been since
>> 7.4; if they are failing to exploit it that's their problem not libpq's.

> Aye, that's fine. I don't really know if pgAdmin uses the info or not.
> I thought the proposal to remove the "at character N" in the error
> message meant that the only way to get that information would be from
> the "LINE Y: ...\n[some whitespace]^" message, which would be pretty
> cumbersome.

You should reflect on the fact that psql is currently generating that
line from outside libpq ...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Michael Glaesemann <grzm(at)myrealbox(dot)com>, pgsql-patches(at)postgresql(dot)org, pgsql-interfaces(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed p.tch for error locations
Date: 2006-03-13 21:28:38
Message-ID: 10458.1142285318@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches

I took another look at this and realized that for PQerrorMessage to emit
a cursor-pointer line, it'd be necessary for libpq to hold onto a copy
of the query last sent, which it doesn't do presently. This is annoying
for long queries --- in the worst case it could provoke out-of-memory
failures that don't occur now.

Plan B would be for psql to stop using PQerrorMessage and generate the
message report text from the message fields, omitting "at character N".
This would require duplicating a couple dozen lines from inside libpq.
It'd also mean that the report text gets built twice, once inside libpq
and once by psql, which is probably not such a big deal since error
messages ought not be a performance-critical path ... except there's
still that little question of overrunning memory.

Plan C is just to drop the "at character N" string from what
PQerrorMessage generates. We could make this configurable (maybe via
additional PGVerbosity values), or just not worry about whether it's
important.

Thoughts?

regards, tom lane


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Glaesemann <grzm(at)myrealbox(dot)com>, pgsql-patches(at)postgresql(dot)org, pgsql-interfaces(at)postgresql(dot)org
Subject: Re: [PATCHES] Proposed patch for error locations
Date: 2006-03-14 02:53:19
Message-ID: 4416301F.9070800@familyhealth.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-interfaces pgsql-patches

> from all libpq-using applications not just psql. We could make this
> conditional on the error verbosity --- in "terse" mode the "LINE N"
> output wouldn't appear, and "at character N" still would. Applications
> should already be expecting multiline outputs from PQerrorMessage if
> they're in non-terse mode, so this ought to be OK. Comments?

Sounds like it'd be handy in phpPgAdmin...