Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.

Lists: pgsql-committerspgsql-hackers
From: Robert Haas <rhaas(at)postgresql(dot)org>
To: pgsql-committers(at)postgresql(dot)org
Subject: pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-12 14:42:05
Message-ID: E1SeSIH-00081X-BB@gemulon.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Mark JSON error detail messages for translation.

Per gripe from Tom Lane.

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/36b7e3da17bcca4efe5584d95c386cec2a221a13

Modified Files
--------------
src/backend/utils/adt/json.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org, Robert Haas <rhaas(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-12 18:40:52
Message-ID: 20494.1339526452@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <rhaas(at)postgresql(dot)org> writes:
> Mark JSON error detail messages for translation.
> Per gripe from Tom Lane.

OK, that was one generically bad thing about json.c's error messages.
The other thing that was bothering me was the style of the errdetail
strings, eg

ereport(ERROR,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("invalid input syntax for type json"),
errdetail("line %d: Invalid escape \"\\%s\".",
lex->line_number, extract_mb_char(s))));

I'm not thrilled with the "line %d:" prefixes. That seems to me to
violate the letter and spirit of the guideline about making errdetail
messages be complete sentences. Furthermore, the line number is not
the most important part of the detail message, if indeed it has any
usefulness at all which is debatable. (It's certainly not going to
be tremendously useful when the error report doesn't show any of the
specific input string being complained of.)

Also, a minority of the errors report the whole input string:

ereport(ERROR,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("invalid input syntax for type json: \"%s\"",
lex->input),
errdetail("The input string ended unexpectedly.")));

And then there are some that provide the whole input string AND a line
number, just in case you thought there was any intentional design
decision there.

A minimum change IMO would be to recast the detail messages as complete
sentences, say "The escape sequence \"\\%s\" in line %d is invalid."
But I'd like a better-thought-out solution to identifying the location
of the error.

I notice that there's an unfinished attempt to maintain a line_start
pointer; if that were carried through, we could imagine printing the
current line up to the point of an error, which might provide a
reasonable balance between verbosity and insufficient context.
As an example, instead of

regression=# select '{"x":1,
z "y":"txt1"}'::json;
ERROR: invalid input syntax for type json
LINE 1: select '{"x":1,
^
DETAIL: line 2: Token "z" is invalid.

maybe we could print

regression=# select '{"x":1,
z "y":"txt1"}'::json;
ERROR: invalid input syntax for type json
LINE 1: select '{"x":1,
^
DETAIL: Token "z" is invalid in line 2: "z".

or perhaps better let it run to the end of the line:

regression=# select '{"x":1,
z "y":"txt1"}'::json;
ERROR: invalid input syntax for type json
LINE 1: select '{"x":1,
^
DETAIL: Token "z" is invalid in line 2: "z "y":"txt1"}".

Thoughts?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Robert Haas <rhaas(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-12 18:49:04
Message-ID: CA+TgmoZ0sVGkfkXWKy1kGF5GuNuHBLgcZPhkw6mQAC1uGZyheQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Jun 12, 2012 at 2:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <rhaas(at)postgresql(dot)org> writes:
>> Mark JSON error detail messages for translation.
>> Per gripe from Tom Lane.
>
> OK, that was one generically bad thing about json.c's error messages.
> The other thing that was bothering me was the style of the errdetail
> strings, eg
>
>                ereport(ERROR,
>                        (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
>                         errmsg("invalid input syntax for type json"),
>                         errdetail("line %d: Invalid escape \"\\%s\".",
>                                   lex->line_number, extract_mb_char(s))));
>
> I'm not thrilled with the "line %d:" prefixes.  That seems to me to
> violate the letter and spirit of the guideline about making errdetail
> messages be complete sentences.  Furthermore, the line number is not
> the most important part of the detail message, if indeed it has any
> usefulness at all which is debatable.  (It's certainly not going to
> be tremendously useful when the error report doesn't show any of the
> specific input string being complained of.)

I am amenable to rephrasing the errdetail to put the line number
elsewhere in the string, but I am strongly opposed to getting rid of
it. JSON blobs can easily run to dozens or hundreds of lines, and you
will want to know the line number. Knowing the contents of the line
will in many cases be LESS useful, because the line might contain,
say, a single closing bracket. That's not gonna help much if it
happens many times in the input value, which is not unlikely.

> Also, a minority of the errors report the whole input string:
>
>        ereport(ERROR,
>                (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
>                 errmsg("invalid input syntax for type json: \"%s\"",
>                        lex->input),
>                 errdetail("The input string ended unexpectedly.")));
>
> And then there are some that provide the whole input string AND a line
> number, just in case you thought there was any intentional design
> decision there.

I did try.

> A minimum change IMO would be to recast the detail messages as complete
> sentences, say "The escape sequence \"\\%s\" in line %d is invalid."
> But I'd like a better-thought-out solution to identifying the location
> of the error.
>
> I notice that there's an unfinished attempt to maintain a line_start
> pointer; if that were carried through, we could imagine printing the
> current line up to the point of an error, which might provide a
> reasonable balance between verbosity and insufficient context.
> As an example, instead of
>
> regression=# select '{"x":1,
> z  "y":"txt1"}'::json;
> ERROR:  invalid input syntax for type json
> LINE 1: select '{"x":1,
>               ^
> DETAIL:  line 2: Token "z" is invalid.
>
> maybe we could print
>
> regression=# select '{"x":1,
> z  "y":"txt1"}'::json;
> ERROR:  invalid input syntax for type json
> LINE 1: select '{"x":1,
>               ^
> DETAIL:  Token "z" is invalid in line 2: "z".
>
> or perhaps better let it run to the end of the line:
>
> regression=# select '{"x":1,
> z  "y":"txt1"}'::json;
> ERROR:  invalid input syntax for type json
> LINE 1: select '{"x":1,
>               ^
> DETAIL:  Token "z" is invalid in line 2: "z  "y":"txt1"}".
>
> Thoughts?

I'm not sure I find that an improvement, but I'm open to what other
people think.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-12 20:52:20
Message-ID: 23424.1339534340@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jun 12, 2012 at 2:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I'm not thrilled with the "line %d:" prefixes. That seems to me to
>> violate the letter and spirit of the guideline about making errdetail
>> messages be complete sentences. Furthermore, the line number is not
>> the most important part of the detail message, if indeed it has any
>> usefulness at all which is debatable. (It's certainly not going to
>> be tremendously useful when the error report doesn't show any of the
>> specific input string being complained of.)

> I am amenable to rephrasing the errdetail to put the line number
> elsewhere in the string, but I am strongly opposed to getting rid of
> it. JSON blobs can easily run to dozens or hundreds of lines, and you
> will want to know the line number.

Agreed, the input strings could be quite large, but in that case we
definitely don't want to be including the whole input in the primary
error message (which is supposed to be short). I doubt it'd even be
a good idea to try to put it into errdetail; thus I'm thinking about
providing one line only.

> Knowing the contents of the line
> will in many cases be LESS useful, because the line might contain,
> say, a single closing bracket.

True, but I don't think the line number alone is adequate. There are
too many situations where it's not entirely clear what value is being
complained of. (Considering only syntax errors thrown from literal
constants in SQL commands is quite misleading about the requirements
here.) So I think we need to include at least some of the input in
the error.

>> I notice that there's an unfinished attempt to maintain a line_start
>> pointer; if that were carried through, we could imagine printing the
>> current line up to the point of an error, which might provide a
>> reasonable balance between verbosity and insufficient context.
>> ...
>> or perhaps better let it run to the end of the line:

> I'm not sure I find that an improvement, but I'm open to what other
> people think.

Anybody here besides the crickets?

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-12 21:10:40
Message-ID: 1339535104-sup-6995@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers


Excerpts from Tom Lane's message of mar jun 12 16:52:20 -0400 2012:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:

> >> I notice that there's an unfinished attempt to maintain a line_start
> >> pointer; if that were carried through, we could imagine printing the
> >> current line up to the point of an error, which might provide a
> >> reasonable balance between verbosity and insufficient context.
> >> ...
> >> or perhaps better let it run to the end of the line:
>
> > I'm not sure I find that an improvement, but I'm open to what other
> > people think.
>
> Anybody here besides the crickets?

I think providing both partial line contents (so +1 for maintaining
line_start carefully as required) and line number would be useful enough
to track down problems.

I am not sure about the idea of letting the detail run to the end of the
line; that would be problematic should the line be long (there might not
be newlines in the literal at all, which is not that unusual). I think
it should be truncated at, say, 76 chars or so.

For the case where you have a single } in a line, this isn't all that
helpful; we could consider printing the previous line as well. But if
you end up with

}
}

then it's not that helpful either. I am not sure.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-12 21:26:46
Message-ID: 24035.1339536406@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> I am not sure about the idea of letting the detail run to the end of the
> line; that would be problematic should the line be long (there might not
> be newlines in the literal at all, which is not that unusual). I think
> it should be truncated at, say, 76 chars or so.

Yeah, I was wondering about trying to provide a given amount of context
instead of fixing it to "one line". We could do something like

(1) back up N characters;
(2) find the next newline, if there is one at least M characters before
the error point;
(3) print from there to the error point.

This would give between M and N characters of context, except when the
error point is less than M characters from the start of the input. Not
sure how to display such text together with a line number though; with a
multi-line fragment it would not be obvious which part the line number
refers to. (Backing up over multibyte data might be a bit complicated
too, but I assume we can think of something workable for that.)

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 01:52:56
Message-ID: 28472.1339552376@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

I wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> I am not sure about the idea of letting the detail run to the end of the
>> line; that would be problematic should the line be long (there might not
>> be newlines in the literal at all, which is not that unusual). I think
>> it should be truncated at, say, 76 chars or so.

> Yeah, I was wondering about trying to provide a given amount of context
> instead of fixing it to "one line". We could do something like
> (1) back up N characters;
> (2) find the next newline, if there is one at least M characters before
> the error point;
> (3) print from there to the error point.

After experimenting with this for awhile I concluded that the above is
overcomplicated, and that we might as well just print up to N characters
of context; in most input, the line breaks are far enough apart that
preferentially breaking at them just leads to not having very much
context. Also, it seems like it might be a good idea to present the
input as a CONTEXT line, because that provides more space; you can fit
50 or so characters of data without overrunning standard display width.
This gives me output like

regression=# select '{"unique1":8800,"unique2":0,"two":0,"four":0,"ten":0,"twenty":0,"hundred":0,"thousand":800,
"twothous
and":800,"fivethous":3800,"tenthous":8800,"odd":0,"even":1,
"stringu1":"MAAAAA","stringu2":"AAAAAA","string4":"AAAAxx"}'
::json;
ERROR: invalid input syntax for type json
LINE 1: select '{"unique1":8800,"unique2":0,"two":0,"four":0,"ten":0...
^
DETAIL: Character with value "0x0a" must be escaped.
CONTEXT: JSON data, line 1: ..."twenty":0,"hundred":0,"thousand":800,
"twothous

regression=#

I can't give too many examples because I've only bothered to context-ify
this single error case as yet ;-) ... but does this seem like a sane way
to go?

The code for this is as attached. Note that I'd rip out the normal-path
tracking of line boundaries; it seems better to have a second scan of
the data in the error case and save the cycles in non-error cases.

...
ereport(ERROR,
(errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
errmsg("invalid input syntax for type json"),
errdetail("Character with value \"0x%02x\" must be escaped.",
(unsigned char) *s),
report_json_context(lex)));
...

/*
* Report a CONTEXT line for bogus JSON input.
*
* The return value isn't meaningful, but we make it non-void so that this
* can be invoked inside ereport().
*/
static int
report_json_context(JsonLexContext *lex)
{
char *context_start;
char *context_end;
char *line_start;
int line_number;
char *ctxt;
int ctxtlen;

/* Choose boundaries for the part of the input we will display */
context_start = lex->input;
context_end = lex->token_terminator;
line_start = context_start;
line_number = 1;
for (;;)
{
/* Always advance over a newline, unless it's the current token */
if (*context_start == '\n' && context_start < lex->token_start)
{
context_start++;
line_start = context_start;
line_number++;
continue;
}
/* Otherwise, done as soon as we are close enough to context_end */
if (context_end - context_start < 50)
break;
/* Advance to next multibyte character */
if (IS_HIGHBIT_SET(*context_start))
context_start += pg_mblen(context_start);
else
context_start++;
}

/* Get a null-terminated copy of the data to present */
ctxtlen = context_end - context_start;
ctxt = palloc(ctxtlen + 1);
memcpy(ctxt, context_start, ctxtlen);
ctxt[ctxtlen] = '\0';

return errcontext("JSON data, line %d: %s%s",
line_number,
(context_start > line_start) ? "..." : "",
ctxt);
}

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 02:07:44
Message-ID: CA+TgmoZCO5fjukp4jPdmgV7YkCN03THvVCT5hK87vxVY2CDWRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The code for this is as attached.  Note that I'd rip out the normal-path
> tracking of line boundaries; it seems better to have a second scan of
> the data in the error case and save the cycles in non-error cases.

Really?!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 14:35:31
Message-ID: 9778.1339598131@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The code for this is as attached. Note that I'd rip out the normal-path
>> tracking of line boundaries; it seems better to have a second scan of
>> the data in the error case and save the cycles in non-error cases.

> Really?!

Um ... do you have a problem with that idea, and if so what? It would
be considerably more complicated to do it without a second pass.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 15:03:38
Message-ID: CA+TgmoZJ2i8APLZnwCdCXfuw8E1vGNNdBeZDd=eiPUOL4WCr1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The code for this is as attached.  Note that I'd rip out the normal-path
>>> tracking of line boundaries; it seems better to have a second scan of
>>> the data in the error case and save the cycles in non-error cases.
>
>> Really?!
>
> Um ... do you have a problem with that idea, and if so what?  It would
> be considerably more complicated to do it without a second pass.

Could you explain how it's broken now, and why it will be hard to fix?
People may well want to use a cast to JSON within an exception block
as a way of testing whether strings are valid JSON. We should not
assume that the cost of an exception is totally irrelevant, because
this might be iterated.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 15:06:05
Message-ID: 201206131706.05440.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>> The code for this is as attached. Note that I'd rip out the
> >>> normal-path tracking of line boundaries; it seems better to have a
> >>> second scan of the data in the error case and save the cycles in
> >>> non-error cases.
> >>
> >> Really?!
> >
> > Um ... do you have a problem with that idea, and if so what? It would
> > be considerably more complicated to do it without a second pass.
>
> Could you explain how it's broken now, and why it will be hard to fix?
> People may well want to use a cast to JSON within an exception block
> as a way of testing whether strings are valid JSON. We should not
> assume that the cost of an exception is totally irrelevant, because
> this might be iterated.
Exception blocks/subtransctions already are considerably expensive. I have a
hard time believing this additional cost would be measureable.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 15:18:22
Message-ID: CA+TgmoacGLSNBGdg5Ju4KEBaV4MH4rioTmJpR3NgQZV-bQK58g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Jun 13, 2012 at 11:06 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
>> On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> >> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> >>> The code for this is as attached.  Note that I'd rip out the
>> >>> normal-path tracking of line boundaries; it seems better to have a
>> >>> second scan of the data in the error case and save the cycles in
>> >>> non-error cases.
>> >>
>> >> Really?!
>> >
>> > Um ... do you have a problem with that idea, and if so what?  It would
>> > be considerably more complicated to do it without a second pass.
>>
>> Could you explain how it's broken now, and why it will be hard to fix?
>>  People may well want to use a cast to JSON within an exception block
>> as a way of testing whether strings are valid JSON.  We should not
>> assume that the cost of an exception is totally irrelevant, because
>> this might be iterated.
> Exception blocks/subtransctions already are considerably expensive. I have a
> hard time believing this additional cost would be measureable.

According to my testing, the main cost of an exception block catching
a division-by-zero error is that of generating the error message,
primarily sprintf()-type stuff. The cost of scanning a multi-megabyte
string seems likely to be much larger.

Mind you, I'm not going to spend a lot of time crying into my beer if
it turns out that there's no other reasonable way to implement this,
but I do think that it's entirely appropriate to ask why it's not
possible to do better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 15:22:11
Message-ID: 201206131722.11344.andres@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 11:06 AM, Andres Freund <andres(at)2ndquadrant(dot)com>
wrote:
> > On Wednesday, June 13, 2012 05:03:38 PM Robert Haas wrote:
> >> On Wed, Jun 13, 2012 at 10:35 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> > Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> >> >> On Tue, Jun 12, 2012 at 9:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> >>> The code for this is as attached. Note that I'd rip out the
> >> >>> normal-path tracking of line boundaries; it seems better to have a
> >> >>> second scan of the data in the error case and save the cycles in
> >> >>> non-error cases.
> >> >>
> >> >> Really?!
> >> >
> >> > Um ... do you have a problem with that idea, and if so what? It would
> >> > be considerably more complicated to do it without a second pass.
> >>
> >> Could you explain how it's broken now, and why it will be hard to fix?
> >> People may well want to use a cast to JSON within an exception block
> >> as a way of testing whether strings are valid JSON. We should not
> >> assume that the cost of an exception is totally irrelevant, because
> >> this might be iterated.
> >
> > Exception blocks/subtransctions already are considerably expensive. I
> > have a hard time believing this additional cost would be measureable.
>
> According to my testing, the main cost of an exception block catching
> a division-by-zero error is that of generating the error message,
> primarily sprintf()-type stuff. The cost of scanning a multi-megabyte
> string seems likely to be much larger.
True. I ignored that there doesn't have to be an xid assigned yet... I still
think its not very relevant though.

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 15:55:40
Message-ID: 11665.1339602940@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
>> According to my testing, the main cost of an exception block catching
>> a division-by-zero error is that of generating the error message,
>> primarily sprintf()-type stuff. The cost of scanning a multi-megabyte
>> string seems likely to be much larger.

> True. I ignored that there doesn't have to be an xid assigned yet... I still
> think its not very relevant though.

I don't think it's relevant either. In the first place, I don't think
that errors occurring "multi megabytes" into a JSON blob are going to
occur often enough to be performance critical. In the second place,
removing cycles from the non-error case is worth something too, probably
more than making the error case faster is worth.

In any case, the proposed scheme for providing context requires that
you know where the error is before you can identify the context. I
considered schemes that would keep track of the last N characters or
line breaks in case one of them proved to be the one we need. That
would add enough cycles to non-error cases to almost certainly not be
desirable. I also considered trying to back up, but that doesn't look
promising either for arbitrary multibyte encodings.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 16:00:43
Message-ID: CA+TgmoaprEKkBOBptC2CWB3uJ-+bH0UNgQdxvqsXg+sxUNO77w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> On Wednesday, June 13, 2012 05:18:22 PM Robert Haas wrote:
>>> According to my testing, the main cost of an exception block catching
>>> a division-by-zero error is that of generating the error message,
>>> primarily sprintf()-type stuff.  The cost of scanning a multi-megabyte
>>> string seems likely to be much larger.
>
>> True. I ignored that there doesn't have to be an xid assigned yet... I still
>> think its not very relevant though.
>
> I don't think it's relevant either.  In the first place, I don't think
> that errors occurring "multi megabytes" into a JSON blob are going to
> occur often enough to be performance critical.  In the second place,
> removing cycles from the non-error case is worth something too, probably
> more than making the error case faster is worth.
>
> In any case, the proposed scheme for providing context requires that
> you know where the error is before you can identify the context.  I
> considered schemes that would keep track of the last N characters or
> line breaks in case one of them proved to be the one we need.  That
> would add enough cycles to non-error cases to almost certainly not be
> desirable.  I also considered trying to back up, but that doesn't look
> promising either for arbitrary multibyte encodings.

Oh, I see. :-(

Well, I guess I'll have to suck it up, then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 16:27:25
Message-ID: 12158.1339604845@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> In any case, the proposed scheme for providing context requires that
>> you know where the error is before you can identify the context. I
>> considered schemes that would keep track of the last N characters or
>> line breaks in case one of them proved to be the one we need. That
>> would add enough cycles to non-error cases to almost certainly not be
>> desirable. I also considered trying to back up, but that doesn't look
>> promising either for arbitrary multibyte encodings.

> Oh, I see. :-(

Attached is a complete proposed patch for this, with some further
adjustments to make the output look a bit more like what we use for
SQL error context printouts (in particular, "..." at both ends of the
excerpt when appropriate).

One thing I did not touch, but am less than happy with, is the wording
of this detail message:

errdetail("Expected string, number, object, array, true, false, or null, but found \"%s\".",
token),

This seems uselessly verbose to me. It could be argued that enumerating
all the options is helpful to rank JSON novices ... but unless you know
exactly what an "object" is and why that's different from the other
options, it's not all that helpful. I'm inclined to think that
something like this would be better:

errdetail("Expected JSON value, but found \"%s\".",
token),

Thoughts?

regards, tom lane

Attachment Content-Type Size
json-error-context-1.patch text/x-patch 28.4 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 16:45:34
Message-ID: CA+Tgmoa56bb68ONjQrv-6gA_V64hJzEo5hhh-s+UgEudiJrpYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Jun 13, 2012 at 12:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Wed, Jun 13, 2012 at 11:55 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> In any case, the proposed scheme for providing context requires that
>>> you know where the error is before you can identify the context.  I
>>> considered schemes that would keep track of the last N characters or
>>> line breaks in case one of them proved to be the one we need.  That
>>> would add enough cycles to non-error cases to almost certainly not be
>>> desirable.  I also considered trying to back up, but that doesn't look
>>> promising either for arbitrary multibyte encodings.
>
>> Oh, I see.  :-(
>
> Attached is a complete proposed patch for this, with some further
> adjustments to make the output look a bit more like what we use for
> SQL error context printouts (in particular, "..." at both ends of the
> excerpt when appropriate).

I like some of these changes - in particular, the use of errcontext(),
but some of them still seem off.

! DETAIL: Token "'" is invalid.
! CONTEXT: JSON data, line 1: '...

This doesn't make sense to me.

! DETAIL: Character with value 0x0a must be escaped.
! CONTEXT: JSON data, line 1: "abc
! ...

This seems an odd way to present this, especially if the goal is to
NOT include the character needing escaping in the log unescaped, which
I thought was the point of saying 0x0a.

! DETAIL: "\u" must be followed by four hexadecimal digits.
! CONTEXT: JSON data, line 1: "\u000g...

Why does this end in ... even though there's nothing further in the
input string?

> One thing I did not touch, but am less than happy with, is the wording
> of this detail message:
>
>        errdetail("Expected string, number, object, array, true, false, or null, but found \"%s\".",
>                  token),
>
> This seems uselessly verbose to me.  It could be argued that enumerating
> all the options is helpful to rank JSON novices ... but unless you know
> exactly what an "object" is and why that's different from the other
> options, it's not all that helpful.  I'm inclined to think that
> something like this would be better:
>
>        errdetail("Expected JSON value, but found \"%s\".",
>                  token),
>
> Thoughts?

Good idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 17:04:31
Message-ID: 12857.1339607071@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I like some of these changes - in particular, the use of errcontext(),
> but some of them still seem off.

> ! DETAIL: Token "'" is invalid.
> ! CONTEXT: JSON data, line 1: '...

> This doesn't make sense to me.

Well, the input is two single quotes and the complaint is that the first
one of them doesn't constitute a valid JSON token. What would you
expect to see instead?

I considered putting double quotes around the excerpt text, but we do
not do that in SQL error-context reports, and it seems likely to just be
confusing given the prevalence of double quotes in JSON data. But happy
to consider opposing arguments.

Another thing that could be done here is to print the rest of the line,
rather than "...", if there's not very much of it. I'm not sure that's
an improvement though. The advantage of the proposed design is that the
point where the excerpt ends is exactly where the error was detected;
lacking an error cursor, I don't see how else to present that info.

> ! DETAIL: Character with value 0x0a must be escaped.
> ! CONTEXT: JSON data, line 1: "abc
> ! ...

> This seems an odd way to present this, especially if the goal is to
> NOT include the character needing escaping in the log unescaped, which
> I thought was the point of saying 0x0a.

Do you think it would be better to present something that isn't what the
user typed? Again, I don't see an easy improvement here. If you don't
want newlines in the logged context, what will we do for something like

{"foo": {
"bar":44
}
]

There basically isn't any useful context to present unless we are
willing to back up several lines, and I don't think it'll be more
readable if we escape all the newlines.

> ! DETAIL: "\u" must be followed by four hexadecimal digits.
> ! CONTEXT: JSON data, line 1: "\u000g...

> Why does this end in ... even though there's nothing further in the
> input string?

There is something further, namely the trailing double quote.

The examples in the regression tests are not really designed to show
cases where this type of error reporting is an improvement, because
there's hardly any context around the error sites.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 18:09:19
Message-ID: CA+TgmoYPZnPzhW6tjTRz7qYiOXz7thaF049nqjs=cy7FZZhedA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

On Wed, Jun 13, 2012 at 1:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I like some of these changes - in particular, the use of errcontext(),
>> but some of them still seem off.
>
>> ! DETAIL:  Token "'" is invalid.
>> ! CONTEXT:  JSON data, line 1: '...
>
>> This doesn't make sense to me.
>
> Well, the input is two single quotes and the complaint is that the first
> one of them doesn't constitute a valid JSON token.  What would you
> expect to see instead?

Oh, I see.

> Another thing that could be done here is to print the rest of the line,
> rather than "...", if there's not very much of it.  I'm not sure that's
> an improvement though.  The advantage of the proposed design is that the
> point where the excerpt ends is exactly where the error was detected;
> lacking an error cursor, I don't see how else to present that info.

OK.

>> ! DETAIL:  Character with value 0x0a must be escaped.
>> ! CONTEXT:  JSON data, line 1: "abc
>> ! ...
>
>> This seems an odd way to present this, especially if the goal is to
>> NOT include the character needing escaping in the log unescaped, which
>> I thought was the point of saying 0x0a.
>
> Do you think it would be better to present something that isn't what the
> user typed?  Again, I don't see an easy improvement here.  If you don't
> want newlines in the logged context, what will we do for something like
>
>        {"foo": {
>                "bar":44
>                }
>        ]
>
> There basically isn't any useful context to present unless we are
> willing to back up several lines, and I don't think it'll be more
> readable if we escape all the newlines.

Hmm. If your plan is to trace back to the opening brace you were
expecting to match, I don't think that's going to work either. What
if there are three pages (or 3MB) of data in between?

> The examples in the regression tests are not really designed to show
> cases where this type of error reporting is an improvement, because
> there's hardly any context around the error sites.

Yeah, true.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 18:55:24
Message-ID: 14578.1339613724@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Jun 13, 2012 at 1:04 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> ! DETAIL: Character with value 0x0a must be escaped.
>>> ! CONTEXT: JSON data, line 1: "abc
>>> ! ...
>>>
>>> This seems an odd way to present this, especially if the goal is to
>>> NOT include the character needing escaping in the log unescaped, which
>>> I thought was the point of saying 0x0a.

>> Do you think it would be better to present something that isn't what the
>> user typed? Again, I don't see an easy improvement here. If you don't
>> want newlines in the logged context, what will we do for something like
>>
>> {"foo": {
>> "bar":44
>> }
>> ]

> Hmm. If your plan is to trace back to the opening brace you were
> expecting to match, I don't think that's going to work either. What
> if there are three pages (or 3MB) of data in between?

No, that's not the proposal; I only anticipate printing a few dozen
characters of context. But that could still mean printing something
like

DETAIL: expected "," or "}", but found "]".
CONTEXT: JSON data, line 123: ..."bar":44
}
]

which I argue is much more useful than just seeing the "]". So the
question is whether it's still as useful if we mangle the whitespace.
I'm thinking it's not. We don't mangle whitespace when printing SQL
statements into the log, anyway.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: [COMMITTERS] pgsql: Mark JSON error detail messages for translation.
Date: 2012-06-13 23:49:07
Message-ID: 23330.1339631347@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-committers pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> ! DETAIL: Character with value 0x0a must be escaped.
>>> ! CONTEXT: JSON data, line 1: "abc
>>> ! ...
>>>
>>> This seems an odd way to present this, especially if the goal is to
>>> NOT include the character needing escaping in the log unescaped, which
>>> I thought was the point of saying 0x0a.

I thought of a simple way to address that objection for this particular
case: we can just truncate the context display *at* the offending
character, instead of *after* it. This is playing a bit fast and loose
with the general principle that the context should end at the point of
detection of the error; but given that the character in question is
always unprintable, I think it's probably not going to bother anyone.

I've gone ahead and committed this before branching, though I'm
certainly still willing to entertain suggestions for further
improvement.

regards, tom lane