Quick Links

Re: machine-readable explain output v4

Lists:	pgsql-hackers

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	machine-readable explain output v4
Date:	2009-07-30 03:05:48
Message-ID:	603c8f070907292005t495c515dv5759e71666bcb2e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

OK, here's the updated version of my machine-readable explain output
patch. This needed heavy updating as a result of the changes that Tom
asked me to make to the explain options patch, and the further changes
he made himself. In addition to any regressions I may have introduced
during the rebasing process, there is one definite problem here: in
the previous version of this patch, explain (format xml) returned XML
data; now, it's back to text.

The reason for this regression is that Tom asked me to change
ExplainStmt to just carry a list of nodes and to do all the parsing in
ExplainQuery. Unfortunately, the TupleDesc is constructed by
ExplainResultDesc() which can't trivially be changed to take an
ExplainState, because UtilityTupleDescriptor() also wants to call it.
We could possibly fix this by a hack similar to the one we already
added to GetCommandLogLevel(), but I haven't done that here.

...Robert

Attachment	Content-Type	Size
explain_format-v4.patch	text/x-diff	74.0 KB

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-02 16:06:40
Message-ID:	200908021806.40634.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi Robert, Hi all,

On Thursday 30 July 2009 05:05:48 Robert Haas wrote:
> OK, here's the updated version of my machine-readable explain output
> patch. This needed heavy updating as a result of the changes that Tom
> asked me to make to the explain options patch, and the further changes
> he made himself. In addition to any regressions I may have introduced
> during the rebasing process, there is one definite problem here: in
> the previous version of this patch, explain (format xml) returned XML
> data; now, it's back to text.

> The reason for this regression is that Tom asked me to change
> ExplainStmt to just carry a list of nodes and to do all the parsing in
> ExplainQuery. Unfortunately, the TupleDesc is constructed by
> ExplainResultDesc() which can't trivially be changed to take an
> ExplainState, because UtilityTupleDescriptor() also wants to call it.
> We could possibly fix this by a hack similar to the one we already
> added to GetCommandLogLevel(), but I haven't done that here.
Hm. I think its cleaner to move the option parsing into its own function - its
3 places parsing options then and probably not going to get less.
I am not sure this is cleaner than including the parsed options in ExplainStm
though... (possibly in a separate struct to avoid changing copy/equal-funcs
everytime)

Some more comments on the (new) version of the patch:
- The regression tests are gone?
- Currently a value scan looks like »Values Scan on "*VALUES*"« What about
adding its alias at least in verbose mode? This currently is inconsistent with
other scans. Also he output columns of a VALUES scan are named column$N even
if names as specified like in AS foo(colname)
- why do xml/json contain no time units anymore? (e.g. Total Runtime).
Admittedly thats already inconsistent in the current text format...

- Code patterns like:
if (es->format == EXPLAIN_FORMAT_TEXT)
appendStringInfo(es->str, "Total runtime: %.3f ms\n",
1000.0 * totaltime);
else if (es->format == EXPLAIN_FORMAT_XML)
appendStringInfo(es->str,
" <Total-Runtime>%.3f</Total-Runtime>\n",
1000.0 * totaltime);
else if (es->format == EXPLAIN_FORMAT_JSON)
appendStringInfo(es->str, ",\n \"Total runtime\" : %.3f",
1000.0 * totaltime);
or
if (es->format == EXPLAIN_FORMAT_TEXT)
appendStringInfo(es->str, " for constraint %s", conname);
else if (es->format == EXPLAIN_FORMAT_XML)
{
appendStringInfoString(es->str, " <Constraint-Name>");
escape_xml(es->str, conname);
appendStringInfoString(es->str, "</Constraint-Name>\n");
}
else if (es->format == EXPLAIN_FORMAT_JSON)
{
appendStringInfo(es->str, "\n \"Constraint Name\": ");
escape_json(es->str, conname);
}

possibly could be simplified using ExplainPropertyText or a function accepting
a format string.
At least for the !EXPLAIN_FORMAT_TEXT this seems simple at multiple places in
ExplainOnePlan and report_triggers.

On Friday 31 July 2009 23:13:54 Robert Haas wrote:
> On Sat, Jul 18, 2009 at 10:29 PM, Andres Freund<andres(at)anarazel(dot)de> wrote:
> > One part where I find the code flow ugly is 'did_boilerplate' in
> > report_triggers/its callsites.
> > I can see why it is done that way, but its not exactly obvious to read
> > when you want to find out how the format looks.
> Suggestions?
The only idea without building more xml/json infrastructure I have is using a
separate stringbuffer inside report_triggers - but thats not much nicer.

Thats all I could find right now...

Andres

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-02 21:34:04
Message-ID:	603c8f070908021434p12157605r6be4633c18631e7e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Aug 2, 2009 at 12:06 PM, Andres Freund<andres(at)anarazel(dot)de> wrote:
> Hi Robert, Hi all,
>
> On Thursday 30 July 2009 05:05:48 Robert Haas wrote:
>> OK, here's the updated version of my machine-readable explain output
>> patch. This needed heavy updating as a result of the changes that Tom
>> asked me to make to the explain options patch, and the further changes
>> he made himself. In addition to any regressions I may have introduced
>> during the rebasing process, there is one definite problem here: in
>> the previous version of this patch, explain (format xml) returned XML
>> data; now, it's back to text.
>
>> The reason for this regression is that Tom asked me to change
>> ExplainStmt to just carry a list of nodes and to do all the parsing in
>> ExplainQuery. Unfortunately, the TupleDesc is constructed by
>> ExplainResultDesc() which can't trivially be changed to take an
>> ExplainState, because UtilityTupleDescriptor() also wants to call it.
>> We could possibly fix this by a hack similar to the one we already
>> added to GetCommandLogLevel(), but I haven't done that here.
>
> Hm. I think its cleaner to move the option parsing into its own function - its
> 3 places parsing options then and probably not going to get less.
> I am not sure this is cleaner than including the parsed options in ExplainStm
> though... (possibly in a separate struct to avoid changing copy/equal-funcs
> everytime)

Well, this is why we need Tom to weigh in.

> Some more comments on the (new) version of the patch:
> - The regression tests are gone?

Tom added some that look adequate to me to create_index.sql, as a
separate commit, so I don't think I need to do this in my patch any
more. Maybe some of those examples should be changed to output JSON
or XML, though, but I'd rather leave this up to Tom's discretion on
commit because I think he has opinions about this and I think my
chances of guessing what they are are low.

> - Currently a value scan looks like »Values Scan on "*VALUES*"« What about
> adding its alias at least in verbose mode? This currently is inconsistent with
> other scans.

I don't know why this doesn't work, but I think it's unrelated to this patch.

> Also he output columns of a VALUES scan are named column$N even
> if names as specified like in AS foo(colname)

This is consistent with how other types of scans are treated.

rhaas=# explain verbose select x,y,z from (select * from pg_class) a(x,y,z);
QUERY PLAN
----------------------------------------------------------------------
Seq Scan on pg_catalog.pg_class (cost=0.00..8.44 rows=244 width=72)
Output: pg_class.relname, pg_class.relnamespace, pg_class.reltype
(2 rows)

> - why do xml/json contain no time units anymore? (e.g. Total Runtime).
> Admittedly thats already inconsistent in the current text format...

I'm not sure what you mean by "any more". I don't think any version
of these patches I ever submitted did otherwise, nor do I think it's
particularly valuable. Costs are implicitly in units of
PostgreSQL-costing and times are implicitly in units of milliseconds,
just as they are in the text format. Changing this would require
clients to strip off the trailing 'ms' before converting to a
floating-point number, which seems like an irritation with no
corresponding benefit.

> - Code patterns like:
> if (es->format == EXPLAIN_FORMAT_TEXT)
> appendStringInfo(es->str, "Total runtime: %.3f ms\n",
> 1000.0 * totaltime);
> else if (es->format == EXPLAIN_FORMAT_XML)
> appendStringInfo(es->str,
> " <Total-Runtime>%.3f</Total-Runtime>\n",
> 1000.0 * totaltime);
> else if (es->format == EXPLAIN_FORMAT_JSON)
> appendStringInfo(es->str, ",\n \"Total runtime\" : %.3f",
> 1000.0 * totaltime);
> or
> if (es->format == EXPLAIN_FORMAT_TEXT)
> appendStringInfo(es->str, " for constraint %s", conname);
> else if (es->format == EXPLAIN_FORMAT_XML)
> {
> appendStringInfoString(es->str, " <Constraint-Name>");
> escape_xml(es->str, conname);
> appendStringInfoString(es->str, "</Constraint-Name>\n");
> }
> else if (es->format == EXPLAIN_FORMAT_JSON)
> {
> appendStringInfo(es->str, "\n \"Constraint Name\": ");
> escape_json(es->str, conname);
> }
>
> possibly could be simplified using ExplainPropertyText or a function accepting
> a format string.
> At least for the !EXPLAIN_FORMAT_TEXT this seems simple at multiple places in
> ExplainOnePlan and report_triggers.

Well, the whole explain output format is pretty idiosyncratic, and I
had to work pretty hard to beat it into submission. I think that it
would not be totally trivial to do what you're suggesting here because
it would require adding code to manage es->indent outside of
ExplainPrintPlan(), which we currently don't. I'm not sure whether
that works out to a net win.

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-02 22:35:45
Message-ID:	24831.1249252545@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Aug 2, 2009 at 12:06 PM, Andres Freund<andres(at)anarazel(dot)de> wrote:
>> - Currently a value scan looks like Values Scan on "*VALUES*" What about
>> adding its alias at least in verbose mode? This currently is inconsistent with
>> other scans.

> I don't know why this doesn't work, but I think it's unrelated to this patch.

If you mean something like

regression=# explain select * from (values(1)) ss;
QUERY PLAN
-------------------------------------------------------------
Values Scan on "*VALUES*" (cost=0.00..0.01 rows=1 width=4)
(1 row)

I think the reason is that the alias applies to a SubqueryScan node that
later gets optimized away. The VALUES per se doesn't have an alias.

regards, tom lane

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-02 23:29:09
Message-ID:	200908030129.10167.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sunday 02 August 2009 23:34:04 Robert Haas wrote:
> On Sun, Aug 2, 2009 at 12:06 PM, Andres Freund<andres(at)anarazel(dot)de> wrote:
> > Hi Robert, Hi all,
> >
> > On Thursday 30 July 2009 05:05:48 Robert Haas wrote:
> >> OK, here's the updated version of my machine-readable explain output
> >> patch. This needed heavy updating as a result of the changes that Tom
> >> asked me to make to the explain options patch, and the further changes
> >> he made himself. In addition to any regressions I may have introduced
> >> during the rebasing process, there is one definite problem here: in
> >> the previous version of this patch, explain (format xml) returned XML
> >> data; now, it's back to text.
> >>
> >> The reason for this regression is that Tom asked me to change
> >> ExplainStmt to just carry a list of nodes and to do all the parsing in
> >> ExplainQuery. Unfortunately, the TupleDesc is constructed by
> >> ExplainResultDesc() which can't trivially be changed to take an
> >> ExplainState, because UtilityTupleDescriptor() also wants to call it.
> >> We could possibly fix this by a hack similar to the one we already
> >> added to GetCommandLogLevel(), but I haven't done that here.
> > Hm. I think its cleaner to move the option parsing into its own function
> > - its 3 places parsing options then and probably not going to get less. I
> > am not sure this is cleaner than including the parsed options in
> > ExplainStm though... (possibly in a separate struct to avoid changing
> > copy/equal-funcs everytime)
> Well, this is why we need Tom to weigh in.
Yes.

> > Some more comments on the (new) version of the patch:
> > - The regression tests are gone?
> Tom added some that look adequate to me to create_index.sql, as a
> separate commit, so I don't think I need to do this in my patch any
> more. Maybe some of those examples should be changed to output JSON
> or XML, though, but I'd rather leave this up to Tom's discretion on
> commit because I think he has opinions about this and I think my
> chances of guessing what they are are low.
Yea, I was referring to ones using xml/json.

> > - Currently a value scan looks like »Values Scan on "*VALUES*"« What
> > about adding its alias at least in verbose mode? This currently is
> > inconsistent with other scans.
> I don't know why this doesn't work, but I think it's unrelated to this
> patch.
True.

> > Also he output columns of a VALUES scan are named column$N even
> > if names as specified like in AS foo(colname)
> This is consistent with how other types of scans are treated.
> rhaas=# explain verbose select x,y,z from (select * from pg_class)
> a(x,y,z); QUERY PLAN
> ----------------------------------------------------------------------
> Seq Scan on pg_catalog.pg_class (cost=0.00..8.44 rows=244 width=72)
> Output: pg_class.relname, pg_class.relnamespace, pg_class.reltype
> (2 rows)
This is someone weird considering since using it directly in relations works
different:
explain (verbose) SELECT * FROM pg_namespace AS f(a,b,c);
QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on pg_catalog.pg_namespace f (cost=0.00..1.06 rows=6 width=100)
Output: a, b, c
(2 rows)

Not your "guilt" though. Still its rather strange and looks worth fixable.

> > - why do xml/json contain no time units anymore? (e.g. Total Runtime).
> > Admittedly thats already inconsistent in the current text format...
> I'm not sure what you mean by "any more". I don't think any version
> of these patches I ever submitted did otherwise, nor do I think it's
> particularly valuable. Costs are implicitly in units of
> PostgreSQL-costing and times are implicitly in units of milliseconds,
> just as they are in the text format. Changing this would require
> clients to strip off the trailing 'ms' before converting to a
> floating-point number, which seems like an irritation with no
> corresponding benefit.
I did not think any of your patches did - it was just a difference between the
original output and the new formats I just noted - as I said its not even
consistent in the text format.

> > - Code patterns like:
> > if (es->format == EXPLAIN_FORMAT_TEXT)
> > appendStringInfo(es->str, "Total runtime: %.3f
> > ms\n", 1000.0 * totaltime); else if (es->format == EXPLAIN_FORMAT_XML)
> > appendStringInfo(es->str,
> > "
> > <Total-Runtime>%.3f</Total-Runtime>\n", 1000.0 * totaltime); else if
> > (es->format == EXPLAIN_FORMAT_JSON)
> > appendStringInfo(es->str, ",\n \"Total
> > runtime\" : %.3f", 1000.0 * totaltime); or
> > if (es->format == EXPLAIN_FORMAT_TEXT)
> > appendStringInfo(es->str, " for constraint
> > %s", conname); else if (es->format == EXPLAIN_FORMAT_XML) {
> > appendStringInfoString(es->str, "
> > <Constraint-Name>"); escape_xml(es->str, conname);
> > appendStringInfoString(es->str,
> > "</Constraint-Name>\n"); }
> > else if (es->format == EXPLAIN_FORMAT_JSON)
> > {
> > appendStringInfo(es->str, "\n
> > \"Constraint Name\": "); escape_json(es->str, conname);
> > }
> >
> > possibly could be simplified using ExplainPropertyText or a function
> > accepting a format string.
> > At least for the !EXPLAIN_FORMAT_TEXT this seems simple at multiple
> > places in ExplainOnePlan and report_triggers.
> Well, the whole explain output format is pretty idiosyncratic, and I
> had to work pretty hard to beat it into submission. I think that it
> would not be totally trivial to do what you're suggesting here because
> it would require adding code to manage es->indent outside of
> ExplainPrintPlan(), which we currently don't. I'm not sure whether
> that works out to a net win.
Thats why I suggested doing it for JSON/XML only. E.g. like in the attached
patch. The result looks simpler for my eyes.

While re-checking this I noticed a newly introduced bug in report_triggers() -
in case of a trigger/conname==false "Trigger Name" gets printed twice due to a
duplicated else - merge glitch? (fixed in attached patch as well)

Attachment	Content-Type	Size
0001-reduce-json-xml-duplication-fix-duplicate-output.patch	text/x-patch	5.8 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-02 23:57:48
Message-ID:	26060.1249257468@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> The reason for this regression is that Tom asked me to change
>>> ExplainStmt to just carry a list of nodes and to do all the parsing in
>>> ExplainQuery. Unfortunately, the TupleDesc is constructed by
>>> ExplainResultDesc() which can't trivially be changed to take an
>>> ExplainState, because UtilityTupleDescriptor() also wants to call it.
>>> We could possibly fix this by a hack similar to the one we already
>>> added to GetCommandLogLevel(), but I haven't done that here.

I don't see anything particularly wrong with having ExplainResultDesc
do the same kind of thing GetCommandLogLevel is doing.

The alternative is to have it call a function that extracts all the
parameters into an ExplainState, but the thing I have against that is
that it makes for a larger cross-section of user mistakes that will
result in throwing an elog() before we actually reach execution.
I didn't want GetCommandLogLevel throwing any errors it doesn't actually
have to, because that would interfere with logging of statements that
could perfectly well get logged normally. I'm not sure if there are any
comparable reasons to not want errors from UtilityTupleDescriptor, but
there could well be.

>> - The regression tests are gone?

> Tom added some that look adequate to me to create_index.sql, as a
> separate commit, so I don't think I need to do this in my patch any
> more. Maybe some of those examples should be changed to output JSON
> or XML, though, but I'd rather leave this up to Tom's discretion on
> commit because I think he has opinions about this and I think my
> chances of guessing what they are are low.

Well, of course the existing tests are not going to exercise XML or
JSON output format. Dunno how much we care. I had supposed that
XML or JSON would always emit all the fields and leave it to the
recipient to suppress what they don't want. If we want to have
platform-independent regression tests then we'd need to make the
COSTS option effective for XML/JSON format --- do we want that?

> I'm not sure what you mean by "any more". I don't think any version
> of these patches I ever submitted did otherwise, nor do I think it's
> particularly valuable. Costs are implicitly in units of
> PostgreSQL-costing and times are implicitly in units of milliseconds,
> just as they are in the text format. Changing this would require
> clients to strip off the trailing 'ms' before converting to a
> floating-point number, which seems like an irritation with no
> corresponding benefit.

I agree with not labeling these numbers. We definitely can't label
the costs with anything useful, and labeling runtimes would be a
bit inconsistent.

regards, tom lane

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-03 00:41:59
Message-ID:	200908030241.59569.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Monday 03 August 2009 01:57:48 Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > > - The regression tests are gone?
> > Tom added some that look adequate to me to create_index.sql, as a
> > separate commit, so I don't think I need to do this in my patch any
> > more. Maybe some of those examples should be changed to output JSON
> > or XML, though, but I'd rather leave this up to Tom's discretion on
> > commit because I think he has opinions about this and I think my
> > chances of guessing what they are are low.
> Well, of course the existing tests are not going to exercise XML or
> JSON output format. Dunno how much we care. I had supposed that
> XML or JSON would always emit all the fields and leave it to the
> recipient to suppress what they don't want. If we want to have
> platform-independent regression tests then we'd need to make the
> COSTS option effective for XML/JSON format --- do we want that?
Options such as COSTS do effect XML/JSON right now. While not important for
COSTS itself, I think its generally good to do so because a certain option
might not be done per default efficiencywise and I don't see a reason to
specialcase COSTS.

Andres

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-03 00:50:08
Message-ID:	603c8f070908021750u375094d4l9b2e4561c52b76bd@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Well, of course the existing tests are not going to exercise XML or
> JSON output format. Dunno how much we care. I had supposed that
> XML or JSON would always emit all the fields and leave it to the
> recipient to suppress what they don't want. If we want to have
> platform-independent regression tests then we'd need to make the
> COSTS option effective for XML/JSON format --- do we want that?

Well, as I've said many, many times on these threads, I feel strongly
that the choice of output format shouldn't for the most part affect
the information which is displayed. Output format should be an
option, and the information to display should be an option, and the
two should be, as far as possible, separate. So what you're
suggesting is the way it works now.

http://archives.postgresql.org/pgsql-hackers/2009-05/msg00879.php
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00916.php
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00917.php
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00989.php

...Robert

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-05 11:07:33
Message-ID:	603c8f070908050407o43f7ca66sefa5795590e2ae6b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> The reason for this regression is that Tom asked me to change
>>>> ExplainStmt to just carry a list of nodes and to do all the parsing in
>>>> ExplainQuery. Unfortunately, the TupleDesc is constructed by
>>>> ExplainResultDesc() which can't trivially be changed to take an
>>>> ExplainState, because UtilityTupleDescriptor() also wants to call it.
>>>> We could possibly fix this by a hack similar to the one we already
>>>> added to GetCommandLogLevel(), but I haven't done that here.
>
> I don't see anything particularly wrong with having ExplainResultDesc
> do the same kind of thing GetCommandLogLevel is doing.

After I did this, I thought it would be useful to add a regression
test to make sure that it is doing the right thing. So I came up with
this:

CREATE OR REPLACE FUNCTION test_explain_format(text) RETURNS text AS $$
DECLARE
x RECORD;
BEGIN
EXECUTE 'explain (format ' || $1 || ') select 1' INTO x;
RETURN pg_typeof(x."QUERY PLAN");
END
$$ LANGUAGE plpgsql;

This works the first time you run it in a particular session, but then
if change $1 so as to get a different answer, it fails:

rhaas=# select test_explain_format('text');
test_explain_format
---------------------
text
(1 row)

rhaas=# select test_explain_format('xml');
ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=# discard
ALL PLANS TEMP
rhaas=# discard plans;
DISCARD PLANS
rhaas=# select test_explain_format('xml');
ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=# discard all;
DISCARD ALL
rhaas=# select test_explain_format('xml');
ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
rhaas=#

If I quit psql and start back up again, then it works:

rhaas=# select test_explain_format('xml');
test_explain_format
---------------------
xml
(1 row)

So I guess that leads me to -

(1) How do I make this work?
(2) Is it worth making this work?

...Robert

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-05 11:20:05
Message-ID:	4A796AE5.9090006@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>
>>>>> The reason for this regression is that Tom asked me to change
>>>>> ExplainStmt to just carry a list of nodes and to do all the parsing in
>>>>> ExplainQuery. Unfortunately, the TupleDesc is constructed by
>>>>> ExplainResultDesc() which can't trivially be changed to take an
>>>>> ExplainState, because UtilityTupleDescriptor() also wants to call it.
>>>>> We could possibly fix this by a hack similar to the one we already
>>>>> added to GetCommandLogLevel(), but I haven't done that here.
>>>>>
>> I don't see anything particularly wrong with having ExplainResultDesc
>> do the same kind of thing GetCommandLogLevel is doing.
>>
>
> After I did this, I thought it would be useful to add a regression
> test to make sure that it is doing the right thing. So I came up with
> this:
>
> CREATE OR REPLACE FUNCTION test_explain_format(text) RETURNS text AS $$
> DECLARE
> x RECORD;
> BEGIN
> EXECUTE 'explain (format ' || $1 || ') select 1' INTO x;
> RETURN pg_typeof(x."QUERY PLAN");
> END
> $$ LANGUAGE plpgsql;
>
> This works the first time you run it in a particular session, but then
> if change $1 so as to get a different answer, it fails:
>
> rhaas=# select test_explain_format('text');
> test_explain_format
> ---------------------
> text
> (1 row)
>
> rhaas=# select test_explain_format('xml');
> ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
> CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
> rhaas=# discard
> ALL PLANS TEMP
> rhaas=# discard plans;
> DISCARD PLANS
> rhaas=# select test_explain_format('xml');
> ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
> CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
> rhaas=# discard all;
> DISCARD ALL
> rhaas=# select test_explain_format('xml');
> ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
> CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
> rhaas=#
>
> If I quit psql and start back up again, then it works:
>
> rhaas=# select test_explain_format('xml');
> test_explain_format
> ---------------------
> xml
> (1 row)
>
> So I guess that leads me to -
>
> (1) How do I make this work?
> (2) Is it worth making this work?

You could have the function create an inner function which it then runs
and drops. I have had to perform such gymnastics in the past to get
around similar problems. And yes, it's ugly as hell.

cheers

andrew

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-05 11:52:11
Message-ID:	603c8f070908050452r2e71fae0j78a0650f095761e7@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 5, 2009 at 7:20 AM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
> Robert Haas wrote:
>> On Sun, Aug 2, 2009 at 7:57 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>>>> The reason for this regression is that Tom asked me to change
>>>>>> ExplainStmt to just carry a list of nodes and to do all the parsing in
>>>>>> ExplainQuery. Unfortunately, the TupleDesc is constructed by
>>>>>> ExplainResultDesc() which can't trivially be changed to take an
>>>>>> ExplainState, because UtilityTupleDescriptor() also wants to call it.
>>>>>> We could possibly fix this by a hack similar to the one we already
>>>>>> added to GetCommandLogLevel(), but I haven't done that here.
>>>>>>
>>>
>>> I don't see anything particularly wrong with having ExplainResultDesc
>>> do the same kind of thing GetCommandLogLevel is doing.
>>>
>>
>> After I did this, I thought it would be useful to add a regression
>> test to make sure that it is doing the right thing. So I came up with
>> this:
>>
>> CREATE OR REPLACE FUNCTION test_explain_format(text) RETURNS text AS $$
>> DECLARE
>> x RECORD;
>> BEGIN
>> EXECUTE 'explain (format ' || $1 || ') select 1' INTO x;
>> RETURN pg_typeof(x."QUERY PLAN");
>> END
>> $$ LANGUAGE plpgsql;
>>
>> This works the first time you run it in a particular session, but then
>> if change $1 so as to get a different answer, it fails:
>>
>> rhaas=# select test_explain_format('text');
>> test_explain_format
>> ---------------------
>> text
>> (1 row)
>>
>> rhaas=# select test_explain_format('xml');
>> ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
>> CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
>> rhaas=# discard
>> ALL PLANS TEMP
>> rhaas=# discard plans;
>> DISCARD PLANS
>> rhaas=# select test_explain_format('xml');
>> ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
>> CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
>> rhaas=# discard all;
>> DISCARD ALL
>> rhaas=# select test_explain_format('xml');
>> ERROR: type of "x.QUERY PLAN" does not match that when preparing the plan
>> CONTEXT: PL/pgSQL function "test_explain_format" line 5 at RETURN
>> rhaas=#
>>
>> If I quit psql and start back up again, then it works:
>>
>> rhaas=# select test_explain_format('xml');
>> test_explain_format
>> ---------------------
>> xml
>> (1 row)
>>
>> So I guess that leads me to -
>>
>> (1) How do I make this work?
>> (2) Is it worth making this work?
>
>
> You could have the function create an inner function which it then runs and
> drops. I have had to perform such gymnastics in the past to get around
> similar problems. And yes, it's ugly as hell.

<hurls>

Well, I guess I could do it like this:

CREATE OR REPLACE FUNCTION test_explain_format() RETURNS text[] AS $$
DECLARE
xt RECORD;
xx RECORD;
xj RECORD;
BEGIN
EXPLAIN (FORMAT TEXT) SELECT 1 INTO xt;
EXPLAIN (FORMAT XML) SELECT 1 INTO xx;
EXPLAIN (FORMAT JSON) SELECT 1 INTO xj;
RETURN ARRAY[
pg_typeof(xt."QUERY PLAN"),
pg_typeof(xx."QUERY PLAN"),
pg_typeof(xj."QUERY PLAN")
];
END
$$ LANGUAGE plpgsql;

Fortunately there is not an unlimited space to probe here...

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-05 14:17:13
Message-ID:	9854.1249481833@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> (2) Is it worth making this work?

No, I don't think so. The odds of such a test ever showing anything
interesting seem minimal.

plpgsql's inability to cope with the case would be nice to fix, but
I'm not holding my breath for it...

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-06 02:48:37
Message-ID:	603c8f070908051948o10832925paae2db13e0e5eb15@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Aug 2, 2009 at 7:29 PM, Andres Freund<andres(at)anarazel(dot)de> wrote:
>> Well, the whole explain output format is pretty idiosyncratic, and I
>> had to work pretty hard to beat it into submission. I think that it
>> would not be totally trivial to do what you're suggesting here because
>> it would require adding code to manage es->indent outside of
>> ExplainPrintPlan(), which we currently don't. I'm not sure whether
>> that works out to a net win.
> Thats why I suggested doing it for JSON/XML only. E.g. like in the attached
> patch. The result looks simpler for my eyes.

I looked at this some more. I think it's a mess. It's probably right
to do what you're suggesting here, but this patch only changes pieces
of it without making the whole thing consistent. report_triggers(),
for example, kludges a value into es->indent but then ignores it when
determining how much to indent <Trigger>, etc. I can't even figure
out why this works now, let alone being able to maintain it down the
line.

I'm working on trying to fix this.

...Robert

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-07 01:56:20
Message-ID:	603c8f070908061856h546556f5iad648410d1d34e0c@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 5, 2009 at 10:48 PM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Aug 2, 2009 at 7:29 PM, Andres Freund<andres(at)anarazel(dot)de> wrote:
>>> Well, the whole explain output format is pretty idiosyncratic, and I
>>> had to work pretty hard to beat it into submission. I think that it
>>> would not be totally trivial to do what you're suggesting here because
>>> it would require adding code to manage es->indent outside of
>>> ExplainPrintPlan(), which we currently don't. I'm not sure whether
>>> that works out to a net win.
>> Thats why I suggested doing it for JSON/XML only. E.g. like in the attached
>> patch. The result looks simpler for my eyes.
>
> I looked at this some more. I think it's a mess. It's probably right
> to do what you're suggesting here, but this patch only changes pieces
> of it without making the whole thing consistent. report_triggers(),
> for example, kludges a value into es->indent but then ignores it when
> determining how much to indent <Trigger>, etc. I can't even figure
> out why this works now, let alone being able to maintain it down the
> line.
>
> I'm working on trying to fix this.

Revised patch attached. I'm not convinced this is as good as it can
be, but I've been looking at this patch for so long that I'm starting
to get cross-eyed, and I'd like to Tom at least have a look at this
and assess it before we run out of CommitFest.

...Robert

Attachment	Content-Type	Size
explain_format-v5.patch	text/x-diff	73.5 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-09 19:57:00
Message-ID:	25411.1249847820@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Revised patch attached. I'm not convinced this is as good as it can
> be, but I've been looking at this patch for so long that I'm starting
> to get cross-eyed, and I'd like to Tom at least have a look at this
> and assess it before we run out of CommitFest.

I'm starting to look at this now. I feel unqualified to opine on the
quality of the XML/JSON schema design, and given the utter lack of
documentation of what that design is, I'm not sure that anyone who
could comment on it has done so. Could we have a spec please?

Also, the JSON code seems a bit messy/poorly factorized. Is there
a reason for that, or just it's not as mature as the XML code?

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-09 22:19:37
Message-ID:	603c8f070908091519g3f54248bk93e82773ad900432@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Aug 9, 2009 at 3:57 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Revised patch attached. I'm not convinced this is as good as it can
>> be, but I've been looking at this patch for so long that I'm starting
>> to get cross-eyed, and I'd like to Tom at least have a look at this
>> and assess it before we run out of CommitFest.
>
> I'm starting to look at this now. I feel unqualified to opine on the
> quality of the XML/JSON schema design, and given the utter lack of
> documentation of what that design is, I'm not sure that anyone who
> could comment on it has done so. Could we have a spec please?

*scratches head*

You're not the first person to make that request, and I'd like to
respond to it to well, but I don't really know what to write. Most of
the discussion about the XML/JSON output format thus far has been
around things like whether we should downcase everything, and even the
people offering these comments have mostly labelled them with words to
the effect of "I know this is trival but...". I think that the reason
for this is that fundamentally explain output is fundamentally a tree,
and XML and JSON both have ways of representing a tree with properties
hanging off the nodes, and this patch uses those ways. I can't figure
out what else there is, so I don't know what I'm explaining why I
didn't do.

The one significant representational choice that I'm aware of having
made is to use nested tags rather than attributes in the XML format.
This seems to me to offer several advantages. First, it's clearly
impossible to standardize on attributes, because attributes can only
be text, and it seems to me that if we're going to try to output
structured data, we want to take that as far as we can, and we have
attributes (like sort keys) that are lists rather than scalars. Using
tags means that they can have substructure when needed. Second, it
seems likely to me that people will want to extend explain further in
the future: indeed, that was the whole point of the explain-options
patch which was already committed. That's pretty simple in the
current design - just add a few more calls to ExplainPropertyText or
ExplainPropertyList in the appropriate place, and you're done. I'm
pretty sure that splitting things up between attributes and nested
tags would complicate such modifications.

Peter Eisentraut, in an earlier review of this patch, complained about
the format as well, saying something along the lines of "this is
trying to be all things to all people". I don't want to dismiss that
criticism, but neither can I put my finger on the problem. In an
ideal world, we'd like to be all things to all people, but it's
usually not possible to achieve that in practice. Still, it's not
clear to me what need this wouldn't serve. It's possible to generate
the text format from the XML or JSON format, so it should be
well-suited to graphical presentation of explain output. It's also
possible to grope through the output and, say, find the average cost
of all your seqscan nodes, or verify the absence of merge joins, or
anything of that sort that someone might think that they want to do.

In a nutshell, the design is "take all the fields we have now and put
XML/JSON markup around them so they're easier to get to". Maybe
that's not enough of a design, but I don't have any other ideas.

> Also, the JSON code seems a bit messy/poorly factorized. Is there
> a reason for that, or just it's not as mature as the XML code?

I wrote them together, so it's not a question of code maturity, but I
wouldn't rule out me being dumb. I'm open to suggestions... AFAICS,
the need to comma-separate list and hash elements is most of the
problem. I had thought about replacing es->needs_separator with a
list so that we could push/pop elements, but I wasn't totally sure
whether that was a good idea.

...Robert

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-09 23:21:35
Message-ID:	4A7F59FF.4060302@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> The one significant representational choice that I'm aware of having
> made is to use nested tags rather than attributes in the XML format.
> This seems to me to offer several advantages. First, it's clearly
> impossible to standardize on attributes, because attributes can only
> be text, and it seems to me that if we're going to try to output
> structured data, we want to take that as far as we can, and we have
> attributes (like sort keys) that are lists rather than scalars. Using
> tags means that they can have substructure when needed. Second, it
> seems likely to me that people will want to extend explain further in
> the future: indeed, that was the whole point of the explain-options
> patch which was already committed. That's pretty simple in the
> current design - just add a few more calls to ExplainPropertyText or
> ExplainPropertyList in the appropriate place, and you're done. I'm
> pretty sure that splitting things up between attributes and nested
> tags would complicate such modifications.
>
>
>

In general, in XML one uses an attribute for a named property of an
object that can only have one value at a time. A classic example is the
dimensions of an object - it can only have one width and height.
Children (nested tags, particularly) are used for things it can have an
arbitrary number of, or things which in turn can have children. the
HTML <p> and <body> elements are (respectively) examples of these.
Generally, attribute values especially should be short - I recently saw
an example that had an entire image hex encoded in an XML attribute,
which struck me as just horrible. Enumerations, date and time values,
booleans, measurements - these are common types of attribute values.
Extracting a value from an attribute is no more or less difficult than
from a nested tag, using the XPath query language.

The XML Schema standard is a language for specifying the structure of a
given XML document type, and while it is undoubtedly complex, it is also
much more powerful than the older DTD mechanism. I think we should be
creating (and publishing) an XML Schema specification for any XML
documents we are producing. There are a number of members of the
community who are equipped to help produce these.

There is probably a good case for using an explicit namespace with such
docs. So we might have something like:

<pg:explain
xmlns:pg="http://www.postgresql.org/xmlspecs/explain/v1.xsd"> ....

BTW, has anyone tried validating the XML at all? I just looked very
briefly at the patch at
<http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
I noticed this which makes me suspicious:

+ if (es.format == EXPLAIN_FORMAT_XML)
+ appendStringInfoString(es.str,
+ "<explain xmlns=\"http://www.postgresql.org/2009/explain\" <http://www.postgresql.org/2009/explain%5C%22>;>\n");

That ";" after the attribute is almost certainly wrong. This is a classic case of what I was talking about a month or two ago. Building up XML (or any structured doc, really, XML is not special in this regard) by ad hoc methods is horribly error prone. if you don't want to rely on libxml, then I think you need to develop a lightweight abstraction rather than just appending to a StringInfo.

cheers

andrew

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 00:08:28
Message-ID:	200908100208.28769.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Monday 10 August 2009 01:21:35 Andrew Dunstan wrote:
> Robert Haas wrote:
> > The one significant representational choice that I'm aware of having
> > made is to use nested tags rather than attributes in the XML format.
> > This seems to me to offer several advantages. First, it's clearly
> > impossible to standardize on attributes, because attributes can only
> > be text, and it seems to me that if we're going to try to output
> > structured data, we want to take that as far as we can, and we have
> > attributes (like sort keys) that are lists rather than scalars. Using
> > tags means that they can have substructure when needed. Second, it
> > seems likely to me that people will want to extend explain further in
> > the future: indeed, that was the whole point of the explain-options
> > patch which was already committed. That's pretty simple in the
> > current design - just add a few more calls to ExplainPropertyText or
> > ExplainPropertyList in the appropriate place, and you're done. I'm
> > pretty sure that splitting things up between attributes and nested
> > tags would complicate such modifications.
> The XML Schema standard is a language for specifying the structure of a
> given XML document type, and while it is undoubtedly complex, it is also
> much more powerful than the older DTD mechanism. I think we should be
> creating (and publishing) an XML Schema specification for any XML
> documents we are producing. There are a number of members of the
> community who are equipped to help produce these.
I produced/mailed a relaxng version for a a bit older version and I plan to
refresh and document it once the format seems suitably stable. I am not sure
it is yet. If yes, this should not take that long...
(Relaxng because you easily can convert it into most other XML schema
description languages)

> There is probably a good case for using an explicit namespace with such
> docs. So we might have something like:
>
> <pg:explain
> xmlns:pg="http://www.postgresql.org/xmlspecs/explain/v1.xsd"> ....
>
> BTW, has anyone tried validating the XML at all? I just looked very
> briefly at the patch at
> <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
> I noticed this which makes me suspicious:
>
> + if (es.format == EXPLAIN_FORMAT_XML)
> + appendStringInfoString(es.str,
> + "<explain xmlns=\"http://www.postgresql.org/2009/explain\"
> <http://www.postgresql.org/2009/explain%5C%22>;>\n");
That bug is fixed - as referenced above I wrote a schema and validated it. So,
yes, the generated XML was valid at least before the last round of
refactoring. And I looked through the output quite a bit so I would surprised
if there is such a breakage.

> That ";" after the attribute is almost certainly wrong. This is a classic
> case of what I was talking about a month or two ago. Building up XML (or
> any structured doc, really, XML is not special in this regard) by ad hoc
> methods is horribly error prone. if you don't want to rely on libxml, then
> I think you need to develop a lightweight abstraction rather than just
> appending to a StringInfo.
Actually by now a non-insignificant portion already "outsources" this - only
some special cases (empty attributes, no newlines wanted, initial element with
namespace) do not do this.

While it would be possible to add another step inbetween and generate a format
neutral tree and generate the different formats out of it I am not sure that
this is worthwile.
The current text format will need to stay special cased anyway because its far
to inconsistent to generate it from anything abstract and I don't see any
completely new formats coming (i.e. not just optional parts)?

Andres

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 00:42:36
Message-ID:	4A7F6CFC.20403@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund wrote:
>> BTW, has anyone tried validating the XML at all? I just looked very
>> briefly at the patch at
>> <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
>> I noticed this which makes me suspicious:
>>
>> + if (es.format == EXPLAIN_FORMAT_XML)
>> + appendStringInfoString(es.str,
>> + "<explain xmlns=\"http://www.postgresql.org/2009/explain\"
>> <http://www.postgresql.org/2009/explain%5C%22>;>\n");
>>
> That bug is fixed - as referenced above I wrote a schema and validated it. So,
> yes, the generated XML was valid at least before the last round of
> refactoring. And I looked through the output quite a bit so I would surprised
> if there is such a breakage.
> then someone needs to update the commitfest page with the lastest patch. That's the link I followed.
>
>
>
>

Hmm. I wonder how i got the link to an old version of the patch.

Anyway, I'm glad it's fixed.

cheers

andrew

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 00:48:29
Message-ID:	4A7F6E5D.2040703@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan wrote:
>
>
> Andres Freund wrote:
>>> BTW, has anyone tried validating the XML at all? I just looked very
>>> briefly at the patch at
>>> <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
>>> I noticed this which makes me suspicious:
>>>
>>> + if (es.format == EXPLAIN_FORMAT_XML)
>>> + appendStringInfoString(es.str,
>>> + "<explain
>>> xmlns=\"http://www.postgresql.org/2009/explain\"
>>> <http://www.postgresql.org/2009/explain%5C%22>;>\n");
>>>
>> That bug is fixed - as referenced above I wrote a schema and
>> validated it. So, yes, the generated XML was valid at least before
>> the last round of refactoring. And I looked through the output quite
>> a bit so I would surprised if there is such a breakage.
>> then someone needs to update the commitfest page with the lastest
>> patch. That's the link I followed.
>>
>>
>>
>
> Hmm. I wonder how i got the link to an old version of the patch.
>
> Anyway, I'm glad it's fixed.

I takle it back. It's still there at
<http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php>
posted 3 days ago.

cheers

andrew

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 00:53:16
Message-ID:	3387.1249865596@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On Monday 10 August 2009 01:21:35 Andrew Dunstan wrote:
>> That ";" after the attribute is almost certainly wrong. This is a classic
>> case of what I was talking about a month or two ago. Building up XML (or
>> any structured doc, really, XML is not special in this regard) by ad hoc
>> methods is horribly error prone. if you don't want to rely on libxml, then
>> I think you need to develop a lightweight abstraction rather than just
>> appending to a StringInfo.

> While it would be possible to add another step inbetween and generate a format
> neutral tree and generate the different formats out of it I am not sure that
> this is worthwile.
> The current text format will need to stay special cased anyway because its far
> to inconsistent to generate it from anything abstract and I don't see any
> completely new formats coming (i.e. not just optional parts)?

The patch as-submitted does have a lightweight abstraction layer of
sorts, but the main code line feels free to ignore that and hack around
it. I agree that we can't avoid special-casing the text output, but
I'm trying to improve the encapsulation otherwise. What I've got at the
moment is attached. I'd be interested in comments on the grouping
notion in particular --- I reverse-engineered that out of what the code
was doing, and I'm sure it could use improvement.

regards, tom lane

/*
* Explain a property, such as sort keys or targets, that takes the form of
* a list of unlabeled items. "data" is a list of C strings.
*/
static void
ExplainPropertyList(const char *qlabel, List *data, ExplainState *es)
{
ListCell *lc;
bool first = true;

switch (es->format)
{
case EXPLAIN_FORMAT_TEXT:
appendStringInfoSpaces(es->str, es->indent * 2);
appendStringInfo(es->str, " %s: ", qlabel);
foreach(lc, data)
{
if (!first)
appendStringInfoString(es->str, ", ");
appendStringInfoString(es->str, (const char *) lfirst(lc));
first = false;
}
appendStringInfoChar(es->str, '\n');
break;

case EXPLAIN_FORMAT_XML:
ExplainXMLTag(qlabel, X_OPENING, es);
foreach(lc, data)
{
char *str;

appendStringInfoSpaces(es->str, es->indent * 2 + 2);
appendStringInfoString(es->str, "<Item>");
str = escape_xml((const char *) lfirst(lc));
appendStringInfoString(es->str, str);
pfree(str);
appendStringInfoString(es->str, "</Item>\n");
}
ExplainXMLTag(qlabel, X_CLOSING, es);
break;

case EXPLAIN_FORMAT_JSON:
ExplainJSONLineEnding(es);
appendStringInfoSpaces(es->str, es->indent * 2);
escape_json(es->str, qlabel);
appendStringInfoString(es->str, ": [");
foreach(lc, data)
{
if (!first)
appendStringInfoString(es->str, ", ");
escape_json(es->str, (const char *) lfirst(lc));
first = false;
}
appendStringInfoChar(es->str, ']');
break;
}
}

#define ExplainPropertyText(qlabel, value, es) \
ExplainProperty(qlabel, value, false, es)

/*
* Explain a simple property.
*
* If "numeric" is true, the value is a number (or other value that
* doesn't need quoting in JSON).
*/
static void
ExplainProperty(const char *qlabel, const char *value, bool numeric,
ExplainState *es)
{
switch (es->format)
{
case EXPLAIN_FORMAT_TEXT:
appendStringInfoSpaces(es->str, es->indent * 2);
appendStringInfo(es->str, " %s: %s\n", qlabel, value);
break;

case EXPLAIN_FORMAT_XML:
{
char *str;

appendStringInfoSpaces(es->str, es->indent * 2);
ExplainXMLTag(qlabel, X_OPENING | X_NOWHITESPACE, es);
str = escape_xml(value);
appendStringInfoString(es->str, str);
pfree(str);
ExplainXMLTag(qlabel, X_CLOSING | X_NOWHITESPACE, es);
appendStringInfoChar(es->str, '\n');
}
break;

case EXPLAIN_FORMAT_JSON:
ExplainJSONLineEnding(es);
appendStringInfoSpaces(es->str, es->indent * 2);
escape_json(es->str, qlabel);
appendStringInfoString(es->str, ": ");
if (numeric)
appendStringInfoString(es->str, value);
else
escape_json(es->str, value);
break;
}
}

/*
* Explain an integer-valued property.
*/
static void
ExplainPropertyInteger(const char *qlabel, int value, ExplainState *es)
{
char buf[32];

snprintf(buf, sizeof(buf), "%d", value);
ExplainProperty(qlabel, buf, true, es);
}

/*
* Explain a long-integer-valued property.
*/
static void
ExplainPropertyLong(const char *qlabel, long value, ExplainState *es)
{
char buf[32];

snprintf(buf, sizeof(buf), "%ld", value);
ExplainProperty(qlabel, buf, true, es);
}

/*
* Explain a float-valued property, using the specified number of
* fractional digits.
*/
static void
ExplainPropertyFloat(const char *qlabel, double value, int ndigits,
ExplainState *es)
{
char buf[256];

snprintf(buf, sizeof(buf), "%.*f", ndigits, value);
ExplainProperty(qlabel, buf, true, es);
}

/*
* Open a group of related objects.
*
* objtype is the type of the group object, labelname is its label within
* a containing object (if any).
*
* If labeled is true, the group members will be labeled properties,
* while if it's false, they'll be unlabeled objects.
*/
static void
ExplainOpenGroup(const char *objtype, const char *labelname,
bool labeled, ExplainState *es)
{
switch (es->format)
{
case EXPLAIN_FORMAT_TEXT:
/* nothing to do */
break;

case EXPLAIN_FORMAT_XML:
ExplainXMLTag(objtype, X_OPENING, es);
break;

case EXPLAIN_FORMAT_JSON:
ExplainJSONLineEnding(es);
appendStringInfoSpaces(es->str, 2 * es->indent);
if (labelname)
{
escape_json(es->str, labelname);
appendStringInfoString(es->str, ": ");
}
appendStringInfoChar(es->str, labeled ? '{' : '[');

/*
* In JSON mode, the grouping_stack is an integer list. 0 means
* we've emitted nothing at this grouping level, 1 means we've
* emitted something (and so the next item needs a comma).
* See ExplainJSONLineEnding().
*/
es->grouping_stack = lcons_int(0, es->grouping_stack);
break;
}
es->indent++;
}

/*
* Close a group of related objects.
* Parameters must match the corresponding ExplainOpenGroup call.
*/
static void
ExplainCloseGroup(const char *objtype, const char *labelname,
bool labeled, ExplainState *es)
{
es->indent--;
switch (es->format)
{
case EXPLAIN_FORMAT_TEXT:
/* nothing to do */
break;

case EXPLAIN_FORMAT_XML:
ExplainXMLTag(objtype, X_CLOSING, es);
break;

case EXPLAIN_FORMAT_JSON:
appendStringInfoChar(es->str, '\n');
appendStringInfoSpaces(es->str, 2 * es->indent);
appendStringInfoChar(es->str, labeled ? '}' : ']');
es->grouping_stack = list_delete_first(es->grouping_stack);
break;
}
}

/*
* Emit opening or closing XML tag.
*
* "flags" must contain X_OPENING or X_CLOSING, which optionally can be
* OR'ed with X_NOWHITESPACE to suppress the whitespace we'd normally add.
*
* XML tag names can't contain white space, so we replace any spaces in
* "tagname" with dashes.
*/
static void
ExplainXMLTag(const char *tagname, int flags, ExplainState *es)
{
const char *s;

if ((flags & X_NOWHITESPACE) == 0)
appendStringInfoSpaces(es->str, 2 * es->indent);
appendStringInfoCharMacro(es->str, '<');
if ((flags & X_CLOSING) != 0)
appendStringInfoCharMacro(es->str, '/');
for (s = tagname; *s; s++)
appendStringInfoCharMacro(es->str, (*s == ' ') ? '-' : *s);
appendStringInfoCharMacro(es->str, '>');
if ((flags & X_NOWHITESPACE) == 0)
appendStringInfoCharMacro(es->str, '\n');
}

/*
* Emit a JSON line ending.
*
* JSON requires a comma after each property but the last. To facilitate this,
* in JSON mode, the text emitted for each property begins just prior to the
* preceding line-break (and comma, if applicable).
*/
static void
ExplainJSONLineEnding(ExplainState *es)
{
Assert(es->format == EXPLAIN_FORMAT_JSON);
if (linitial_int(es->grouping_stack) != 0)
appendStringInfoChar(es->str, ',');
else
linitial_int(es->grouping_stack) = 1;
appendStringInfoChar(es->str, '\n');
}

/*
* Produce a JSON string literal, properly escaping characters in the text.
*/
static void
escape_json(StringInfo buf, const char *str)
{
const char *p;

appendStringInfoCharMacro(buf, '\"');
for (p = str; *p; p++)
{
switch (*p)
{
case '\b':
appendStringInfoString(buf, "\\b");
break;
case '\f':
appendStringInfoString(buf, "\\f");
break;
case '\n':
appendStringInfoString(buf, "\\n");
break;
case '\r':
appendStringInfoString(buf, "\\r");
break;
case '\t':
appendStringInfoString(buf, "\\t");
break;
case '"':
appendStringInfoString(buf, "\\\"");
break;
case '\\':
appendStringInfoString(buf, "\\\\");
break;
default:
if ((unsigned char) *p < ' ')
appendStringInfo(buf, "\\u%04x", (int) *p);
else
appendStringInfoCharMacro(buf, *p);
break;
}
}
appendStringInfoCharMacro(buf, '\"');
}

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 01:00:12
Message-ID:	603c8f070908091800h2aaaaec7nf24d329cc3728214@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Aug 9, 2009 at 8:48 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>
>
> Andrew Dunstan wrote:
>>
>>
>> Andres Freund wrote:
>>>>
>>>> BTW, has anyone tried validating the XML at all? I just looked very
>>>> briefly at the patch at
>>>> <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
>>>> I noticed this which makes me suspicious:
>>>>
>>>> + if (es.format == EXPLAIN_FORMAT_XML)
>>>> + appendStringInfoString(es.str,
>>>> + "<explain xmlns=\"http://www.postgresql.org/2009/explain\"
>>>> <http://www.postgresql.org/2009/explain%5C%22>;>\n");
>>>>
>>>
>>> That bug is fixed - as referenced above I wrote a schema and validated
>>> it. So, yes, the generated XML was valid at least before the last round of
>>> refactoring. And I looked through the output quite a bit so I would
>>> surprised if there is such a breakage.
>>> then someone needs to update the commitfest page with the lastest patch.
>>> That's the link I followed.
>>>
>>>
>>>
>>
>> Hmm. I wonder how i got the link to an old version of the patch.
>>
>> Anyway, I'm glad it's fixed.
>
>
> I takle it back. It's still there at
> <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php> posted 3
> days ago.

What the hell? I have every version of that patch I've ever submitted
in ~/patch/explain-as-submitted, and that extra semicolon is not there
in any of them. Furthermore, when I open up the attachment from my
sent mail, the semicolon isn't there either. Yet I see it at the link
you provided just as clearly as you do. Is there a bug in the
archives code???

...Robert

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 01:00:22
Message-ID:	200908100300.23143.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Monday 10 August 2009 02:48:29 Andrew Dunstan wrote:
> Andrew Dunstan wrote:
> > Andres Freund wrote:
> >>> BTW, has anyone tried validating the XML at all? I just looked very
> >>> briefly at the patch at
> >>> <http://archives.postgresql.org/pgsql-hackers/2009-07/msg01944.php> and
> >>> I noticed this which makes me suspicious:
> >>>
> >>> + if (es.format == EXPLAIN_FORMAT_XML)
> >>> + appendStringInfoString(es.str,
> >>> + "<explain
> >>> xmlns=\"http://www.postgresql.org/2009/explain\"
> >>> <http://www.postgresql.org/2009/explain%5C%22>;>\n");
> >>
> >> That bug is fixed - as referenced above I wrote a schema and
> >> validated it. So, yes, the generated XML was valid at least before
> >> the last round of refactoring. And I looked through the output quite
> >> a bit so I would surprised if there is such a breakage.
> >> then someone needs to update the commitfest page with the lastest
> >> patch. That's the link I followed.
> >
> > Hmm. I wonder how i got the link to an old version of the patch.
> >
> > Anyway, I'm glad it's fixed.
>
> I takle it back. It's still there at
> <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php>
> posted 3 days ago.
That seems to be a failure of the mail interface? I neither see it when
opening the attachement manually nor in the git repository?

Andres

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 01:03:25
Message-ID:	3545.1249866205@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I takle it back. It's still there at
> <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php>
> posted 3 days ago.

Hmm, I think the archive website must be mangling that somehow.
What I have in the code I'm reviewing is

if (es.format == EXPLAIN_FORMAT_XML)
appendStringInfoString(es.str,
"<explain xmlns=\"http://www.postgresql.org/2009/explain\">\n");

I was planning to complain about the format of this URL --- shouldn't it
be more like http://www.postgresql.org/explain/v1 ? --- but there's
no semicolon.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 01:04:05
Message-ID:	603c8f070908091804j6d8825b5s7f77efa163983f1b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Aug 9, 2009 at 9:03 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> I takle it back. It's still there at
>> <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php>
>> posted 3 days ago.
>
> Hmm, I think the archive website must be mangling that somehow.
> What I have in the code I'm reviewing is
>
> if (es.format == EXPLAIN_FORMAT_XML)
> appendStringInfoString(es.str,
> "<explain xmlns=\"http://www.postgresql.org/2009/explain\">\n");
>
> I was planning to complain about the format of this URL --- shouldn't it
> be more like http://www.postgresql.org/explain/v1 ? --- but there's
> no semicolon.

Peter Eisentraut suggested it. Take it up with him because I don't care. :-)

...Robert

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 01:16:21
Message-ID:	603c8f070908091816kbdf6f84i585c127a47638d72@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Aug 9, 2009 at 8:53 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> On Monday 10 August 2009 01:21:35 Andrew Dunstan wrote:
>>> That ";" after the attribute is almost certainly wrong. This is a classic
>>> case of what I was talking about a month or two ago. Building up XML (or
>>> any structured doc, really, XML is not special in this regard) by ad hoc
>>> methods is horribly error prone. if you don't want to rely on libxml, then
>>> I think you need to develop a lightweight abstraction rather than just
>>> appending to a StringInfo.
>
>> While it would be possible to add another step inbetween and generate a format
>> neutral tree and generate the different formats out of it I am not sure that
>> this is worthwile.
>> The current text format will need to stay special cased anyway because its far
>> to inconsistent to generate it from anything abstract and I don't see any
>> completely new formats coming (i.e. not just optional parts)?
>
> The patch as-submitted does have a lightweight abstraction layer of
> sorts, but the main code line feels free to ignore that and hack around
> it. I agree that we can't avoid special-casing the text output, but
> I'm trying to improve the encapsulation otherwise. What I've got at the
> moment is attached. I'd be interested in comments on the grouping
> notion in particular --- I reverse-engineered that out of what the code
> was doing, and I'm sure it could use improvement.

I haven't tested it but it looks pretty good to me. I probably should
have done something like this to begin with, but I fell into the trap
of being too timid about introducing new concepts.

One subtle point that isn't documented and probably should be is that
JSON can't support a container that behaves partly like a list and
partly like a hash, as XML can. So for example in XML a <Plan> tag
could have children like <Startup-Cost> (one each) and could also have
its inner, outer, and sub-plans in there as <Plan> tags right under
the parent <Plan>. I'm not sure this would be good design anyway, but
it COULD be done. In JSON, this will crash and burn, because the
container is either an array (which precludes labelling the elements)
or a hash (which precludes duplicates).

I've avoided this problem by introducing an additional layer of
grouping whenever this situation comes up; for symmetry it exists in
both output formats. So for example in the above-mentioned case the
<Plan> has a child called <Plans> which in turn contains each <Plan>
that belongs under the parent; in JSON, this maps nicely to a property
called "Plans" which points to an array of hashes, each of which
represents a plan.

We should probably have a long comment somewhere in explain.c
explaining all of this. It's the sort of thing that you figure out
you need to have when you're writing the code, but it might not be too
obvious reading the code after the fact. It's also a reason why I'm
really glad I decided to write two alternative output formats;
otherwise, I fear that we would have blundered into a bunch of
XML-isms that wouldn't have been translatable into anything else.

...Robert

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 01:19:56
Message-ID:	200908100319.56863.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Monday 10 August 2009 02:53:16 Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On Monday 10 August 2009 01:21:35 Andrew Dunstan wrote:
> >> That ";" after the attribute is almost certainly wrong. This is a
> >> classic case of what I was talking about a month or two ago. Building up
> >> XML (or any structured doc, really, XML is not special in this regard)
> >> by ad hoc methods is horribly error prone. if you don't want to rely on
> >> libxml, then I think you need to develop a lightweight abstraction
> >> rather than just appending to a StringInfo.
> >
> > While it would be possible to add another step inbetween and generate a
> > format neutral tree and generate the different formats out of it I am not
> > sure that this is worthwile.
> > The current text format will need to stay special cased anyway because
> > its far to inconsistent to generate it from anything abstract and I don't
> > see any completely new formats coming (i.e. not just optional parts)?
>
> The patch as-submitted does have a lightweight abstraction layer of
> sorts, but the main code line feels free to ignore that and hack around
> it. I agree that we can't avoid special-casing the text output, but
> I'm trying to improve the encapsulation otherwise. What I've got at the
> moment is attached. I'd be interested in comments on the grouping
> notion in particular --- I reverse-engineered that out of what the code
> was doing, and I'm sure it could use improvement.
Adding the notion of opening a 'empty' Group together with X_OPENCLOSE or
handling of X_OPENING|X_CLOSING would allow to handle empty tags like in
ExplainOneUtility (<Notify />).

Otherwise it seems to look nice.

Andres

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 01:32:08
Message-ID:	6877.1249867928@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> Adding the notion of opening a 'empty' Group together with X_OPENCLOSE or
> handling of X_OPENING|X_CLOSING would allow to handle empty tags like in
> ExplainOneUtility (<Notify />).

Yeah, I was just wondering what to do with the <Notify /> code. I'm
not sure if it's worth trying to fold it into the OpenGroup/CloseGroup
code --- it might be easier to have an ExplainEmptyGroup or something
like that.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 01:34:36
Message-ID:	603c8f070908091834l78e08eb4m8f6d41479472eed2@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Sun, Aug 9, 2009 at 9:32 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
>> Adding the notion of opening a 'empty' Group together with X_OPENCLOSE or
>> handling of X_OPENING|X_CLOSING would allow to handle empty tags like in
>> ExplainOneUtility (<Notify />).
>
> Yeah, I was just wondering what to do with the <Notify /> code. I'm
> not sure if it's worth trying to fold it into the OpenGroup/CloseGroup
> code --- it might be easier to have an ExplainEmptyGroup or something
> like that.

Well there's no rule that says you can't emit it as

...Robert

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 01:43:22
Message-ID:	200908100343.23128.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Monday 10 August 2009 03:34:36 Robert Haas wrote:
> On Sun, Aug 9, 2009 at 9:32 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Andres Freund <andres(at)anarazel(dot)de> writes:
> >> Adding the notion of opening a 'empty' Group together with X_OPENCLOSE
> >> or handling of X_OPENING|X_CLOSING would allow to handle empty tags like
> >> in ExplainOneUtility (<Notify />).
> >
> > Yeah, I was just wondering what to do with the <Notify /> code. I'm
> > not sure if it's worth trying to fold it into the OpenGroup/CloseGroup
> > code --- it might be easier to have an ExplainEmptyGroup or something
> > like that.
>
> Well there's no rule that says you can't emit it as
>
> <Notify>
> </Notify>
That wont work nicely with json output.

Andres

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 01:47:59
Message-ID:	4A7F7C4F.2040305@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> One subtle point that isn't documented and probably should be is that
> JSON can't support a container that behaves partly like a list and
> partly like a hash, as XML can. So for example in XML a <Plan> tag
> could have children like <Startup-Cost> (one each) and could also have
> its inner, outer, and sub-plans in there as <Plan> tags right under
> the parent <Plan>. I'm not sure this would be good design anyway, but
> it COULD be done. In JSON, this will crash and burn, because the
> container is either an array (which precludes labelling the elements)
> or a hash (which precludes duplicates).
>
>
>

Right, this is fairly well known, I think. There are methods to map XML
to JSON, and it can even be done in such a way that you can make a
complete round trip, but in the schemes I've seen the JSON doesn't
really look like what you would use if you designed the JSON document
from scratch, or if it does then you're losing something.

cheers

andrew

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 02:02:47
Message-ID:	200908100402.47713.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Monday 10 August 2009 03:43:22 Andres Freund wrote:
> On Monday 10 August 2009 03:34:36 Robert Haas wrote:
> > On Sun, Aug 9, 2009 at 9:32 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > > Andres Freund <andres(at)anarazel(dot)de> writes:
> > >> Adding the notion of opening a 'empty' Group together with X_OPENCLOSE
> > >> or handling of X_OPENING|X_CLOSING would allow to handle empty tags
> > >> like in ExplainOneUtility (<Notify />).
> > >
> > > Yeah, I was just wondering what to do with the <Notify /> code. I'm
> > > not sure if it's worth trying to fold it into the OpenGroup/CloseGroup
> > > code --- it might be easier to have an ExplainEmptyGroup or something
> > > like that.
> >
> > Well there's no rule that says you can't emit it as
> > <Notify>
> > </Notify>
> That wont work nicely with json output.
I have not the slightest clue what I was thinking while typing that...

Andres

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 05:56:47
Message-ID:	15639.1249883807@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Committed after significant hacking to try to make the format
abstraction layer a tad more complete.

There are still some open issues:

* I still think we need a written spec for the non-text output formats.
One of the problems with machine reading of the text format is you have
to reverse-engineer what the possibilities are, and this patch hasn't
made that better. A list of the possible fields, and the possible
values for those fields that have finite domains, would be a start.

* There are some decisions that seem a bit questionable to me, like
using "Parent Relationship" tags rather than having the child plans
as labeled attributes of the parent node. But I can't really evaluate
this for lack of experience with designing XML/JSON structures.

* As already noted, the URL for the XML schema seems questionable.
I think that versioning should go more like v1, v2, ... instead of
being tied to a year.

* I complained earlier that I thought dumping expressions as text
was pretty bogus --- it'll leave anything that's trying to
do analysis in depth still having to parse complicated stuff.
I don't know exactly what I want instead, but at the very least it
seems like the variables used in an expression ought to be more
readily available.

Anyway, it's committed so that people can play with it. We're a
lot more likely to get feedback if people actually try to use the
feature.

regards, tom lane

From:	Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 06:09:40
Message-ID:	4A7FB9A4.207@kaltenbrunner.cc
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> I takle it back. It's still there at
>> <http://archives.postgresql.org/pgsql-hackers/2009-08/msg00485.php>
>> posted 3 days ago.
>
> Hmm, I think the archive website must be mangling that somehow.
> What I have in the code I'm reviewing is
>
> if (es.format == EXPLAIN_FORMAT_XML)
> appendStringInfoString(es.str,
> "<explain xmlns=\"http://www.postgresql.org/2009/explain\">\n");
>
> I was planning to complain about the format of this URL --- shouldn't it
> be more like http://www.postgresql.org/explain/v1 ? --- but there's
> no semicolon.

that url seems too general anyway - can we do something like
http://www.postgresql.org/schema/explain/v1 or
http://www.postgresql.org/xml/2009/explain/?

Stefan

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 11:08:40
Message-ID:	603c8f070908100408s578b31b5w879e2163acf916a5@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Aug 10, 2009 at 1:56 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Revised patch attached. I'm not convinced this is as good as it can
>> be, but I've been looking at this patch for so long that I'm starting
>> to get cross-eyed, and I'd like to Tom at least have a look at this
>> and assess it before we run out of CommitFest.
>
> Committed after significant hacking to try to make the format
> abstraction layer a tad more complete.

Looks nice, thank you.

> There are still some open issues:
>
> * I still think we need a written spec for the non-text output formats.
> One of the problems with machine reading of the text format is you have
> to reverse-engineer what the possibilities are, and this patch hasn't
> made that better. A list of the possible fields, and the possible
> values for those fields that have finite domains, would be a start.

Where would we put this in the documentation? Seems like it might
need a new section/chapter somewhere.

> * There are some decisions that seem a bit questionable to me, like
> using "Parent Relationship" tags rather than having the child plans
> as labeled attributes of the parent node. But I can't really evaluate
> this for lack of experience with designing XML/JSON structures.

That would work for XML, but I think it doesn't for JSON.

> * As already noted, the URL for the XML schema seems questionable.
> I think that versioning should go more like v1, v2, ... instead of
> being tied to a year.

Or what about being based on the major PostgreSQL major version?
Would it be lame to think about something like
http://www.postgresql.org/docs/8.5/static/sql-explain.html ?

> * I complained earlier that I thought dumping expressions as text
> was pretty bogus --- it'll leave anything that's trying to
> do analysis in depth still having to parse complicated stuff.
> I don't know exactly what I want instead, but at the very least it
> seems like the variables used in an expression ought to be more
> readily available.
>
> Anyway, it's committed so that people can play with it. We're a
> lot more likely to get feedback if people actually try to use the
> feature.

Awesome.

...Robert

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 12:39:22
Message-ID:	4A8014FA.10507@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund wrote:
> I produced/mailed a relaxng version for a a bit older version and I plan to
> refresh and document it once the format seems suitably stable. I am not sure
> it is yet. If yes, this should not take that long...
> (Relaxng because you easily can convert it into most other XML schema
> description languages)
>
>

I don't mind doing both, but I think one should be authoritative, and
whatever the relative technical merits are (please, let's not debate
that here) the fact after quite some years is that XML Schemas have much
more traction. See the thread that starts at
<http://lists.xml.org/archives/xml-dev/200804/msg00058.html>. For
example, Xerces-J supports XML Schemas natively.

Anyway, now it's committed I will be having a play with it.

cheers

andrew

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 12:42:43
Message-ID:	200908101442.44971.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Monday 10 August 2009 14:39:22 Andrew Dunstan wrote:
> Andres Freund wrote:
> > I produced/mailed a relaxng version for a a bit older version and I plan
> > to refresh and document it once the format seems suitably stable. I am
> > not sure it is yet. If yes, this should not take that long...
> > (Relaxng because you easily can convert it into most other XML schema
> > description languages)
> I don't mind doing both, but I think one should be authoritative, and
> whatever the relative technical merits are (please, let's not debate
> that here) the fact after quite some years is that XML Schemas have much
> more traction. See the thread that starts at
> <http://lists.xml.org/archives/xml-dev/200804/msg00058.html>. For
> example, Xerces-J supports XML Schemas natively.
I don't really mind which format gets choosen - I just had relaxng one
already done.
Do you plan to write a XML-Schema Schema? Just to avoid duplicated work...

Andres

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 13:00:45
Message-ID:	4A8019FD.3040302@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund wrote:
> On Monday 10 August 2009 14:39:22 Andrew Dunstan wrote:
>
>> Andres Freund wrote:
>>
>>> I produced/mailed a relaxng version for a a bit older version and I plan
>>> to refresh and document it once the format seems suitably stable. I am
>>> not sure it is yet. If yes, this should not take that long...
>>> (Relaxng because you easily can convert it into most other XML schema
>>> description languages)
>>>
>> I don't mind doing both, but I think one should be authoritative, and
>> whatever the relative technical merits are (please, let's not debate
>> that here) the fact after quite some years is that XML Schemas have much
>> more traction. See the thread that starts at
>> <http://lists.xml.org/archives/xml-dev/200804/msg00058.html>. For
>> example, Xerces-J supports XML Schemas natively.
>>
> I don't really mind which format gets choosen - I just had relaxng one
> already done.
> Do you plan to write a XML-Schema Schema? Just to avoid duplicated work...
>
>
>

I'll see what I can do.

cheers

andrew

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 14:54:16
Message-ID:	1971.1249916056@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 10, 2009 at 1:56 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> There are still some open issues:
>>
>> * I still think we need a written spec for the non-text output formats.

> Where would we put this in the documentation? Seems like it might
> need a new section/chapter somewhere.

I think it'd be sufficient to put it on the EXPLAIN reference page.
IIRC, the COPY data format is documented on COPY's reference page,
and this is equally particular to a single SQL command.

>> * There are some decisions that seem a bit questionable to me, like
>> using "Parent Relationship" tags rather than having the child plans
>> as labeled attributes of the parent node. But I can't really evaluate
>> this for lack of experience with designing XML/JSON structures.

> That would work for XML, but I think it doesn't for JSON.

Why not? Something like

"Inner": { ... }

fits in JSON AFAICS. I don't know if there are any recognized style
guidelines in the structured-document world that would approve or
deprecate the way you've done it. I do see the advantage that code
can visit all the subplans of a plan without knowing much about the
plan tree structure. What's bothering me the most is that member
subplans of an Append are mushed together with other child plan types.
The ordering of the members is significant. Now the chosen
representation does preserve that order, but ISTM mushing all the
child plan types together like this makes it *look* like the Plans
attribute is unordered; especially when the ordering is in fact
not significant for every other child plan type.

>> * As already noted, the URL for the XML schema seems questionable.
>> I think that versioning should go more like v1, v2, ... instead of
>> being tied to a year.

> Or what about being based on the major PostgreSQL major version?
> Would it be lame to think about something like
> http://www.postgresql.org/docs/8.5/static/sql-explain.html ?

Yeah. In the first place, I imagine the schema will change a few times
before 8.5 is released. In the second, once we do get it right it'll
probably be stable across multiple releases. I think we should just
have a serial number on them, and not assume that either years or
releases are appropriate quantifiers.

>> * I complained earlier that I thought dumping expressions as text
>> was pretty bogus --- it'll leave anything that's trying to
>> do analysis in depth still having to parse complicated stuff.
>> I don't know exactly what I want instead, but at the very least it
>> seems like the variables used in an expression ought to be more
>> readily available.

On this point: basically what's bothering me is that by dumping
expressions as undifferentiated text blobs, we are making the same
mistake in small that this patch is meant to solve in large.
I don't really want to do the work of decomposing expressions right
now, but I'm not happy that it's impossible to do so without a
non-backwards-compatible DTD break. What do you think of emitting
expressions as containers, with the text as the only property?
Instead of

something like

This would leave room to add additional properties beside the text,
and not break existing clients when we do it.

Another issue that bothers me a bit is the "Strategy" field.
It may or may not appear depending on "Node Type", and when it
does appear, the possible values vary depending on "Node Type".
Is that sort of non-orthogonality considered good style?

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 15:59:28
Message-ID:	603c8f070908100859ha000d35jd8b80c230f66942f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Aug 10, 2009 at 10:54 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Aug 10, 2009 at 1:56 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> There are still some open issues:
>>>
>>> * I still think we need a written spec for the non-text output formats.
>
>> Where would we put this in the documentation? Seems like it might
>> need a new section/chapter somewhere.
>
> I think it'd be sufficient to put it on the EXPLAIN reference page.
> IIRC, the COPY data format is documented on COPY's reference page,
> and this is equally particular to a single SQL command.

OK, I was just worried it might be long.

>>> * There are some decisions that seem a bit questionable to me, like
>>> using "Parent Relationship" tags rather than having the child plans
>>> as labeled attributes of the parent node. But I can't really evaluate
>>> this for lack of experience with designing XML/JSON structures.
>
>> That would work for XML, but I think it doesn't for JSON.
>
> Why not? Something like
>
> "Inner": { ... }
>
> fits in JSON AFAICS. I don't know if there are any recognized style
> guidelines in the structured-document world that would approve or
> deprecate the way you've done it. I do see the advantage that code
> can visit all the subplans of a plan without knowing much about the
> plan tree structure. What's bothering me the most is that member
> subplans of an Append are mushed together with other child plan types.
> The ordering of the members is significant. Now the chosen
> representation does preserve that order, but ISTM mushing all the
> child plan types together like this makes it *look* like the Plans
> attribute is unordered; especially when the ordering is in fact
> not significant for every other child plan type.

Oh, I see what you mean. Yeah, we could do that, and I thought about
it. I decided on this, because it would potentially let you recurse
down the tree of nodes without having to handle every kind of
sub-plan-node separately.

>>> * As already noted, the URL for the XML schema seems questionable.
>>> I think that versioning should go more like v1, v2, ... instead of
>>> being tied to a year.
>
>> Or what about being based on the major PostgreSQL major version?
>> Would it be lame to think about something like
>> http://www.postgresql.org/docs/8.5/static/sql-explain.html ?
>
> Yeah. In the first place, I imagine the schema will change a few times
> before 8.5 is released. In the second, once we do get it right it'll
> probably be stable across multiple releases. I think we should just
> have a serial number on them, and not assume that either years or
> releases are appropriate quantifiers.

That's fine then. I'm easy; the schema URL is the least interesting
part of this IMO.

>>> * I complained earlier that I thought dumping expressions as text
>>> was pretty bogus --- it'll leave anything that's trying to
>>> do analysis in depth still having to parse complicated stuff.
>>> I don't know exactly what I want instead, but at the very least it
>>> seems like the variables used in an expression ought to be more
>>> readily available.
>
> On this point: basically what's bothering me is that by dumping
> expressions as undifferentiated text blobs, we are making the same
> mistake in small that this patch is meant to solve in large.
> I don't really want to do the work of decomposing expressions right
> now, but I'm not happy that it's impossible to do so without a
> non-backwards-compatible DTD break. What do you think of emitting
> expressions as containers, with the text as the only property?
> Instead of
>
> <Filter>(f1 > 0)</Filter>
>
> something like
>
> <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter>
>
> This would leave room to add additional properties beside the text,
> and not break existing clients when we do it.

Well, there you seem to be adding TWO containers, which is probably
overkill. I could see adding one.

> Another issue that bothers me a bit is the "Strategy" field.
> It may or may not appear depending on "Node Type", and when it
> does appear, the possible values vary depending on "Node Type".
> Is that sort of non-orthogonality considered good style?

It is in the land of Robert, but someone else might prefer something
different. I'm not attached to doing it this way, but as a guy who
does a lot of database work I tend to prefer to avoid having a
multiple, distinct fields for similar information. It makes it had to
read the SELECT * FROM table output in my 80-character terminal
window. :-)

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 16:13:15
Message-ID:	11607.1249920795@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 10, 2009 at 10:54 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter>
>>
>> This would leave room to add additional properties beside the text,
>> and not break existing clients when we do it.

> Well, there you seem to be adding TWO containers, which is probably
> overkill. I could see adding one.

Uh, no, I see one container and a property. If we do just

then where do we put additional information about the expression
when the time comes?

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 16:28:05
Message-ID:	20090810162805.GB4796@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas escribió:

> What the hell? I have every version of that patch I've ever submitted
> in ~/patch/explain-as-submitted, and that extra semicolon is not there
> in any of them. Furthermore, when I open up the attachment from my
> sent mail, the semicolon isn't there either. Yet I see it at the link
> you provided just as clearly as you do. Is there a bug in the
> archives code???

Hmm, wow, interesting. The mbox from which the archives page is created
does _not_ have a semicolon there. A(nother) Mhonarc bug perhaps?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 16:39:00
Message-ID:	603c8f070908100939i7ef9f4aet4a92c5930f8bf5f5@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Aug 10, 2009 at 12:13 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Aug 10, 2009 at 10:54 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter>
>>>
>>> This would leave room to add additional properties beside the text,
>>> and not break existing clients when we do it.
>
>> Well, there you seem to be adding TWO containers, which is probably
>> overkill. I could see adding one.
>
> Uh, no, I see one container and a property. If we do just
>
> <Filter><Expr>(f1 > 0)</Expr></Filter>
>
> then where do we put additional information about the expression
> when the time comes?

I would assume you would just write:

<Filter><Text>(f1 > 0)</Text><Other-Stuff>thing!</Other-Stuff></Filter>

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 16:47:37
Message-ID:	12204.1249922857@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 10, 2009 at 12:13 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Uh, no, I see one container and a property. If we do just
>>
>> <Filter><Expr>(f1 > 0)</Expr></Filter>
>>
>> then where do we put additional information about the expression
>> when the time comes?

> I would assume you would just write:

> <Filter><Text>(f1 > 0)</Text><Other-Stuff>thing!</Other-Stuff></Filter>

Perhaps the issue would be clearer in JSON notation. We have

"Filter": "(f1 > 0)"

What I suggest is

"Filter": { "Text": "(f1 > 0)" }

I don't see where you're going to shoehorn in any additional information
without the container, and once you have the container you need to name
the property, no?

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 17:33:06
Message-ID:	603c8f070908101033p175bee9cmbc109a2055ffafd2@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Aug 10, 2009 at 12:47 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Aug 10, 2009 at 12:13 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Uh, no, I see one container and a property. If we do just
>>>
>>> <Filter><Expr>(f1 > 0)</Expr></Filter>
>>>
>>> then where do we put additional information about the expression
>>> when the time comes?
>
>> I would assume you would just write:
>
>> <Filter><Text>(f1 > 0)</Text><Other-Stuff>thing!</Other-Stuff></Filter>
>
> Perhaps the issue would be clearer in JSON notation. We have
>
> "Filter": "(f1 > 0)"
>
> What I suggest is
>
> "Filter": { "Text": "(f1 > 0)" }
>
> I don't see where you're going to shoehorn in any additional information
> without the container, and once you have the container you need to name
> the property, no?

I agree. The JSON looks perfect to me.

I may be thick as a post here and say "oh, I'm a moron" when you
explain this to me, but I still don't understand why that would
require the XML notation to interpose an intermediate node. Why can't
"filter" node itself can be the labelled container?

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 17:45:45
Message-ID:	19479.1249926345@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I may be thick as a post here and say "oh, I'm a moron" when you
> explain this to me, but I still don't understand why that would
> require the XML notation to interpose an intermediate node. Why can't
> "filter" node itself can be the labelled container?

Filter isn't a node; it's a property of the containing Plan node.
The way we have this set up, there's a distinction between properties
and groups, which AFAICS we have to have in order to have directly
comparable structures in XML and JSON. Didn't you design this
yourself?

(I think part of the issue is that containers in JSON are anonymous
whereas XML wants to assign them a named type. That's fine with me,
in fact the JSON approach looks rather impoverished.)

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 20:28:01
Message-ID:	603c8f070908101328n37c37a44nebe5a7c3412ad42a@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Aug 10, 2009 at 1:45 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I may be thick as a post here and say "oh, I'm a moron" when you
>> explain this to me, but I still don't understand why that would
>> require the XML notation to interpose an intermediate node. Why can't
>> "filter" node itself can be the labelled container?
>
> Filter isn't a node; it's a property of the containing Plan node.

My use of the word node was poorly chosen, since that word has a
specific meaning in the context of PG.

> The way we have this set up, there's a distinction between properties
> and groups, which AFAICS we have to have in order to have directly
> comparable structures in XML and JSON. Didn't you design this
> yourself?

Yes, I did. But the point is that as far as I can see, the following
two things are equivalent:

<Filter><Text>(f1 > 0)</Text></Filter>
"Filter": { "Text": "(f1 > 0)" }

And this is not:

The latter would be equivalent to something like this in JSON:

"Filter" : { "Expr" : { "Text: "(f1 > 0)" } }

or if you intended the <Expr> thing to be an array-type container,
then it would be equivalent to this:

"Filter" : { [ { "Text: "(f1 > 0)" } ] }

Would it be helpful for me to try to reduce this to code?

> (I think part of the issue is that containers in JSON are anonymous
> whereas XML wants to assign them a named type. That's fine with me,
> in fact the JSON approach looks rather impoverished.)

That does make things a little tricky, though it has the virtue of
mapping nicely onto data structures other than XML.

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-10 23:57:10
Message-ID:	27430.1249948630@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 10, 2009 at 1:45 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> The way we have this set up, there's a distinction between properties
>> and groups, which AFAICS we have to have in order to have directly
>> comparable structures in XML and JSON. Didn't you design this
>> yourself?

> Yes, I did. But the point is that as far as I can see, the following
> two things are equivalent:

> <Filter><Text>(f1 > 0)</Text></Filter>
> "Filter": { "Text": "(f1 > 0)" }

> And this is not:

> <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter>

Well, in that case we need to redesign the "grouping" abstraction in
explain.c, because it's set up to treat a JSON hash as equivalent to
a parent node type (or whatever you want to call it) in XML. But
I'm not sure I want to do it. It appears to me that what you want
to do is dumb down the XML representation so it's just as impoverished
as the JSON one; to wit that there won't be any abstract concept of
an "Expr" node, and all client programs will have to implicitly know
that, say, Filter is an instance of an Expr and therefore can be
expected to have thus-and-such fields inside it. This does not seem
like an improvement. It puzzles me that you are pushing to make
expression containers anonymous and unrecognizable when you wanted the
opposite for child plans.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-11 03:50:33
Message-ID:	603c8f070908102050u77816be1k1b3bab237b8cad3e@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Aug 10, 2009 at 7:57 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Aug 10, 2009 at 1:45 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The way we have this set up, there's a distinction between properties
>>> and groups, which AFAICS we have to have in order to have directly
>>> comparable structures in XML and JSON. Didn't you design this
>>> yourself?
>
>> Yes, I did. But the point is that as far as I can see, the following
>> two things are equivalent:
>
>> <Filter><Text>(f1 > 0)</Text></Filter>
>> "Filter": { "Text": "(f1 > 0)" }
>
>> And this is not:
>
>> <Filter><Expr><Text>(f1 > 0)</Text></Expr></Filter>
>
> Well, in that case we need to redesign the "grouping" abstraction in
> explain.c, because it's set up to treat a JSON hash as equivalent to
> a parent node type (or whatever you want to call it) in XML. But
> I'm not sure I want to do it. It appears to me that what you want
> to do is dumb down the XML representation so it's just as impoverished
> as the JSON one; to wit that there won't be any abstract concept of
> an "Expr" node, and all client programs will have to implicitly know
> that, say, Filter is an instance of an Expr and therefore can be
> expected to have thus-and-such fields inside it. This does not seem
> like an improvement. It puzzles me that you are pushing to make
> expression containers anonymous and unrecognizable when you wanted the
> opposite for child plans.

I think I might be starting to understand what you're getting at here.
Let me check: I think what you're saying is that the Expr node is
potentially useful to clients for identifying where in the tree the
Exprs are, even without specific knowledge about the different types
of filter nodes, much as the Plans node is useful for finding all of
the subplan objects.

If that is correct, then let me make two comments. First, I don't
think it should be a goal of this format to be self-documenting. To
the extent that it can be so without mucking up the format, great.
But self-documenting formats are often a recipe for massive bloat, and
in my experience they tend to create more problems than they solve. I
think keeping it lean is the way to go.

But, second, there might be a way that we can have our cake and eat it
too. Instead of decorating each node with properties Filter,
Join-Filter, One-Time-Filter, and so on, what we could do is decorate
each node with a single property called Filters, which would be a list
of hashes:

'Filters' : [ { 'Filter-Type' => 'Join', 'Text-Expr' => 'a.foo < b.bar' }]

Would that address your concerns, or am I still way off base here?

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-11 16:11:49
Message-ID:	10337.1250007109@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I think I might be starting to understand what you're getting at here.
> Let me check: I think what you're saying is that the Expr node is
> potentially useful to clients for identifying where in the tree the
> Exprs are, even without specific knowledge about the different types
> of filter nodes, much as the Plans node is useful for finding all of
> the subplan objects.

Well, my point is that there are a bunch of different places that
expressions can appear in a plan tree --- filters, join filters, sort
keys, hash clauses, merge clauses, index clauses, output expressions,
and I'm not sure I remembered them all, and I fully expect that some
more might appear in future. I don't think it's appropriate to require
clients to keep track of exactly which tree properties are expressions,
especially not when we can easily label them.

> If that is correct, then let me make two comments. First, I don't
> think it should be a goal of this format to be self-documenting.

It isn't. But I think it should be a goal that a client can, say,
find all the references to a given variable without a huge amount of
hard-wired knowledge about specific properties of specific node types.
So it should be able to find all the expressions with a relatively
dumb search of the tree (eg, XPath?).

> But, second, there might be a way that we can have our cake and eat it
> too. Instead of decorating each node with properties Filter,
> Join-Filter, One-Time-Filter, and so on, what we could do is decorate
> each node with a single property called Filters, which would be a list
> of hashes:

> 'Filters' : [ { 'Filter-Type' => 'Join', 'Text-Expr' => 'a.foo < b.bar' }]

I still say that's a bad idea for child Plans; and it's a worse one for
expressions. The point about retaining knowledge of child order is
absolutely critical for many of the places where exprs appear, such as
sort keys and output columns. You'd have to seriously uglify the
notation in order to do it like this.

One way that we could handle this, I guess, is to embed more information
in the property names --- eg use "Expr Text" where I just had "Text".
But it's not apparent to me why that should be considered better style
than the other way.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-11 17:11:47
Message-ID:	603c8f070908111011u5816cb9ega9b4b6f648543b64@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Aug 11, 2009 at 12:11 PM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I think I might be starting to understand what you're getting at here.
>> Let me check: I think what you're saying is that the Expr node is
>> potentially useful to clients for identifying where in the tree the
>> Exprs are, even without specific knowledge about the different types
>> of filter nodes, much as the Plans node is useful for finding all of
>> the subplan objects.
>
> Well, my point is that there are a bunch of different places that
> expressions can appear in a plan tree --- filters, join filters, sort
> keys, hash clauses, merge clauses, index clauses, output expressions,
> and I'm not sure I remembered them all, and I fully expect that some
> more might appear in future. I don't think it's appropriate to require
> clients to keep track of exactly which tree properties are expressions,
> especially not when we can easily label them.
>
>> If that is correct, then let me make two comments. First, I don't
>> think it should be a goal of this format to be self-documenting.
>
> It isn't. But I think it should be a goal that a client can, say,
> find all the references to a given variable without a huge amount of
> hard-wired knowledge about specific properties of specific node types.
> So it should be able to find all the expressions with a relatively
> dumb search of the tree (eg, XPath?).
>
>> But, second, there might be a way that we can have our cake and eat it
>> too. Instead of decorating each node with properties Filter,
>> Join-Filter, One-Time-Filter, and so on, what we could do is decorate
>> each node with a single property called Filters, which would be a list
>> of hashes:
>
>> 'Filters' : [ { 'Filter-Type' => 'Join', 'Text-Expr' => 'a.foo < b.bar' }]
>
> I still say that's a bad idea for child Plans; and it's a worse one for
> expressions. The point about retaining knowledge of child order is
> absolutely critical for many of the places where exprs appear, such as
> sort keys and output columns. You'd have to seriously uglify the
> notation in order to do it like this.
>
> One way that we could handle this, I guess, is to embed more information
> in the property names --- eg use "Expr Text" where I just had "Text".
> But it's not apparent to me why that should be considered better style
> than the other way.

Unfortunately, I have to admit to total confusion. The idea in the
last paragraph seems reasonable to me, but since I don't understand
the other alternative, I can't say whether it's better or worse. I
wonder if we would be better off waiting for feedback from actual tool
authors. I have no illusions that the format is perfect...

...Robert

From:	Mike <ipso(at)snappymail(dot)ca>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-11 18:17:15
Message-ID:	20090811111715.547270e5@ipso.snappymail.ca
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 11 Aug 2009 13:11:47 -0400
Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> Unfortunately, I have to admit to total confusion. The idea in the
> last paragraph seems reasonable to me, but since I don't understand
> the other alternative, I can't say whether it's better or worse. I
> wonder if we would be better off waiting for feedback from actual tool
> authors. I have no illusions that the format is perfect...

Have any tool authors stepped up and committed resources to utilizing
this feature in the near term?

This is one of the features I have been most interested in, and I
believe it has the potential to greatly improve PostgreSQL's
"reputation" of being hard to optimize if used correctly.

I envision a web site where users can paste the XML explain
output and have it return some pretty specific information and fancy
graphs describing exactly whats going on with their query and possible
ideas on how to further optimize the query itself and even PostgreSQL
configuration settings.

The reason I would like to provide this tool in a web-based form is
that no additional software installation would be necessary for the
user, reducing any hurdles to using it to zero.

I'm guessing that my vision likely exceeds the scope of this feature in
its initial form at least, but assuming no one else has stepped up, I'm
more than willing to start committing resources as early as this
weekend with the understanding that this feature is still in
development and likely will change several times before or if its
finally committed for 8.5.

--
Mike

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Mike <ipso(at)snappymail(dot)ca>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-11 18:30:27
Message-ID:	24537.1250015427@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Mike <ipso(at)snappymail(dot)ca> writes:
> Have any tool authors stepped up and committed resources to utilizing
> this feature in the near term?

I don't think anyone's promised much. If you want to have a go at using
it, we'd be very happy.

> I'm guessing that my vision likely exceeds the scope of this feature in
> its initial form at least, but assuming no one else has stepped up, I'm
> more than willing to start committing resources as early as this
> weekend with the understanding that this feature is still in
> development and likely will change several times before or if its
> finally committed for 8.5.

It's definitely committed for 8.5, but the exact format of the output
is (obviously) still subject to change.

regards, tom lane

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Mike <ipso(at)snappymail(dot)ca>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-11 19:59:48
Message-ID:	4A81CDB4.5070405@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane wrote:
> Mike <ipso(at)snappymail(dot)ca> writes:
>
>> Have any tool authors stepped up and committed resources to utilizing
>> this feature in the near term?
>>
>
> I don't think anyone's promised much. If you want to have a go at using
> it, we'd be very happy.
>
>
>> I'm guessing that my vision likely exceeds the scope of this feature in
>> its initial form at least, but assuming no one else has stepped up, I'm
>> more than willing to start committing resources as early as this
>> weekend with the understanding that this feature is still in
>> development and likely will change several times before or if its
>> finally committed for 8.5.
>>
>
> It's definitely committed for 8.5, but the exact format of the output
> is (obviously) still subject to change.
>
>
>

Good. I had a look at this for a little while yesterday. I built it, did
an install, loaded auto_explain and then ran the regression tests. I
didn't like the output much. It looks like the XML has been dumbed down
to fit a data model that works for JSON as well, particularly in the
lack of use of attributes. An XML processor won't care that much, but
humans will certainly find it more tiresome to read. In effect we are
swapping horizontal expansion for vertical expansion. It would be nicer
to be able to fit a plan into a screen.

I also took the last relaxng spec I could find and used a nice little
tool called Trang to translate it into an XML Schema spec. The good news
is that that worked. The bad news is that the spec almost certainly
needs some tightening, especially around those elements that probably
should be XML attributes.

cheers

andrew

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mike <ipso(at)snappymail(dot)ca>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-11 20:39:40
Message-ID:	200908112239.41064.andres@anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tuesday 11 August 2009 21:59:48 Andrew Dunstan wrote:
> Tom Lane wrote:
> > Mike <ipso(at)snappymail(dot)ca> writes:
> >> Have any tool authors stepped up and committed resources to utilizing
> >> this feature in the near term?
> >
> > I don't think anyone's promised much. If you want to have a go at using
> > it, we'd be very happy.
> >
> >> I'm guessing that my vision likely exceeds the scope of this feature in
> >> its initial form at least, but assuming no one else has stepped up, I'm
> >> more than willing to start committing resources as early as this
> >> weekend with the understanding that this feature is still in
> >> development and likely will change several times before or if its
> >> finally committed for 8.5.
> >
> > It's definitely committed for 8.5, but the exact format of the output
> > is (obviously) still subject to change.
> Good. I had a look at this for a little while yesterday. I built it, did
> an install, loaded auto_explain and then ran the regression tests. I
> didn't like the output much. It looks like the XML has been dumbed down
> to fit a data model that works for JSON as well, particularly in the
> lack of use of attributes. An XML processor won't care that much, but
> humans will certainly find it more tiresome to read. In effect we are
> swapping horizontal expansion for vertical expansion. It would be nicer
> to be able to fit a plan into a screen.
The problem is that nobody yet came up with one that is easy to generate and
liked by many people...
Aesthetically I do not like the current schema as well, but personally I don't
think it matters that much.

Everything fancy has the problem of requiring relatively much code...

> I also took the last relaxng spec I could find and used a nice little
> tool called Trang to translate it into an XML Schema spec. The good news
> is that that worked. The bad news is that the spec almost certainly
> needs some tightening, especially around those elements that probably
> should be XML attributes.
Unrelated to the other issues the relaxng schema has some missing pieces if I
remember it correctly (constraint => constraint-name, trigger => trigger-name)
I think). It is also by far not strict enough yet...

Andres

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-11 21:38:33
Message-ID:	603c8f070908111438x2d801be1t8056256feda7285b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Aug 11, 2009 at 3:59 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
> Good. I had a look at this for a little while yesterday. I built it, did an
> install, loaded auto_explain and then ran the regression tests. I didn't
> like the output much. It looks like the XML has been dumbed down to fit a
> data model that works for JSON as well, particularly in the lack of use of
> attributes. An XML processor won't care that much, but humans will certainly
> find it more tiresome to read. In effect we are swapping horizontal
> expansion for vertical expansion. It would be nicer to be able to fit a plan
> into a screen.

Isn't that what text format is for?

Not that the XML format is perfect; I haven't heard one person say
that they like anything about it except that it is already
implemented.

...Robert

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Andres Freund" <andres(at)anarazel(dot)de>, <pgsql-hackers(at)postgresql(dot)org>,"Mike" <ipso(at)snappymail(dot)ca>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-11 21:54:39
Message-ID:	4A81A24F02000025000298F8@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:

>> find it more tiresome to read. In effect we are swapping horizontal
>> expansion for vertical expansion. It would be nicer to be able to
>> fit a plan into a screen.
>
> Isn't that what text format is for?

In my experience XML which represents anything of any real complexity
is tiresome to read, period. I would worry more about it accurately
representing the data and being easy to massage with software. That's
not an opinion on any particular choice here, just on what should
matter in making a choice.

-Kevin

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-11 21:58:34
Message-ID:	4A81E98A.1000705@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas wrote:
> On Tue, Aug 11, 2009 at 3:59 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>
>> Good. I had a look at this for a little while yesterday. I built it, did an
>> install, loaded auto_explain and then ran the regression tests. I didn't
>> like the output much. It looks like the XML has been dumbed down to fit a
>> data model that works for JSON as well, particularly in the lack of use of
>> attributes. An XML processor won't care that much, but humans will certainly
>> find it more tiresome to read. In effect we are swapping horizontal
>> expansion for vertical expansion. It would be nicer to be able to fit a plan
>> into a screen.
>>
>
> Isn't that what text format is for?
>
> Not that the XML format is perfect; I haven't heard one person say
> that they like anything about it except that it is already
> implemented.
>
>
>

Well, I don't think that the fact that we are producing machine-readable
output means we can just ignore the human side of it. It is more than
likely that such output will be read by both machines and humans.
Obviously, we need to make sure that it meets machine processing needs
first, but once we have done that we should not ignore the human
requirements.

BTW, I think it's great that we have machine-readable formats. This
opens the door to a lot of tools. Well done.

No doubt there will be disagreements about the formats. Our community
certainly has a tendency to argue over matters of appearance, sometimes
to the point of silliness. My understanding was that the idea of getting
this in now was to let people have a look, and play to some extent.

cheers

andrew

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 08:22:31
Message-ID:	1250065351.26985.50.camel@pcd12478
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 2009-08-11 at 23:58 +0200, Andrew Dunstan wrote:
> Well, I don't think that the fact that we are producing machine-readable
> output means we can just ignore the human side of it. It is more than
> likely that such output will be read by both machines and humans.
> Obviously, we need to make sure that it meets machine processing needs
> first, but once we have done that we should not ignore the human
> requirements.

XML is easy to transform to about anything else. I would vote for the
XML output to only focus on machine readability and completeness, and
then if needed provide style sheets to make it human readable. Then
anybody could have his preferred style to look at it. If there would be
a tool to format the XML according to some style sheet (to make it easy
for those who don't care about XML and style sheets), I would volunteer
to provide the XSL style sheets for different formats on request.

Cheers,
Csaba.

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 13:42:00
Message-ID:	4A82C6A8.7060601@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Csaba Nagy wrote:
> On Tue, 2009-08-11 at 23:58 +0200, Andrew Dunstan wrote:
>
>> Well, I don't think that the fact that we are producing machine-readable
>> output means we can just ignore the human side of it. It is more than
>> likely that such output will be read by both machines and humans.
>> Obviously, we need to make sure that it meets machine processing needs
>> first, but once we have done that we should not ignore the human
>> requirements.
>>
>
> XML is easy to transform to about anything else. I would vote for the
> XML output to only focus on machine readability and completeness, and
> then if needed provide style sheets to make it human readable. Then
> anybody could have his preferred style to look at it. If there would be
> a tool to format the XML according to some style sheet (to make it easy
> for those who don't care about XML and style sheets), I would volunteer
> to provide the XSL style sheets for different formats on request.
>
>
>

Have you actually looked at a logfile with this in it? A simple
stylesheet won't do at all. What you get is not an XML document but a
text document with little bits of XML embedded in it. So you would need
a program to parse that file and either turn it into a single legal XML
document or pass each piece of XML individually to your XSLT processor.
Bleah.

And all this because you pose a false dichotomy between correctness and
completeness on one hand and human readability on the other. I don't
accept that at all. I think we can and should improve human readability
without sacrificing anything on the correctness and completeness front.
In fact, that also needs improving, and we can do them both at the same
time.

I want to be able to have machine readable explain output, but I also
want to be able to browse the logs without wasting more brain cells than
necessary and without having to use external tools other than by
standard text browser (less). As Pooh Bear said, "Both please!"

One thing I have noticed that we should talk about is that the explain
XML output doesn't contain the query that is being explained. That's
unfortunate - it means that any logfile processor will need to extract
the statement from the surrounding text rather than just pulling out the
XML and passing it to an XML processor.

Another design issue is this: The root node of an XML document is
ideally a distinguished element that can't occur within itself.
auto-explain doesn't seem to be doing this. It outputs fragments that
have <Plan> as the root element rather than <explain>

Here's a tiny fragment of my logfile with auto_explain.log_min_duration
= 0 and auto_explain.log_format = 'xml' settings:

STATEMENT: SELECT 1 AS one;
LOG: duration: 0.008 ms plan:
<Plan>
<Node-Type>Result</Node-Type>
<Startup-Cost>0.00</Startup-Cost>
<Total-Cost>0.01</Total-Cost>
<Plan-Rows>1</Plan-Rows>
<Plan-Width>0</Plan-Width>
</Plan>

But a <Plan> node can appear as the decendant of a <Plan> node, so this
violates the design principle I enunciated above.

(Anecdote: The very first use I even made of Postgres was at the end of
2002, for an Application I designed called "30 minute App" which let
people design online and generate a questionnaire application for small
devices (Palm, Blackberry, PocketPC) and stored the application metadata
as a piece of XML in a text field in Postgres.)

cheers

andrew

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 13:55:25
Message-ID:	603c8f070908120655t47be0eb1vbb9c1416f99f18df@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 12, 2009 at 9:42 AM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
> Csaba Nagy wrote:
>>
>> On Tue, 2009-08-11 at 23:58 +0200, Andrew Dunstan wrote:
>>
>>>
>>> Well, I don't think that the fact that we are producing machine-readable
>>> output means we can just ignore the human side of it. It is more than likely
>>> that such output will be read by both machines and humans. Obviously, we
>>> need to make sure that it meets machine processing needs first, but once we
>>> have done that we should not ignore the human requirements.
>>>
>>
>> XML is easy to transform to about anything else. I would vote for the
>> XML output to only focus on machine readability and completeness, and
>> then if needed provide style sheets to make it human readable. Then
>> anybody could have his preferred style to look at it. If there would be
>> a tool to format the XML according to some style sheet (to make it easy
>> for those who don't care about XML and style sheets), I would volunteer
>> to provide the XSL style sheets for different formats on request.
>>
>>
>>
>
> Have you actually looked at a logfile with this in it? A simple stylesheet
> won't do at all. What you get is not an XML document but a text document
> with little bits of XML embedded in it. So you would need a program to parse
> that file and either turn it into a single legal XML document or pass each
> piece of XML individually to your XSLT processor. Bleah.
>
> And all this because you pose a false dichotomy between correctness and
> completeness on one hand and human readability on the other. I don't accept
> that at all. I think we can and should improve human readability without
> sacrificing anything on the correctness and completeness front. In fact,
> that also needs improving, and we can do them both at the same time.
>
> I want to be able to have machine readable explain output, but I also want
> to be able to browse the logs without wasting more brain cells than
> necessary and without having to use external tools other than by standard
> text browser (less). As Pooh Bear said, "Both please!"
>
> One thing I have noticed that we should talk about is that the explain XML
> output doesn't contain the query that is being explained. That's unfortunate
> - it means that any logfile processor will need to extract the statement
> from the surrounding text rather than just pulling out the XML and passing
> it to an XML processor.
>
> Another design issue is this: The root node of an XML document is ideally a
> distinguished element that can't occur within itself. auto-explain doesn't
> seem to be doing this. It outputs fragments that have <Plan> as the root
> element rather than <explain>
>
> Here's a tiny fragment of my logfile with auto_explain.log_min_duration = 0
> and auto_explain.log_format = 'xml' settings:
>
>
> STATEMENT: SELECT 1 AS one;
> LOG: duration: 0.008 ms plan:
> <Plan>
> <Node-Type>Result</Node-Type>
> <Startup-Cost>0.00</Startup-Cost>
> <Total-Cost>0.01</Total-Cost>
> <Plan-Rows>1</Plan-Rows>
> <Plan-Width>0</Plan-Width>
> </Plan>
>
>
> But a <Plan> node can appear as the decendant of a <Plan> node, so this
> violates the design principle I enunciated above.

auto_explain sort of hooks into the middle of the main EXPLAIN
machinery in a strange way. It's not comparable to a real EXPLAIN
ANALYZE because, for example, it won't report_triggers(). But we
could still put a container around it of some sort by modifying
contrib/auto_explain. Perhaps you'd care to propose a patch?

...Robert

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 14:01:10
Message-ID:	10062.1250085670@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Another design issue is this: The root node of an XML document is
> ideally a distinguished element that can't occur within itself.
> auto-explain doesn't seem to be doing this.

Huh? I get (for "explain 2+2")

<explain xmlns="http://www.postgresql.org/2009/explain">
<Query>
<Plan>
<Node-Type>Result</Node-Type>
<Startup-Cost>0.00</Startup-Cost>
<Total-Cost>0.01</Total-Cost>
<Plan-Rows>1</Plan-Rows>
<Plan-Width>0</Plan-Width>
</Plan>
</Query>
</explain>

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 14:26:41
Message-ID:	603c8f070908120726r3b9fe62bg6b01214f02a263f4@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Aug 12, 2009 at 10:01 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> Another design issue is this: The root node of an XML document is
>> ideally a distinguished element that can't occur within itself.
>> auto-explain doesn't seem to be doing this.
>
> Huh? I get (for "explain 2+2")
>
> <explain xmlns="http://www.postgresql.org/2009/explain">
> <Query>
> <Plan>
> <Node-Type>Result</Node-Type>
> <Startup-Cost>0.00</Startup-Cost>
> <Total-Cost>0.01</Total-Cost>
> <Plan-Rows>1</Plan-Rows>
> <Plan-Width>0</Plan-Width>
> </Plan>
> </Query>
> </explain>

He's talking about *auto* explain.

...Robert

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 14:51:29
Message-ID:	1250088689.26985.71.camel@pcd12478
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-08-12 at 15:42 +0200, Andrew Dunstan wrote:
> Have you actually looked at a logfile with this in it? A simple
> stylesheet won't do at all. What you get is not an XML document but a
> text document with little bits of XML embedded in it. So you would need
> a program to parse that file and either turn it into a single legal XML
> document or pass each piece of XML individually to your XSLT processor.
> Bleah.

I'm pretty sure you will never find a human readable format which is
easily extracted from the logs by a program. But if you format the XML
in a (very human unreadable) one-line-without-breaks format then it will
be a lot easier extracted by a program and formatted at your will.

> And all this because you pose a false dichotomy between correctness and
> completeness on one hand and human readability on the other. I don't
> accept that at all. I think we can and should improve human readability
> without sacrificing anything on the correctness and completeness front.
> In fact, that also needs improving, and we can do them both at the same
> time.

I really really doubt that. I would go here on the UNIX approach of
piping the things through the right tools, each one doing a simple and
good job for it's single and well defined purpose. So let the explain
spit out a line of XML without much thought about formatting but
focusing on completeness, making it easy for tools to get that line, and
then let the tools do the formatting depending on what you want to do
with the information. Each part will be simpler than you would put in a
directly human readable XML (if that's possible at all) approach, which
will anyway not cover all the uses and tastes.

> I want to be able to have machine readable explain output, but I also
> want to be able to browse the logs without wasting more brain cells than
> necessary and without having to use external tools other than by
> standard text browser (less). As Pooh Bear said, "Both please!"

I argue that a sufficiently complicated explain output will never be
easily navigated in a text browser, however much you would like it. If
you do a where clause with 100 nested ANDs (which occasionally happens
here), I don't think you'll be able to cleanly show that in a text
browser - it will simply not fit in no matter how you format it. But
using the right GUI tool (even a generic XML one) it would be easy to
just temporarily collapse the whole top AND and have a clean view of the
rest.

Cheers,
Csaba.

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 15:09:04
Message-ID:	1250089744.26985.79.camel@pcd12478
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-08-12 at 16:51 +0200, Csaba Nagy wrote:
> I argue that a sufficiently complicated explain output will never be
> easily navigated in a text browser, however much you would like it. If
> you do a where clause with 100 nested ANDs (which occasionally happens
> here), I don't think you'll be able to cleanly show that in a text
> browser - it will simply not fit in no matter how you format it. But
> using the right GUI tool (even a generic XML one) it would be easy to
> just temporarily collapse the whole top AND and have a clean view of the
> rest.

Just a small note: I don't know how the machine-readable auto-explain
output actually looks like, so I just assumed individual conditions will
have their own XML node. But even if they don't have and the AND thing
would be flat in the explain output, the same argument applies to even a
few 10s of joins or sub-selects (and we do have that too). Sometimes
collapsing parts of the plan would help big time in having a clean
picture, which (starting with a certain complexity of the query) you
will have a hard time to do in plain text regardless of the formatting.
I do have plenty of explain outputs (from 8.2 postgres) which don't fit
on my screen in text form even without further details I gather you can
get in the machine readable form.

Cheers,
Csaba.

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 15:11:36
Message-ID:	4A82DBA8.7020603@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Csaba Nagy wrote:
> On Wed, 2009-08-12 at 15:42 +0200, Andrew Dunstan wrote:
>
>> Have you actually looked at a logfile with this in it? A simple
>> stylesheet won't do at all. What you get is not an XML document but a
>> text document with little bits of XML embedded in it. So you would need
>> a program to parse that file and either turn it into a single legal XML
>> document or pass each piece of XML individually to your XSLT processor.
>> Bleah.
>>
>
> I'm pretty sure you will never find a human readable format which is
> easily extracted from the logs by a program. But if you format the XML
> in a (very human unreadable) one-line-without-breaks format then it will
> be a lot easier extracted by a program and formatted at your will.
>

That will just make things worse. And it will break if the XML includes
any expression that contains a line break.

>
>> And all this because you pose a false dichotomy between correctness and
>> completeness on one hand and human readability on the other. I don't
>> accept that at all. I think we can and should improve human readability
>> without sacrificing anything on the correctness and completeness front.
>> In fact, that also needs improving, and we can do them both at the same
>> time.
>>
>
> I really really doubt that. I would go here on the UNIX approach of
> piping the things through the right tools, each one doing a simple and
> good job for it's single and well defined purpose. So let the explain
> spit out a line of XML without much thought about formatting but
> focusing on completeness, making it easy for tools to get that line, and
> then let the tools do the formatting depending on what you want to do
> with the information. Each part will be simpler than you would put in a
> directly human readable XML (if that's possible at all) approach, which
> will anyway not cover all the uses and tastes.
>

I think we're going to have to agree to disagree on this. I think you're
going in precisely the wrong direction.

I repeat, I want to be able to have a log file that is both machine
processable and not utterly unreadable by a human. And I do not accept
at all that this is impossible. Nor do I accept I should need some extra
processing tool to read the machine processable output without suffering
brain damage. If we were to adopt your approach I bet you would find
that nobody in their right mind would use the machine readable formats.

I sure wouldn't.

cheers

andrew

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 15:14:32
Message-ID:	20090812151432.GE5721@alvh.no-ip.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andrew Dunstan escribió:

> STATEMENT: SELECT 1 AS one;
> LOG: duration: 0.008 ms plan:
> <Plan>
> <Node-Type>Result</Node-Type>
> <Startup-Cost>0.00</Startup-Cost>
> <Total-Cost>0.01</Total-Cost>
> <Plan-Rows>1</Plan-Rows>
> <Plan-Width>0</Plan-Width>
> </Plan>

I think what this says is that auto-explain should not be sending its
output to the regular logfile, but somewhere else. The format you show
above is not the most usable thing in the world -- not for machine
parsing, and neither for human consumption.

Maybe it should fill its own file with something like this:

<autoexplain>
<query>select 1 as one;</query>
<duration>0.008</duration>
<Plan>
<Node-Type>Result</Node-Type>
<Startup-Cost>0.00</Startup-Cost>
<Total-Cost>0.01</Total-Cost>
<Plan-Rows>1</Plan-Rows>
<Plan-Width>0</Plan-Width>
</Plan>
</autoexplain>

or some such.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 15:25:57
Message-ID:	4A82DF05.1040703@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Alvaro Herrera wrote:
> Andrew Dunstan escribió:
>
>
>> STATEMENT: SELECT 1 AS one;
>> LOG: duration: 0.008 ms plan:
>> <Plan>
>> <Node-Type>Result</Node-Type>
>> <Startup-Cost>0.00</Startup-Cost>
>> <Total-Cost>0.01</Total-Cost>
>> <Plan-Rows>1</Plan-Rows>
>> <Plan-Width>0</Plan-Width>
>> </Plan>
>>
>
> I think what this says is that auto-explain should not be sending its
> output to the regular logfile, but somewhere else. The format you show
> above is not the most usable thing in the world -- not for machine
> parsing, and neither for human consumption.
>
> Maybe it should fill its own file with something like this:
>
> <autoexplain>
> <query>select 1 as one;</query>
> <duration>0.008</duration>
> <Plan>
> <Node-Type>Result</Node-Type>
> <Startup-Cost>0.00</Startup-Cost>
> <Total-Cost>0.01</Total-Cost>
> <Plan-Rows>1</Plan-Rows>
> <Plan-Width>0</Plan-Width>
> </Plan>
> </autoexplain>
>
> or some such.
>
>

With a format like this, pulling them out of the log file would be
trivial, so I don't see why it couldn't go in the log file.

It needs a bit of tweaking, but the idea is right.

And to answer Robert's question - yes, I will be submitting a patch or
two in due course.

cheers

andrew

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 15:31:04
Message-ID:	1250091064.26985.96.camel@pcd12478
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote:
> That will just make things worse. And it will break if the XML includes
> any expression that contains a line break.

Then escape the expressions using CDATA or such... I'm sure it would be
possible to make sure it's one line and rely on that. That's part of
being machine readable, being able to rely on getting it at all without
too much parsing magic...

> I repeat, I want to be able to have a log file that is both machine
> processable and not utterly unreadable by a human. And I do not accept
> at all that this is impossible. Nor do I accept I should need some extra
> processing tool to read the machine processable output without suffering
> brain damage. If we were to adopt your approach I bet you would find
> that nobody in their right mind would use the machine readable formats.

Then why you bother calling it "machine readable" at all ? Would you
really read your auto-explain output on the DB server ? I doubt that's
the common usage scenario, I would expect that most people would let a
tool extract/summarize it and definitely process it somewhere else than
on the DB machine, with the proper tool set. For ad-hoc explain analysis
which I think it's the typical use case for on-the-DB-server inspection
of text-format explain output you would surely use something else than
what is called "machine readable" format...

Cheers,
Csaba.

From:	Mike <ipso(at)snappymail(dot)ca>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 15:32:45
Message-ID:	20090812083245.76c8a3e1@ipso.snappymail.ca
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 12 Aug 2009 09:42:00 -0400
Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> One thing I have noticed that we should talk about is that the
> explain XML output doesn't contain the query that is being explained.
> That's unfortunate - it means that any logfile processor will need to
> extract the statement from the surrounding text rather than just
> pulling out the XML and passing it to an XML processor.

For the purposes I propose using the XML output for, not having the
original query in the XML itself is almost a show stopper. I assume the
original design of this feature had tools in mind that would actually
connect to PostgreSQL directly with commands being issued through the
tool itself.

I would like to create a website where users can paste or upload just
the XML output to and have enough information to do a full analysis of
the query that way.

Ideally this would include the raw query itself, column statistics
targets and ideally even some of the more important query
performance related GUC settings. I realize that could be a lot of
information that may not be needed otherwise, so if its only included
in the VERBOSE explain output, that would be more than fine, but just
having this information available increases the usefulness of such an
"outside" tool many times.

Again though, at the very least I think the raw query needs to be
included.

--
Mike

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 15:40:38
Message-ID:	1250091638.26985.103.camel@pcd12478
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-08-12 at 17:31 +0200, Csaba Nagy wrote:
> On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote:
> > That will just make things worse. And it will break if the XML includes
> > any expression that contains a line break.
>
> Then escape the expressions using CDATA or such... I'm sure it would be
> possible to make sure it's one line and rely on that. That's part of
> being machine readable, being able to rely on getting it at all without
> too much parsing magic...

Looks like today I'm talking consistently stupid... CDATA of course
won't help in avoiding line breaks, but it would be still possible to
define an XML entity for that. This starts to be too complicated anyway,
so probably not worth it. In any case I still think making it easy for a
tool to extract/parse the information should be higher priority than
human readability - because the information itself is not really human
readable without external help except for simple queries, and I doubt
the main use case is simple queries.

Cheers,
Csaba.

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 15:41:19
Message-ID:	4A82E29F.9050909@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Csaba Nagy wrote:
> Then why you bother calling it "machine readable" at all ? Would you
> really read your auto-explain output on the DB server ? I doubt that's
> the common usage scenario, I would expect that most people would let a
> tool extract/summarize it and definitely process it somewhere else than
> on the DB machine, with the proper tool set.
>

Sure I would. I look at log files almost every day to find out things.
Why should I have to wade through a pile of utterly unreadable crap to
find it?

Auto-explain lets you have *one* output format. To follow your approach,
I will have to change that, and have two log files, one machine
processable and one human readable. Triple bleah.

I have not suggested anything that would break the machine readability.
You seem to think that making the machine readable output remotely human
friendly is somehow going to detract from its machine processability.
But that's just silly, frankly. I do not want and should not have to
choose between a format that is machine readable and one that is to some
extent human readable.

cheers

andrew

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 16:02:20
Message-ID:	1250092940.26985.118.camel@pcd12478
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-08-12 at 17:41 +0200, Andrew Dunstan wrote:
>
> Csaba Nagy wrote:
> > Then why you bother calling it "machine readable" at all ? Would you
> > really read your auto-explain output on the DB server ? I doubt that's
> > the common usage scenario, I would expect that most people would let a
> > tool extract/summarize it and definitely process it somewhere else than
> > on the DB machine, with the proper tool set.

> Sure I would. I look at log files almost every day to find out things.
> Why should I have to wade through a pile of utterly unreadable crap to
> find it?

I look at log files every day too (not almost, but literally every day),
but I really can't imagine myself looking at an explain output directly
in the log file. Even with the output of an 8.2 server which has less
detail than I think the new formats have, I always copy the text from
the psql prompt to some friendlier tool where I can play around with it,
delete parts of it if needed for the sake of clear overview and where I
can easily switch between line-wrap or not and such. I simply don't
believe that a remotely human presentation of the thing worths the
slightest compromise in machine readability. That said, I would like to
finish this discussion here because it gets pointless, I agree to let us
disagree :-)

> Auto-explain lets you have *one* output format. To follow your approach,
> I will have to change that, and have two log files, one machine
> processable and one human readable. Triple bleah.

By ad-hoc I didn't mean reading the auto-explain log, that's surely no
ad-hoc operation... I would make that a mandatory daily routine which is
better handled by a tool which serves you directly the plans of the
worst performing queries sorted by runtime for example and highlighting
the obvious planner mis-estimates. By ad-hoc I mean a query you examine
on the psql command line, and there you can expect human readable of
course without considerations to machine readability.

> I have not suggested anything that would break the machine readability.
> You seem to think that making the machine readable output remotely human
> friendly is somehow going to detract from its machine processability.
> But that's just silly, frankly. I do not want and should not have to
> choose between a format that is machine readable and one that is to some
> extent human readable.

OK, if you can do that it's fine... but I'm afraid that will lead to
compromises on the machine readability side and will only delay the
whole thing.

Cheers,
Csaba.

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 16:07:25
Message-ID:	4A82E8BD.4050706@dunslane.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Csaba Nagy wrote:
> On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote:
>
>> That will just make things worse. And it will break if the XML includes
>> any expression that contains a line break.
>>
>
> Then escape the expressions using CDATA or such... I'm sure it would be
> possible to make sure it's one line and rely on that. That's part of
> being machine readable, being able to rely on getting it at all without
> too much parsing magic...
>
>
>

Well, the right solution would actually be NOT to use CDATA but to
replace a literal linefeed with the XML numeric escape 
 , but I
really don't think it's necessary.

The extraction tools will be simple whether or not we put everything on
one line.

Assuming we adopt Alvaro's suggestion of an <auto-explain> root, here's
how it would work without putting everything on one line:

{ echo "<explain-root>"; sed -n
'!<auto-explain>!,!</auto-explain>!p' logfile ; echo
"</explain-root>"; } > explain-plans.xml

Putting everything on one line, this becomes:

{ echo "<explain-root>"; sed -n '!<auto-explain>!p' logfile ; echo
"</explain-root>"; } > explain-plans.xml

Not very hard either way, is it?

cheers

andrew

From:	Csaba Nagy <nagy(at)ecircle-ag(dot)com>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Mike <ipso(at)snappymail(dot)ca>, Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: machine-readable explain output v4
Date:	2009-08-12 16:22:07
Message-ID:	1250094127.26985.126.camel@pcd12478
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-08-12 at 18:07 +0200, Andrew Dunstan wrote:
>
> Csaba Nagy wrote:
> > On Wed, 2009-08-12 at 17:11 +0200, Andrew Dunstan wrote:
> Well, the right solution would actually be NOT to use CDATA but to
> replace a literal linefeed with the XML numeric escape 
 , but I
> really don't think it's necessary.

Yes you're right, with the extraction too...

> The extraction tools will be simple whether or not we put everything on
> one line.

Well I normally avoid anything containing caffeine, and today I had my
first coffee of the year and 2 cokes too. I would say ignore the whole
discussion of on the grounds of abnormal hyperactivity on my part...

Cheers,
Csaba.

From:	Greg Smith <gsmith(at)gregsmith(dot)com>
To:	Mike <ipso(at)snappymail(dot)ca>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: machine-readable explain output v4
Date:	2009-08-13 07:00:36
Message-ID:	alpine.GSO.2.01.0908130248240.7164@westnet.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, 11 Aug 2009, Mike wrote:

> Have any tool authors stepped up and committed resources to utilizing
> this feature in the near term?

Even before the easier to read format was available, there were already
multiple EXPLAIN analysis tools floating around, some of them web-based
like you're considering; a list is at
http://wiki.postgresql.org/wiki/Using_EXPLAIN

You might expect some of those tool authors would do the appropriate
overhaul to import the new format data, and perhaps make things more
powerful or simple in the process. You might want to collaborate within
someone writing one of those existing applications rather than start over
on your own.

> The reason I would like to provide this tool in a web-based form is that
> no additional software installation would be necessary for the user,
> reducing any hurdles to using it to zero.

I personally hate only having a web-based tool for this style of
application, because I'm always dealing with data I can't paste into
somebody else's site for EXPLAIN output--that's a common hurdle that's
impossible to clear given all the regulatory and business secret
restrictions people work under nowadays. Even when the source code is
available for the web app, that puts you back to needing to install the
tool locally, and I've found web apps tend to be more complicated to get
running than a typical standalone app.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD