Re: [PATCH] Add pretty-printed XML output option

Lists: pgsql-hackers
From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: [PATCH] Add pretty-printed XML output option
Date: 2023-02-02 20:35:39
Message-ID: 2f5df461-dad8-6d7d-4568-08e10608a69b@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

This small patch introduces a XML pretty print function. It basically
takes advantage of the indentation feature of xmlDocDumpFormatMemory
from libxml2 to format XML strings.

postgres=# SELECT xmlpretty('<foo id="x"><bar id="y"><var
id="z">42</var></bar></foo>');
        xmlpretty
--------------------------
 <foo id="x">            +
   <bar id="y">          +
     <var id="z">42</var>+
   </bar>                +
 </foo>                  +

(1 row)

The patch also contains regression tests and documentation.

Feedback is very welcome!

Jim

Attachment Content-Type Size
v1-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 15.4 KB

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-03 07:05:46
Message-ID: 8453003d-e1b1-7306-5fc5-6d1503adac1b@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The system somehow returns different error messages in Linux and
MacOS/Windows, which was causing the cfbot to fail.

SELECT xmlpretty('<foo>')::xml;
                          ^
-DETAIL:  line 1: chunk is not well balanced
+DETAIL:  line 1: Premature end of data in tag foo line 1

Test removed in v2.

On 02.02.23 21:35, Jim Jones wrote:
> Hi,
>
> This small patch introduces a XML pretty print function. It basically
> takes advantage of the indentation feature of xmlDocDumpFormatMemory
> from libxml2 to format XML strings.
>
> postgres=# SELECT xmlpretty('<foo id="x"><bar id="y"><var
> id="z">42</var></bar></foo>');
>         xmlpretty
> --------------------------
>  <foo id="x">            +
>    <bar id="y">          +
>      <var id="z">42</var>+
>    </bar>                +
>  </foo>                  +
>
> (1 row)
>
>
> The patch also contains regression tests and documentation.
>
> Feedback is very welcome!
>
> Jim

Attachment Content-Type Size
v2-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 17.8 KB

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-06 16:19:02
Message-ID: 4359b355-5054-9758-ebab-a2c17a9b115e@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing
the regression tests with the error:

ERROR:  unsupported XML feature
DETAIL:  This functionality requires the server to be built with libxml
support.

Is there anything I can do to enable libxml to run my regression tests?

v3 adds a missing xmlFree call.

Best, Jim

Attachment Content-Type Size
v3-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 19.2 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-06 16:23:04
Message-ID: 2677159.1675700584@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> writes:
> The cfbot on "Windows - Server 2019, VS 2019 - Meson & ninja" is failing
> the regression tests with the error:

> ERROR:  unsupported XML feature
> DETAIL:  This functionality requires the server to be built with libxml
> support.

> Is there anything I can do to enable libxml to run my regression tests?

Your patch needs to also update expected/xml_1.out to match the output
the test produces when run without --with-libxml.

regards, tom lane


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-06 17:58:53
Message-ID: 226a2228-934b-22b8-45cd-dd144a3c443d@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06.02.23 17:23, Tom Lane wrote:
>> Your patch needs to also update expected/xml_1.out to match the output
>> the test produces when run without --with-libxml.

Thanks a lot! It seems to do the trick.

xml_1.out updated in v4.

Best, Jim

Attachment Content-Type Size
v4-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 23.7 KB

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-08 20:30:52
Message-ID: d385a64e-872f-ed09-9dc1-73545635ebca@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

while working on another item of the TODO list I realized that I should
be using a PG_TRY() block in he xmlDocDumpFormatMemory call.

Fixed in v5.

Best regards, Jim

Attachment Content-Type Size
v5-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 19.6 KB

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-08 23:09:53
Message-ID: CAHut+Pu7Mu1OtB2N_JYxX3Zb0XX5+1UsfpcZ9vPdhtJZkCp9Xg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 9, 2023 at 7:31 AM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:
>
> while working on another item of the TODO list I realized that I should
> be using a PG_TRY() block in he xmlDocDumpFormatMemory call.
>
> Fixed in v5.
>

I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
other xmlFreeDoc(doc) is not. As the doc is assigned outside the
PG_TRY shouldn't those both be the same?

------
Kind Regards,
Peter Smith.
Fujitsu Australia.


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-08 23:42:20
Message-ID: a42757df-4b65-aaaa-1eb4-daf9bcf17d6e@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.02.23 00:09, Peter Smith wrote:
> I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
> other xmlFreeDoc(doc) is not. As the doc is assigned outside the
> PG_TRY shouldn't those both be the same?

Hi Peter,

My logic there was the following: if program reached that part of the
code it means that the xml_parse() and xmlDocDumpFormatMemory() worked,
which consequently means that the variables doc and xmlbuf are != NULL,
therefore not needing to be checked. Am I missing something?

Thanks a lot for the review!

Best, Jim


From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-09 01:01:10
Message-ID: CAHut+Pu1-wHbVubFjtotn4TBGMinOxv85R8cuqasnPKcWmWLVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 9, 2023 at 10:42 AM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:
>
> On 09.02.23 00:09, Peter Smith wrote:
> > I noticed the xmlFreeDoc(doc) within the PG_CATCH is guarded but the
> > other xmlFreeDoc(doc) is not. As the doc is assigned outside the
> > PG_TRY shouldn't those both be the same?
>
> Hi Peter,
>
> My logic there was the following: if program reached that part of the
> code it means that the xml_parse() and xmlDocDumpFormatMemory() worked,
> which consequently means that the variables doc and xmlbuf are != NULL,
> therefore not needing to be checked. Am I missing something?
>

Thanks. I think I understand it better now -- I expect
xmlDocDumpFormatMemory will cope OK when passed a NULL doc (see this
source [1]), but it will return nbytes of 0, but your code will still
throw ERROR, meaning the guard for doc NULL is necessary for the
PG_CATCH.

In that case, everything LGTM.

~

OTOH, if you are having to check for NULL doc anyway, maybe it's just
as easy only doing that up-front. Then you could quick-exit the
function without calling xmlDocDumpFormatMemory etc. in the first
place. For example:

doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
if (!doc)
return 0;

------
Kind Regards,
Peter Smith.
Fujitsu Australia.


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-09 07:16:50
Message-ID: 3a3d7963-7f30-9e0a-5e7c-11382fed425a@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.02.23 02:01, Peter Smith wrote:
> OTOH, if you are having to check for NULL doc anyway, maybe it's just
> as easy only doing that up-front. Then you could quick-exit the
> function without calling xmlDocDumpFormatMemory etc. in the first
> place. For example:
>
> doc = xml_parse(arg, XMLOPTION_DOCUMENT, false, GetDatabaseEncoding(), NULL);
> if (!doc)
> return 0;

I see your point. If I got it right, you're suggesting the following
change in the PG_TRY();

   PG_TRY();
    {

        int nbytes;

        if(!doc)
            xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                    "could not parse the given XML document");

        xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);

        if(!nbytes || xmlerrcxt->err_occurred)
            xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
                    "could not indent the given XML document");

        initStringInfo(&buf);
        appendStringInfoString(&buf, (const char *)xmlbuf);

    }

.. which will catch the doc == NULL before calling xmlDocDumpFormatMemory.

Is it what you suggest?

Thanks a lot for the thorough review!

Best, Jim


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-09 07:23:04
Message-ID: 3845900.1675927384@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> writes:
> I see your point. If I got it right, you're suggesting the following
> change in the PG_TRY();

>    PG_TRY();
>     {

>         int nbytes;

>         if(!doc)
>             xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
>                     "could not parse the given XML document");

>         xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);

>         if(!nbytes || xmlerrcxt->err_occurred)
>             xml_ereport(xmlerrcxt, ERROR, ERRCODE_INTERNAL_ERROR,
>                     "could not indent the given XML document");

>         initStringInfo(&buf);
>         appendStringInfoString(&buf, (const char *)xmlbuf);

>     }

> .. which will catch the doc == NULL before calling xmlDocDumpFormatMemory.

Um ... why are you using PG_TRY here at all? It seems like
you have full control of the actually likely error cases.
The only plausible error out of the StringInfo calls is OOM,
and you probably don't want to trap that at all.

regards, tom lane


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-09 08:17:08
Message-ID: 5e56143f-63bf-beb2-6912-35aff6594e4c@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.02.23 08:23, Tom Lane wrote:
> Um ... why are you using PG_TRY here at all? It seems like
> you have full control of the actually likely error cases.
> The only plausible error out of the StringInfo calls is OOM,
> and you probably don't want to trap that at all.

My intention was to catch any unexpected error from
xmlDocDumpFormatMemory and handle it properly. But I guess you're right,
I can control the likely error cases by checking doc and nbytes.

You suggest something along these lines?

    xmlDocPtr  doc;
    xmlChar    *xmlbuf = NULL;
    text       *arg = PG_GETARG_TEXT_PP(0);
    StringInfoData buf;
    int nbytes;

    doc = xml_parse(arg, XMLOPTION_DOCUMENT, false,
GetDatabaseEncoding(), NULL);

    if(!doc)
        elog(ERROR, "could not parse the given XML document");

    xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);

    xmlFreeDoc(doc);

    if(!nbytes)
        elog(ERROR, "could not indent the given XML document");

    initStringInfo(&buf);
    appendStringInfoString(&buf, (const char *)xmlbuf);

    xmlFree(xmlbuf);

    PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));

Thanks!

Best, Jim


From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-09 10:31:08
Message-ID: 62643598-b80d-90f4-8901-b85dccadf4c4@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02.02.23 21:35, Jim Jones wrote:
> This small patch introduces a XML pretty print function. It basically
> takes advantage of the indentation feature of xmlDocDumpFormatMemory
> from libxml2 to format XML strings.

I suggest we call it "xmlformat", which is an established term for this.


From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-10 01:10:46
Message-ID: CAHut+Ps7_JcCOtHuG-btzv1FzHppyy1ys8QPFG7u4zoP0R5xUA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 9, 2023 at 7:17 PM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:
>
> On 09.02.23 08:23, Tom Lane wrote:
> > Um ... why are you using PG_TRY here at all? It seems like
> > you have full control of the actually likely error cases.
> > The only plausible error out of the StringInfo calls is OOM,
> > and you probably don't want to trap that at all.
>
> My intention was to catch any unexpected error from
> xmlDocDumpFormatMemory and handle it properly. But I guess you're right,
> I can control the likely error cases by checking doc and nbytes.
>
> You suggest something along these lines?
>
> xmlDocPtr doc;
> xmlChar *xmlbuf = NULL;
> text *arg = PG_GETARG_TEXT_PP(0);
> StringInfoData buf;
> int nbytes;
>
> doc = xml_parse(arg, XMLOPTION_DOCUMENT, false,
> GetDatabaseEncoding(), NULL);
>
> if(!doc)
> elog(ERROR, "could not parse the given XML document");
>
> xmlDocDumpFormatMemory(doc, &xmlbuf, &nbytes, 1);
>
> xmlFreeDoc(doc);
>
> if(!nbytes)
> elog(ERROR, "could not indent the given XML document");
>
> initStringInfo(&buf);
> appendStringInfoString(&buf, (const char *)xmlbuf);
>
> xmlFree(xmlbuf);
>
> PG_RETURN_XML_P(stringinfo_to_xmltype(&buf));
>
>
> Thanks!
>

Something like that LGTM, but here are some other minor comments.

======

1.
FYI, there are some whitespace warnings applying the v5 patch

[postgres(at)CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch
../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:26:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:29:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:33:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:37:
trailing whitespace.

../patches_misc/v5-0001-Add-pretty-printed-XML-output-option.patch:41:
trailing whitespace.

warning: squelched 48 whitespace errors
warning: 53 lines add whitespace errors.

======
src/test/regress/sql/xml.sql

2.
The v5 patch was already testing NULL, but it might be good to add
more tests to verify the function is behaving how you want for edge
cases. For example,

+-- XML pretty print: NULL, empty string, spaces only...
SELECT xmlpretty(NULL);
SELECT xmlpretty('');
SELECT xmlpretty(' ');

~~

3.
The function is returning XML anyway, so is the '::xml' casting in
these tests necessary?

e.g.
SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL);

======
src/include/catalog/pg_proc.dat

4.

+ { oid => '4642', descr => 'Indented text from xml',
+ proname => 'xmlpretty', prorettype => 'xml',
+ proargtypes => 'xml', prosrc => 'xmlpretty' },

Spurious leading space for this new entry.

======
doc/src/sgml/func.sgml

5.
+ <foo id="x">
+ <bar id="y">
+ <var id="z">42</var>
+ </bar>
+ </foo>
+
+(1 row)
+
+]]></screen>

A spurious blank line in the example after the "(1 row)"

~~~

6.
Does this function docs belong in section 9.15.1 "Producing XML
Content"? Or (since it is not really producing content) should it be
moved to the 9.15.3 "Processing XML" section?

------
Kind Regards,
Peter Smith.
Fujitsu Australia


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-10 08:15:50
Message-ID: 4ec6ad44-a6fc-c508-2cba-96316b17548b@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10.02.23 02:10, Peter Smith wrote:
> On Thu, Feb 9, 2023 at 7:17 PM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:
> 1.
> FYI, there are some whitespace warnings applying the v5 patch
>
Trailing whitespaces removed. The patch applies now without warnings.
> ======
> src/test/regress/sql/xml.sql
>
> 2.
> The v5 patch was already testing NULL, but it might be good to add
> more tests to verify the function is behaving how you want for edge
> cases. For example,
>
> +-- XML pretty print: NULL, empty string, spaces only...
> SELECT xmlpretty(NULL);
> SELECT xmlpretty('');
> SELECT xmlpretty(' ');

Test cases added.

> 3.
> The function is returning XML anyway, so is the '::xml' casting in
> these tests necessary?
>
> e.g.
> SELECT xmlpretty(NULL)::xml; --> SELECT xmlpretty(NULL);
It is indeed not necessary. Most likely I used it for testing and forgot
to remove it afterwards. Now removed.
> ======
> src/include/catalog/pg_proc.dat
>
> 4.
>
> + { oid => '4642', descr => 'Indented text from xml',
> + proname => 'xmlpretty', prorettype => 'xml',
> + proargtypes => 'xml', prosrc => 'xmlpretty' },
>
> Spurious leading space for this new entry.
Removed.
>
> ======
> doc/src/sgml/func.sgml
>
> 5.
> + <foo id="x">
> + <bar id="y">
> + <var id="z">42</var>
> + </bar>
> + </foo>
> +
> +(1 row)
> +
> +]]></screen>
>
> A spurious blank line in the example after the "(1 row)"
Removed.
> ~~~
>
> 6.
> Does this function docs belong in section 9.15.1 "Producing XML
> Content"? Or (since it is not really producing content) should it be
> moved to the 9.15.3 "Processing XML" section?
Moved to the section 9.15.3

Following the suggestion of Peter Eisentraut I renamed the function to
xmlformat().

v6 attached.

Thanks a lot for the review.

Best, Jim

Attachment Content-Type Size
v6-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 19.6 KB

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-13 01:15:13
Message-ID: CAHut+PtmMejNUiFQF0qMDMS55nsSN=5Ffw=ywpsEQK2ZxYR7Yw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Something is misbehaving.

Using the latest HEAD, and before applying the v6 patch, 'make check'
is working OK.

But after applying the v6 patch, some XML regression tests are failing
because the DETAIL part of the message indicating the wrong syntax
position is not getting displayed. Not only for your new tests -- but
for other XML tests too.

My ./configure looks like this:
./configure --prefix=/usr/local/pg_oss --with-libxml --enable-debug
--enable-cassert --enable-tap-tests CFLAGS="-ggdb -O0 -g3
-fno-omit-frame-pointer"

resulting in:
checking whether to build with XML support... yes
checking for libxml-2.0 >= 2.6.23... yes

~

e.g.(these are a sample of errors)

xml ... FAILED 2561 ms

@@ -344,8 +326,6 @@
<twoerrors>&idontexist;</unbalanced>
^
line 1: Opening and ending tag mismatch: twoerrors line 1 and unbalanced
-<twoerrors>&idontexist;</unbalanced>
- ^
SELECT xmlparse(document '<nosuchprefix:tag/>');
xmlparse
---------------------
@@ -1696,14 +1676,8 @@
SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
-
-^
line 1: Document is empty
-
-^
-- XML format: invalid string (whitespaces)
SELECT xmlformat(' ');
ERROR: invalid XML document
DETAIL: line 1: Start tag expected, '<' not found
-
- ^

~~

Separately (but maybe it's related?), the CF-bot test also reported a
failure [1] with strange error detail differences.

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out
/tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12
09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-DETAIL: line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL: line 1: Document is empty

^
-- XML format: invalid string (whitespaces)

------
[1] https://api.cirrus-ci.com/v1/artifact/task/4802219812323328/testrun/build/testrun/regress/regress/regression.diffs

Kind Regards,
Peter Smith.
Fujitsu Australia


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-13 12:15:57
Message-ID: 62dffe3b-7eb0-2f36-b874-839f6864126d@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13.02.23 02:15, Peter Smith wrote:
> Something is misbehaving.
>
> Using the latest HEAD, and before applying the v6 patch, 'make check'
> is working OK.
>
> But after applying the v6 patch, some XML regression tests are failing
> because the DETAIL part of the message indicating the wrong syntax
> position is not getting displayed. Not only for your new tests -- but
> for other XML tests too.

Yes, I noticed it yesterday ... and I'm not sure how to solve it. It
seems that in the system is returning a different error message in the
FreeBSD patch tester, which is causing a regression test in this
particular OS to fail.

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000
+++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000
@@ -1695,10 +1695,7 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-DETAIL: line 1: switching encoding : no input
-
-^
-line 1: Document is empty
+DETAIL: line 1: Document is empty

^
-- XML format: invalid string (whitespaces)

Does anyone know if there is anything I can do to make the error
messages be consistent among different OS?


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-14 21:55:20
Message-ID: 2962ca3c-abf6-751a-85dc-a9356a384d82@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13.02.23 13:15, Jim Jones wrote:
> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
> --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000
> +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000
> @@ -1695,10 +1695,7 @@
> -- XML format: empty string
> SELECT xmlformat('');
> ERROR: invalid XML document
> -DETAIL: line 1: switching encoding : no input
> -
> -^
> -line 1: Document is empty
> +DETAIL: line 1: Document is empty
>
> ^
> -- XML format: invalid string (whitespaces)

I couldn't figure out why the error messages are different -- I'm
wondering if the issue is the test environment itself. I just removed
the troubling test case for now

SELECT xmlformat('');

v7 attached.

Thanks for reviewing this patch!

Best, Jim

Attachment Content-Type Size
v7-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 19.2 KB

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-14 22:45:42
Message-ID: CAHut+PvkFXG3CnT_BBBCpTQT2W3oAoo_1Rj=2-3jQ=yspvyu6g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 15, 2023 at 8:55 AM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:
>
> On 13.02.23 13:15, Jim Jones wrote:
>
> diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/xml.out /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out
> --- /tmp/cirrus-ci-build/src/test/regress/expected/xml.out 2023-02-12 09:02:57.077569000 +0000
> +++ /tmp/cirrus-ci-build/build/testrun/regress/regress/results/xml.out 2023-02-12 09:05:45.148100000 +0000
> @@ -1695,10 +1695,7 @@
> -- XML format: empty string
> SELECT xmlformat('');
> ERROR: invalid XML document
> -DETAIL: line 1: switching encoding : no input
> -
> -^
> -line 1: Document is empty
> +DETAIL: line 1: Document is empty
>
> ^
> -- XML format: invalid string (whitespaces)
>
> I couldn't figure out why the error messages are different -- I'm wondering if the issue is the test environment itself. I just removed the troubling test case for now
>
> SELECT xmlformat('');
>
> v7 attached.
>
> Thanks for reviewing this patch!
>

Yesterday I looked at those cfbot configs and noticed all those
machines have different versions of libxml.

2.10.3
2.6.23
2.9.10
2.9.13

But I don't if version numbers have any impact on the different error
details or not.

~

The thing that puzzled me most is that in MY environment (CentOS7;
libxml 20901; PG --with-libxml build) I get this behaviour.

- Without your v6 patch 'make check' is all OK.

- With your v6 patch other XML tests (not only yours) of 'make check'
failed with different error messages.

- Similarly, if I keep the v6 patch but just change (in xmlformat) the
#ifdef USE_LIBXML to be #if 0, then only the new xmlformat tests fail,
but the other XML tests are working OK again.

Those results implied to me that this function code (in my environment
anyway) is somehow introducing a side effect causing the *other* XML
tests to fail.

But so far I was unable to identify the reason. Sorry, I don't know
this XML API well enough to help more.

------
Kind Regards,
Peter Smith.
Fujitsu Austalia.


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 00:05:40
Message-ID: 7f6e4bc2-c4b0-a97e-f5db-51ea300d40e6@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14.02.23 23:45, Peter Smith wrote:
> Those results implied to me that this function code (in my environment
> anyway) is somehow introducing a side effect causing the *other* XML
> tests to fail.

I believe I've found the issue. It is probably related to the XML OPTION
settings, as it seems to deliver different error messages when set to
DOCUMENT or CONTENT:

postgres=# SET XML OPTION CONTENT;
SET
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document
DETAIL:  line 1: switching encoding : no input

^
line 1: Document is empty

^
postgres=# SET XML OPTION DOCUMENT;
SET
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document
LINE 1: SELECT xmlformat('');
                         ^
DETAIL:  line 1: switching encoding : no input

^
line 1: Document is empty

^

v8 attached reintroduces the SELECT xmlformat('') test case and adds SET
XML OPTION DOCUMENT to the regression tests.

Best, Jim

Attachment Content-Type Size
v8-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 19.8 KB

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 01:09:13
Message-ID: CAHut+PvWwUG0Z3pzmCVga-0=ZNAk_n2cwo5TbzBaN4CWh1F86g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 15, 2023 at 11:05 AM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:
>
> On 14.02.23 23:45, Peter Smith wrote:
> > Those results implied to me that this function code (in my environment
> > anyway) is somehow introducing a side effect causing the *other* XML
> > tests to fail.
>
> I believe I've found the issue. It is probably related to the XML OPTION
> settings, as it seems to deliver different error messages when set to
> DOCUMENT or CONTENT:
>
> postgres=# SET XML OPTION CONTENT;
> SET
> postgres=# SELECT xmlformat('');
> ERROR: invalid XML document
> DETAIL: line 1: switching encoding : no input
>
> ^
> line 1: Document is empty
>
> ^
> postgres=# SET XML OPTION DOCUMENT;
> SET
> postgres=# SELECT xmlformat('');
> ERROR: invalid XML document
> LINE 1: SELECT xmlformat('');
> ^
> DETAIL: line 1: switching encoding : no input
>
> ^
> line 1: Document is empty
>
> ^
>
> v8 attached reintroduces the SELECT xmlformat('') test case and adds SET
> XML OPTION DOCUMENT to the regression tests.
>

With v8, in my environment, in psql I see something slightly different:

test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# SELECT xmlformat('');
ERROR: invalid XML document
DETAIL: line 1: switching encoding : no input
line 1: Document is empty
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# SELECT xmlformat('');
ERROR: invalid XML document
LINE 1: SELECT xmlformat('');
^
DETAIL: line 1: switching encoding : no input
line 1: Document is empty

~~

test_pub=# SET XML OPTION CONTENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
ERROR: relation "xmltest" does not exist
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^
test_pub=# SET XML OPTION DOCUMENT;
SET
test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
ERROR: relation "xmltest" does not exist
LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
^

~~

Because the expected extra detail stuff is missing the regression
tests are still failing for me.

------
Kind Regards,
Peter Smith.
Fujitsu Austalia.


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 07:10:44
Message-ID: 2e9edb52-30ed-1986-2b95-21d321406043@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.02.23 02:09, Peter Smith wrote:
> With v8, in my environment, in psql I see something slightly different:
>
>
> test_pub=# SET XML OPTION CONTENT;
> SET
> test_pub=# SELECT xmlformat('');
> ERROR: invalid XML document
> DETAIL: line 1: switching encoding : no input
> line 1: Document is empty
> test_pub=# SET XML OPTION DOCUMENT;
> SET
> test_pub=# SELECT xmlformat('');
> ERROR: invalid XML document
> LINE 1: SELECT xmlformat('');
> ^
> DETAIL: line 1: switching encoding : no input
> line 1: Document is empty
>
> ~~
>
> test_pub=# SET XML OPTION CONTENT;
> SET
> test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
> ERROR: relation "xmltest" does not exist
> LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
> ^
> test_pub=# SET XML OPTION DOCUMENT;
> SET
> test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
> ERROR: relation "xmltest" does not exist
> LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
> ^
>
> ~~

Yes... a cfbot also complained about the same thing.

Setting the VERBOSITY to terse might solve this issue:

postgres=# \set VERBOSITY terse
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document

postgres=# \set VERBOSITY default
postgres=# SELECT xmlformat('');
ERROR:  invalid XML document
DETAIL:  line 1: switching encoding : no input

^
line 1: Document is empty

^

v9 wraps the corner test cases with VERBOSITY terse to reduce the error
message output.

Thanks!

Best, Jim

Attachment Content-Type Size
v9-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 19.4 KB

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 09:06:49
Message-ID: CAHut+PtoH8zkBHxv44XyO+o4kW_nZdGhNdVaJ_cpEjrckKDqtw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Feb 15, 2023 at 6:10 PM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:
>
> On 15.02.23 02:09, Peter Smith wrote:
> > With v8, in my environment, in psql I see something slightly different:
> >
> >
> > test_pub=# SET XML OPTION CONTENT;
> > SET
> > test_pub=# SELECT xmlformat('');
> > ERROR: invalid XML document
> > DETAIL: line 1: switching encoding : no input
> > line 1: Document is empty
> > test_pub=# SET XML OPTION DOCUMENT;
> > SET
> > test_pub=# SELECT xmlformat('');
> > ERROR: invalid XML document
> > LINE 1: SELECT xmlformat('');
> > ^
> > DETAIL: line 1: switching encoding : no input
> > line 1: Document is empty
> >
> > ~~
> >
> > test_pub=# SET XML OPTION CONTENT;
> > SET
> > test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
> > ERROR: relation "xmltest" does not exist
> > LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
> > ^
> > test_pub=# SET XML OPTION DOCUMENT;
> > SET
> > test_pub=# INSERT INTO xmltest VALUES (3, '<wrong');
> > ERROR: relation "xmltest" does not exist
> > LINE 1: INSERT INTO xmltest VALUES (3, '<wrong');
> > ^
> >
> > ~~
>
> Yes... a cfbot also complained about the same thing.
>
> Setting the VERBOSITY to terse might solve this issue:
>
> postgres=# \set VERBOSITY terse
> postgres=# SELECT xmlformat('');
> ERROR: invalid XML document
>
> postgres=# \set VERBOSITY default
> postgres=# SELECT xmlformat('');
> ERROR: invalid XML document
> DETAIL: line 1: switching encoding : no input
>
> ^
> line 1: Document is empty
>
> ^
>
> v9 wraps the corner test cases with VERBOSITY terse to reduce the error
> message output.
>

Bingo!! Your v9 patch now passes all 'make check' tests for me.

But I'll leave it to a committer to decide if this VERBOSITY toggle is
the best fix.

(I don't understand, maybe someone can explain, how the patch managed
to mess verbosity of the existing tests.)

------
Kind Regards,
Peter Smith.
Fujitsu Austalia.


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 09:37:39
Message-ID: 75b74663-7fe8-c306-8914-1ada9a2bab41@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.02.23 10:06, Peter Smith wrote:
> Bingo!! Your v9 patch now passes all 'make check' tests for me.
Nice! It also passed all tests in the patch tester.
> But I'll leave it to a committer to decide if this VERBOSITY toggle is
> the best fix.

I see now other test cases in the xml.sql file that also use this
option, so it might be a known "issue".

Shall we now set the patch to "Ready for Commiter"?

Thanks a lot for the review!

Best, Jim


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 10:11:59
Message-ID: 20230215101159.ef5usgukz2qfgp5n@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2023-Feb-13, Peter Smith wrote:

> Something is misbehaving.
>
> Using the latest HEAD, and before applying the v6 patch, 'make check'
> is working OK.
>
> But after applying the v6 patch, some XML regression tests are failing
> because the DETAIL part of the message indicating the wrong syntax
> position is not getting displayed. Not only for your new tests -- but
> for other XML tests too.

Note that there's another file, xml_2.out, which does not contain the
additional part of the DETAIL message. I suspect in Peter's case it's
xml_2.out that was originally being used as a comparison basis (not
xml.out), but that one is not getting patched, and ultimately the diff
is reported for him against xml.out because none of them matches.

An easy way forward might be to manually apply the patch from xml.out to
xml_2.out, and edit it by hand to remove the additional lines.

See commit 085423e3e326 for a bit of background.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 10:33:50
Message-ID: 10da4d21-8cc9-c8f7-2a6e-2fc90aff764b@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.02.23 11:11, Alvaro Herrera wrote:
> Note that there's another file, xml_2.out, which does not contain the
> additional part of the DETAIL message. I suspect in Peter's case it's
> xml_2.out that was originally being used as a comparison basis (not
> xml.out), but that one is not getting patched, and ultimately the diff
> is reported for him against xml.out because none of them matches.
>
> An easy way forward might be to manually apply the patch from xml.out to
> xml_2.out, and edit it by hand to remove the additional lines.
>
> See commit 085423e3e326 for a bit of background.

Hi Álvaro,

As my test cases were not specifically about how the error message looks
like, I thought that suppressing part of the error messages by setting
VERBOSITY terse would suffice.[1] Is this approach not recommended?

Thanks!

Best, Jim

1 - v9 patch:
https://www.postgresql.org/message-id/CAHut%2BPtoH8zkBHxv44XyO%2Bo4kW_nZdGhNdVaJ_cpEjrckKDqtw%40mail.gmail.com


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 11:11:22
Message-ID: 20230215111122.ogtc7iz4xuljufsv@alvherre.pgsql
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2023-Feb-15, Jim Jones wrote:

> Hi Álvaro,
>
> As my test cases were not specifically about how the error message looks
> like, I thought that suppressing part of the error messages by setting
> VERBOSITY terse would suffice.[1] Is this approach not recommended?

Well, I don't see why we would depart from what we've been doing in the
rest of the XML tests. I did see that patch and I thought it was taking
the wrong approach.

--
Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Aprender sin pensar es inútil; pensar sin aprender, peligroso" (Confucio)


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 12:31:43
Message-ID: 8f593e67-d8cd-7015-262d-d7b45e53230d@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.02.23 12:11, Alvaro Herrera wrote:
> Well, I don't see why we would depart from what we've been doing in the
> rest of the XML tests. I did see that patch and I thought it was taking
> the wrong approach.

Fair point.

v10 patches the xml.out to xml_2.out - manually removing the additional
lines.

Thanks for the review!

Best, Jim

Attachment Content-Type Size
v10-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 28.3 KB

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 13:49:30
Message-ID: c4086db7-b470-5b28-d5ee-e70e08a2be4f@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Accidentally left the VERBOSE settings out -- sorry!

Now it matches the approach used in a xpath test in xml.sql, xml.out,
xml_1.out and xml_2.out

-- Since different libxml versions emit slightly different
-- error messages, we suppress the DETAIL in this test.
\set VERBOSITY terse
SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
ERROR:  could not parse XML document
\set VERBOSITY default

v11 now correctly sets xml_2.out.

Best, Jim

Attachment Content-Type Size
v11-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 28.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 15:07:43
Message-ID: 2083679.1676473663@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> writes:
> Note that there's another file, xml_2.out, which does not contain the
> additional part of the DETAIL message. I suspect in Peter's case it's
> xml_2.out that was originally being used as a comparison basis (not
> xml.out), but that one is not getting patched, and ultimately the diff
> is reported for him against xml.out because none of them matches.
> See commit 085423e3e326 for a bit of background.

Yeah. That's kind of sad, because it means there are still broken
libxml2s out there in 2023. Nonetheless, since there are, it is not
optional to fix all three expected-files.

regards, tom lane


From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-15 23:13:35
Message-ID: CAHut+Pthb=J88NadLf0wOrhGmvO8JrpKoqKeDHy4EGJNqTD_5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 16, 2023 at 12:49 AM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:
>
> Accidentally left the VERBOSE settings out -- sorry!
>
> Now it matches the approach used in a xpath test in xml.sql, xml.out,
> xml_1.out and xml_2.out
>
> -- Since different libxml versions emit slightly different
> -- error messages, we suppress the DETAIL in this test.
> \set VERBOSITY terse
> SELECT xpath('/*', '<invalidns xmlns=''&lt;''/>');
> ERROR: could not parse XML document
> \set VERBOSITY default
>
> v11 now correctly sets xml_2.out.
>
> Best, Jim

Firstly, Sorry it seems like I made a mistake and was premature
calling bingo above for v9.
- today I repeated v9 'make check' and found it failing still.
- the new xmlformat tests are OK, but some pre-existing xmlparse tests
are broken.
- see attached file pretty-v9-results

----

OTOH, the absence of xml_2.out from this patch appears to be the
correct explanation for why my results have been differing.

----

Today I fetched and tried the latest v11.

It is failing too, but only just.
- see attached file pretty-v11-results

It looks only due to a whitespace EOF issue in the xml_2.out

@@ -1679,4 +1679,4 @@
-- XML format: empty string
SELECT xmlformat('');
ERROR: invalid XML document
-\set VERBOSITY default
\ No newline at end of file
+\set VERBOSITY default

------

The attached patch update (v12-0002) fixes the xml_2.out for me.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment Content-Type Size
v12-0001-Add-pretty-printed-XML-output-option.patch application/octet-stream 28.0 KB
v12-0002-PS-fix-EOF-for-xml_2.out.patch application/octet-stream 721 bytes
pretty-v9-results application/octet-stream 3.2 KB
pretty-v11-results application/octet-stream 539 bytes

From: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-16 04:38:01
Message-ID: CANNMO+Kwb4_87G8qDeN+Vk1B1vX3HvgoGW+13fJ-b6rj7qbAww@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut <
peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:

> I suggest we call it "xmlformat", which is an established term for this.
>

Some very-very old, rusted memory told me that there was something in
standard – and indeed, it seems it described an optional Feature X069,
“XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing should
go there, to XMLSERIALIZE, to follow the standard?

Oracle also has an option for it in XMLSERIALIZE, although in a slightly
different form, with ability to specify the number of spaces for
indentation
https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-16 07:08:16
Message-ID: 9c58fb29-05c9-3c34-8521-d807e53a53e1@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16.02.23 05:38, Nikolay Samokhvalov wrote:
> On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut
> <peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
>
> I suggest we call it "xmlformat", which is an established term for
> this.
>
>
> Some very-very old, rusted memory told me that there was something in
> standard – and indeed, it seems it described an optional Feature X069,
> “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing
> should go there, to XMLSERIALIZE, to follow the standard?
>
> Oracle also has an option for it in XMLSERIALIZE, although in a
> slightly different form, with ability to specify the number of spaces
> for indentation
> https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.

Hi Nikolay,

My first thought was to call it xmlpretty, to make it consistent with
the jsonb equivalent "jsonb_pretty". But yes, you make a good
observation .. xmlserialize seems to be a much better candidate.

I would be willing to refactor my patch if we agree on xmlserialize.

Thanks for the suggestion!

Jim


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-16 07:51:16
Message-ID: 3ee6a582-c459-68cc-5216-854ab9301e49@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16.02.23 00:13, Peter Smith wrote:
> Today I fetched and tried the latest v11.
>
> It is failing too, but only just.
> - see attached file pretty-v11-results
>
> It looks only due to a whitespace EOF issue in the xml_2.out
>
> @@ -1679,4 +1679,4 @@
> -- XML format: empty string
> SELECT xmlformat('');
> ERROR: invalid XML document
> -\set VERBOSITY default
> \ No newline at end of file
> +\set VERBOSITY default
>
> ------
>
> The attached patch update (v12-0002) fixes the xml_2.out for me.

It works for me too.

Thanks for the quick fix!

Jim


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-16 22:12:09
Message-ID: 1f20326d-8282-73d4-c4b2-7f3812e41c4c@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16.02.23 00:13, Peter Smith wrote:
> Today I fetched and tried the latest v11.
> It is failing too, but only just.
> - see attached file pretty-v11-results
>
> It looks only due to a whitespace EOF issue in the xml_2.out
>
> @@ -1679,4 +1679,4 @@
> -- XML format: empty string
> SELECT xmlformat('');
> ERROR: invalid XML document
> -\set VERBOSITY default
> \ No newline at end of file
> +\set VERBOSITY default
>
> ------
>
> The attached patch update (v12-0002) fixes the xml_2.out for me.

I'm squashing v12-0001 and v12-0002 (v13 attached). There is still an
open discussion on renaming the function to xmlserialize,[1] but it
shouldn't be too difficult to change it later in case we reach a
consensus :)

Thanks!

Jim

1-
https://www.postgresql.org/message-id/CANNMO%2BKwb4_87G8qDeN%2BVk1B1vX3HvgoGW%2B13fJ-b6rj7qbAww%40mail.gmail.com

Attachment Content-Type Size
v13-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 28.0 KB

From: Andrey Borodin <amborodin86(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-17 00:08:21
Message-ID: CAAhFRxjM4XZeQhKTfSKeQSyMAK6xssOu6UCT2rmVkBfi4NtmJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Feb 16, 2023 at 2:12 PM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:
>
> I'm squashing v12-0001 and v12-0002 (v13 attached).
I've looked into the patch. The code looks to conform to usual expectations.
One nit: this comment should have just one asterisk.
+ /**
And I have a dumb question: is this function protected from using
external XML namespaces? What if the user passes some xmlns that will
force it to read namespace data from the server filesystem? Or is it
not possible? I see there are a lot of calls to xml_parse() anyway,
but still...

Best regards, Andrey Borodin.


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-17 09:14:46
Message-ID: beb5a47d-4684-4b2f-8647-bacb6bd60acf@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16.02.23 05:38, Nikolay Samokhvalov wrote:
> On Thu, Feb 9, 2023 at 2:31 AM Peter Eisentraut
> <peter(dot)eisentraut(at)enterprisedb(dot)com> wrote:
>
> I suggest we call it "xmlformat", which is an established term for
> this.
>
>
> Some very-very old, rusted memory told me that there was something in
> standard – and indeed, it seems it described an optional Feature X069,
> “XMLSerialize: INDENT” for XMLSERIALIZE. So probably pretty-printing
> should go there, to XMLSERIALIZE, to follow the standard?
>
> Oracle also has an option for it in XMLSERIALIZE, although in a
> slightly different form, with ability to specify the number of spaces
> for indentation
> https://docs.oracle.com/database/121/SQLRF/functions268.htm#SQLRF06231.

After your comment I'm studying the possibility to extend the existing
xmlserialize function to add the indentation feature. If so, how do you
think it should look like? An extra parameter? e.g.

SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>'::XML AS text,
true);

.. or more or like Oracle does it

SELECT XMLSERIALIZE(DOCUMENT xmltype('<foo><bar>42</bar></foo>') AS BLOB
INDENT)
FROM dual;

Thanks!

Best, Jim


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Andrey Borodin <amborodin86(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-17 19:01:35
Message-ID: 1ff64a2d-a451-8e74-a055-aee137ecf8f7@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17.02.23 01:08, Andrey Borodin wrote:
> On Thu, Feb 16, 2023 at 2:12 PM Jim Jones<jim(dot)jones(at)uni-muenster(dot)de> wrote:
> I've looked into the patch. The code looks to conform to usual
> expectations.
> One nit: this comment should have just one asterisk.
> + /**

Thanks for reviewing! Asterisk removed in v14.

> And I have a dumb question: is this function protected from using
> external XML namespaces? What if the user passes some xmlns that will
> force it to read namespace data from the server filesystem? Or is it
> not possible? I see there are a lot of calls to xml_parse() anyway,
> but still...

According to the documentation,[1] such validations are not supported.

/"The |xml| type does not validate input values against a document type
declaration (DTD), even when the input value specifies a DTD. There is
also currently no built-in support for validating against other XML
schema languages such as XML Schema."/

But I'll have a look at the code to be sure :)

Best, Jim

1- https://www.postgresql.org/docs/15/datatype-xml.html

Attachment Content-Type Size
v14-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 28.0 KB

From: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-17 22:24:13
Message-ID: CANNMO+L5X2ynyXL2RK7gTVA-pj=Z8o8FCCr6RwLfRK_LinjSbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 17, 2023 at 1:14 AM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:

> After your comment I'm studying the possibility to extend the existing
> xmlserialize function to add the indentation feature. If so, how do you
> think it should look like? An extra parameter? e.g.
>
> SELECT xmlserialize(DOCUMENT '<foo><bar>42</bar></foo>'::XML AS text,
> true);
>
> .. or more or like Oracle does it
>
> SELECT XMLSERIALIZE(DOCUMENT xmltype('<foo><bar>42</bar></foo>') AS BLOB
> INDENT)
> FROM dual;
>

My idea was to follow the SQL standard (part 14, SQL/XML); unfortunately,
there is no free version, but there are drafts at
http://www.wiscorp.com/SQLStandards.html.

<XML character string serialization> ::=
XMLSERIALIZE <left paren> [ <document or content> ]

<XML value expression> AS <data type>
[ <XML serialize bom> ]
[ <XML serialize version> ]
[ <XML declaration option> ]

[ <XML serialize indent> ]
<right paren>

<XML serialize indent> ::=
[ NO ] INDENT

Oracle's extension SIZE=n also seems interesting (including a special case
SIZE=0, which means using new-line characters without spaces for each line).


From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-18 18:09:34
Message-ID: d79950fd-b3f5-783c-919a-a95de23c0c2c@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17.02.23 23:24, Nikolay Samokhvalov wrote:
>
> My idea was to follow the SQL standard (part 14, SQL/XML);
> unfortunately, there is no free version, but there are drafts at
> http://www.wiscorp.com/SQLStandards.html
> <http://www.wiscorp.com/SQLStandards.html>.
>
> <XML character string serialization> ::= XMLSERIALIZE <left paren> [
> <document or content> ]
>
> <XML value expression> AS <data type> [ <XML serialize bom> ] [ <XML
> serialize version> ] [ <XML declaration option> ]
>
> [ <XML serialize indent> ] <right paren>
>
> <XML serialize indent> ::= [ NO ] INDENT

Good find. It would be better to use this standard syntax.


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-20 23:06:05
Message-ID: 4c2efca5-8257-2424-6687-07c65e319e95@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18.02.23 19:09, Peter Eisentraut wrote:
> On 17.02.23 23:24, Nikolay Samokhvalov wrote:
>>
>> My idea was to follow the SQL standard (part 14, SQL/XML);
>> unfortunately, there is no free version, but there are drafts at
>> http://www.wiscorp.com/SQLStandards.html
>> <http://www.wiscorp.com/SQLStandards.html>.
>>
>> <XML character string serialization> ::= XMLSERIALIZE <left paren> [
>> <document or content> ]
>>
>> <XML value expression> AS <data type> [ <XML serialize bom> ] [ <XML
>> serialize version> ] [ <XML declaration option> ]
>>
>> [ <XML serialize indent> ] <right paren>
>>
>> <XML serialize indent> ::= [ NO ] INDENT
>
> Good find.  It would be better to use this standard syntax.

As suggested by Peter and Nikolay, v15 now removes the xmlformat
function from the catalog and adds the [NO] INDENT option to
xmlserialize, as described in X069.

postgres=# SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text INDENT);
              xmlserialize
----------------------------------------
 <?xml version="1.0" encoding="UTF-8"?>+
 <foo>                                 +
   <bar>                               +
     <val x="y">42</val>               +
   </bar>                              +
 </foo>                                +

(1 row)

postgres=# SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text NO INDENT);
               xmlserialize
-------------------------------------------
 <foo><bar><val x="y">42</val></bar></foo>
(1 row)

Although the indent feature is designed to work with xml strings of type
DOCUMENT, this implementation also allows the usage of CONTENT type
strings as long as it contains a well-formed xml. It will throw an error
otherwise.

Thanks!

Best, Jim

Attachment Content-Type Size
v15-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 22.6 KB

From: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-22 07:05:38
Message-ID: CANNMO+Kix1DRRa7m4o07qgRHh=0vOiO_x+_PsHOkQvT4GgLrLA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 20, 2023 at 3:06 PM Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> wrote:

> As suggested by Peter and Nikolay, v15 now removes the xmlformat
> function from the catalog and adds the [NO] INDENT option to
> xmlserialize, as described in X069.\
>

Great. I'm checking this patch and it seems, indentation stops working if
we have a text node inside:

gitpod=# select xmlserialize(document '<xml><more>13</more></xml>' as text
indent);
xmlserialize
----------------------------------------
<?xml version="1.0" encoding="UTF-8"?>+
<xml> +
<more>13</more> +
</xml> +

(1 row)

gitpod=# select xmlserialize(document '<xml>text<more>13</more></xml>' as
text indent);
xmlserialize
----------------------------------------
<?xml version="1.0" encoding="UTF-8"?>+
<xml>text<more>13</more></xml> +

(1 row)

Worth to mention, Oracle behaves similarly -- indentation doesn't work:
https://dbfiddle.uk/hRz5sXdM.

But is this as expected? Shouldn't it be like this:
<xml>
text
<more>13</more>
</xml>
?


From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-22 07:20:04
Message-ID: CAHut+PtMHeBBZ2zv0Bk7-4KMkSrrxZ9PZxUCzhpwTGVk+8G3cw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here are some review comments for patch v15-0001

FYI, the patch applies clean and tests OK for me.

======
doc/src/sgml/datatype.sgml

1.
XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
AS <replaceable>type</replaceable> [ { NO INDENT | INDENT } ] )

~

Another/shorter way to write that syntax is like below. For me, it is
easier to read. YMMV.

XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
AS <replaceable>type</replaceable> [ [NO] INDENT ] )

======
src/backend/executor/execExprInterp.c

2. ExecEvalXmlExpr

@@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
{
Datum *argvalue = op->d.xmlexpr.argvalue;
bool *argnull = op->d.xmlexpr.argnull;
-
+ bool indent = op->d.xmlexpr.xexpr->indent;
+ text *data;
/* argument type is known to be xml */
Assert(list_length(xexpr->args) == 1);
Missing whitespace after the variable declarations

~~~

3.
+
+ data = xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+ if(indent)
+ *op->resvalue = PointerGetDatum(xmlformat(data));
+ else
+ *op->resvalue = PointerGetDatum(data);
+
}

Unnecessary blank line at the end.
======
src/backend/utils/adt/xml.

4. xmlformat

+#else
+ NO_XML_SUPPORT();
+return 0;
+#endif

Wrong indentation (return 0) in the indentation function? ;-)

------
Kind Regards,
Peter Smith.
Fujitsu Australia


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-22 09:15:50
Message-ID: 39c309eb-389b-4b20-47cd-a9f41e25dbc4@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22.02.23 08:05, Nikolay Samokhvalov wrote:
>
> But is this as expected? Shouldn't it be like this:
> <xml>
>   text
>   <more>13</more>
> </xml>
> ?

Oracle and other parsers I know also do not work well with mixed
contents.[1,2] I believe libxml2's parser does not know where to put the
newline, as mixed values can contain more than one text node:

<xml>text<more>13</more> text2 text3</xml> [3]

And applying this logic the output could look like this ..

<xml>text
  <more>13</more>text2 text3
</xml>

or even this

<xml>
  text
  <more>13</more>
  text2 text3
</xml>

.. which doesn't seem right either. Perhaps a note about mixed contents
in the docs would make things clearer?

Thanks for the review!

Jim

1- https://xmlpretty.com/

2- https://www.samltool.com/prettyprint.php

3- https://dbfiddle.uk/_CcC8h3I


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-22 09:35:59
Message-ID: 2eac4d78-2b20-e8ec-54cc-22a88653a1cc@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22.02.23 08:20, Peter Smith wrote:
> Here are some review comments for patch v15-0001
>
> FYI, the patch applies clean and tests OK for me.
>
> ======
> doc/src/sgml/datatype.sgml
>
> 1.
> XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
> AS <replaceable>type</replaceable> [ { NO INDENT | INDENT } ] )
>
> ~
>
> Another/shorter way to write that syntax is like below. For me, it is
> easier to read. YMMV.
>
> XMLSERIALIZE ( { DOCUMENT | CONTENT } <replaceable>value</replaceable>
> AS <replaceable>type</replaceable> [ [NO] INDENT ] )
Indeed simpler to read.
> ======
> src/backend/executor/execExprInterp.c
>
> 2. ExecEvalXmlExpr
>
> @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
> {
> Datum *argvalue = op->d.xmlexpr.argvalue;
> bool *argnull = op->d.xmlexpr.argnull;
> -
> + bool indent = op->d.xmlexpr.xexpr->indent;
> + text *data;
> /* argument type is known to be xml */
> Assert(list_length(xexpr->args) == 1);
> Missing whitespace after the variable declarations
Whitespace added.
> ~~~
>
> 3.
> +
> + data = xmltotext_with_xmloption(DatumGetXmlP(value),
> + xexpr->xmloption);
> + if(indent)
> + *op->resvalue = PointerGetDatum(xmlformat(data));
> + else
> + *op->resvalue = PointerGetDatum(data);
> +
> }
>
> Unnecessary blank line at the end.
blank line removed.
> ======
> src/backend/utils/adt/xml.
>
> 4. xmlformat
>
> +#else
> + NO_XML_SUPPORT();
> +return 0;
> +#endif
>
> Wrong indentation (return 0) in the indentation function? ;-)

indentation corrected.

v16 attached.

Thanks a lot!

Jim

Attachment Content-Type Size
v16-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 22.6 KB

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-22 22:45:02
Message-ID: CAHut+PsgXxbzzsG+VqHUa4Kz3G4B-anNmF3-GENwuUdfMJtowA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here are some review comments for patch v16-0001.

======
> src/backend/executor/execExprInterp.c
>
> 2. ExecEvalXmlExpr
>
> @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
> {
> Datum *argvalue = op->d.xmlexpr.argvalue;
> bool *argnull = op->d.xmlexpr.argnull;
> -
> + bool indent = op->d.xmlexpr.xexpr->indent;
> + text *data;
> /* argument type is known to be xml */
> Assert(list_length(xexpr->args) == 1);
> Missing whitespace after the variable declarations
Whitespace added.

~

Oh, I meant something different to that fix. I meant there is a
missing blank line after the last ('data') variable declaration.

======
Test code.

I wondered if there ought to be a test that demonstrates explicitly
saying NO INDENT will give the identical result to just omitting it.

For example:

test=# -- no indent is default
test=# SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
?column?
----------
t
(1 row)

test=# SELECT xmlserialize(CONTENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
?column?
----------
t
(1 row)

------
Kind Regards,
Peter Smith.
Fujitsu Australia


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-22 23:41:53
Message-ID: fb2a543b-e615-d9ba-8402-909631f9b212@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 22.02.23 23:45, Peter Smith wrote:
> src/backend/executor/execExprInterp.c
>> 2. ExecEvalXmlExpr
>>
>> @@ -3829,7 +3829,8 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
>> {
>> Datum *argvalue = op->d.xmlexpr.argvalue;
>> bool *argnull = op->d.xmlexpr.argnull;
>> -
>> + bool indent = op->d.xmlexpr.xexpr->indent;
>> + text *data;
>> /* argument type is known to be xml */
>> Assert(list_length(xexpr->args) == 1);
>> Missing whitespace after the variable declarations
> Whitespace added.
>
> ~
>
> Oh, I meant something different to that fix. I meant there is a
> missing blank line after the last ('data') variable declaration.
I believe I see it now (it took me a while) :)
> ======
> Test code.
>
> I wondered if there ought to be a test that demonstrates explicitly
> saying NO INDENT will give the identical result to just omitting it.
>
> For example:
>
> test=# -- no indent is default
> test=# SELECT xmlserialize(DOCUMENT '<foo><bar><val
> x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
> '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
> ?column?
> ----------
> t
> (1 row)
>
> test=# SELECT xmlserialize(CONTENT '<foo><bar><val
> x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
> '<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
> ?column?
> ----------
> t
> (1 row)

Actually NO INDENT just ignores this feature and doesn't call the
function at all, so in this particular case the result sets will always
be identical. But yes, I totally agree that a test case for that is also
important.

v17 attached.

Thanks!

Best, Jim

Attachment Content-Type Size
v17-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 24.7 KB

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-23 01:52:03
Message-ID: CAHut+Pvryt_+wA7TwKVaaF0e6Pm5LaPZQLKatp=78oFd8vMh3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here are my review comments for patch v17-0001.

======
src/test/regress/sql/xml.sql

The blank line(s) which previously separated the xmlserialize tests
from the xml IS [NOT] DOCUMENT tests are now missing...

e.g.

-- indent different encoding (returns UTF-8)
SELECT xmlserialize(DOCUMENT '<?xml version="1.0"
encoding="ISO-8859-1"?><foo><bar><val>&#52;&#50;</val></bar></foo>' AS
text INDENT);
SELECT xmlserialize(CONTENT '<?xml version="1.0"
encoding="ISO-8859-1"?><foo><bar><val>&#52;&#50;</val></bar></foo>' AS
text INDENT);
-- 'no indent' = not using 'no indent'
SELECT xmlserialize(DOCUMENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(DOCUMENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xmlserialize(CONTENT '<foo><bar><val
x="y">42</val></bar></foo>' AS text) = xmlserialize(CONTENT
'<foo><bar><val x="y">42</val></bar></foo>' AS text NO INDENT);
SELECT xml '<foo>bar</foo>' IS DOCUMENT;
SELECT xml '<foo>bar</foo><bar>foo</bar>' IS DOCUMENT;
SELECT xml '<abc/>' IS NOT DOCUMENT;
SELECT xml 'abc' IS NOT DOCUMENT;
SELECT '<>' IS NOT DOCUMENT;

~~

Apart from that, patch v17 LGTM.

------
Kind Regards,
Peter Smith.
Fujitsu Australia


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-23 06:36:16
Message-ID: 6cb296fe-4b49-3691-633e-2e2366997b04@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 23.02.23 02:52, Peter Smith wrote:
> Here are my review comments for patch v17-0001.
>
> ======
> src/test/regress/sql/xml.sql
>
> The blank line(s) which previously separated the xmlserialize tests
> from the xml IS [NOT] DOCUMENT tests are now missing...

v18 adds a new line in the xml.sql file to separate the xmlserialize
test cases from the rest.

Thanks!

Best, Jim

Attachment Content-Type Size
v18-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 24.7 KB

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-23 07:51:47
Message-ID: 3baf8f25-3bd7-7db5-0b31-e4c92b6269ba@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 23.02.23 07:36, Jim Jones wrote:
> On 23.02.23 02:52, Peter Smith wrote:
>> Here are my review comments for patch v17-0001.
>>
>> ======
>> src/test/regress/sql/xml.sql
>>
>> The blank line(s) which previously separated the xmlserialize tests
>> from the xml IS [NOT] DOCUMENT tests are now missing...
>
> v18 adds a new line in the xml.sql file to separate the xmlserialize
> test cases from the rest.

In kwlist.h you have

PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)

but you can actually make it BARE_LABEL, which is preferable.

More importantly, you need to add the new keyword to the
bare_label_keyword production in gram.y. I thought we had some tooling
in the build system to catch this kind of omission, but it's apparently
not working right now.

Elsewhere, let's rename the xmlformat() C function to xmlserialize() (or
maybe something like xmlserialize_indent()), so the association is clearer.


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-23 08:20:00
Message-ID: 1cbe1087-6acb-2b0a-dc9b-0a2d5b99e466@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 23.02.23 08:51, Peter Eisentraut wrote:
> In kwlist.h you have
>
>     PG_KEYWORD("indent", INDENT, UNRESERVED_KEYWORD, AS_LABEL)
>
> but you can actually make it BARE_LABEL, which is preferable.
>
> More importantly, you need to add the new keyword to the
> bare_label_keyword production in gram.y.  I thought we had some
> tooling in the build system to catch this kind of omission, but it's
> apparently not working right now.
Entry in kwlist.h changed to BARE_LABEL.
>
> Elsewhere, let's rename the xmlformat() C function to xmlserialize()
> (or maybe something like xmlserialize_indent()), so the association is
> clearer.
>
xmlserialize_indent sounds much better and makes the association indeed
clearer. Changed in v19.

v19 attached.

Thanks for the review!

Best, Jim

Attachment Content-Type Size
v19-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 24.9 KB

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-02-23 22:30:59
Message-ID: CAHut+PvBUTf9p+Y6+mpWpz59=3eL_EKQ+7Ysxpa9hYGdncDMdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The patch v19 LGTM.

- v19 applies cleanly for me
- Full clean build OK
- HTML docs build and render OK
- The 'make check' tests all pass for me
- Also cfbot reports latest patch has no errors -- http://cfbot.cputube.org/

So, I marked it a "Ready for Committer" in the CF --
https://commitfest.postgresql.org/42/4162/

------
Kind Regards,
Peter Smith.
Fujitsu Australia


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-09 17:38:07
Message-ID: 429187.1678383487@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While reviewing this patch, I started to wonder why we don't eliminate
the maintenance hassle of xml_1.out by putting in a short-circuit
at the top of the test, similar to those in some other scripts:

/* skip test if XML support not compiled in */
SELECT '<value>one</value>'::xml;
\if :ERROR
\quit
\endif

(and I guess xmlmap.sql could get the same treatment).

The only argument I can think of against it is that the current
approach ensures we produce a clean error (and not, say, a crash)
for all xml.c entry points not just xml_in. I'm not sure how much
that's worth though. The compiler/linker would tell us if we miss
compiling out every reference to libxml2.

Thoughts?

regards, tom lane


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-09 18:41:29
Message-ID: 4b1e081b-831a-e539-9864-8d8a16a66669@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.03.23 18:38, Tom Lane wrote:
> While reviewing this patch, I started to wonder why we don't eliminate
> the maintenance hassle of xml_1.out by putting in a short-circuit
> at the top of the test, similar to those in some other scripts:
>
> /* skip test if XML support not compiled in */
> SELECT '<value>one</value>'::xml;
> \if :ERROR
> \quit
> \endif
>
> (and I guess xmlmap.sql could get the same treatment).
>
> The only argument I can think of against it is that the current
> approach ensures we produce a clean error (and not, say, a crash)
> for all xml.c entry points not just xml_in. I'm not sure how much
> that's worth though. The compiler/linker would tell us if we miss
> compiling out every reference to libxml2.
>
> Thoughts?
>
> regards, tom lane

Hi Tom,

I agree it would make things easier and it could indeed save some time
(and some CI runs ;)).

However, checking in the absence of libxml2 if an error message is
raised, and checking if this error message is the one we expect, is IMHO
also a very nice test. But I guess I could also live with skipping the
whole thing.

Best, Jim


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-09 20:21:36
Message-ID: 532259.1678393296@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Smith <smithpb2250(at)gmail(dot)com> writes:
> The patch v19 LGTM.

I've looked through this now, and have some minor complaints and a major
one. The major one is that it doesn't work for XML that doesn't satisfy
IS DOCUMENT. For example,

regression=# select '<bar><val x="y">42</val></bar><foo></foo>'::xml is document;
?column?
----------
f
(1 row)

regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text);
xmlserialize
-------------------------------------------
<bar><val x="y">42</val></bar><foo></foo>
(1 row)

regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text indent);
ERROR: invalid XML document
DETAIL: line 1: Extra content at the end of the document
<bar><val x="y">42</val></bar><foo></foo>
^

This is not what the documentation promises, and I don't think it's
good enough --- the SQL spec has no restriction saying you can't
use INDENT with CONTENT. I tried adjusting things so that we call
xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag,
but all that got me was empty output (except for a document header).
It seems like xmlDocDumpFormatMemory is not the thing to use, at least
not in the CONTENT case. But libxml2 has a few other "dump"
functions, so maybe we can use a different one? I see we are using
xmlNodeDump elsewhere, and that has a format option, so maybe there's
a way forward there.

A lesser issue is that INDENT tacks on a document header (XML declaration)
whether there was one or not. I'm not sure whether that's an appropriate
thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT
case. We have code that can strip off the header again, but we
need to figure out exactly when to apply it.

I also suspect that it's outright broken to attach a header claiming
the data is now in UTF8 encoding. If the database encoding isn't
UTF8, then either that's a lie or we now have an encoding violation.

Another thing that's mildly irking me is that the current
factorization of this code will result in xml_parse'ing the data
twice, if you have both DOCUMENT and INDENT specified. We could
consider avoiding that if we merged the indentation functionality
into xmltotext_with_xmloption, but it's probably premature to do so
when we haven't figured out how to get the output right --- we might
end up needing two xml_parse calls anyway with different parameters,
perhaps.

I also had a bunch of cosmetic complaints (mostly around this having
a bad case of add-at-the-end-itis), which I've cleaned up in the
attached v20. This doesn't address any of the above, however.

regards, tom lane

Attachment Content-Type Size
v20-0001-Add-pretty-printed-XML-output-option.patch text/x-diff 24.1 KB

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-10 13:30:07
Message-ID: 906cfd1d-63a0-3cf5-7291-2dadb2e149ff@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks a lot for the review!

On 09.03.23 21:21, Tom Lane wrote:
> I've looked through this now, and have some minor complaints and a major
> one. The major one is that it doesn't work for XML that doesn't satisfy
> IS DOCUMENT. For example,
>
> regression=# select '<bar><val x="y">42</val></bar><foo></foo>'::xml is document;
> ?column?
> ----------
> f
> (1 row)
>
> regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text);
> xmlserialize
> -------------------------------------------
> <bar><val x="y">42</val></bar><foo></foo>
> (1 row)
>
> regression=# select xmlserialize (content '<bar><val x="y">42</val></bar><foo></foo>' as text indent);
> ERROR: invalid XML document
> DETAIL: line 1: Extra content at the end of the document
> <bar><val x="y">42</val></bar><foo></foo>
> ^

I assumed it should fail because the XML string doesn't have a
singly-rooted XML. Oracle has this feature implemented and it does not
seem to allow non singly-rooted strings either[1]. Also, some the tools
I use also fail in this case[2,3]

How do you suggest the output should look like? Does the SQL spec also
define it? I can't find it online :(

> This is not what the documentation promises, and I don't think it's
> good enough --- the SQL spec has no restriction saying you can't
> use INDENT with CONTENT. I tried adjusting things so that we call
> xml_parse() with the appropriate DOCUMENT or CONTENT xmloption flag,
> but all that got me was empty output (except for a document header).
> It seems like xmlDocDumpFormatMemory is not the thing to use, at least
> not in the CONTENT case. But libxml2 has a few other "dump"
> functions, so maybe we can use a different one? I see we are using
> xmlNodeDump elsewhere, and that has a format option, so maybe there's
> a way forward there.
>
> A lesser issue is that INDENT tacks on a document header (XML declaration)
> whether there was one or not. I'm not sure whether that's an appropriate
> thing to do in the DOCUMENT case, but it sure seems weird in the CONTENT
> case. We have code that can strip off the header again, but we
> need to figure out exactly when to apply it.
I replaced xmlDocDumpFormatMemory with xmlSaveToBuffer and used to
option XML_SAVE_NO_DECL for input docs with XML declaration. It no
longer returns a XML declaration if the input doc does not contain it.
> I also suspect that it's outright broken to attach a header claiming
> the data is now in UTF8 encoding. If the database encoding isn't
> UTF8, then either that's a lie or we now have an encoding violation.
I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
uses the encoding of the given doc and UTF8 if not provided.
> Another thing that's mildly irking me is that the current
> factorization of this code will result in xml_parse'ing the data
> twice, if you have both DOCUMENT and INDENT specified. We could
> consider avoiding that if we merged the indentation functionality
> into xmltotext_with_xmloption, but it's probably premature to do so
> when we haven't figured out how to get the output right --- we might
> end up needing two xml_parse calls anyway with different parameters,
> perhaps.
>
> I also had a bunch of cosmetic complaints (mostly around this having
> a bad case of add-at-the-end-itis), which I've cleaned up in the
> attached v20. This doesn't address any of the above, however.
I swear to god I have no idea what "add-at-the-end-itis" means :)
> regards, tom lane

Thanks a lot!

Best, Jim

1 - https://dbfiddle.uk/WUOWtjBU

2 - https://www.samltool.com/prettyprint.php

3 - https://xmlpretty.com/xmlpretty

Attachment Content-Type Size
v21-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 29.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-10 14:32:56
Message-ID: 803304.1678458776@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> writes:
> On 09.03.23 21:21, Tom Lane wrote:
>> I've looked through this now, and have some minor complaints and a major
>> one. The major one is that it doesn't work for XML that doesn't satisfy
>> IS DOCUMENT. For example,

> How do you suggest the output should look like?

I'd say a series of node trees, each starting on a separate line.

>> I also suspect that it's outright broken to attach a header claiming
>> the data is now in UTF8 encoding. If the database encoding isn't
>> UTF8, then either that's a lie or we now have an encoding violation.

> I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
> uses the encoding of the given doc and UTF8 if not provided.

Mmmm .... doing this differently from what we do elsewhere does not
sound like the right path forward. The input *is* (or had better be)
in the database encoding.

regards, tom lane


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-13 12:08:11
Message-ID: efe0b19b-d41c-f8ab-f3b8-afb0108f3706@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10.03.23 15:32, Tom Lane wrote:
> Jim Jones<jim(dot)jones(at)uni-muenster(dot)de> writes:
>> On 09.03.23 21:21, Tom Lane wrote:
>>> I've looked through this now, and have some minor complaints and a major
>>> one. The major one is that it doesn't work for XML that doesn't satisfy
>>> IS DOCUMENT. For example,
>> How do you suggest the output should look like?
> I'd say a series of node trees, each starting on a separate line.

v22 attached enables the usage of INDENT with non singly-rooted xml.

postgres=# SELECT xmlserialize (CONTENT '<bar><val
x="y">42</val></bar><foo>73</foo>' AS text INDENT);
     xmlserialize
-----------------------
 <bar>                +
   <val x="y">42</val>+
 </bar>               +
 <foo>73</foo>
(1 row)

I tried several libxml2 dump functions and none of them could cope very
well with an xml string without a root node. So added them into a
temporary root node, so that I could iterate over its children and add
them one by one (formatted) into the output buffer.

I slightly modified the existing xml_parse() function to return the list
of nodes parsed by xmlParseBalancedChunkMemory:

xml_parse(text *data, XmlOptionType xmloption_arg, bool preserve_whitespace,
          int encoding, Node *escontext, *xmlNodePtr *parsed_nodes*)

res_code = xmlParseBalancedChunkMemory(doc, NULL, NULL, 0,
utf8string + count, *parsed_nodes*);

>> I was mistakenly calling xml_parse with GetDatabaseEncoding(). It now
>> uses the encoding of the given doc and UTF8 if not provided.
> Mmmm .... doing this differently from what we do elsewhere does not
> sound like the right path forward. The input *is* (or had better be)
> in the database encoding.
I changed that behavior. It now uses GetDatabaseEncoding();

Thanks!

Best, Jim

Attachment Content-Type Size
v22-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 35.3 KB

From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-14 12:58:28
Message-ID: 396547e5-1025-d839-e23a-d669b4881fd4@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09.03.23 21:21, Tom Lane wrote:
> Peter Smith <smithpb2250(at)gmail(dot)com> writes:
>> The patch v19 LGTM.
> Another thing that's mildly irking me is that the current
> factorization of this code will result in xml_parse'ing the data
> twice, if you have both DOCUMENT and INDENT specified. We could
> consider avoiding that if we merged the indentation functionality
> into xmltotext_with_xmloption, but it's probably premature to do so
> when we haven't figured out how to get the output right --- we might
> end up needing two xml_parse calls anyway with different parameters,
> perhaps.

Just a thought: since xmlserialize_indent also calls xml_parse() to
build the xmlDocPtr, couldn't we simply bypass
xmltotext_with_xmloption() in case of INDENT is specified?

Something like this:

diff --git a/src/backend/executor/execExprInterp.c
b/src/backend/executor/execExprInterp.c
index 19351fe..ea808dd 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -3829,6 +3829,7 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
        {
                Datum      *argvalue = op->d.xmlexpr.argvalue;
                bool       *argnull = op->d.xmlexpr.argnull;
+                               text       *result;

                /* argument type is known to be xml */
                Assert(list_length(xexpr->args) == 1);
@@ -3837,8 +3838,14 @@ ExecEvalXmlExpr(ExprState *state, ExprEvalStep *op)
                        return;
                value = argvalue[0];

-                               *op->resvalue =
PointerGetDatum(xmltotext_with_xmloption(DatumGetXmlP(value),
- xexpr->xmloption));
+                               if (xexpr->indent)
+                                       result =
xmlserialize_indent(DatumGetXmlP(value),
+ xexpr->xmloption);
+                               else
+                                       result =
xmltotext_with_xmloption(DatumGetXmlP(value),
+ xexpr->xmloption);
+
+                               *op->resvalue = PointerGetDatum(result);
                *op->resnull = false;
        }

        break;


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-14 17:40:25
Message-ID: 2752578.1678815625@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> writes:
> [ v22-0001-Add-pretty-printed-XML-output-option.patch ]

I poked at this for awhile and ran into a problem that I'm not sure
how to solve: it misbehaves for input with embedded DOCTYPE.

regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
xmlserialize
--------------
<!DOCTYPE a>+
<a></a> +

(1 row)

regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
xmlserialize
--------------

(1 row)

The bad result for CONTENT is because xml_parse() decides to
parse_as_document, but xmlserialize_indent has no idea that happened
and tries to use the content_nodes list anyway. I don't especially
care for the laissez faire "maybe we'll set *content_nodes and maybe
we won't" API you adopted for xml_parse, which seems to be contributing
to the mess. We could pass back more info so that xmlserialize_indent
knows what really happened. However, that won't fix the bogus output
for the DOCUMENT case. Are we perhaps passing incorrect flags to
xmlSaveToBuffer?

regards, tom lane


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-14 22:57:22
Message-ID: abd25443-ef6d-7b8a-c593-a2a991d3e5ce@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14.03.23 18:40, Tom Lane wrote:
> Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> writes:
>> [ v22-0001-Add-pretty-printed-XML-output-option.patch ]
> I poked at this for awhile and ran into a problem that I'm not sure
> how to solve: it misbehaves for input with embedded DOCTYPE.
>
> regression=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' as text indent);
> xmlserialize
> --------------
> <!DOCTYPE a>+
> <a></a> +
>
> (1 row)

The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements
to be serialized with start-end tag pairs. Removing it did the trick ...

postgres=# SELECT xmlserialize(DOCUMENT '<!DOCTYPE a><a/>' AS text INDENT);
 xmlserialize
--------------
 <!DOCTYPE a>+
 <a/>        +

(1 row)

... but as a side effect empty start-end tags will be now serialized as
empty elements

postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text
INDENT);
 xmlserialize
--------------
 <foo>       +
   <bar/>    +
 </foo>
(1 row)

It seems to be the standard behavior of other xml indent tools
(including Oracle)

> regression=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' as text indent);
> xmlserialize
> --------------
>
> (1 row)
>
> The bad result for CONTENT is because xml_parse() decides to
> parse_as_document, but xmlserialize_indent has no idea that happened
> and tries to use the content_nodes list anyway. I don't especially
> care for the laissez faire "maybe we'll set *content_nodes and maybe
> we won't" API you adopted for xml_parse, which seems to be contributing
> to the mess. We could pass back more info so that xmlserialize_indent
> knows what really happened.

I added a new (nullable) parameter to the xml_parse function that will
return the actual XmlOptionType used to parse the xml data. Now
xmlserialize_indent knows how the data was really parsed:

postgres=# SELECT xmlserialize(CONTENT '<!DOCTYPE a><a/>' AS text INDENT);
 xmlserialize
--------------
 <!DOCTYPE a>+
 <a/>        +

(1 row)

I added test cases for these queries.

v23 attached.

Thanks!

Best, Jim

Attachment Content-Type Size
v23-0001-Add-pretty-printed-XML-output-option.patch text/x-patch 39.5 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-14 23:25:21
Message-ID: 3057762.1678836321@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jim Jones <jim(dot)jones(at)uni-muenster(dot)de> writes:
> On 14.03.23 18:40, Tom Lane wrote:
>> I poked at this for awhile and ran into a problem that I'm not sure
>> how to solve: it misbehaves for input with embedded DOCTYPE.

> The issue was the flag XML_SAVE_NO_EMPTY. It was forcing empty elements
> to be serialized with start-end tag pairs. Removing it did the trick ...
> ... but as a side effect empty start-end tags will be now serialized as
> empty elements

> postgres=# SELECT xmlserialize(CONTENT '<foo><bar></bar></foo>' AS text
> INDENT);
>  xmlserialize
> --------------
>  <foo>       +
>    <bar/>    +
>  </foo>
> (1 row)

Huh, interesting. That is a legitimate pretty-fication of the input,
I suppose, but some people might think it goes beyond the charter of
"indentation". I'm okay with it personally; anyone want to object?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-15 21:13:36
Message-ID: 3622266.1678914816@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Huh, interesting. That is a legitimate pretty-fication of the input,
> I suppose, but some people might think it goes beyond the charter of
> "indentation". I'm okay with it personally; anyone want to object?

Hearing no objections to that, I moved ahead with this.

It occurred to me to test v23 for memory leaks, and it had bad ones:

* the "newline" node used in the CONTENT case never got freed.
Making another one for each line wasn't helping, either.

* libxml, at least in the 2.9.7 version I have here, turns out to
leak memory if you pass a non-null encoding to xmlSaveToBuffer.
But AFAICS we don't really need to do that, because the last thing
we want is for libxml to try to do any encoding conversion.

After cleaning that up, I saw that we were indeed doing essentially
duplicative xml_parse calls for the DOCUMENT check and the indentation
work, so I refactored to allow just one call to serve.

Pushed with those changes and some other cosmetic cleanup.
Thanks for working so hard on this!

(Now to keep an eye on the buildfarm, to see if other versions of
libxml work like mine ...)

BTW, the libxml leak problem seems to extend to other cases too.
I tested with code like

do $$
declare x xml; t text;
begin
x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
for i in 1..10000000 loop
t := xmlserialize(document x as text);
end loop;
raise notice 't = %', t;
end;
$$;

That case is fine, but if you change the encoding spec to "latin1",
it leaks like mad. That problem is not the fault of this patch,
I don't think. I wonder if we need to do something to prevent
libxml from seeing encoding declarations other than utf8?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Memory leak in libxml2 (was Re: [PATCH] Add pretty-printed XML output option)
Date: 2023-03-15 21:38:37
Message-ID: 3627896.1678916317@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> BTW, the libxml leak problem seems to extend to other cases too.
> I tested with code like

> do $$
> declare x xml; t text;
> begin
> x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
> for i in 1..10000000 loop
> t := xmlserialize(document x as text);
> end loop;
> raise notice 't = %', t;
> end;
> $$;

> That case is fine, but if you change the encoding spec to "latin1",
> it leaks like mad. That problem is not the fault of this patch,
> I don't think. I wonder if we need to do something to prevent
> libxml from seeing encoding declarations other than utf8?

After a bit of further testing: the leak is present in libxml2 2.9.7
which is what I have on this RHEL8 box, but it seems not to occur
in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora
isn't carrying any relevant local patch).

So maybe it's worth working around that, or maybe it isn't.

regards, tom lane


From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>, Peter Smith <smithpb2250(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: Memory leak in libxml2 (was Re: [PATCH] Add pretty-printed XML output option)
Date: 2023-03-15 22:17:12
Message-ID: 07F05CE1-1D11-4471-8CB2-5CE3CF5A4915@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> On 15 Mar 2023, at 22:38, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> After a bit of further testing: the leak is present in libxml2 2.9.7
> which is what I have on this RHEL8 box, but it seems not to occur
> in libxml2 2.10.3 (tested on Fedora 37, and I verified that Fedora
> isn't carrying any relevant local patch).
>
> So maybe it's worth working around that, or maybe it isn't.

2.9.7 is from November 2017 and 2.10.3 is from October 2022, so depending on
when in that timespan the issue was fixed it might be in a release which will
be with us for quite some time. The lack of reports (that I was able to find)
indicate that it might be rare in production though?

--
Daniel Gustafsson


From: Jim Jones <jim(dot)jones(at)uni-muenster(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Nikolay Samokhvalov <samokhvalov(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andrey Borodin <amborodin86(at)gmail(dot)com>
Subject: Re: [PATCH] Add pretty-printed XML output option
Date: 2023-03-16 19:07:06
Message-ID: 1c7cf9f0-14c3-0341-6894-8ca577bf325e@uni-muenster.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15.03.23 22:13, Tom Lane wrote:
> I wrote:
> It occurred to me to test v23 for memory leaks, and it had bad ones:
> * the "newline" node used in the CONTENT case never got freed.
> Making another one for each line wasn't helping, either.
Oh, I did really miss that one. Thanks!
> Pushed with those changes and some other cosmetic cleanup.
> Thanks for working so hard on this!
Great! Thank you, Peter and Andrey for the very nice reviews.
> BTW, the libxml leak problem seems to extend to other cases too.
> I tested with code like
>
> do $$
> declare x xml; t text;
> begin
> x := '<?xml version="1.0" encoding="utf8"?><foo><bar><val>73</val></bar></foo>';
> for i in 1..10000000 loop
> t := xmlserialize(document x as text);
> end loop;
> raise notice 't = %', t;
> end;
> $$;
>
> That case is fine, but if you change the encoding spec to "latin1",
> it leaks like mad. That problem is not the fault of this patch,
> I don't think. I wonder if we need to do something to prevent
> libxml from seeing encoding declarations other than utf8?

In my environment (libxml2 v2.9.10 and Ubuntu 22.04) I couldn't
reproduce this memory leak. It's been most likely fixed in further
libxml2 versions. Unfortunately their gitlab page has no release notes
from versions prior to 2.9.13 :(

Best, Jim