[REVIEW] Re: Fix xpath() to return namespace definitions

Lists: pgsql-hackers
From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Fix xpath() to return namespace definitions
Date: 2014-05-30 09:04:33
Message-ID: CACQjQLo18s5Lpx9ngh5Qd1mhB4OukC12MzcjFwy0LQr2kw2DoQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all,

While developing some XML processing queries, i stumbled on an old bug
mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
repeated xpath() that apparently mess up namespaces.

Source of the bug is that libxml2's xmlNodeDump won't output XML namespace
definitions that declared in the node's parents. As per
https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
the behavior is intentional.

This patch uses function xmlCopyNode that copies a node, including its
namespace definitions as required (without unused namespace in the node or
its children). When the copy dumped, the resulting XML is complete with its
namespaces. Calling xmlCopyNode will need additional memory to execute, but
reimplementing its routine to handle namespace definition will introduce
much complexity to the code.

Note: This is my very first postgresql patch.

--
Ali Akbar

Attachment Content-Type Size
xpath-ns-fix.patch text/x-patch 4.0 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix xpath() to return namespace definitions
Date: 2014-06-04 19:37:42
Message-ID: CA+TgmobJ1J+O-_VizGB2cht_pvVCgeqSXm189XDCyF-EAmHxPQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 30, 2014 at 5:04 AM, Ali Akbar <the(dot)apaan(at)gmail(dot)com> wrote:
> While developing some XML processing queries, i stumbled on an old bug
> mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
> repeated xpath() that apparently mess up namespaces.
>
> Source of the bug is that libxml2's xmlNodeDump won't output XML namespace
> definitions that declared in the node's parents. As per
> https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
> the behavior is intentional.
>
> This patch uses function xmlCopyNode that copies a node, including its
> namespace definitions as required (without unused namespace in the node or
> its children). When the copy dumped, the resulting XML is complete with its
> namespaces. Calling xmlCopyNode will need additional memory to execute, but
> reimplementing its routine to handle namespace definition will introduce
> much complexity to the code.
>
> Note: This is my very first postgresql patch.

Please add your patch here so we don't forget about it:

https://commitfest.postgresql.org/action/commitfest_view/open

Please see also:

https://wiki.postgresql.org/wiki/Submitting_a_Patch

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix xpath() to return namespace definitions
Date: 2014-06-04 22:35:14
Message-ID: CACQjQLr=vxvrU2JqYBg7v7r=hjMOKn9_n-z-ZWRKVOzHa47R-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-06-05 2:37 GMT+07:00 Robert Haas <robertmhaas(at)gmail(dot)com>:

> Please add your patch here so we don't forget about it:
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>

Added: https://commitfest.postgresql.org/action/patch_view?id=1461

> Please see also:
>
> https://wiki.postgresql.org/wiki/Submitting_a_Patch
>

Thanks, i've read it.
As for the suggestion there: "Start out by reviewing a patch or responding
to email on the lists. Even if it is unrelated to what you're doing", i'll
look around other posts in this list.

For back versions, i think because this patch changes xpath() behavior, we
will only apply this to future versions. The old behavior is wrong
(according to XPath standard) for not including namespaces, but maybe there
are some application that depends on the old behavior.

Thanks!
--
Ali Akbar


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-06-16 15:52:05
Message-ID: 20140616155205.GB14090@toroid.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

At 2014-05-30 16:04:33 +0700, the(dot)apaan(at)gmail(dot)com wrote:
>
> While developing some XML processing queries, i stumbled on an old bug
> mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
> repeated xpath() that apparently mess up namespaces.

Thanks for the patch, and welcome to Postgres development.

I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).

I'm marking this ready for committer.

-- Abhijit

Attachment Content-Type Size
xpath-ns-fix-2.patch text/x-diff 4.0 KB

From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-06-22 14:32:00
Message-ID: CACQjQLoi48pc_jq8sCMt6yR-4w8cuy62sG12MsiUDBu_swzQDA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-06-16 22:52 GMT+07:00 Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>:

> Thanks for the patch, and welcome to Postgres development.
>

> I can confirm that it works fine. I have attached here a very slightly
> tweaked version of the patch (removed trailing whitespace, and changed
> some comment text).
>
> I'm marking this ready for committer.
>

Thanks for the review. Hope i will be able to contribute a little here and
there in the future.

--
Ali Akbar


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Ali Akbar <the(dot)apaan(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-07-07 23:25:06
Message-ID: 13665.1404775506@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> writes:
> I can confirm that it works fine. I have attached here a very slightly
> tweaked version of the patch (removed trailing whitespace, and changed
> some comment text).

> I'm marking this ready for committer.

I don't know enough about XML/XPATH to know if this is a good idea or not,
but I did go read the documentation of xmlCopyNode(), and I notice it says
the second argument is

extended: if 1 do a recursive copy (properties, namespaces and children
when applicable) if 2 copy properties and namespaces (when applicable)

Do we really want it to "copy children"? Perhaps the value should be 2?
(I have no idea, I'm just raising the question.)

I also notice that it says

Returns: a new #xmlNodePtr, or NULL in case of error.

and the patch is failing to cover the error case. Most likely, that
would represent out-of-memory, so it definitely seems to be something
we should account for.

regards, tom lane


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-07-08 10:03:19
Message-ID: CACQjQLoO++VJe27YNZzAnzmpzeJW3ga9Un20KtpQ4WOpj-Ho3A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I don't know enough about XML/XPATH to know if this is a good idea or not,
>

Actually currently because of the namespace problem, xpath() returns wrong
result (even worse: invalid xml!). So this patch corrects the behavior of
xpath() to the correct one.

> but I did go read the documentation of xmlCopyNode(), and I notice it says
> the second argument is
>
> extended: if 1 do a recursive copy (properties, namespaces and children
> when applicable) if 2 copy properties and namespaces (when applicable)
>
> Do we really want it to "copy children"? Perhaps the value should be 2?
> (I have no idea, I'm just raising the question.)
>

If we put 1 as the parameter, the resulting Node will link to the existing
children. In this case, the namespace problem for the children might be
still exists. If any of the children uses different namespace(s) than the
parent, the namespace definition will not be copied correctly.

Just to be safe, in the patch 1 passed as the second argument.

Ideally, we can traverse the Node object and generate XML accordingly, but
it is a lot of work, and a lot of duplicating libxml's code. Maybe we
should push this upstream to libxml?

I also notice that it says
>
> Returns: a new #xmlNodePtr, or NULL in case of error.
>
> and the patch is failing to cover the error case. Most likely, that
> would represent out-of-memory, so it definitely seems to be something
> we should account for.
>

Ah, overlooked that.

Skimming through libxml2's source, it looks like there are some other cases
beside out-of-memory. Will update the patch according to these cases.

Thanks for the review.

--
Ali Akbar


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-07-11 08:36:55
Message-ID: CACQjQLr0AsSTz9m84nHXwTX2=WrYE6++C2gy-CRoz3G8D19wTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greetings,

Attached modified patch to handle xmlCopyNode returning NULL. The patch is
larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
is needed for calling xml_ereport).

From libxml2 source, it turns out that the other cases that xmlCopyNode
will return NULL will not happen. So in this patch i assume that the only
case is memory exhaustion.

But i found some bug in libxml2's code, because we call xmlCopyNode with 1
as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
check the return of xmlStaticCopyNode whether it's NULL. So if someday the
recursion returns NULL (maybe because of memory exhaustion), it will
SEGFAULT.

Knowing this but in libxml2 code, is this patch is still acceptable?

Thanks,
Ali Akbar

Attachment Content-Type Size
xpath-ns-fix-3.patch text/x-diff 7.1 KB

From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-08-29 13:25:18
Message-ID: CACQjQLqykR2M832uLFO1FnhrXpSzEvJ7f1JLrnXJJ_-Uw25dFQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greetings,

Because of the memory bug in xmlCopyNode, this is a new patch with
different method. In this patch, instead of using xmlCopyNode to bring the
namespace back, we added the required namespaces to the node before
dumping the node to string, and cleaning it up afterwards (because the same
node could be dumped again in next xpath result).

Thanks,
Ali Akbar

2014-07-11 15:36 GMT+07:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:

> Greetings,
>
> Attached modified patch to handle xmlCopyNode returning NULL. The patch is
> larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
> is needed for calling xml_ereport).
>
> From libxml2 source, it turns out that the other cases that xmlCopyNode
> will return NULL will not happen. So in this patch i assume that the only
> case is memory exhaustion.
>
> But i found some bug in libxml2's code, because we call xmlCopyNode with 1
> as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
> recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
> check the return of xmlStaticCopyNode whether it's NULL. So if someday the
> recursion returns NULL (maybe because of memory exhaustion), it will
> SEGFAULT.
>
> Knowing this but in libxml2 code, is this patch is still acceptable?
>
> Thanks,
> Ali Akbar
>

--
Ali Akbar

Attachment Content-Type Size
xpath-ns-fix-4.patch text/x-patch 9.5 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-11-04 15:13:37
Message-ID: 5458ED21.7050406@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7/11/14 4:36 AM, Ali Akbar wrote:
> But i found some bug in libxml2's code, because we call xmlCopyNode with
> 1 as the second parameter, internally xmlCopyNode will call
> xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And
> xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode
> whether it's NULL. So if someday the recursion returns NULL (maybe
> because of memory exhaustion), it will SEGFAULT.
>
> Knowing this but in libxml2 code, is this patch is still acceptable?

This problem was fixed in libxml2 2.9.2 (see
https://bugzilla.gnome.org/show_bug.cgi?id=708681).


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-11-04 15:14:40
Message-ID: 5458ED60.1080600@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7/8/14 6:03 AM, Ali Akbar wrote:
> If we put 1 as the parameter, the resulting Node will link to the
> existing children. In this case, the namespace problem for the children
> might be still exists. If any of the children uses different
> namespace(s) than the parent, the namespace definition will not be
> copied correctly.

This analysis might very well be right, but I think we should prove it
with a test case.