Lists: | pgsql-hackers |
---|
From: | Ali Akbar <the(dot)apaan(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Fix xpath() to return namespace definitions |
Date: | 2014-05-30 09:04:33 |
Message-ID: | CACQjQLo18s5Lpx9ngh5Qd1mhB4OukC12MzcjFwy0LQr2kw2DoQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Hi all,
While developing some XML processing queries, i stumbled on an old bug
mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
repeated xpath() that apparently mess up namespaces.
Source of the bug is that libxml2's xmlNodeDump won't output XML namespace
definitions that declared in the node's parents. As per
https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
the behavior is intentional.
This patch uses function xmlCopyNode that copies a node, including its
namespace definitions as required (without unused namespace in the node or
its children). When the copy dumped, the resulting XML is complete with its
namespaces. Calling xmlCopyNode will need additional memory to execute, but
reimplementing its routine to handle namespace definition will introduce
much complexity to the code.
Note: This is my very first postgresql patch.
--
Ali Akbar
Attachment | Content-Type | Size |
---|---|---|
xpath-ns-fix.patch | text/x-patch | 4.0 KB |
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Ali Akbar <the(dot)apaan(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix xpath() to return namespace definitions |
Date: | 2014-06-04 19:37:42 |
Message-ID: | CA+TgmobJ1J+O-_VizGB2cht_pvVCgeqSXm189XDCyF-EAmHxPQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On Fri, May 30, 2014 at 5:04 AM, Ali Akbar <the(dot)apaan(at)gmail(dot)com> wrote:
> While developing some XML processing queries, i stumbled on an old bug
> mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
> repeated xpath() that apparently mess up namespaces.
>
> Source of the bug is that libxml2's xmlNodeDump won't output XML namespace
> definitions that declared in the node's parents. As per
> https://bug639036.bugzilla-attachments.gnome.org/attachment.cgi?id=177858,
> the behavior is intentional.
>
> This patch uses function xmlCopyNode that copies a node, including its
> namespace definitions as required (without unused namespace in the node or
> its children). When the copy dumped, the resulting XML is complete with its
> namespaces. Calling xmlCopyNode will need additional memory to execute, but
> reimplementing its routine to handle namespace definition will introduce
> much complexity to the code.
>
> Note: This is my very first postgresql patch.
Please add your patch here so we don't forget about it:
https://commitfest.postgresql.org/action/commitfest_view/open
Please see also:
https://wiki.postgresql.org/wiki/Submitting_a_Patch
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Ali Akbar <the(dot)apaan(at)gmail(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Fix xpath() to return namespace definitions |
Date: | 2014-06-04 22:35:14 |
Message-ID: | CACQjQLr=vxvrU2JqYBg7v7r=hjMOKn9_n-z-ZWRKVOzHa47R-A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
2014-06-05 2:37 GMT+07:00 Robert Haas <robertmhaas(at)gmail(dot)com>:
> Please add your patch here so we don't forget about it:
>
> https://commitfest.postgresql.org/action/commitfest_view/open
>
Added: https://commitfest.postgresql.org/action/patch_view?id=1461
> Please see also:
>
> https://wiki.postgresql.org/wiki/Submitting_a_Patch
>
Thanks, i've read it.
As for the suggestion there: "Start out by reviewing a patch or responding
to email on the lists. Even if it is unrelated to what you're doing", i'll
look around other posts in this list.
For back versions, i think because this patch changes xpath() behavior, we
will only apply this to future versions. The old behavior is wrong
(according to XPath standard) for not including namespaces, but maybe there
are some application that depends on the old behavior.
Thanks!
--
Ali Akbar
From: | Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> |
---|---|
To: | Ali Akbar <the(dot)apaan(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | [REVIEW] Re: Fix xpath() to return namespace definitions |
Date: | 2014-06-16 15:52:05 |
Message-ID: | 20140616155205.GB14090@toroid.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
At 2014-05-30 16:04:33 +0700, the(dot)apaan(at)gmail(dot)com wrote:
>
> While developing some XML processing queries, i stumbled on an old bug
> mentioned in http://wiki.postgresql.org/wiki/Todo#XML: Fix Nested or
> repeated xpath() that apparently mess up namespaces.
Thanks for the patch, and welcome to Postgres development.
I can confirm that it works fine. I have attached here a very slightly
tweaked version of the patch (removed trailing whitespace, and changed
some comment text).
I'm marking this ready for committer.
-- Abhijit
Attachment | Content-Type | Size |
---|---|---|
xpath-ns-fix-2.patch | text/x-diff | 4.0 KB |
From: | Ali Akbar <the(dot)apaan(at)gmail(dot)com> |
---|---|
To: | Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> |
Cc: | "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [REVIEW] Re: Fix xpath() to return namespace definitions |
Date: | 2014-06-22 14:32:00 |
Message-ID: | CACQjQLoi48pc_jq8sCMt6yR-4w8cuy62sG12MsiUDBu_swzQDA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
2014-06-16 22:52 GMT+07:00 Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>:
> Thanks for the patch, and welcome to Postgres development.
>
> I can confirm that it works fine. I have attached here a very slightly
> tweaked version of the patch (removed trailing whitespace, and changed
> some comment text).
>
> I'm marking this ready for committer.
>
Thanks for the review. Hope i will be able to contribute a little here and
there in the future.
--
Ali Akbar
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> |
Cc: | Ali Akbar <the(dot)apaan(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [REVIEW] Re: Fix xpath() to return namespace definitions |
Date: | 2014-07-07 23:25:06 |
Message-ID: | 13665.1404775506@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com> writes:
> I can confirm that it works fine. I have attached here a very slightly
> tweaked version of the patch (removed trailing whitespace, and changed
> some comment text).
> I'm marking this ready for committer.
I don't know enough about XML/XPATH to know if this is a good idea or not,
but I did go read the documentation of xmlCopyNode(), and I notice it says
the second argument is
extended: if 1 do a recursive copy (properties, namespaces and children
when applicable) if 2 copy properties and namespaces (when applicable)
Do we really want it to "copy children"? Perhaps the value should be 2?
(I have no idea, I'm just raising the question.)
I also notice that it says
Returns: a new #xmlNodePtr, or NULL in case of error.
and the patch is failing to cover the error case. Most likely, that
would represent out-of-memory, so it definitely seems to be something
we should account for.
regards, tom lane
From: | Ali Akbar <the(dot)apaan(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [REVIEW] Re: Fix xpath() to return namespace definitions |
Date: | 2014-07-08 10:03:19 |
Message-ID: | CACQjQLoO++VJe27YNZzAnzmpzeJW3ga9Un20KtpQ4WOpj-Ho3A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
> I don't know enough about XML/XPATH to know if this is a good idea or not,
>
Actually currently because of the namespace problem, xpath() returns wrong
result (even worse: invalid xml!). So this patch corrects the behavior of
xpath() to the correct one.
> but I did go read the documentation of xmlCopyNode(), and I notice it says
> the second argument is
>
> extended: if 1 do a recursive copy (properties, namespaces and children
> when applicable) if 2 copy properties and namespaces (when applicable)
>
> Do we really want it to "copy children"? Perhaps the value should be 2?
> (I have no idea, I'm just raising the question.)
>
If we put 1 as the parameter, the resulting Node will link to the existing
children. In this case, the namespace problem for the children might be
still exists. If any of the children uses different namespace(s) than the
parent, the namespace definition will not be copied correctly.
Just to be safe, in the patch 1 passed as the second argument.
Ideally, we can traverse the Node object and generate XML accordingly, but
it is a lot of work, and a lot of duplicating libxml's code. Maybe we
should push this upstream to libxml?
I also notice that it says
>
> Returns: a new #xmlNodePtr, or NULL in case of error.
>
> and the patch is failing to cover the error case. Most likely, that
> would represent out-of-memory, so it definitely seems to be something
> we should account for.
>
Ah, overlooked that.
Skimming through libxml2's source, it looks like there are some other cases
beside out-of-memory. Will update the patch according to these cases.
Thanks for the review.
--
Ali Akbar
From: | Ali Akbar <the(dot)apaan(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [REVIEW] Re: Fix xpath() to return namespace definitions |
Date: | 2014-07-11 08:36:55 |
Message-ID: | CACQjQLr0AsSTz9m84nHXwTX2=WrYE6++C2gy-CRoz3G8D19wTg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Greetings,
Attached modified patch to handle xmlCopyNode returning NULL. The patch is
larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
is needed for calling xml_ereport).
From libxml2 source, it turns out that the other cases that xmlCopyNode
will return NULL will not happen. So in this patch i assume that the only
case is memory exhaustion.
But i found some bug in libxml2's code, because we call xmlCopyNode with 1
as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
check the return of xmlStaticCopyNode whether it's NULL. So if someday the
recursion returns NULL (maybe because of memory exhaustion), it will
SEGFAULT.
Knowing this but in libxml2 code, is this patch is still acceptable?
Thanks,
Ali Akbar
Attachment | Content-Type | Size |
---|---|---|
xpath-ns-fix-3.patch | text/x-diff | 7.1 KB |
From: | Ali Akbar <the(dot)apaan(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [REVIEW] Re: Fix xpath() to return namespace definitions |
Date: | 2014-08-29 13:25:18 |
Message-ID: | CACQjQLqykR2M832uLFO1FnhrXpSzEvJ7f1JLrnXJJ_-Uw25dFQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
Greetings,
Because of the memory bug in xmlCopyNode, this is a new patch with
different method. In this patch, instead of using xmlCopyNode to bring the
namespace back, we added the required namespaces to the node before
dumping the node to string, and cleaning it up afterwards (because the same
node could be dumped again in next xpath result).
Thanks,
Ali Akbar
2014-07-11 15:36 GMT+07:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:
> Greetings,
>
> Attached modified patch to handle xmlCopyNode returning NULL. The patch is
> larger because xmlerrcxt must be passed to xml_xmlnodetoxmltype (xmlerrcxt
> is needed for calling xml_ereport).
>
> From libxml2 source, it turns out that the other cases that xmlCopyNode
> will return NULL will not happen. So in this patch i assume that the only
> case is memory exhaustion.
>
> But i found some bug in libxml2's code, because we call xmlCopyNode with 1
> as the second parameter, internally xmlCopyNode will call xmlStaticCopyNode
> recursively via xmlStaticCopyNodeList. And xmlStaticCopyNodeList doesn't
> check the return of xmlStaticCopyNode whether it's NULL. So if someday the
> recursion returns NULL (maybe because of memory exhaustion), it will
> SEGFAULT.
>
> Knowing this but in libxml2 code, is this patch is still acceptable?
>
> Thanks,
> Ali Akbar
>
--
Ali Akbar
Attachment | Content-Type | Size |
---|---|---|
xpath-ns-fix-4.patch | text/x-patch | 9.5 KB |
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Ali Akbar <the(dot)apaan(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [REVIEW] Re: Fix xpath() to return namespace definitions |
Date: | 2014-11-04 15:13:37 |
Message-ID: | 5458ED21.7050406@gmx.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 7/11/14 4:36 AM, Ali Akbar wrote:
> But i found some bug in libxml2's code, because we call xmlCopyNode with
> 1 as the second parameter, internally xmlCopyNode will call
> xmlStaticCopyNode recursively via xmlStaticCopyNodeList. And
> xmlStaticCopyNodeList doesn't check the return of xmlStaticCopyNode
> whether it's NULL. So if someday the recursion returns NULL (maybe
> because of memory exhaustion), it will SEGFAULT.
>
> Knowing this but in libxml2 code, is this patch is still acceptable?
This problem was fixed in libxml2 2.9.2 (see
https://bugzilla.gnome.org/show_bug.cgi?id=708681).
From: | Peter Eisentraut <peter_e(at)gmx(dot)net> |
---|---|
To: | Ali Akbar <the(dot)apaan(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [REVIEW] Re: Fix xpath() to return namespace definitions |
Date: | 2014-11-04 15:14:40 |
Message-ID: | 5458ED60.1080600@gmx.net |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-hackers |
On 7/8/14 6:03 AM, Ali Akbar wrote:
> If we put 1 as the parameter, the resulting Node will link to the
> existing children. In this case, the namespace problem for the children
> might be still exists. If any of the children uses different
> namespace(s) than the parent, the namespace definition will not be
> copied correctly.
This analysis might very well be right, but I think we should prove it
with a test case.