Re: [REVIEW] Re: Fix xpath() to return namespace definitions

Lists: pgsql-hackers
From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-10-07 02:24:42
Message-ID: CACQjQLpzAsgetKJs2c5OvSWXMza0W=J2ZMkCf+nvdz4UtFpgNQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

While reviewing the patch myself, i spotted some formatting problems in the
code. Fixed in this v5 patch.

Also, this patch uses context patch format (in first versions, because of
the small modification, context patch format obfucates the changes. After
reimplementation this isn't the case anymore)

Thanks,
Ali Akbar

Attachment Content-Type Size
xpath-ns-fix-5.patch text/x-diff 10.7 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-11-04 15:16:38
Message-ID: 5458EDD6.1030701@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/6/14 10:24 PM, Ali Akbar wrote:
> While reviewing the patch myself, i spotted some formatting problems in
> the code. Fixed in this v5 patch.
>
> Also, this patch uses context patch format (in first versions, because
> of the small modification, context patch format obfucates the changes.
> After reimplementation this isn't the case anymore)

I think the problem this patch is addressing is real, and your approach
is sound, but I'd ask you to go back to the xmlCopyNode() version, and
try to add a test case for why the second argument = 1 is necessary. I
don't see any other problems.


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-11-05 10:47:55
Message-ID: CACQjQLqGS2ua_2W816Fkqq9a202KPP04G4GTTpyo1WCJDjZSvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-11-04 22:16 GMT+07:00 Peter Eisentraut <peter_e(at)gmx(dot)net>:

> On 10/6/14 10:24 PM, Ali Akbar wrote:
> > While reviewing the patch myself, i spotted some formatting problems in
> > the code. Fixed in this v5 patch.
> >
> > Also, this patch uses context patch format (in first versions, because
> > of the small modification, context patch format obfucates the changes.
> > After reimplementation this isn't the case anymore)
>
> I think the problem this patch is addressing is real, and your approach
> is sound, but I'd ask you to go back to the xmlCopyNode() version, and
> try to add a test case for why the second argument = 1 is necessary. I
> don't see any other problems.
>

OK. Because upstream code is fixed in current version, i'll revert to the
previous version. Test case added to regression test. With =1 argument, the
result is correct:
<local:piece xmlns:local=\"http://127.0.0.1\" xmlns=\"http://127.0.0.2\"
id=\"1\">
<internal>number one</internal>
<internal2/>
</local:piece>

without the argument, the result is not correct, all children will be lost.
Because of that, the other regression test will fail too because the
children is not copied:
*** 584,593 ****

-- Text XPath expressions evaluation
SELECT xpath('/value', data) FROM xmltest;
! xpath
! ----------------------
! {<value>one</value>}
! {<value>two</value>}
(2 rows)

SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
--- 584,593 ----

-- Text XPath expressions evaluation
SELECT xpath('/value', data) FROM xmltest;
! xpath
! ------------
! {<value/>}
! {<value/>}
(2 rows)

SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
***************
... <cut>

updated patch attached.

Regards,
--
Ali Akbar

Attachment Content-Type Size
xpath-ns-fix-6.patch text/x-diff 7.9 KB

From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-11-05 14:50:20
Message-ID: CACQjQLri_Dcs7P8+wm_o1uxsxKvGQxJQMZ8OeVhhkoxthQFjhA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-11-04 22:16 GMT+07:00 Peter Eisentraut <peter_e(at)gmx(dot)net>:

>
> On 10/6/14 10:24 PM, Ali Akbar wrote:
>> > While reviewing the patch myself, i spotted some formatting problems in
>> > the code. Fixed in this v5 patch.
>> >
>> > Also, this patch uses context patch format (in first versions, because
>> > of the small modification, context patch format obfucates the changes.
>> > After reimplementation this isn't the case anymore)
>>
>> I think the problem this patch is addressing is real, and your approach
>> is sound, but I'd ask you to go back to the xmlCopyNode() version, and
>> try to add a test case for why the second argument = 1 is necessary. I
>> don't see any other problems.
>>
>
> OK. Because upstream code is fixed in current version, i'll revert to the
> previous version. Test case added to regression test. With =1 argument, the
> result is correct:
> <local:piece xmlns:local=\"http://127.0.0.1\" xmlns=\"http://127.0.0.2\"
> id=\"1\">
> <internal>number one</internal>
> <internal2/>
> </local:piece>
>
> without the argument, the result is not correct, all children will be
> lost. Because of that, the other regression test will fail too because the
> children is not copied:
> *** 584,593 ****
>
> -- Text XPath expressions evaluation
> SELECT xpath('/value', data) FROM xmltest;
> ! xpath
> ! ----------------------
> ! {<value>one</value>}
> ! {<value>two</value>}
> (2 rows)
>
> SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
> --- 584,593 ----
>
> -- Text XPath expressions evaluation
> SELECT xpath('/value', data) FROM xmltest;
> ! xpath
> ! ------------
> ! {<value/>}
> ! {<value/>}
> (2 rows)
>
> SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
> ***************
> ... <cut>
>
> updated patch attached.
>

I noticed somewhat big performance regression if the xml is large (i use
PRODML Product Volume sample from energistics.org):
* Without patch (tested 3 times):
select unnest(xpath('//a:flow', x, ARRAY[['a','
http://www.prodml.org/schemas/1series']])) from u;

unnest
-----------------------------------------------------------------------------------------------
<flow>
+
<kind>gas
lift</kind> +
...
Time: 84,012 ms
Time: 85,683 ms
Time: 88,766 ms

* With latest v6 patch (notice the correct result with namespace
definition):

select unnest(xpath('//a:flow', x, ARRAY[['a','
http://www.prodml.org/schemas/1series']])) from u;

unnest
-----------------------------------------------------------------------------------------------
<flow xmlns="http://www.prodml.org/schemas/1series">
+
...
Time: 108,912 ms
Time: 108,267 ms
Time: 114,848 ms

It's 23% performance regression.

* Just curious, i'm also testing v5 patch performance (notice the namespace
in the result):
select unnest(xpath('//a:flow', x, ARRAY[['a','
http://www.prodml.org/schemas/1series']])) from u;

unnest
-----------------------------------------------------------------------------------------------
<flow xmlns="http://www.prodml.org/schemas/1series">
+
<kind>gas
lift</kind> +
Time: 92,552 ms
Time: 97,440 ms
Time: 99,309 ms

The regression is only 13%. I know the xmlCopyNode() version (v6 patch) is
much more cleaner than v5patch, should we consider the performance benefit?

Anyway, thanks for the review! :)

Regards,
--
Ali Akbar


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-11-26 00:04:25
Message-ID: CACQjQLqK4jJGabXRONr+742NacAxynq2q3XKewo7GvqDGDNpHw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-11-05 21:50 GMT+07:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:

> 2014-11-04 22:16 GMT+07:00 Peter Eisentraut <peter_e(at)gmx(dot)net>:
>
>> I think the problem this patch is addressing is real, and your approach
>>> is sound, but I'd ask you to go back to the xmlCopyNode() version, and
>>> try to add a test case for why the second argument = 1 is necessary. I
>>> don't see any other problems.
>>>
>>
>> OK. Because upstream code is fixed in current version, i'll revert to the
>> previous version. Test case added to regression test. With =1 argument, the
>> result is correct:
>> <local:piece xmlns:local=\"http://127.0.0.1\" xmlns=\"http://127.0.0.2\"
>> id=\"1\">
>> <internal>number one</internal>
>> <internal2/>
>> </local:piece>
>>
>> without the argument, the result is not correct, all children will be
>> lost. Because of that, the other regression test will fail too because the
>> children is not copied:
>> *** 584,593 ****
>>
>> -- Text XPath expressions evaluation
>> SELECT xpath('/value', data) FROM xmltest;
>> ! xpath
>> ! ----------------------
>> ! {<value>one</value>}
>> ! {<value>two</value>}
>> (2 rows)
>>
>> SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
>> --- 584,593 ----
>>
>> -- Text XPath expressions evaluation
>> SELECT xpath('/value', data) FROM xmltest;
>> ! xpath
>> ! ------------
>> ! {<value/>}
>> ! {<value/>}
>> (2 rows)
>>
>> SELECT xpath(NULL, NULL) IS NULL FROM xmltest;
>> ***************
>> ... <cut>
>>
>> updated patch attached.
>>
>
> I noticed somewhat big performance regression if the xml is large (i use
> PRODML Product Volume sample from energistics.org):
> * Without patch (tested 3 times):
> select unnest(xpath('//a:flow', x, ARRAY[['a','
> http://www.prodml.org/schemas/1series']])) from u;
>
> unnest
>
> -----------------------------------------------------------------------------------------------
> <flow>
> +
> <kind>gas
> lift</kind> +
> ...
> Time: 84,012 ms
> Time: 85,683 ms
> Time: 88,766 ms
>
>
> * With latest v6 patch (notice the correct result with namespace
> definition):
>
> select unnest(xpath('//a:flow', x, ARRAY[['a','
> http://www.prodml.org/schemas/1series']])) from u;
>
> unnest
>
> -----------------------------------------------------------------------------------------------
> <flow xmlns="http://www.prodml.org/schemas/1series">
> +
> ...
> Time: 108,912 ms
> Time: 108,267 ms
> Time: 114,848 ms
>
>
> It's 23% performance regression.
>
> * Just curious, i'm also testing v5 patch performance (notice the
> namespace in the result):
> select unnest(xpath('//a:flow', x, ARRAY[['a','
> http://www.prodml.org/schemas/1series']])) from u;
>
> unnest
>
> -----------------------------------------------------------------------------------------------
> <flow xmlns="http://www.prodml.org/schemas/1series">
> +
> <kind>gas
> lift</kind> +
> Time: 92,552 ms
> Time: 97,440 ms
> Time: 99,309 ms
>
> The regression is only 13%. I know the xmlCopyNode() version (v6 patch) is
> much more cleaner than v5patch, should we consider the performance benefit?
>
> Anyway, thanks for the review! :)
>

commit bac2739 in master by Tom Lane changes *astate definition
in xml_xpathobjtoxmlarray, this attached v7 patch rebases v6 patch with
latest master.

For performance comparison, i also rebased the v5 patch attached with name
v5-141126.patch

--
Ali Akbar

Attachment Content-Type Size
xpath-ns-fix-5-141126.patch text/x-patch 10.6 KB
xpath-ns-fix-7.patch text/x-patch 7.9 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-12-11 14:51:45
Message-ID: 5489AF81.8070709@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/5/14 9:50 AM, Ali Akbar wrote:
> I noticed somewhat big performance regression if the xml is large (i use
> PRODML Product Volume sample from energistics.org <http://energistics.org>):
> * Without patch (tested 3 times):
> select unnest(xpath('//a:flow', x,
> ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u;

> Time: 84,012 ms
> Time: 85,683 ms
> Time: 88,766 ms

> * With latest v6 patch (notice the correct result with namespace
> definition):
>
> select unnest(xpath('//a:flow', x,
> ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u;

> Time: 108,912 ms
> Time: 108,267 ms
> Time: 114,848 ms
>
>
> It's 23% performance regression.
>
> * Just curious, i'm also testing v5 patch performance (notice the
> namespace in the result):
> select unnest(xpath('//a:flow', x,
> ARRAY[['a','http://www.prodml.org/schemas/1series']])) from u;

> Time: 92,552 ms
> Time: 97,440 ms
> Time: 99,309 ms
>
> The regression is only 13%. I know the xmlCopyNode() version (v6 patch)
> is much more cleaner than v5patch, should we consider the performance
> benefit?

I ran a test using postgres-US.fo built in the PostgreSQL source tree,
which is 38 MB, and ran

select unnest(xpath('//fo:bookmark-title', b, array[array['fo',
'http://www.w3.org/1999/XSL/Format']])) from data;

(Table contains one row only.)

The timings were basically indistinguishable between the three code
versions.

I'll try to reproduce your test. How big is your file? Do you have a
link to the actual file? Could you share your load script?


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-12-14 15:18:44
Message-ID: CACQjQLri-53oE_2FMWLmsAyi1Zuzb4uGbzHQPyVNjEXQ2n8q2A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> I ran a test using postgres-US.fo built in the PostgreSQL source tree,
> which is 38 MB, and ran
>
> select unnest(xpath('//fo:bookmark-title', b, array[array['fo',
> 'http://www.w3.org/1999/XSL/Format']])) from data;
>
> (Table contains one row only.)
>
> The timings were basically indistinguishable between the three code
> versions.
>
> I'll try to reproduce your test. How big is your file? Do you have a
> link to the actual file? Could you share your load script?
>

I use this xml sample:
http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml

Basically i loaded the xml to table u 100 times. Load script attached.

Regards,
--
Ali Akbar

Attachment Content-Type Size
load_test.sql application/sql 49.3 KB

From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-12-15 00:05:26
Message-ID: CACQjQLox2XyUS6+raDjjXtb+Cunb2Fu7YON3mpGX=gnoNUx4nw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-12-14 22:18 GMT+07:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:
>
>
> I ran a test using postgres-US.fo built in the PostgreSQL source tree,
>> which is 38 MB, and ran
>>
>> select unnest(xpath('//fo:bookmark-title', b, array[array['fo',
>> 'http://www.w3.org/1999/XSL/Format']])) from data;
>>
>> (Table contains one row only.)
>>
>> The timings were basically indistinguishable between the three code
>> versions.
>>
>> I'll try to reproduce your test. How big is your file? Do you have a
>> link to the actual file? Could you share your load script?
>>
>
> I use this xml sample:
> http://w3.energistics.org/schema/prodml_v1.2.0_data/XML_Examples/productVolume_no_xsl.xml
>
> Basically i loaded the xml to table u 100 times. Load script attached.
>

Peter, while reviewing the better performing patch myself, now i think the
patch needs more work to be committed. The structuring of the method will
be confusing in the long term. I think i'll restructure the patch in the
next commitfest.

So i propose to break the patch:
1. We apply the current patch which uses xmlNodeCopy, so that the
long-standing bug will be fixed in postgres.
2. I'll work with the performance enhancement in the next commitfest.

Maybe for (2), the current better-performing patch can be viewed as PoC of
the expected performance.

Regards,
--
Ali Akbar


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-12-15 03:19:15
Message-ID: CAB7nPqSLcjzBhN0=BnEMHi0j5kLh=bPFRVLd8qZHV2kjzOL41Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar <the(dot)apaan(at)gmail(dot)com> wrote:
> Peter, while reviewing the better performing patch myself, now i think the
> patch needs more work to be committed. The structuring of the method will be
> confusing in the long term. I think I'll restructure the patch in the next
> commitfest.
> So i propose to break the patch:
> 1. We apply the current patch which uses xmlNodeCopy, so that the
> long-standing bug will be fixed in postgres.
> 2. I'll work with the performance enhancement in the next commitfest.
>
> Maybe for (2), the current better-performing patch can be viewed as PoC of
> the expected performance.

Ali, are you currently working on that? Would you mind re-creating new
entries in the commit fest app for the new set of patches that you are
planning to do?
For now I am switching this patch as returned with feedback.
Thanks,
--
Michael


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-12-15 04:02:17
Message-ID: CACQjQLopnbKNQT4eHxYu+zcmgzD-2+DZ7s9isMo3-X_ifEG+UQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-12-15 10:19 GMT+07:00 Michael Paquier <michael(dot)paquier(at)gmail(dot)com>:
>
> On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar <the(dot)apaan(at)gmail(dot)com> wrote:
> > Peter, while reviewing the better performing patch myself, now i think
> the
> > patch needs more work to be committed. The structuring of the method
> will be
> > confusing in the long term. I think I'll restructure the patch in the
> next
> > commitfest.
> > So i propose to break the patch:
> > 1. We apply the current patch which uses xmlNodeCopy, so that the
> > long-standing bug will be fixed in postgres.
> > 2. I'll work with the performance enhancement in the next commitfest.
> >
> > Maybe for (2), the current better-performing patch can be viewed as PoC
> of
> > the expected performance.
>
> Ali, are you currently working on that? Would you mind re-creating new
> entries in the commit fest app for the new set of patches that you are
> planning to do?
> For now I am switching this patch as returned with feedback.
> Thanks,
>

What i mean, the last patch (v7 patch) as it is is enough to fix the bug
(nested xpath namespace problem). I think the performance regression is
still acceptable (in my case it's ~20%), because the bug is severe.
Currently, xpath can return invalid xml because the namespace isn't
included in the output!

What i'll be working is the v4 patch, because it turns out the v4 patch has
better performance (~10%, but Peter's test shows it isn't the case). But,
the problem is the v4 patch is organized wrongly, and hacks around the
libxml's xml node structure (duplicating the namespace on the existing
structure). I'll work on that, but it will affects the performance benefit.

So what i propose is, we close the longstanding bug in this comitfest with
the v7 patch. I'll work on improving the performance, without compromising
good code structure. If the result is good, i'll submit the patch.

Thanks

Regards,
--
Ali Akbar


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-12-15 04:06:19
Message-ID: CACQjQLoTXG8C0-VFThsTP9pN76twzaGwC-QiBkn-WRnLoxEAzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-12-15 11:02 GMT+07:00 Ali Akbar <the(dot)apaan(at)gmail(dot)com>:
>
> 2014-12-15 10:19 GMT+07:00 Michael Paquier <michael(dot)paquier(at)gmail(dot)com>:
>>
>> On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar <the(dot)apaan(at)gmail(dot)com> wrote:
>> > Peter, while reviewing the better performing patch myself, now i think
>> the
>> > patch needs more work to be committed. The structuring of the method
>> will be
>> > confusing in the long term. I think I'll restructure the patch in the
>> next
>> > commitfest.
>> > So i propose to break the patch:
>> > 1. We apply the current patch which uses xmlNodeCopy, so that the
>> > long-standing bug will be fixed in postgres.
>> > 2. I'll work with the performance enhancement in the next commitfest.
>> >
>> > Maybe for (2), the current better-performing patch can be viewed as PoC
>> of
>> > the expected performance.
>>
>> Ali, are you currently working on that? Would you mind re-creating new
>> entries in the commit fest app for the new set of patches that you are
>> planning to do?
>> For now I am switching this patch as returned with feedback.
>> Thanks,
>>
>
> What i mean, the last patch (v7 patch) as it is is enough to fix the bug
> (nested xpath namespace problem). I think the performance regression is
> still acceptable (in my case it's ~20%), because the bug is severe.
> Currently, xpath can return invalid xml because the namespace isn't
> included in the output!
>

Sorry, typo here. What i mean isn't "invalid xml", but "incomplete xml".
Hence the next call to xpath with the previous result as its input will
become a problem because the namespace will not match.

>
> What i'll be working is the v4 patch, because it turns out the v4 patch
> has better performance (~10%, but Peter's test shows it isn't the case).
> But, the problem is the v4 patch is organized wrongly, and hacks around the
> libxml's xml node structure (duplicating the namespace on the existing
> structure). I'll work on that, but it will affects the performance benefit.
>
> So what i propose is, we close the longstanding bug in this comitfest with
> the v7 patch. I'll work on improving the performance, without compromising
> good code structure. If the result is good, i'll submit the patch.
>
> Thanks
>

Regards,
--
Ali Akbar


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-12-15 04:06:56
Message-ID: CAB7nPqS2tTPhytzK7DJ=QnfZuDU0eC50k594Jm5tY2gNPdxFqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 15, 2014 at 1:02 PM, Ali Akbar <the(dot)apaan(at)gmail(dot)com> wrote:
> 2014-12-15 10:19 GMT+07:00 Michael Paquier <michael(dot)paquier(at)gmail(dot)com>:
>>
>> On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar <the(dot)apaan(at)gmail(dot)com> wrote:
>> > Peter, while reviewing the better performing patch myself, now i think
>> > the
>> > patch needs more work to be committed. The structuring of the method
>> > will be
>> > confusing in the long term. I think I'll restructure the patch in the
>> > next
>> > commitfest.
>> > So i propose to break the patch:
>> > 1. We apply the current patch which uses xmlNodeCopy, so that the
>> > long-standing bug will be fixed in postgres.
>> > 2. I'll work with the performance enhancement in the next commitfest.
>> >
>> > Maybe for (2), the current better-performing patch can be viewed as PoC
>> > of
>> > the expected performance.
>>
>> Ali, are you currently working on that? Would you mind re-creating new
>> entries in the commit fest app for the new set of patches that you are
>> planning to do?
>> For now I am switching this patch as returned with feedback.
>> Thanks,
>
>
> What i mean, the last patch (v7 patch) as it is is enough to fix the bug
> (nested xpath namespace problem). I think the performance regression is
> still acceptable (in my case it's ~20%), because the bug is severe.
> Currently, xpath can return invalid xml because the namespace isn't included
> in the output!
>
> What i'll be working is the v4 patch, because it turns out the v4 patch has
> better performance (~10%, but Peter's test shows it isn't the case). But,
> the problem is the v4 patch is organized wrongly, and hacks around the
> libxml's xml node structure (duplicating the namespace on the existing
> structure). I'll work on that, but it will affects the performance benefit.
>
> So what i propose is, we close the longstanding bug in this comitfest with
> the v7 patch. I'll work on improving the performance, without compromising
> good code structure. If the result is good, i'll submit the patch.
OK. Could you then move this patch to the new CF with Needs Review
with Peter as reviewer then? He seems to be looking at it anyway
seeing the update from 12/11.
--
Michael


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2014-12-15 04:11:10
Message-ID: CACQjQLoE=4nBX9+iVY2S0T50k3F7j7nM7xMoAvWDY462sLfS2g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-12-15 11:06 GMT+07:00 Michael Paquier <michael(dot)paquier(at)gmail(dot)com>:
>
> On Mon, Dec 15, 2014 at 1:02 PM, Ali Akbar <the(dot)apaan(at)gmail(dot)com> wrote:
> > 2014-12-15 10:19 GMT+07:00 Michael Paquier <michael(dot)paquier(at)gmail(dot)com>:
> >>
> >> On Mon, Dec 15, 2014 at 9:05 AM, Ali Akbar <the(dot)apaan(at)gmail(dot)com> wrote:
> >> > Peter, while reviewing the better performing patch myself, now i think
> >> > the
> >> > patch needs more work to be committed. The structuring of the method
> >> > will be
> >> > confusing in the long term. I think I'll restructure the patch in the
> >> > next
> >> > commitfest.
> >> > So i propose to break the patch:
> >> > 1. We apply the current patch which uses xmlNodeCopy, so that the
> >> > long-standing bug will be fixed in postgres.
> >> > 2. I'll work with the performance enhancement in the next commitfest.
> >> >
> >> > Maybe for (2), the current better-performing patch can be viewed as
> PoC
> >> > of
> >> > the expected performance.
> >>
> >> Ali, are you currently working on that? Would you mind re-creating new
> >> entries in the commit fest app for the new set of patches that you are
> >> planning to do?
> >> For now I am switching this patch as returned with feedback.
> >> Thanks,
> >
> >
> > What i mean, the last patch (v7 patch) as it is is enough to fix the bug
> > (nested xpath namespace problem). I think the performance regression is
> > still acceptable (in my case it's ~20%), because the bug is severe.
> > Currently, xpath can return invalid xml because the namespace isn't
> included
> > in the output!
> >
> > What i'll be working is the v4 patch, because it turns out the v4 patch
> has
> > better performance (~10%, but Peter's test shows it isn't the case). But,
> > the problem is the v4 patch is organized wrongly, and hacks around the
> > libxml's xml node structure (duplicating the namespace on the existing
> > structure). I'll work on that, but it will affects the performance
> benefit.
> >
> > So what i propose is, we close the longstanding bug in this comitfest
> with
> > the v7 patch. I'll work on improving the performance, without
> compromising
> > good code structure. If the result is good, i'll submit the patch.
> OK. Could you then move this patch to the new CF with Needs Review
>
with Peter as reviewer then? He seems to be looking at it anyway
> seeing the update from 12/11.
>

OK. Moved to the new CF.

Regards,
--
Ali Akbar


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2015-01-07 04:14:26
Message-ID: 54ACB2A2.2030006@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

committed version 7


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Ali Akbar <the(dot)apaan(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2015-01-07 06:28:15
Message-ID: 9523.1420612095@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> committed version 7

Isn't that a back-patchable bug fix?

regards, tom lane


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2015-01-08 01:51:31
Message-ID: CACQjQLotP6K+HcnZk9PbPzQc6HUo8mDZDHoY8zEGkoVdDHbuBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > committed version 7
>
Thanks!

2015-01-07 13:28 GMT+07:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Isn't that a back-patchable bug fix?
>

Upthread, i noted:

> For back versions, i think because this patch changes xpath() behavior, we
> will only apply this to future versions. The old behavior is wrong
> (according to XPath standard) for not including namespaces, but maybe there
> are some application that depends on the old behavior.
>

Reviewing the behavior on 9.3, now i think the old behavior isn't usable
(the resulting xml is not even processable in postgres):

# select unnest(xpath('//a:b', '<b:a
xmlns:b="http://test.com/a"><b:b>1</b:b><b:b>2</b:b></b:a>'::xml,
array[array['a','http://test.com/a']]));
unnest
--------------
<b:b>1</b:b>
<b:b>2</b:b>
(2 rows)

# select xpath('//b:b', unnest(xpath('//a:b', '<b:a xmlns:b="
http://test.com/a"><b:b>1</b:b><b:b>2</b:b></b:a>'::xml, array[array['a','
http://test.com/a']])));
ERROR: could not parse XML document
DETAIL: line 1: Namespace prefix b on b is not defined
<b:b>1</b:b>

Maybe some application uses the result directly, but correct xml-using
applications should handle namespace correctly, so if '<b:b>1</b:b>'
becomes '<b:b xmlns:b="http://test.com/a' >1</b:b>', there should be no
issue in those applications.

So now +1 for back-patching this.

Regards,
--
Ali Akbar


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2015-01-18 03:44:03
Message-ID: 54BB2C03.6090101@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1/7/15 8:51 PM, Ali Akbar wrote:
> So now +1 for back-patching this.

Done. Was a bit of work to get it into all the old versions.


From: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2015-01-18 11:05:05
Message-ID: CACQjQLpPQT4XRfmMf3GmU5SOZ0vEWE4-EK0CP0XaATPCr1VDsw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2015-01-18 10:44 GMT+07:00 Peter Eisentraut <peter_e(at)gmx(dot)net>:

> On 1/7/15 8:51 PM, Ali Akbar wrote:
> > So now +1 for back-patching this.
>
> Done. Was a bit of work to get it into all the old versions.
>

Wow. Thanks.

Btw, for bug-fix patches like this, should the patch creator (me) also
create patches for back branches?

Regards,
--
Ali Akbar


From: David Fetter <david(at)fetter(dot)org>
To: Ali Akbar <the(dot)apaan(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2015-01-18 17:38:54
Message-ID: 20150118173854.GC12186@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Jan 18, 2015 at 06:05:05PM +0700, Ali Akbar wrote:
> 2015-01-18 10:44 GMT+07:00 Peter Eisentraut <peter_e(at)gmx(dot)net>:
>
> > On 1/7/15 8:51 PM, Ali Akbar wrote:
> > > So now +1 for back-patching this.
> >
> > Done. Was a bit of work to get it into all the old versions.
> >
>
> Wow. Thanks.
>
> Btw, for bug-fix patches like this, should the patch creator (me) also
> create patches for back branches?

As I understand it, back-patches are the committer's responsibility.

The submitter might make suggestions as to how this might be
approached if it doesn't appear trivial.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Ali Akbar <the(dot)apaan(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2015-01-19 00:00:34
Message-ID: CAB7nPqR681+rMy_kLqqQgiBufb0g=n9E6p-vA_q6yK_TniF1Xw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 19, 2015 at 2:38 AM, David Fetter <david(at)fetter(dot)org> wrote:
> On Sun, Jan 18, 2015 at 06:05:05PM +0700, Ali Akbar wrote:
>> 2015-01-18 10:44 GMT+07:00 Peter Eisentraut <peter_e(at)gmx(dot)net>:
>> Btw, for bug-fix patches like this, should the patch creator (me) also
>> create patches for back branches?
>
> As I understand it, back-patches are the committer's responsibility.
> The submitter might make suggestions as to how this might be
> approached if it doesn't appear trivial.
TBH, I would imagine that patches that can be applied to back-branches
are a better start point than plain scratch particularly if there are
diffs in stable branches compared to HEAD. Everybody's time is
important.
--
Michael


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: David Fetter <david(at)fetter(dot)org>, Ali Akbar <the(dot)apaan(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [REVIEW] Re: Fix xpath() to return namespace definitions
Date: 2015-01-19 00:15:11
Message-ID: 20900.1421626511@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> On Mon, Jan 19, 2015 at 2:38 AM, David Fetter <david(at)fetter(dot)org> wrote:
>> On Sun, Jan 18, 2015 at 06:05:05PM +0700, Ali Akbar wrote:
>>> 2015-01-18 10:44 GMT+07:00 Peter Eisentraut <peter_e(at)gmx(dot)net>:
>>>> Btw, for bug-fix patches like this, should the patch creator (me) also
>>>> create patches for back branches?

>> As I understand it, back-patches are the committer's responsibility.
>> The submitter might make suggestions as to how this might be
>> approached if it doesn't appear trivial.

> TBH, I would imagine that patches that can be applied to back-branches
> are a better start point than plain scratch particularly if there are
> diffs in stable branches compared to HEAD. Everybody's time is
> important.

Yeah --- and I'd argue that it's largely a waste of time to work on
creating back-branch patches until the HEAD patch is in final form.
Since we've generally reserved the right for the committer to whack
patches around before committing, I think this means the committer
also has to do the work of back-patch adjustment.

Now a committer who doesn't feel like doing that could certainly say
"here's the version of the HEAD patch that I'm willing to commit, but
it doesn't apply cleanly in back branches; could you work up back-branch
equivalents?". But that hasn't been the usual approach.

regards, tom lane