Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected

Lists: pgsql-hackers
From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-06-29 17:34:23
Message-ID: 201106291934.23089.rsmogura@softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Review of patch
https://commitfest.postgresql.org/action/patch_view?id=580

=== Patch description ===
SUMMARY: When text() based XPATH expression is invoked output is not XML
escaped

DESCRIPTION: Submitter invokes following statement:
SELECT (XPATH('/*/text()', '<root>&lt;</root>'))[1].
He expect (escaped) result "&lt;", but gets "<"

AFFECTS: Possible this may affects situations when user wants to insert output
from above expression to XML column.

PATCH CONTENT:
A. 1. Patch fixes above problem (I don't treat this like problem, but like
enhancement).
A. 2. In addition patch contains test cases for above.
A. 3. Patch changes behaviour of method xml_xmlnodetoxmltype invoked by
xpath_internal, by adding escape_xml() call.
A. 4. I don't see any stability issues with this.
A. 5. Performance may be reduced and memory consumption may
increase due to internals of method escape_xml

=== Review ===
B. 1. Patch applies cleanly.

B. 2. Source compiles, and patch works as Submitter wants.

B. 3. Personally I missed some notes in documentation that such
expression will be escaped (those should be clearly exposed, as the "live"
functionality is changed).

B. 4. Submitter, possible, revealed some missed, minor functionality of
PostgreSQL XML. As he expects XML escaped output.

B. 5. Currently XPATH produces escaped output for Element nodes, and
not escaped output for all other types of nodes (including text,
comment, etc.)

B. 6. Current behaviour _is intended_ (there is "if" to check node type)
and _"natural"_. In this particular case user ask for text content of some
node, and this content is actually "<".

B. 7. Similar behaviour may be observer e. g. in SQL Server(tm)
SELECT x.value('(//text())[1]', 'varchar(256)') FROM #xml_t;
Produced "<"

B. 8. Even if current behaviour may be treated as wrong it was exposed
and other may depends on not escaped content.

B. 9. I think, correct approach should go to create new function
(basing on one existing) that will be able to escape above. In this
situation
call should look like (for example only!):
SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>')))[1]
or
SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>'))[1])
One method to operate on array one to operate on single XML datum.
Or to think about something like xmltext(). (Compare current status of xml.c)

B. 10. If such function (B.9.) is needed and if it will be included is out of
scope of this review.

Basing mainly on A.1, B.6., B.8., bearing in mind B.10., in my opinion this is
subject to reject as "need more work", "or as invalid".

The detailed explanation why such behaviour should not be implemented I will
send in review of https://commitfest.postgresql.org/action/patch_view?id=565.

Regards,

Radosław Smogura

P. S.
I would like to say sorry, for such late answaer, but I sent this from other
mail address, which was not attached to mailing list. Blame KDE KMail not me
:)


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-06-29 20:48:40
Message-ID: 1471EE3B-B08E-418B-8D9F-54116812F6D5@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jun29, 2011, at 19:34 , Radosław Smogura wrote:
> B. 6. Current behaviour _is intended_ (there is "if" to check node type) and _"natural"_. In this particular case user ask for text content of some node, and this content is actually "<".

I don't buy that. The check for the node type is there because
two different libxml functions are used to convert nodes to
strings. The if has absolutely *zero* to do with escaping, expect
for that missing escape_xml() call in the "else" case.

Secondly, there is little point in having an type XML if we
don't actually ensure that values of that type can only contain
well-formed XML.

> B. 7. Similar behaviour may be observer e. g. in SQL Server(tm)
> SELECT x.value('(//text())[1]', 'varchar(256)') FROM #xml_t;
> Produced "<"

Whats the *type* of that value? I'm not familiar with the XPATH
support in SQL Server, but since you pass 'varchar(256)' as
the second parameter, I assume that this is how you tell it
the return type you want. Since you chose a textual type, not
quoting the value is perfectly fine. I suggest that you try
this again, but this time evaluate the XPATH expression so
that you get a value of type XML (Assuming such a type exists
in SQL Server)

> B. 8. Even if current behaviour may be treated as wrong it was exposed and other may depends on not escaped content.

Granted, there's the possibility of breaking existing applications
here. But the same argument could have been applied when we made
check against invalid characters (e.g. invalid UTF-8 byte sequences)
tighter for textual types.

When it comes to data integrity (and non-well-formed values in
XML columns are a data integrity issue), I believe that the advantages
of tighter checks out-weight potential compatibility problems.

> B. 9. I think, correct approach should go to create new function (basing on one existing) that will be able to escape above. In this situation
> call should look like (for example only!):
> SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>')))[1]
> or
> SELECT xml_escape((XPATH('/*/text()', '<root>&lt;</root>'))[1])
> One method to operate on array one to operate on single XML datum.
> Or to think about something like xmltext(). (Compare current status of xml.c)

-1. Again, value of type XML should always be well-formed XML.
Requiring every future user of posgres XML support to remember
to surround XPATH() calls with XML_ESCAPE() will undoubtedly lead
to bugs.

I'm all for having escaping and un-escaping functions though,
but these *need* to have the signatures

XML_ESCAPE(text) RETURNS xml
XML_UNESCAPE(XML) RETURNS text

best regards,
Florian Pflug

PS: Next time, please post your Review as a follow-up of the mail
that contains the patch. That makes sure that people who already
weighted in on the issue don't overlook your review. Or, the
patch author, for that matter - I nearly missed your Review between
the larger number of mail in my postgres folder.


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-10 18:40:54
Message-ID: 4E19F236.4080507@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hackers,

>> B. 6. Current behaviour _is intended_ (there is "if" to check node type) and _"natural"_. In this particular case user ask for text content of some node, and this content is actually "<".
>
> I don't buy that. The check for the node type is there because
> two different libxml functions are used to convert nodes to
> strings. The if has absolutely *zero* to do with escaping, expect
> for that missing escape_xml() call in the "else" case.
>
> Secondly, there is little point in having an type XML if we
> don't actually ensure that values of that type can only contain
> well-formed XML.

Can anyone else weigh in on this? Peter?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>, Radosław Smogura <rsmogura(at)softperience(dot)eu>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-10 22:06:22
Message-ID: BC66B446-29EC-44FA-8860-7165F5105BDA@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 10, 2011, at 1:40 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> Hackers,
>
>>> B. 6. Current behaviour _is intended_ (there is "if" to check node type) and _"natural"_. In this particular case user ask for text content of some node, and this content is actually "<".
>>
>> I don't buy that. The check for the node type is there because
>> two different libxml functions are used to convert nodes to
>> strings. The if has absolutely *zero* to do with escaping, expect
>> for that missing escape_xml() call in the "else" case.
>>
>> Secondly, there is little point in having an type XML if we
>> don't actually ensure that values of that type can only contain
>> well-formed XML.
>
> Can anyone else weigh in on this? Peter?

Unless I am missing something, Florian is clearly correct here.

...Robert


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-12 09:00:29
Message-ID: cf964c14afa1bf7c60466b22fd45a71d@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, 10 Jul 2011 17:06:22 -0500, Robert Haas wrote:
> On Jul 10, 2011, at 1:40 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>
>> Hackers,
>>
>>>> B. 6. Current behaviour _is intended_ (there is "if" to check
>>>> node type) and _"natural"_. In this particular case user ask for
>>>> text content of some node, and this content is actually "<".
>>>
>>> I don't buy that. The check for the node type is there because
>>> two different libxml functions are used to convert nodes to
>>> strings. The if has absolutely *zero* to do with escaping, expect
>>> for that missing escape_xml() call in the "else" case.
>>>
>>> Secondly, there is little point in having an type XML if we
>>> don't actually ensure that values of that type can only contain
>>> well-formed XML.
>>
>> Can anyone else weigh in on this? Peter?
>
> Unless I am missing something, Florian is clearly correct here.
>
> ...Robert
For me not, because this should be fixed internally by making xml type
sefe, currently xml type may be used to keep proper XMLs and any kind of
data, as well.

If I ask, by any means select xpath(/text(...))..... I want to get
text.
1) How I should descape node in client application (if it's part of xml
I don't have header), bear in mind XML must give support for streaming
processing too.
2) Why I should differntly treat text() then select from varchar in
both I ask for xml, driver can't make this, because it doesn't know if
it gets scalar, text, comment, element, or maybe document.
3) What about current applications, folks probably uses this and are
happy they get text, and will not see, that next release of PostgreSQL
will break their applications.

There is of course disadvantage of current behaviour as it may lead to
inserting badly xmls (in one case), but I created example when auto
escaping will create double escaped xmls, and may lead to insert
inproper data (this is about 2nd patch where Florian add escaping, too).

SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM
(SELECT
(XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES
(XMLELEMENT(name
"root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as
foo;

xmlelement
-------------------------
<root sth="&amp;lt;n"/>

It can't be resolved without storing type in xml or adding xmltext or
adding pseudo xmlany element, which will be returned by xpath.

Regards,
Radek


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-12 09:45:59
Message-ID: 0A7466C6-B139-4B9C-96E7-1315DA03AD7F@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul12, 2011, at 11:00 , Radosław Smogura wrote:
> On Sun, 10 Jul 2011 17:06:22 -0500, Robert Haas wrote:
>> Unless I am missing something, Florian is clearly correct here.
> For me not, because this should be fixed internally by making xml type sefe

Huh??. Making the xml type safe is *exactly* what I'm trying to do here...

> currently xml type may be used to keep proper XMLs and any kind of data, as well.

As I pointed out before, that simply isn't true. Try storing
non-well-formed data into an XML column (there *are* ways to do
that, i.e. there are bugs, one if which I'm trying to fix here!)
and then dump and (try to) reload your database. Ka-wooooom!

> If I ask, by any means select xpath(/text(...))..... I want to get text.

And I want '3' || '4' to return the integer 34. Though luck! The fact that
XPATH() is declared to return XML, *not* TEXT means you don't get what you
want. Period. Feel free to provide a patch that adds a function XPATH_TEXT
if you feel this is an issue.

XML *is* *not* simply an alias for TEXT! It's a distinct type, which its
down distinct rules about what constitutes a valid value and what doesn't.

> 1) How I should descape node in client application (if it's part of xml I don't have header), bear in mind XML must give support for streaming processing too.

Huh?

> 2) Why I should differntly treat text() then select from varchar in both I ask for xml, driver can't make this, because it doesn't know if it gets scalar, text, comment, element, or maybe document.

> 3) What about current applications, folks probably uses this and are happy they get text, and will not see, that next release of PostgreSQL will break their applications.

That, and *only* that, I recognize as a valid concern. However, and *again*
as I have pointer out before a *multiple* of times, backwards compatibility
is no excuse not to fix bugs. Plus, there might just as well be applications
which feed the contents of XML columns directly into a XML parser (as they
have every right to!) and don't expect that parser to throw an error. Which,
as it stands, we cannot guarantee. Having to deal with an error there is akin
to having to deal with integer columns containing 'foobar'!

> There is of course disadvantage of current behaviour as it may lead to inserting badly xmls (in one case), but I created example when auto escaping will create double escaped xmls, and may lead to insert inproper data (this is about 2nd patch where Florian add escaping, too).
>
> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM (SELECT
> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name
> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as foo;
>
> xmlelement
> -------------------------
> <root sth="&amp;lt;n"/>

Radosław, you've raised that point before, and I refuted it. The crucial
difference is that double-escaped values are well-formed, where as un-escaped
ones are not.

Again, as I said before, the double-escaping done by XMLATTRIBUTES there is
not pretty. But its *not* XPATH()'s fault!. To see that, simply replace your
XPATH() expression with '&lt;n'::xml to see that.

And in fact

> It can't be resolved without storing type in xml or adding xmltext or adding pseudo xmlany element, which will be returned by xpath.

Huh?

Frankly, Radosław, I get the feeling that you're not trying to understand my
answers to your objections, but instead keep repeating the same assertions
over and over again. Even though at least some of them, like XML being able to
store arbitrary values, are simply wrong! And I'm getting pretty tired of this...
So far, you also don't seem to have taken a single look at the actual
implementation of the patch, even though code review is an supposed to be an
integral part of the patch review process. I therefore don't believe that we're
getting anywhere here.

So please either start reviewing the actual implementation, and leave
the considerations about whether we want this or not to the eventual committer.
Or, if you don't want to do that for one reason or another, pleaser consider
letting somebody else take over this review, i.e. consider removing your name
from the "Reviewer" field.

best regards,
Florian Pflug


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-12 10:57:59
Message-ID: 86d1764e9c7e6020bf2872d3e98789c3@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 12 Jul 2011 11:45:59 +0200, Florian Pflug wrote:
> On Jul12, 2011, at 11:00 , Radosław Smogura wrote:
>> On Sun, 10 Jul 2011 17:06:22 -0500, Robert Haas wrote:
>>> Unless I am missing something, Florian is clearly correct here.
>> For me not, because this should be fixed internally by making xml
>> type sefe
>
> Huh??. Making the xml type safe is *exactly* what I'm trying to do
> here...
>
>> currently xml type may be used to keep proper XMLs and any kind of
>> data, as well.
>
> As I pointed out before, that simply isn't true. Try storing
> non-well-formed data into an XML column (there *are* ways to do
> that, i.e. there are bugs, one if which I'm trying to fix here!)
> and then dump and (try to) reload your database. Ka-wooooom!
>
>> If I ask, by any means select xpath(/text(...))..... I want to get
>> text.
>
> And I want '3' || '4' to return the integer 34. Though luck! The fact
> that
> XPATH() is declared to return XML, *not* TEXT means you don't get
> what you
> want. Period. Feel free to provide a patch that adds a function
> XPATH_TEXT
> if you feel this is an issue.
>
> XML *is* *not* simply an alias for TEXT! It's a distinct type, which
> its
> down distinct rules about what constitutes a valid value and what
> doesn't.
>
>> 1) How I should descape node in client application (if it's part of
>> xml I don't have header), bear in mind XML must give support for
>> streaming processing too.
>
> Huh?
>
>> 2) Why I should differntly treat text() then select from varchar in
>> both I ask for xml, driver can't make this, because it doesn't know if
>> it gets scalar, text, comment, element, or maybe document.
>
>> 3) What about current applications, folks probably uses this and are
>> happy they get text, and will not see, that next release of PostgreSQL
>> will break their applications.
>
> That, and *only* that, I recognize as a valid concern. However, and
> *again*
> as I have pointer out before a *multiple* of times, backwards
> compatibility
> is no excuse not to fix bugs. Plus, there might just as well be
> applications
> which feed the contents of XML columns directly into a XML parser (as
> they
> have every right to!) and don't expect that parser to throw an error.
> Which,
> as it stands, we cannot guarantee. Having to deal with an error there
> is akin
> to having to deal with integer columns containing 'foobar'!

Bugs must be resolved in smart way, especially if they changes
behaviour, with consideration of impact change will produce, removing
support for xml resolves this bug as well. I've said problem should be
resolved in different way.

>> There is of course disadvantage of current behaviour as it may lead
>> to inserting badly xmls (in one case), but I created example when auto
>> escaping will create double escaped xmls, and may lead to insert
>> inproper data (this is about 2nd patch where Florian add escaping,
>> too).
>>
>> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth))
>> FROM (SELECT
>> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES
>> (XMLELEMENT(name
>> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as
>> foo;
>>
>> xmlelement
>> -------------------------
>> <root sth="&amp;lt;n"/>
>
> Radosław, you've raised that point before, and I refuted it. The
> crucial
> difference is that double-escaped values are well-formed, where as
> un-escaped
> ones are not.
>
> Again, as I said before, the double-escaping done by XMLATTRIBUTES
> there is
> not pretty. But its *not* XPATH()'s fault!. To see that, simply
> replace your
> XPATH() expression with '&lt;n'::xml to see that.
>
> And in fact
>
>> It can't be resolved without storing type in xml or adding xmltext
>> or adding pseudo xmlany element, which will be returned by xpath.
>
> Huh?
>
> Frankly, Radosław, I get the feeling that you're not trying to
> understand my
> answers to your objections, but instead keep repeating the same
> assertions
> over and over again. Even though at least some of them, like XML
> being able to
> store arbitrary values, are simply wrong! And I'm getting pretty
> tired of this...
> So far, you also don't seem to have taken a single look at the actual
> implementation of the patch, even though code review is an supposed
> to be an
> integral part of the patch review process. I therefore don't believe
> that we're
> getting anywhere here.
So far, you don't know if I taken a single look, your suspicious are
wrong, and You try to blame me. All of your sentences about "do not
understanding" I may sent to you, and blame you with your words.

> So please either start reviewing the actual implementation, and leave
> the considerations about whether we want this or not to the eventual
> committer.
> Or, if you don't want to do that for one reason or another, pleaser
> consider
> letting somebody else take over this review, i.e. consider removing
> your name
> from the "Reviewer" field.

If I do review I may put my comments, but "I get the feeling that
you're not trying to understand my
answers to your objections, but instead keep repeating the same
assertions over and over again." - and in patch there is review of code.
So please either "stop" responding to my objections "and leave the
considerations about whether we want this or not to the eventual
committer."

And consider this new assertions...
I store XML document in XML (escaped text, why not?), then recreate it
by xpath and insert, with Your patch, I will not insert document but
escaped text, which is wrong XML, and ALL your objections about current
buggy not-escaping will FULLY apply to this situation.
It's ping-pong, and this patch moves problem from one corner to other.

For me this discussion is over. I putted my objections and suggestions.
Full review is available in archives, and why to not escape is putted in
review of your 2nd patch, about scalar values.

Regards,
Radek


From: Florian Pflug <fgp(at)phlo(dot)org>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-12 12:25:12
Message-ID: AE24E2AB-1D94-41DF-AACF-597154A55B18@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul12, 2011, at 12:57 , Radosław Smogura wrote:
> On Tue, 12 Jul 2011 11:45:59 +0200, Florian Pflug wrote:
>> On Jul12, 2011, at 11:00 , Radosław Smogura wrote:
>>> On Sun, 10 Jul 2011 17:06:22 -0500, Robert Haas wrote:
>>>> Unless I am missing something, Florian is clearly correct here.
>>> For me not, because this should be fixed internally by making xml type sefe
>>
>> Huh??. Making the xml type safe is *exactly* what I'm trying to do here...
>>
>>> currently xml type may be used to keep proper XMLs and any kind of data, as well.
>>
>> As I pointed out before, that simply isn't true. Try storing
>> non-well-formed data into an XML column (there *are* ways to do
>> that, i.e. there are bugs, one if which I'm trying to fix here!)
>> and then dump and (try to) reload your database. Ka-wooooom!

You again very conveniently ignored me here, and thus the *fact*
that XML *doesn't* allow arbitrary textual values to be stored.
If it did, there would not be a "Ka-wooooom!" here.

I beg you to actually try this out. Put the result of an XPATH()
expression that returns a literal '<' into a column of type XML,
and dump and reload.

>>> If I ask, by any means select xpath(/text(...))..... I want to get text.
>>
>> And I want '3' || '4' to return the integer 34. Though luck! The fact that
>> XPATH() is declared to return XML, *not* TEXT means you don't get what you
>> want. Period. Feel free to provide a patch that adds a function XPATH_TEXT
>> if you feel this is an issue.
>>
>> XML *is* *not* simply an alias for TEXT! It's a distinct type, which its
>> down distinct rules about what constitutes a valid value and what doesn't.

Again, you ignored my answer.

>>> 3) What about current applications, folks probably uses this and are happy
>>> they get text, and will not see, that next release of PostgreSQL will break
>>> their applications.
>>
>> That, and *only* that, I recognize as a valid concern. However, and *again*
>> as I have pointer out before a *multiple* of times, backwards compatibility
>> is no excuse not to fix bugs. Plus, there might just as well be applications
>> which feed the contents of XML columns directly into a XML parser (as they
>> have every right to!) and don't expect that parser to throw an error. Which,
>> as it stands, we cannot guarantee. Having to deal with an error there is akin
>> to having to deal with integer columns containing 'foobar'!
>
> Bugs must be resolved in smart way, especially if they changes behaviour, with
> consideration of impact change will produce, removing support for xml resolves
> this bug as well. I've said problem should be resolved in different way.

Fine. So what does that "different way" look like? Keeping things as they are
is certainly not an option, since it failed as soon as you dump and reload
(Or simply cast the value to TEXT and back to XML).

>>> There is of course disadvantage of current behaviour as it may lead to
>>> inserting badly xmls (in one case), but I created example when auto escaping
>>> will create double escaped xmls, and may lead to insert inproper data (this
>>> is about 2nd patch where Florian add escaping, too).
>>>
>>> SELECT XMLELEMENT(name root, XMLATTRIBUTES(foo.namespace AS sth)) FROM (SELECT
>>> (XPATH('namespace-uri(/*)', x))[1] AS namespace FROM (VALUES (XMLELEMENT(name
>>> "root", XMLATTRIBUTES('<n' AS xmlns, '<v' AS value),'<t'))) v(x)) as foo;
>>>
>>> xmlelement
>>> -------------------------
>>> <root sth="&amp;lt;n"/>
>>
>> Radosław, you've raised that point before, and I refuted it. The crucial
>> difference is that double-escaped values are well-formed, where as un-escaped
>> ones are not.
>>
>> Again, as I said before, the double-escaping done by XMLATTRIBUTES there is
>> not pretty. But its *not* XPATH()'s fault!. To see that, simply replace your
>> XPATH() expression with '&lt;n'::xml to see that.

And here too I see no response from you...

>> Frankly, Radosław, I get the feeling that you're not trying to understand my
>> answers to your objections, but instead keep repeating the same assertions
>> over and over again. Even though at least some of them, like XML
>> being able to
>> store arbitrary values, are simply wrong! And I'm getting pretty
>> tired of this...
>> So far, you also don't seem to have taken a single look at the actual
>> implementation of the patch, even though code review is an supposed to be an
>> integral part of the patch review process. I therefore don't believe
>> that we're
>> getting anywhere here.
> So far, you don't know if I taken a single look, your suspicious are wrong, and
> You try to blame me.

Well, you haven't commented on the code, so assumed that you haven't
look at it. May I instead assume that you did look at it, and found the
patch to be in good shape, implementation-wise?

> All of your sentences about "do not understanding" I may
> sent to you, and blame you with your words.

I think I have so far provided quite detailed responses to all of your
concerns. If no, please point me to one of your concerns where I haven't
either acknowledged that there is a problem, or have explained quite detailed
why there is none.

>> So please either start reviewing the actual implementation, and leave
>> the considerations about whether we want this or not to the eventual
>> committer.
>> Or, if you don't want to do that for one reason or another, pleaser consider
>> letting somebody else take over this review, i.e. consider removing your name
>> from the "Reviewer" field.
>
> If I do review I may put my comments, but "I get the feeling that you're not
> trying to understand my answers to your objections, but instead keep repeating
> the same assertions over and over again." - and in patch there is review of code.

I didn't see any code review. Maybe I missed it, though. Could you point me
to it again, please?

> So please either "stop" responding to my objections "and leave the considerations
> about whether we want this or not to the eventual committer."

It's usually the reviewer, not the patch author, who decides that it's time
for a committer to look at the patch. At that point, the reviewer then marks
the patch as "Ready for Committer". I can do that for you, of course, but
I didn't want to do it without your consent.

> And consider this new assertions...
> I store XML document in XML (escaped text, why not?), then recreate it by xpath
> and insert, with Your patch, I will not insert document but escaped text, which
> is wrong XML, and ALL your objections about current buggy not-escaping will FULLY
> apply to this situation.

As far as I know, this only happens if you put the result of XPATH() into
an XML *attribute*. If you instead put it into a text node (i.e, pass the
result of XPATH() to XMLELEMENT(), not XMLATTRIBUTES()), everything is fine.
In fact, *without* the patch, if you select a text node with XPATH() and
pass the result to XMLELEMENT(), the result is not well-formed.

Here's an example.

select xmlelement(
name "newtag",
(xpath('/oldtag/text()', '<oldtag>&lt;</oldtag>'))[1]
);

produces

xmlelement
--------------------
<newtag><</newtag>

That can't be right, can it?

Again, this currently just happens to work for the combination of
XMLATTRIBUTES and XPATH() because XPATH()'s failure to escape the
result is compensated by XMLATTRIBUTES()'s failure to not escape values
which are already of type XML.

> It's ping-pong, and this patch moves problem from one corner to other.
>
> For me this discussion is over. I putted my objections and suggestions. Full
> review is available in archives, and why to not escape is putted in review
> of your 2nd patch, about scalar values.

Please mark the two patches "Ready for Committer", then!

best regards,
Florian Pflug


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, Robert Haas <robertmhaas(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-12 18:11:40
Message-ID: 4E1C8E5C.5040201@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Radoslaw,

> For me this discussion is over. I putted my objections and suggestions. Full
>> review is available in archives, and why to not escape is putted in review
>> of your 2nd patch, about scalar values.

Did you install and test the functionality of the patch? I can't tell
from your review whether you got that far.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Radosław Smogura <rsmogura(at)softperience(dot)eu>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-12 18:16:17
Message-ID: 4E1C8F71.8000608@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Florian, Radoslaw,

Please both of you calm down. Florian is trying to improve our XML
type. Radoslaw is trying to help out by reviewing it. It's not a
benefit to anyone for you two to get into an argument about who said
what ... especially if the argument is based on (as far as I can see)
not understanding what the other person was saying.

Answering "What did you mean by X? Did you mean Y, or something else?"
is much more friendly than saying "You couldn't possibly mean X" or "You
don't understand X". Consider that *both* of you are exchanging emails
in a language which is native to neither of you.

This goes for all discussions on -hackers, but your recent conversation
is a good example of unnecessary argument.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-13 08:35:02
Message-ID: a872b45c461d2a8817a1e3ffe97a2931@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 12 Jul 2011 11:11:40 -0700, Josh Berkus wrote:
> Radoslaw,
>
>> For me this discussion is over. I putted my objections and
>> suggestions. Full
>>> review is available in archives, and why to not escape is putted in
>>> review
>>> of your 2nd patch, about scalar values.
>
> Did you install and test the functionality of the patch? I can't
> tell
> from your review whether you got that far.

Yas, patch was tested, and applied. I think in both reviews I marked
that patch does this what Florain said. Examples of double escaping were
taken from one of patch (I think from this about scalar values).

If I should I will bump this patches up.

Regards,
Radosław Smogura


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-13 09:58:58
Message-ID: CAP-rdTYr6fH6L=aHsw7jRV_xzn9gLNEKN9525TdXo2MxcUc08g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/6/29, Florian Pflug <fgp(at)phlo(dot)org>:

> Secondly, there is little point in having an type XML if we
> don't actually ensure that values of that type can only contain
> well-formed XML.

+1. The fact that XPATH() must return a type that cannot depend on the
given expression (even if it is a constant string) may be unfortunate,
but returning XML-that-is-not-quite-XML sounds way worse to me.

Nicolas

--
A. Because it breaks the logical sequence of discussion.
Q. Why is top posting bad?


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
Cc: Florian Pflug <fgp(at)phlo(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-14 12:15:15
Message-ID: 1310645715.16381.2.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On ons, 2011-07-13 at 11:58 +0200, Nicolas Barbier wrote:
> 2011/6/29, Florian Pflug <fgp(at)phlo(dot)org>:
>
> > Secondly, there is little point in having an type XML if we
> > don't actually ensure that values of that type can only contain
> > well-formed XML.
>
> +1. The fact that XPATH() must return a type that cannot depend on the
> given expression (even if it is a constant string) may be unfortunate,
> but returning XML-that-is-not-quite-XML sounds way worse to me.

The example given was

XPATH('/*/text()', '<root>&lt;</root>')

This XPath expression returns a node set, and XML is a serialization
format of a node, so returning xml[] in this particular case seems
entirely reasonable to me.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>, Radosław Smogura <rsmogura(at)softperience(dot)eu>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-14 12:15:33
Message-ID: 1310645733.16381.3.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On sön, 2011-07-10 at 11:40 -0700, Josh Berkus wrote:
> Hackers,
>
> >> B. 6. Current behaviour _is intended_ (there is "if" to check node type) and _"natural"_. In this particular case user ask for text content of some node, and this content is actually "<".
> >
> > I don't buy that. The check for the node type is there because
> > two different libxml functions are used to convert nodes to
> > strings. The if has absolutely *zero* to do with escaping, expect
> > for that missing escape_xml() call in the "else" case.
> >
> > Secondly, there is little point in having an type XML if we
> > don't actually ensure that values of that type can only contain
> > well-formed XML.
>
> Can anyone else weigh in on this? Peter?

Looks like a good change to me.


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Florian Pflug <fgp(at)phlo(dot)org>
Subject: Re: Patch Review: Bugfix for XPATH() if text or attribute nodes are selected
Date: 2011-07-14 12:52:33
Message-ID: 5a303f2d9a9de0ce684fac3439b7f3e4@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, 14 Jul 2011 15:15:33 +0300, Peter Eisentraut wrote:
> On sön, 2011-07-10 at 11:40 -0700, Josh Berkus wrote:
>> Hackers,
>>
>> >> B. 6. Current behaviour _is intended_ (there is "if" to check
>> node type) and _"natural"_. In this particular case user ask for text
>> content of some node, and this content is actually "<".
>> >
>> > I don't buy that. The check for the node type is there because
>> > two different libxml functions are used to convert nodes to
>> > strings. The if has absolutely *zero* to do with escaping, expect
>> > for that missing escape_xml() call in the "else" case.
>> >
>> > Secondly, there is little point in having an type XML if we
>> > don't actually ensure that values of that type can only contain
>> > well-formed XML.
>>
>> Can anyone else weigh in on this? Peter?
>
> Looks like a good change to me.
I'll bump it in few hours, as I can't recall password from keyring. Now
I have hands clean and it's not my business to care about this.

Best regards.
Radek.