Re: review: xml_is_well_formed

Lists: pgsql-hackers
From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: mike(at)mlfowler(dot)com
Subject: review: xml_is_well_formed
Date: 2010-07-29 13:21:55
Message-ID: AANLkTikTru1-KBjW3xQ9OSG2h76ZOVMB-qfms__atFTC@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I looked on patch
https://commitfest.postgresql.org/action/patch_view?id=334 .This patch
moves function xml_is_well_formed from contrib xm2 to core.

* Is the patch in context diff format?
yes

* Does it apply cleanly to the current CVS HEAD?
yes

* Does it include reasonable tests, necessary doc patches, etc?
yes

* Does the patch actually implement that?
yes

* Do we want that?
yes

* Do we already have it?
yes - simplified version in core

* Does it follow SQL spec, or the community-agreed behavior?
no - I didn't find any resources about conformance with SQL spec, but
it has same behave like original contrib function

* Does it include pg_dump support (if applicable)?
not related

* Are there dangers?
no

*Are there any assertion failures or crashes?

not found

I have a few issues:
* broken regress test (fedora 13 - xmllint: using libxml version 20707)

postgres=# SELECT xml_is_well_formed('<pg:foo
xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo>');
xml_is_well_formed
--------------------
f
(1 row)

this xml is broken - but in regress tests is ok

[pavel(at)pavel-stehule ~]$ xmllint xxx
xxx:1: parser error : error parsing attribute name
<pg:foo xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo>

* xml_is_well_formed returns true for simple text

postgres=# SELECT xml_is_well_formed('ssss');
xml_is_well_formed
--------------------
t
(1 row)

it is probably wrong result - is it ok??

* I don't understand to this fragment

PG_TRY();
+ {
+ size_t count;
+ xmlChar *version = NULL;
+ int standalone = -1;
+.
+ res_code = parse_xml_decl(string, &count, &version,
NULL, &standalone);
+ if (res_code != 0)
+ xml_ereport_by_code(ERROR, ERRCODE_INVALID_XML_CONTENT,
+ "invalid XML
content: invalid XML declaration",
+ res_code);
+.
+ doc = xmlNewDoc(version);
+ doc->encoding = xmlStrdup((const xmlChar *) "UTF-8");
+ doc->standalone = 1;

why? This function can raise exception when declaration is wrong. It
is wrong - I think, this function should returns false instead
exception.

Regards

Pavel Stehule

postgres=# select version();
version

--------------------------------------------------------------------------------------------------------------
--------
PostgreSQL 9.1devel on x86_64-unknown-linux-gnu, compiled by GCC gcc
(GCC) 4.4.4 20100630 (Red Hat 4.4.4-10),
64-bit
(1 row)


From: Mike Fowler <mike(at)mlfowler(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-07-30 11:50:25
Message-ID: 4C52BC81.9050209@mlfowler.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi Pavel,

Thanks for taking the time to review my patch. Attached is a new version
addressing your concerns.

On 29/07/10 14:21, Pavel Stehule wrote:
> I have a few issues:
> * broken regress test (fedora 13 - xmllint: using libxml version 20707)
>
> postgres=# SELECT xml_is_well_formed('<pg:foo
> xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo>');
> xml_is_well_formed
> --------------------
> f
> (1 row)
>
> this xml is broken - but in regress tests is ok
>
> [pavel(at)pavel-stehule ~]$ xmllint xxx
> xxx:1: parser error : error parsing attribute name
> <pg:foo xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo>
>

This is a problem that was observered recently by Thom Brown - the
commit fest webapp adds the semicolon after the quote. If you look at
the attachment I sent in a email client you'll find there is no
semicolon. The patch attached here will also not have the semicolon.

> * xml_is_well_formed returns true for simple text
>
> postgres=# SELECT xml_is_well_formed('ssss');
> xml_is_well_formed
> --------------------
> t
> (1 row)
>
> it is probably wrong result - is it ok??
>

Yes this is OK, pure text is valid XML content.

> * I don't understand to this fragment
>
> PG_TRY();
> + {
> + size_t count;
> + xmlChar *version = NULL;
> + int standalone = -1;
> +.
> + res_code = parse_xml_decl(string,&count,&version,
> NULL,&standalone);
> + if (res_code != 0)
> + xml_ereport_by_code(ERROR, ERRCODE_INVALID_XML_CONTENT,
> + "invalid XML
> content: invalid XML declaration",
> + res_code);
> +.
> + doc = xmlNewDoc(version);
> + doc->encoding = xmlStrdup((const xmlChar *) "UTF-8");
> + doc->standalone = 1;
>
> why? This function can raise exception when declaration is wrong. It
> is wrong - I think, this function should returns false instead
> exception.
>
>

You're quite right, I should be returning false here and not causing an
exception. I have corrected this in the attached patch.

Regards,

--
Mike Fowler
Registered Linux user: 379787

Attachment Content-Type Size
xml_is_well_formed-4.patch text/x-diff 10.3 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Mike Fowler <mike(at)mlfowler(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-07-30 12:51:53
Message-ID: AANLkTinPtwCynBo+fQdbrTW7+EWrWWegG9M-M1zzE10T@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/7/30 Mike Fowler <mike(at)mlfowler(dot)com>:
> Hi Pavel,
>
> Thanks for taking the time to review my patch. Attached is a new version
> addressing your concerns.
>
> On 29/07/10 14:21, Pavel Stehule wrote:
>>
>> I have a few issues:
>> * broken regress test (fedora 13 - xmllint: using libxml version 20707)

ok - main regress test is ok now, next I checked a contrib test for xml2

inside contrib/xml2 make installcheck, and there is a problem

SET client_min_messages = warning;
\set ECHO none
+ psql:pgxml.sql:10: ERROR: could not find function
"xml_is_well_formed" in file "/usr/local/pgsql.xwf/lib/pgxml.so"
+ psql:pgxml.sql:15: ERROR: could not find function
"xml_is_well_formed" in file "/usr/local/pgsql.xwf/lib/pgxml.so"
RESET client_min_messages;
select query_to_xml('select 1 as x',true,false,'');

you have to remove declaration from pgxml.sql.in and
uninstall_pgxml.sql, and other related files in contrib/xml2/ regress
test

>>
>> postgres=# SELECT xml_is_well_formed('<pg:foo
>> xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo>');
>>  xml_is_well_formed
>> --------------------
>>  f
>> (1 row)
>>
>> this xml is broken - but in regress tests is ok
>>
>> [pavel(at)pavel-stehule ~]$ xmllint xxx
>> xxx:1: parser error : error parsing attribute name
>> <pg:foo xmlns:pg="http://postgresql.org/stuff";>bar</pg:foo>
>>
>
> This is a problem that was observered recently by Thom Brown - the commit
> fest webapp adds the semicolon after the quote. If you look at the
> attachment I sent in a email client you'll find there is no semicolon. The
> patch attached here will also not have the semicolon.
>

ok - attached patch is correct, ... please, can you remove a broken patch?

>> * xml_is_well_formed returns true for simple text
>>
>> postgres=# SELECT xml_is_well_formed('ssss');
>>  xml_is_well_formed
>> --------------------
>>  t
>> (1 row)
>>
>> it is probably wrong result - is it ok??
>>
>
> Yes this is OK, pure text is valid XML content.

It interesting for me - is somewhere some documentation about it?

My colleagues speak some else :)

http://zvon.org/comp/r/tut-XML.html#Pages~MinimalQ20XnumberQ20XofQ20Xelements
http://www.w3.org/TR/REC-xml/#sec-prolog-dtd

I am not a specialist on XML - so just don't know

>
>> * I don't understand to this fragment
>>
>>        PG_TRY();
>> +       {
>> +               size_t          count;
>> +               xmlChar    *version = NULL;
>> +               int                     standalone = -1;
>> +.
>> +               res_code = parse_xml_decl(string,&count,&version,
>> NULL,&standalone);
>> +               if (res_code != 0)
>> +                       xml_ereport_by_code(ERROR,
>> ERRCODE_INVALID_XML_CONTENT,
>> +                                                 "invalid XML
>> content: invalid XML declaration",
>> +                                                       res_code);
>> +.
>> +               doc = xmlNewDoc(version);
>> +               doc->encoding = xmlStrdup((const xmlChar *) "UTF-8");
>> +               doc->standalone = 1;
>>
>> why? This function can raise exception when declaration is wrong. It
>> is wrong - I think, this function should returns false instead
>> exception.
>>
>>
>
> You're quite right,  I should be returning false here and not causing an
> exception. I have corrected this in the attached patch.
>

ok

> Regards,
>
> --
> Mike Fowler
> Registered Linux user: 379787
>
>


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Mike Fowler <mike(at)mlfowler(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-07-31 12:10:33
Message-ID: 1280578233.4895.0.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote:
> > * xml_is_well_formed returns true for simple text
> >
> > postgres=# SELECT xml_is_well_formed('ssss');
> > xml_is_well_formed
> > --------------------
> > t
> > (1 row)
> >
> > it is probably wrong result - is it ok??
> >
>
> Yes this is OK, pure text is valid XML content.

Are you speaking of XML content fragments that SQL/XML defines?

Well-formedness should probably only allow XML documents.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Mike Fowler <mike(at)mlfowler(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-07-31 17:40:36
Message-ID: AANLkTinHhCZR2RqiDrMd=eqdfw2eHVZ=+Q2vdYiLVP=5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote:
>> > * xml_is_well_formed returns true for simple text
>> >
>> > postgres=# SELECT xml_is_well_formed('ssss');
>> >   xml_is_well_formed
>> > --------------------
>> >   t
>> > (1 row)
>> >
>> > it is probably wrong result - is it ok??
>> >
>>
>> Yes this is OK, pure text is valid XML content.
>
> Are you speaking of XML content fragments that SQL/XML defines?
>
> Well-formedness should probably only allow XML documents.

I think the point of this function is to determine whether a cast to
xml will throw an error. The behavior should probably match exactly
whatever test would be applied there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Mike Fowler <mike(at)mlfowler(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-02 06:34:57
Message-ID: AANLkTi=B6_c6ySqzc-+s_CWBOkXrfo--QsWOJvKfTG8E@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/31 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote:
>>> > * xml_is_well_formed returns true for simple text
>>> >
>>> > postgres=# SELECT xml_is_well_formed('ssss');
>>> >   xml_is_well_formed
>>> > --------------------
>>> >   t
>>> > (1 row)
>>> >
>>> > it is probably wrong result - is it ok??
>>> >
>>>
>>> Yes this is OK, pure text is valid XML content.
>>
>> Are you speaking of XML content fragments that SQL/XML defines?
>>
>> Well-formedness should probably only allow XML documents.
>
> I think the point of this function is to determine whether a cast to
> xml will throw an error.  The behavior should probably match exactly
> whatever test would be applied there.

I agree with this idea - so I am able to do:

postgres=# select 'xxx'::xml;
xml
-----
xxx
(1 row)

I have not any suggestions now - so I'll change flag to "ready to commit"

Regards

Pavel Stehule

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Mike Fowler <mike(at)mlfowler(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-02 06:46:13
Message-ID: AANLkTikkFTd7Z8Kt5vdALOK_=t2w1ACaj_6ufsiueVKW@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/8/2 Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>:
> 2010/7/31 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> On Sat, Jul 31, 2010 at 8:10 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>>> On fre, 2010-07-30 at 12:50 +0100, Mike Fowler wrote:
>>>> > * xml_is_well_formed returns true for simple text
>>>> >
>>>> > postgres=# SELECT xml_is_well_formed('ssss');
>>>> >   xml_is_well_formed
>>>> > --------------------
>>>> >   t
>>>> > (1 row)
>>>> >
>>>> > it is probably wrong result - is it ok??
>>>> >
>>>>
>>>> Yes this is OK, pure text is valid XML content.
>>>
>>> Are you speaking of XML content fragments that SQL/XML defines?
>>>
>>> Well-formedness should probably only allow XML documents.
>>
>> I think the point of this function is to determine whether a cast to
>> xml will throw an error.  The behavior should probably match exactly
>> whatever test would be applied there.
>
> I agree with this idea - so I am able to do:
>
> postgres=# select 'xxx'::xml;
>  xml
> -----
>  xxx
> (1 row)
>
> I have not any suggestions now - so I'll change flag to "ready to commit"

sorry - contrib module should be a fixed

patch attached

>
> Regards
>
> Pavel Stehule
>
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise Postgres Company
>>
>

Attachment Content-Type Size
xml2contrib.patch text/x-patch 1.2 KB

From: Mike Fowler <mike(at)mlfowler(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-02 21:16:20
Message-ID: 4C5735A4.4080409@mlfowler.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02/08/10 07:46, Pavel Stehule wrote:

>>
>> I have not any suggestions now - so I'll change flag to "ready to commit"
>
> sorry - contrib module should be a fixed
>
> patch attached
>

Thanks Pavel, you saved me some time!

Regards,

--
Mike Fowler
Registered Linux user: 379787


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mike Fowler <mike(at)mlfowler(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-03 15:15:26
Message-ID: 1280848526.13031.6.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:
> > Well-formedness should probably only allow XML documents.
>
> I think the point of this function is to determine whether a cast to
> xml will throw an error. The behavior should probably match exactly
> whatever test would be applied there.

Maybe there should be

xml_is_well_formed()
xml_is_well_formed_document()
xml_is_well_formed_content()

I agree that consistency with SQL/XML is desirable, but for someone
coming from the outside, the unqualified claim that 'foo' is well-formed
XML might sound suspicious.


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mike Fowler <mike(at)mlfowler(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-03 17:57:14
Message-ID: AANLkTikKQdAR11UuzJtAbOMQVLBibkXjx8fHOpZ2DbkQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

2010/8/3 Peter Eisentraut <peter_e(at)gmx(dot)net>:
> On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:
>> > Well-formedness should probably only allow XML documents.
>>
>> I think the point of this function is to determine whether a cast to
>> xml will throw an error.  The behavior should probably match exactly
>> whatever test would be applied there.
>
> Maybe there should be
>
> xml_is_well_formed()
> xml_is_well_formed_document()
> xml_is_well_formed_content()
>
> I agree that consistency with SQL/XML is desirable, but for someone
> coming from the outside, the unqualified claim that 'foo' is well-formed
> XML might sound suspicious.
>

yes, it is little bit curious - but it can be just documented. Now, I
don't think, so we need more functions.

Regards

Pavel


From: Mike Fowler <mike(at)mlfowler(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-06 08:28:17
Message-ID: 4C5BC7A1.1080602@mlfowler.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03/08/10 16:15, Peter Eisentraut wrote:
> On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:
>>> Well-formedness should probably only allow XML documents.
>>
>> I think the point of this function is to determine whether a cast to
>> xml will throw an error. The behavior should probably match exactly
>> whatever test would be applied there.
>
> Maybe there should be
>
> xml_is_well_formed()
> xml_is_well_formed_document()
> xml_is_well_formed_content()
>
> I agree that consistency with SQL/XML is desirable, but for someone
> coming from the outside, the unqualified claim that 'foo' is well-formed
> XML might sound suspicious.

What about making the function sensitive to the XML OPTION, such that:

test=# SET xmloption TO DOCUMENT;
SET
text=# SELECT xml_is_well_formed('foo');

xml_is_well_formed
--------------------
f
(1 row)

test=# SET xmloption TO CONTENT;
SET
text=# SELECT xml_is_well_formed('foo');

xml_is_well_formed
--------------------
t
(1 row)

with the inverse for DOCUMENTS? To me this makes the most sense as it makes the function behave much more like the other xml functions.

--
Mike Fowler
Registered Linux user: 379787


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mike Fowler <mike(at)mlfowler(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-06 11:31:48
Message-ID: AANLkTikkWJCWSW0Z3EUi51Q+bG1ymgnrLaZQe_c9Hyix@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 6, 2010 at 4:28 AM, Mike Fowler <mike(at)mlfowler(dot)com> wrote:
> On 03/08/10 16:15, Peter Eisentraut wrote:
>>
>> On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:
>>>>
>>>> Well-formedness should probably only allow XML documents.
>>>
>>> I think the point of this function is to determine whether a cast to
>>> xml will throw an error.  The behavior should probably match exactly
>>> whatever test would be applied there.
>>
>> Maybe there should be
>>
>> xml_is_well_formed()
>> xml_is_well_formed_document()
>> xml_is_well_formed_content()
>>
>> I agree that consistency with SQL/XML is desirable, but for someone
>> coming from the outside, the unqualified claim that 'foo' is well-formed
>> XML might sound suspicious.
>
> What about making the function sensitive to the XML OPTION, such that:
>
> test=# SET xmloption TO DOCUMENT;
> SET
> text=# SELECT xml_is_well_formed('foo');
>
>  xml_is_well_formed
>  --------------------
>  f
>  (1 row)

That will make using this function a huge hassle, won't it? Functions
that do different things depending on GUC settings are usually
troublesome. Having three functions would be more sensible if we need
all three behaviors, but I don't see why we do.

Or perhaps it could return a string instead of a boolean: content,
document, or NULL if it's neither.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Mike Fowler <mike(at)mlfowler(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-06 13:43:52
Message-ID: 4C5C1198.6080205@mlfowler.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/08/10 12:31, Robert Haas wrote:
>>>
>>> Maybe there should be
>>>
>>> xml_is_well_formed()
>>> xml_is_well_formed_document()
>>> xml_is_well_formed_content()
>>>
>>> I agree that consistency with SQL/XML is desirable, but for someone
>>> coming from the outside, the unqualified claim that 'foo' is well-formed
>>> XML might sound suspicious.
>>>
>> What about making the function sensitive to the XML OPTION, such that:
>>
>> test=# SET xmloption TO DOCUMENT;
>> SET
>> text=# SELECT xml_is_well_formed('foo');
>>
>> xml_is_well_formed
>> --------------------
>> f
>> (1 row)
>>
> That will make using this function a huge hassle, won't it? Functions
> that do different things depending on GUC settings are usually
> troublesome. Having three functions would be more sensible if we need
> all three behaviors, but I don't see why we do.
>
> Or perhaps it could return a string instead of a boolean: content,
> document, or NULL if it's neither.
>

I like the sound of that. In fact this helps workaround the IS DOCUMENT
and IS CONTENT limitations such that you can you can select only
content, only documents or both is you use IS NOT NULL.

Unless anyone sees a reason that this function needs to remain a boolean
function, I'll rework the patch over the weekend.

Regards,

--
Mike Fowler
Registered Linux user: 379787


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mike Fowler <mike(at)mlfowler(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-06 20:52:36
Message-ID: 1281127956.2563.5.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On fre, 2010-08-06 at 07:31 -0400, Robert Haas wrote:
> > What about making the function sensitive to the XML OPTION, such
> that:
> >
> > test=# SET xmloption TO DOCUMENT;
> > SET
> > text=# SELECT xml_is_well_formed('foo');
> >
> > xml_is_well_formed
> > --------------------
> > f
> > (1 row)
>
> That will make using this function a huge hassle, won't it? Functions
> that do different things depending on GUC settings are usually
> troublesome. Having three functions would be more sensible if we need
> all three behaviors, but I don't see why we do.

Upthread you opined that this function should essentially indicate
whether a cast to type xml would succeed, and observing the xmloption is
an essential part of that. I had assumed the original patch actually
did that.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Mike Fowler <mike(at)mlfowler(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-06 20:55:06
Message-ID: 1281128106.2563.7.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On fre, 2010-08-06 at 14:43 +0100, Mike Fowler wrote:
> > Or perhaps it could return a string instead of a boolean: content,
> > document, or NULL if it's neither.
> >
>
> I like the sound of that. In fact this helps workaround the IS
> DOCUMENT
> and IS CONTENT limitations such that you can you can select only
> content, only documents or both is you use IS NOT NULL.
>
> Unless anyone sees a reason that this function needs to remain a
> boolean function, I'll rework the patch over the weekend.

What is the actual use case for this function? Is the above behavior
actually useful?

One reason to stick with boolean is backward compatibility.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Mike Fowler <mike(at)mlfowler(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-06 21:46:52
Message-ID: AANLkTinqrqP8u3rvbwUmofRtoOdFDDWebPkHJQu7Jhgo@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 6, 2010 at 4:52 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On fre, 2010-08-06 at 07:31 -0400, Robert Haas wrote:
>> > What about making the function sensitive to the XML OPTION, such
>> that:
>> >
>> > test=# SET xmloption TO DOCUMENT;
>> > SET
>> > text=# SELECT xml_is_well_formed('foo');
>> >
>> >  xml_is_well_formed
>> >  --------------------
>> >  f
>> >  (1 row)
>>
>> That will make using this function a huge hassle, won't it?  Functions
>> that do different things depending on GUC settings are usually
>> troublesome.  Having three functions would be more sensible if we need
>> all three behaviors, but I don't see why we do.
>
> Upthread you opined that this function should essentially indicate
> whether a cast to type xml would succeed, and observing the xmloption is
> an essential part of that.  I had assumed the original patch actually
> did that.

Oh.

I didn't realize the casting behavior was already dependent on the GUC. Yuck.

Maybe we should following the GUC after all, then.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Mike Fowler <mike(at)mlfowler(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-07 15:47:17
Message-ID: 4C5D8005.4030304@mlfowler.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 06/08/10 21:55, Peter Eisentraut wrote:
> On fre, 2010-08-06 at 14:43 +0100, Mike Fowler wrote:
>>> Or perhaps it could return a string instead of a boolean: content,
>>> document, or NULL if it's neither.
>>>
>>
>> I like the sound of that. In fact this helps workaround the IS
>> DOCUMENT
>> and IS CONTENT limitations such that you can you can select only
>> content, only documents or both is you use IS NOT NULL.
>>
>> Unless anyone sees a reason that this function needs to remain a
>> boolean function, I'll rework the patch over the weekend.
>
> What is the actual use case for this function? Is the above behavior
> actually useful?

The idea is to be able to filter a table that contains XML in TEXT that
might not be well formed. Knowing that you're only dealing with well
formed XML prevents you blowing up when you attempt the cast.

>
> One reason to stick with boolean is backward compatibility.
>

To be honest I'm happiest with returning a boolean, even if there is
some confusion over content only being valid. Though changing the return
value to DOCUMENT/CONTENT/NULL makes things a touch more explicit, the
same results can be achieved by simply running:

SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND data::xml
IS DOCUMENT;

Regards,
--
Mike Fowler
Registered Linux user: 379787


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mike Fowler <mike(at)mlfowler(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-08 17:45:20
Message-ID: 6299.1281289520@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> On lr, 2010-07-31 at 13:40 -0400, Robert Haas wrote:
>> I think the point of this function is to determine whether a cast to
>> xml will throw an error. The behavior should probably match exactly
>> whatever test would be applied there.

> Maybe there should be

> xml_is_well_formed()
> xml_is_well_formed_document()
> xml_is_well_formed_content()

> I agree that consistency with SQL/XML is desirable, but for someone
> coming from the outside, the unqualified claim that 'foo' is well-formed
> XML might sound suspicious.

I think I agree with the later discussion that xml_is_well_formed()
should tell you whether a cast to xml would succeed (and hence it has to
pay attention to XMLOPTION). However, it seems to also make sense to
provide the other two functions suggested here, both to satify people
who know XML and to offer tests that will tell you whether
XMLPARSE ( { DOCUMENT | CONTENT } value ) will succeed.

Merging the three cases into one function doesn't seem like a win
for either compatibility or usability.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Mike Fowler <mike(at)mlfowler(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-08 21:40:48
Message-ID: AANLkTi=my3mfdghX2pK2JM_1fj+1SNsJJpgz3N-bJ6mY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Aug 8, 2010 at 1:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
>> On lör, 2010-07-31 at 13:40 -0400, Robert Haas wrote:
>>> I think the point of this function is to determine whether a cast to
>>> xml will throw an error.  The behavior should probably match exactly
>>> whatever test would be applied there.
>
>> Maybe there should be
>
>> xml_is_well_formed()
>> xml_is_well_formed_document()
>> xml_is_well_formed_content()
>
>> I agree that consistency with SQL/XML is desirable, but for someone
>> coming from the outside, the unqualified claim that 'foo' is well-formed
>> XML might sound suspicious.
>
> I think I agree with the later discussion that xml_is_well_formed()
> should tell you whether a cast to xml would succeed (and hence it has to
> pay attention to XMLOPTION).  However, it seems to also make sense to
> provide the other two functions suggested here, both to satify people
> who know XML and to offer tests that will tell you whether
> XMLPARSE ( { DOCUMENT | CONTENT } value ) will succeed.
>
> Merging the three cases into one function doesn't seem like a win
> for either compatibility or usability.

+1. I didn't realize how funky the XMLOPTION stuff was at the start
of this discussion; I think your analysis here is spot-on.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Mike Fowler <mike(at)mlfowler(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-09 14:20:39
Message-ID: 1281363639.22702.1.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote:
> To be honest I'm happiest with returning a boolean, even if there is
> some confusion over content only being valid. Though changing the
> return
> value to DOCUMENT/CONTENT/NULL makes things a touch more explicit,
> the
> same results can be achieved by simply running:
>
> SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND
> data::xml IS DOCUMENT;

Note that this wouldn't necessarily work because it is not guaranteed
that the well-formedness test is executed before the cast to xml. SQL
doesn't short-circuit left to right. (A CASE expression could work.)


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Mike Fowler <mike(at)mlfowler(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-09 14:41:36
Message-ID: AANLkTik0mkL9_J0YrzJtiHEE4Dk6hjOWgV7qGU=dXvf9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 9, 2010 at 10:20 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote:
>> To be honest I'm happiest with returning a boolean, even if there is
>> some confusion over content only being valid. Though changing the
>> return
>> value to DOCUMENT/CONTENT/NULL makes things a touch more explicit,
>> the
>> same results can be achieved by simply running:
>>
>> SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND
>> data::xml IS DOCUMENT;
>
> Note that this wouldn't necessarily work because it is not guaranteed
> that the well-formedness test is executed before the cast to xml.  SQL
> doesn't short-circuit left to right.  (A CASE expression could work.)

There's also the fact that it would probably end up parsing the data
twice. Given xmloption, I'm inclined to think Tom has it right:
provided xml_is_well_formed() that follows xmloption, plus a specific
version for each of content and document.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Mike Fowler <mike(at)mlfowler(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-11 16:44:50
Message-ID: AANLkTim7NfiKpCZT6av-q+8dd6YMcAYKNMBPkCV145nT@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Aug 9, 2010 at 10:20 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On lör, 2010-08-07 at 16:47 +0100, Mike Fowler wrote:
>>> To be honest I'm happiest with returning a boolean, even if there is
>>> some confusion over content only being valid. Though changing the
>>> return
>>> value to DOCUMENT/CONTENT/NULL makes things a touch more explicit,
>>> the
>>> same results can be achieved by simply running:
>>>
>>> SELECT data::xml FROM mixed WHERE xml_is_well_formed(data) AND
>>> data::xml IS DOCUMENT;
>>
>> Note that this wouldn't necessarily work because it is not guaranteed
>> that the well-formedness test is executed before the cast to xml.  SQL
>> doesn't short-circuit left to right.  (A CASE expression could work.)
>
> There's also the fact that it would probably end up parsing the data
> twice.  Given xmloption, I'm inclined to think Tom has it right:
> provided xml_is_well_formed() that follows xmloption, plus a specific
> version for each of content and document.

Another reasonable option here would be to forget about having
xml_is_well_formed() per se and ONLY offer
xml_is_well_formed_content() and xml_is_well_formed_document().

As a project management note, this CommitFest is over in 4 days, so
unless we have a new version of this patch real soon now we need to
defer it to the September 15th CommitFest (of course not precluding
the possibility that someone will pick it up and commit it sooner, but
we're not going to postpone 9.1alpha1 for this patch).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Mike Fowler <mike(at)mlfowler(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-11 20:27:05
Message-ID: 11256.1281558425@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> There's also the fact that it would probably end up parsing the data
>> twice. Given xmloption, I'm inclined to think Tom has it right:
>> provided xml_is_well_formed() that follows xmloption, plus a specific
>> version for each of content and document.

> Another reasonable option here would be to forget about having
> xml_is_well_formed() per se and ONLY offer
> xml_is_well_formed_content() and xml_is_well_formed_document().

We already have xml_is_well_formed(); just dropping it doesn't seem like
a helpful choice.

> As a project management note, this CommitFest is over in 4 days, so
> unless we have a new version of this patch real soon now we need to
> defer it to the September 15th CommitFest

Yes. Mike, are you expecting to submit a new version before the end of
the week?

regards, tom lane


From: Mike Fowler <mike(at)mlfowler(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-11 21:33:30
Message-ID: 4C63172A.5010508@mlfowler.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/08/10 21:27, Tom Lane wrote:
> Robert Haas<robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas<robertmhaas(at)gmail(dot)com> wrote:
>>> There's also the fact that it would probably end up parsing the data
>>> twice. Given xmloption, I'm inclined to think Tom has it right:
>>> provided xml_is_well_formed() that follows xmloption, plus a specific
>>> version for each of content and document.
>
>> Another reasonable option here would be to forget about having
>> xml_is_well_formed() per se and ONLY offer
>> xml_is_well_formed_content() and xml_is_well_formed_document().
>
> We already have xml_is_well_formed(); just dropping it doesn't seem like
> a helpful choice.
>
>> As a project management note, this CommitFest is over in 4 days, so
>> unless we have a new version of this patch real soon now we need to
>> defer it to the September 15th CommitFest
>
> Yes. Mike, are you expecting to submit a new version before the end of
> the week?
>

Yes and here it is, apologies for the delay. I have re-implemented
xml_is_well_formed such that it is sensitive to the XMLOPTION. The
additional _document and _content methods are now present. Tests and
documentation adjusted to suit.

Regards,

--
Mike Fowler
Registered Linux user: 379787

Attachment Content-Type Size
xml_is_well_formed-5.patch text/x-diff 12.1 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Mike Fowler <mike(at)mlfowler(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-12 18:02:44
Message-ID: AANLkTinLeVxvYbpkuE39706rt0HZyi0taKOs15rZY4yq@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

I checked last version:

* there are not a problem with regress and contrib regress tests
* the design is simple and clean now - well documented

notes:
* don't get a patch via copy/paste from mailing list archive - there
are a broken xml2 tests via this access!
* I didn't find a sentence so default for xml_is_well_formed a content
checking - some like "without change of xmloption, the behave is same
as xml_is_well_formed_content"

Regards

Pavel Stehule

2010/8/11 Mike Fowler <mike(at)mlfowler(dot)com>:
> On 11/08/10 21:27, Tom Lane wrote:
>>
>> Robert Haas<robertmhaas(at)gmail(dot)com>  writes:
>>>
>>> On Mon, Aug 9, 2010 at 10:41 AM, Robert Haas<robertmhaas(at)gmail(dot)com>
>>>  wrote:
>>>>
>>>> There's also the fact that it would probably end up parsing the data
>>>> twice.  Given xmloption, I'm inclined to think Tom has it right:
>>>> provided xml_is_well_formed() that follows xmloption, plus a specific
>>>> version for each of content and document.
>>
>>> Another reasonable option here would be to forget about having
>>> xml_is_well_formed() per se and ONLY offer
>>> xml_is_well_formed_content() and xml_is_well_formed_document().
>>
>> We already have xml_is_well_formed(); just dropping it doesn't seem like
>> a helpful choice.
>>
>>> As a project management note, this CommitFest is over in 4 days, so
>>> unless we have a new version of this patch real soon now we need to
>>> defer it to the September 15th CommitFest
>>
>> Yes.  Mike, are you expecting to submit a new version before the end of
>> the week?
>>
>
> Yes and here it is, apologies for the delay. I have re-implemented
> xml_is_well_formed such that it is sensitive to the XMLOPTION. The
> additional _document and _content methods are now present. Tests and
> documentation adjusted to suit.
>
> Regards,
>
> --
> Mike Fowler
> Registered Linux user: 379787
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mike Fowler <mike(at)mlfowler(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: review: xml_is_well_formed
Date: 2010-08-13 18:43:39
Message-ID: 4963.1281725019@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mike Fowler <mike(at)mlfowler(dot)com> writes:
> On 11/08/10 21:27, Tom Lane wrote:
>> Yes. Mike, are you expecting to submit a new version before the end of
>> the week?

> Yes and here it is, apologies for the delay. I have re-implemented
> xml_is_well_formed such that it is sensitive to the XMLOPTION. The
> additional _document and _content methods are now present. Tests and
> documentation adjusted to suit.

Applied with minor cleanups, mostly in the documentation. The only
thing that seems worth remarking on is that since xml_is_well_formed
now depends on a GUC variable, it *cannot* be marked IMMUTABLE. The
right marking in such cases is STABLE.

regards, tom lane