PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files

From: "Lawrence Oluyede" <l(dot)oluyede(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
Date: 2008-02-07 09:48:04
Message-ID: 9eebf5740802070148s5bd366a2k278496ca52429bac@mail.gmail.com
Lists: pgsql-general pgsql-hackers

As specified in the W3C Recommendation for XML, a DOCTYPE declaration is
perfectly valid in a document.
I have a bunch of XML files generated by the Boost library, each of which
contains a doctype like this:

<!DOCTYPE boost_serialization>

which falls within the bounds of the recommendation
(http://www.w3.org/TR/xml/#sec-prolog-dtd):

"Note that it is possible to construct a well-formed document
containing a doctypedecl that neither points to an external subset nor
contains an internal subset."

PostgreSQL 8.3, however, does not allow inserting XML with a doctype
into its new native data type, and returns this error message:

"""
ERROR: invalid XML content
DETAIL: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE foo>
^

********** Error **********

ERROR: invalid XML content
SQL state: 2200N
Detail: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE foo>
"""

This kind of behavior surprises me because pgsql has been compiled
with the following flags on the development machine:
./configure --with-python --with-openssl --with-pam --with-libxml
--with-libxslt --enable-thread-safety --enable-debug

During the configuration stage it creates a Makefile that links against
the system version of the libxml2 library, which is 2.6.30, the same
library I use through Python (which correctly parses the XML file with
the doctype).

Any hints?


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Lawrence Oluyede <l(dot)oluyede(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
Date: 2008-03-05 18:29:22
Message-ID: 200803051829.m25ITM728602@momjian.us
Lists: pgsql-general pgsql-hackers


Is this a bug that needs to be fixed?


--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Lawrence Oluyede" <l(dot)oluyede(at)gmail(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
Date: 2008-03-05 18:37:37
Message-ID: 9eebf5740803051037i6aedf387jf897a9cbdbdaf9a@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Wed, Mar 5, 2008 at 7:29 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> Is this a bug that needs to be fixed?
>

Since it's not unusual to find DOCTYPEs inside XML documents, I guess
so. Maybe low priority, but I hope it will be fixed soon. For now we
remove the doctype line on the way into the database and add it back on
the way out; I hope to drop that workaround sometime in the future :-)

Thanks


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Lawrence Oluyede <l(dot)oluyede(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
Date: 2008-04-15 14:24:35
Message-ID: 200804151424.m3FEOZ106087@momjian.us
Lists: pgsql-general pgsql-hackers


Added to TODO:

* Allow XML to accept more liberal DOCTYPE specifications

http://archives.postgresql.org/pgsql-general/2008-02/msg00347.php


--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
Date: 2008-06-03 18:35:10
Message-ID: 48458EDE.4070603@wicourts.gov
Lists: pgsql-general pgsql-hackers

Bruce Momjian wrote:
> Added to TODO:
>
> * Allow XML to accept more liberal DOCTYPE specifications

Is any form of DOCTYPE accepted?

We're getting errors on the second line like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOT_OFFICER_CITATION SYSTEM "http://host.domain/dtd/dotdisposition0_02.dtd">

The actual host.domain value is resolved by DNS,
and wget of the url works on the machine.
Attempts to cast the document to type xml give:

ERROR: invalid XML content
DETAIL: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE DOT_OFFICER_CITATION SYSTEM "http://host.domain/dtd/dot
^

It would be nice to use the xml type, but we always have DOCTYPE....

-Kevin


From: Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
Date: 2008-06-03 18:40:31
Message-ID: 4845901F.9050500@wicourts.gov
Lists: pgsql-general pgsql-hackers

Bruce Momjian wrote:
> Added to TODO:
>
> * Allow XML to accept more liberal DOCTYPE specifications

Is any form of DOCTYPE accepted?

We're getting errors on the second line like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOT_OFFICER_CITATION SYSTEM "http://host.domain/dtd/dotdisposition0_02.dtd">

The actual host.domain value is resolved by DNS,
and wget of the url works on the machine.
Attempts to cast the document to type xml give:

ERROR: invalid XML content
DETAIL: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE DOT_OFFICER_CITATION SYSTEM "http://host.domain/dtd/dot
^

It would be nice to use the xml type, but we always have DOCTYPE....

-Kevin


From: Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
Date: 2008-06-03 18:48:05
Message-ID: 484591E5.7050309@wicourts.gov
Lists: pgsql-general pgsql-hackers

Bruce Momjian wrote:
> Added to TODO:
>
> * Allow XML to accept more liberal DOCTYPE specifications

Is any form of DOCTYPE accepted?

We're getting errors on a second line in an XML document that
starts like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE DOT_OFFICER_CITATION SYSTEM "http://host.domain/dtd/dotdisposition0_02.dtd">

The actual host.domain value is resolved by DNS,
and wget of the url works on the server running PostgreSQL.
Attempts to cast the document to type xml give:

ERROR: invalid XML content
DETAIL: Entity: line 2: parser error : StartTag: invalid element name
<!DOCTYPE DOT_OFFICER_CITATION SYSTEM "http://host.domain/dtd/dot
^

It would be nice to use the xml type, but we always have DOCTYPE.
I understand that PostgreSQL won't validate against the specified
DOCTYPE, but it shouldn't error out on it, either.

-Kevin


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-general(at)postgresql(dot)org
Cc: "Lawrence Oluyede" <l(dot)oluyede(at)gmail(dot)com>, "Belbin, Peter" <PETER(dot)BELBIN(at)mcleodusa(dot)com>
Subject: Re: PostgreSQL 8.3 XML parser seems not to recognize the DOCTYPE element in XML files
Date: 2008-08-12 09:40:31
Message-ID: 200808121240.32797.peter_e@gmx.net
Lists: pgsql-general pgsql-hackers

On Thursday, 7 February 2008, Lawrence Oluyede wrote:
> PostgreSQL 8.3 instead doesn't allow the insertion of XML with doctype
> in its new native data type returning this error message:
>
> """
> ERROR: invalid XML content
> DETAIL: Entity: line 2: parser error : StartTag: invalid element name
> <!DOCTYPE foo>
> ^

It turns out that this behavior is entirely correct. It depends on the XML
option. If you set the XML option to DOCUMENT, you can parse documents
including DOCTYPE declarations. If you set the XML option to CONTENT, then
what you can parse is defined by the production

XMLDecl? content

which does not allow for a DOCTYPE.

The default XML option is CONTENT, which explains the behavior.
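
A minimal sketch of how the XML option affects a plain cast, assuming a
psql session in autocommit mode on a server built with libxml support. The
document literal is made up for illustration, and the outcomes noted in the
comments are those described in this thread for 8.3, not verified output:

```sql
-- Inspect the current setting; per the explanation above, the default is CONTENT.
SHOW xmloption;

-- Under CONTENT, a value carrying a DOCTYPE is expected to be rejected
-- with "invalid XML content", as reported earlier in the thread.
SET xmloption TO CONTENT;
SELECT '<?xml version="1.0"?>
<!DOCTYPE foo>
<foo/>'::xml;

-- Under DOCUMENT, the same value should parse, DOCTYPE included.
SET xmloption TO DOCUMENT;
SELECT '<?xml version="1.0"?>
<!DOCTYPE foo>
<foo/>'::xml;
```

The setting is per session, so a client that always stores whole documents
could issue the SET once after connecting.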

Now, the supercorrect way to parse XML values would be using the XMLPARSE()
function, which requires you to specify the XML option inline. That way,
everything works.
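
As a rough illustration of the XMLPARSE() route, the statements below name
the XML option inline, so they should not depend on the session's xmloption
setting. The literals are hypothetical examples, not taken from the files
discussed above:

```sql
-- DOCUMENT: a complete document, which may include a DOCTYPE declaration.
SELECT XMLPARSE(DOCUMENT '<?xml version="1.0"?>
<!DOCTYPE foo>
<foo/>');

-- CONTENT: a fragment matching "XMLDecl? content"; no DOCTYPE allowed here.
SELECT XMLPARSE(CONTENT 'abc<foo>bar</foo>');
```

The same expression can be used wherever a value of type xml is needed, for
example in the VALUES list of an INSERT.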