Re: XML Issue with DTDs

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XML Issue with DTDs
Date: 2013-12-26 20:28:45
Message-ID: AE499D25-0910-4CFD-AF98-D6103918495E@phlo.org
Lists: pgsql-hackers

On Dec23, 2013, at 03:45 , Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Dec 20, 2013 at 8:16 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>> On Dec20, 2013, at 18:52 , Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> On Thu, Dec 19, 2013 at 6:40 PM, Florian Pflug <fgp(at)phlo(dot)org> wrote:
>>>> Solving this seems a bit messy, unfortunately. First, I think we need to have some XMLOPTION value which is a superset of all the others - otherwise, dump & restore won't work reliably. That means either allowing DTDs if XMLOPTION is CONTENT, or inventing a third XMLOPTION, say ANY.
>>>
>>> Or we can just decide that it was a bug that this was ever allowed,
>>> and if you upgrade to $FIXEDVERSION you'll need to sanitize your data.
>>> This is roughly what we did with encoding checks.
>>
>> What exactly do you suggest we outlaw?
>
> <!DOCTYPE> anywhere but at the beginning.

I think we're talking past one another here. Fixing XMLCONCAT/XMLAGG
to not produce XML values which are neither valid DOCUMENTS nor valid
CONTENT fixes *one* part of the problem.

The other part of the problem is that since not every DOCUMENT
is valid CONTENT (because CONTENT forbids DTDs) and not every CONTENT
is a valid DOCUMENT (because DOCUMENT forbids multiple root nodes), it's
impossible to set XMLOPTION to a value which accepts *all* valid XML
values. That breaks pg_dump/pg_restore. To fix this, we must provide
a way to insert XML data which accepts both DOCUMENTS and CONTENT, and
not only one or the other. Due to the way COPY works, we cannot call
a special conversion function, so we must modify the input functions.

My initial thought was to simply allow XML values which are CONTENT,
not DOCUMENTS, to contain a DTD (at the beginning), thus making CONTENT
a superset of DOCUMENT. But I've since realized that the 2003
standard explicitly constrains CONTENT to *not* contain a DTD. The
only other option that I can see is to invent a third, non-standard
XMLOPTION value, ANY. ANY would accept anything accepted by either
DOCUMENT or CONTENT, but no more than that.
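
To make the set relationship concrete, here is a toy sketch in Python. It
is not the PostgreSQL implementation (which is built on libxml2); it only
models the rules as stated above using the standard library's expat-based
parser. The function names is_document, is_content and is_any are
hypothetical, and the wrapper-element trick is an assumption of this
sketch. It differs from a real input function in several ways; for
example, it would also reject a leading XML declaration in CONTENT.

```python
import xml.etree.ElementTree as ET


def _parses(text):
    """Return True if expat accepts `text` as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def is_document(value):
    # DOCUMENT: exactly one root element, optionally preceded by a DTD.
    return _parses(value)


def is_content(value):
    # CONTENT: any sequence of nodes, but no DTD. Wrapping the value in a
    # dummy root makes multiple top-level nodes legal and a DOCTYPE illegal.
    return _parses("<wrapper>" + value + "</wrapper>")


def is_any(value):
    # Hypothetical ANY: whatever DOCUMENT or CONTENT accepts, nothing more.
    return is_document(value) or is_content(value)


with_dtd = "<!DOCTYPE a><a/>"   # a DOCUMENT, but not CONTENT
two_roots = "<a/><b/>"          # CONTENT, but not a DOCUMENT
for v in (with_dtd, two_roots):
    print(v, is_document(v), is_content(v), is_any(v))
```

Neither is_document nor is_content accepts both sample values, which is
the dump & restore problem in miniature; only the union does.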

best regards,
Florian Pflug
