Re: Another issue with invalid XML values

From: Florian Pflug <fgp(at)phlo(dot)org>
To: Florian Pflug <fgp(at)phlo(dot)org>
Cc: Noah Misch <noah(at)leadboat(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Another issue with invalid XML values
Date: 2011-06-09 16:11:46
Message-ID: 0A7484B0-B50E-4A2B-BE23-9817692DB9BB@phlo.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jun2, 2011, at 01:34 , Florian Pflug wrote:
> On Jun2, 2011, at 00:02 , Noah Misch wrote:
>> On Wed, Jun 01, 2011 at 06:16:21PM +0200, Florian Pflug wrote:
>>> Anyway, I'll try to come up with a patch that replaces xmlSetGenericErrorFunc() with xmlSetStructuredErrorFunc().
>>
>> Sounds sensible. Will this impose any new libxml2 version dependency?
>
> xmlSetStructuredErrorFunc() seems to be available starting with libxml 2.6.0,
> release on Oct 20, 2003. Since we already require the version to be >= 2.6.23,
> we should be OK.
>
> I won't have access to my PC the next few days, but I'll try to come up with
> a patch some time next week.

Phew... I did manage to produce a patch, but it was way more work than I
had intended to put into this.

As it turns out, you loose the nicely formatted context information that
libxml2 provides via the generic error func once you switch to structured
error reporting. Registering handlers for both doesn't help either, since
the generic error handler isn't called once you register a structured one.

Fortunately, libxml does export xmlParserPrintFileContext() which
generates these context messages. It, however, doesn't return a string,
but instead passes them to the generic error handler (this time, independent
from whether a structural error handler is registered or not).

As it stood, the code assumed that all third-party library re-install
their libxml error handlers before each library call, and thus didn't
bother to restore the old error handler itself. Since I revamped the
error handling anyway, I removed that requirement. There is now a
function pg_xml_done() which restores the original error handler that
we overwrote in pg_xml_init().

I also realized that some libxml error (like undefined namespace prefixes)
must be ignored during xmlparse() and friends. Otherwise, it becomes
impossible to build XML documents from individual fragments. pg_xml_init()
therefore now takes an argument which specifies how strict the error
checking is supposed to be. For the moment, only XPATH() uses the strict
mode in which we report all errors. XMLPARSE() and friends only report
parse errors, not namespace errors.

Finally, I had to adjust contrib/xml2 because it uses some parts of
the core XML support like pg_xml_init().

Heres the indended behaviour with the patch applied:
----------------------------------------------------

We always use structured error handling. For now, the error messages
pretty much resemble the old ones, but it's now easy to add additional
information.

XMLPARSE() and casting to XML check for parse errors only, like they do
without the patch. They're also capable of reporting warnings, but I
didn't find a case where the libxml parser generates a warning.

XPATH() reports all errors and warnings. Trying to use XPATH() on
a document with e.g. inconsistent namespace usage or invalid
namespace URIs therefore now raises an error. This is *necessary*
because libxml's XPath evaluator gets confused if it encounters
e.g. invalid namespace URI and outputs invalid XML in response.

contrib/xml2's behaviour hasn't changed.

Patch is attached, and comments are welcome.

best regards,
Florian Pflug

Attachment Content-Type Size
pg_xml_errorhandling.v1.patch application/octet-stream 47.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2011-06-09 16:13:04 Re: .gitignore for some of cygwin files
Previous Message Kevin Grittner 2011-06-09 16:07:35 Re: release slippage