Native XML

From: Anton <antonin(dot)houska(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Native XML
Date: 2011-02-26 23:40:28
Message-ID: 4D698F6C.3020509@gmail.com
Lists: pgsql-hackers

Hello,
I've been playing with 'native XML' for a while and am now wondering
whether further development of such a feature makes sense for Postgres.
(By not having brought this up earlier I took the risk that the effort
would be wasted, but that's not something you should worry about.)

The code is available here:
https://github.com/ahouska/postgres/commit/bde3d3ab05915e91a0d831a8877c2fed792693c7

If you are interested in my suggestions, I recommend starting with the
test (it needs to be executed standalone, pg_regress is not aware of it
yet):

src/test/regress/sql/xmlnode.sql
src/test/expected/xmlnode.out

In a few words, 'xmlnode' is a structured type that stores an XML
document in the form of a tree, as opposed to plain text.
Parsing is only performed on insert or update (for update it would also
make sense to implement functions that add/remove nodes at the low
level, without dumping and re-parsing).

Unlike 'libxml2', the parser uses palloc()/pfree(). The output format is
independent from any 3rd party code.
The binary (parsed) XML node is a single chunk of memory, independent of
the address where it was allocated.
The parser does not yet fully conform to the XML standard and some
functionality is still missing (DTD, PI, etc., see comments in the code
if you're interested in details).
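
To illustrate what "a single chunk of memory, independent of the address
where it was allocated" can mean in practice, here is a minimal sketch.
It is not the layout used in the patch: the FlatDoc/FlatNode structs, the
flat_* function names and the fixed capacity are all made up for this
example, and plain malloc() is used instead of palloc() so that it
compiles outside of Postgres. The point is only that nodes refer to each
other by byte offset from the start of the buffer, never by pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define FLAT_CAPACITY 1024      /* hypothetical fixed buffer size */
#define FLAT_NAME_LEN 16        /* hypothetical limit on element names */

/* Header at offset 0 of the buffer. */
typedef struct FlatDoc
{
    uint32_t used;              /* bytes in use, including this header */
    uint32_t root;              /* offset of the root node, 0 = empty */
} FlatDoc;

/* A node; links are byte offsets into the same buffer, 0 = none. */
typedef struct FlatNode
{
    uint32_t first_child;
    uint32_t next_sibling;
    char     name[FLAT_NAME_LEN];
} FlatNode;

/* Resolve an offset against whatever address the buffer has right now. */
static FlatNode *
flat_node(void *buf, uint32_t off)
{
    return (FlatNode *) ((char *) buf + off);
}

/* Allocate an empty document. */
static void *
flat_create(void)
{
    void *buf = calloc(1, FLAT_CAPACITY);

    assert(buf != NULL);
    ((FlatDoc *) buf)->used = sizeof(FlatDoc);
    return buf;
}

/* Append a node as the last child of 'parent' (0 = make it the root). */
static uint32_t
flat_add(void *buf, uint32_t parent, const char *name)
{
    FlatDoc  *doc = (FlatDoc *) buf;
    uint32_t  off = doc->used;
    FlatNode *node = flat_node(buf, off);

    assert(off + sizeof(FlatNode) <= FLAT_CAPACITY);
    assert(strlen(name) < FLAT_NAME_LEN);
    memset(node, 0, sizeof(FlatNode));
    strcpy(node->name, name);
    doc->used += sizeof(FlatNode);

    if (parent == 0)
        doc->root = off;
    else
    {
        FlatNode *p = flat_node(buf, parent);

        if (p->first_child == 0)
            p->first_child = off;
        else
        {
            FlatNode *c = flat_node(buf, p->first_child);

            while (c->next_sibling != 0)
                c = flat_node(buf, c->next_sibling);
            c->next_sibling = off;
        }
    }
    return off;
}

/* Count the direct children of the node at offset 'off'. */
static int
flat_count_children(void *buf, uint32_t off)
{
    int      n = 0;
    uint32_t c = flat_node(buf, off)->first_child;

    for (; c != 0; c = flat_node(buf, c)->next_sibling)
        n++;
    return n;
}

/* Copy the document to a new address; no pointer fix-ups are needed. */
static void *
flat_copy(const void *buf)
{
    uint32_t used = ((const FlatDoc *) buf)->used;
    void    *copy = malloc(used);

    assert(copy != NULL);
    memcpy(copy, buf, used);
    return copy;
}
```

With such a layout a plain memcpy() of the used bytes yields a value that
is valid at its new address, which is the property a datum needs if it
is to be written to disk and read back somewhere else.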

The 'xquery()' function evaluates XMLPath expressions (so far only
simple ones) and, for each document, returns the set of matching
nodes/subtrees.
'xmlpath' is a parsed XMLPath (i.e. the expression + some metadata). It
lets the xquery() function avoid parsing the same XMLPath expression
repeatedly.
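
The idea of parsing a path once and reusing it can be sketched as
follows. This is only an illustration, not the code from the patch:
ParsedPath, Node, path_parse() and path_count() are hypothetical names,
the tree is an ordinary pointer-based one, and the only supported
expressions are absolute paths of element names such as "/a/b".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_STEPS 8             /* hypothetical limits for the sketch */
#define MAX_NAME  16

/* Stand-in for a parsed path: element names split out once. */
typedef struct ParsedPath
{
    int  nsteps;
    char steps[MAX_STEPS][MAX_NAME];
} ParsedPath;

/* Minimal tree node, used only for this demonstration. */
typedef struct Node
{
    const char  *name;
    struct Node *first_child;
    struct Node *next_sibling;
} Node;

/* Parse "/a/b/c" into steps. Returns 0 on success, -1 if malformed. */
static int
path_parse(const char *expr, ParsedPath *out)
{
    out->nsteps = 0;
    if (*expr != '/')
        return -1;
    while (*expr == '/')
    {
        const char *start = expr + 1;
        size_t      len = strcspn(start, "/");

        if (len == 0 || len >= MAX_NAME || out->nsteps == MAX_STEPS)
            return -1;
        memcpy(out->steps[out->nsteps], start, len);
        out->steps[out->nsteps][len] = '\0';
        out->nsteps++;
        expr = start + len;
    }
    return 0;
}

/*
 * Count the nodes matched by steps[depth..], starting from the sibling
 * list that begins at 'node'. Call with depth 0 and the root element.
 * No parsing happens here, so one ParsedPath can serve many documents.
 */
static int
path_count(const ParsedPath *path, int depth, const Node *node)
{
    int count = 0;

    for (; node != NULL; node = node->next_sibling)
    {
        if (strcmp(node->name, path->steps[depth]) != 0)
            continue;
        if (depth == path->nsteps - 1)
            count++;            /* last step matched */
        else
            count += path_count(path, depth + 1, node->first_child);
    }
    return count;
}
```

A caller would run path_parse() once and then call path_count() with the
same ParsedPath for every document, so the cost of parsing the
expression is paid a single time rather than once per row.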

I'm not pretending that I invented this concept: DB2, Oracle and
probably some other commercial databases have had it for years.
Even though the mission of Postgres is not as simple as copying features
from other DBMSs, I think structured XML makes sense as such.
It allows for better integration of relational and XML data, especially
joining relational columns with XML node sets.

In the future, interesting features could be based on it. For example,
an XML node/subtree can be located quickly within an xmlnode value and
as such it could be indexed (even though the existing indexes / access
methods might not be appropriate for that).

When reviewing my code, please focus on the ideas rather than the code
quality :-) I'm aware that some refactoring will have to be done if
this subproject goes on.

Thanks in advance for any feedback,
Tony.
