Re: Proposal: XML helper functions

Lists: pgsql-hackers
From: Scott Bailey <artacus(at)comcast(dot)net>
To: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Proposal: XML helper functions
Date: 2010-01-05 18:14:12
Message-ID: 4B438174.8050004@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

One of the problem with shredding XML is that it is very kludgy to get a
scalar value back from xpath. The xpath function always returns an array
of XML. So for example, to extract a numeric value you need to:
1) use xpath to get the node
2) get the first element of the XML array
3) cast that to varchar
4) cast that to numeric

So I wrote the following function:

CREATE OR REPLACE FUNCTION xmlvalue(
VARCHAR,
XML
) RETURNS TEXT AS
$$
SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$'
THEN (xpath($1, $2))[1]
WHEN $1 ~* '/text()$'
THEN (xpath($1, $2))[1]
WHEN $1 LIKE '%/'
THEN (xpath($1 || 'text()', $2))[1]
ELSE (xpath($1 || '/text()', $2))[1]
END::text;
$$ LANGUAGE 'sql' IMMUTABLE;

It's pretty simple. It just does a check to see if you are extracting an
attribute or an element and if element, it makes sure to get the text value.

So query that used to look like:

SELECT CAST(
CAST(
(xpath('/foo/bar/text()', myxml))[1]
AS varchar)
AS numeric) AS bar

now becomes:

SELECT xmlvalue('/foo/bar/text(), myxml)::numeric AS bar

Second function just checks that the xpath expression finds at least one
node.

CREATE OR REPLACE FUNCTION xmlexists(
VARCHAR,
XML
) RETURNS BOOLEAN AS
$$
SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0
THEN true ELSE false END;
$$ LANGUAGE 'sql' IMMUTABLE;

On naming, SQL/XML specifies xmlexists and xmlcast. Latest db2 provides
xmlcast(), Oracle has equivalent extractvalue() function, MSSQL uses
xml.value(). The xmlvalue does only part of what is required by xmlcast
(it won't cast scalar to xml).

So would these functions need to be rewritten in c in order to be accepted?

Regards,

Scott Bailey

Further reading:

http://scottrbailey.wordpress.com/2009/06/19/xml-parsing-postgres/
http://en.wikibooks.org/wiki/SQL_Dialects_Reference/SQL_XML
http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions052.htm#i1131042
http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0023486.htm


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Scott Bailey <artacus(at)comcast(dot)net>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: XML helper functions
Date: 2010-01-05 18:26:26
Message-ID: b42b73151001051026o3c566c7cw5ee94a1a00a4cd4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey <artacus(at)comcast(dot)net> wrote:
> One of the problem with shredding XML is that it is very kludgy to get a
> scalar value back from xpath. The xpath function always returns an array of
> XML. So for example, to extract a numeric value you need to:
> 1) use xpath to get the node
> 2) get the first element of the XML array
> 3) cast that to varchar
> 4) cast that to numeric

I just happen to be dealing with XML right now as well and my initial
thought is that your suggestion doesn't buy you a whole lot: the root
problem IMO is not dealing with what xpath gives you but that there is
no DOMish representation of the xml document for you to query. You
have to continually feed the entire document to xpath which is
absolutely not scalable (if it works the way I think it does --
haven't looked at the code).

xpath is great for simple things but it's too texty and you need a
more robust API to handle documents for serious parsing on the
backend. In the short term i'd advise doing work in another pl like
perl.

merlin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Scott Bailey <artacus(at)comcast(dot)net>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: XML helper functions
Date: 2010-01-05 18:33:33
Message-ID: 162867791001051033i22178c26h1169dcd7ec524717@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/1/5 Scott Bailey <artacus(at)comcast(dot)net>:
> One of the problem with shredding XML is that it is very kludgy to get a
> scalar value back from xpath. The xpath function always returns an array of
> XML. So for example, to extract a numeric value you need to:
> 1) use xpath to get the node
> 2) get the first element of the XML array
> 3) cast that to varchar
> 4) cast that to numeric
>
> So I wrote the following function:
>
> CREATE OR REPLACE FUNCTION xmlvalue(
>   VARCHAR,
>   XML
> ) RETURNS TEXT AS
> $$
>   SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$'
>   THEN (xpath($1, $2))[1]
>   WHEN $1 ~* '/text()$'
>   THEN (xpath($1, $2))[1]
>   WHEN $1 LIKE '%/'
>   THEN (xpath($1 || 'text()', $2))[1]
>   ELSE (xpath($1 || '/text()', $2))[1]
>   END::text;
> $$ LANGUAGE 'sql' IMMUTABLE;
>
> It's pretty simple. It just does a check to see if you are extracting an
> attribute or an element and if element, it makes sure to get the text value.
>
> So query that used to look like:
>
> SELECT CAST(
>  CAST(
>    (xpath('/foo/bar/text()', myxml))[1]
>  AS varchar)
> AS numeric) AS bar
>
> now becomes:
>
> SELECT xmlvalue('/foo/bar/text(), myxml)::numeric AS bar
>
>
> Second function just checks that the xpath expression finds at least one
> node.
>
> CREATE OR REPLACE FUNCTION xmlexists(
>  VARCHAR,
>  XML
> ) RETURNS BOOLEAN AS
> $$
>  SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0
>  THEN true ELSE false END;
> $$ LANGUAGE 'sql' IMMUTABLE;
>
> On naming, SQL/XML specifies xmlexists and xmlcast.

I am for SQL/XML naming convention.

Regards
Pavel Stehule

Latest db2 provides
> xmlcast(), Oracle has equivalent extractvalue() function, MSSQL uses
> xml.value(). The xmlvalue does only part of what is required by xmlcast (it
> won't cast scalar to xml).
>
> So would these functions need to be rewritten in c in order to be accepted?
>
> Regards,
>
> Scott Bailey
>
>
> Further reading:
>
> http://scottrbailey.wordpress.com/2009/06/19/xml-parsing-postgres/
> http://en.wikibooks.org/wiki/SQL_Dialects_Reference/SQL_XML
> http://download.oracle.com/docs/cd/B19306_01/server.102/b14200/functions052.htm#i1131042
> http://publib.boulder.ibm.com/infocenter/db2luw/v9/index.jsp?topic=/com.ibm.db2.udb.admin.doc/doc/r0023486.htm
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Scott Bailey <artacus(at)comcast(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: XML helper functions
Date: 2010-01-05 19:07:28
Message-ID: 4B438DF0.8060108@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure wrote:
> On Tue, Jan 5, 2010 at 1:14 PM, Scott Bailey <artacus(at)comcast(dot)net> wrote:
>> One of the problem with shredding XML is that it is very kludgy to get a
>> scalar value back from xpath. The xpath function always returns an array of
>> XML. So for example, to extract a numeric value you need to:
>> 1) use xpath to get the node
>> 2) get the first element of the XML array
>> 3) cast that to varchar
>> 4) cast that to numeric
>
> I just happen to be dealing with XML right now as well and my initial
> thought is that your suggestion doesn't buy you a whole lot: the root
> problem IMO is not dealing with what xpath gives you but that there is
> no DOMish representation of the xml document for you to query. You
> have to continually feed the entire document to xpath which is
> absolutely not scalable (if it works the way I think it does --
> haven't looked at the code).

No typically you'll only be passing the xml for a single "row" so what
we end up doing in Postgres typically looks something like this:

SELECT xmlvalue('/row/@id', bitesizexml)::int AS id,
xmlvalue('/row/@lat', bitesizexml)::numeric AS lat,
xmlvalue('/row/@lon', bitesizexml)::numeric,
xmlvalue('/row/comment', bitesizexml) AS cmt
FROM (
SELECT unnest(xpath('/foo/row', mybigxmldoc)) AS bitesizexml
) sub

So only the one call has to work with the entire document. All the calls
to xmlvalue are passed a much smaller node to work with.

> xpath is great for simple things but it's too texty and you need a
> more robust API to handle documents for serious parsing on the
> backend. In the short term i'd advise doing work in another pl like
> perl.

This is basically the method used for Oracle too until they provided
XMLTable functionality. They had a function xmlsequence that basically
did the unnest(xpath()) part. Hopefully we'll get xmltable support soon.

Scott


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Scott Bailey <artacus(at)comcast(dot)net>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: XML helper functions
Date: 2010-01-05 19:19:38
Message-ID: 1262719178.6448.0.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tis, 2010-01-05 at 10:14 -0800, Scott Bailey wrote:
> One of the problem with shredding XML is that it is very kludgy to get a
> scalar value back from xpath. The xpath function always returns an array
> of XML. So for example, to extract a numeric value you need to:
> 1) use xpath to get the node
> 2) get the first element of the XML array
> 3) cast that to varchar
> 4) cast that to numeric

There has been talk about adding something like xpath_string,
xpath_number, xpath_boolean for fetching xpath expressions that don't
return nodesets. I think that would fit your use case.


From: Scott Bailey <artacus(at)comcast(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: XML helper functions
Date: 2010-01-05 19:39:38
Message-ID: 4B43957A.20007@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule wrote:
> 2010/1/5 Scott Bailey <artacus(at)comcast(dot)net>:
>> One of the problem with shredding XML is that it is very kludgy to get a
>> scalar value back from xpath. The xpath function always returns an array of
>> XML. So for example, to extract a numeric value you need to:
>> 1) use xpath to get the node
>> 2) get the first element of the XML array
>> 3) cast that to varchar
>> 4) cast that to numeric
>>
>> So I wrote the following function:
>>
>> CREATE OR REPLACE FUNCTION xmlvalue(
>> VARCHAR,
>> XML
>> ) RETURNS TEXT AS
>> $$
>> SELECT CASE WHEN $1 ~ '@[[:alnum:]_]+$'
>> THEN (xpath($1, $2))[1]
>> WHEN $1 ~* '/text()$'
>> THEN (xpath($1, $2))[1]
>> WHEN $1 LIKE '%/'
>> THEN (xpath($1 || 'text()', $2))[1]
>> ELSE (xpath($1 || '/text()', $2))[1]
>> END::text;
>> $$ LANGUAGE 'sql' IMMUTABLE;
>>
>> It's pretty simple. It just does a check to see if you are extracting an
>> attribute or an element and if element, it makes sure to get the text value.
>>
>> So query that used to look like:
>>
>> SELECT CAST(
>> CAST(
>> (xpath('/foo/bar/text()', myxml))[1]
>> AS varchar)
>> AS numeric) AS bar
>>
>> now becomes:
>>
>> SELECT xmlvalue('/foo/bar/text(), myxml)::numeric AS bar
>>
>>
>> Second function just checks that the xpath expression finds at least one
>> node.
>>
>> CREATE OR REPLACE FUNCTION xmlexists(
>> VARCHAR,
>> XML
>> ) RETURNS BOOLEAN AS
>> $$
>> SELECT CASE WHEN array_upper(xpath($1, $2), 1) > 0
>> THEN true ELSE false END;
>> $$ LANGUAGE 'sql' IMMUTABLE;
>>
>> On naming, SQL/XML specifies xmlexists and xmlcast.
>
> I am for SQL/XML naming convention.

Well I'm shying away from the name xmlcast because it is supposed to
cast xml to scalar, scalar to xml and xml to xml. For instance these
would all work on db2.

SELECT xmlcast(null AS XML),
xmlcast(bar AS XML),
xmlcast(xmlquery('$x/baz/bar' PASSING foo.barxml AS x) AS VARCHAR(30))
FROM foo

But I just found that Oracle added xmlcast in 11g and it only does xml
to scalar (and only number, varchar2 and date/time types). So maybe
you're right.

Scott


From: Scott Bailey <artacus(at)comcast(dot)net>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: XML helper functions
Date: 2010-01-05 19:50:31
Message-ID: 4B439807.2040707@comcast.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Peter Eisentraut wrote:
> On tis, 2010-01-05 at 10:14 -0800, Scott Bailey wrote:
>> One of the problem with shredding XML is that it is very kludgy to get a
>> scalar value back from xpath. The xpath function always returns an array
>> of XML. So for example, to extract a numeric value you need to:
>> 1) use xpath to get the node
>> 2) get the first element of the XML array
>> 3) cast that to varchar
>> 4) cast that to numeric
>
> There has been talk about adding something like xpath_string,
> xpath_number, xpath_boolean for fetching xpath expressions that don't
> return nodesets. I think that would fit your use case.

The first two sound very much like what I'm looking for. I'm unsure
about the third. Is it's purpose to extract the scalar value of an
expression and cast to bool as the other two do, or is it to identify if
the xpath expression returned any nodes like xmlexists?

Scott


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Scott Bailey <artacus(at)comcast(dot)net>
Cc: hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: XML helper functions
Date: 2010-01-05 21:13:56
Message-ID: 1262726036.6448.3.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tis, 2010-01-05 at 11:50 -0800, Scott Bailey wrote:
> > There has been talk about adding something like xpath_string,
> > xpath_number, xpath_boolean for fetching xpath expressions that
> don't
> > return nodesets. I think that would fit your use case.
>
> The first two sound very much like what I'm looking for. I'm unsure
> about the third. Is it's purpose to extract the scalar value of an
> expression and cast to bool as the other two do, or is it to identify
> if the xpath expression returned any nodes like xmlexists?

XPath as a language has four data types: nodeset, string, number,
boolean. So an XPath expression could return any of those types. Then,
I suppose, the xpath_foo() function would evaluate the expression and
return the result as type foo, possibly raising an error if the types
don't match.

Details to be determined.


From: Larry <l(dot)paige(at)live(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: XML helper functions
Date: 2012-02-19 15:40:35
Message-ID: 1329666035993-5497064.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

But by using the above code: how do we deal with multiple matching values?

For example:

<question name=“my_question">
<tag> java </tag>
<tag> c++ </tag>
</question>

In this case, perhaps I would want something like

---------------+-----
my_question | java
my_question | c++

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Proposal-XML-helper-functions-tp2018975p5497064.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.