Re: xpath improvement suggestion

From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Scott Bailey <artacus(at)comcast(dot)net>
Cc: Arie Bikker <arie(at)abikker(dot)nl>, pgsql-hackers(at)postgresql(dot)org, peter_e(at)gmx(dot)net
Subject: Re: xpath improvement suggestion
Date: 2010-01-17 16:33:40
Message-ID: 4B533BE4.2080203@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

here's a review of the patch:

It applies with offsets, but worked fine for me. It works as advertised,
and I believe it is a solid step forward from the current situation.

As far as the coding goes, the PG_TRY/CATCH in xml_xmlpathobjtoxmltype
seems unnecessary in the XPATH_BOOLEAN branch, as the CATCH branch only
rethrows the elog. Additional whitespace in

cstring_to_text((char *)(xmlXPathCastToBoolean(cur)?"t":"f"));

would help also. Idle though: why go through xmlXPathCastToBoolean at
all? Couldn't you just user xmlXPathCastToString always? The
documentation suggests that you will get "true" or "false", which is as
good as "t" or "f" and simlifies the code of xml_xmlpathobjtoxmltype
quite a bit.

The default: branch of the switch in the xpath() function should not be
a elog(WARNING) but an elog(ERROR). Surely getting a bogus values from
XPath is an error that should prevent us from returning a possibly wrong
answer.

The switch statement should have curly braces on separate lines and the
case statements should be indented and written without curly braces.

The patch really needs a couple of regression tests that demonstrate
what was happening previously and what will happen now, i.e.
string/number/boolean results from XPath expressions are not ignored.
See http://wiki.postgresql.org/wiki/Regression_test_authoring for info
on how to write a unit test for PostgreSQL.

The change in xpah() behaviour should be reflected in the documentation,
so the patch really should include also doc changes. Specifically, I
think it should be explicitly documented that xpath() always returns an
array of XML nodes, even in cases where according to the XPath spec it
should return a boolean or a number. It's clearly a step forward,
though, because previously we were returning an empty resultset instead
of something that can at least be cast to a PG type with good hope of
resulting in what the programmer expected.

Scott Bailey wrote:
> Arie Bikker wrote:
>> Peter Eisentraut wrote:
>>> On ons, 2010-01-06 at 23:46 +0100, Arie Bikker wrote:
>>>
>>>> Hope this is the right attachement type (I'm new at this)
>>>> BTW. here a some nice examples:
>>>>
>>>> - Get the number of attributes of the first childnode:
>>>>
>>>> select ( xpath('count(@*)',(xpath('*[1]','<a b="c"><d e="f"
>>>> g="j"/></a>'))[1]))[1];
>>>>
>>>> - an alternative for xpath_exist('/a/d')
>>>> select (xpath('boolean(/a/d)','<a b="c"><d e="f" g="j"/></a>'))[1];
>>>>
>>>> - fixes bug 4206

I could not reproduce that in HEAD, probably got fixed when that
terrible "add <x> </x>" hack has been taken out.

>>>> - fixes bug 4294

Yes, it does fix that one.

Additional moral from these two is: they should be part of the
regression test suite.

>>> Instead of converting everything to text, there have been previous
>>> suggestions to add functionx like xpath_string, xpath_number,
>>> xpath_boolean that return the appropriate types from xpath. This could
>>> provide for better type safety and probably also more clarity.

Possibly, but this patch is useful even without those.

>>> In any case, please consider adding test cases like the above to the
>>> regression tests in whatever patch comes out at the end.

+1

> Postgres' type system is MUCH more robust than anything in XPath/XML.
> And folks who use XML on a regular basis expect most XPath expressions
> to return a string any way.
> For instance how many confused users do you think you'll get with
> something like:
> SELECT xpath_boolean('boolean(/root/@bar)', '<root bar="false"/>)
> -- evaluates to true

The users' confusion would come from the fact that XPath has different
rules for string->boolean casts than PG.
In XPath (as in Python, Perl I guess and some other languages) a string
when coerced to boolean is false only if it is empty. PosgtreSQL
considers 'no'::boolean, 'fal'::boolean and 'n'::boolean as false and
the empty string as something uncoercable to boolean.

Both PostgreSQL casting rules and XPath casting rules cannot be changed,
so they will always be confusing in mixed contexts.

> I think we'd be much better of having a function like xpath_nonnode() or
> xpath_value() that returns text and let the user handle the casting.

I'm not sure about this, TBH. At first I thought it's a good idea, but
after some thinking I beliefe xpath_value() would be a simple wrapper
around (xpath())[1] that could also do some additional validation, like
checking if the result from XPath is not XPATH_NODESET. The merit of
xpath_boolean and xpath_number would be in using the XPath casting
conventions, not the PG ones, and in translating the XPath return type
to PG's type.

In short, you can always do (xpath())[1] and get a string, which you can
then instruct PG to cast using PG casting rules.

My proposal is to accept Arie's patch (barring objections already
raised) and consider adding xpath_{boolean,number,string} that would
return the respective PG datatype and would internally check if the
result from XPath is XPATH_{BOOLEAN,NUMBER,STRING} respectively and
would do the cast for you or throw an error.

This way if you'd do xpath_number('/root/@num', '<root num="40"/>') you
would get an error, because the result from XPath would be XPATH_STRING,
forcing you do explicitly use the XPath function number() (and which is
a very good thing, because of the different casting semantics I already
mentioned).

I'm only afraid about the subtleties around what XPath considers a
number and what PG considers a number (see
http://www.w3.org/TR/xpath#numbers), string encoding etc. but that's
probably just a matter of thinking it out.

I'll mark this patch as Waiting on Author to give Arie the chance to
consider my remarks and add regression tests, after which this patch
looks committable.

If no one steps up I could then come up with a follow-up patch that adds
xpath_number and friends.

Cheers,
Jan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-01-17 17:29:55 Fixing handling of constraint triggers
Previous Message Tom Lane 2010-01-17 16:32:50 Re: compiler warnings with GCC 4.5