PATCH: Implement value_to_json for single-datum conversion

Lists: pgsql-hackers
From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: PATCH: Implement value_to_json for single-datum conversion
Date: 2012-08-13 03:21:38
Message-ID: 502872C2.2050906@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi all

Whenever I try to work with the new json types I trip over the lack of a
function to escape text to json. The attached patch against master
provides one by exposing the existing datum_to_json function to SQL.
I've used the name value_to_json, but I'm not sure it's necessarily the
right choice of name.

Please consider this for the 9.2 branch as well as to HEAD, as IMO it's
very important for basic usability of the json functionality. It applies
to 9.2 fine and passes "make check". I know it's late in the game, but
it's also a very small change and it's very hard to build up JSON data
structures other than simple rows or arrays without at minimum a way of
escaping `text' to json strings.

This feels basic enough that I'm wondering if there's a reason it wasn't
included from the start, but I don't see any comments in json.c talking
about anything like this, nor did I find any -hackers discussion about
it. I suspect it's just an oversight.

As value_to_json directly wraps datum_to_json it actually accepts record
and array types too. I didn't see any reason to prevent that and force
the user to instead use row_to_json or array_to_json for those cases. If
you don't want to accept this, I can provide a wrapper for escape_json
that only accepts a text argument instead, but I think *some* way to
escape text to JSON is vital to have in 9.2.

A docs patch will follow shortly if you're happy that this patch is
reasonable.

--
Craig Ringer

Attachment Content-Type Size
0001-Implement-value_to_json-exposing-the-existing-datum_.patch text/x-patch 0 bytes

From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Implement value_to_json for single-datum conversion
Date: 2012-08-13 03:44:49
Message-ID: 50287831.6010701@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Whoops. It actually looks like the posted patch muffed up opr_sanity
checks. I'm totally new to pg_proc.h wrangling so I'm not sure why yet,
looking.

Sorry, not sure how I missed that. I'll follow up shortly.

--
Craig Ringer


From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Implement value_to_json for single-datum conversion
Date: 2012-08-13 04:05:08
Message-ID: 50287CF4.4040504@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

OK, opr_sanity was failing because I added the value_to_json(text) alias
to ensure that:

value_to_json('some_literal')

worked, following the same approach as quote_literal(anyelement) and
quote_literal(text). That should be reasonable, right? The comments on
the affected check in opr_sanity say that it's not necessarily wrong so
long as the called function is prepared to handle the different
arguments its self - which it is, since it's already accepting anyelement.

The test comment reads:

Note that the expected output of this part of the test will
need to be modified whenever new pairs of types are made
binary-equivalent,
or when new polymorphic built-in functions are added

so that seems reasonable.

postgres=# \df quote_literal
List of functions
Schema | Name | Result data type | Argument data types |
Type
------------+---------------+------------------+---------------------+--------
pg_catalog | quote_literal | text | anyelement | normal
pg_catalog | quote_literal | text | text | normal
(2 rows)

postgres=# \df value_to_json
List of functions
Schema | Name | Result data type | Argument data types |
Type
------------+---------------+------------------+---------------------+--------
pg_catalog | value_to_json | json | anyelement | normal
pg_catalog | value_to_json | json | text | normal
(2 rows)

Revised patch that tweaks the expected result of opr_sanity attached.

--
Craig Ringer

Attachment Content-Type Size
0001-Implement-value_to_json-exposing-the-existing-datum_.patch text/x-patch 7.7 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Implement value_to_json for single-datum conversion
Date: 2012-08-13 04:33:00
Message-ID: 21714.1344832380@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> writes:
> OK, opr_sanity was failing because I added the value_to_json(text) alias
> to ensure that:

> value_to_json('some_literal')

> worked, following the same approach as quote_literal(anyelement) and
> quote_literal(text). That should be reasonable, right?

No, it isn't. What you're proposing is to let opr_sanity think that
text and anyelement are interchangeable to C functions, which is so
far from reality as to be ludicrous. That would be seriously damaging
to its ability to detect errors.

But more to the point, your analogy to quote_literal is faulty anyway.
If you looked at that, what you'd find is that only quote_literal(text)
is a C function. The other one is a SQL wrapper around a coercion to
text followed by the C function. I rather imagine that the definition
as you have it would crash on, say, value_to_json(42).

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Implement value_to_json for single-datum conversion
Date: 2012-08-13 04:48:31
Message-ID: 21962.1344833311@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> writes:
> Whenever I try to work with the new json types I trip over the lack of a
> function to escape text to json. The attached patch against master
> provides one by exposing the existing datum_to_json function to SQL.
> ...
> This feels basic enough that I'm wondering if there's a reason it wasn't
> included from the start,

There actually was a previous thread about this:
http://archives.postgresql.org/pgsql-hackers/2012-05/msg00001.php
Note in particular Andrew's comment:

Second, RFC 4627 is absolutely clear: a valid JSON value can
only be an object or an array, so this thing about converting
arbitrary datum values to JSON is a fantasy. If anything, we
should adjust the JSON input routines to disallow anything else,
rather than start to output what is not valid JSON.

It's possible he's misread the spec, but I think we ought to tread very
carefully before adding "obvious" conversions we might regret later.

> Please consider this for the 9.2 branch as well as to HEAD, as IMO it's
> very important for basic usability of the json functionality. It applies
> to 9.2 fine and passes "make check". I know it's late in the game,

It's several months too late for feature additions to 9.2, especially
ones that would require an initdb to install. We can look at this for
9.3, but I'm still worried about the spec-compliance question.

regards, tom lane


From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Implement value_to_json for single-datum conversion
Date: 2012-08-13 05:33:01
Message-ID: 5028918D.9020208@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/13/2012 12:48 PM, Tom Lane wrote:
> There actually was a previous thread about this:
> http://archives.postgresql.org/pgsql-hackers/2012-05/msg00001.php
> Note in particular Andrew's comment:
>
> Second, RFC 4627 is absolutely clear: a valid JSON value can
> only be an object or an array, so this thing about converting
> arbitrary datum values to JSON is a fantasy. If anything, we
> should adjust the JSON input routines to disallow anything else,
> rather than start to output what is not valid JSON.

Thanks for taking a look. That makes sense. I guess these are similar
issues to those the XML type faces, where working with fragments is a
problem. The spec requires a single root element, but you don't always
have that when you're *building* XML, hence the addition of `IS DOCUMENT'.

I was hoping to find a low-impact way to allow SQL-level construction of
more complex JSON objects with correct text escaping, but it sounds like
this isn't the right route. I don't currently see any way to achieve the
kind of on-the-fly building you can do with XML's xmlelement(),
xmlconcat(), xmlforest() etc; nor equivalent to hstore's
hstore(text[],text[]), and I was hoping to improve that.

I have a half-finished JSON object constructor
json_object_from_arrays(text[], json[]) in the same style as
hstore(text[],text[]) . It won't work without the notion of json-typed
scalars, though, as the values of keys could then only be arrays or
objects, which isn't very useful. I can't usefully accept `anyarray' as
a values argument since arrays are of homogeneous type. Accepting text[]
would be a bug-magnet even if there was some kind of `text
json_escape(text)' function.

Would it be reasonable to add a separate json_element type, one that's
binary-equivalent to `json' but not constrained by the requirement to be
an array or object/dict? Or a `jsobject' ?

As for the value_to_json crashing, works for me:

postgres=# SELECT value_to_json(42);
value_to_json
---------------
42
(1 row)

... since datum_to_json is happy to accept anything you throw at it
using output function lookups, and value_to_json its self doesn't care
about the argument type at all. That was all in the regression tests.

Purely so I understand what the correct handling of the anyelement+text
overload would've been: In light of your comments on opr_sanity would
the right approach be to add a second C function like text_to_json that
only accepts 'text' to avoid confusing the sanity check? So the SQL
"value_to_json(anyelement)" would point to the C "value_to_json" and the
SQL "value_to_json(text)" would point to the C "text_to_json" ?

Anyway, clearly the value_to_json approach is out.

--
Craig Ringer


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Implement value_to_json for single-datum conversion
Date: 2012-08-13 05:55:04
Message-ID: 23141.1344837304@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Craig Ringer <ringerc(at)ringerc(dot)id(dot)au> writes:
> As for the value_to_json crashing, works for me:

> postgres=# SELECT value_to_json(42);
> value_to_json
> ---------------
> 42
> (1 row)

Oh, right, because there actually is support for anyelement in the
underlying C function. There is not in the quote_literal case.

> Purely so I understand what the correct handling of the anyelement+text
> overload would've been: In light of your comments on opr_sanity would
> the right approach be to add a second C function like text_to_json that
> only accepts 'text' to avoid confusing the sanity check?

Actually, given the above, what did you need value_to_json(text) for at
all? Wouldn't value_to_json(anyelement) have covered it?

But yeah, the general approach to suppressing complaints from that
opr_sanity test is to make more C entry points. The point of it,
in some sense, is that if you want to make an assumption that two
types are binary-equivalent then it's better to have that assumption
in C code than embedded in the pg_proc entries. The cases that we
let pass via the "expected" outputs are only ones where binary
equivalence seems pretty well assured, like text vs varchar.

regards, tom lane


From: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: Implement value_to_json for single-datum conversion
Date: 2012-08-13 06:16:50
Message-ID: 50289BD2.1050607@ringerc.id.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/13/2012 01:55 PM, Tom Lane wrote:
> Actually, given the above, what did you need value_to_json(text) for at
> all? Wouldn't value_to_json(anyelement) have covered it?

Usability. Without the version accepting text an explicit cast to text
is required to disambiguate a literal argument like
value_to_json('something') .

> But yeah, the general approach to suppressing complaints from that
> opr_sanity test is to make more C entry points. The point of it,
> in some sense, is that if you want to make an assumption that two
> types are binary-equivalent then it's better to have that assumption
> in C code than embedded in the pg_proc entries. The cases that we
> let pass via the "expected" outputs are only ones where binary
> equivalence seems pretty well assured, like text vs varchar.

Thanks. I appreciate the explanation, and sorry for the newbie error.

On the JSON stuff, I can see it's not as simple as adding a simple
escape function. For my needs during the 9.2 timeframe I'll bundle up an
extension with the functionality I need and deal with the need to port
to whatever 9.3 includes. Hopefully a "json_value" or "javascript_value"
or similar can be introduced for 9.3, given comments like:

http://archives.postgresql.org/pgsql-hackers/2012-05/msg00030.php
http://archives.postgresql.org/pgsql-hackers/2012-05/msg00065.php

.. expressing not only the need for json scalars, but the fact that
they're already commonplace in pretty much everything else.

Given this:

http://archives.postgresql.org/pgsql-hackers/2012-05/msg00040.php

it does seem that `json' should be a whole document not a fragment, but
IMO a way to work with individual JSON values is going to be *necessary*
to get the most out of the json support - and to stop people who use
JSON in the real world complaining that Pg's JSON support is broken
because it follows the standard not real-world practice.

Personally the lack of json scalars has prevented me from using JSON
support in two different places already, though it's proving very useful
in many others.

--
Craig Ringer