JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?

Lists: pgsql-hackers
From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 12:02:40
Message-ID: 1335873760.3106.66.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi hackers

After playing around with array_to_json() and row_to_json() functions a
bit it I have a question - why do we even have 2 variants *_to_json()

Collapsing array_to_json() and row_to_json() into just to_json()
----------------------------------------------------------------

As the functionality is not yet release maybe we could still rethink the
interface and have just one which can deal with all types :

to_json(any) returns json

the current two versions have to call the "any" variant internally
anyhow, to convert individual field values, so why not just expose the
full functionality as a single to_json() function

You almost can get the "any" functionality now by wrapping the type in
an array and afterwards strip outermost [] from the result.

Is there any good reason why not expose any_to_json() it directly ?

CREATE OR REPLACE FUNCTION to_json(obj anyelement)
RETURNS json AS $$
BEGIN
RETURN substring(array_to_json(array[obj])::text FROM '^\[(.*)\]$');
END;
$$ LANGUAGE plpgsql;

hannu=# select to_json('1'::text) ;
to_json
---------
"1"
(1 row)

hannu=# select to_json(test) from test limit 2;
to_json
-------------------------------------------------------------------
{"id":9,"data":"testdata","tstamp":"2012-05-01 09:44:50.175189"}
{"id":10,"data":"testdata","tstamp":"2012-05-01 09:45:50.260276"}
(2 rows)

Maybe we can remove the *_to_json(functions completely :)
---------------------------------------------------------

As a separate note, could we go even further and fold all this
functionality into an universal cast, so that attaching ::json to any
object will automagically work ?

Removing current limitation of PL/pgSQL
----------------------------------------

The above plpgsql definition of to_json() does not currently work with
anonymous records defined inline or arrays of such records, but only
because Pl/PgSQL functions currently don't accept these types

hannu=# select to_json(s) from (select 1::int as i, 't'::text as t union
select 2,'x')s;
ERROR: PL/pgSQL functions cannot accept type record
CONTEXT: compilation of PL/pgSQL function "to_json" near line 1

I think PL/pgSQL could now start accepting such records as the wrinkles
which made it hard to (recursively) get the needed info for anonymous
records were ironed out when developing the *_to_json() functions so all
of it just works.

IIRC some of this needed improving data available in core, and was not
just extra surgery done directly inside the *_to_json() functions.

hannu=# select row_to_json(s) from (select 1::int as i, 't'::text as t
union select 2,'x')s;
row_to_json
-----------------
{"i":1,"t":"t"}
{"i":2,"t":"x"}
(2 rows)

and even

hannu=# select row_to_json(s) from (select 1::int as i, (select z
from(select 2::int as j, 'x'::text as x)z) as t union select 2,null)s;
row_to_json
-----------------------------
{"i":1,"t":{"j":2,"x":"x"}}
{"i":2,"t":null}
(2 rows)

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 13:18:28
Message-ID: CAHyXU0w-gZ+bU72Y=bQFwpeogUKQ66vEZd4D2jPjfuswWhDa0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 1, 2012 at 7:02 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> Hi hackers
>
> After playing around with array_to_json() and row_to_json() functions a
> bit it I have a question - why do we even have 2 variants *_to_json()
>
> Collapsing array_to_json() and row_to_json() into just to_json()

I asked the same question. It was noted that the xml functions aren't
overloaded like that and that it's cleaner to introduce datum specific
behaviors if you don't overload.

I don't really agree with that or any of the naming styles that are in
the form inputtype_func() but I think most people are on the other
side of the argument.

merlin


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 15:02:27
Message-ID: 1335884547.3106.126.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-05-01 at 08:18 -0500, Merlin Moncure wrote:
> On Tue, May 1, 2012 at 7:02 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> > Hi hackers
> >
> > After playing around with array_to_json() and row_to_json() functions a
> > bit it I have a question - why do we even have 2 variants *_to_json()
> >
> > Collapsing array_to_json() and row_to_json() into just to_json()
>
> I asked the same question. It was noted that the xml functions aren't
> overloaded like that and that it's cleaner to introduce datum specific
> behaviors if you don't overload.

XML, being an "enterprise" thing is a large and complex beast.

Javascript - and by extension json - comes from the other end, being
lightweight and elegant at core.

Also, the the *_to_xml functions present still don't match what is there
for json, they don't even overlap !

Thus I see no reason why deciding on how to_json() functions (or cast to
json) should work needs to be based on how xml works.

We currently don't have any of the "database_to_json()" or
"querystring_to_json()" and we don't need these either.

I'd be much more happy by just having a working cast to json from all
types, not a myriad of functions for all possible types -
int4_to_json(), text_to_json(), bool_to_json(), record_to_json(),
array_to_json(), pg_user_to_json, etc. etc. etc.

What we currently have exposed to userspace are two arbitrarily chosen
"compex type" functions -

array_to_json() for converting arrays of ANY element type to json ,
inluding arrays consisting of records which may again contain arrays and
records.

and

row_to_json() for converting "rows" again potentially consisting of ANY
TYPE, including arrays of any type and any complex type. It handles even
the row() type :)

hannu=# select row_to_json(row(1,2,3));
row_to_json
------------------------
{"f1":1,"f2":2,"f3":3}
(1 row)

What we currently lack is direct conversion for simple types, though
they are easily achieved by converting to a single-element array and
then stripping outer [] from the result

It would be really nice to also have the casts from json to any type,
including records though.

And perhaps one functions for converting schema elements to some json
representation, so that a json_dump could easily be constructed :)

We really do not need footguns similar to database_to_xml() or
schema_to_xml() which just to consume all memory in the server on any
real database.

> I don't really agree with that or any of the naming styles that are in
> the form inputtype_func() but I think most people are on the other
> side of the argument.

I think that most people have not given this any thought yet, so they
simply lack any reasoned opinion ;)

> merlin

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 15:49:28
Message-ID: CAARyMpA6sPU2pyEjeybU+5KUCi+EnWuSTHqB5uuCZPR63wO7Hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> Hi hackers
>
> After playing around with array_to_json() and row_to_json() functions a
> bit it I have a question - why do we even have 2 variants *_to_json()

Here's the discussion where that decision was made:

http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php

To quote:

>>> why not call all these functions 'to_json' and overload them?
>>
>> I don't honestly feel that advances clarity much. And we might want to overload each at some stage with options that are specific to the datum type. We have various foo_to_xml() functions now.
>
> -1
>
> older proposal is more consistent with xml functions

The most compelling argument I see here is the one about options
specific to the datum type.

Two other reasons I can think of:

* If someone tries to google for how to convert an array to JSON,
having a function named 'array_to_json' will make that easier.

* If the JSON type does not yet support, say, converting from a
number, it will be apparent from the names and types of the functions,
rather than being a hidden surprise. On the other hand, array_to_json
and composite_to_json already convert ANY values to JSON, so this
doesn't matter, anyway.

On Tue, May 1, 2012 at 11:02 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> What we currently lack is direct conversion for simple types, though
> they are easily achieved by converting to a single-element array and
> then stripping outer [] from the result

I agree that this function ought to be exposed. Note that such a
function (perhaps called datum_to_json) is indeed the same as the
proposed to_json function, which tries to convert a value of any type
to JSON.

> It would be really nice to also have the casts from json to any type,
> including records though.

What the casts currently do (primarily) is convert between the TEXT
and JSON types. So if you have JSON-encoded TEXT, use a cast to
convert it to the JSON type (this will perform validation, ensuring
that no invalid JSON gets in). Any escape/unescape operations need to
be explicit.

-Joey


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)2ndquadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 16:05:06
Message-ID: CAHyXU0z1e4HfEy2GPByT5Dq-rKpEBTeLsaiQt7=97qfZXsoUJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 1, 2012 at 10:49 AM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
>> Hi hackers
>>
>> After playing around with array_to_json() and row_to_json() functions a
>> bit it I have a question - why do we even have 2 variants *_to_json()
>
> Here's the discussion where that decision was made:
>
> http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php
>
> To quote:
>
>>>> why not call all these functions 'to_json' and overload them?
>>>
>>> I don't honestly feel that advances clarity much. And we might want to overload each at some stage with options that are specific to the datum type. We have various foo_to_xml() functions now.
>>
>> -1
>>
>> older proposal is more consistent with xml functions
>
> The most compelling argument I see here is the one about options
> specific to the datum type.

I don't find that to be particularly compelling at all. to_timestamp
for example supports multiple argument versions depending on the input
type.

>  * If the JSON type does not yet support, say, converting from a
> number, it will be apparent from the names and types of the functions,
> rather than being a hidden surprise.  On the other hand, array_to_json
> and composite_to_json already convert ANY values to JSON, so this
> doesn't matter, anyway.

I don't see how not having to_json(type) is any less surprising than
type_to_json().

To add:
Are we going to have json_length()? Why shouldn't length operate
directly on the json type since it has a length? Or are we going to
force an implicit cast to text?

An elementary point of generic programming through SQL is that you are
supposed to keep *what you are trying to do* decoupled from *what
you're doing it on*. It allows for very natural and terse
programming. The array, xml, and now the json apis essentially
violate this principle. The array api I find particularly galling
since you end up having to retype 'array' N times in a single
expression.

merlin


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 16:12:11
Message-ID: 1335888731.3106.137.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-05-01 at 11:49 -0400, Joey Adams wrote:
> On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> > Hi hackers
> >
> > After playing around with array_to_json() and row_to_json() functions a
> > bit it I have a question - why do we even have 2 variants *_to_json()
>
> Here's the discussion where that decision was made:
>
> http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php

Thanks, will read it!

> To quote:
>
> >>> why not call all these functions 'to_json' and overload them?
> >>
> >> I don't honestly feel that advances clarity much. And we might want
> to overload each at some stage with options that are specific to the
> datum type. We have various foo_to_xml() functions now.
> >
> > -1
> >
> > older proposal is more consistent with xml functions
>
> The most compelling argument I see here is the one about options
> specific to the datum type.
>
> Two other reasons I can think of:
>
> * If someone tries to google for how to convert an array to JSON,
> having a function named 'array_to_json' will make that easier.

Well, if you want to know how to convert an integer to string, you don't
use integer_to_text() function. you just use a working cast.

and here it is an outright lie:

hannu=# select 1::json;
ERROR: cannot cast type integer to json
LINE 1: select 1::json;

the error should be "won't cast type integer to json" :)

It very well _can_ convert it, as it does it without a problem when such
integer is inside an array or a record type.

> * If the JSON type does not yet support, say, converting from a
> number, it will be apparent from the names and types of the functions,
> rather than being a hidden surprise. On the other hand, array_to_json
> and composite_to_json already convert ANY values to JSON, so this
> doesn't matter, anyway.

By this logic all non-working casts are "hidden surprises"

> On Tue, May 1, 2012 at 11:02 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> > What we currently lack is direct conversion for simple types, though
> > they are easily achieved by converting to a single-element array and
> > then stripping outer [] from the result
>
> I agree that this function ought to be exposed. Note that such a
> function (perhaps called datum_to_json) is indeed the same as the
> proposed to_json function, which tries to convert a value of any type
> to JSON.

Hmm, I just have found an answer to my question on how to add to_json()
capability to extension types .

If whe had to_json as a cast, it would probably be straightforward for
extensions like hstore to provide their own to_json casts - especially
now that the json type is in core - and we could get a working hstore
--> json conversion by just running the

CREATE EXTENSION hstore;

command.

> > It would be really nice to also have the casts from json to any type,
> > including records though.
>
> What the casts currently do (primarily) is convert between the TEXT
> and JSON types. So if you have JSON-encoded TEXT, use a cast to
> convert it to the JSON type (this will perform validation, ensuring
> that no invalid JSON gets in). Any escape/unescape operations need to
> be explicit.
>
> -Joey

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 16:22:25
Message-ID: CAD5tBc+qHrNKth+qT-hQmT5RTTaABfjabd-r4acLy9HdwBm7hg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 1, 2012 at 9:05 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> On Tue, May 1, 2012 at 10:49 AM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
> wrote:
> > On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com>
> wrote:
> >> Hi hackers
> >>
> >> After playing around with array_to_json() and row_to_json() functions a
> >> bit it I have a question - why do we even have 2 variants *_to_json()
> >
> > Here's the discussion where that decision was made:
> >
> > http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php
> >
> > To quote:
> >
> >>>> why not call all these functions 'to_json' and overload them?
> >>>
> >>> I don't honestly feel that advances clarity much. And we might want to
> overload each at some stage with options that are specific to the datum
> type. We have various foo_to_xml() functions now.
> >>
> >> -1
> >>
> >> older proposal is more consistent with xml functions
> >
> > The most compelling argument I see here is the one about options
> > specific to the datum type.
>
> I don't find that to be particularly compelling at all. to_timestamp
> for example supports multiple argument versions depending on the input
> type.
>
> > * If the JSON type does not yet support, say, converting from a
> > number, it will be apparent from the names and types of the functions,
> > rather than being a hidden surprise. On the other hand, array_to_json
> > and composite_to_json already convert ANY values to JSON, so this
> > doesn't matter, anyway.
>

I am away from base on a consulting assignment all this week, so my
connectivity and time are severely limited, and I don't have time to
respond in depth.

Let me just point out two things. First, we are approaching a beta release.
The time for changing this is long since gone, IMNSHO.

Second, RFC 4627 is absolutely clear: a valid JSON value can only be an
object or an array, so this thing about converting arbitrary datum values
to JSON is a fantasy. If anything, we should adjust the JSON input routines
to disallow anything else, rather than start to output what is not valid
JSON.

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 16:33:10
Message-ID: CAHyXU0zzxEV93CvMUuSP7Tn+j9GEqdYj=XjxcU2YaZ+WO-mf5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 1, 2012 at 11:22 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On Tue, May 1, 2012 at 9:05 AM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
>>
>> On Tue, May 1, 2012 at 10:49 AM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
>> wrote:
>> > On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com>
>> > wrote:
>> >> Hi hackers
>> >>
>> >> After playing around with array_to_json() and row_to_json() functions a
>> >> bit it I have a question - why do we even have 2 variants *_to_json()
>> >
>> > Here's the discussion where that decision was made:
>> >
>> > http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php
>> >
>> > To quote:
>> >
>> >>>> why not call all these functions 'to_json' and overload them?
>> >>>
>> >>> I don't honestly feel that advances clarity much. And we might want to
>> >>> overload each at some stage with options that are specific to the datum
>> >>> type. We have various foo_to_xml() functions now.
>> >>
>> >> -1
>> >>
>> >> older proposal is more consistent with xml functions
>> >
>> > The most compelling argument I see here is the one about options
>> > specific to the datum type.
>>
>> I don't find that to be particularly compelling at all.  to_timestamp
>> for example supports multiple argument versions depending on the input
>> type.
>>
>> >  * If the JSON type does not yet support, say, converting from a
>> > number, it will be apparent from the names and types of the functions,
>> > rather than being a hidden surprise.  On the other hand, array_to_json
>> > and composite_to_json already convert ANY values to JSON, so this
>> > doesn't matter, anyway.
>
>
>
> I am away from base on a consulting assignment all this week, so my
> connectivity and time are severely limited, and I don't have time to respond
> in depth.
>
> Let me just point out two things. First, we are approaching a beta release.
> The time for changing this is long since gone, IMNSHO.

sure. pedantic philosophical arguments aside, I'm already using the
api heavily and would prefer not to see it changed :-).

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 16:54:14
Message-ID: 15985.1335891254@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Let me just point out two things. First, we are approaching a beta release.
> The time for changing this is long since gone, IMNSHO.

This is our last chance to get it right, so that argument doesn't seem
to me to carry a lot of weight ...

> Second, RFC 4627 is absolutely clear: a valid JSON value can only be an
> object or an array, so this thing about converting arbitrary datum values
> to JSON is a fantasy. If anything, we should adjust the JSON input routines
> to disallow anything else, rather than start to output what is not valid
> JSON.

... but this one does.

regards, tom lane


From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 16:56:20
Message-ID: CAARyMpA9E6Mt4SYQfTzqYOZ_7=qNu+SR5rYRjuAVaXp-wR6uWg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 1, 2012 at 12:22 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Second, RFC 4627 is absolutely clear: a valid JSON value can only be an
> object or an array, so this thing about converting arbitrary datum values to
> JSON is a fantasy. If anything, we should adjust the JSON input routines to
> disallow anything else, rather than start to output what is not valid JSON.

No, the RFC says (emphasis mine):

A JSON *text* is a serialized object or array.

If we let the JSON type correspond to a *value* instead, this
restriction does not apply, and the JSON type has a useful recursive
definition.

For example, this function would not be possible if we applied the
"object or array" restriction:

unnest(json) returns setof json

Note that a similar distinction appears with the XML type: "document"
versus "content".

-Joey


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 18:11:52
Message-ID: CAD5tBcL2AqHjOu5ODPKjfhG+3_BHCyeTDGBov9xFHa97eEZ7sg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 1, 2012 at 9:56 AM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>wrote:

> On Tue, May 1, 2012 at 12:22 PM, Andrew Dunstan <andrew(at)dunslane(dot)net>
> wrote:
> > Second, RFC 4627 is absolutely clear: a valid JSON value can only be an
> > object or an array, so this thing about converting arbitrary datum
> values to
> > JSON is a fantasy. If anything, we should adjust the JSON input routines
> to
> > disallow anything else, rather than start to output what is not valid
> JSON.
>
> No, the RFC says (emphasis mine):
>
> A JSON *text* is a serialized object or array.
>
> If we let the JSON type correspond to a *value* instead, this
> restriction does not apply, and the JSON type has a useful recursive
> definition.
>
> For example, this function would not be possible if we applied the
> "object or array" restriction:
>
> unnest(json) returns setof json
>
> Note that a similar distinction appears with the XML type: "document"
> versus "content".
>
>
>
I think you're playing with words. But in any case, the RFC says this
regarding generators:

5. Generators

A JSON generator produces JSON text. The resulting text MUST
strictly conform to the JSON grammar.

Our functions do seem to be JSON generators. So even if we accept
things that aren't JSON texts in our parser (possibly permitted by
section 4 of the RFC) we should not be generating them.

cheers

andrew


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 20:16:01
Message-ID: 1335903361.3106.247.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-05-01 at 12:54 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > Let me just point out two things. First, we are approaching a beta release.
> > The time for changing this is long since gone, IMNSHO.
>
> This is our last chance to get it right, so that argument doesn't seem
> to me to carry a lot of weight ...
>
> > Second, RFC 4627 is absolutely clear: a valid JSON value can only be an
> > object or an array,

No, according to RFC, a valid "JSON value" can be:

an object, an array, a number, a string, or one of false null true

>From RFC:
-------------------------
1. Introduction

JavaScript Object Notation (JSON) is a text format for the
serialization of structured data. It is derived from the object
literals of JavaScript, as defined in the ECMAScript Programming
Language Standard, Third Edition [ECMA].

JSON can represent four primitive types (strings, numbers, booleans,
and null) and two structured types (objects and arrays).
...

2.1. Values

A JSON value MUST be an object, array, number, or string, or one of
the following three literal names:

false null true

-------------------------

By having our JSON type mean a "JSON value" instead of "JSON
text" (which indeed is required to be array or object) we could make it
easy for all extension types to provide casts to "JSON value" and thus
automatically plug them into postgreSQL's built-in JSON support.

I would very much like this the *_to_array() functions first try a cast
to json when converting values, so that for example after the following
cast it would do the right thing for hstore .

CREATE FUNCTION hstore_to_json(IN hvalue hstore, OUT jvalue json) AS $$
return '{%s}' % hvalue.replace('"=>"','":"')
$$ LANGUAGE plpythonu;

CREATE CAST (hstore AS json)
WITH FUNCTION hstore_to_json(hstore)
AS IMPLICIT
;

hannu=# select *, datadict::json from test_hstore;
id | datadict | datadict
----+---------------------------------+---------------------------------
1 | "baz"=>"whatever", "foo"=>"bar" | {"baz":"whatever", "foo":"bar"}
2 | "bar"=>"the same", "foo"=>"bar" | {"bar":"the same", "foo":"bar"}
(2 rows)

Currently it seems to be hardwired to do datum --> text conversions

hannu=# select row_to_json(test_hstore) from test_hstore;
row_to_json
---------------------------------------------------------------
{"id":1,"datadict":"\"baz\"=>\"whatever\", \"foo\"=>\"bar\""}
{"id":2,"datadict":"\"bar\"=>\"the same\", \"foo\"=>\"bar\""}
(2 rows)

I'd like it to try datum --> json first and yield

hannu=# select row_to_json(test_hstore) from test_hstore;
row_to_json
---------------------------------------------------------------
{"id":1,"datadict":{"baz":"whatever", "foo":"bar"}}
{"id":2,"datadict":{"bar":"the same", "foo":"bar"}}
(2 rows)

This exact case could be made to work even with "JSON text" meaning ob
json type, but some other types may not be so lucky.

FOr example imagine a tri-value booean with textual values "yes", "no",
and "don't know" . Logical mapping to json would be true, false, null,
but we can't easily provide a triboolean --> json cast for this if we
require json value to be "JSON text" and don't accept "JSON values"

> > so this thing about converting arbitrary datum values
> > to JSON is a fantasy.

It should be possible to cast them to "JSON value", but not always "JSON
text" which indeed has to be array or object .

> > If anything, we should adjust the JSON input routines
> > to disallow anything else, rather than start to output what is not valid
> > JSON.

Nah, I'd like us to accept what other JSON parsers usually accept,
especially the original one described in
http://www.json.org/fatfree.html which cited one way to parse json to be

responseData = eval('(' + responseText + ')');

:)

But then I also like their statement when comparing JSON to XML :

JSON has no validator. Being well-formed and valid is not the same as
being correct and relevant. Ultimately, every application is responsible
for validating its inputs. This cannot be delegated.

> ... but this one does.

It does, _if_ we accept that json type is for "JSON text" and not "JSON
value". in which case we might need also a json_value type for
extensible casting to and from json.

> regards, tom lane

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 20:54:56
Message-ID: 1335905696.3106.272.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-05-01 at 11:11 -0700, Andrew Dunstan wrote:
> On Tue, May 1, 2012 at 9:56 AM, Joey Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
...
> No, the RFC says (emphasis mine):
>
> A JSON *text* is a serialized object or array.
>
> If we let the JSON type correspond to a *value* instead, this
> restriction does not apply, and the JSON type has a useful
> recursive
> definition.
>
> For example, this function would not be possible if we applied
> the
> "object or array" restriction:
>
> unnest(json) returns setof json
>
> Note that a similar distinction appears with the XML type:
> "document"
> versus "content".
>
>
> I think you're playing with words. But in any case, the RFC says this
> regarding generators:
>
> 5. Generators
>
> A JSON generator produces JSON text. The resulting text MUST
> strictly conform to the JSON grammar.

I know it is a weak argument, but at least python, ruby and Javascript
in both Firefox and Chrome do "generate" JSON values, so our users might
kind of expect us to do the same :

Python
---------
hannu(at)hvost:~$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> json.dumps(1)
'1'

Ruby:
-------
hannu(at)hvost:~$ irb
irb(main):001:0> require 'json'
=> true
irb(main):002:0> JSON.generate(1)
=> "1"

Mozilla Javascript
------------------
>>> JSON.stringify(1) "1"

Chrome Javascript
------------------
> JSON.stringify(1)
"1"

> Our functions do seem to be JSON generators. So even if we accept
> things that aren't JSON texts in our parser (possibly permitted
> by section 4 of the RFC) we should not be generating them.

we could have a function

json_generator(json) returns text

for generating pure JSON texts ;)

the argument for interpreting out json type as "JSON value" instead of
"JSON text" would be much bigger flexibility regarding extension types
support for json (and not only extension types, but also the types
explicitly unsupported by json, like Date) via defining casts to and
from json.

>From http://en.wikipedia.org/wiki/JSON

-------------------------------------------------------------------------
Unsupported native data types
-----------------------------
JavaScript syntax defines several native data types not included in the
JSON standard:[7] Date, Error, Math, Regular Expression, and Function.
These JavaScript data types must be represented as some other data
format, with the programs on both ends agreeing on how to convert
between types. As of 2011, there are some de facto standards for e.g.
converting between Date and String, but none universally
recognized.[8][9] Other languages may have a different set of native
types that must be serialized carefully to deal with this type of
conversion.

-------------------------------------------------------------------------

If we allowed user-defined casts for things like Date the out users
could decide, which de facto standard to support on each specific case .

But if we did interpret out json type strictly as "JSON text", not "JSON
value", this would need another type for json_value.

And it is entirely possible that somebody does want to do what merlin
described recently, that is get a rowset of "json" values from the
client and wrap them in '[' and ']' on way out, it wuld be shame to
restrict his json array elements to be just objects and arrays and not
the other legal json values.

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 21:29:12
Message-ID: 1335907752.3106.298.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-05-01 at 09:22 -0700, Andrew Dunstan wrote:
>
>
> On Tue, May 1, 2012 at 9:05 AM, Merlin Moncure <mmoncure(at)gmail(dot)com>
> wrote:
> On Tue, May 1, 2012 at 10:49 AM, Joey Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> > On Tue, May 1, 2012 at 8:02 AM, Hannu Krosing
> <hannu(at)2ndquadrant(dot)com> wrote:
> >> Hi hackers
> >>
> >> After playing around with array_to_json() and row_to_json()
> functions a
> >> bit it I have a question - why do we even have 2 variants
> *_to_json()
> >
> > Here's the discussion where that decision was made:
> >
> >
> http://archives.postgresql.org/pgsql-hackers/2012-01/msg01339.php
> >
> > To quote:
> >
> >>>> why not call all these functions 'to_json' and overload
> them?
> >>>
> >>> I don't honestly feel that advances clarity much. And we
> might want to overload each at some stage with options that
> are specific to the datum type. We have various foo_to_xml()
> functions now.
> >>
> >> -1
> >>
> >> older proposal is more consistent with xml functions
> >
> > The most compelling argument I see here is the one about
> options
> > specific to the datum type.
>
>
> I don't find that to be particularly compelling at all.
> to_timestamp
> for example supports multiple argument versions depending on
> the input
> type.
>
> > * If the JSON type does not yet support, say, converting
> from a
> > number, it will be apparent from the names and types of the
> functions,
> > rather than being a hidden surprise. On the other hand,
> array_to_json
> > and composite_to_json already convert ANY values to JSON, so
> this
> > doesn't matter, anyway.
>
>
>
> I am away from base on a consulting assignment all this week, so my
> connectivity and time are severely limited, and I don't have time to
> respond in depth.
>
> Let me just point out two things. First, we are approaching a beta
> release. The time for changing this is long since gone, IMNSHO.

First, let me start with stating that I am really happy (and a little
amazed and envious ;) ) with what the current to_json functions are
capable of.

It is already way better than what current query_to_xml could do (at
least int 9.1, may have improved since).

hannu=# select row_to_json(z) from (select 1::int as a, (select s from
(select 2::int as x, 2::text as b)s))z;
row_to_json
-----------------------------
{"a":1,"s":{"x":2,"b":"2"}}
(1 row)

hannu=# select query_to_xml('select 1::int as a, (select s from (select
2::int as x, 2::text as b)s)',true,true,'');
query_to_xml
-------------------------------------------------------------
<row xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">+
+
<a>1</a> +
<_x003F_column_x003F_>(2,2)</_x003F_column_x003F_> +
</row> +
+
(1 row)

The reason I am whining now is that with minor adjustments in
implementation it could all be made much more powerful (try cast
to ::json for values before cast to ::text) and much more elegant thanks
to PostgreSQL's built in casting.

If we allowed json to hold any "JSON value" and tried ::json when
generating json for compound types than we would be able to claim that
PostgreSQL supports JSON everywhere, defaulting to representing
officially unsupported types as strings, but allowing users to convert
these to their preferred conventions.

I'd also prefer to have default conversions already included for some of
our sexier types, like intervals (just a two element array) and hstore
(an object) etc.

Suddenly we would be the best match database for Web development and all
things Ajax and also have a widely used built in and adjustable
interchange format to outer world.

> Second, RFC 4627 is absolutely clear: a valid JSON value can only be
> an object or an array, so this thing about converting arbitrary datum
> values to JSON is a fantasy.

Probably a typo on your part - valid "JSON _text_" is object or array,
valid "JSON value" can also be number, text, true, false and null

What I am arguing for is interpreting our json type as representing a
"JSON value" not "JSON text", this would enable users to adjust and
extend the generation of json values via defining casts for their
specific types - most notably Date* types but also things like hstore,
which has a natural JSON representation as "object" (a list of key:value
pairs for non-js users, a.k.a. a dictionary, hash, etc.)

> If anything, we should adjust the JSON input routines to disallow
> anything else, rather than start to output what is not valid JSON.

I tested python, ruby and javascript in firefox and chrome, all their
JSON generators generate 1 for standalone integer 1 and "a" for
standalone string a , and none refused to convert either to JSON.

--
-------
Hannu Krosing
PostgreSQL Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Hannu Krosing'" <hannu(at)2ndQuadrant(dot)com>, "'Andrew Dunstan'" <andrew(at)dunslane(dot)net>
Cc: "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>, "'Joey Adams'" <joeyadams3(dot)14159(at)gmail(dot)com>, "'PavelStehule'" <pavel(dot)stehule(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 22:35:02
Message-ID: 00b101cd27ea$a75cf4d0$f616de70$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Hannu Krosing
> Sent: Tuesday, May 01, 2012 5:29 PM
>
> The reason I am whining now is that with minor adjustments in
> implementation it could all be made much more powerful (try cast to ::json
> for values before cast to ::text) and much more elegant thanks to
> PostgreSQL's built in casting.
>
> If we allowed json to hold any "JSON value" and tried ::json when generating
> json for compound types than we would be able to claim that PostgreSQL
> supports JSON everywhere, defaulting to representing officially unsupported
> types as strings, but allowing users to convert these to their preferred
> conventions.

I get that a JSON Text is always also a JSON Value but the reverse is not true. Thus, if we define JSON to be JSON Value we cannot guarantee that the encoded value is a possible JSON Text - the most important property for purposes of data interchange.

>
> I'd also prefer to have default conversions already included for some of our
> sexier types, like intervals (just a two element array) and hstore (an object)
> etc.

Interval is not simply 2 values but also denotes whether the particular value is inclusive or exclusive; you would have to use an object unless you transmit in a text format and let the target perform the necessary interpretation of the string.

>
> Suddenly we would be the best match database for Web development and
> all things Ajax and also have a widely used built in and adjustable interchange
> format to outer world.
>
> > Second, RFC 4627 is absolutely clear: a valid JSON value can only be
> > an object or an array, so this thing about converting arbitrary datum
> > values to JSON is a fantasy.
>
> Probably a typo on your part - valid "JSON _text_" is object or array, valid
> "JSON value" can also be number, text, true, false and null
>
> What I am arguing for is interpreting our json type as representing a "JSON
> value" not "JSON text", this would enable users to adjust and extend the
> generation of json values via defining casts for their specific types - most
> notably Date* types but also things like hstore, which has a natural JSON
> representation as "object" (a list of key:value pairs for non-js users, a.k.a. a
> dictionary, hash, etc.)

Aside from the fact it is likely too late to change the interpretation I would argue against doing so in any case.

Currently, the idea is to get your result all lined up and ready to go and then ship it off to the caller as valid JSON so that the caller does not have to do so itself. Answering the question "what would this value look like if it was part of a json output?" is good; however, production use is likely to mostly care about the entire json interchange construct (i.e., JSON Text)

So: json -> json_text;

A JSON Value always has a textual representation but if we were to have an actual type it would make sense to encode it such that (strings, objects and arrays) are delimited while (numbers, false, true, and null) are not.

Type Name: json_value

Output Representations (all output surrounded by double-quotes since all are string-like) -
String: "'VALUE'" (single-quote delimiter)
Object: "{...}"
Array: "[]"
Number: "0.00"
Other: "false", "true", "null"

JSON is fundamentally an interchange format (especially from a database's perspective). JSON Values only really have meaning if they are attached explicitly to a JSON Text structure, if you wanted to store one independently you should convert it into a native representation first. The few exceptions to this would be sufficiently handled via plain text with meta-data indicating that the stored value is structured in directly JSON compatible syntax. In short, the default context for JSON in PostgreSQL should JSON Text (not JSON Value) and thus the unadorned "json" should reflect this default (which it does).

>
> > If anything, we should adjust the JSON input routines to disallow
> > anything else, rather than start to output what is not valid JSON.
>
> I tested python, ruby and javascript in firefox and chrome, all their JSON
> generators generate 1 for standalone integer 1 and "a" for standalone string
> a , and none refused to convert either to JSON.
>

^Assume that we keep the meaning of json to be JSON Text; what would you suggest occurs if someone attempts a datum -> json cast? Given that we are working in a strongly-typed environment the meaning of JSON cannot be changed and so either the cast has to output valid JSON Text or it has to fail. My personal take it is have it fail since any arbitrary decision to cast to JSON Text is going to make someone unhappy and supposedly they can modify their query so that the result generates whatever format they desire.

I haven't followed the JSON development in 9.2 too closely but exposing whatever conversion mechanism is currently used to generate JSON makes sense from a ease-of-development standpoint. But even then, during development passing around true JSON Text is not a big deal and then no "JSON_Value" API has to be exposed; thus it can be freely refined, along with related behavior - e.g., append_to_json(value json_value, location text), in 9.3

So, in short, all of your ideas are still valid but use "json_value" for the data type. But, even them my guess is that you would rarely use json_value as a column type whereas you would frequently use json (JSON Text) for one. json_value would be a support type to facilitate working with json in a procedural-like way.

David J.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-01 23:11:02
Message-ID: 23251.1335913862@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On Tue, May 1, 2012 at 9:56 AM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>wrote:
>> No, the RFC says (emphasis mine):
>>
>> A JSON *text* is a serialized object or array.
>>
>> If we let the JSON type correspond to a *value* instead, this
>> restriction does not apply, and the JSON type has a useful recursive
>> definition.

> I think you're playing with words. But in any case, the RFC says this
> regarding generators:
> 5. Generators
> A JSON generator produces JSON text. The resulting text MUST
> strictly conform to the JSON grammar.

I read over the RFC, and I think the only reason why they restricted
JSON texts to represent just a subset of JSON values is this cute
little hack in section 3 (Encoding):

Since the first two characters of a JSON text will always be ASCII
characters [RFC0020], it is possible to determine whether an octet
stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
at the pattern of nulls in the first four octets.
00 00 00 xx UTF-32BE
00 xx 00 xx UTF-16BE
xx 00 00 00 UTF-32LE
xx 00 xx 00 UTF-16LE
xx xx xx xx UTF-8

They need a guaranteed 2 ASCII characters to make that work, and
they won't necessarily get that many with a bare string literal.

Since for our purposes there is not, and never will be, any need to
figure out whether a JSON input string is encoded in UTF16 or UTF32,
I find myself agreeing with the camp that says we might as well consider
that our JSON type corresponds to JSON values not JSON texts. I also
notice that json_in() seems to believe that already.

However, that doesn't mean I'm sold on the idea of getting rid of
array_to_json and row_to_json in favor of a universal "to_json()"
function. In particular, both of those have optional "pretty_bool"
arguments that don't fit nicely at all in a generic conversion
function. The meaning of that flag is very closely tied to the
input being an array or record respectively.

I'm inclined to leave these functions as they are, and consider
adding a universal "to_json(anyelement)" (with no options) later.
Because it would not have options, it would not be meant to cover
cases where there's value in formatting or conversion options;
so it wouldn't render the existing functions entirely obsolete,
nor would it mean there would be no need for other specialized
conversion functions.

regards, tom lane


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: 'Andrew Dunstan' <andrew(at)dunslane(dot)net>, 'Merlin Moncure' <mmoncure(at)gmail(dot)com>, 'Joey Adams' <joeyadams3(dot)14159(at)gmail(dot)com>, 'PavelStehule' <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-02 00:41:14
Message-ID: 1335919274.3106.343.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-05-01 at 18:35 -0400, David Johnston wrote:
> > -----Original Message-----
> > From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> > owner(at)postgresql(dot)org] On Behalf Of Hannu Krosing
> > Sent: Tuesday, May 01, 2012 5:29 PM
> >
> > The reason I am whining now is that with minor adjustments in
> > implementation it could all be made much more powerful (try cast to ::json
> > for values before cast to ::text) and much more elegant thanks to
> > PostgreSQL's built in casting.
> >
> > If we allowed json to hold any "JSON value" and tried ::json when generating
> > json for compound types than we would be able to claim that PostgreSQL
> > supports JSON everywhere, defaulting to representing officially unsupported
> > types as strings, but allowing users to convert these to their preferred
> > conventions.
>
> I get that a JSON Text is always also a JSON Value but the reverse is not true.
> Thus, if we define JSON to be JSON Value we cannot guarantee that the encoded
> value is a possible JSON Text - the most important property for purposes of
> data interchange.

Nope, the most important property for purposes of data interchange is
that we produce something that the client expects and can understand
without too much extra work on client side.

The way to "guarantee" JSON Text is to encode objects that produce it.

I see nothing wrong with returning either a complex JSON object of
simply null if the object could not be found.

> > I'd also prefer to have default conversions already included for some of our
> > sexier types, like intervals (just a two element array) and hstore (an object)
> > etc.
>
> Interval is not simply 2 values but also denotes whether the particular value
> is inclusive or exclusive; you would have to use an object unless you transmit
> in a text format and let the target perform the necessary interpretation of
> the string.

if you need that info to be passed to _your_ caller you just define a
ned cast for youtr interval-to-json which returns "object" notation.

It was meant as a sample of what could be included by having generic
json values and using json casts.

> > Suddenly we would be the best match database for Web development and
> > all things Ajax and also have a widely used built in and adjustable interchange
> > format to outer world.
> >
> > > Second, RFC 4627 is absolutely clear: a valid JSON value can only be
> > > an object or an array, so this thing about converting arbitrary datum
> > > values to JSON is a fantasy.
> >
> > Probably a typo on your part - valid "JSON _text_" is object or array, valid
> > "JSON value" can also be number, text, true, false and null
> >
> > What I am arguing for is interpreting our json type as representing a "JSON
> > value" not "JSON text", this would enable users to adjust and extend the
> > generation of json values via defining casts for their specific types - most
> > notably Date* types but also things like hstore, which has a natural JSON
> > representation as "object" (a list of key:value pairs for non-js users, a.k.a. a
> > dictionary, hash, etc.)
>
> Aside from the fact it is likely too late to change the interpretation I would argue against doing so in any case.
>
> Currently, the idea is to get your result all lined up and ready to go and
> then ship it off to the caller as valid JSON so that the caller does not
> have to do so itself.

"Valid JSON" is only a small part of the equation, as I quoted before
from "JSON: The Fat-Free Alternative to XML" at
http://www.json.org/fatfree.html

"JSON has no validator. Being well-formed and valid is not the same as
being correct and relevant. Ultimately, every application is responsible
for validating its inputs."

If we produce correct JSON text for things that are converible to JSON
text then we should be free to produce JSON values for simple value,
like everybody else (that is Javascript , python, ruby, ...)

I don't think it is postgreSQL's business to start educating people
about "correct" way to do JSON serialisation when everybody else does it
the generic way.

> Answering the question "what would this value look like if it was part of
> a json output?" is good; however, production use is likely to mostly care
> about the entire json interchange construct (i.e., JSON Text)

the "what would it look like" part is important for values that are not
covered by standard and are thus encoded as text. These need to follow
conventions outside the JSON spec proper, and thus may need to be
adjusted by the developer.

Doing it via ::json casts would be the cleanest and simplest way to deal
with it.

> So: json -> json_text;
>
> A JSON Value always has a textual representation but if we were to have an
> actual type it would make sense to encode it such that (strings, objects
> and arrays) are delimited while (numbers, false, true, and null) are not.

And so it is, what are you trying to say here ?

> Type Name: json_value
>
> Output Representations (all output surrounded by double-quotes since all are string-like) -
> String: "'VALUE'" (single-quote delimiter)

strings are surrounded by "" not ''

> Object: "{...}"
> Array: "[]"
> Number: "0.00"

"0.00" is a sting according to JSON spec

also, objects and arrays don't have surrounding ""

> Other: "false", "true", "null"

that is not what the standard says - "false" is not the same as false
without quotes - the first is string 'false', the second one is boolean
untrue

> JSON is fundamentally an interchange format (especially from a database's perspective).
> JSON Values only really have meaning if they are attached explicitly to a JSON Text
> structure, if you wanted to store one independently you should convert it into a
> native representation first. The few exceptions to this would be sufficiently
> handled via plain text with meta-data indicating that the stored value is structured
> in directly JSON compatible syntax.

This is something that each developer could define for his specific app
by writing an appropriate ::json cast function for the datatype, if such
casts were honoured while doing the eoutput conversion.

There is no one standard for mosts "outside jsons scope" datatypes,

> In short, the default context for JSON in
> PostgreSQL should JSON Text (not JSON Value) and thus the unadorned "json" should
> reflect this default (which it does).

I don't think postgreSQL has hierarchical types, so that json_text would
also be json_value, but not the other way.

I'm perfectly happy with us being flexible enough to allow people to
also generate non-standard JSON - as some of them most likely are now -
if it makes the whole system cleaner and easier to use.

I don't see why PostgreSQL's JSON should be more restrictive than most
others.

> > > If anything, we should adjust the JSON input routines to disallow
> > > anything else, rather than start to output what is not valid JSON.
> >
> > I tested python, ruby and javascript in firefox and chrome, all their JSON
> > generators generate 1 for standalone integer 1 and "a" for standalone string
> > a , and none refused to convert either to JSON.
> >
>
> Assume that we keep the meaning of json to be JSON Text; what would you
> suggest occurs if someone attempts a datum -> json cast?

I would not assume such thing ;)

> Given that we are working in a strongly-typed environment the meaning of
> JSON cannot be changed and so either the cast has to output valid JSON Text
> or it has to fail.

Most people don't work in strongly-typed environment, and thus would
work around such restriction if they need a simple JSON value at the
other end of the interchange.

They would either do like I did and use array_to_json(array[datum]) and
then strip off the [] before shipping the JSON in the most likely case
that receiver side _wants_ to get JSON value and has been getting a JSON
value from other partners, or it may decode it as array and then use the
only element.

> My personal take it is have it fail since any arbitrary decision to cast
> to JSON Text is going to make someone unhappy and supposedly they can
> modify their query so that the result generates whatever format they desire.

Do you actually have such an experience or is it just a wild guess ?

Would the one being "unhappy" be some developer who is actually using
JSON, or somebody whoi has only academic interest in RFCs ?

> I haven't followed the JSON development in 9.2 too closely but exposing
> whatever conversion mechanism is currently used to generate JSON makes
> sense from a ease-of-development standpoint. But even then, during
> development passing around true JSON Text is not a big deal and then
> no "JSON_Value" API has to be exposed; thus it can be freely refined,

The use of casts to json would be a very natural way to make all
postgreSQL type exportable to correct JSON and would give the control
over the exact conversion details to the developer.

Currently we blindly use for json value whatever the text format of the
field happens to be with no choice other than either do the whole
conversion ourselves or then accept that some datums have unsuitable
formats and need to do some ugly and possibly error-prone conversions on
the client.

that way madness lies - I'm pretty sure I don't want to do any of that.

I want my database do the conversion for standard tyoes and give me a
choice to change the conversion for _only_ the types I need,

Using type specific casts to json give me exactly this.

> along with related behavior - e.g., append_to_json(value json_value, location text), in 9.3
>
> So, in short, all of your ideas are still valid but use "json_value"
> for the data type. But, even them my guess is that you would rarely
> use json_value as a column type whereas you would frequently use json
> (JSON Text) for one.

> json_value would be a support type to facilitate working with json in
> a procedural-like way.

If you are really paranoid about somebody returning json value out of
postgresql why not just use a filter function which would fail if the
argument is not an array or text

define fail_if_not_json_text(json) returns json

this gives the developers the choice to still return JSON Values to
clients if they need.

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: David Johnston <polobo(at)yahoo(dot)com>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-02 01:22:55
Message-ID: 9A450013-8006-4EB9-B34E-D62A301DD51B@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On May 1, 2012, at 20:41, Hannu Krosing <hannu(at)2ndQuadrant(dot)com> wrote:
>
> Most people don't work in strongly-typed environment, and thus would
> work around such restriction if they need a simple JSON value at the
> other end of the interchange.
>
>
>> My personal take it is have it fail since any arbitrary decision to cast
>> to JSON Text is going to make someone unhappy and supposedly they can
>> modify their query so that the result generates whatever format they desire.
>
> Do you actually have such an experience or is it just a wild guess ?
>
>

So even given the semantic differences between an object and a scalar I am better understanding where interpreting JSON as JSON Value makes sense. However, if I convert a record or array to JSON I expect to get a JSON Text even if the there is only a single column or value in the input.

I guess my take is that record -> JSON text while anything else is JSON value. Whether it is worth maiming the special case for record is worthwhile I really do not know but the semantic difference does exist; and record output is a significant aspect of PostgreSQL output.

I get the ease-of-use aspect but also recognize that sometimes being slightly harder to use is worthwhile if you eliminate ambiguities or limit the possibility to make mistakes.

FWIW my background on this topic is more theoretical than experiential though I am an web-application developer by trade and do use some JSON in that capacity.

David J.


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-02 08:52:11
Message-ID: 1335948731.3106.402.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-05-01 at 21:22 -0400, David Johnston wrote:
> On May 1, 2012, at 20:41, Hannu Krosing <hannu(at)2ndQuadrant(dot)com> wrote:
> >
> > Most people don't work in strongly-typed environment, and thus would
> > work around such restriction if they need a simple JSON value at the
> > other end of the interchange.
> >
> >
> >> My personal take it is have it fail since any arbitrary decision to cast
> >> to JSON Text

For arrays and records the json text and jason value are exactly the
same. it is just that json representations of simple types are
officially not JSON texts.

> is going to make someone unhappy and supposedly they can
> >> modify their query so that the result generates whatever format they desire.
> >
> > Do you actually have such an experience or is it just a wild guess ?
> >
> >
>
> So even given the semantic differences between an object and a scalar
> I am better understanding where interpreting JSON as JSON Value makes
> sense. However, if I convert a record or array to JSON I expect to get
> a JSON Text even if the there is only a single column or value in the input.

Of course you will, and you will get a Json Text even for empty object
or array.

array[1] and 1 and {'one':1} are all different and will stay such.

> I guess my take is that record -> JSON text while anything else is JSON
> value. Whether it is worth maiming the special case for record is
> worthwhile I really do not know but the semantic difference does exist;
> and record output is a significant aspect of PostgreSQL output.

I have never suggested that we special-case an 1-element record or list
and start returning only the contained value for these.

> I get the ease-of-use aspect but also recognize that sometimes being slightly
> harder to use is worthwhile if you eliminate ambiguities or limit the
> possibility to make mistakes.

There are no ambiguities in what is returnded for record or array.

But not being able to return JSON values via cast to json for some types
or not using such casts will make extending the json support for types
by user much much harder. And nonstandard.

Using simple cast to json is very PostgreSQL-ish way to give support of
json to any type

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PavelStehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-02 09:29:29
Message-ID: 1335950969.3106.438.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2012-05-01 at 19:11 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > On Tue, May 1, 2012 at 9:56 AM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>wrote:
> >> No, the RFC says (emphasis mine):
> >>
> >> A JSON *text* is a serialized object or array.
> >>
> >> If we let the JSON type correspond to a *value* instead, this
> >> restriction does not apply, and the JSON type has a useful recursive
> >> definition.
>
> > I think you're playing with words. But in any case, the RFC says this
> > regarding generators:
> > 5. Generators
> > A JSON generator produces JSON text. The resulting text MUST
> > strictly conform to the JSON grammar.
>
> I read over the RFC, and I think the only reason why they restricted
> JSON texts to represent just a subset of JSON values is this cute
> little hack in section 3 (Encoding):
>
> Since the first two characters of a JSON text will always be ASCII
> characters [RFC0020], it is possible to determine whether an octet
> stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
> at the pattern of nulls in the first four octets.
> 00 00 00 xx UTF-32BE
> 00 xx 00 xx UTF-16BE
> xx 00 00 00 UTF-32LE
> xx 00 xx 00 UTF-16LE
> xx xx xx xx UTF-8
>
> They need a guaranteed 2 ASCII characters to make that work, and
> they won't necessarily get that many with a bare string literal.
>
> Since for our purposes there is not, and never will be, any need to
> figure out whether a JSON input string is encoded in UTF16 or UTF32,
> I find myself agreeing with the camp that says we might as well consider
> that our JSON type corresponds to JSON values not JSON texts. I also
> notice that json_in() seems to believe that already.
>
> However, that doesn't mean I'm sold on the idea of getting rid of
> array_to_json and row_to_json in favor of a universal "to_json()"
> function. In particular, both of those have optional "pretty_bool"
> arguments that don't fit nicely at all in a generic conversion
> function. The meaning of that flag is very closely tied to the
> input being an array or record respectively.

The flags probably should not be tied to specific type, as JSON is
recursive and as such I think the current one-top-level-element-per row
is quite limited form of pretty-printing.

I have a table with a field the type of which is an array of type of
another table, and what I currently get with pretty=true is

hannu=# select row_to_json(test3, true) from test3;
-[ RECORD
1 ]----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
row_to_json | {"id":1,
|
"data3":[{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},{"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}],
| "tstamp":"2012-04-16 14:40:15.795947"}

What I would like to get what python's pprint does for the same json:

>>> pprint(row)
{'id': 1,
'data3': [{'data2': {'data': '0.262814193032682',
'id': 1,
'tstamp': '2012-04-05 13:21:03.235204'},
'id': 1,
'tstamp': '2012-04-05 13:25:03.644497'},
{'data2': {'data': '0.157406373415142',
'id': 2,
'tstamp': '2012-04-05 13:21:05.2033'},
'id': 2,
'tstamp': '2012-04-05 13:25:03.644497'}],
'tstamp': '2012-04-16 14:40:15.795947'}

If we have a pretty flag why not make it work all the way down the
structure ?

> I'm inclined to leave these functions as they are, and consider
> adding a universal "to_json(anyelement)" (with no options) later.

To achieve recursive prettyprinting the better way is to have an
universal to_json(anyelement) with a prettyprinting option

to_json(datum anyelement, indent int)

with the behavior that if indent is NULL or negative integer no
pretty-printing is done, if it is 0 printing starts at left margin and
if it is a positive integer then this number of spaces is added to the
left for each row (except the first one) of the json representation.

And it would be overridable for specific types, so that hstore could
provide its own

to_json(datum hstore, indent int)

which would do the correct pretty-printing for hstor-as-json_object
representation.

> Because it would not have options, it would not be meant to cover
> cases where there's value in formatting or conversion options;
> so it wouldn't render the existing functions entirely obsolete,
> nor would it mean there would be no need for other specialized
> conversion functions.

I don't object to row_to_json() and array_to_json() functions being
there as a convenience and as the two "official" functions guaranteed to
return "JSON text".

> regards, tom lane

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 10:39:38
Message-ID: 1336127978.19151.72.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
>
>
> On Wed, May 2, 2012 at 2:29 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com>
> wrote:
>
>
>
> I don't object to row_to_json() and array_to_json() functions
> being
> there as a convenience and as the two "official" functions
> guaranteed to
> return "JSON text".
>
>
> So given that do we do anything about this now, or wait till 9.3?

Sorry for missing this mail, followed this only on list
I hope it is ok to CC this back to list

I'd like the json support in 9.2 updated as follows

Generic to_json(...) returning a "JSON value"
=============================================

we should have a generic to_json(...) both for eas and use and for easy
extensibility, as explained below.

to_json(...) should work for all types, returning a "json value" similar
to what current json_in does, but for all types, not just cstring.

We could keep row_to_json() and array_to_json() as official json-text
returning functions

Configurable and extensible to_json()
======================================

When working on structured types, always the first try for getting an
element-as-json should be running to_json(element) and only if this
fails fall back to current "use text representation" code.

this enables two important things

1) configurable json-coding of values not explicitly supported by
standard

You can read about an attempt to standardise json-date formart here

http://weblogs.asp.net/bleroy/archive/2008/01/18/dates-and-json.aspx .

By allowing developers just to define their own to_json(date) function
we give them the power do decide which one to use. And if we honour
search_path when looking up the to_json() functions, then they can even
choose to have different conventions for different applications.

2) flexibility in adding support for extension types, like representing
hstore as object/dict by just providing the to_json(hstore, ...)
functions in hstore extension

Pretty-printing
===============

If we were to support prettyprinting of anything more complex than
single level structs (record or array), then we need to pass "ident"
into the to_json() function

my recommendation would be to have the signature

to_json(datum any, ident int)

with ident = NULL meaning no prettyprint , ident =0 meaninf top level,
or "starting at left margin" and anything else meaning the amount of
spaces needed to be added to the beginning of all rows exept the first
one, for example the query

hannu=# select array_to_json(array(select test from test limit 2),true);
array_to_json
---------------------------------------------------------------------
[{"id":9,"data":"testdata","tstamp":"2012-05-01 09:44:50.175189"}, +
{"id":10,"data":"testdata","tstamp":"2012-05-01 09:45:50.260276"}]
(1 row)

could return this:

[{'data': 'testdata',
'id': 9,
'tstamp': '2012-05-01 09:44:50.175189'},
{'data': 'testdata',
'id': 10,
'tstamp': '2012-05-01 09:45:50.260276'}]

if it would call to_json(row, 1) for getting each row prettyprinted with
ident 1

Getting a record _from_ json()
==============================

JSON support would be much more useful if we supported the function of
converting the other way as well, that is from json to record

The best model easy to support seems to be what Merlin suggested, that
is the populate_record(record,hstore) function and corresponding #=
operator (see: http://www.postgresql.org/docs/9.1/static/hstore.html)

The complete plan for 9.2 once more
===================================

What is needed to nicely round up a simple and powerful json type is

1. the "json" TYPE

an agreement tha json type represents a "JSON value" (this is what
json_in currently does)

2. json output

2.1 doing the right thing with internal values

to_json() function for converting to this "JSON value" for any type.
default implementation for 'non-json' types returns their
postgresql textual representation in double quotes
(date -> "2012-05-01 09:45:50.260276"

structured types use to_json() for getting values internally,
so that by defining to_json(hstore) you can automatically get
hstore represented in javascript object or dictionary representation

hannu=# select row_to_json(r)
from (select 1::int id, '"foo"=>1, "bar"=>"baz"'::hstore)r;

should not return this:

row_to_json
------------------------------------------------------
{"id":1,"hstore":"\"bar\"=>\"baz\", \"foo\"=>\"1\""}
(1 row)

but this
row_to_json
------------------------------------------------------
{"id":1,"hstore":{"bar": "baz", "foo":"1"}}
(1 row)

2.1 getting the pretty-printing right for structured types

to_json(any, indent) functions for getting recursive prettyprinting.
we might also need another argument telling the "page width" we want
to pretty print to.

3. json input for records and arrays

3.1 json row to record

a function to convert from json to record, so that we can use json
also as input format. modeled after populate_record(record,hstore)
from http://www.postgresql.org/docs/9.1/static/hstore.html

one json_to_row should be enough, as we can use

3.2 json array to json rows

if we also want to use input arrays , say send all invoice rows as
a json array, we could add a

unnest_json(json_array_of_rows json) returns table(json_row json)

function and then do the inputting as

insert into invoice_row
select json_to_row(null::invoice_row, json_row)
from (select json_row as unnest_json(json_array_of_rows)) s;

> cheers
>
> andrew

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 13:52:45
Message-ID: 3827.1336139565@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
>> So given that do we do anything about this now, or wait till 9.3?

> I'd like the json support in 9.2 updated as follows

I think it's too late to be entertaining proposals for such changes in
9.2. If we had concluded that the existing functions were actively
wrong or a bad idea, then of course we'd need to do something; but they
are not, so we can just as well consider additions in the 9.3 cycle
rather than now. I am not convinced that this proposal is fully baked
yet, anyway; not to mention that right now we need to have our heads
down on resolving the remaining open issues, not designing,
implementing, and reviewing a pile of brand new code for json.

> By allowing developers just to define their own to_json(date) function
> we give them the power do decide which one to use. And if we honour
> search_path when looking up the to_json() functions, then they can even
> choose to have different conventions for different applications.

This is not going to work anywhere near as nicely as you think. If
somebody tries to define multiple to_json() functions that override a
generic to_json(anyelement) one, he will start getting "function is not
unique" parse failures. The parser will only successfully decide which
function to call when the input data type exactly matches one of the
specialized functions, which means you might as well not have the
generic one at all.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 16:30:39
Message-ID: 4FA4042F.3040007@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05/04/2012 09:52 AM, Tom Lane wrote:
> Hannu Krosing<hannu(at)2ndQuadrant(dot)com> writes:
>> On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
>>> So given that do we do anything about this now, or wait till 9.3?
>> I'd like the json support in 9.2 updated as follows
> I think it's too late to be entertaining proposals for such changes in
> 9.2. If we had concluded that the existing functions were actively
> wrong or a bad idea, then of course we'd need to do something; but they
> are not, so we can just as well consider additions in the 9.3 cycle
> rather than now. I am not convinced that this proposal is fully baked
> yet, anyway; not to mention that right now we need to have our heads
> down on resolving the remaining open issues, not designing,
> implementing, and reviewing a pile of brand new code for json.

Yeah, that was my feeling. We usually take a release or two to get
things right, fill in what's missing, etc. and I don't think this will
be ant different.

>
>> By allowing developers just to define their own to_json(date) function
>> we give them the power do decide which one to use. And if we honour
>> search_path when looking up the to_json() functions, then they can even
>> choose to have different conventions for different applications.
> This is not going to work anywhere near as nicely as you think. If
> somebody tries to define multiple to_json() functions that override a
> generic to_json(anyelement) one, he will start getting "function is not
> unique" parse failures. The parser will only successfully decide which
> function to call when the input data type exactly matches one of the
> specialized functions, which means you might as well not have the
> generic one at all.
>
>

Yeah, what I've been thinking about in conjunction with similar problems
is some sort of type registry, so that we could code for non-builtin
types in certain cases. Maybe we should add that the the developers'
meeting agenda.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 16:56:22
Message-ID: 7677.1336150582@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Yeah, what I've been thinking about in conjunction with similar problems
> is some sort of type registry, so that we could code for non-builtin
> types in certain cases. Maybe we should add that the the developers'
> meeting agenda.

Maybe. I don't want to see a json-specific hack for this, but some sort
of generic way to add type knowledge could be useful, if we could figure
out what we want.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 17:43:47
Message-ID: CA+TgmoaqXDmXN-D0eX6+rF2r-189oa9oToC7a2gKjMeFRK0YVQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 4, 2012 at 12:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> Yeah, what I've been thinking about in conjunction with similar problems
>> is some sort of type registry, so that we could code for non-builtin
>> types in certain cases. Maybe we should add that the the developers'
>> meeting agenda.
>
> Maybe.  I don't want to see a json-specific hack for this, but some sort
> of generic way to add type knowledge could be useful, if we could figure
> out what we want.

For this particular case, I think you just need some place to store a
pg_type -> pg_proc mapping. I'm not exactly sure how to make that not
a JSON-specific hack, since I certainly don't think we'd want to add a
new catalog just for that.

In general, I think it would be very useful to have some way of
identifying particular types - and versions of types - independently
of a particular installation - e.g. by assigning each type a UUID that
never changes and a version number that we bump when we change
something about that type. That seems like it would be very useful
for schema comparison tools, or for logical replication, where you
want to know whether two types are "the same type" even though they
are in different clusters. pg_upgrade has had past needs in this area
as well. However, I'm not sure that'd help solve this particular
problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 19:44:54
Message-ID: 1336160694.19151.85.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-05-04 at 13:43 -0400, Robert Haas wrote:
> On Fri, May 4, 2012 at 12:56 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> >> Yeah, what I've been thinking about in conjunction with similar problems
> >> is some sort of type registry, so that we could code for non-builtin
> >> types in certain cases. Maybe we should add that the the developers'
> >> meeting agenda.
> >
> > Maybe. I don't want to see a json-specific hack for this, but some sort
> > of generic way to add type knowledge could be useful, if we could figure
> > out what we want.
>
> For this particular case, I think you just need some place to store a
> pg_type -> pg_proc mapping. I'm not exactly sure how to make that not
> a JSON-specific hack, since I certainly don't think we'd want to add a
> new catalog just for that.

This was my initial proposal to have casts to ::json for all types.

I backed out from this in favot of generic to_json(datum, indent) in
order to support prettyprinting.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 19:49:56
Message-ID: 1336160996.19151.90.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-05-04 at 09:52 -0400, Tom Lane wrote:
> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> > On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
> >> So given that do we do anything about this now, or wait till 9.3?
>
> > I'd like the json support in 9.2 updated as follows
>
> I think it's too late to be entertaining proposals for such changes in
> 9.2. If we had concluded that the existing functions were actively
> wrong or a bad idea,

I think that hard-coding "postgresql text" representation as our json
representation without a possibility for the user tio easily fix it
without rewriting foll xx_to_json() functions is borderline "actively
wrong".

Can we at least have the xxx_to_json() functions try cast to json first
and fall back to text if the cast fails.

This would address my worst problem, all the rest can be easily defined
in user functions.

> then of course we'd need to do something; but they
> are not, so we can just as well consider additions in the 9.3 cycle
> rather than now. I am not convinced that this proposal is fully baked
> yet, anyway; not to mention that right now we need to have our heads
> down on resolving the remaining open issues, not designing,
> implementing, and reviewing a pile of brand new code for json.
>
> > By allowing developers just to define their own to_json(date) function
> > we give them the power do decide which one to use. And if we honour
> > search_path when looking up the to_json() functions, then they can even
> > choose to have different conventions for different applications.
>
> This is not going to work anywhere near as nicely as you think. If
> somebody tries to define multiple to_json() functions that override a
> generic to_json(anyelement) one, he will start getting "function is not
> unique" parse failures. The parser will only successfully decide which
> function to call when the input data type exactly matches one of the
> specialized functions, which means you might as well not have the
> generic one at all.
>
> regards, tom lane
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 19:59:03
Message-ID: CA+TgmobOP9tPKE2Kth0KTSH9969roSKs0GVpiaMeJF6yXb4w6Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 4, 2012 at 3:49 PM, Hannu Krosing <hannu(at)krosing(dot)net> wrote:
> On Fri, 2012-05-04 at 09:52 -0400, Tom Lane wrote:
>> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
>> > On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
>> >> So given that do we do anything about this now, or wait till 9.3?
>>
>> > I'd like the json support in 9.2 updated as follows
>>
>> I think it's too late to be entertaining proposals for such changes in
>> 9.2.  If we had concluded that the existing functions were actively
>> wrong or a bad idea,
>
> I think that hard-coding "postgresql text" representation as our json
> representation without a possibility for the user tio easily fix it
> without rewriting foll xx_to_json() functions is borderline "actively
> wrong".
>
> Can we at least have the xxx_to_json() functions try cast to json first
> and fall back to text if the cast fails.

I think the idea that you can involve the casting machinery in this is
misguided. sometextval::json has got to mean that sometextval is
expected to be in the form of a syntactically correct JSON value - and
NOT that we wrap it in a JSON string. We can have constructors for
JSON, but they've got to be separate from the casting machinery.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 20:12:26
Message-ID: 12299.1336162346@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, May 4, 2012 at 3:49 PM, Hannu Krosing <hannu(at)krosing(dot)net> wrote:
>> Can we at least have the xxx_to_json() functions try cast to json first
>> and fall back to text if the cast fails.

> I think the idea that you can involve the casting machinery in this is
> misguided.

It is possible that that can be made to work, but it's a research
project, not something to be crammed into 9.2 at the last possible
minute. In any case, I really dislike the idea that array_to_json
and row_to_json would contain two entirely different behaviors.
Leave the extensibility ideas for a future to_json() function.

regards, tom lane


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 20:40:18
Message-ID: 1336164018.19151.93.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-05-04 at 15:59 -0400, Robert Haas wrote:
> On Fri, May 4, 2012 at 3:49 PM, Hannu Krosing <hannu(at)krosing(dot)net> wrote:
> > On Fri, 2012-05-04 at 09:52 -0400, Tom Lane wrote:
> >> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> >> > On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
> >> >> So given that do we do anything about this now, or wait till 9.3?
> >>
> >> > I'd like the json support in 9.2 updated as follows
> >>
> >> I think it's too late to be entertaining proposals for such changes in
> >> 9.2. If we had concluded that the existing functions were actively
> >> wrong or a bad idea,
> >
> > I think that hard-coding "postgresql text" representation as our json
> > representation without a possibility for the user tio easily fix it
> > without rewriting foll xx_to_json() functions is borderline "actively
> > wrong".
> >
> > Can we at least have the xxx_to_json() functions try cast to json first
> > and fall back to text if the cast fails.
>
> I think the idea that you can involve the casting machinery in this is
> misguided. sometextval::json has got to mean that sometextval is
> expected to be in the form of a syntactically correct JSON value - and
> NOT that we wrap it in a JSON string. We can have constructors for
> JSON, but they've got to be separate from the casting machinery.

on the contrary - the string representation of textual value a is "a"

casting should _not_ neam syntax check, casting is by definition a
conversion.

if we cast text to int, we return value of type int , if we cast int to
numeric(5,2) we return value of type numeric(5,2)

why should casring to json work differntly ?

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


From: "David Johnston" <polobo(at)yahoo(dot)com>
To: "'Hannu Krosing'" <hannu(at)krosing(dot)net>, "'Robert Haas'" <robertmhaas(at)gmail(dot)com>
Cc: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "'Andrew Dunstan'" <andrew(at)dunslane(dot)net>, "'PostgreSQL-development'" <pgsql-hackers(at)postgresql(dot)org>, "'Merlin Moncure'" <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-04 21:24:11
Message-ID: 00cb01cd2a3c$3ff8d990$bfea8cb0$@yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org [mailto:pgsql-hackers-
> owner(at)postgresql(dot)org] On Behalf Of Hannu Krosing
> Sent: Friday, May 04, 2012 4:40 PM
> To: Robert Haas
> Cc: Tom Lane; Andrew Dunstan; PostgreSQL-development; Merlin Moncure
> Subject: Re: [HACKERS] JSON in 9.2 - Could we have just one to_json()
> function instead of two separate versions ?
>
> On Fri, 2012-05-04 at 15:59 -0400, Robert Haas wrote:
> > On Fri, May 4, 2012 at 3:49 PM, Hannu Krosing <hannu(at)krosing(dot)net>
> wrote:
> > > On Fri, 2012-05-04 at 09:52 -0400, Tom Lane wrote:
> > >> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> > >> > On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
> > >> >> So given that do we do anything about this now, or wait till 9.3?
> > >>
> > >> > I'd like the json support in 9.2 updated as follows
> > >>
> > >> I think it's too late to be entertaining proposals for such changes
> > >> in 9.2. If we had concluded that the existing functions were
> > >> actively wrong or a bad idea,
> > >
> > > I think that hard-coding "postgresql text" representation as our
> > > json representation without a possibility for the user tio easily
> > > fix it without rewriting foll xx_to_json() functions is borderline
> > > "actively wrong".
> > >
> > > Can we at least have the xxx_to_json() functions try cast to json
> > > first and fall back to text if the cast fails.
> >
> > I think the idea that you can involve the casting machinery in this is
> > misguided. sometextval::json has got to mean that sometextval is
> > expected to be in the form of a syntactically correct JSON value - and
> > NOT that we wrap it in a JSON string. We can have constructors for
> > JSON, but they've got to be separate from the casting machinery.
>
> on the contrary - the string representation of textual value a is "a"
>
> casting should _not_ neam syntax check, casting is by definition a conversion.
>
> if we cast text to int, we return value of type int , if we cast int to
> numeric(5,2) we return value of type numeric(5,2)
>
> why should casring to json work differntly ?
>

What is the distinction between what you are thinking regarding JSON and this example?

SELECT '1a'::integer;
SQL Error: ERROR: invalid input syntax for integer: "1a"
LINE 1: SELECT '1a'::integer

As a user if I cast something to something else I want the result to be of the correct type and deterministic; otherwise throw me some kind of "invalid input format" exception (or syntax exception). Casting vs. Constructors is really a meaningless distinction to a lay person. When I cast I do so by constructing a new value using my existing value for input. When I use an explicit CAST I am unable to supply additional parameters to configure the casting whereas a constructor function gives me that possibility. But a constructor function without any additional parameters is not semantically different than a cast.

I guess the concern to address is something like:

SELECT '{key: value}'::json OR SELECT '[1.25]'::json; Do you interpret this as already being valid JSON and thus output object/array constructs (i.e., JSON Text) or do you treat them as string literals and output scalars (i.e., JSON Value). Even if you feel these are artificial constructs the concepts holds that there may be ambiguous data that can be interpreted in multiple ways (this applies even to function forms, though in the function form you could specify which one you want to output using a separate DEFAULTed parameter).

I can see the primary use-case for JSON Value casting as being queries of the following forms (since the record and array forms are going through the record/array_to_json function):

SELECT COUNT(*)::json FROM table [WHERE ...];
SELECT single_col::json FROM table WHERE id = ?;

Where the single provided value can be sent directly back to the web-caller JavaScript and used as-is because it is valid JSON. Though, honestly, both SELECT to_json(single_col) and SELECT to_json(COUNT(*)) are equally usable so any distinction between them is a pure technical issue to me.

Am I correct in assuming the following expected behavior (the forgive the blatantly wrong syntax but you should get the point)?

RAISE NOTICE '%', SELECT 'A'::text => A
RAISE NOTICE '%', SELECT 'A'::json => "A"

David J


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-05 06:06:38
Message-ID: 1336197998.19151.230.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-05-04 at 15:59 -0400, Robert Haas wrote:
> On Fri, May 4, 2012 at 3:49 PM, Hannu Krosing <hannu(at)krosing(dot)net> wrote:
> > On Fri, 2012-05-04 at 09:52 -0400, Tom Lane wrote:
> >> Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> >> > On Wed, 2012-05-02 at 12:06 -0700, Andrew Dunstan wrote:
> >> >> So given that do we do anything about this now, or wait till 9.3?
> >>
> >> > I'd like the json support in 9.2 updated as follows
> >>
> >> I think it's too late to be entertaining proposals for such changes in
> >> 9.2. If we had concluded that the existing functions were actively
> >> wrong or a bad idea,
> >
> > I think that hard-coding "postgresql text" representation as our json
> > representation without a possibility for the user tio easily fix it
> > without rewriting foll xx_to_json() functions is borderline "actively
> > wrong".
> >
> > Can we at least have the xxx_to_json() functions try cast to json first
> > and fall back to text if the cast fails.
>
> I think the idea that you can involve the casting machinery in this is
> misguided. sometextval::json has got to mean that sometextval is
> expected to be in the form of a syntactically correct JSON value - and
> NOT that we wrap it in a JSON string.

PostgreSQL CAST is different from casting a C pointer, they actually
perform a conversion when defined WITH FUNCTION or WITH INOUT.
And they pass value unchanged when defined WITHOUT FUNCTION.

Casts _do_not_ perform syntax checks, they assume the source to be of
the type castted from and they return result of the target type.

I think you are confusing input/output formatting with CAST here.

This confusion about cast-as-syntax-check probably qualifies as
something being "actively wrong" with current implementation, though it
does not seem to be something actively defined (there are no casts
defined for json type) but rather an artifact of how postgresql input
works.

CAST is something that should convert one type to another, in this case
a textual type to its "json value" representation and back.

'sometext'::text::json --> '"sometext"'

and

'"sometext"'::json::text --> 'sometext'

the suggested "syntax check only" should only be done by the type i/o
functions json_in(cstring) and json_recv(internal) and not casts.

Casts should do "casting", that in PostgreSQL means type conversion

> We can have constructors for
> JSON, but they've got to be separate from the casting machinery.

Currently we do have constructors - json_in(csting) and
json_recv(internal)

These are the ones that should and do check for correct syntax.

Some more confusiong examples for pondering on cast vs i/o functions
--------------------------------------------------------------------

Some of the difficulties of understanding and explaining what a json
type should be are exemplified in the following

hannu=# create table jtest(plaintext text, jsontext json);
CREATE TABLE

hannu=# insert into jtest values('"A"','"A"');
hannu=# insert into jtest values('true','true');
hannu=# insert into jtest values('null','null');
hannu=# insert into jtest values(null,null);

hannu=# select row_to_json(jtest) from jtest;
row_to_json
--------------------------------------
{"plaintext":"\"A\"","jsontext":"A"}
{"plaintext":"true","jsontext":true}
{"plaintext":"null","jsontext":null}
{"plaintext":null,"jsontext":null}
(4 rows)

hannu=# insert into jtest values('a','a');
ERROR: invalid input syntax for type json
LINE 1: insert into jtest values('a','a');
^
DETAIL: line 1: Token "a" is invalid.

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-05 06:07:40
Message-ID: 1336198060.19151.232.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, 2012-05-04 at 16:12 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > On Fri, May 4, 2012 at 3:49 PM, Hannu Krosing <hannu(at)krosing(dot)net> wrote:
> >> Can we at least have the xxx_to_json() functions try cast to json first
> >> and fall back to text if the cast fails.
>
> > I think the idea that you can involve the casting machinery in this is
> > misguided.

> It is possible that that can be made to work, but it's a research
> project, not something to be crammed into 9.2 at the last possible
> minute. In any case, I really dislike the idea that array_to_json
> and row_to_json would contain two entirely different behaviors.
> Leave the extensibility ideas for a future to_json() function.

I did not mean that array_to_json and row_to_json would be different
than the generic to_json, just that they would be thin wrappers around
the to_json function which check that the arguments are of the correct
types for casting to "JSON text".

They need to recurse to each other and generic to_json anyway.

> regards, tom lane

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Hannu Krosing <hannu(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-05 09:06:15
Message-ID: 1336208775.13755.5.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On fre, 2012-05-04 at 13:43 -0400, Robert Haas wrote:
> For this particular case, I think you just need some place to store a
> pg_type -> pg_proc mapping. I'm not exactly sure how to make that not
> a JSON-specific hack, since I certainly don't think we'd want to add a
> new catalog just for that.

I think you're thinking of pg_cast.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-05 09:13:14
Message-ID: 1336209194.13755.10.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On fre, 2012-05-04 at 12:30 -0400, Andrew Dunstan wrote:
> Yeah, what I've been thinking about in conjunction with similar
> problems is some sort of type registry, so that we could code for
> non-builtin types in certain cases.

It certainly seems to come up a lot, but I'm not sure whether the two
main use cases -- mapping types into languages, and mapping types into
JSON (or XML, as an alternative) -- would have the same solution. A
third use case that could be included as well is changing the main text
or binary format of a type (e.g., what datestyle does). So instead of
having just a hard-coded set of typinput/typoutput, typrecv/typsend,
you'd have a catalog of

(context, inputfunc, outputfunc)

and the context would a language, something json or xml, or a
user-settable value. This could become quite complicated, but it sure
could solve a lot of issues in one go.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-05 09:16:18
Message-ID: 1336209378.13755.11.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On fre, 2012-05-04 at 15:59 -0400, Robert Haas wrote:
> > Can we at least have the xxx_to_json() functions try cast to json
> first
> > and fall back to text if the cast fails.
>
> I think the idea that you can involve the casting machinery in this is
> misguided. sometextval::json has got to mean that sometextval is
> expected to be in the form of a syntactically correct JSON value - and
> NOT that we wrap it in a JSON string.

I think it's only wrong if you try casting first and fall back to text.
Otherwise it could work, if the set of all json casts is defined
consistently.


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-05 13:27:14
Message-ID: 1336224434.19151.238.camel@hvost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2012-05-05 at 12:16 +0300, Peter Eisentraut wrote:
> On fre, 2012-05-04 at 15:59 -0400, Robert Haas wrote:
> > > Can we at least have the xxx_to_json() functions try cast to json
> > first
> > > and fall back to text if the cast fails.
> >
> > I think the idea that you can involve the casting machinery in this is
> > misguided. sometextval::json has got to mean that sometextval is
> > expected to be in the form of a syntactically correct JSON value - and
> > NOT that we wrap it in a JSON string.
>
> I think it's only wrong if you try casting first and fall back to text.
> Otherwise it could work, if the set of all json casts is defined
> consistently.

Currently the default "cast" for non-number, non-bool, not-already-json,
non-null values is to wrap text representation in double quotes.

So casting first then fall back to _quoted_ text is wrong only for those
types which have a very ugly text representation :)

--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: JSON in 9.2 - Could we have just one to_json() function instead of two separate versions ?
Date: 2012-05-05 15:49:37
Message-ID: 13188.1336232977@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hannu Krosing <hannu(at)2ndQuadrant(dot)com> writes:
> CAST is something that should convert one type to another, in this case
> a textual type to its "json value" representation and back.
> 'sometext'::text::json --> '"sometext"'
> and
> '"sometext"'::json::text --> 'sometext'

Well, that's a pretty interesting example, because if you expect that to
work like that, then what should happen with this?

'{"f1":4,"f2":2}'::text::json
'{"f1":4,"f2":2}'::json::text

Does the first of these produce a JSON object, or a quoted string?
At the moment you get the former, but it's pretty hard to argue that
it shouldn't produce a quoted string if transparent conversion is the
expectation. In the second case, do you end up with a JSON text (which
is what happens at the moment) or does it just slap some quotes around
the value as a string? I'm not convinced that you've made a principled
argument as to what should happen when.

In general, I think casts should only be used for conversions where
there is just one unsurprising choice of behavior, since the cast syntax
by definition doesn't provide any room for options. It's not clear to
me that JSON conversions are so obvious as to meet that standard. If
you start throwing random user-defined conversions into the mix, it's
even less obvious that there's only one unsurprising choice.

In the particular case of casts to and from text, we've essentially
set a project policy that those should behave equivalently to the type's
I/O conversion functions whenever possible. So I think the existing
behavior of those operations is correct and what you propose above is
wrong. There is certainly scope for a conversion function that takes
any random text string and produces a JSON quoted string from it, but
the cast operator is not the place for that.

regards, tom lane