Re: JSON manipulation functions

Lists: pgsql-hackers
From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: magnus(at)hagander(dot)net
Subject: JSON manipulation functions
Date: 2010-05-14 01:47:37
Message-ID: AANLkTimyZqOqhOpFWv8WiGWVM8eHIaMN9AUTeugvrMtL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

First off, thank you for allowing me to participate in Google Summer
of Code 2010. I'm sorry I haven't been active for the past few weeks.
Today, I added the wiki page for my project, but the project schedule
is highly tentative:
http://wiki.postgresql.org/wiki/JSON_datatype_GSoC_2010 .

I'd like to discuss how the functions for type checking and conversion
(the majority of my project) should work. Below is my first draft for
the JSON manipulation function specs, along with annotations.

== Type checking ==

The following function returns the type of any JSON value.

json_type as enum ('null', 'string', 'number', 'boolean', 'object', 'array')
json_type(json) returns json_type

Would it be a bad idea to give an enum and a function the same name
(which appears to be allowed by PostgreSQL) ? If so, json_type(json)
could be json_typeof(json) or something instead.

I thought about having predicates like IS STRING and IS NUMBER,
similar to the IS DOCUMENT predicate used for XML. However, a major
problem with that approach is that it could lead to confusion
involving IS NULL. By my understanding, the JSON datatype will just
be a specialization of TEXT (it just validates the input). Like TEXT,
a JSON string can be 'null'. 'null'::JSON is not NULL. Bear in mind
that json_to_*('null') is NULL, though.

I also thought about having a series of json_is_* functions. I don't
think it's a bad idea, but I think json_type is a better solution.

== text/number/boolean conversion ==

These functions each convert a non-compound JSON value to its
respective return type. Run-time type checking is performed; a
conversion will throw an error if the input JSON is not the correct
type. If the JSON value is 'null', then the return value will be
NULL.

json_to_text(json) returns text
json_to_number(json) returns numeric
json_to_bool(json) returns boolean

These functions convert values to JSON. Passing NULL to any of the
functions below will return 'null':

text_to_json(text) returns json
number_to_json(numeric) returns json
bool_to_json(boolean) returns json

There could be generic value_to_json(any), but not a
json_to_value(json) function. See
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00321.php for
more details.

Conversion to/from number or boolean can also be achieved with
casting. Note well that '"string"'::JSON::TEXT is '"string"', not the
string's actual value. json_to_text is needed for this conversion.
For this reason, casting JSON might seem like something to recommend
against. However, IMHO, casting numbers and booleans to/from JSON is
fine and dandy; the paragraphs below give some weight to this.

I originally considered making json_to_number and number_to_json work
with TEXT instead of NUMERIC. However, as Tom Lane pointed out in the
above link, "Forcing people to insert explicit coercions from text
isn't going to be particularly convenient to use.". Nevertheless,
NUMERIC introduces a problem. For instance, if you say:

SELECT '-1e-38'::NUMERIC;

This conversion knocks out the scientific notation and produces a
41-character string. I seriously doubt that all outside applications
will handle 41-character numbers correctly.

Perhaps there should be individual functions for specific data types,
or maybe just a handful for particular cases. There might be
json_to_int, json_to_float, and json_to_numeric. In any case,
converting to/from number types can be achieved quite easily with
casting.

== array/object conversion ==

The json_object function converts a tuple to a JSON object. If there
are duplicate column names, there will be duplicate keys in the
resulting JSON object.

json_object([content [AS name] [, ...]]) returns json

Likewise, the json_array function converts a tuple to a JSON array.
Column names are ignored.

json_array([content [AS name] [, ...]]) returns json

The json_agg function reduces a set of JSON values to a single array
containing those values.

aggregate json_agg(json) returns json

json_object and json_agg can be used together to convert an entire
result set to one JSON array:

SELECT json_agg(json_object(*)) FROM tablename;

json_keys gets the keys of a JSON object as a set.

json_keys(json) returns setof text

json_values gets the values of a JSON object or the iems of a JSON
array as a set.

json_values(json) returns setof json

Note that all JSON slicing and splicing operations retain the original
formatting of JSON content.

== Miscellaneous ==

The features below would be nice, but will probably not be regarded as
required for this Google Summer of Code project to be considered
complete.

json_cleanup accepts a superset of JSON and, if it can, cleans it up
and returns a valid JSON string. This superset of JSON supports the
following extra features:

* Comments:
- Single-line comments with // and #
- C-style comments: /* comment */
* Unquoted object keys: {key: "value"}
* Single quote strings: 'single quotes; "double quotes" do not need
to be escaped here'
* Single quote escape allowed: "It\'s allowed, but it's not necessary"
* Lax number format (+ sign allowed; digits may be omitted on one
side of the decimal point).

json_cleanup(text) returns json

Example:

SELECT json_cleanup('{/*comment*/number: +.3}');
-- Result is '{"number": 0.3}'::JSON

-> retrieves an item of a JSON object by key. If the object has
duplicate keys, the first key listed will be retrieved. Example:

SELECT ('{"foo": 50, "foo": 100, "bar": "string"}'::JSON)->"foo"
-- Result is '50'::JSON

[] retrieves a value of a JSON array/object by (one-based) index. In
other words, value[n] is equivalent to selecting the nth row of
json_values(value) (provided value is of type JSON). Examples:

SELECT ('[1,2,3,4]'::JSON)[3]
-- Result is '3'::JSON

SELECT ('{"a": 1, "b": 2, "c": 3, "d": 4}'::JSON)[3]
-- Result is '3'::JSON


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-14 15:33:12
Message-ID: 201005141533.o4EFXC504326@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joseph Adams wrote:
> == array/object conversion ==
>
> The json_object function converts a tuple to a JSON object. If there
> are duplicate column names, there will be duplicate keys in the
> resulting JSON object.
>
> json_object([content [AS name] [, ...]]) returns json
>
> Likewise, the json_array function converts a tuple to a JSON array.
> Column names are ignored.
>
> json_array([content [AS name] [, ...]]) returns json

Do you see any problems with the fact that JSON arrays can use mixed
data types, e.g.:

[ 1, 2, 'hi', false]

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-14 16:17:52
Message-ID: AANLkTincYQpPjpd42swWLWu70q7rlUOpI9pgrsqgeMZ9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/5/14 Bruce Momjian <bruce(at)momjian(dot)us>:
> Joseph Adams wrote:
>> == array/object conversion ==
>>
>> The json_object function converts a tuple to a JSON object.  If there
>> are duplicate column names, there will be duplicate keys in the
>> resulting JSON object.
>>
>> json_object([content [AS name] [, ...]]) returns json
>>
>> Likewise, the json_array function converts a tuple to a JSON array.
>> Column names are ignored.
>>
>> json_array([content [AS name] [, ...]]) returns json
>
> Do you see any problems with the fact that JSON arrays can use mixed
> data types, e.g.:
>
>        [ 1, 2, 'hi', false]

it could not be a problem

regards
Pavel
>
> --
>  Bruce Momjian  <bruce(at)momjian(dot)us>        http://momjian.us
>  EnterpriseDB                             http://enterprisedb.com
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-14 17:15:55
Message-ID: AANLkTikWU97pSb1akfzm6RhqAn401NvYzwYS8EhD94Ut@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 13, 2010 at 9:47 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> The following function returns the type of any JSON value.
>
> json_type as enum ('null', 'string', 'number', 'boolean', 'object', 'array')
> json_type(json) returns json_type

Seems reasonable.

> Would it be a bad idea to give an enum and a function the same name
> (which appears to be allowed by PostgreSQL) ?  If so, json_type(json)
> could be json_typeof(json) or something instead.

No, I think that's a fine idea.

> I thought about having predicates like IS STRING and IS NUMBER,
> similar to the IS DOCUMENT predicate used for XML.  However, a major
> problem with that approach is that it could lead to confusion
> involving IS NULL.  By my understanding, the JSON datatype will just
> be a specialization of TEXT (it just validates the input).  Like TEXT,
> a JSON string can be 'null'.  'null'::JSON is not NULL.  Bear in mind
> that json_to_*('null') is NULL, though.

Even aside from the possible semantic confusion, I don't think that we
should make any changes to our core grammar (gram.y) to support JSON.
It's not really necessary and it's better not to add extra stuff to
the grammar unless we really need it.

> I also thought about having a series of json_is_* functions.  I don't
> think it's a bad idea, but I think json_type is a better solution.

I agree.

> == text/number/boolean conversion ==
>
> These functions each convert a non-compound JSON value to its
> respective return type.  Run-time type checking is performed; a
> conversion will throw an error if the input JSON is not the correct
> type.  If the JSON value is 'null', then the return value will be
> NULL.
>
> json_to_text(json) returns text
> json_to_number(json) returns numeric
> json_to_bool(json) returns boolean

Can '3' be converted to a number, or only if it's written without the quotes?

> These functions convert values to JSON.  Passing NULL to any of the
> functions below will return 'null':
>
> text_to_json(text) returns json
> number_to_json(numeric) returns json
> bool_to_json(boolean) returns json
>
> There could be generic value_to_json(any), but not a
> json_to_value(json) function.  See
> http://archives.postgresql.org/pgsql-hackers/2010-04/msg00321.php for
> more details.

Seems OK.

> Conversion to/from number or boolean can also be achieved with
> casting.  Note well that '"string"'::JSON::TEXT is '"string"', not the
> string's actual value.  json_to_text is needed for this conversion.
> For this reason, casting JSON might seem like something to recommend
> against.  However, IMHO, casting numbers and booleans to/from JSON is
> fine and dandy; the paragraphs below give some weight to this.
>
> I originally considered making json_to_number and number_to_json work
> with TEXT instead of NUMERIC.  However, as Tom Lane pointed out in the
> above link, "Forcing people to insert explicit coercions from text
> isn't going to be particularly convenient to use.".  Nevertheless,
> NUMERIC introduces a problem.  For instance, if you say:
>
> SELECT '-1e-38'::NUMERIC;
>
> This conversion knocks out the scientific notation and produces a
> 41-character string.  I seriously doubt that all outside applications
> will handle 41-character numbers correctly.

Maybe not, but I don't think it's your problem to fix it if they
don't. If people want to have fine-grained control over the JSON that
gets generated, they can always generate the value as text and cast it
to JSON.

> Perhaps there should be individual functions for specific data types,
> or maybe just a handful for particular cases.  There might be
> json_to_int, json_to_float, and json_to_numeric.  In any case,
> converting to/from number types can be achieved quite easily with
> casting.

Personally I'd go with just json_to_numeric for starters. We can
always add the others if and when it's clear that they are useful.

> == array/object conversion ==
>
> The json_object function converts a tuple to a JSON object.  If there
> are duplicate column names, there will be duplicate keys in the
> resulting JSON object.
>
> json_object([content [AS name] [, ...]]) returns json

Seems good.

> Likewise, the json_array function converts a tuple to a JSON array.
> Column names are ignored.
>
> json_array([content [AS name] [, ...]]) returns json

I think this is pointless and should be omitted.

> The json_agg function reduces a set of JSON values to a single array
> containing those values.
>
> aggregate json_agg(json) returns json

Very useful, I like that.

> json_object and json_agg can be used together to convert an entire
> result set to one JSON array:
>
> SELECT json_agg(json_object(*)) FROM tablename;

Spiffy.

> json_keys gets the keys of a JSON object as a set.
>
> json_keys(json) returns setof text

I would tend to make this return text[] rather than SETOF text.

> json_values gets the values of a JSON object or the iems of a JSON
> array as a set.
>
> json_values(json) returns setof json

Similarly I would make this return json[].

> Note that all JSON slicing and splicing operations retain the original
> formatting of JSON content.

Good.

> == Miscellaneous ==
>
> The features below would be nice, but will probably not be regarded as
> required for this Google Summer of Code project to be considered
> complete.
>
> json_cleanup accepts a superset of JSON and, if it can, cleans it up
> and returns a valid JSON string.  This superset of JSON supports the
> following extra features:
>
>  * Comments:
>   - Single-line comments with // and #
>   - C-style comments: /* comment */
>  * Unquoted object keys: {key: "value"}
>  * Single quote strings: 'single quotes; "double quotes" do not need
> to be escaped here'
>  * Single quote escape allowed: "It\'s allowed, but it's not necessary"
>  * Lax number format (+ sign allowed; digits may be omitted on one
> side of the decimal point).
>
> json_cleanup(text) returns json
>
> Example:
>
> SELECT json_cleanup('{/*comment*/number: +.3}');
> -- Result is '{"number": 0.3}'::JSON

This might be nice to have, or not. I don't think it's a big deal either way.

> -> retrieves an item of a JSON object by key.  If the object has
> duplicate keys, the first key listed will be retrieved.  Example:
>
> SELECT ('{"foo": 50, "foo": 100, "bar": "string"}'::JSON)->"foo"
> -- Result is '50'::JSON
>
> [] retrieves a value of a JSON array/object by (one-based) index.  In
> other words, value[n] is equivalent to selecting the nth row of
> json_values(value) (provided value is of type JSON).  Examples:
>
> SELECT ('[1,2,3,4]'::JSON)[3]
> -- Result is '3'::JSON
>
> SELECT ('{"a": 1, "b": 2, "c": 3, "d": 4}'::JSON)[3]
> -- Result is '3'::JSON

I think some kind of array deference and object deference mechanism is
absolutely, positively 100% required. I don't know whether the
particular syntax you've proposed here is best or whether we should
pick another syntax or just use function notation, but I think we
definitely need *something*.

I also think we need a function called something like json_length()
which returns the length of a list or the number of keys in an object.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Mike Rylander <mrylander(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-14 17:47:08
Message-ID: AANLkTikdYc_L1QhFh33JTsNqjAtNnsPBUY0pdRjK2JaG@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 14, 2010 at 1:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, May 13, 2010 at 9:47 PM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:

[snip]

>> == array/object conversion ==
>>
>> The json_object function converts a tuple to a JSON object.  If there
>> are duplicate column names, there will be duplicate keys in the
>> resulting JSON object.
>>
>> json_object([content [AS name] [, ...]]) returns json
>
> Seems good.
>
>> Likewise, the json_array function converts a tuple to a JSON array.
>> Column names are ignored.
>>
>> json_array([content [AS name] [, ...]]) returns json
>
> I think this is pointless and should be omitted.
>

(I'm going to avoid the use of the term "object" here to reduce confusion.)

I disagree with the assertion that it's pointless, and I have a
specific use-case in mind for this function. I have a system that
uses JSON arrays on the wire to encapsulate data, and both ends
understand the positional semantics of the elements. Using JSON
arrays instead of JSON objects reduces the transfer size by 40-80%,
depending on how "full" the rows (or class instances) are and the data
types of the elements, simply by removing the redundant object keys.
This function would be extremely useful to me when creating or
persisting raw class instances of these sorts.

--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker(at)esilibrary(dot)com
| web: http://www.esilibrary.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-14 19:09:55
Message-ID: 20885.1273864195@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, May 13, 2010 at 9:47 PM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> [] retrieves a value of a JSON array/object by (one-based) index. In
>> other words, value[n] is equivalent to selecting the nth row of
>> json_values(value) (provided value is of type JSON). Examples:
>>
>> SELECT ('[1,2,3,4]'::JSON)[3]
>> -- Result is '3'::JSON
>>
>> SELECT ('{"a": 1, "b": 2, "c": 3, "d": 4}'::JSON)[3]
>> -- Result is '3'::JSON

> I think some kind of array deference and object deference mechanism is
> absolutely, positively 100% required. I don't know whether the
> particular syntax you've proposed here is best or whether we should
> pick another syntax or just use function notation, but I think we
> definitely need *something*.

Trying to use array notation on something that isn't a SQL array type
is guaranteed to be a mess. I strongly recommend that you not attempt
that. Just define a function for it.

regards, tom lane


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-15 02:35:21
Message-ID: AANLkTim_fGWOdsj89rvxH3eA7extiK_neNQvJfDDMUbR@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 14, 2010 at 11:33 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Joseph Adams wrote:
>> == array/object conversion ==
>>
>> The json_object function converts a tuple to a JSON object.  If there
>> are duplicate column names, there will be duplicate keys in the
>> resulting JSON object.
>>
>> json_object([content [AS name] [, ...]]) returns json
>>
>> Likewise, the json_array function converts a tuple to a JSON array.
>> Column names are ignored.
>>
>> json_array([content [AS name] [, ...]]) returns json
>
> Do you see any problems with the fact that JSON arrays can use mixed
> data types, e.g.:
>
>        [ 1, 2, 'hi', false]

I suppose the json_object and json_array functions would determine
which JSON types to employ by looking at the types of arguments given
(TEXT values would become strings, INT/FLOAT/NUMERIC/etc. values would
become numbers, TRUE/FALSE would become true/false, NULLS would just
be null, and JSON values would just be inserted as themselves). Note
that json_array('"Hello"'::TEXT) would yield '["\"Hello\""]'::JSON,
while json_array('"Hello"'::JSON) would yield '["Hello"]' .

Going the other way around, values pulled out of JSON objects and
arrays would just be of type JSON. This (revised) function signature
says it all:

json_values(JSON) returns JSON[]

In short, I don't believe mixed data types in arrays will be a
problem. json_to_* and *_to_json functions would be used for
individual conversions.

On Fri, May 14, 2010 at 1:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> json_keys gets the keys of a JSON object as a set.
>>
>> json_keys(json) returns setof text
>
> I would tend to make this return text[] rather than SETOF text.
>
>> json_values gets the values of a JSON object or the iems of a JSON
>> array as a set.
>>
>> json_values(json) returns setof json
>
> Similarly I would make this return json[].

Agreed. For those who want sets, the unnest() function can be used.

>> -> retrieves an item of a JSON object by key.
>> [snip]
>> [] retrieves a value of a JSON array/object by (one-based) index.
>> [snip]
>
> I think some kind of array deference and object deference mechanism is
> absolutely, positively 100% required.  I don't know whether the
> particular syntax you've proposed here is best or whether we should
> pick another syntax or just use function notation, but I think we
> definitely need *something*.

If the dereferencing operations aren't available, one could work
around it by using json_keys/json_values. Of course, it would be a
really clunky solution, and implementing -> will probably be easy
compared to implementing those functions.

> I also think we need a function called something like json_length()
> which returns the length of a list or the number of keys in an object.

Definitely.

By the way, I'm considering making it so JSON arrays will be treated
like objects when it comes to -> and the json_keys function. Thus,
json_keys('[1,4,9,16,25]') would yield '{1,2,3,4,5}', and
('[1,4,9,16,25]'::JSON) -> 3 would yield the third item. This would
obviate the need for an array-only subscript function/operator.

In general, I prefer zero-based counting, but because PostgreSQL's
array indexing is one-based, one-based array keys would be better for
the sake of consistency. Note that if there was a function like this
in the future:

-- Access a JSON object like you would in JavaScript
json_path('{"squares": [1,4,9,16,25]}', '.squares[2]')

There could be confusion, as JavaScript uses zero-based indexing.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-15 03:08:36
Message-ID: AANLkTilddD_hjKxBh1MPFF3LIijnjzo6MA7Ebinw_vll@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, May 14, 2010 at 10:35 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> By the way, I'm considering making it so JSON arrays will be treated
> like objects when it comes to -> and the json_keys function.  Thus,
> json_keys('[1,4,9,16,25]') would yield '{1,2,3,4,5}', and
> ('[1,4,9,16,25]'::JSON) -> 3 would yield the third item.  This would
> obviate the need for an array-only subscript function/operator.
>
> In general, I prefer zero-based counting, but because PostgreSQL's
> array indexing is one-based, one-based array keys would be better for
> the sake of consistency.  Note that if there was a function like this
> in the future:
>
> -- Access a JSON object like you would in JavaScript
> json_path('{"squares": [1,4,9,16,25]}', '.squares[2]')
>
> There could be confusion, as JavaScript uses zero-based indexing.

I think you should take Tom's suggestion and use functional notation
rather than operator notation. And then I think you should use
0-based counting to match JS. But I'm game to be outvoted if others
disagree. Basically, I think you're right: it will be confusing to
have two different notations, and we're certainly going to want a JS
equivalent of XPath at some point.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-15 04:08:04
Message-ID: AANLkTinr8qbyGg6fFQT41qUjDbVzOS7Y34whXLsPas-T@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/5/14 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, May 13, 2010 at 9:47 PM, Joseph Adams
>> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>>> [] retrieves a value of a JSON array/object by (one-based) index.  In
>>> other words, value[n] is equivalent to selecting the nth row of
>>> json_values(value) (provided value is of type JSON).  Examples:
>>>
>>> SELECT ('[1,2,3,4]'::JSON)[3]
>>> -- Result is '3'::JSON
>>>
>>> SELECT ('{"a": 1, "b": 2, "c": 3, "d": 4}'::JSON)[3]
>>> -- Result is '3'::JSON
>
>> I think some kind of array deference and object deference mechanism is
>> absolutely, positively 100% required.  I don't know whether the
>> particular syntax you've proposed here is best or whether we should
>> pick another syntax or just use function notation, but I think we
>> definitely need *something*.
>
> Trying to use array notation on something that isn't a SQL array type
> is guaranteed to be a mess.  I strongly recommend that you not attempt
> that.  Just define a function for it.

I agree. Maybe you can implement cast to hstore datatype.

Regards

Pavel Stehule

>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-25 09:37:32
Message-ID: AANLkTinXCZuhLOUuiAfbcZHw_rwAor83ydUEsnV3hT6h@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I started a wiki article for brainstorming the JSON API:
http://wiki.postgresql.org/wiki/JSON_API_Brainstorm . I also made
substantial changes to the draft of the API based on discussion here
and on the #postgresql IRC channel.

Is it alright to use the wiki for brainstorming, or should it stay on
the mailing list or go somewhere else?

I'll try not to spend too much time quibbling over the specifics as I
tend to do. While the brainstorming is going on, I plan to start
implementing the datatype by itself so I can establish an initial
working codebase.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-25 10:57:12
Message-ID: AANLkTilzkJSgTj6LJ8hRvVnVhkHIAqcpgw2mntriWGGv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 25, 2010 at 5:37 AM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> I started a wiki article for brainstorming the JSON API:
> http://wiki.postgresql.org/wiki/JSON_API_Brainstorm .  I also made
> substantial changes to the draft of the API based on discussion here
> and on the #postgresql IRC channel.
>
> Is it alright to use the wiki for brainstorming, or should it stay on
> the mailing list or go somewhere else?

Well, I think it's fine to use the wiki for brainstorming, but before
you change the design you probably need to talk about it here. You
can't rely on everyone on -hackers to follow changes on a wiki page
somewhere. It looks like the API has been overhauled pretty heavily
since the last version we talked about here, and I'm not sure I
understand it.

> I'll try not to spend too much time quibbling over the specifics as I
> tend to do.  While the brainstorming is going on, I plan to start
> implementing the datatype by itself so I can establish an initial
> working codebase.

Sounds good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON manipulation functions
Date: 2010-05-25 12:38:11
Message-ID: AANLkTildQKF3XZx0g_O5_r7jppluF3V7tbYg0rXyUxIF@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 25, 2010 at 12:57, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, May 25, 2010 at 5:37 AM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> I started a wiki article for brainstorming the JSON API:
>> http://wiki.postgresql.org/wiki/JSON_API_Brainstorm .  I also made
>> substantial changes to the draft of the API based on discussion here
>> and on the #postgresql IRC channel.
>>
>> Is it alright to use the wiki for brainstorming, or should it stay on
>> the mailing list or go somewhere else?
>
> Well, I think it's fine to use the wiki for brainstorming, but before
> you change the design you probably need to talk about it here.  You
> can't rely on everyone on -hackers to follow changes on a wiki page
> somewhere.  It looks like the API has been overhauled pretty heavily
> since the last version we talked about here, and I'm not sure I
> understand it.

The general idea that most people have been using, and that I think is
correct, is to have the discussion here on the list, and then keep a
summary of the current state of it on the wiki page so it's easier for
someone entering the discussion to catch up on where it is.

>> I'll try not to spend too much time quibbling over the specifics as I
>> tend to do.  While the brainstorming is going on, I plan to start
>> implementing the datatype by itself so I can establish an initial
>> working codebase.
>
> Sounds good.

Agreed.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-25 14:52:30
Message-ID: AANLkTimV2VmW_z313k02drTYz1798tahbLNpoe59OO7z@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Well, I think it's fine to use the wiki for brainstorming, but before
> you change the design you probably need to talk about it here.  You
> can't rely on everyone on -hackers to follow changes on a wiki page
> somewhere.  It looks like the API has been overhauled pretty heavily
> since the last version we talked about here, and I'm not sure I
> understand it.

I'll try to explain it in one big nutshell:

Instead of, for instance, json_to_number('5') and number_to_json(5), I
propose changing it to from_json(5)::INT and to_json('5'). Note how
from_json simply returns TEXT containing the underlying value for the
user to cast. I plan to make calling to_json/from_json with arrays or
objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw
an error for now, as implementing all the specifics of this could be
quite distracting.

If I'm not mistaken, json_object([content [AS name] [, ...]] | *)
RETURNS json can't be implemented without augmenting the grammar (as
was done with xmlforest), so I considered making it take a RECORD
parameter like the hstore(RECORD) function does, as was suggested on
IRC. However, this may be inadequate for selecting some columns but
not others. Using examples from hstore:

SELECT hstore(foo) FROM foo; => '"e"=>"2.71828", "pi"=>"3.14159"'
-- this works, but what if we only want one field?

SELECT hstore(pi) FROM foo;
-- function type error

SELECT hstore(row(pi)) FROM foo; => '"f1"=>"3.14159"'
-- field name is lost

SELECT hstore(bar) FROM (select pi FROM foo) AS bar; => '"f1"=>"3.14159"'
-- ugly, and field name is *still* lost

To get (and set, which I overlooked before), use json_get and
json_set. These take "JSONPath" expressions, but I don't plan to
implement all sorts of fancy features during the summer. However, I
do plan to support some kind of parameter substitution so you can do
this:

json_get('[0,1,4,9,16,25]', '[%]' %% 2) => '4'::TEXT

For this use case, though, it would be simpler to say:

'[0,1,4,9,16,25]'::JSON -> 2


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, magnus(at)hagander(dot)net
Subject: Re: JSON manipulation functions
Date: 2010-05-25 16:49:38
Message-ID: AANLkTin2OhgkZXztJGrxJYw5oFjoFd4A_UcgBDto-UNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, May 25, 2010 at 10:52 AM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> Well, I think it's fine to use the wiki for brainstorming, but before
>> you change the design you probably need to talk about it here.  You
>> can't rely on everyone on -hackers to follow changes on a wiki page
>> somewhere.  It looks like the API has been overhauled pretty heavily
>> since the last version we talked about here, and I'm not sure I
>> understand it.
>
> I'll try to explain it in one big nutshell:
>
> Instead of, for instance, json_to_number('5') and number_to_json(5), I
> propose changing it to from_json(5)::INT and to_json('5').  Note how
> from_json simply returns TEXT containing the underlying value for the
> user to cast.  I plan to make calling to_json/from_json with arrays or
> objects (e.g. to_json(ARRAY[1,2,3]) and from_json('[1,2,3]') ) throw
> an error for now, as implementing all the specifics of this could be
> quite distracting.

I don't see how that's an improvement over the previous design. It
seems like it adds a lot of extra casting and removes useful list
operations without any corresponding advantage.

> If I'm not mistaken, json_object([content [AS name] [, ...]] | *)
> RETURNS json can't be implemented without augmenting the grammar (as
> was done with xmlforest), so I considered making it take a RECORD
> parameter like the hstore(RECORD) function does, as was suggested on
> IRC.  However, this may be inadequate for selecting some columns but
> not others.  Using examples from hstore:
>
> SELECT hstore(foo) FROM foo;  => '"e"=>"2.71828", "pi"=>"3.14159"'
> -- this works, but what if we only want one field?
>
> SELECT hstore(pi) FROM foo;
> -- function type error
>
> SELECT hstore(row(pi)) FROM foo;  => '"f1"=>"3.14159"'
> -- field name is lost
>
> SELECT hstore(bar) FROM (select pi FROM foo) AS bar;  => '"f1"=>"3.14159"'
> -- ugly, and field name is *still* lost

Yeah. I'm not sure what to do about this problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON manipulation functions
Date: 2010-05-27 18:39:37
Message-ID: AANLkTik4vcQEkhJvLHzITKmMe--K8TWGU9ixW0agIGtJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've started implementing the JSON datatype; the repo is at
http://git.postgresql.org/gitweb?p=json-datatype.git .

On Fri, May 14, 2010 at 1:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, May 13, 2010 at 9:47 PM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> Would it be a bad idea to give an enum and a function the same name
>> (which appears to be allowed by PostgreSQL) ?  If so, json_type(json)
>> could be json_typeof(json) or something instead.
>
> No, I think that's a fine idea.

I tried making a function named json_type that has the same name as
the type json_type. However, this doesn't work as expected:

SELECT json_type('[1,2,3]');

Instead of calling json_type with '[1,2,3]' casted to JSON, it's
trying to cast '[1,2,3]' to json_type. Is there a way to override
this behavior, or would I be better off renaming the function?

Note that if the function were renamed, the literal would implicitly be json:

SELECT json_typeof('[1,2,3]'); -- works

I tried this:

CREATE OR REPLACE FUNCTION json_type(json)
RETURNS json_type
AS 'MODULE_PATHNAME','json_get_type'
LANGUAGE C STRICT IMMUTABLE;

CREATE CAST (json AS json_type) WITH FUNCTION json_type(json);

However, json_type('[1,2,3]') still doesn't work (it doesn't infer
that '[1,2,3]' should be casted to json first). I also tried each of
AS ASSIGNMENT and AS IMPLICIT as well.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON manipulation functions
Date: 2010-05-27 18:44:38
Message-ID: AANLkTim5eTBvsJPMevAg_mTQ0eHI16kaHYJ2U9I87C13@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 27, 2010 at 2:39 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> I've started implementing the JSON datatype; the repo is at
> http://git.postgresql.org/gitweb?p=json-datatype.git .
>
> On Fri, May 14, 2010 at 1:15 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, May 13, 2010 at 9:47 PM, Joseph Adams
>> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>>> Would it be a bad idea to give an enum and a function the same name
>>> (which appears to be allowed by PostgreSQL) ?  If so, json_type(json)
>>> could be json_typeof(json) or something instead.
>>
>> No, I think that's a fine idea.
>
> I tried making a function named json_type that has the same name as
> the type json_type.  However, this doesn't work as expected:
>
> SELECT json_type('[1,2,3]');
>
> Instead of calling json_type with '[1,2,3]' casted to JSON, it's
> trying to cast '[1,2,3]' to json_type.  Is there a way to override
> this behavior, or would I be better off renaming the function?

Well, you should rename either the function or the type, I guess. Not
sure which. Calling it json_typeof would be reasonable...

> Note that if the function were renamed, the literal would implicitly be json:
>
> SELECT json_typeof('[1,2,3]'); -- works
>
> I tried this:
>
> CREATE OR REPLACE FUNCTION json_type(json)
> RETURNS json_type
> AS 'MODULE_PATHNAME','json_get_type'
> LANGUAGE C STRICT IMMUTABLE;
>
> CREATE CAST (json AS json_type) WITH FUNCTION json_type(json);
>
> However, json_type('[1,2,3]') still doesn't work (it doesn't infer
> that '[1,2,3]' should be casted to json first).  I also tried each of
> AS ASSIGNMENT and AS IMPLICIT as well.

Yeah, I don't think you want to go that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON manipulation functions
Date: 2010-05-27 19:35:07
Message-ID: 6547.1274988907@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> writes:
> I tried making a function named json_type that has the same name as
> the type json_type. However, this doesn't work as expected:

> SELECT json_type('[1,2,3]');

> Instead of calling json_type with '[1,2,3]' casted to JSON, it's
> trying to cast '[1,2,3]' to json_type. Is there a way to override
> this behavior, or would I be better off renaming the function?

Well, that might not be the behavior you expected, but that doesn't
make it wrong. The above is, by convention, equivalent to
'[1,2,3]'::json_type, so it's acting as per convention.

If the function is a cast function (which it is), it *should* be named
after the destination type. Doing anything else will violate numerous
longstanding expectations. You might want to read the comments about
function-calls-as-casts in func_get_detail().

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON manipulation functions
Date: 2010-05-27 19:37:42
Message-ID: AANLkTil_8o50OQUpMvM3GddN_V6zbdhNQmBMncViuOJK@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 27, 2010 at 3:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> If the function is a cast function (which it is),

I don't think it is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON manipulation functions
Date: 2010-05-27 19:50:33
Message-ID: 6852.1274989833@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, May 27, 2010 at 3:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> If the function is a cast function (which it is),

> I don't think it is.

It certainly is --- he was actually declaring a cast with it in his
example, no?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON manipulation functions
Date: 2010-05-27 19:55:24
Message-ID: AANLkTilklxKdfghcriDZdpCepScDiqAPlITe_4f0EgMH@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, May 27, 2010 at 3:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, May 27, 2010 at 3:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> If the function is a cast function (which it is),
>
>> I don't think it is.
>
> It certainly is --- he was actually declaring a cast with it in his
> example, no?

That was an attempt at a workaround to get it to do what he wanted.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON manipulation functions
Date: 2010-05-27 19:56:59
Message-ID: 6989.1274990219@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, May 27, 2010 at 3:50 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> It certainly is --- he was actually declaring a cast with it in his
>> example, no?

> That was an attempt at a workaround to get it to do what he wanted.

Oh. If you don't want to think of it as being a cast-like operation,
then naming it after the result type is probably the wrong thing anyway.

regards, tom lane