Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

Lists: pgsql-hackers
From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-07-23 06:18:46
Message-ID: AANLkTinqw2jXyS6v0vc_1PCd94L2NF2HvvRQJFU2j90V@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This is a work-in-progress patch of my GSoC project: Add JSON datatype
to PostgreSQL. It provides the following:

* JSON datatype: A TEXT-like datatype for holding JSON-formatted
text. Although the JSON RFC decrees that a JSON text be an "object or
array" (meaning '"hello"' is considered invalid JSON text), this
datatype lets you store any JSON "value" (meaning '"hello"'::JSON is
allowed).
* Validation: Content is validated when a JSON datum is constructed,
but JSON validation can also be done programmatically with the
json_validate() function.
* Conversion to/from JSON for basic types. Conversion functions are
needed because casting will not unwrap JSON-encoded values. For
instance, json('"string"')::text is '"string"', while
from_json('"string"') is 'string'. Also, to_json can convert
PostgreSQL arrays to JSON arrays, providing a nice option for dealing
with arrays client-side. from_json currently can't handle JSON
arrays/objects yet (how they should act is rather unclear to me,
except when array dimensions and element type are consistent).
* Retrieving/setting values in a JSON node (via selectors very
similar to, but not 100% like, JSONPath as described at
http://goessner.net/articles/JsonPath/ ).
* Miscellaneous functions json_condense and json_type.

This is a patch against CVS HEAD. This module compiles, installs, and
passes all 8 tests successfully on my Ubuntu 9.10 system. It is
covered pretty decently with regression tests. It also has SGML
documentation (the generated HTML is attached for convenience).

Although I am aware of many problems in this patch, I'd like to put it
out sooner rather than later so it can get plenty of peer review.
Problems I'm aware of include:
* Probably won't work properly when the encoding (client or server?)
is not UTF-8. When encoding (e.g. with json_condense), it should (but
doesn't) use \uXXXX escapes for characters the target encoding doesn't
support.
* json.c is rather autarkic. It has its own string buffer system
(rather than using StringInfo) and UTF-8 validator (rather than using
pg_verify_mbstr_len(?) ).
* Some functions/structures are named suggestively, as if they belong
to (and would be nice to have in) PostgreSQL's utility libraries.
They are:
- TypeInfo, initTypeInfo, and getTypeInfo: A less cumbersome
wrapper around get_type_io_data.
- FN_EXTRA and FN_EXTRA_SZ: Macros to make working with
fcinfo->flinfo->fn_extra easier.
- enumLabelToOid: Look up the Oid of an enum label; needed to
return an enum that isn't built-in.
- utf8_substring: Extract a range of UTF-8 characters out of a UTF-8 string.
* Capitalization and function arrangement are rather inconsistent.
Braces are K&R-style.
* json_cleanup and company aren't even used.
* The sql/json.sql test case should be broken into more files.

P.S. The patch is gzipped because it expands to 2.6 megabytes.

Joey Adams

Attachment Content-Type Size
json.html text/html 8.4 KB
json-datatype-wip-01.diff.gz application/x-gzip 61.0 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-07-23 18:34:28
Message-ID: AANLkTimKqnq+x5jEsTN8yrUWA4zA+QXK6ms_K-djb6t1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 23, 2010 at 2:18 AM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> This is a work-in-progress patch of my GSoC project: Add JSON datatype
> to PostgreSQL.  It provides the following:
>
>  * JSON datatype: A TEXT-like datatype for holding JSON-formatted
> text.  Although the JSON RFC decrees that a JSON text be an "object or
> array" (meaning '"hello"' is considered invalid JSON text), this
> datatype lets you store any JSON "value" (meaning '"hello"'::JSON is
> allowed).
>  * Validation: Content is validated when a JSON datum is constructed,
> but JSON validation can also be done programmatically with the
> json_validate() function.
>  * Conversion to/from JSON for basic types.  Conversion functions are
> needed because casting will not unwrap JSON-encoded values.  For
> instance, json('"string"')::text is '"string"', while
> from_json('"string"') is 'string'.  Also, to_json can convert
> PostgreSQL arrays to JSON arrays, providing a nice option for dealing
> with arrays client-side.  from_json currently can't handle JSON
> arrays/objects yet (how they should act is rather unclear to me,
> except when array dimensions and element type are consistent).
>  * Retrieving/setting values in a JSON node (via selectors very
> similar to, but not 100% like, JSONPath as described at
> http://goessner.net/articles/JsonPath/ ).
>  * Miscellaneous functions json_condense and json_type.
>
> This is a patch against CVS HEAD.  This module compiles, installs, and
> passes all 8 tests successfully on my Ubuntu 9.10 system.  It is
> covered pretty decently with regression tests.  It also has SGML
> documentation (the generated HTML is attached for convenience).
>
> Although I am aware of many problems in this patch, I'd like to put it
> out sooner rather than later so it can get plenty of peer review.
> Problems I'm aware of include:
>  * Probably won't work properly when the encoding (client or server?)
> is not UTF-8.  When encoding (e.g. with json_condense), it should (but
> doesn't) use \uXXXX escapes for characters the target encoding doesn't
> support.
>  * json.c is rather autarkic.  It has its own string buffer system
> (rather than using StringInfo) and UTF-8 validator (rather than using
> pg_verify_mbstr_len(?) ).
>  * Some functions/structures are named suggestively, as if they belong
> to (and would be nice to have in) PostgreSQL's utility libraries.
> They are:
>   - TypeInfo, initTypeInfo, and getTypeInfo: A less cumbersome
> wrapper around get_type_io_data.
>   - FN_EXTRA and FN_EXTRA_SZ: Macros to make working with
> fcinfo->flinfo->fn_extra easier.
>   - enumLabelToOid: Look up the Oid of an enum label; needed to
> return an enum that isn't built-in.
>   - utf8_substring: Extract a range of UTF-8 characters out of a UTF-8 string.
>  * Capitalization and function arrangement are rather inconsistent.
> Braces are K&R-style.
>  * json_cleanup and company aren't even used.
>  * The sql/json.sql test case should be broken into more files.
>
> P.S. The patch is gzipped because it expands to 2.6 megabytes.

Some technical points about the submission:

- If you want your coded to be included in PostgreSQL, you need to put
the same license and attribution on it that we use elsewhere.
Generally that looks sorta like this:

/*-------------------------------------------------------------------------
*
* tablecmds.c
* Commands for creating and altering table structures and settings
*
* Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
* $PostgreSQL$
*
*-------------------------------------------------------------------------
*/

You should have this header on each file, both .c and .h.

- The reason why this patch is 2.6MB is because it has 2.4MB of tests.
I think you should probably pick out the most useful 10% or so, and
drop the rest.

- I was under the impression that we wanted EXPLAIN (FORMAT JSON) to
return type json, but that's obviously not going to be possible if all
of this is contrib. We could (a) move it all into core, (b) move the
type itself and its input and output functions into core and leave the
rest in contrib [or something along those lines], or (c) give up using
it as the return type for EXPLAIN (FORMAT JSON).

- You need to comply with the project coding standards. Thus:

static void
foo()
{
exit(1);
}

Not:

static void foo()
{
exit(1);
}

You should have at least one blank line between every pair of
functions. You should use uncuddled curly braces. Your commenting
style doesn't match the project standard. We prefer explicit NULL
tests over if (!foo). Basically, you should run pgindent on your
code, and also read this:

http://www.postgresql.org/docs/8.4/static/source.html

I don't think this is going to fly either:

/* Declare and initialize a String with the given name. */
#define String(name) String name = NewString()
#define NewString() {{NULL, 0, 0}}

That's just too much magic. We want people to be able to read this
code and easily tell what's going on... spelling a few things out
long hand is OK, good, even.

- You have boatloads of functions in here with no comments whose
functions is entirely non-self-evident. You may or may not need to
rename some of them, but you definitely need to write some comments.

- elog() must be used except for "can't happen" situations. Compare
the way numeric_in() throws an error message versus json_in().

- You have a number of very short convenience functions which don't
seem warranted. For example, build_array_string() is a 2-line
function that is called once. And all the string_append,
string_append_range, string_append_char stuff is duplicating
functionality we already have in appendStringInfoStr,
appendBinaryStringInfo, appendStringInfoChar.

In short, my first impression of this patch is that there's a lot of
good stuff in here, but you need to stop adding features NOW and put A
LOT of work into getting this into a form that the community can
accept. Otherwise, this is likely going to die on the vine, which
would be a shame because a lot of it seems pretty cool.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-07-23 18:47:19
Message-ID: AANLkTi=V6SY_acc22cc5wOUga5wvvDHS7c6d6t63WqbH@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 23, 2010 at 2:34 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> - elog() must be used except for "can't happen" situations.  Compare
> the way numeric_in() throws an error message versus json_in().

Er... that should have said "elog() must NOT be used except for can't
happen situations".

Also, I was just looking at json_delete(). While the existing coding
there is kind of fun, I think this can be written more
straightforwardly by doing something like this (not tested):

while (1)
{
while (is_internal(node) && node->v.children.head)
node = node->v.children.head;
if (node->next)
next = node->next;
else if (node->parent)
next = node->parent;
else
break;
free_node(node);
node = next;
}

That gets rid of all of the gotos and one of the local variables and,
at least IMO, is easier to understand... though it would be even
better still if you also added a comment saying something like "We do
a depth-first, left-to-right traversal of the tree, freeing nodes as
we go. We need not bother clearing any of the pointers, because the
traversal order is such that we're never in danger of revisiting a
node we've already freed."

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-07-24 22:57:18
Message-ID: AANLkTi=L1DDPQuzp9_zTe=76gmcSpVUnV6+-v_ObYWnj@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Update: I'm in the middle of cleaning up the JSON code (
http://git.postgresql.org/gitweb?p=json-datatype.git;a=summary if you
want to see the very latest ), so I haven't addressed all of the major
problems with it yet.

On Fri, Jul 23, 2010 at 2:34 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> - I was under the impression that we wanted EXPLAIN (FORMAT JSON) to
> return type json, but that's obviously not going to be possible if all
> of this is contrib.  We could (a) move it all into core, (b) move the
> type itself and its input and output functions into core and leave the
> rest in contrib [or something along those lines], or (c) give up using
> it as the return type for EXPLAIN (FORMAT JSON).

I've been developing it as a contrib module because:
* I'd imagine it's easier than developing it as a built-in datatype
right away (e.g. editing a .sql.in file versus editing pg_type.h ).
* As a module, it has PGXS support, so people can try it out right
away rather than having to recompile PostgreSQL.

I, for one, think it would be great if the JSON datatype were all in
core :-) However, if and how much JSON code should go into core is up
for discussion. Thoughts, anyone?

A particularly useful aspect of the JSON support is the ability to
convert PostgreSQL arrays to JSON arrays (using to_json ), as there
currently isn't any streamlined way to parse arrays in the PostgreSQL
format client-side (that I know of).

Joey Adams


From: Andres Freund <andres(at)anarazel(dot)de>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-07-25 00:22:36
Message-ID: 20100725002235.GA1485@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jul 24, 2010 at 06:57:18PM -0400, Joseph Adams wrote:
> A particularly useful aspect of the JSON support is the ability to
> convert PostgreSQL arrays to JSON arrays (using to_json ), as there
> currently isn't any streamlined way to parse arrays in the PostgreSQL
> format client-side (that I know of).
I really would like to address the latter issue some day. I don't know
how many broken implementations I have seen in my, not that long, time
using pg, but it sure is 10+. I also knowingly have written dumbed
down versions.

Andres


From: "Massa, Harald Armin" <chef(at)ghum(dot)de>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-07-25 06:06:56
Message-ID: AANLkTim4_M4Xmk3oQ4eR-8f87=0p4pk4CvjZZtmuVh5y@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I, for one, think it would be great if the JSON datatype were all in
> core :-)  However, if and how much JSON code should go into core >is up for discussion.  Thoughts, anyone?
>
in my opinion: As soon as possible. Spinning PostgreSQL as the
Ajax-enabled-database has many great uses.

Harald

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Spielberger Straße 49
70435 Stuttgart
0173/9409607
no fx, no carrier pigeon
-
Using PostgreSQL is mostly about sleeping well at night.


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-10 08:03:43
Message-ID: AANLkTikfwBxBpBGnc0heTFjCmj-LCFY7VLAHD+BRzKvo@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Updated JSON datatype patch. It cleans up the major problems that
have been discussed, and it's very close to being commit-worthy (I
think). The major issues as I see them are:

* Contains several utility functions that may be useful in general.
They are all in util.c / util.h
* It's still a contrib module
* No json_agg or json_object functions for constructing arrays / objects

The utility functions and their potential to collide with themselves
in the future is the main problem with this patch. Of course, this
problem could be sidestepped simply by namespacifying them (prepending
json_ to all the function names). I would like some thoughts and
opinions about the design and usefulness of the utility code.

An overview, along with my thoughts, of the utility functions:

FN_EXTRA, FN_EXTRA_ALLOC, FN_MCXT macros
* Useful-ometer: ()--------------------o
* Rationale: Using fcinfo->flinfo->fn_extra takes a lot of
boilerplate. These macros help cut down the boilerplate, and the
comment explains what fn_extra is all about.

TypeInfo structure and getTypeInfo function
* Useful-ometer: ()---------------------------o
* Rationale: The get_type_io_data "six-fer" function is very
cumbersome to use, since one has to declare all the output variables.
The getTypeInfo puts the results in a structure. It also performs the
fmgr_info_cxt step, which is a step done after every usage of
get_type_io_data in the PostgreSQL code.

getEnumLabelOids
* Useful-ometer: ()-----------------------------------o
* Rationale: There is currently no streamlined way to return a custom
enum value from a PostgreSQL function written in C. This function
performs a batch lookup of enum OIDs, which can then be cached with
fn_extra. This should be reasonably efficient, and it's quite elegant
to use (see json_op.c for an example).

UTF-8 functions:
utf8_substring
utf8_decode_char
(there's a patch in the works for a utf8_to_unicode function
which does the same thing as this function)
utf8_validate (variant of pg_verify_mbstr(PG_UTF8, str, length, true)
that allows '\0' characters)
server_to_utf8
utf8_to_server
text_to_utf8_cstring
utf8_cstring_to_text
utf8_cstring_to_text_with_len
* Useful-ometer: ()-------o
* Rationale: The JSON code primarily operates in UTF-8 rather than
the server encoding because it needs to deal with Unicode escapes, and
there isn't an efficient way to encode/decode Unicode codepoints
to/from the server encoding. These functions make it easy to perform
encoding conversions needed for the JSON datatype. However, they're
not very useful when operating solely in the server encoding, hence
the low usefulometric reading.

As for the JSON datatype support itself, nobody has come out against
making JSON a core datatype rather than a contrib module, so I will
proceed with making it one. I guess this would involve adding entries
to pg_type.h and pg_proc.h . Where would I put the rest of the code?
I guess json_io.c and json_op.c (the PG_FUNCTION_ARGS functions) would
become json.c in src/backend/utils/adt . Where would json.c and
jsonpath.c (JSON encoding/decoding functions and JSONPath
implementation) go?

Are there any other issues with the JSON code I didn't spot?

Thanks,

Joey Adams

Attachment Content-Type Size
json-datatype-wip-02.diff application/octet-stream 173.4 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-10 15:32:23
Message-ID: 1281454126-sup-3094@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Joseph Adams's message of mar ago 10 04:03:43 -0400 2010:

> An overview, along with my thoughts, of the utility functions:
>
> FN_EXTRA, FN_EXTRA_ALLOC, FN_MCXT macros
> * Useful-ometer: ()--------------------o
>
> TypeInfo structure and getTypeInfo function
> * Useful-ometer: ()---------------------------o
>
> getEnumLabelOids
> * Useful-ometer: ()-----------------------------------o

I think this kind of thing could be stripped from the patch and
submitted separately; they would presumably see a quick review and
commit if they are small and useful (particularly if you can show a
decrease of code verbosity by switching other uses in the existing
code).

The advantage is you don't have to keep arguing for their usefulness in
the JSON patch; and if they turn out to be rejected, they won't cause
the JSON patch to be rejected as a whole.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-10 17:35:30
Message-ID: 3015.1281461730@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Excerpts from Joseph Adams's message of mar ago 10 04:03:43 -0400 2010:
>> An overview, along with my thoughts, of the utility functions:
>>
>> FN_EXTRA, FN_EXTRA_ALLOC, FN_MCXT macros
>> * Useful-ometer: ()--------------------o
>>
>> TypeInfo structure and getTypeInfo function
>> * Useful-ometer: ()---------------------------o
>>
>> getEnumLabelOids
>> * Useful-ometer: ()-----------------------------------o

> I think this kind of thing could be stripped from the patch and
> submitted separately;

+1. It's easier all around if a patch does just one thing. Code
refactoring and feature addition, in particular, are easier to review
separately.

regards, tom lane


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-11 08:24:45
Message-ID: 1281515085.2142.1486.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, 2010-07-24 at 18:57 -0400, Joseph Adams wrote:

> I've been developing it as a contrib module because:
> * I'd imagine it's easier than developing it as a built-in datatype
> right away (e.g. editing a .sql.in file versus editing pg_type.h ).
> * As a module, it has PGXS support, so people can try it out right
> away rather than having to recompile PostgreSQL.
>
> I, for one, think it would be great if the JSON datatype were all in
> core :-) However, if and how much JSON code should go into core is up
> for discussion. Thoughts, anyone?

As a GSoC piece of work, doing it as a contrib module gives an
immediately useful deliverable. Good plan.

Once that is available, we can then get some feedback on it and include
it as an in-core datatype later in the 9.1 cycle.

So lets do both: contrib and in-core.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Training and Services


From: David Fetter <david(at)fetter(dot)org>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-11 22:27:18
Message-ID: 20100811222718.GC1094@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jul 24, 2010 at 06:57:18PM -0400, Joseph Adams wrote:
> Update: I'm in the middle of cleaning up the JSON code (
> http://git.postgresql.org/gitweb?p=json-datatype.git;a=summary if you
> want to see the very latest ), so I haven't addressed all of the major
> problems with it yet.
>
> On Fri, Jul 23, 2010 at 2:34 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > - I was under the impression that we wanted EXPLAIN (FORMAT JSON) to
> > return type json, but that's obviously not going to be possible if all
> > of this is contrib.  We could (a) move it all into core, (b) move the
> > type itself and its input and output functions into core and leave the
> > rest in contrib [or something along those lines], or (c) give up using
> > it as the return type for EXPLAIN (FORMAT JSON).
>
> I've been developing it as a contrib module because:
> * I'd imagine it's easier than developing it as a built-in datatype
> right away (e.g. editing a .sql.in file versus editing pg_type.h ).
> * As a module, it has PGXS support, so people can try it out right
> away rather than having to recompile PostgreSQL.
>
> I, for one, think it would be great if the JSON datatype were all in
> core :-) However, if and how much JSON code should go into core is up
> for discussion. Thoughts, anyone?

+1 for putting it in core in 9.1 :)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-11 22:40:36
Message-ID: 1281566436.14004.2.camel@jd-desktop.unknown.charter.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2010-08-11 at 15:27 -0700, David Fetter wrote:
>
> > I've been developing it as a contrib module because:
> > * I'd imagine it's easier than developing it as a built-in datatype
> > right away (e.g. editing a .sql.in file versus editing pg_type.h ).
> > * As a module, it has PGXS support, so people can try it out right
> > away rather than having to recompile PostgreSQL.
> >
> > I, for one, think it would be great if the JSON datatype were all in
> > core :-) However, if and how much JSON code should go into core is up
> > for discussion. Thoughts, anyone?
>
> +1 for putting it in core in 9.1 :)

I would be curious to the benefit of putting it in core. I have no
problem with the type but in core?

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt


From: David Fetter <david(at)fetter(dot)org>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-11 23:33:37
Message-ID: 20100811233337.GA18374@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 11, 2010 at 03:40:36PM -0700, Joshua D. Drake wrote:
> On Wed, 2010-08-11 at 15:27 -0700, David Fetter wrote:
> >
> > > I've been developing it as a contrib module because:
> > > * I'd imagine it's easier than developing it as a built-in
> > > datatype right away (e.g. editing a .sql.in file versus editing
> > > pg_type.h ).
> > > * As a module, it has PGXS support, so people can try it out
> > > right away rather than having to recompile PostgreSQL.
> > >
> > > I, for one, think it would be great if the JSON datatype were
> > > all in core :-) However, if and how much JSON code should go
> > > into core is up for discussion. Thoughts, anyone?
> >
> > +1 for putting it in core in 9.1 :)
>
> I would be curious to the benefit of putting it in core. I have no
> problem with the type but in core?

If it's not in core, the vast majority of users will not have it
installed, and nothing, in core or otherwise, will be able to count on
it.

As this is really pretty green-field stuff, it's super unlikely to
break extant code. :)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: David Fetter <david(at)fetter(dot)org>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-11 23:39:37
Message-ID: 4C6334B9.3080001@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/11/2010 07:33 PM, David Fetter wrote:
>> I would be curious to the benefit of putting it in core. I have no
>> problem with the type but in core?
> If it's not in core, the vast majority of users will not have it
> installed, and nothing, in core or otherwise, will be able to count on
> it.
>
>

You could say that about almost any feature. PostgreSQL is designed to
be modular, and we can hardly credibly use that as an argument against
ourselves.

A convincing argument would be that there is another feature we want in
core that needs or at least could benefit from it.

cheers

andrew


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: David Fetter <david(at)fetter(dot)org>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-11 23:47:37
Message-ID: 201008112347.o7BNlbG02342@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
>
> On 08/11/2010 07:33 PM, David Fetter wrote:
> >> I would be curious to the benefit of putting it in core. I have no
> >> problem with the type but in core?
> > If it's not in core, the vast majority of users will not have it
> > installed, and nothing, in core or otherwise, will be able to count on
> > it.
> >
> >
>
>
> You could say that about almost any feature. PostgreSQL is designed to
> be modular, and we can hardly credibly use that as an argument against
> ourselves.
>
> A convincing argument would be that there is another feature we want in
> core that needs or at least could benefit from it.

I would say that JSON is no longer a niche data format, which would
suggest its inclusion in core.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-11 23:49:58
Message-ID: 20100811234958.GB18374@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 11, 2010 at 07:39:37PM -0400, Andrew Dunstan wrote:
> On 08/11/2010 07:33 PM, David Fetter wrote:
> >>I would be curious to the benefit of putting it in core. I have no
> >>problem with the type but in core?
> >If it's not in core, the vast majority of users will not have it
> >installed, and nothing, in core or otherwise, will be able to count on
> >it.
>
> You could say that about almost any feature. PostgreSQL is designed
> to be modular, and we can hardly credibly use that as an argument
> against ourselves.
>
> A convincing argument would be that there is another feature we want
> in core that needs or at least could benefit from it.

EXPLAIN (FORMAT JSON) would benefit right away, as its overall code
would be much more likely to be (and stay) correct.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-13 10:33:25
Message-ID: AANLkTimknK8xEKdn3Zjj7TH8RQLFwt=5mDk8F2O1zePJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Updated patch: the JSON code has all been moved into core, so this
patch is now for a built-in data type. However, I factored the
general-purpose utility functions out into a different patch to be
reviewed separately, so this JSON data type patch won't work without
the utility patch at
http://archives.postgresql.org/pgsql-hackers/2010-08/msg00949.php .

I still need to improve the documentation, as it still looks like the
documentation for a module instead of a built-in data type. Also, the
code is not taking advantage of the json_type_t enum being built-in
yet; it still uses getEnumLabelOids (one of the utility functions) to
retrieve the OIDs dynamically.

Joey Adams

Attachment Content-Type Size
json-datatype-wip-03.diff application/octet-stream 166.3 KB

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-25 05:34:56
Message-ID: AANLkTinaNPBGtcOdTnAxNwbLdXPkNh6KS4Dj-CaMLr5O@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi, I start to review JSON patch.

On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> Updated patch:  the JSON code has all been moved into core, so this
> patch is now for a built-in data type.

I think the patch can be split into two pieces:
1. Basic I/O support for JSON type (in/out/validate)
2. JSONPath support and functions for partial node management

It is better to submit only 1 at first. Of course we should consider
about JSONPath before deciding the internal representation of JSON,
but separated patches can be easily reviewed.

I have several questions about the spec and implementation.
Sorry if you have already discussed about some of them, but I cannot
understand why the current code is the best design from the patch...

* Should we accept a scalar value as a valid JSON?
According to RFC, the root element of JSON text must be an object
or array. But to_json() and from_json() accept scalar values.

* JSON to a scalar value by from_json()
How about to have json_to_array(JSON) instead of from_json()?
JSON value is always an array or object, it's nonsense to convert
it to a scalar value directly; to an array seems to match better.
In addition, an array can be indexed with GIN; index-able JSON
type is very attractive.

* struct JSON seems to be too complex for me.
Can we use List (pg_list.h) instead of linked-list? 'key' and 'key_length'
fields should be held in the parent's List. i.e, JSON_ARRAY has List of
JSON, and JSON_OBJECT has List of {string, JSON} pairs.

We could also discard 'parent' field. It might be needed by JSONPath,
but we can have parent information in variables on C-stack because we
search JSON trees from root to children, no?

I think we don't need 'orig' field because the original input text is
not so important in normal use cases. Instead, we could have formatter
function something like json_pretty(json) RETURNS text.

* On-disk format of JSON values
(There might be some discussions before... What is the conclusion?)
The current code stores the original input text, but we can use
some kinds of pre-parsed format to store JSON, like hstore.
It can be different from BSON.

* Completeness of JSONPath APIs
json_get() can be replaced with json_path(), no?
Also, we can replace existing nodes with json_set(), but we cannot
append new nodes. What do you think modification of JSON value?
If the design is too difficult, it'd be better only to have search
APIs at this moment. Modification APIs will be added in the future.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-25 14:18:43
Message-ID: AANLkTimJBTiwKFaf1OvyunMAFSwzGCyFXOfgu9oU3=YE@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 25, 2010 at 1:34 AM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> * Should we accept a scalar value as a valid JSON?
> According to RFC, the root element of JSON text must be an object
> or array. But to_json() and from_json() accept scalar values.

This seems a bit like the XML document/content distinction, which I've
never really understood. If [[1], false] is a legal JSON value, then
it seems odd that [1] should be legal but false not.

> * JSON to a scalar value by from_json()
> How about to have json_to_array(JSON) instead of from_json()?
> JSON value is always an array or object, it's nonsense to convert
> it to a scalar value directly; to an array seems to match better.
> In addition, an array can be indexed with GIN; index-able JSON
> type is very attractive.

Yeah, I don't like the excessive use of polymorphism either.

> * On-disk format of JSON values
> (There might be some discussions before... What is the conclusion?)
> The current code stores the original input text, but we can use
> some kinds of pre-parsed format to store JSON, like hstore.
> It can be different from BSON.

I see no value to choosing a different on-disk format. It might speed
up indexing, but I/O will be slower.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-26 06:29:08
Message-ID: AANLkTi=kfFtMJmktkTBSdpiqSGg265QD3QaKm+AHq63g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/8/25 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Wed, Aug 25, 2010 at 1:34 AM, Itagaki Takahiro
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> * Should we accept a scalar value as a valid JSON?
>> According to RFC, the root element of JSON text must be an object
>> or array. But to_json() and from_json() accept scalar values.
>
> This seems a bit like the XML document/content distinction, which I've
> never really understood.  If [[1], false] is a legal JSON value, then
> it seems odd that [1] should be legal but false not.

I want false to be parsed without error, just for convinience. JSON
specification seems a bit too strict. For example, it doesn't have
date value as its parts, which results in people implement their own
parsing rule for Date(long). And AFAIK the strictness of JSON parsing
is partly because the main platform was browser engines that can
eval() string that causes security issue. Without execution engine, we
can allow more formats than RFC.

>> * On-disk format of JSON values
>> (There might be some discussions before... What is the conclusion?)
>> The current code stores the original input text, but we can use
>> some kinds of pre-parsed format to store JSON, like hstore.
>> It can be different from BSON.
>
> I see no value to choosing a different on-disk format.  It might speed
> up indexing, but I/O will be slower.

It depends on use cases, but in my mind plain text will do for us. If
we have JavaScript engine in PostgreSQL like pl/v8 and it handles
on-disk format as-is, then we should choose the kind of format, but in
either text or binary format way it is hopeless to have such
compelling environment in the short future.

Regards,

--
Hitoshi Harada


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-26 09:53:36
Message-ID: 87iq2x66ov.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
> It depends on use cases, but in my mind plain text will do for us. If
> we have JavaScript engine in PostgreSQL like pl/v8 and it handles
> on-disk format as-is, then we should choose the kind of format, but in
> either text or binary format way it is hopeless to have such
> compelling environment in the short future.

Well, for javascript support, there's another nice thing happening:

- plscheme is built on GNU Guile
- next version of GNU Guile supports javascript too

http://plscheme.projects.postgresql.org/
http://wingolog.org/archives/2009/02/22/ecmascript-for-guile

So my current guess at which javascript engine we'd get first would be
plscheme. Now I don't know what implication that would have on the
binary storage format of javascript or json documents.

Regards,
--
dim


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-14 01:30:25
Message-ID: AANLkTikCE5kxTFxdHSHtVrX3SvMLhGfJjWMXhZgK9de8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Anyone working on JSON datatype?
If no, I'm going to submit simplified version of JSON datatype patch.

On Wed, Aug 25, 2010 at 2:34 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> Updated patch:  the JSON code has all been moved into core, so this
>> patch is now for a built-in data type.
>
> I think the patch can be split into two pieces:
>  1. Basic I/O support for JSON type (in/out/validate)
>  2. JSONPath support and functions for partial node management
>
> It is better to submit only 1 at first. Of course we should consider
> about JSONPath before deciding the internal representation of JSON,
> but separated patches can be easily reviewed.

--
Itagaki Takahiro


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-14 02:06:37
Message-ID: 4C8ED8AD.8080107@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/13/2010 09:30 PM, Itagaki Takahiro wrote:
> Hi,
>
> Anyone working on JSON datatype?
> If no, I'm going to submit simplified version of JSON datatype patch.
>

What's the state of the GSOC project?

cheers

andrew


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-15 20:59:45
Message-ID: AANLkTik4vJtB7s2UUYQvGcEiG9dnZonDHspB1zianTDv@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 14, 2010 at 04:06, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 09/13/2010 09:30 PM, Itagaki Takahiro wrote:
>>
>> Hi,
>>
>> Anyone working on JSON datatype?
>> If no, I'm going to submit simplified version of JSON datatype patch.
>>
>
> What's the state of the GSOC project?

Well, GSoC itself is over. But Joey said he'd continue to work on the
patch after that, to get it ready for commit. I've been unable to
confirm that with him for this commitfest though, so I don't know the
exact status.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-17 12:32:10
Message-ID: AANLkTiky4FivajKACtXtPbHk9vbToa+SpbxwpvYVeoch@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> Updated patch:  the JSON code has all been moved into core, so this
> patch is now for a built-in data type.

I have a question about the design of the JSON type. Why do we need to
store the value in UTF8 encoding? It's true the RFC of JSON says the
the encoding SHALL be encoded in Unicode, but I don't understand why
we should reject other encodings.

As I said before, I'd like to propose only 3 features in the commitfest:
* TYPE json data type
* text to json: FUNCTION json_parse(text)
* json to text: FUNCTION json_stringify(json, whitelist, space)

JSONPath will be re-implemented on the basic functionalities in the
subsequent commitfest. Do you have a plan to split your patch?
Or, can I continue to develop my patch? If so, JSONPath needs
to be adjusted to the new infrastructure.

I think json_parse() and json_stringify() is well-known APIs for JSON:
https://developer.mozilla.org/En/Using_JSON_in_Firefox
So, it'd be worth buying the names and signatures for our APIs.
(I'll rename json_pretty in my previous patch to json_stringify.)

--
Itagaki Takahiro


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-17 21:45:35
Message-ID: AANLkTi=2fhSXy5kKS8PEKkAC8G_dfuHGct7e=zaK6pFN@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 17, 2010 at 8:32 AM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> Updated patch: the JSON code has all been moved into core, so this
>> patch is now for a built-in data type.
>
> I have a question about the design of the JSON type. Why do we need to
> store the value in UTF8 encoding? It's true the RFC of JSON says the
> the encoding SHALL be encoded in Unicode, but I don't understand why
> we should reject other encodings.

Actually, the code in my original patch should work with any server
encoding in PostgreSQL. However, internally, it operates in UTF-8 and
converts to/from the server encoding when necessary. I did it this
way because the JSON code needs to handle Unicode escapes like
"\u266B", but there is no simple and efficient way (that I know of) to
convert single characters to/from the server encoding.

I noticed that in your new patch, you sidestepped the encoding issue
by simply storing strings in their encoded form (right?). This is
nice and simple, but in the future, JSON tree conversions and updates
will still need to deal with the encoding issue somehow.

> As I said before, I'd like to propose only 3 features in the commitfest:
> * TYPE json data type
> * text to json: FUNCTION json_parse(text)
> * json to text: FUNCTION json_stringify(json, whitelist, space)

Although casting from JSON to TEXT does "stringify" it in my original
patch, I think json_stringify would be much more useful. In addition
to the formatting options, if the internal format of the JSON type
changes and no longer preserves original formatting, then the behavior
of the following would change:

$$ "unnecessary\u0020escape" $$ :: JSON :: TEXT

json_stringify would be more predictable because it would re-encode
the whitespace (but not the \u0020, unless we went out of our way to
make it do that).

Also, json_parse is "unnecessary" if you allow casting from TEXT to
JSON (which my patch does), but I think having json_parse would be
more intuitive for the same reason you do.

Long story short: I like it :-) If you're keeping track, features
from my patch not in the new code yet are:
* Programmatically validating JSON ( json_validate() )
* Getting the type of a JSON value ( json_type() )
* Converting scalar values to/from JSON
* Converting arrays to JSON
* JSONPath

> JSONPath will be re-implemented on the basic functionalities in the
> subsequent commitfest. Do you have a plan to split your patch?
> Or, can I continue to develop my patch? If so, JSONPath needs
> to be adjusted to the new infrastructure.

I think your patch is on a better footing than mine, so maybe I should
start contributing to your code rather than the other way around.
Before the next commitfest, I could merge the testcases from my patch
in and identify parsing discrepancies (if any). Afterward, I could
help merge the other features into the new JSON infrastructure.

I can't compile your initial patch against the latest checkout because
json_parser.h and json_scanner.h are missing. Is there a more recent
patch, or could you update the patch so it compiles? I'd like to
start tinkering with the new code. Thanks!

Joey Adams


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-18 02:28:06
Message-ID: AANLkTi=GTXxhi26diLH+qQ-5O3CcW9tLtMU_FtMA4iBa@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 18, 2010 at 6:45 AM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> Why do we need to store the value in UTF8 encoding?
>
> because the JSON code needs to handle Unicode escapes like
> "\u266B", but there is no simple and efficient way (that I know of) to
> convert single characters to/from the server encoding.

Ah, we don't need UTF8 encoding only to store JSON data, but we should
care about Unicode escape when we support comparison and extracting
values from JSON, right? I see the worth encoding to UTF8.

One of my proposal is we don't have to keep the original input text.
We store JSON data in effective internal formats. If users want to get
human-readable output, they can use stringify() with indentation option.

> I think your patch is on a better footing than mine, so maybe I should
> start contributing to your code rather than the other way around.
> Before the next commitfest, I could merge the testcases from my patch
> in and identify parsing discrepancies (if any).  Afterward, I could
> help merge the other features into the new JSON infrastructure.

Thanks! I'll contribute my codes developed for another project
(PL/JavaScript), and let's merge our codes to the core.

> I can't compile your initial patch against the latest checkout because
> json_parser.h and json_scanner.h are missing.

Hmm, those files should be generated from .y and .l files. I'll check it.

--
Itagaki Takahiro


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-18 02:33:55
Message-ID: 201009180233.o8I2Xtx16727@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Itagaki Takahiro wrote:
> On Sat, Sep 18, 2010 at 6:45 AM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> >> Why do we need to store the value in UTF8 encoding?
> >
> > because the JSON code needs to handle Unicode escapes like
> > "\u266B", but there is no simple and efficient way (that I know of) to
> > convert single characters to/from the server encoding.
>
> Ah, we don't need UTF8 encoding only to store JSON data, but we should
> care about Unicode escape when we support comparison and extracting
> values from JSON, right? I see the worth encoding to UTF8.
>
> One of my proposal is we don't have to keep the original input text.
> We store JSON data in effective internal formats. If users want to get
> human-readable output, they can use stringify() with indentation option.
>
> > I think your patch is on a better footing than mine, so maybe I should
> > start contributing to your code rather than the other way around.
> > Before the next commitfest, I could merge the testcases from my patch
> > in and identify parsing discrepancies (if any). ?Afterward, I could
> > help merge the other features into the new JSON infrastructure.
>
> Thanks! I'll contribute my codes developed for another project
> (PL/JavaScript), and let's merge our codes to the core.
>
> > I can't compile your initial patch against the latest checkout because
> > json_parser.h and json_scanner.h are missing.
>
> Hmm, those files should be generated from .y and .l files. I'll check it.

I am please the two efforts can be joined. I like the idea of
PL/JavaScript too.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-18 02:46:38
Message-ID: AANLkTinaKZ5HSbctAqzmk8XcKBGztzsieQ-fi4vMVwc8@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 17, 2010 at 10:28 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> One of my proposal is we don't have to keep the original input text.
> We store JSON data in effective internal formats. If users want to get
> human-readable output, they can use stringify() with indentation option.

There's a trade-off here: this will make some things faster, but other
things slower. Probably some discussion of the pros and cons is in
order.

(Also, it's important not to break EXPLAIN (FORMAT JSON), which thinks
that the internal format of JSON is text.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-18 03:12:41
Message-ID: AANLkTindmwEcM=e-beqOnNi5wJ_KmP9nQ4vynx=ZsWqj@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 18, 2010 at 11:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>> One of my proposal is we don't have to keep the original input text.
>> We store JSON data in effective internal formats. If users want to get
>> human-readable output, they can use stringify() with indentation option.
>
> There's a trade-off here: this will make some things faster, but other
> things slower.  Probably some discussion of the pros and cons is in
> order.

I didn't intended to introduce non-text internal formats. The original
patch spent some codes to keep all of whitespaces as-is in the input.
But I'd say we can simplify it.

Except whitespaces, normalization of strings and numbers might be
problem when we support JSON comparison operators -- comparison of
Unicode escaped characters in strings or 0 vs. 0.0 in numbers.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-18 20:03:08
Message-ID: AANLkTindnPVaJ-ro4WXDGFAsS3jCwrnv2D=x=Cy3iYj=@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 17, 2010 at 11:12 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Sat, Sep 18, 2010 at 11:46 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> <itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>>> One of my proposal is we don't have to keep the original input text.
>>> We store JSON data in effective internal formats. If users want to get
>>> human-readable output, they can use stringify() with indentation option.
>>
>> There's a trade-off here: this will make some things faster, but other
>> things slower.  Probably some discussion of the pros and cons is in
>> order.
>
> I didn't intended to introduce non-text internal formats. The original
> patch  spent some codes to keep all of whitespaces as-is in the input.
> But I'd say we can simplify it.
>
> Except whitespaces, normalization of strings and numbers might be
> problem when we support JSON comparison operators -- comparison of
> Unicode escaped characters in strings or 0 vs. 0.0 in numbers.

Hmm, yeah. I'd be tempted to try to keep the user's original
whitespace as far as possible, but disregard it as far as equality
comparison goes. However, I'm not quite sure what the right thing to
do about 0 vs 0.0 is. Does the JSON spec say anything about that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-19 03:48:21
Message-ID: AANLkTikD2VTaZ1QvdOZQdD5iR-6fHTksvsi=g_6a8eS4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 18, 2010 at 4:03 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Hmm, yeah.  I'd be tempted to try to keep the user's original
> whitespace as far as possible, but disregard it as far as equality
> comparison goes.  However, I'm not quite sure what the right thing to
> do about 0 vs 0.0 is.  Does the JSON spec say anything about that?

I didn't find anything in the JSON spec about comparison, but in
JavaScript, 0 == 0.0 and 0 === 0.0 are both true. Also, JavaScript
considers two arrays or objects equal if and only if they are
references to the same object, meaning [1,2,3] == [1,2,3] is false,
but if you let var a = [1,2,3]; var b = a; , then a == b and a === b
are both true. Hence, JavaScript can help us when deciding how to
compare scalars, but as for arrays and objects, "we're on our own"
(actually, JSON arrays could be compared lexically like PostgreSQL
arrays already are; I don't think anyone would disagree with that).

I cast my vote for 0 == 0.0 being true.

As for whitespace preservation, I don't think we should go out of our
way to keep it intact. Sure, preserving formatting for input and
output makes some sense because we'd have to go out of our way to
normalize it, but preserving whitespace in JSONPath tree selections
(json_get) and updates (json_set) is a lot of work (that I shouldn't
have done), and it doesn't really help anybody. Consider json_get on
a node under 5 levels of indentation.

Another thing to think about is the possibility of using a non-text
format in the future (such as a binary format or even a format that is
internally indexed). A binary format would likely be faster to
compare (and hence faster to index). If the JSON data type preserves
whitespace today, it might break assumptions of future code when it
stops preserving whitespace. This should at least be documented.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-19 04:15:47
Message-ID: AANLkTikfOWsO=KeCxGqHxbFvGM-67H4aRGhKJ8q4ic=f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 18, 2010 at 11:48 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> As for whitespace preservation, I don't think we should go out of our
> way to keep it intact.  Sure, preserving formatting for input and
> output makes some sense because we'd have to go out of our way to
> normalize it, but preserving whitespace in JSONPath tree selections
> (json_get) and updates (json_set) is a lot of work (that I shouldn't
> have done), and it doesn't really help anybody.  Consider json_get on
> a node under 5 levels of indentation.

That seems reasonable to me. I don't mind messing up the whitespace
when someone pulls a value out using a jsonpath, but I don't think we
should mess with it when they just ask us to store a value. Users
will tend to format their JSON in a way that they find readable, and
while you'd need artificial intelligence to preserve their preferred
formatting in all cases, changing it to what we think is best when
there's no intrinsic necessity doesn't seem helpful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-09-19 09:11:23
Message-ID: 4C95D3BB.5070302@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08/12/2010 06:27 AM, David Fetter wrote:

> +1 for putting it in core in 9.1 :)

There are times I really wish I could get object graphs, or at least
hierarchically nested object trees, of objects matching various
criteria. JSON might be a reasonable representation, and one that's
increasingly well supported by many different clients. Having it core
would be really handy for that sort of use, especially as managing
contrib modules is rather far from ideal in Pg as things stand.

Using the usual generic business app example, it'd be good to be able to
retrieve "customer 1234, as well as all dependent records in Invoices
and Addresses" with one query, one round trip ... and no horrid ORM-like
use of left outer joins plus post-processing.

I've been able to do that with nested XML representations, but it's a
remarkably verbose, bulky way to move chunks of data around.

This patch already looks like it has lots of promise for that sort of
use. It'd need aggregates, but that's already come up. A
composite-or-row-type to json converter seems to be much of the point of
this patch, and that's the only other part that's really required. So
I'm excited, and I suspect I won't be the only one.

I'm grabbing it to start playing with it now. I just wanted to chime in
with interest + enthusiasm for JSON as an increasingly useful
representation.

--
Craig Ringer