Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-07-23 18:34:28
Message-ID: AANLkTimKqnq+x5jEsTN8yrUWA4zA+QXK6ms_K-djb6t1@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 23, 2010 at 2:18 AM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> This is a work-in-progress patch of my GSoC project: Add JSON datatype
> to PostgreSQL.  It provides the following:
>
>  * JSON datatype: A TEXT-like datatype for holding JSON-formatted
> text.  Although the JSON RFC decrees that a JSON text be an "object or
> array" (meaning '"hello"' is considered invalid JSON text), this
> datatype lets you store any JSON "value" (meaning '"hello"'::JSON is
> allowed).
>  * Validation: Content is validated when a JSON datum is constructed,
> but JSON validation can also be done programmatically with the
> json_validate() function.
>  * Conversion to/from JSON for basic types.  Conversion functions are
> needed because casting will not unwrap JSON-encoded values.  For
> instance, json('"string"')::text is '"string"', while
> from_json('"string"') is 'string'.  Also, to_json can convert
> PostgreSQL arrays to JSON arrays, providing a nice option for dealing
> with arrays client-side.  from_json currently can't handle JSON
> arrays/objects yet (how they should act is rather unclear to me,
> except when array dimensions and element type are consistent).
>  * Retrieving/setting values in a JSON node (via selectors very
> similar to, but not 100% like, JSONPath as described at
> http://goessner.net/articles/JsonPath/ ).
>  * Miscellaneous functions json_condense and json_type.
>
> This is a patch against CVS HEAD.  This module compiles, installs, and
> passes all 8 tests successfully on my Ubuntu 9.10 system.  It is
> covered pretty decently with regression tests.  It also has SGML
> documentation (the generated HTML is attached for convenience).
>
> Although I am aware of many problems in this patch, I'd like to put it
> out sooner rather than later so it can get plenty of peer review.
> Problems I'm aware of include:
>  * Probably won't work properly when the encoding (client or server?)
> is not UTF-8.  When encoding (e.g. with json_condense), it should (but
> doesn't) use \uXXXX escapes for characters the target encoding doesn't
> support.
>  * json.c is rather autarkic.  It has its own string buffer system
> (rather than using StringInfo) and UTF-8 validator (rather than using
> pg_verify_mbstr_len(?) ).
>  * Some functions/structures are named suggestively, as if they belong
> to (and would be nice to have in) PostgreSQL's utility libraries.
> They are:
>   - TypeInfo, initTypeInfo, and getTypeInfo: A less cumbersome
> wrapper around get_type_io_data.
>   - FN_EXTRA and FN_EXTRA_SZ: Macros to make working with
> fcinfo->flinfo->fn_extra easier.
>   - enumLabelToOid: Look up the Oid of an enum label; needed to
> return an enum that isn't built-in.
>   - utf8_substring: Extract a range of UTF-8 characters out of a UTF-8 string.
>  * Capitalization and function arrangement are rather inconsistent.
> Braces are K&R-style.
>  * json_cleanup and company aren't even used.
>  * The sql/json.sql test case should be broken into more files.
>
> P.S. The patch is gzipped because it expands to 2.6 megabytes.

Some technical points about the submission:

- If you want your coded to be included in PostgreSQL, you need to put
the same license and attribution on it that we use elsewhere.
Generally that looks sorta like this:

/*-------------------------------------------------------------------------
*
* tablecmds.c
* Commands for creating and altering table structures and settings
*
* Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
*
* IDENTIFICATION
* $PostgreSQL$
*
*-------------------------------------------------------------------------
*/

You should have this header on each file, both .c and .h.

- The reason why this patch is 2.6MB is because it has 2.4MB of tests.
I think you should probably pick out the most useful 10% or so, and
drop the rest.

- I was under the impression that we wanted EXPLAIN (FORMAT JSON) to
return type json, but that's obviously not going to be possible if all
of this is contrib. We could (a) move it all into core, (b) move the
type itself and its input and output functions into core and leave the
rest in contrib [or something along those lines], or (c) give up using
it as the return type for EXPLAIN (FORMAT JSON).

- You need to comply with the project coding standards. Thus:

static void
foo()
{
exit(1);
}

Not:

static void foo()
{
exit(1);
}

You should have at least one blank line between every pair of
functions. You should use uncuddled curly braces. Your commenting
style doesn't match the project standard. We prefer explicit NULL
tests over if (!foo). Basically, you should run pgindent on your
code, and also read this:

http://www.postgresql.org/docs/8.4/static/source.html

I don't think this is going to fly either:

/* Declare and initialize a String with the given name. */
#define String(name) String name = NewString()
#define NewString() {{NULL, 0, 0}}

That's just too much magic. We want people to be able to read this
code and easily tell what's going on... spelling a few things out
long hand is OK, good, even.

- You have boatloads of functions in here with no comments whose
functions is entirely non-self-evident. You may or may not need to
rename some of them, but you definitely need to write some comments.

- elog() must be used except for "can't happen" situations. Compare
the way numeric_in() throws an error message versus json_in().

- You have a number of very short convenience functions which don't
seem warranted. For example, build_array_string() is a 2-line
function that is called once. And all the string_append,
string_append_range, string_append_char stuff is duplicating
functionality we already have in appendStringInfoStr,
appendBinaryStringInfo, appendStringInfoChar.

In short, my first impression of this patch is that there's a lot of
good stuff in here, but you need to stop adding features NOW and put A
LOT of work into getting this into a form that the community can
accept. Otherwise, this is likely going to die on the vine, which
would be a shame because a lot of it seems pretty cool.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-07-23 18:47:19 Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Previous Message Dimitri Fontaine 2010-07-23 18:34:19 Re: patch: to_string, to_array functions