Initial Review: JSON contrib modul was: Re: Another swing at JSON

From: Bernd Helmle <mailings(at)oopsware(dot)de>
To: Bernd Helmle <mailings(at)oopsware(dot)de>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, David Fetter <david(at)fetter(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Initial Review: JSON contrib modul was: Re: Another swing at JSON
Date: 2011-07-04 11:10:32
Message-ID: A6AFD432F5F6AAB84C0E7D1B@apophis.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

--On 18. Juni 2011 12:29:38 +0200 Bernd Helmle <mailings(at)oopsware(dot)de> wrote:

>> Similar problems occur with a couple other modules I tried (hstore,
>> intarray).
>
> Hmm, works for me. Seems you have messed up your installation in some way
> (build against current -HEAD but running against a 9.1?).
>
> I'm going to review in the next couple of days again.

A bit later than expected, but here is my summary on the patch:

-- Patch Status

The patch applies cleanly. Since it's primarily targeted as an addition to
contrib/, it doesn't touch the backend source tree (besides documentation TOC
entries), but adds a new subdirectory in contrib/json. The patch is in context
diff as requested.

-- Functionality

The patch as is provides a basic implementation for a JSON datatype. It
includes normalization and validation of JSON strings. It adds the datatype
implementation as an extension.

The implementation focus on the datatype functionality only, there is no
additional support for special operators. Thus, i think the comments in the
control file (and docs) promises actually more than the contrib module will
deliver:

+comment = 'data type for storing and manipulating JSON content'

I'm not sure, if "manipulating" is a correct description. Maybe i missed it,
but i didn't see functions to manipulate JSON strings directly, for example,
adding an object to a JSON string, ...

-- Coding

The JSON datatype is implemented as a variable length character type, thus
allows very large JSON strings to be stored. It transparently decides when to
escape unicode code points depending on the selected client- and
server-encoding.

Validation is done through its own JSON parser. The parser itself is a
recursive descent parser. It is noticable that the parser seems to define its
own is_space() and is_digit() functions, e.g.:

+#define is_space(c) ((c) == '\t' || (c) == '\n' || (c) == '\r' || (c) == ' ')

Wouldn't it be more clean to use isspace() instead?

This code block in function json_escape_unicode() looks suspicious:

+ /* Convert to UTF-8, if necessary. */
+ {
+ const char *orig = json;
+ json = (const char *)
+ pg_do_encoding_conversion((unsigned char *) json, length,
+ GetDatabaseEncoding(), PG_UTF8);
+ if (json != orig)
+ length = strlen(json);
+ }

Seems it *always* wants to convert the string. Isn't there a "if" condition
missing?

There's a commented code fragment missing, which should be removed (see last
two lines of the snippet below):

+static unsigned int
+read_hex16(const char *in)
+{
+ unsigned int i;
+ unsigned int tmp;
+ unsigned int ret = 0;
+
+ for (i = 0; i < 4; i++)
+ {
+ char c = *in++;
+
+ Assert(is_hex_digit(c));
+
+ if (c >= '0' && c <= '9')
+ tmp = c - '0';
+ else if (c >= 'A' && c <= 'F')
+ tmp = c - 'A' + 10;
+ else /* if (c >= 'a' && c <= 'f') */
+ tmp = c - 'a' + 10;

json.c introduces new appendStringInfo* functions, e.g.
appendStringInfoEscape() and appendStringInfoUtf8(). Maybe it's better
to name them to something different, since it may occur someday that the backend
will introduce the same notion with a different meaning. They are static,
but why not naming them into something like jsonAppendStringInfoUtf8() or
similar?

-- Regression Tests

The patch includes a fairly complete regression test suite, which covers much
of the formatting, validating and input functionality of the JSON datatype. It
also tests UNICODE sequences and input encoding with other server encoding than
UTF-8.

--
Thanks

Bernd

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2011-07-04 11:57:34 Re: Crash dumps
Previous Message Radosław Smogura 2011-07-04 11:03:38 Re: Crash dumps