Re: Basic JSON support

From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-20 04:38:21
Message-ID: AANLkTi=9qwA6kj5w6v2dXOY_uW1s+ZxyiKUh=7+jYqfk@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 14, 2010 at 10:32 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> Here is a patch for basic JSON support. It adds only those features:
>  * Add "json" data type, that is binary-compatible with text.
>  * Syntax checking on text to JSON conversion.
>  * json_pretty() -- print JSON tree with indentation.
>
> We have "JSON datatype (WIP) 01" item:
>  https://commitfest.postgresql.org/action/patch_view?id=351
> but it seems too complex for me to apply all of the feature
> at once, especially JSON-Path support. So, I'd like to submit
> only basic and minimal JSON datatype support at first.
>
> The most different point of my patch is that JSON parser is
> reimplemented with flex and bison to make maintenance easier.
>
> Note that the JSON parser accept top-level scalar values
> intensionally, because of requirement when we add functions
> to extract values from JSON array/object. For example,
>  CREATE FUNCTION json_to_array(json) RETURNS json[]
> might return naked scalar values in the result array.
>
> --
> Itagaki Takahiro
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

I have written a patch that amends the basic_json-20100915.patch . It
does the following:

* Fixes bugs in the lexer (namely, ' -0 '::JSON was not accepted, and
' """ '::JSON was accepted).
* Adds a json_validate() function.
* Adds test strings from original JSON patch along with some new ones.
* Tweaks the error message for invalid JSON input.
* Fixes the Makefile so json.c won't fail to build because
json_parser.h and json_scanner.h haven't been built yet.

In the lexer, the following regular expression for parsing a number
was incorrect:

0|-?([1-9][0-9]*(\.[0-9]+)?|0\.[0-9]+)([Ee][+-]?[0-9]+)?

This regex doesn't accept '-0', but that's a valid JSON number
according to the standard.

Also, the exclusive state <str> didn't have an <<EOF>> rule, causing
the lexer to treat all characters from an unclosed quote to EOF as
nothing at all.

A less important issue with the lexer is that it doesn't allow the
control character \x7F to appear in strings unescaped. The JSON RFC
doesn't mention \x7F as a control character, and many implementations
of JSON and JavaScript don't treat it as one either.

I went ahead and added json_validate() now because it's useful for
testing (my test strings use it).

I made the error message say "ERROR: invalid input syntax for JSON"
rather than the redundant "ERROR: syntax error in json: syntax
error". However, the specific parsing error (we haven't defined any
yet) is simply ignored by json_validate and company.

As for the Makefile, I think the reason you weren't getting build
problems but I was is because I normally build with make -j2 (run 2
processes at once), which tends to test makefile rule correctness.

Here's one thing I'm worried about: the bison/flex code in your patch
looks rather similar to the code in
http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the
GPL. In particular, the incorrect number regex I discussed above can
also be found in jsonval verbatim. However, because there are a lot
of differences in both the bison and flex code now, I'm not sure
they're close enough to be "copied", but I am not a lawyer. It might
be a good idea to contact Ben Spencer and ask him for permission to
license our modified version of the code under PostgreSQL's more
relaxed license, just to be on the safe side.

Finally, could you post a patch with your latest work (with or without
my contribution) so we can tinker with it more? Thanks!

Joey Adams

Attachment Content-Type Size
json_fixed_lexer-20100919.diff text/x-patch 41.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-09-20 04:53:39 pg_comments
Previous Message Matthew D. Fuller 2010-09-20 04:21:33 Re: compile/install of git