Re: Basic JSON support

Lists: pgsql-hackers
From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Basic JSON support
Date: 2010-09-15 02:32:32
Message-ID: AANLkTina4SNybVwv9xpN16wjBCSy9PjnEQqU96t6gq5J@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is a patch for basic JSON support. It adds only those features:
* Add "json" data type, that is binary-compatible with text.
* Syntax checking on text to JSON conversion.
* json_pretty() -- print JSON tree with indentation.

We have "JSON datatype (WIP) 01" item:
https://commitfest.postgresql.org/action/patch_view?id=351
but it seems too complex for me to apply all of the feature
at once, especially JSON-Path support. So, I'd like to submit
only basic and minimal JSON datatype support at first.

The most different point of my patch is that JSON parser is
reimplemented with flex and bison to make maintenance easier.

Note that the JSON parser accept top-level scalar values
intensionally, because of requirement when we add functions
to extract values from JSON array/object. For example,
CREATE FUNCTION json_to_array(json) RETURNS json[]
might return naked scalar values in the result array.

--
Itagaki Takahiro

Attachment Content-Type Size
basic_json-20100915.patch application/octet-stream 28.6 KB

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-15 16:45:43
Message-ID: E50B963C-C569-4EB2-92C8-4634EDDA5717@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sep 14, 2010, at 7:32 PM, Itagaki Takahiro wrote:

> Here is a patch for basic JSON support. It adds only those features:
> * Add "json" data type, that is binary-compatible with text.
> * Syntax checking on text to JSON conversion.
> * json_pretty() -- print JSON tree with indentation.
>
> We have "JSON datatype (WIP) 01" item:
> https://commitfest.postgresql.org/action/patch_view?id=351
> but it seems too complex for me to apply all of the feature
> at once, especially JSON-Path support. So, I'd like to submit
> only basic and minimal JSON datatype support at first.

So should this be added to the commitfest, or replace this one?

Best,

David


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-15 23:23:55
Message-ID: AANLkTimb5Lm6r3MLHbLj4o6Cg9qYXgzb=YhxOTbxfVNd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 16, 2010 at 1:45 AM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>> We have "JSON datatype (WIP) 01" item:
>>  https://commitfest.postgresql.org/action/patch_view?id=351
>> but it seems too complex for me to apply all of the feature
>> at once, especially JSON-Path support. So, I'd like to submit
>> only basic and minimal JSON datatype support at first.
>
> So should this be added to the commitfest, or replace this one?

I added my patch, but the previous one will be returned with feedback
if the author doesn't have a new version adjusted to existing comments.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-16 00:44:25
Message-ID: AANLkTimpx_A+6RnWDEV6znSxwZjL0Rx3fQ+9Rcz2V+0z@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 15, 2010 at 7:23 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Thu, Sep 16, 2010 at 1:45 AM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>>> We have "JSON datatype (WIP) 01" item:
>>>  https://commitfest.postgresql.org/action/patch_view?id=351
>>> but it seems too complex for me to apply all of the feature
>>> at once, especially JSON-Path support. So, I'd like to submit
>>> only basic and minimal JSON datatype support at first.
>>
>> So should this be added to the commitfest, or replace this one?
>
> I added my patch, but the previous one will be returned with feedback
> if the author doesn't have a new version adjusted to existing comments.

Did you extract this from his work, or is this completely independent?
I'm a bit disinclined to say we should just toss overboard all the
work that's already been done. Why did you write a new patch?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-16 01:53:43
Message-ID: AANLkTinJ36afkbajNjUcPLfeTDjMiuvnWn5aWV5WBGXS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Sep 16, 2010 at 9:44 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Did you extract this from his work, or is this completely independent?
>  I'm a bit disinclined to say we should just toss overboard all the
> work that's already been done.  Why did you write a new patch?

I read his patch and am inspired by the work, but the code is almost
rewritten. As my previous comment,
http://archives.postgresql.org/pgsql-hackers/2010-08/msg01773.php
I think it's not a good idea to develop JSON datatype based on the
complex node links. The point of my patch is to simplify it.

I'd like to encourage use of the simplified structure to implement
his other works, including JSONPath.

--
Itagaki Takahiro


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-16 14:16:10
Message-ID: AANLkTimn0OJSWjikKwAkGtoma0tGJ76DYwitnyd3-LtL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 15, 2010 at 9:53 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> On Thu, Sep 16, 2010 at 9:44 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Did you extract this from his work, or is this completely independent?
>>  I'm a bit disinclined to say we should just toss overboard all the
>> work that's already been done.  Why did you write a new patch?
>
> I read his patch and am inspired by the work, but the code is almost
> rewritten. As my previous comment,
>  http://archives.postgresql.org/pgsql-hackers/2010-08/msg01773.php
> I think it's not a good idea to develop JSON datatype based on the
> complex node links. The point of my patch is to simplify it.
>
> I'd like to encourage use of the simplified structure to implement
> his other works, including JSONPath.

I think that if we make a habit of rewriting the contributions of
first-time contributors in toto, we will have fewer second-time
contributors. I think it would have been a good idea to discuss this
on the list before you went and did it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-16 18:04:32
Message-ID: 4C925C30.8010409@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> I think that if we make a habit of rewriting the contributions of
> first-time contributors in toto, we will have fewer second-time
> contributors. I think it would have been a good idea to discuss this
> on the list before you went and did it.

To be fair to Itagaki-san, he DID ask about the status of the SOC
project, and didn't get much of an answer.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-20 04:38:21
Message-ID: AANLkTi=9qwA6kj5w6v2dXOY_uW1s+ZxyiKUh=7+jYqfk@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 14, 2010 at 10:32 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> Here is a patch for basic JSON support. It adds only those features:
>  * Add "json" data type, that is binary-compatible with text.
>  * Syntax checking on text to JSON conversion.
>  * json_pretty() -- print JSON tree with indentation.
>
> We have "JSON datatype (WIP) 01" item:
>  https://commitfest.postgresql.org/action/patch_view?id=351
> but it seems too complex for me to apply all of the feature
> at once, especially JSON-Path support. So, I'd like to submit
> only basic and minimal JSON datatype support at first.
>
> The most different point of my patch is that JSON parser is
> reimplemented with flex and bison to make maintenance easier.
>
> Note that the JSON parser accept top-level scalar values
> intensionally, because of requirement when we add functions
> to extract values from JSON array/object. For example,
>  CREATE FUNCTION json_to_array(json) RETURNS json[]
> might return naked scalar values in the result array.
>
> --
> Itagaki Takahiro
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

I have written a patch that amends the basic_json-20100915.patch . It
does the following:

* Fixes bugs in the lexer (namely, ' -0 '::JSON was not accepted, and
' """ '::JSON was accepted).
* Adds a json_validate() function.
* Adds test strings from original JSON patch along with some new ones.
* Tweaks the error message for invalid JSON input.
* Fixes the Makefile so json.c won't fail to build because
json_parser.h and json_scanner.h haven't been built yet.

In the lexer, the following regular expression for parsing a number
was incorrect:

0|-?([1-9][0-9]*(\.[0-9]+)?|0\.[0-9]+)([Ee][+-]?[0-9]+)?

This regex doesn't accept '-0', but that's a valid JSON number
according to the standard.

Also, the exclusive state <str> didn't have an <<EOF>> rule, causing
the lexer to treat all characters from an unclosed quote to EOF as
nothing at all.

A less important issue with the lexer is that it doesn't allow the
control character \x7F to appear in strings unescaped. The JSON RFC
doesn't mention \x7F as a control character, and many implementations
of JSON and JavaScript don't treat it as one either.

I went ahead and added json_validate() now because it's useful for
testing (my test strings use it).

I made the error message say "ERROR: invalid input syntax for JSON"
rather than the redundant "ERROR: syntax error in json: syntax
error". However, the specific parsing error (we haven't defined any
yet) is simply ignored by json_validate and company.

As for the Makefile, I think the reason you weren't getting build
problems but I was is because I normally build with make -j2 (run 2
processes at once), which tends to test makefile rule correctness.

Here's one thing I'm worried about: the bison/flex code in your patch
looks rather similar to the code in
http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the
GPL. In particular, the incorrect number regex I discussed above can
also be found in jsonval verbatim. However, because there are a lot
of differences in both the bison and flex code now, I'm not sure
they're close enough to be "copied", but I am not a lawyer. It might
be a good idea to contact Ben Spencer and ask him for permission to
license our modified version of the code under PostgreSQL's more
relaxed license, just to be on the safe side.

Finally, could you post a patch with your latest work (with or without
my contribution) so we can tinker with it more? Thanks!

Joey Adams

Attachment Content-Type Size
json_fixed_lexer-20100919.diff text/x-patch 41.3 KB

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-21 12:38:03
Message-ID: AANLkTimg7qEzVLjjREQuEewpiwWhbCAf8oBVemmWSTfr@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 20, 2010 at 1:38 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> I have written a patch that amends the basic_json-20100915.patch .

Thanks. I merged your patch and added json_to_array(), as a demonstration
of json_stringify(). As the current code, json_stringify(json) just returns
the input text as-is, but json_stringify(json, NULL) trims all of unnecessary
whitespaces. We could do it in json_in() and json_parse() and always store
values in compressed representation instead. We leave room for discussion.

I also merge json_test_strings.sql into the main test file.
I slimed some tests -- many test cases seem to be duplicated for me.

> I went ahead and added json_validate() now because it's useful for
> testing (my test strings use it).

Good idea, but how about calling it json_is_well_formed()? We have
similar name of functions for xml type. I renamed it in the patch.

> Here's one thing I'm worried about: the bison/flex code in your patch
> looks rather similar to the code in
> http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the
> GPL.  In particular, the incorrect number regex I discussed above can
> also be found in jsonval verbatim.  However, because there are a lot
> of differences in both the bison and flex code now,  I'm not sure
> they're close enough to be "copied", but I am not a lawyer.  It might
> be a good idea to contact Ben Spencer and ask him for permission to
> license our modified version of the code under PostgreSQL's more
> relaxed license, just to be on the safe side.

Sorry for my insincere manner. Surely I read his code.
Do you know his contact address? I cannot find it...

--
Itagaki Takahiro

Attachment Content-Type Size
basic_json-20100921.patch application/octet-stream 36.6 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-21 12:54:27
Message-ID: AANLkTik9-ztbO=m_2qgnT2QjneRDRD=eeLH5iMQfXQzf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 21, 2010 at 8:38 AM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> Sorry for my insincere manner. Surely I read his code.
> Do you know his contact address? I cannot find it...

It alarms me quite a bit that someone who is a committer on this
project would accidentally copy code from another project with a
different license into PostgreSQL. How does that happen? And how
much got copied, besides the regular expression? I would be inclined
to flush this patch altogether rather than take ANY risk of GPL
contamination.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-09-21 13:31:01
Message-ID: AANLkTimzLKJDqBj4jrPBAuJn4v=62q+Q81uE6r6XRH9s@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 21, 2010 at 9:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> It alarms me quite a bit that someone who is a committer on this
> project would accidentally copy code from another project with a
> different license into PostgreSQL.  How does that happen?  And how
> much got copied, besides the regular expression?  I would be inclined
> to flush this patch altogether rather than take ANY risk of GPL
> contamination.

Only regular expressions in the scanner. So I've thought it's OK,
but I should have been more careful. Sorry.

--
Itagaki Takahiro


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-10-04 18:50:43
Message-ID: AANLkTinGvV7DgoqBOggaMw-NZ_F0GAaCfWGPSHN+c4+7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Sep 20, 2010 at 12:38 AM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> Here's one thing I'm worried about: the bison/flex code in your patch
> looks rather similar to the code in
> http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the
> GPL.  In particular, the incorrect number regex I discussed above can
> also be found in jsonval verbatim.  However, because there are a lot
> of differences in both the bison and flex code now,  I'm not sure
> they're close enough to be "copied", but I am not a lawyer.  It might
> be a good idea to contact Ben Spencer and ask him for permission to
> license our modified version of the code under PostgreSQL's more
> relaxed license, just to be on the safe side.

With the help and motivation of David Fetter, I sent an e-mail to Ben
Spencer (google "jsonval", check out the git repo, and look in the git
log to find his e-mail address) a few minutes ago, asking if he would
license jsonval under a license compatible with PostgreSQL, and am
currently awaiting a response.

If he doesn't respond, or outright refuses (which I, for one, doubt
will happen), my fallback plan is to rewrite the JSON validation code
by drawing from my original code (meaning it won't be in bison/flex)
and post a patch for it. Unfortunately, it seems to me that there
aren't very many ways of expressing a JSON parser in bison/flex, and
thus the idea of JSON parsing with bison/flex is effectively locked
down by the GPL unless we can get a more permissive license for
jsonval. But, I am not a lawyer.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-10-04 19:05:51
Message-ID: AANLkTimQXE2OEJfeYd+1L12dHwrneUraBa+jNH=QZR3c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 4, 2010 at 2:50 PM, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> On Mon, Sep 20, 2010 at 12:38 AM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> Here's one thing I'm worried about: the bison/flex code in your patch
>> looks rather similar to the code in
>> http://www.jsonlint.com/bin/jsonval.tgz , which is licensed under the
>> GPL.  In particular, the incorrect number regex I discussed above can
>> also be found in jsonval verbatim.  However, because there are a lot
>> of differences in both the bison and flex code now,  I'm not sure
>> they're close enough to be "copied", but I am not a lawyer.  It might
>> be a good idea to contact Ben Spencer and ask him for permission to
>> license our modified version of the code under PostgreSQL's more
>> relaxed license, just to be on the safe side.
>
> With the help and motivation of David Fetter, I sent an e-mail to Ben
> Spencer (google "jsonval", check out the git repo, and look in the git
> log to find his e-mail address) a few minutes ago, asking if he would
> license jsonval under a license compatible with PostgreSQL, and am
> currently awaiting a response.
>
> If he doesn't respond, or outright refuses (which I, for one, doubt
> will happen), my fallback plan is to rewrite the JSON validation code
> by drawing from my original code (meaning it won't be in bison/flex)
> and post a patch for it.  Unfortunately, it seems to me that there
> aren't very many ways of expressing a JSON parser in bison/flex, and
> thus the idea of JSON parsing with bison/flex is effectively locked
> down by the GPL unless we can get a more permissive license for
> jsonval.  But, I am not a lawyer.

If someone who hasn't looked at the GPL code sits down and codes
something up based on the json.org home page, it's hard to imagine how
anyone could be grumpy about that. But we do need to be at some
points to make sure that anything we do is truly clean-room, because
we simply can't have GPL code creeping into our code base.

Thanks for continuing to pursue this!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-10-04 23:45:00
Message-ID: 10101.1286235900@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Oct 4, 2010 at 2:50 PM, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> If he doesn't respond, or outright refuses (which I, for one, doubt
>> will happen), my fallback plan is to rewrite the JSON validation code
>> by drawing from my original code (meaning it won't be in bison/flex)
>> and post a patch for it. Unfortunately, it seems to me that there
>> aren't very many ways of expressing a JSON parser in bison/flex, and
>> thus the idea of JSON parsing with bison/flex is effectively locked
>> down by the GPL unless we can get a more permissive license for
>> jsonval. But, I am not a lawyer.

> If someone who hasn't looked at the GPL code sits down and codes
> something up based on the json.org home page, it's hard to imagine how
> anyone could be grumpy about that.

Yeah. Joseph seems to be confusing copyrights with patents. The idea
of "parse JSON with bison/flex" is not patentable by any stretch of the
imagination.

But having said that, I wonder whether bison/flex are really the best
tool for the job in the first place. From what I understand of JSON
(which admittedly ain't much) a bison parser seems like overkill:
it'd probably be both bloated and slow compared to a simple handwritten
recursive-descent parser.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-10-05 00:00:09
Message-ID: 4CAA6A89.4080508@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All,

> But having said that, I wonder whether bison/flex are really the best
> tool for the job in the first place. From what I understand of JSON
> (which admittedly ain't much) a bison parser seems like overkill:
> it'd probably be both bloated and slow compared to a simple handwritten
> recursive-descent parser.

This appears not to be necessary. The author of JSONval has indicated
that, should we choose to include it in PostgreSQL 9.1, he is open to
re-licensing.

So on a completely *technical* basis, do we want to use JSONval?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-10-05 01:09:12
Message-ID: 4CAA7AB8.5050205@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/04/2010 08:00 PM, Josh Berkus wrote:
> All,
>
>> But having said that, I wonder whether bison/flex are really the best
>> tool for the job in the first place. From what I understand of JSON
>> (which admittedly ain't much) a bison parser seems like overkill:
>> it'd probably be both bloated and slow compared to a simple handwritten
>> recursive-descent parser.
> This appears not to be necessary. The author of JSONval has indicated
> that, should we choose to include it in PostgreSQL 9.1, he is open to
> re-licensing.
>
> So on a completely *technical* basis, do we want to use JSONval?

I agree with Tom that a hand-cut RD parser is much more likely to be the
way to go. We should not use bison/flex for parsing data values.

cheers

andrew


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, andrew(at)dunslane(dot)net, josh(at)agliodbs(dot)com, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-10-05 02:29:52
Message-ID: AANLkTi=S0Q1YxG5SuDqWZZjG9C-efLcDsQ1nXxwVM9QY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 4, 2010 at 7:45 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah.  Joseph seems to be confusing copyrights with patents.  The idea
> of "parse JSON with bison/flex" is not patentable by any stretch of the
> imagination.

What I meant is, anyone who sets out to write a JSON parser with
bison/flex is probably going to end up with something very similar to
jsonval even without looking at it.

> But having said that, I wonder whether bison/flex are really the best
> tool for the job in the first place.  From what I understand of JSON
> (which admittedly ain't much) a bison parser seems like overkill:
> it'd probably be both bloated and slow compared to a simple handwritten
> recursive-descent parser.

I agree, and also because JSON isn't a moving target. However, in my
opinion, the differences in quality between a bison parser and a
handwritten parser are to small to be a big deal right now.

A bison parser is probably slower and bigger than a simple
hand-written one, but I haven't benchmarked it, and it's probably not
worth benchmarking at this point.

In correctness, I suspect a bison parser and a hand-written one to be
about the same. Both pass the same army of testcases I built up while
writing my original JSON patch. Granted, there could be weird
semantics of bison/flex that make the parser wrong in subtle ways (for
instance, the . regex char doesn't match \n and EOF), but there can
also be weird errors in hand-written code. Pick your poison.

In safety, a handwritten recursive descent parser could stack overflow
and crash the server unless the proper guards are in place (e.g if
someone tries to encode '[[[[[[...' as JSON). bison guards against
this (chopping the maximum depth of JSON trees to 9997 levels or so),
but I don't know if the real stack is smaller than bison thinks it is,
or if bison uses its own buffer. A handwritten parser could also
accept trees of any depth using a manually managed stack, but it
wouldn't do much good because code that works with JSON trees (e.g.
json_stringify) is currently recursive and has the same problem. In
any case, I think limiting the depth of JSON, to something generous
like 1000, is the right way to go. The RFC allows us to do that.

In my opinion, the path of least resistance is the best path for now.
If we use the bison parser now, then we can replace it with a
hand-written one that's functionally equivalent later.

Joey Adams


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Basic JSON support
Date: 2010-10-05 05:12:38
Message-ID: AANLkTim8xoUvw7Q-muDSfvgYH1iXK5k=D66mp74mk9LO@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/10/5 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Mon, Oct 4, 2010 at 2:50 PM, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>>> If he doesn't respond, or outright refuses (which I, for one, doubt
>>> will happen), my fallback plan is to rewrite the JSON validation code
>>> by drawing from my original code (meaning it won't be in bison/flex)
>>> and post a patch for it.  Unfortunately, it seems to me that there
>>> aren't very many ways of expressing a JSON parser in bison/flex, and
>>> thus the idea of JSON parsing with bison/flex is effectively locked
>>> down by the GPL unless we can get a more permissive license for
>>> jsonval.  But, I am not a lawyer.
>
>> If someone who hasn't looked at the GPL code sits down and codes
>> something up based on the json.org home page, it's hard to imagine how
>> anyone could be grumpy about that.
>
> Yeah.  Joseph seems to be confusing copyrights with patents.  The idea
> of "parse JSON with bison/flex" is not patentable by any stretch of the
> imagination.
>
> But having said that, I wonder whether bison/flex are really the best
> tool for the job in the first place.  From what I understand of JSON
> (which admittedly ain't much) a bison parser seems like overkill:
> it'd probably be both bloated and slow compared to a simple handwritten
> recursive-descent parser.

PostgreSQL use a bison for same simple things - cube, PostGis,
bootstrap .. so I don't see some special overhead. I am thinking so
lex can be shared from core parser. bison parser for JSON means a less
rows for maintaining a repeating some a basic pattern in pg source
code.

Regards

Pavel Stehule

./contrib/seg/segparse.y
./contrib/cube/cubeparse.y
./src/backend/bootstrap/bootparse.y
./src/backend/parser/gram.y
./src/pl/plpgsql/src/gram.y
./src/interfaces/ecpg/preproc/preproc.y

>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>