Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Add JSON datatype to PostgreSQL (GSoC, WIP)
Date: 2010-08-25 05:34:56
Message-ID: AANLkTinaNPBGtcOdTnAxNwbLdXPkNh6KS4Dj-CaMLr5O@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi, I start to review JSON patch.

On Fri, Aug 13, 2010 at 7:33 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> Updated patch:  the JSON code has all been moved into core, so this
> patch is now for a built-in data type.

I think the patch can be split into two pieces:
1. Basic I/O support for JSON type (in/out/validate)
2. JSONPath support and functions for partial node management

It is better to submit only 1 at first. Of course we should consider
about JSONPath before deciding the internal representation of JSON,
but separated patches can be easily reviewed.

I have several questions about the spec and implementation.
Sorry if you have already discussed about some of them, but I cannot
understand why the current code is the best design from the patch...

* Should we accept a scalar value as a valid JSON?
According to RFC, the root element of JSON text must be an object
or array. But to_json() and from_json() accept scalar values.

* JSON to a scalar value by from_json()
How about to have json_to_array(JSON) instead of from_json()?
JSON value is always an array or object, it's nonsense to convert
it to a scalar value directly; to an array seems to match better.
In addition, an array can be indexed with GIN; index-able JSON
type is very attractive.

* struct JSON seems to be too complex for me.
Can we use List (pg_list.h) instead of linked-list? 'key' and 'key_length'
fields should be held in the parent's List. i.e, JSON_ARRAY has List of
JSON, and JSON_OBJECT has List of {string, JSON} pairs.

We could also discard 'parent' field. It might be needed by JSONPath,
but we can have parent information in variables on C-stack because we
search JSON trees from root to children, no?

I think we don't need 'orig' field because the original input text is
not so important in normal use cases. Instead, we could have formatter
function something like json_pretty(json) RETURNS text.

* On-disk format of JSON values
(There might be some discussions before... What is the conclusion?)
The current code stores the original input text, but we can use
some kinds of pre-parsed format to store JSON, like hstore.
It can be different from BSON.

* Completeness of JSONPath APIs
json_get() can be replaced with json_path(), no?
Also, we can replace existing nodes with json_set(), but we cannot
append new nodes. What do you think modification of JSON value?
If the design is too difficult, it'd be better only to have search
APIs at this moment. Modification APIs will be added in the future.

--
Itagaki Takahiro

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message KaiGai Kohei 2010-08-25 05:38:30 Re: security label support, part.2
Previous Message Tom Lane 2010-08-25 05:11:27 Re: git: uh-oh