Proposal: Add JSON support

From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Proposal: Add JSON support
Date: 2010-03-28 20:48:33
Message-ID: e7e5fefd1003281348v6feb1730u7d43ccf011be6976@mail.gmail.com
Lists: pgsql-hackers

I introduced myself in the thread "Proposal: access control jails (and
introduction as aspiring GSoC student)", and we discussed jails and
session-local variables. But, as Robert Haas suggested, implementing
variable support in the backend would probably be way too ambitious a
project for a newbie like me. I decided instead to pursue the task of
adding JSON support to PostgreSQL, hence the new thread.

I plan to use the XML documentation (datatype-xml.html and
functions-xml.html) as a reference for some design decisions, but some
things that apply to XML don't apply to JSON and vice versa. For
instance, a jsoncomment function (the counterpart of xmlcomment)
wouldn't make sense, because standard JSON has no comments. For
access, we might have something like json_get('foo[1].bar') and
json_set('foo[1].bar', 'hello'). JSON counterparts of xmlforest and
xmlagg (jsonforest and jsonagg) would be very useful. For mapping,
jsonforest/jsonagg could be used to build a JSON string from a result
set (SELECT jsonagg(jsonforest(col1, col2, ...)) FROM tbl), but I'm
not sure of the best way to go in the other direction (generating a
result set from JSON). CSS-style selectors would be cool, but
"selecting" is what SQL is all about, and I'm not sure a
json_select('dom-element[key=value]') function is a good, orthogonal
approach.
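A rough C sketch of how a path such as 'foo[1].bar' might be split into steps before json_get/json_set walk the document. The path grammar (a leading key, then any mix of `.key` and `[n]` steps) and the names parse_json_path, PathStep, and StepKind are hypothetical; nothing in this proposal fixes them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* One step of a path such as foo[1].bar: either an object key or an array index. */
typedef enum { STEP_KEY, STEP_INDEX } StepKind;

typedef struct {
    StepKind kind;
    const char *key;     /* STEP_KEY: points into the path, NOT NUL-terminated */
    size_t key_len;
    long index;          /* STEP_INDEX: zero-based array index */
} PathStep;

/*
 * Split a path (hypothetical syntax: key ('.' key | '[' digits ']')*) into steps.
 * Returns the number of steps, or -1 on a syntax error or if more than
 * max_steps steps would be needed.
 */
int parse_json_path(const char *path, PathStep *steps, int max_steps)
{
    const char *p = path;
    int n = 0;
    int expect_key = 1;          /* a key is required at the start and after '.' */

    assert(path != NULL && steps != NULL);
    while (*p != '\0' || expect_key) {
        if (n >= max_steps)
            return -1;
        if (expect_key) {
            const char *start = p;
            while (*p != '\0' && strchr(".[]", *p) == NULL)
                p++;
            if (p == start)
                return -1;       /* empty key, e.g. "foo..bar" or "" */
            steps[n].kind = STEP_KEY;
            steps[n].key = start;
            steps[n].key_len = (size_t) (p - start);
            steps[n].index = 0;
            n++;
            expect_key = 0;
        } else if (*p == '.') {
            p++;
            expect_key = 1;
        } else if (*p == '[') {
            long idx = 0;
            if (!isdigit((unsigned char) *++p))
                return -1;
            while (isdigit((unsigned char) *p)) {
                if (idx > 100000000L)
                    return -1;   /* arbitrary cap that keeps idx * 10 + 9 in range */
                idx = idx * 10 + (*p++ - '0');
            }
            if (*p++ != ']')
                return -1;
            steps[n].kind = STEP_INDEX;
            steps[n].key = NULL;
            steps[n].key_len = 0;
            steps[n].index = idx;
            n++;
        } else {
            return -1;           /* e.g. a stray ']' */
        }
    }
    return n;
}
```

With this grammar, "foo[1].bar" yields three steps (key "foo", index 1, key "bar"), while "foo..bar" and "foo[x]" are rejected. Keys containing '.', '[' or ']' would need some quoting syntax, which this sketch leaves out.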

I'm wondering whether the internal representation of JSON should be
plain JSON text or some binary format that's easier to traverse. For
the sake of code size, keeping it as text is probably best.

Now, my thoughts on the JSON parsing/serializing itself:

It should be built-in, rather than relying on an external library
(like XML does). Priorities of the JSON implementation, in descending
order, are:

* Small
* Correct
* Fast

Moreover, JSON operations must not crash due to stack overflow, even
on deeply nested input.
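One way to meet the no-stack-overflow requirement is to avoid recursion entirely. The sketch below validates JSON text iteratively, keeping the open containers in a fixed-size array and rejecting input nested deeper than an assumed limit (JSON_MAX_DEPTH = 1000, chosen arbitrarily). All names are hypothetical, and it only checks syntax: it builds no tree and does not look for duplicate keys. Inside the backend, a recursive-descent parser that calls check_stack_depth() would be an alternative.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

#define JSON_MAX_DEPTH 1000   /* arbitrary nesting limit chosen for this sketch */

static const char *skip_ws(const char *p) { return p + strspn(p, " \t\n\r"); }

/* Advance past one or more digits; NULL if there are none. */
static const char *digits(const char *p) {
    if (!isdigit((unsigned char) *p)) return NULL;
    while (isdigit((unsigned char) *p)) p++;
    return p;
}

/* p is at an opening '"'; return a pointer just past the closing '"', or NULL. */
static const char *scan_string(const char *p) {
    for (p++; *p != '"'; p++) {
        if ((unsigned char) *p < 0x20) return NULL;   /* raw control char, or NUL */
        if (*p != '\\') continue;
        p++;                                          /* the escaped character */
        if (*p == 'u') {
            for (int i = 1; i <= 4; i++)
                if (!isxdigit((unsigned char) p[i])) return NULL;
            p += 4;
        } else if (*p == '\0' || strchr("\"\\/bfnrt", *p) == NULL) {
            return NULL;
        }
    }
    return p + 1;
}

/* Return a pointer just past a JSON number at p, or NULL. */
static const char *scan_number(const char *p) {
    if (*p == '-') p++;
    if (*p == '0') p++;                               /* no leading zeros */
    else if ((p = digits(p)) == NULL) return NULL;
    if (*p == '.' && (p = digits(p + 1)) == NULL) return NULL;
    if (*p == 'e' || *p == 'E') {
        p++;
        if (*p == '+' || *p == '-') p++;
        p = digits(p);
    }
    return p;
}

/* Scan `"key" :` and return a pointer to the member's value, or NULL. */
static const char *scan_key(const char *p) {
    if (*p != '"' || (p = scan_string(p)) == NULL) return NULL;
    p = skip_ws(p);
    return *p == ':' ? skip_ws(p + 1) : NULL;
}

/* Validate JSON text without recursion; open containers live in a fixed array. */
bool json_validate(const char *text) {
    char stack[JSON_MAX_DEPTH];          /* '{' or '[' for each open container */
    int depth = 0;
    const char *p;
    assert(text != NULL);
    p = skip_ws(text);
    for (;;) {
        if (*p == '{' || *p == '[') {    /* a value must start at p */
            char open = *p;
            if (depth == JSON_MAX_DEPTH) return false;  /* too deep: fail, don't crash */
            p = skip_ws(p + 1);
            if (*p != (open == '{' ? '}' : ']')) {
                stack[depth++] = open;
                if (open == '{' && (p = scan_key(p)) == NULL) return false;
                continue;                /* parse the first element or member value */
            }
            p++;                         /* empty container is a complete value */
        } else if (*p == '"') {
            p = scan_string(p);
        } else if (strncmp(p, "true", 4) == 0 || strncmp(p, "null", 4) == 0) {
            p += 4;
        } else if (strncmp(p, "false", 5) == 0) {
            p += 5;
        } else {
            p = scan_number(p);
        }
        if (p == NULL) return false;
        /* A value just ended: close containers, or continue after a comma. */
        for (;;) {
            p = skip_ws(p);
            if (depth == 0) return *p == '\0';          /* only whitespace may follow */
            if (*p == (stack[depth - 1] == '{' ? '}' : ']')) {
                depth--;
                p++;
                continue;
            }
            if (*p != ',') return false;
            p = skip_ws(p + 1);
            if (stack[depth - 1] == '{' && (p = scan_key(p)) == NULL) return false;
            break;                       /* parse the next value */
        }
    }
}
```

For example, json_validate() returns false for 2000 nested '[' characters instead of exhausting the C stack, and it also rejects trailing commas and numbers with leading zeros. It accepts any JSON value at the top level; that is a choice, since the RFC 4627 grammar only allows an object or array there.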

I think Bison/Flex is overkill for parsing JSON (I haven't seen any
JSON implementations that use them anyway). I would probably end up
writing the JSON parser/serializer by hand. It should not take more
than a week.
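On the serializing side, most of the care goes into quoting strings. A minimal sketch, using the hypothetical helper name json_quote and assuming UTF-8 input whose non-ASCII bytes can be copied through unchanged: it escapes the double quote, the backslash, and the control characters that JSON strings may not contain raw, using the short escapes JSON defines and \u00XX for the rest.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly malloc'd JSON string literal, including the surrounding
 * quotes, for the NUL-terminated input, or NULL if allocation fails.
 * Bytes >= 0x80 are copied unchanged (the input is assumed to be UTF-8).
 */
char *json_quote(const char *in)
{
    size_t len;
    char *out, *o;

    assert(in != NULL);
    len = strlen(in);
    /* Worst case: every byte becomes \u00XX (6 bytes), plus 2 quotes and a NUL. */
    out = malloc(len * 6 + 3);
    if (out == NULL)
        return NULL;
    o = out;
    *o++ = '"';
    for (const unsigned char *p = (const unsigned char *) in; *p != '\0'; p++) {
        switch (*p) {
        case '"':  *o++ = '\\'; *o++ = '"';  break;
        case '\\': *o++ = '\\'; *o++ = '\\'; break;
        case '\b': *o++ = '\\'; *o++ = 'b';  break;
        case '\f': *o++ = '\\'; *o++ = 'f';  break;
        case '\n': *o++ = '\\'; *o++ = 'n';  break;
        case '\r': *o++ = '\\'; *o++ = 'r';  break;
        case '\t': *o++ = '\\'; *o++ = 't';  break;
        default:
            if (*p < 0x20)   /* remaining control characters must be escaped */
                o += sprintf(o, "\\u%04x", (unsigned int) *p);
            else
                *o++ = (char) *p;
        }
    }
    *o++ = '"';
    *o = '\0';
    return out;
}
```

The caller owns the returned buffer and must free() it. Escaping '/' is optional in JSON, so it is left alone here.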

As for character encodings, I'd rather keep them out of the JSON
parsing/serializing code itself and assume UTF-8. Wherever that
assumption doesn't hold, I'll add encode/decode/validate operations
around it.
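If the JSON code assumes UTF-8, the "validate" step could be a check like the following sketch, which accepts only well-formed UTF-8 as defined in RFC 3629 (no overlong encodings, no surrogate code points, nothing above U+10FFFF). The function name utf8_validate is hypothetical, and the backend's existing encoding support may already cover this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if buf[0..len) is well-formed UTF-8 as defined by RFC 3629:
 * no overlong encodings, no UTF-16 surrogates (U+D800..U+DFFF), and no
 * code points above U+10FFFF.
 */
bool utf8_validate(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t n;            /* number of continuation bytes that must follow */
        unsigned long cp;    /* code point being decoded */

        if (c < 0x80) {
            i++;             /* ASCII */
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {
            n = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {
            n = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {
            n = 3; cp = c & 0x07;
        } else {
            return false;    /* 0x80..0xC1 and 0xF5..0xFF never start a sequence */
        }
        if (len - i <= n)
            return false;    /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }
        if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000))
            return false;    /* overlong encoding */
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;    /* surrogate or beyond Unicode range */
        i += n + 1;
    }
    return true;
}
```

Rejecting lead bytes 0xC0 and 0xC1 up front rules out overlong two-byte forms, so only the three- and four-byte cases need the explicit minimum-code-point check.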

Thoughts? Thanks.
