Should the JSON datatype be a specialization of text?

From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Should the JSON datatype be a specialization of text?
Date: 2010-06-17 06:29:34
Message-ID: AANLkTimzqCIcZCmRjODSjPniZrR5SWr4mZG9RuSMsgwc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Currently, the JSON datatype (repository:
http://git.postgresql.org/gitweb?p=json-datatype.git;a=summary ) is
implemented somewhat like a specialization of TEXT, like XML is. I'm
beginning to question if this is the right way to go. This doesn't
concern whether the JSON datatype should retain the given content
verbatim (e.g. ' "string" '::JSON doesn't strip spaces) or whether
it should be internally stored using varlena (the same way TEXT is
stored). What I'm talking about revolves around two fundamental
approaches to the API design:

A. JSON as a specialization of TEXT. json('"string"')::TEXT =
'"string"'. To unwrap it, you need a special function:
from_json('"string"')::TEXT .
B. JSON as a naked type. json('"string"')::TEXT = 'string'. To
unwrap it, simply cast to the desired type.

Early in the discussion of the JSON datatype proposal, we leaned in
favor of approach A (see
http://archives.postgresql.org/pgsql-hackers/2010-04/msg00263.php ).
However, based on input I've received (mainly questions about why
from_json and to_json exist), I'm beginning to think that while
approach A makes more sense from an implementor's perspective,
approach B makes a heck of a lot more sense to users.

Although my code currently implements approach A, I am in favor of approach B.

Arguments I can think of in favor of approach A (keeping JSON as a
specialization of TEXT):

* No surprises when casting between JSON and TEXT. If approach B is
used, '"string"'::json would be '"string"', but '"string"'::json::text
would be 'string'.
* 'null'::json and NULL are distinct. 'null'::json is just a string
containing 'null' and won't ever become NULL unless you explicitly
pass it through from_json. Also, if I'm not mistaken, input functions
can't yield null when given non-null input (see the above link).
* For users who just want to store some JSON-encoded text in a
database for a while, approach A probably makes more sense.
* Is consistent with the XML datatype.

Arguments in favor of approach B (making JSON a naked data type):

* Makes data more accessible. Just cast to the type you need, just
like any other data type. No need to remember to_json and from_json
(though these function names might be used for functions to convert
JSON-formatted TEXT to/from the JSON datatype).
* Is consistent with other programming languages. When you
json_decode something in PHP, you get an object or an array. When you
paste a JSON literal into JavaScript code, you end up with a native
type, not some object you have to convert down to a native type.
Notice how in these programming languages, you typically carry
verbatim JSON texts around as strings, not a special string type that
performs validation.
* JSON was meant to be a format representing types in a programming
language. It has arrays, objects, strings, true, false and null
because JavaScript and many other popular scripting languages have
those.
* Users tend to care more about the underlying data in JSON values
than the notion of JSON-formatted text (though users care about that
too). If a user really wants to treat JSON like text, why not just
use TEXT along with CHECK (json_validate(content)) ? Granted, it's
not as fun :-)

One workaround to the null problem of approach B might be to throw an
error when 'null' is passed to the input function (as in, don't allow
the JSON type to even hold 'null' (though null can be nested within an
array/object)), and have a function for converting text to JSON that
returns NULL if 'null' is given. Note that I am strongly against only
allowing the JSON type to hold objects and arrays, in particular
because it would break being able to extract non-compound values from
JSON trees (e.g. json_get('[0,1,2,3]', '[2]') ). By the way, how hard
would it be to get 'null'::JSON to yield NULL?

Keep in mind there's a chance someone will standardize JSON/SQL in the
future, so more may be at stake here than just PostgreSQL's codebase
and users.

Although text versus naked is a fundamental design aspect of the JSON
datatype, it shouldn't be a blocker for me moving forward with the
project. Most of the code in place and in the works shouldn't be
affected by a transition from approach A to B.

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc G. Fournier 2010-06-17 06:31:06 Re: ANNOUNCE list
Previous Message Fujii Masao 2010-06-17 06:26:46 Re: debug log in pg_archivecleanup