Re: Proposal: Add JSON support

From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Proposal: Add JSON support
Date: 2010-03-28 20:48:33
Message-ID: e7e5fefd1003281348v6feb1730u7d43ccf011be6976@mail.gmail.com
Lists: pgsql-hackers

I introduced myself in the thread "Proposal: access control jails (and
introduction as aspiring GSoC student)", and we discussed jails and
session-local variables. But, as Robert Haas suggested, implementing
variable support in the backend would probably be way too ambitious a
project for a newbie like me. I decided instead to pursue the task of
adding JSON support to PostgreSQL, hence the new thread.

I plan to reference datatype-xml.html and functions-xml.html in some
design decisions, but there are some things that apply to XML that
don't apply to JSON and vice versa. For instance, jsoncomment
wouldn't make sense because (standard) JSON doesn't have comments.
For access, we might have something like json_get('foo[1].bar') and
json_set('foo[1].bar', 'hello'). jsonforest and jsonagg would be
beautiful. For mapping, jsonforest/jsonagg could be used to build a
JSON string from a result set (SELECT jsonagg(jsonforest(col1, col2,
...)) FROM tbl), but I'm not sure on the best way to go the other way
around (generate a result set from JSON). CSS-style selectors would
be cool, but "selecting" is what SQL is all about, and I'm not sure
having a json_select("dom-element[key=value]") function is a good,
orthogonal approach.

I'm wondering whether the internal representation of JSON should be
plain JSON text, or some binary code that's easier to traverse and
whatnot. For the sake of code size, just keeping it in text is
probably best.

Now my thoughts and opinions on the JSON parsing/unparsing itself:

It should be built-in, rather than relying on an external library
(like XML does). Priorities of the JSON implementation, in descending
order, are:

* Small
* Correct
* Fast

Moreover, JSON operations shall not crash due to stack overflows.

I'm thinking Bison/Flex is overkill for parsing JSON (I haven't seen
any JSON implementations out there that use it anyway). I would
probably end up writing the JSON parser/serializer manually. It
should not take more than a week.
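To make the "manual parser" idea concrete, here is a very rough, hypothetical sketch of the kind of hand-written recursive-descent validator I have in mind (names and structure are placeholders, not the eventual patch). Note the explicit depth cap, which is how the "shall not crash due to stack overflows" requirement would be met:

```c
#include <ctype.h>
#include <string.h>

#define MAX_DEPTH 64            /* cap nesting so deep input can't blow the C stack */

static const char *p;           /* cursor into the input text */

static int parse_value(int depth);

static void skip_ws(void) { while (*p == ' ' || *p == '\t' || *p == '\n' || *p == '\r') p++; }

/* Accept a string literal, including \uXXXX escapes. */
static int parse_string(void)
{
    if (*p++ != '"') return 0;
    while (*p != '"')
    {
        if (*p == '\0') return 0;
        if (*p++ == '\\')
        {
            if (*p == 'u')
            {
                int i;
                for (i = 0, p++; i < 4; i++, p++)
                    if (!isxdigit((unsigned char) *p)) return 0;
            }
            else if (*p != '\0' && strchr("\"\\/bfnrt", *p))
                p++;
            else
                return 0;
        }
    }
    return p++, 1;              /* consume closing quote */
}

/* Accept a number: integer part, optional fraction and exponent. */
static int parse_number(void)
{
    if (*p == '-') p++;
    if (!isdigit((unsigned char) *p)) return 0;
    while (isdigit((unsigned char) *p)) p++;
    if (*p == '.')
    {
        if (!isdigit((unsigned char) *++p)) return 0;
        while (isdigit((unsigned char) *p)) p++;
    }
    if (*p == 'e' || *p == 'E')
    {
        if (*++p == '+' || *p == '-') p++;
        if (!isdigit((unsigned char) *p)) return 0;
        while (isdigit((unsigned char) *p)) p++;
    }
    return 1;
}

/* Accept {...} or [...]; 'pairs' says whether elements are "key": value. */
static int parse_compound(int depth, char close, int pairs)
{
    p++;                        /* consume opening delimiter */
    skip_ws();
    if (*p == close) return p++, 1;
    for (;;)
    {
        skip_ws();
        if (pairs)
        {
            if (!parse_string()) return 0;
            skip_ws();
            if (*p++ != ':') return 0;
        }
        if (!parse_value(depth)) return 0;
        skip_ws();
        if (*p != ',') return *p++ == close;
        p++;
    }
}

static int parse_value(int depth)
{
    if (++depth > MAX_DEPTH) return 0;  /* refuse pathological nesting */
    skip_ws();
    switch (*p)
    {
        case '{': return parse_compound(depth, '}', 1);
        case '[': return parse_compound(depth, ']', 0);
        case '"': return parse_string();
        case 't': return strncmp(p, "true", 4) == 0 && (p += 4, 1);
        case 'f': return strncmp(p, "false", 5) == 0 && (p += 5, 1);
        case 'n': return strncmp(p, "null", 4) == 0 && (p += 4, 1);
        default:  return parse_number();
    }
}

/* Validate a complete JSON text; trailing whitespace is accepted. */
int json_validate(const char *text)
{
    p = text;
    if (!parse_value(0)) return 0;
    skip_ws();
    return *p == '\0';
}
```

The whole grammar fits in about a hundred lines, which is why I think a week is a fair estimate.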

As far as character encodings, I'd rather keep that out of the JSON
parsing/serializing code itself and assume UTF-8. Wherever I'm wrong,
I'll just throw encode/decode/validate operations at it.

Thoughts? Thanks.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-28 21:19:37
Message-ID: 603c8f071003281419u3cfc48e4sb984a14379e64a8e@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> I'm wondering whether the internal representation of JSON should be
> plain JSON text, or some binary code that's easier to traverse and
> whatnot.  For the sake of code size, just keeping it in text is
> probably best.

+1 for text.

> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>
> It should be built-in, rather than relying on an external library
> (like XML does).

Why? I'm not saying you aren't right, but you need to make an
argument rather than an assertion. This is a community, so no one is
entitled to decide anything unilaterally, and people want to be
convinced - including me.

> As far as character encodings, I'd rather keep that out of the JSON
> parsing/serializing code itself and assume UTF-8.  Wherever I'm wrong,
> I'll just throw encode/decode/validate operations at it.

I think you need to assume that the encoding will be the server
encoding, not UTF-8. Although others on this list are better
qualified to speak to that than I am.

...Robert


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-28 21:42:32
Message-ID: 4BAFCD48.3070903@dunslane.net
Lists: pgsql-hackers

Robert Haas wrote:
> On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>
>> I'm wondering whether the internal representation of JSON should be
>> plain JSON text, or some binary code that's easier to traverse and
>> whatnot. For the sake of code size, just keeping it in text is
>> probably best.
>>
>
> +1 for text.
>

Agreed.
>
>> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>>
>> It should be built-in, rather than relying on an external library
>> (like XML does).
>>
>
> Why? I'm not saying you aren't right, but you need to make an
> argument rather than an assertion. This is a community, so no one is
> entitled to decide anything unilaterally, and people want to be
> convinced - including me.
>

Yeah, why? We should not be in the business of reinventing the wheel
(and then maintaining the reinvented wheel), unless the code in question
is *really* small.

>
>> As far as character encodings, I'd rather keep that out of the JSON
>> parsing/serializing code itself and assume UTF-8. Wherever I'm wrong,
>> I'll just throw encode/decode/validate operations at it.
>>
>
> I think you need to assume that the encoding will be the server
> encoding, not UTF-8. Although others on this list are better
> qualified to speak to that than I am.
>
>
>

The trouble is that JSON is defined to be specifically Unicode, and in
practice for us that means UTF8 on the server side. It could get a bit
hairy, and it's definitely not something I think you can wave away with
a simple "I'll just throw some encoding/decoding function calls at it."

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-28 23:08:24
Message-ID: 10260.1269817704@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Robert Haas wrote:
>> I think you need to assume that the encoding will be the server
>> encoding, not UTF-8. Although others on this list are better
>> qualified to speak to that than I am.

> The trouble is that JSON is defined to be specifically Unicode, and in
> practice for us that means UTF8 on the server side. It could get a bit
> hairy, and it's definitely not something I think you can wave away with
> a simple "I'll just throw some encoding/decoding function calls at it."

It's just text, no? Are there any operations where this actually makes
a difference?

Like Robert, I'm *very* wary of trying to introduce any text storage
into the backend that is in an encoding different from server_encoding.
Even the best-case scenarios for that will involve multiple new places for
encoding conversion failures to happen.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-28 23:22:31
Message-ID: 4BAFE4B7.504@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>
>> Robert Haas wrote:
>>
>>> I think you need to assume that the encoding will be the server
>>> encoding, not UTF-8. Although others on this list are better
>>> qualified to speak to that than I am.
>>>
>
>
>> The trouble is that JSON is defined to be specifically Unicode, and in
>> practice for us that means UTF8 on the server side. It could get a bit
>> hairy, and it's definitely not something I think you can wave away with
>> a simple "I'll just throw some encoding/decoding function calls at it."
>>
>
> It's just text, no? Are there any operations where this actually makes
> a difference?
>

If we're going to provide operations on it, that might involve some. I
don't know.
> Like Robert, I'm *very* wary of trying to introduce any text storage
> into the backend that is in an encoding different from server_encoding.
> Even the best-case scenarios for that will involve multiple new places for
> encoding conversion failures to happen.
>
>

I agree entirely. All I'm suggesting is that there could be many
wrinkles here.

Here's another thought. Given that JSON is actually specified to consist
of a string of Unicode characters, what will we deliver to the client
where the client encoding is, say Latin1? Will it actually be a legal
JSON byte stream?

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-28 23:36:10
Message-ID: 19298.1269819370@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Here's another thought. Given that JSON is actually specified to consist
> of a string of Unicode characters, what will we deliver to the client
> where the client encoding is, say Latin1? Will it actually be a legal
> JSON byte stream?

No, it won't. We will *not* be sending anything but latin1 in such a
situation, and I really couldn't care less what the JSON spec says about
it. Delivering wrongly-encoded data to a client is a good recipe for
all sorts of problems, since the client-side code is very unlikely to be
expecting that. A datatype doesn't get to make up its own mind whether
to obey those rules. Likewise, data on input had better match
client_encoding, because it's otherwise going to fail the encoding
checks long before a json datatype could have any say in the matter.

While I've not read the spec, I wonder exactly what "consist of a string
of Unicode characters" should actually be taken to mean. Perhaps it
only means that all the characters must be members of the Unicode set,
not that the string can never be represented in any other encoding.
There's more than one Unicode encoding anyway...

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 00:16:40
Message-ID: 603c8f071003281716h14f23c40y8e2a0eeb3dcb50ea@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> Here's another thought. Given that JSON is actually specified to consist
>> of a string of Unicode characters, what will we deliver to the client
>> where the client encoding is, say Latin1? Will it actually be a legal
>> JSON byte stream?
>
> No, it won't.  We will *not* be sending anything but latin1 in such a
> situation, and I really couldn't care less what the JSON spec says about
> it.  Delivering wrongly-encoded data to a client is a good recipe for
> all sorts of problems, since the client-side code is very unlikely to be
> expecting that.  A datatype doesn't get to make up its own mind whether
> to obey those rules.  Likewise, data on input had better match
> client_encoding, because it's otherwise going to fail the encoding
> checks long before a json datatype could have any say in the matter.
>
> While I've not read the spec, I wonder exactly what "consist of a string
> of Unicode characters" should actually be taken to mean.  Perhaps it
> only means that all the characters must be members of the Unicode set,
> not that the string can never be represented in any other encoding.
> There's more than one Unicode encoding anyway...

See sections 2.5 and 3 of:

http://www.ietf.org/rfc/rfc4627.txt?number=4627

...Robert


From: Mike Rylander <mrylander(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 00:23:06
Message-ID: b918cf3d1003281723q55a028fak545c71d459a25ef4@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 28, 2010 at 7:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> Here's another thought. Given that JSON is actually specified to consist
>> of a string of Unicode characters, what will we deliver to the client
>> where the client encoding is, say Latin1? Will it actually be a legal
>> JSON byte stream?
>
> No, it won't.  We will *not* be sending anything but latin1 in such a
> situation, and I really couldn't care less what the JSON spec says about
> it.  Delivering wrongly-encoded data to a client is a good recipe for
> all sorts of problems, since the client-side code is very unlikely to be
> expecting that.  A datatype doesn't get to make up its own mind whether
> to obey those rules.  Likewise, data on input had better match
> client_encoding, because it's otherwise going to fail the encoding
> checks long before a json datatype could have any say in the matter.
>
> While I've not read the spec, I wonder exactly what "consist of a string
> of Unicode characters" should actually be taken to mean.  Perhaps it
> only means that all the characters must be members of the Unicode set,
> not that the string can never be represented in any other encoding.
> There's more than one Unicode encoding anyway...

In practice, every parser/serializer I've used (including the one I
helped write) allows (and, often, forces) any non-ASCII character to
be encoded as \u followed by a string of four hex digits.

Whether it would be easy inside the backend, when generating JSON from
user data stored in tables that are not in a UTF-8 encoded cluster, to
convert to UTF-8, that's something else entirely. If it /is/ easy and
safe, then it's just a matter of scanning for multi-byte sequences and
replacing those with their \uXXXX equivalents. I have some simple and
fast code I could share, if it's needed, though I suspect it's not.
:)
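The scan-and-replace itself is the easy part. A rough sketch (hypothetical names, and it assumes the input bytes are already valid UTF-8; the caller must supply an output buffer up to six times the input size) looks like:

```c
#include <stdio.h>

/* Rewrite a valid UTF-8 string so every non-ASCII character becomes a
 * \uXXXX escape (a surrogate pair for characters beyond the BMP).
 * The result is pure ASCII and therefore valid in any server encoding. */
void json_escape_unicode(const char *in, char *out)
{
    const unsigned char *s = (const unsigned char *) in;

    while (*s)
    {
        unsigned int cp;
        int extra;              /* number of continuation bytes */

        if (*s < 0x80)          /* plain ASCII: copy through */
        {
            *out++ = (char) *s++;
            continue;
        }
        else if (*s < 0xE0) { cp = *s & 0x1F; extra = 1; }
        else if (*s < 0xF0) { cp = *s & 0x0F; extra = 2; }
        else                { cp = *s & 0x07; extra = 3; }

        for (s++; extra > 0; extra--, s++)
            cp = (cp << 6) | (*s & 0x3F);

        if (cp < 0x10000)
            out += sprintf(out, "\\u%04X", cp);
        else                    /* encode as a UTF-16 surrogate pair */
        {
            cp -= 0x10000;
            out += sprintf(out, "\\u%04X\\u%04X",
                           0xD800 + (cp >> 10), 0xDC00 + (cp & 0x3FF));
        }
    }
    *out = '\0';
}
```

The cost is size: the two UTF-8 bytes of "é" become the six ASCII bytes of \u00E9, a threefold expansion for such characters.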

UPDATE: Thanks, Robert, for pointing to the RFC.

--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker(at)esilibrary(dot)com
| web: http://www.esilibrary.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mike Rylander <mrylander(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 00:33:51
Message-ID: 603c8f071003281733t5fb767d0pdd6c84d39f435b1d@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander <mrylander(at)gmail(dot)com> wrote:
> In practice, every parser/serializer I've used (including the one I
> helped write) allows (and, often, forces) any non-ASCII character to
> be encoded as \u followed by a string of four hex digits.

Is it correct to say that the only feasible place where non-ASCII
characters can be used is within string constants? If so, it might be
reasonable to disallow characters with the high-bit set unless the
server encoding is one of the flavors of Unicode of which the spec
approves. I'm tempted to think that when the server encoding is
Unicode we really ought to allow Unicode characters natively, because
turning a long string of two-byte wide chars into a long string of
six-byte wide chars sounds pretty evil from a performance point of
view.

...Robert


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mike Rylander <mrylander(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 00:46:11
Message-ID: 4BAFF853.9060107@dunslane.net
Lists: pgsql-hackers

Robert Haas wrote:
> On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander <mrylander(at)gmail(dot)com> wrote:
>
>> In practice, every parser/serializer I've used (including the one I
>> helped write) allows (and, often, forces) any non-ASCII character to
>> be encoded as \u followed by a string of four hex digits.
>>
>
> Is it correct to say that the only feasible place where non-ASCII
> characters can be used is within string constants? If so, it might be
> reasonable to disallow characters with the high-bit set unless the
> server encoding is one of the flavors of Unicode of which the spec
> approves. I'm tempted to think that when the server encoding is
> Unicode we really ought to allow Unicode characters natively, because
> turning a long string of two-byte wide chars into a long string of
> six-byte wide chars sounds pretty evil from a performance point of
> view.
>
>
>

We support exactly one unicode encoding on the server side: utf8.

And the maximum possible size of a validly encoded unicode char in utf8
is 4 (and that's pretty rare, IIRC).
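For illustration (a hypothetical sketch, not backend code), a small encoder makes the 4-byte ceiling concrete -- even the highest valid code point, U+10FFFF, needs only four bytes:

```c
/* Encode one Unicode code point as UTF-8, returning the byte count.
 * Never more than 4 bytes for any valid code point (max U+10FFFF). */
int utf8_encode(unsigned int cp, unsigned char *buf)
{
    if (cp < 0x80)              /* ASCII */
    {
        buf[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)             /* e.g. Latin-1 range: 2 bytes */
    {
        buf[0] = 0xC0 | (cp >> 6);
        buf[1] = 0x80 | (cp & 0x3F);
        return 2;
    }
    if (cp < 0x10000)           /* rest of the BMP: 3 bytes */
    {
        buf[0] = 0xE0 | (cp >> 12);
        buf[1] = 0x80 | ((cp >> 6) & 0x3F);
        buf[2] = 0x80 | (cp & 0x3F);
        return 3;
    }
    buf[0] = 0xF0 | (cp >> 18); /* supplementary planes: 4 bytes */
    buf[1] = 0x80 | ((cp >> 12) & 0x3F);
    buf[2] = 0x80 | ((cp >> 6) & 0x3F);
    buf[3] = 0x80 | (cp & 0x3F);
    return 4;
}
```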

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Mike Rylander <mrylander(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 00:48:20
Message-ID: 4BAFF8D4.9090308@dunslane.net
Lists: pgsql-hackers

Andrew Dunstan wrote:
>
>
> Robert Haas wrote:
>> On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander <mrylander(at)gmail(dot)com>
>> wrote:
>>
>>> In practice, every parser/serializer I've used (including the one I
>>> helped write) allows (and, often, forces) any non-ASCII character to
>>> be encoded as \u followed by a string of four hex digits.
>>>
>>
>> Is it correct to say that the only feasible place where non-ASCII
>> characters can be used is within string constants? If so, it might be
>> reasonable to disallow characters with the high-bit set unless the
>> server encoding is one of the flavors of Unicode of which the spec
>> approves. I'm tempted to think that when the server encoding is
>> Unicode we really ought to allow Unicode characters natively, because
>> turning a long string of two-byte wide chars into a long string of
>> six-byte wide chars sounds pretty evil from a performance point of
>> view.
>>
>>
>>
>
> We support exactly one unicode encoding on the server side: utf8.
>
> And the maximum possible size of a validly encoded unicode char in
> utf8 is 4 (and that's pretty rare, IIRC).
>
>

Sorry. Disregard this. I see what you mean.

Yeah, I think *requiring* non-ASCII characters to be escaped would be evil.

cheers

andrew


From: Mike Rylander <mrylander(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 00:52:59
Message-ID: b918cf3d1003281752h5b97e78exc5aa19eaf01cec2b@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 28, 2010 at 8:33 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Mar 28, 2010 at 8:23 PM, Mike Rylander <mrylander(at)gmail(dot)com> wrote:
>> In practice, every parser/serializer I've used (including the one I
>> helped write) allows (and, often, forces) any non-ASCII character to
>> be encoded as \u followed by a string of four hex digits.
>
> Is it correct to say that the only feasible place where non-ASCII
> characters can be used is within string constants?

Yes. That includes object property strings -- they are quoted string literals.

> If so, it might be
> reasonable to disallow characters with the high-bit set unless the
> server encoding is one of the flavors of Unicode of which the spec
> approves.  I'm tempted to think that when the server encoding is
> Unicode we really ought to allow Unicode characters natively, because
> turning a long string of two-byte wide chars into a long string of
> six-byte wide chars sounds pretty evil from a performance point of
> view.
>

+1

As an aside, \u-encoded (escaped) characters and native multi-byte
sequences (of any RFC-allowable Unicode encoding) are exactly
equivalent in JSON -- it's a storage and transmission format, and
doesn't prescribe the application-internal representation of the data.

If it's faster (which it almost certainly is) to not mangle the data
when it's all staying server side, that seems like a useful
optimization. For output to the client, however, it would be useful
to provide a \u-escaping function, which (AIUI) should always be safe
regardless of client encoding.

--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker(at)esilibrary(dot)com
| web: http://www.esilibrary.com


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 03:24:22
Message-ID: e7e5fefd1003282024q573e5f7am39300cc11917a63c@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 28, 2010 at 5:19 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>>
>> It should be built-in, rather than relying on an external library
>> (like XML does).
>
> Why?  I'm not saying you aren't right, but you need to make an
> argument rather than an assertion.  This is a community, so no one is
> entitled to decide anything unilaterally, and people want to be
> convinced - including me.

I apologize; I was just starting the conversation with some of my
ideas to receive feedback. I didn't want people to have to wade
through too many "I think"s. I'll be sure to use <opinion> tags in
the future :-)

My reasoning for "It should be built-in" is:
* It would be nice to have a built-in serialization format that's
available by default.
* It might be a little faster because it doesn't have to link to an
external library.
* The code to interface between JSON logic and PostgreSQL will
probably be much larger than the actual JSON encoding/decoding itself.
* The externally-maintained and packaged libjson implementations I
saw brought in lots of dependencies (e.g. glib).
* "Everyone else" (e.g. PHP) uses a statically-linked JSON implementation.

Is the code in question "*really*" small? Well, not really, but it's
not enormous either. By the way, I found a bug in PHP's JSON_parser
(json_decode("true "); /* with a space */ returns null instead of
true). I'll have to get around to reporting that.

Now, assuming JSON support is built-in to PostgreSQL and is enabled by
default, it is my opinion that encoding issues should not be dealt
with in the JSON code itself, but that the JSON code itself should
assume UTF-8. I think conversions should be done to/from UTF-8 before
passing it through the JSON code because this would likely be the
smallest way to implement it (not necessarily the fastest, though).

Mike Rylander pointed out something wonderful, and that is that JSON
can be stored in plain old ASCII using \u escapes. If a target
encoding supports all of Unicode, the JSON serializer could be told
not to generate \u escapes. Otherwise, the \u escapes would be
necessary.

Thus, here's an example of how (in my opinion) character sets and such
should be handled in the JSON code:

Suppose the client's encoding is UTF-16, and the server's encoding is
Latin-1. When JSON is stored to the database:
1. The client is responsible and sends a valid UTF-16 JSON string.
2. PostgreSQL checks to make sure it is valid UTF-16, then converts
it to UTF-8.
3. The JSON code parses it (to ensure it's valid).
4. The JSON code unparses it (to get a representation without
needless whitespace). It is given a flag indicating it should only
output ASCII text.
5. The ASCII is stored in the server, since it is valid Latin-1.

When JSON is retrieved from the database:
1. ASCII is retrieved from the server
2. If user needs to extract one or more fields, the JSON is parsed,
and the fields are extracted.
3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

Note that I am being biased toward optimizing code size rather than speed.
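For the retrieval side, the inverse step is small too. A rough sketch (hypothetical helper; it assumes the stored escapes are well formed, and it recombines surrogate pairs) of turning stored \u escapes back into raw UTF-8, from which the conversion to the client encoding would follow:

```c
#include <stdlib.h>
#include <string.h>

/* Replace \uXXXX escapes in an ASCII JSON string with the raw UTF-8
 * bytes of the characters they name.  High/low surrogate pairs are
 * recombined into a single supplementary-plane code point. */
void json_unescape_unicode(const char *in, char *out)
{
    while (*in)
    {
        if (in[0] == '\\' && in[1] == 'u')
        {
            char hex[5] = {0};
            unsigned int cp;

            memcpy(hex, in + 2, 4);
            cp = (unsigned int) strtoul(hex, NULL, 16);
            in += 6;
            if (cp >= 0xD800 && cp < 0xDC00 && in[0] == '\\' && in[1] == 'u')
            {
                unsigned int lo;        /* low half of a surrogate pair */

                memcpy(hex, in + 2, 4);
                lo = (unsigned int) strtoul(hex, NULL, 16);
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                in += 6;
            }
            /* emit cp as UTF-8 */
            if (cp < 0x80)
                *out++ = (char) cp;
            else if (cp < 0x800)
            {
                *out++ = (char) (0xC0 | (cp >> 6));
                *out++ = (char) (0x80 | (cp & 0x3F));
            }
            else if (cp < 0x10000)
            {
                *out++ = (char) (0xE0 | (cp >> 12));
                *out++ = (char) (0x80 | ((cp >> 6) & 0x3F));
                *out++ = (char) (0x80 | (cp & 0x3F));
            }
            else
            {
                *out++ = (char) (0xF0 | (cp >> 18));
                *out++ = (char) (0x80 | ((cp >> 12) & 0x3F));
                *out++ = (char) (0x80 | ((cp >> 6) & 0x3F));
                *out++ = (char) (0x80 | (cp & 0x3F));
            }
        }
        else
            *out++ = *in++;
    }
    *out = '\0';
}
```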

Here's a question about semantics: should converting JSON to text
guarantee that Unicode will be \u escaped, or should it render actual
Unicode whenever possible (when the client uses a Unicode-complete
charset)?

As for reinventing the wheel, I'm in the process of writing yet
another JSON implementation simply because I didn't find the other
ones I looked at palatable. I am aiming for simple code, not fast
code. I am using malloc for structures and realloc for strings/arrays
rather than resorting to clever buffering tricks. Of course, I'll
switch it over to palloc/repalloc before migrating it to PostgreSQL.


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 03:52:36
Message-ID: e08cc0401003282052s29498f23m34bd0eacb7b58f40@mail.gmail.com
Lists: pgsql-hackers

2010/3/29 Andrew Dunstan <andrew(at)dunslane(dot)net>:
> Robert Haas wrote:
>> On Sun, Mar 28, 2010 at 4:48 PM, Joseph Adams
>> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>>> I'm wondering whether the internal representation of JSON should be
>>> plain JSON text, or some binary code that's easier to traverse and
>>> whatnot.  For the sake of code size, just keeping it in text is
>>> probably best.
>>
>> +1 for text.
>
> Agreed.

There's another choice, called BSON.

http://www.mongodb.org/display/DOCS/BSON

I've not yet researched it deeply, but it seems a reasonable format to
store in databases, as it was invented for MongoDB.

>>> Now my thoughts and opinions on the JSON parsing/unparsing itself:
>>>
>>> It should be built-in, rather than relying on an external library
>>> (like XML does).
>>
>> Why?  I'm not saying you aren't right, but you need to make an
>> argument rather than an assertion.  This is a community, so no one is
>> entitled to decide anything unilaterally, and people want to be
>> convinced - including me.
>
> Yeah, why? We should not be in the business of reinventing the wheel (and
> then maintaining the reinvented wheel), unless the code in question is
> *really* small.

The many JSON implementations in many languages show that parsing
JSON is not so difficult to code, and that needs vary. Hence, I wonder
if we could have our very own.

Don't take this wrongly: I'm not against the text format, nor against
using an external library.

Regards,

--
Hitoshi Harada


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 03:52:42
Message-ID: 603c8f071003282052o73ceeeccn1484cba74d949db5@mail.gmail.com
Lists: pgsql-hackers

On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> I apologize; I was just starting the conversation with some of my
> ideas to receive feedback.  I didn't want people to have to wade
> through too many "I think"s .  I'll be sure to use <opinion> tags in
> the future :-)

FWIW, I don't care at all whether you say "I think" or "I know"; the
point is that you have to provide backup for any position you choose
to take.

> My reasoning for "It should be built-in" is:
>  * It would be nice to have a built-in serialization format that's
> available by default.
>  * It might be a little faster because it doesn't have to link to an
> external library.

I don't think either of these reasons is valid.

>  * The code to interface between JSON logic and PostgreSQL will
> probably be much larger than the actual JSON encoding/decoding itself.

If true, this is a good argument.

>  * The externally-maintained and packaged libjson implementations I
> saw brought in lots of dependencies (e.g. glib).

As is this.

>  * "Everyone else" (e.g. PHP) uses a statically-linked JSON implementation.

But this isn't.

> Is the code in question "*really*" small?  Well, not really, but it's
> not enormous either.  By the way, I found a bug in PHP's JSON_parser
> (json_decode("true "); /* with a space */ returns null instead of
> true).  I'll have to get around to reporting that.
>
> Now, assuming JSON support is built-in to PostgreSQL and is enabled by
> default, it is my opinion that encoding issues should not be dealt
> with in the JSON code itself, but that the JSON code itself should
> assume UTF-8.  I think conversions should be done to/from UTF-8 before
> passing it through the JSON code because this would likely be the
> smallest way to implement it (not necessarily the fastest, though).
>
> Mike Rylander pointed out something wonderful, and that is that JSON
> code can be stored in plain old ASCII using \u... .  If a target
> encoding supports all of Unicode, the JSON serializer could be told
> not to generate \u escapes.  Otherwise, the \u escapes would be
> necessary.
>
> Thus, here's an example of how (in my opinion) character sets and such
> should be handled in the JSON code:
>
> Suppose the client's encoding is UTF-16, and the server's encoding is
> Latin-1.  When JSON is stored to the database:
>  1. The client is responsible and sends a valid UTF-16 JSON string.
>  2. PostgreSQL checks to make sure it is valid UTF-16, then converts
> it to UTF-8.
>  3. The JSON code parses it (to ensure it's valid).
>  4. The JSON code unparses it (to get a representation without
> needless whitespace).  It is given a flag indicating it should only
> output ASCII text.
>  5. The ASCII is stored in the server, since it is valid Latin-1.
>
> When JSON is retrieved from the database:
>  1. ASCII is retrieved from the server
>  2. If user needs to extract one or more fields, the JSON is parsed,
> and the fields are extracted.
>  3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.
>
> Note that I am being biased toward optimizing code size rather than speed.

Can you comment on my proposal elsewhere on this thread and compare
your proposal to mine? In what ways are they different, and which is
better, and why?

> Here's a question about semantics: should converting JSON to text
> guarantee that Unicode will be \u escaped, or should it render actual
> Unicode whenever possible (when the client uses a Unicode-complete
> charset) ?

I feel pretty strongly that the data should be stored in the database
in the format in which it will be returned to the user - any
conversion which is necessary should happen on the way in. I am not
100% sure to what extent we should attempt to canonicalize the input
and to what extent we should simply store it in whichever way the user
chooses to provide it.

> As for reinventing the wheel, I'm in the process of writing yet
> another JSON implementation simply because I didn't find the other
> ones I looked at palatable.  I am aiming for simple code, not fast
> code.  I am using malloc for structures and realloc for strings/arrays
> rather than resorting to clever buffering tricks.  Of course, I'll
> switch it over to palloc/repalloc before migrating it to PostgreSQL.

I'm not sure that optimizing for simplicity over speed is a good idea.
I think we can reject implementations as unpalatable because they are
slow or feature-poor or have licensing issues or are not actively
maintained, but rejecting them because they use complex code in order
to be fast doesn't seem like the right trade-off to me.

...Robert


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 06:47:11
Message-ID: 877hovtyj4.fsf@hi-media-techno.com

Hi,

Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> writes:
> As for reinventing the wheel, I'm in the process of writing yet
> another JSON implementation simply because I didn't find the other
> ones I looked at palatable.

Even this one (ANSI C, MIT Licenced)?

cJSON -- An ultra-lightweight, portable, single-file, simple-as-can-be
ANSI-C compliant JSON parser, under MIT license.

http://sourceforge.net/projects/cjson/
http://cjson.svn.sourceforge.net/viewvc/cjson/README?revision=7&view=markup
http://cjson.svn.sourceforge.net/viewvc/cjson/cJSON.c?revision=33&view=markup

And from cJSON.h we read that it could be somewhat easy to integrate
into PostgreSQL's memory management:

typedef struct cJSON_Hooks {
    void *(*malloc_fn)(size_t sz);
    void (*free_fn)(void *ptr);
} cJSON_Hooks;
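
As a sketch (my assumption of how the wiring might look, not tested
against the backend), thin wrappers could adapt palloc()/pfree() to
those hook signatures; malloc()/free() stand in below so the example
compiles outside the server:

```c
#include <stdlib.h>

/* Hook structure as declared in cJSON.h. */
typedef struct cJSON_Hooks {
    void *(*malloc_fn)(size_t sz);
    void (*free_fn)(void *ptr);
} cJSON_Hooks;

static void *
json_alloc(size_t sz)
{
    return malloc(sz);          /* in the backend: return palloc(sz); */
}

static void
json_free(void *ptr)
{
    free(ptr);                  /* in the backend: pfree(ptr); */
}

static cJSON_Hooks pg_hooks = {json_alloc, json_free};
/* Passing &pg_hooks to cJSON_InitHooks() would then route every cJSON
 * allocation through PostgreSQL's current memory context. */
```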

Just adding some data points, hoping that's not adding only confusion.

Regards,
--
dim


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 09:20:42
Message-ID: 1269854442.3017.5.camel@fsopti579.F-Secure.com

On sön, 2010-03-28 at 23:24 -0400, Joseph Adams wrote:
> Thus, here's an example of how (in my opinion) character sets and such
> should be handled in the JSON code:
>
> Suppose the client's encoding is UTF-16, and the server's encoding is
> Latin-1. When JSON is stored to the database:
> 1. The client is responsible and sends a valid UTF-16 JSON string.
> 2. PostgreSQL checks to make sure it is valid UTF-16, then converts
> it to UTF-8.
> 3. The JSON code parses it (to ensure it's valid).
> 4. The JSON code unparses it (to get a representation without
> needless whitespace). It is given a flag indicating it should only
> output ASCII text.
> 5. The ASCII is stored in the server, since it is valid Latin-1.
>
> When JSON is retrieved from the database:
> 1. ASCII is retrieved from the server
> 2. If user needs to extract one or more fields, the JSON is parsed,
> and the fields are extracted.
> 3. Otherwise, the JSON text is converted to UTF-16 and sent to the client.

The problem I see here is that a data type output function is normally
not aware of the client encoding. The alternatives that I see are that
you always escape everything you see to plain ASCII, so it's valid in
every server encoding, but that would result in pretty sad behavior for
users of languages that don't use a lot of ASCII characters, or that
you decree a nonstandard JSON variant that uses whatever encoding you
decide.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 12:06:38
Message-ID: 4BB097CE.7040700@dunslane.net

Robert Haas wrote:
> I feel pretty strongly that the data should be stored in the database
> in the format in which it will be returned to the user - any
> conversion which is necessary should happen on the way in. I am not
> 100% sure to what extent we should attempt to canonicalize the input
> and to what extent we should simply store it in whichever way the user
> chooses to provide it.
>
>

ISTM that implies that, with a possible exception when the server
encoding is utf8, you would have to \u escape the data on the way in
fairly pessimistically.

I'd be inclined to say we should store and validate it exactly as the
client gives it to us (converted to the server encoding, as it would be,
of course). In practice that would mean that for non-utf8 databases the
client would need to \u escape it. I suspect most uses of this would be
in utf8-encoded databases anyway.

I also think we should provide a function to do the escaping, so users
could do something like:

insert into foo (myjson) values (json_escape('some json text here'));

I also thought about a switch to turn on \u escaping on output - that
might be useful for pg_dump for instance.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 16:02:33
Message-ID: 14486.1269878553@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>> My reasoning for "It should be built-in" is:
>> * It would be nice to have a built-in serialization format that's
>> available by default.
>> * It might be a little faster because it doesn't have to link to an
>> external library.

> I don't think either of these reasons is valid.

FWIW, our track record with relying on external libraries has been less
than great --- "upstream will maintain it" sounds good but has fallen
over with respect to both the regex engine and the snowball stemmers,
to take two examples. And libxml2 has been nothing but a source of pain.

If this is going to end up being one fairly small C file implementing
a spec that is not a moving target, I'd vote against depending on an
external library instead, no matter how spiffy and license-compatible
the external library might be.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 16:06:28
Message-ID: 603c8f071003290906q6bc3aedeh904c9c0d2a833e5b@mail.gmail.com

On Mon, Mar 29, 2010 at 12:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sun, Mar 28, 2010 at 11:24 PM, Joseph Adams
>> <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>>> My reasoning for "It should be built-in" is:
>>>  * It would be nice to have a built-in serialization format that's
>>> available by default.
>>>  * It might be a little faster because it doesn't have to link to an
>>> external library.
>
>> I don't think either of these reasons is valid.
>
> FWIW, our track record with relying on external libraries has been less
> than great --- "upstream will maintain it" sounds good but has fallen
> over with respect to both the regex engine and the snowball stemmers,
> to take two examples.  And libxml2 has been nothing but a source of pain.
>
> If this is going to end up being one fairly small C file implementing
> a spec that is not a moving target, I'd vote against depending on an
> external library instead, no matter how spiffy and license-compatible
> the external library might be.

Fair enough. Note that I did go on to say which reasons I did think
were potentially valid. ;-)

...Robert


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 17:25:35
Message-ID: 4BB0E28F.8090005@agliodbs.com

On 3/28/10 8:52 PM, Hitoshi Harada wrote:
> There's another choice, called BSON.
>
> http://www.mongodb.org/display/DOCS/BSON
>
I haven't researched it deeply yet, but it seems reasonable to store in
databases, since it was invented for MongoDB.

I wouldn't take that for granted. The MongoDB project involves a lot of
"re-inventing the wheel" and I'd scrutinize any of their innovations
pretty thoroughly.

Besides, I thought the point of a JSON type was to be compatible with
the *majority* of JSON users?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 18:23:16
Message-ID: 7852A8E0-17D5-4220-9812-844D79C63387@kineticode.com

On Mar 29, 2010, at 9:02 AM, Tom Lane wrote:

> If this is going to end up being one fairly small C file implementing
> a spec that is not a moving target, I'd vote against depending on an
> external library instead, no matter how spiffy and license-compatible
> the external library might be.

Perhaps you could fork one, in that case.

Best,

David


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 18:33:10
Message-ID: e7e5fefd1003291133x38785f6fx6d1d24976ce87d04@mail.gmail.com

On Mon, Mar 29, 2010 at 2:23 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> On Mar 29, 2010, at 9:02 AM, Tom Lane wrote:
>
>> If this is going to end up being one fairly small C file implementing
>> a spec that is not a moving target, I'd vote against depending on an
>> external library instead, no matter how spiffy and license-compatible
>> the external library might be.
>
> Perhaps you could fork one, in that case.
>
> Best,
>
> David
>
>

I'm considering using and adapting cJSON instead of continuing with my
redundant implementation. I could run `indent -kr -i4` on it (will
that match PostgreSQL's coding style?), add support for UTF-16
surrogate pairs (pair of \u... escapes for each character above U+FFFF
as required by the JSON spec), and add a switch to turn on/off pure
ASCII output.
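
The surrogate-pair arithmetic mentioned above is small; here is a sketch
of the math (illustrative, not code from any of the libraries discussed):

```c
#include <assert.h>

/*
 * Split a supplementary-plane code point (above U+FFFF) into the
 * UTF-16 surrogate pair that JSON's escape syntax requires, i.e. two
 * consecutive \uXXXX escapes.
 */
static void
to_surrogate_pair(unsigned int cp, unsigned int *hi, unsigned int *lo)
{
    assert(cp >= 0x10000 && cp <= 0x10FFFF);
    cp -= 0x10000;                  /* 20 significant bits remain */
    *hi = 0xD800 | (cp >> 10);      /* high surrogate: top 10 bits */
    *lo = 0xDC00 | (cp & 0x3FF);    /* low surrogate: bottom 10 bits */
}
```

For instance, U+1D11E (musical G clef) splits into 0xD834 and 0xDD1E,
matching the \uD834\uDD1E example in RFC 4627.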

P.S.: Sorry for the repeat, David. I forgot to CC the mailing list.


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 19:49:48
Message-ID: m2bpe652n7.fsf@hi-media.com

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> If this is going to end up being one fairly small C file implementing
> a spec that is not a moving target, I'd vote against depending on an
> external library instead, no matter how spiffy and license-compatible
> the external library might be.

My understanding is that it's possible to include (fork) MIT- or
BSD-licensed source code into our source tree, right? (Some other
licenses would certainly qualify too.)

Regards,
--
dim


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-29 20:12:13
Message-ID: 20190.1269893533@sss.pgh.pa.us

Dimitri Fontaine <dfontaine(at)hi-media(dot)com> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> If this is going to end up being one fairly small C file implementing
>> a spec that is not a moving target, I'd vote against depending on an
>> external library instead, no matter how spiffy and license-compatible
>> the external library might be.

> My understanding is that it's possible to include (fork) MIT- or
> BSD-licensed source code into our source tree, right? (Some other
> licenses would certainly qualify too.)

MIT or 2-clause BSD would be ok for such a thing, other licenses
probably not.

regards, tom lane


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-30 21:38:54
Message-ID: 877hota3rl.fsf@ca.afilias.info

joeyadams3(dot)14159(at)gmail(dot)com (Joseph Adams) writes:
> I introduced myself in the thread "Proposal: access control jails (and
> introduction as aspiring GSoC student)", and we discussed jails and
> session-local variables. But, as Robert Haas suggested, implementing
> variable support in the backend would probably be way too ambitious a
> project for a newbie like me. I decided instead to pursue the task of
> adding JSON support to PostgreSQL, hence the new thread.

Interesting...

I had a discussion about much this sort of thing with a local LUG
associate; he was interested in this from a "doing CouchDB-ish things
using PostgreSQL" perspective.

There were a couple perspectives there, which may be somewhat orthogonal
to what you're trying to do. I'll mention them as they may suggest
useful operations.

1. Buddy Myles pointed out a NYTimes project which does something
pretty analogous...
http://code.nytimes.com/projects/dbslayer

This is a proxy that allows clients to submit requests via HTTP,
returning responses in JSON form. Note that the HTTP request has the
SQL query embedded into it.

2. CouchDB's interface is much the same, where clients submit HTTP
requests and receive JSON responses back, but with the difference that
the query is a stylized sorta-JSON form.

I'd think that you could get quite a long ways on this, at least doing
something like dbslayer without *necessarily* needing to do terribly
much work inside the DB engine.

Mapping a tuple, or a list of tuples, into a forest of JSON documents
should be pretty straightforward; whether or not it's really desirable
to operate a JSON-flavoured query inside PostgreSQL may be the
difference between *this year's* GSOC and *next year's* :-).
--
"...the Jedi learned early on what language the universe was
programmed in. Then they took advantage of an accident of language to
obscure this fact from the unwashed. They all affected an inverted
lisp. so, a Jedi to be, you the Forth must use."


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-31 00:58:38
Message-ID: 4BB29E3E.40809@agliodbs.com


> I'd think that you could get quite a long ways on this, at least doing
> something like dbslayer without *necessarily* needing to do terribly
> much work inside the DB engine.

There's actually an HTTP framework tool for Postgres which already does
something of the sort. It was introduced at pgCon two years ago; I'll
look for it.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-31 01:03:26
Message-ID: 603c8f071003301803o60ba3b09lc474ad02dfa810a5@mail.gmail.com

On Tue, Mar 30, 2010 at 8:58 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> I'd think that you could get quite a long ways on this, at least doing
>> something like dbslayer without *necessarily* needing to do terribly
>> much work inside the DB engine.
>
> There's actually an HTTP framework tool for Postgres which already does
> something of the sort.  It was introduced at pgCon 2 years ago ... will
> look for.

While it might be interesting to have/find/write a tool that puts an
HTTP/JSON layer around the DB connection, it's pretty much entirely
unrelated to the proposed project of creating a json type with
PostgreSQL analogous to the xml type we already have, which is what
the OP is proposing to do.

Personally, I suspect that a JSON type is both a more interesting
project to work on and a more useful result for this community.

...Robert


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-31 01:56:12
Message-ID: 4BB2ABBC.7040607@dunslane.net

Robert Haas wrote:
> While it might be interesting to have/find/write a tool that puts an
> HTTP/JSON layer around the DB connection, it's pretty much entirely
> unrelated to the proposed project of creating a json type with
> PostgreSQL analogous to the xml type we already have, which is what
> the OP is proposing to do.
>
> Personally, I suspect that a JSON type is both a more interesting
> project to work on and a more useful result for this community.
>
>
>

I agree.

cheers

andrew


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-03-31 15:14:04
Message-ID: 87vdcc8qwz.fsf@ca.afilias.info

robertmhaas(at)gmail(dot)com (Robert Haas) writes:
> On Tue, Mar 30, 2010 at 8:58 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> I'd think that you could get quite a long ways on this, at least doing
>>> something like dbslayer without *necessarily* needing to do terribly
>>> much work inside the DB engine.
>>
>> There's actually an HTTP framework tool for Postgres which already does
>> something of the sort.  It was introduced at pgCon 2 years ago ... will
>> look for.
>
> While it might be interesting to have/find/write a tool that puts an
> HTTP/JSON layer around the DB connection, it's pretty much entirely
> unrelated to the proposed project of creating a json type with
> PostgreSQL analogous to the xml type we already have, which is what
> the OP is proposing to do.
>
> Personally, I suspect that a JSON type is both a more interesting
> project to work on and a more useful result for this community.

No disagreement here; I'd expect that a JSON type would significantly
ease building such a framework. Indeed, that could be a demonstration
of success...

"We then implemented an HTTP/JSON proxy in 27 lines of Python code..."
:-)
--
"Unless you used NetInfo. _Then_ changing network settings could
often require torching of the existing system, salting of the ground
it had rested on, and termination of anyone who used it."
-- JFW <jwiede(at)biff(dot)com> on comp.sys.next.advocacy


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-01 03:39:08
Message-ID: p2pe7e5fefd1003312039yf96ecd45n3c84296749e26e29@mail.gmail.com

I ended up reinventing the wheel and writing another JSON library:

http://constellationmedia.com/~funsite/static/json-0.0.1.tar.bz2

This is a first release, and it doesn't really have a name besides
"json". It's very similar to cJSON, except it is (sans unknown bugs)
more reliable, more correct, and cleaner (unless you hate gotos ;-) ).
It has a simple test suite. It is not prone to stack overflows, as
it doesn't recurse. It is strict, requires input to be UTF-8 (it
validates it first) and only outputs UTF-8. Other than treating
numbers liberally, my implementation only accepts valid JSON code (it
doesn't try to correct anything, even Unicode problems). It is under
the MIT license.
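
The UTF-8 validation step could look roughly like this (my own
illustration of the idea, not the library's code):

```c
#include <stddef.h>

/*
 * Sketch of a UTF-8 well-formedness check.  It verifies lead bytes and
 * continuation bytes only; a production validator would also reject
 * overlong encodings, surrogates, and code points above U+10FFFF.
 */
static int
is_wellformed_utf8(const unsigned char *s, size_t len)
{
    size_t      i = 0;

    while (i < len)
    {
        size_t      extra;

        if (s[i] < 0x80)
            extra = 0;          /* ASCII */
        else if ((s[i] & 0xE0) == 0xC0)
            extra = 1;          /* 2-byte sequence */
        else if ((s[i] & 0xF0) == 0xE0)
            extra = 2;          /* 3-byte sequence */
        else if ((s[i] & 0xF8) == 0xF0)
            extra = 3;          /* 4-byte sequence */
        else
            return 0;           /* invalid lead byte */

        if (i + extra >= len)
            return 0;           /* truncated sequence */

        for (size_t j = 1; j <= extra; j++)
            if ((s[i + j] & 0xC0) != 0x80)
                return 0;       /* bad continuation byte */

        i += extra + 1;
    }
    return 1;
}
```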


From: Petr Jelinek <pjmodos(at)pjmodos(dot)net>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-02 04:34:49
Message-ID: 4BB573E9.50107@pjmodos.net

Dne 1.4.2010 5:39, Joseph Adams napsal(a):
> I ended up reinventing the wheel and writing another JSON library:
>
> http://constellationmedia.com/~funsite/static/json-0.0.1.tar.bz2
>
> This is a first release, and it doesn't really have a name besides
> "json". It's very similar to cJSON, except it is (sans unknown bugs)
> more reliable, more correct, and cleaner (unless you hate gotos ;-) ).
> It has a simple test suite. It is not prone to stack overflows, as
> it doesn't recurse. It is strict, requires input to be UTF-8 (it
> validates it first) and only outputs UTF-8. Other than treating
> numbers liberally, my implementation only accepts valid JSON code (it
> doesn't try to correct anything, even Unicode problems). It is under
> the MIT license.
>

I did some testing on my own, it passed everything I have thrown at it
so far.
I also did tests using MSVC for both 32-bit and 64-bit targets, and it
worked fine too (except for the missing stdbool.h in MSVC, which is no
big deal).

The coding style, compared to cJSON (or other libs I've seen), seems
closer to the style of PostgreSQL; it would, however, still require a
pgindent run and maybe some minor adjustments.

--
Regards
Petr Jelinek (PJMODOS)


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Petr Jelinek <pjmodos(at)pjmodos(dot)net>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-02 15:36:26
Message-ID: 66082611-E397-4DDD-8258-0BDFA7675786@kineticode.com

On Apr 1, 2010, at 9:34 PM, Petr Jelinek wrote:

>> I ended up reinventing the wheel and writing another JSON library:
>>
>> http://constellationmedia.com/~funsite/static/json-0.0.1.tar.bz2
>>
>> This is a first release, and it doesn't really have a name besides
>> "json". It's very similar to cJSON, except it is (sans unknown bugs)
>> more reliable, more correct, and cleaner (unless you hate gotos ;-) ).
>> It has a simple test suite. It is not prone to stack overflows, as
>> it doesn't recurse. It is strict, requires input to be UTF-8 (it
>> validates it first) and only outputs UTF-8. Other than treating
>> numbers liberally, my implementation only accepts valid JSON code (it
>> doesn't try to correct anything, even Unicode problems). It is under
>> the MIT license.
>>
>
> I did some testing on my own, it passed everything I have thrown at it so far.
> I also did tests using MSVC for both 32bit and 64bit targets and it worked fine too (except for missing stdbool.h in msvc which is no big deal).
>
> The coding style compared to cJSON (or other libs I've seen) seems closer to the style of PostgreSQL, it would however still require pgindent run and maybe some minor adjustments.

Someone approve this project for the GSoC quick, before Joseph finishes it!

Best,

David


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-03 12:59:10
Message-ID: k2le7e5fefd1004030559h449d22f1wc7940673bd8e4a5c@mail.gmail.com

I've been wondering whether the JSON datatype should be strict or conservative.

For one, there's strict JSON (following the exact specification).
Then there are more liberal JSON variants. Some JSON parsers
support comments, some support invalid number formats (e.g. '3.' or
'+5'), etc.

The consensus seems to be that JSON content should be stored verbatim
(it should store the exact string the client sent to it), as is done
with XML. However, this notion is somewhat incompatible with "Be
conservative in what you do; be liberal in what you accept from
others" because we can't accept loose JSON, then spit out conservative
JSON without messing with the content.

Here's my idea: the datatype should only allow strict JSON, but there
should be a function that accepts a liberal format, cleans it up to
make it strict JSON, and converts it to JSON. I think making strict
JSON the default makes the most sense because:
* Inputs to the database will most likely be coming from programs, not humans.
* Output is expected to be valid JSON and work anywhere JSON should work.
* Strict JSON is what more people would expect, I'd think.


From: Mike Rylander <mrylander(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-03 13:53:08
Message-ID: v2xb918cf3d1004030653i6c71b473lb4eb8afba3d2386b@mail.gmail.com

On Sat, Apr 3, 2010 at 8:59 AM, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> I've been wondering whether the JSON datatype should be strict or conservative.
>
> For one, there's strict JSON (following the exact specification).
> Then there are more liberal JSON variants.  Some JSON parsers
> support comments, some support invalid number formats (e.g. '3.' or
> '+5'), etc.
>
> The consensus seems to be that JSON content should be stored verbatim
> (it should store the exact string the client sent to it), as is done
> with XML.  However, this notion is somewhat incompatible with "Be
> conservative in what you do; be liberal in what you accept from
> others" because we can't accept loose JSON, then spit out conservative
> JSON without messing with the content.
>
> Here's my idea: the datatype should only allow strict JSON, but there
> should be a function that accepts a liberal format, cleans it up to
> make it strict JSON, and converts it to JSON.  I think making strict
> JSON the default makes the most sense because:
>  * Inputs to the database will most likely be coming from programs, not humans.
>  * Output is expected to be valid JSON and work anywhere JSON should work.
>  * Strict JSON is what more people would expect, I'd think.

+1

--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker(at)esilibrary(dot)com
| web: http://www.esilibrary.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Mike Rylander <mrylander(at)gmail(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-03 15:20:45
Message-ID: 4BB75CCD.1040906@dunslane.net

Mike Rylander wrote:
>>
>> Here's my idea: the datatype should only allow strict JSON, but there
>> should be a function that accepts a liberal format, cleans it up to
>> make it strict JSON, and converts it to JSON. I think making strict
>> JSON the default makes the most sense because:
>> * Inputs to the database will most likely be coming from programs, not humans.
>> * Output is expected to be valid JSON and work anywhere JSON should work.
>> * Strict JSON is what more people would expect, I'd think.
>>
>
> +1
>
>

Yeah. That's the only thing that makes sense to me. We don't allow badly
formed XML, for example, although we do allow document fragments (as
required by the standard, IIRC). But we could sensibly have some
function like 'cleanup_json(almost_json text) returns json'.

cheers

andrew


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 03:50:14
Message-ID: m2qe7e5fefd1004052050pf99c54b3pb315256a812e279c@mail.gmail.com

Another JSON strictness issue: the JSON standard (
http://www.ietf.org/rfc/rfc4627.txt ) states that JSON text can only
be an array or object. However, my implementation currently accepts
any valid value. Thus, '3', '"hello"', 'true', 'false', and 'null'
are all accepted by my implementation, but are not strictly JSON text.
The question is: should the JSON datatype accept atomic values (those
that aren't arrays or objects) as valid JSON?

I tried a few other JSON implementations to see where they stand
regarding atomic types as input:

JSON_checker (C) does not accept them.
JSON.parse() (JavaScript) accepts them.
json_decode() (PHP) accepts them. However, support is currently buggy
(e.g. '1' is accepted, but '1 ' is not).
cJSON (C) accepts them.
JSON.pm (Perl) accepts them if you specify the allow_nonref option.
Otherwise, it accepts 'true' and 'false', but not 'null', a number, or
a string by itself.

In my opinion, we should accept an atomic value as valid JSON content.
I suppose we could get away with calling it a "content" fragment as
is done with XML without a doctype.

Accepting atomic values as valid JSON would be more orthogonal, as it
would be possible to have a function like this:

json_values(object_or_array JSON) RETURNS SETOF JSON
-- extracts values from an object or members from an array, returning
them as JSON fragments.

Also, should we go even further and accept key:value pairs by themselves? :

'"key":"value"'::JSON

I don't think we should because doing so would be rather zany. It
would mean JSON content could be invalid in value context, as in:

// JavaScript
var content = "key" : "value";

I improved my JSON library. It now only accepts strict, UTF-8 encoded
JSON values (that is, objects, arrays, strings, numbers, true, false,
and null). It also has a json_decode_liberal() function that accepts
a string, cleans it up, and passes it through the stricter
json_decode(). json_decode_liberal() filters out comments, allows
single quoted strings, and accepts a lax number format compared to
strict JSON. I may add Unicode repair to it later on, but
implementing that well really depends on what type of Unicode errors
appear in real life, I think.

http://constellationmedia.com/~funsite/static/json-0.0.2.tar.bz2

My json.c is now 1161 lines long, so I can't quite call it "small" anymore.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 04:45:49
Message-ID: y2o603c8f071004052145jfa17ee6m177b9d4b94872924@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 5, 2010 at 11:50 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> In my opinion, we should accept an atomic value as valid JSON content.

That seems right to me.

> Also, should we go even further and accept key:value pairs by themselves? :
>
> '"key":"value"'::JSON

Definitely not.

...Robert


From: Petr Jelinek <pjmodos(at)pjmodos(dot)net>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 05:00:59
Message-ID: 4BBAC00B.4000801@pjmodos.net
Lists: pgsql-hackers

On 6.4.2010 5:50, Joseph Adams wrote:
> Another JSON strictness issue: the JSON standard (
> http://www.ietf.org/rfc/rfc4627.txt ) states that JSON text can only
> be an array or object. However, my implementation currently accepts
> any valid value. Thus, '3', '"hello"', 'true', 'false', and 'null'
> are all accepted by my implementation, but are not strictly JSON text.
> The question is: should the JSON datatype accept atomic values (those
> that aren't arrays or objects) as valid JSON?
>

Not really sure about this myself, but keep in mind that NULL has
special meaning in SQL.

> Also, should we go even further and accept key:value pairs by themselves? :
>
> '"key":"value"'::JSON
>
>

No, especially considering that '{"key":"value"}' is a valid JSON value.

> I improved my JSON library. It now only accepts strict, UTF-8 encoded
> JSON values (that is, objects, arrays, strings, numbers, true, false,
> and null).
>

Just a note, but PostgreSQL has some UTF-8 validation code; you might
want to look at it, at least once you start the actual integration
into core, so that you are not reinventing too many wheels. I can see
how your own code is a good thing for the general-purpose library this
can (and I am sure will) be used as, but for the datatype itself it
might be a better idea to use what's already there, unless it's
somehow incompatible, of course.

--
Regards
Petr Jelinek (PJMODOS)


From: Petr Jelinek <pjmodos(at)pjmodos(dot)net>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 06:15:38
Message-ID: 4BBAD18A.8090105@pjmodos.net
Lists: pgsql-hackers

On 6.4.2010 7:57, Joseph Adams wrote:
> On Tue, Apr 6, 2010 at 1:00 AM, Petr Jelinek<pjmodos(at)pjmodos(dot)net> wrote:
>
>> Not really sure about this myself, but keep in mind that NULL has special
>> meaning in SQL.
>>
> To me, the most logical approach is to do the obvious thing: make
> JSON's 'null' be SQL's NULL. For instance, SELECTing on a table with
> NULLs in it and converting the result set to JSON would yield a
> structure with 'null's in it. 'null'::JSON would yield NULL. I'm not
> sure what startling results would come of this approach, but I'm
> guessing this would be most intuitive and useful.
>

+1
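
The mapping Joseph proposes, sketched in Python: SQL NULLs (None here)
in a result set become JSON 'null's when serialized:

```python
import json

# Rows from a hypothetical table containing SQL NULLs (None):
rows = [{"a": 1, "b": None}, {"a": None, "b": "x"}]
# Converting the result set to JSON yields a structure with 'null's:
assert json.dumps(rows) == '[{"a": 1, "b": null}, {"a": null, "b": "x"}]'
```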

>> Just a note, but PostgreSQL has some UTF-8 validation code, you might want
>> to look at it maybe, at least once you start the actual integration into
>> core, so that you are not reinventing too many wheels. I can see how your
>> own code is good thing for general library which this can (and I am sure
>> will be) used as, but for the datatype itself, it might be better idea to
>> use what's already there, unless it's somehow incompatible of course.
>>
> Indeed. My plan is to first get a strong standalone JSON library
> written and tested so it can be used as a general-purpose library. As
> the JSON code is merged into PostgreSQL, it can be adapted. Part of
> this adaptation would most likely be removing the UTF-8 validation
> function I wrote and using PostgreSQL's Unicode support code instead.
>
> There are probably other bits that could be PostgreSQLified as well.
> I wonder if I should consider leveraging PostgreSQL's regex support or
> if it would be a bad fit/waste of time/slower/not worth it.
>

Regex? What for? You certainly don't need it for parsing; you have a
good parser IMHO, and regex would probably be all of the above.

--
Regards
Petr Jelinek (PJMODOS)


From: Mike Rylander <mrylander(at)gmail(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 14:53:39
Message-ID: t2rb918cf3d1004060753ja458c9fyfe1b960e10eeb369@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 5, 2010 at 11:50 PM, Joseph Adams
<joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> Another JSON strictness issue:  the JSON standard (
> http://www.ietf.org/rfc/rfc4627.txt ) states that JSON text can only
> be an array or object.  However, my implementation currently accepts
> any valid value.  Thus, '3', '"hello"', 'true', 'false', and 'null'
> are all accepted by my implementation, but are not strictly JSON text.
>  The question is: should the JSON datatype accept atomic values (those
> that aren't arrays or objects) as valid JSON?
>
> I tried a few other JSON implementations to see where they stand
> regarding atomic types as input:
>
> JSON_checker (C) does not accept them.
> JSON.parse() (JavaScript) accepts them.
> json_decode() (PHP) accepts them.  However, support is currently buggy
> (e.g. '1' is accepted, but '1 ' is not).
> cJSON (C) accepts them.
> JSON.pm (Perl) accepts them if you specify the allow_nonref option.
> Otherwise, it accepts 'true' and 'false', but not 'null', a number, or
> a string by itself.
>
> In my opinion, we should accept an atomic value as valid JSON content.
>  I suppose we could get away with calling it a "content" fragment as
> is done with XML without a doctype.
>
> Accepting atomic values as valid JSON would be more orthogonal, as it
> would be possible to have a function like this:
>
> json_values(object_or_array JSON) RETURNS SETOF JSON
> -- extracts values from an object or members from an array, returning
> them as JSON fragments.
>

For these reasons, and the fact that my project uses atomic values ;),
I think yes, we should support them.

IIUC, the reason for requiring an array or object is that the O part
of JSON means "some sort of a collection of atomic values". But, in
ECMAScript (JavaScript), instances of strings, numbers, bools and null
are, indeed, objects. IOW, I think JSON is using a faulty definition
of "object" in the spec. It's the one part of the spec that doesn't
make sense to me at all.

> Also, should we go even further and accept key:value pairs by themselves? :
>
> '"key":"value"'::JSON
>

This, though, is probably a step too far. It violates the JS part of JSON ...

> I don't think we should because doing so would be rather zany.  It
> would mean JSON content could be invalid in value context, as in:
>
> // JavaScript
> var content = "key" : "value";
>

Right.

Thanks, Joseph. I think this will be a great addition!

--
Mike Rylander
| VP, Research and Design
| Equinox Software, Inc. / The Evergreen Experts
| phone: 1-877-OPEN-ILS (673-6457)
| email: miker(at)esilibrary(dot)com
| web: http://www.esilibrary.com


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 15:05:13
Message-ID: 20100406150513.GA3491@alvh.no-ip.org
Lists: pgsql-hackers

Joseph Adams wrote:

> http://constellationmedia.com/~funsite/static/json-0.0.2.tar.bz2
>
> My json.c is now 1161 lines long, so I can't quite call it "small" anymore.

Just noticed you don't check the return value of malloc and friends.
How do you intend to handle that? There are various places that would
simply dump core with the 0.0.2 code. Within Postgres it's easy -- a
failed palloc aborts the transaction and doesn't continue running your
code. But in a standalone library that's probably not acceptable.

If we were to import this there are some lines that could be ripped out,
like 60 lines in the string buffer stuff and 130 lines for Unicode.
That brings your code just under 1000 lines.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Petr Jelinek <pjmodos(at)pjmodos(dot)net>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 16:03:12
Message-ID: 21867.1270569792@sss.pgh.pa.us
Lists: pgsql-hackers

Petr Jelinek <pjmodos(at)pjmodos(dot)net> writes:
> On 6.4.2010 7:57, Joseph Adams wrote:
>> To me, the most logical approach is to do the obvious thing: make
>> JSON's 'null' be SQL's NULL. For instance, SELECTing on a table with
>> NULLs in it and converting the result set to JSON would yield a
>> structure with 'null's in it. 'null'::JSON would yield NULL. I'm not
>> sure what startling results would come of this approach, but I'm
>> guessing this would be most intuitive and useful.

> +1

I think it's a pretty bad idea for 'null'::JSON to yield NULL. AFAIR
there is no other standard datatype for which the input converter can
yield NULL from a non-null input string, and I'm not even sure that the
InputFunctionCall protocol allows it. (In fact a quick look indicates
that it doesn't...)

To me, what this throws into question is not so much whether JSON null
should equate to SQL NULL (it should), but whether it's sane to accept
atomic values. If I understood the beginning of this discussion, that's
not strictly legal. I think it would be better for strict input mode
to reject this, and permissive mode to convert it to a non-atomic value.
Thus jsonify('null') wouldn't yield NULL but a structure containing a
null.
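
One way to read Tom's suggestion, sketched in Python (wrapping the
atomic value in a one-element array is an assumption; he does not
specify the containing structure):

```python
import json

def parse_json_text(text: str, strict: bool = True):
    """Hypothetical sketch: strict mode rejects atomic top-level
    values, per RFC 4627; permissive mode wraps an atomic value in
    an array so the result is never atomic (one possible reading of
    "a structure containing a null")."""
    value = json.loads(text)
    if isinstance(value, (dict, list)):
        return value
    if strict:
        raise ValueError("JSON text must be an object or array")
    return [value]

assert parse_json_text('[1, 2]') == [1, 2]
assert parse_json_text('{"a": 1}') == {"a": 1}
assert parse_json_text('null', strict=False) == [None]
```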

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 17:04:41
Message-ID: l2y603c8f071004061004o60cb50ferfd1482b2d345bf8b@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 6, 2010 at 11:05 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Joseph Adams wrote:
>
>> http://constellationmedia.com/~funsite/static/json-0.0.2.tar.bz2
>>
>> My json.c is now 1161 lines long, so I can't quite call it "small" anymore.
>
> Just noticed you don't check the return value of malloc and friends.
> How do you intend to handle that?  There are various places that would
> simply dump core with the 0.0.2 code.  Within Postgres it's easy -- a
> failed palloc aborts the transaction and doesn't continue running your
> code.  But in a standalone library that's probably not acceptable.
>
> If we were to import this there are some lines that could be ripped out,
> like 60 lines in the string buffer stuff and 130 lines for Unicode.
> That brings your code just under 1000 lines.

Let me be the first to suggest putting this code under the PostgreSQL license.

...Robert


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <pjmodos(at)pjmodos(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 17:28:51
Message-ID: r2v603c8f071004061028ie8515647m19a78dabb544c378@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 6, 2010 at 12:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Petr Jelinek <pjmodos(at)pjmodos(dot)net> writes:
>> On 6.4.2010 7:57, Joseph Adams wrote:
>>> To me, the most logical approach is to do the obvious thing: make
>>> JSON's 'null' be SQL's NULL.  For instance, SELECTing on a table with
>>> NULLs in it and converting the result set to JSON would yield a
>>> structure with 'null's in it.  'null'::JSON would yield NULL.  I'm not
>>> sure what startling results would come of this approach, but I'm
>>> guessing this would be most intuitive and useful.
>
>> +1
>
> I think it's a pretty bad idea for 'null'::JSON to yield NULL.  AFAIR
> there is no other standard datatype for which the input converter can
> yield NULL from a non-null input string, and I'm not even sure that the
> InputFunctionCall protocol allows it.  (In fact a quick look indicates
> that it doesn't...)

Oh. I missed this aspect of the proposal. I agree - that's a bad idea.

> To me, what this throws into question is not so much whether JSON null
> should equate to SQL NULL (it should), but whether it's sane to accept
> atomic values.

With this, I disagree. I see no reason to suppose that a JSON NULL
and an SQL NULL are the same thing.

> If I understood the beginning of this discussion, that's
> not strictly legal.  I think it would be better for strict input mode
> to reject this, and permissive mode to convert it to a non-atomic value.
> Thus jsonify('null') wouldn't yield NULL but a structure containing a
> null.

There's no obvious "structure" to convert this into.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Petr Jelinek <pjmodos(at)pjmodos(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 17:31:49
Message-ID: 23394.1270575109@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Apr 6, 2010 at 12:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> To me, what this throws into question is not so much whether JSON null
>> should equate to SQL NULL (it should), but whether it's sane to accept
>> atomic values.

> With this, I disagree. I see no reason to suppose that a JSON NULL
> and an SQL NULL are the same thing.

Oh. If they're not the same, then the problem is easily dodged, but
then what *is* a JSON null?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <pjmodos(at)pjmodos(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 18:10:32
Message-ID: s2j603c8f071004061110p2c4449d2n24b744127dfe2330@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 6, 2010 at 1:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Apr 6, 2010 at 12:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> To me, what this throws into question is not so much whether JSON null
>>> should equate to SQL NULL (it should), but whether it's sane to accept
>>> atomic values.
>
>> With this, I disagree.  I see no reason to suppose that a JSON NULL
>> and an SQL NULL are the same thing.
>
> Oh.  If they're not the same, then the problem is easily dodged, but
> then what *is* a JSON null?

I assume we're going to treat JSON much like XML: basically text, but
with some validation (and perhaps canonicalization) under the hood.
So a JSON null will be "null", just as a JSON boolean true value will
be "true". It would be pretty weird if storing "true" or "false" or
"4" or "[3,1,4,1,5,9]" into a json column and then reading it back
returned the input string, but at the same time storing "null" into
the column returned a SQL NULL.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Petr Jelinek <pjmodos(at)pjmodos(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 18:20:37
Message-ID: 24454.1270578037@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Apr 6, 2010 at 1:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Oh. If they're not the same, then the problem is easily dodged, but
>> then what *is* a JSON null?

> I assume we're going to treat JSON much like XML: basically text, but
> with some validation (and perhaps canonicalization) under the hood.
> So a JSON null will be "null", just as a JSON boolean true value
> will be "true". It would be pretty weird if storing "true" or
> "false" or "4" or "[3,1,4,1,5,9]" into a json column and then
> reading it back returned the input string, but at the same time
> storing "null" into the column returned a SQL NULL.

Hmm. So the idea is that all JSON atomic values are considered to be
text strings, even when they look like something else (like bools or
numbers)? That would simplify matters I guess, but I'm not sure about
the usability. In particular I'd want to have something that dequotes
the value so that I can get foo not "foo" when converting to SQL text.
(I'm assuming that quotes would be there normally, so as not to lose
the distinction between 3 and "3" in the JSON representation.)
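
The distinction the quotes preserve, illustrated in Python:

```python
import json

# Quotes distinguish the number 3 from the string "3" in JSON text:
assert json.loads('3') == 3        # a number
assert json.loads('"3"') == '3'    # a string
# Hence the need for a dequoting step: the raw JSON fragment is
# '"foo"', while the contained SQL text value would be 'foo'.
assert json.loads('"foo"') == 'foo'
```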

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <pjmodos(at)pjmodos(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 18:23:17
Message-ID: p2x603c8f071004061123wca0376e0se5f5333bca461f54@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 6, 2010 at 2:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Apr 6, 2010 at 1:31 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Oh.  If they're not the same, then the problem is easily dodged, but
>>> then what *is* a JSON null?
>
>> I assume we're going to treat JSON much like XML: basically text, but
>> with some validation (and perhaps canonicalization) under the hood.
>> So a JSON null will be "null", just a JSON boolean true value will be
>> "true".  It would be pretty weird if storing "true" or "false" or "4"
>> or "[3,1,4,1,5,9]" into a json column and then reading it back
>> returned the input string; but at the same time storing "null" into
>> the column returned a SQL NULL.
>
> Hmm.  So the idea is that all JSON atomic values are considered to be
> text strings, even when they look like something else (like bools or
> numbers)?  That would simplify matters I guess, but I'm not sure about
> the usability.

I'm not sure what the other option is. If you do SELECT col FROM
table, I'm not aware that you can return differently-typed values for
different rows...

> In particular I'd want to have something that dequotes
> the value so that I can get foo not "foo" when converting to SQL text.
> (I'm assuming that quotes would be there normally, so as not to lose
> the distinction between 3 and "3" in the JSON representation.)

Yes, that seems like a useful support function.

...Robert


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Petr Jelinek <pjmodos(at)pjmodos(dot)net>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 19:39:44
Message-ID: 4BBB8E00.2040307@gmail.com
Lists: pgsql-hackers

Tom Lane wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>
>> With this, I disagree. I see no reason to suppose that a JSON NULL
>> and an SQL NULL are the same thing.
>>
>
> Oh. If they're not the same, then the problem is easily dodged, but
> then what *is* a JSON null?
>
Probably the same as the JavaScript null.

regards,
Yeb Havinga


From: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Petr Jelinek <pjmodos(at)pjmodos(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Add JSON support
Date: 2010-04-06 20:09:00
Message-ID: y2qe7e5fefd1004061309sb1838e74n3acf9a887b8e11a5@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 6, 2010 at 12:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Petr Jelinek <pjmodos(at)pjmodos(dot)net> writes:
>> On 6.4.2010 7:57, Joseph Adams wrote:
>>> To me, the most logical approach is to do the obvious thing: make
>>> JSON's 'null' be SQL's NULL.  For instance, SELECTing on a table with
>>> NULLs in it and converting the result set to JSON would yield a
>>> structure with 'null's in it.  'null'::JSON would yield NULL.  I'm not
>>> sure what startling results would come of this approach, but I'm
>>> guessing this would be most intuitive and useful.
>
>> +1
>
> I think it's a pretty bad idea for 'null'::JSON to yield NULL.  AFAIR
> there is no other standard datatype for which the input converter can
> yield NULL from a non-null input string, and I'm not even sure that the
> InputFunctionCall protocol allows it.  (In fact a quick look indicates
> that it doesn't...)
>
> To me, what this throws into question is not so much whether JSON null
> should equate to SQL NULL (it should), but whether it's sane to accept
> atomic values.  If I understood the beginning of this discussion, that's
> not strictly legal.  I think it would be better for strict input mode
> to reject this, and permissive mode to convert it to a non-atomic value.
> Thus jsonify('null') wouldn't yield NULL but a structure containing a
> null.
>
>                        regards, tom lane
>

Actually, I kind of made a zany mistake here. If 'null'::JSON yielded
NULL, that would mean some type of automatic conversion was going on.
Likewise, '3.14159'::JSON shouldn't magically turn into a FLOAT.

I think the JSON datatype should behave more like TEXT. 'null'::JSON
would yield a JSON fragment containing 'null'. 'null'::JSON::TEXT
would yield the literal text 'null'. However, '3.14159'::JSON::FLOAT
should probably not be allowed as a precaution, as
'"hello"'::JSON::TEXT would yield '"hello"', not 'hello'. In other
words, casting to the target type directly isn't the same as parsing
JSON and extracting a value.

Perhaps there could be conversion functions. E.g.:

json_to_string('"hello"') yields 'hello'
json_to_number('3.14159') yields '3.14159' as text
(it is up to the user to cast it to the number type s/he wants)
json_to_bool('true') yields TRUE
json_to_null('null') yields NULL; json_to_null('nonsense') fails

string_to_json('hello') yields '"hello"' as JSON
number_to_json(3.14159) yields '3.14159' as JSON
bool_to_json(TRUE) yields 'true' as JSON
null_to_json(NULL) yields 'null' as JSON (kinda useless)

I wonder if these could all be reduced to two generic functions, like
json_to_value and value_to_json.
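
A sketch of the proposed conversion functions in Python (the names
mirror the hypothetical SQL functions above; none of this is existing
PostgreSQL API):

```python
import json

def json_to_string(j):
    v = json.loads(j)
    if not isinstance(v, str):
        raise TypeError("not a JSON string")
    return v

def json_to_number(j):
    v = json.loads(j)
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(v, bool) or not isinstance(v, (int, float)):
        raise TypeError("not a JSON number")
    return str(v)  # returned as text; the user casts to a number type

def json_to_bool(j):
    v = json.loads(j)
    if not isinstance(v, bool):
        raise TypeError("not a JSON bool")
    return v

def string_to_json(s): return json.dumps(s)
def number_to_json(n): return json.dumps(n)
def bool_to_json(b):   return json.dumps(b)

assert json_to_string('"hello"') == 'hello'
assert json_to_number('3.14159') == '3.14159'
assert json_to_bool('true') is True
assert string_to_json('hello') == '"hello"'
assert number_to_json(3.14159) == '3.14159'
assert bool_to_json(True) == 'true'
```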


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Petr Jelinek <pjmodos(at)pjmodos(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Add JSON support
Date: 2010-04-07 00:00:48
Message-ID: 1140.1270598448@sss.pgh.pa.us
Lists: pgsql-hackers

Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com> writes:
> Perhaps there could be conversion functions. E.g.:

Yeah, that's what I was thinking about.

> json_to_string('"hello"') yields 'hello'
> json_to_number('3.14159') yields '3.14159' as text
> (it is up to the user to cast it to the number type s/he wants)
> json_to_bool('true') yields TRUE
> json_to_null('null') yields NULL; json_to_null('nonsense') fails

> string_to_json('hello') yields '"hello"' as JSON
> number_to_json(3.14159) yields '3.14159' as JSON
> bool_to_json(TRUE) yields 'true' as JSON
> null_to_json(NULL) yields 'null' as JSON (kinda useless)

The null cases seem a bit useless. What might be helpful is to
translate JSON 'null' to and from SQL NULL in each of the other
conversions, in addition to their primary capability.

I'd go with using NUMERIC as the source/result type for the numeric
conversions. Forcing people to insert explicit coercions from text
isn't going to be particularly convenient to use.

> I wonder if these could all be reduced to two generic functions, like
> json_to_value and value_to_json.

value_to_json(any) might work, but the other way could not; and it seems
better to keep some symmetry between the directions.

regards, tom lane