Re: Duplicate JSON Object Keys

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, "pgsql-hackers(at)postgresql(dot)org Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Duplicate JSON Object Keys
Date: 2013-03-08 21:21:46
Message-ID: 513A566A.5090909@dunslane.net
Lists: pgsql-hackers


On 03/08/2013 04:01 PM, Alvaro Herrera wrote:
> Hannu Krosing wrote:
>> On 03/08/2013 09:39 PM, Robert Haas wrote:
>>> On Thu, Mar 7, 2013 at 2:48 PM, David E. Wheeler <david(at)justatheory(dot)com> wrote:
>>>> In the spirit of being liberal about what we accept but strict about what we store, it seems to me that JSON object key uniqueness should be enforced either by throwing an error on duplicate keys, or by flattening so that the latest key wins (as happens in JavaScript). I realize that tracking keys will slow parsing down, and potentially make it more memory-intensive, but such is the price for correctness.
>>> I'm with Andrew. That's a rathole I emphatically don't want to go
>>> down. I wrote this code originally, and I had the thought clearly in
>>> mind that I wanted to accept JSON that was syntactically well-formed,
>>> not JSON that met certain semantic constraints.
>> If it does not meet these "semantic" constraints, then it is not
>> really JSON - it is merely JSON-like.
>>
>> This sounds very much like MySQL's decision to support the timestamp
>> "0000-00-00 00:00" - syntactically correct, but semantically wrong.
> Is it wrong? The standard cited says SHOULD, not MUST.

Here's what RFC 2119 says about that wording:

3. SHOULD This word, or the adjective "RECOMMENDED", mean that there
may exist valid reasons in particular circumstances to ignore a
particular item, but the full implications must be understood and
carefully weighed before choosing a different course.

So we're allowed to do as Robert chose, and I think there are good
reasons for doing so (apart from anything else, checking for duplicate
keys would slow down the parser enormously).

Now you could argue that in that case the extractor functions should
allow it too, and it would probably be fairly easy to change them to do
so. In that case we need to decide which key wins. We could treat a
lexically later field as overriding an earlier field of the same name,
which I think is what David expected. That's what plv8 does (i.e. it's
how v8 interprets JSON):

andrew=# create or replace function jget(t json, fld text) returns text
         language plv8 as ' return t[fld]; ';
CREATE FUNCTION
andrew=# select jget('{"f1":"x","f1":"y"}','f1');
 jget
------
 y
(1 row)
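
For what it's worth, if the extractor operators in the pending patch took
the same last-key-wins approach, I'd expect something along these lines
(a hypothetical session; the output is only meant to illustrate the rule,
not to describe what the patch does today):

andrew=# select '{"f1":"x","f1":"y"}'::json -> 'f1';
 ?column?
----------
 "y"
(1 row)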

Or you could take the position I originally took: that, given the RFC
wording, we should raise an error when a duplicate key is found.
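
If we did raise an error, the check wouldn't necessarily have to live in
the parser. Here's a rough sketch (the function name is mine, and it
assumes the json_each enumerator from the same patch reports duplicate
keys as separate rows rather than collapsing them):

create or replace function json_object_has_dup_keys(t json) returns boolean
language sql as $$
    -- json_each emits one row per top-level field, so a repeated key
    -- appears more than once and trips the HAVING clause
    select exists (
        select key
        from json_each(t)
        group by key
        having count(*) > 1
    );
$$;

Something along those lines could back a CHECK constraint while we argue
about what the parser itself should do, though of course it only looks at
the top level of the object.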

I can live with either view.

cheers

andrew
