Re: JSON and unicode surrogate pairs

From: Stefan Drees <stefan(at)drees(dot)name>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)2ndQuadrant(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON and unicode surrogate pairs
Date: 2013-06-11 14:04:53
Message-ID: 51B72E85.4030901@drees.name
Lists: pgsql-hackers

On 2013-06-11 15:23 CEST, Hannu Krosing wrote:
> On 06/11/2013 03:08 PM, Stefan Drees wrote:
>> ...
>>
>> What about this:
>> =# SELECT '{"measure":"seconds", "measure":42}'::json;
>> json
>> --------------------------------------
>> {"measure":42}
>>
>> I presume people who are used to storing metadata in "preceding" JSON
>> object members with duplicate names would want to decide, in the
>> client requesting the data, what to do with the metadata and at
>> what point to "drop" it, wouldn't they :-?)
> Seems like blatant misuse of JSON format :)
>
> I assume that as JSON is a _serialisation_ format, it should represent a
> data structure, not processing instructions.
>
> I can see no possible JavaScript structure which could produce duplicate
> keys when serialised.

Ahem, JSON is a notation that allows either an object or an array at the
top level. An object consists of (name, value) pairs.
Here the value can be any object, array, number, string, or one of the
literals null, false or true.
The name must be a string. That's it :-) No (unique) key, **and** also no
ordering on these names ;-) And as the RFC does not care where the data
came from or how it was represented before it became "JSON text" (the
top-level element of a JSON document), how should the parser know?
... But the needs of delta notation, commenting, or "streaming" have led
many applications to deliver multibags and to rely on ordering
conventions agreed upon in their data-exchange relations.
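To illustrate that a JSON object is, on the wire, just a sequence of (name, value) pairs, here is a minimal Python sketch. Python is used only because it is one of the languages mentioned in this message; nothing below is PostgreSQL code. The standard library's `json.loads` builds a dict by default, so the later "measure" silently wins. Its `object_pairs_hook` parameter instead receives every pair in document order, duplicates included.

```python
import json

# The member list from the example earlier in this thread: the same name twice.
text = '{"measure":"seconds", "measure":42}'

# Default decoding builds a dict, so a later member with the same name
# overwrites the earlier one.
as_dict = json.loads(text)

# object_pairs_hook is called with all (name, value) pairs in document
# order, so the "multibag" view of the object survives parsing.
as_pairs = json.loads(text, object_pairs_hook=list)

print(as_dict)   # {'measure': 42}
print(as_pairs)  # [('measure', 'seconds'), ('measure', 42)]
```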

> And I don't think that any standard JSON reader supports this either.

Oh yes. The convention is merely: keep all ("streaming") or keep the last
(whatever "last" may mean must be carefully agreed upon in the
interchange relation).
Everyone would be happy with these two scenarios, but the RFC as it
stands does not prevent an early-out (keep the first, like INSERT OR
IGNORE) :-))
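The three conventions above (keep all, keep the last, keep the first) can be sketched as pluggable hooks for Python's `json.loads`. The names `keep_all`, `keep_last`, `keep_first` and `parse` are hypothetical illustrations. They are not part of any library or of PostgreSQL. The sketch assumes Python 3.9+ for the built-in generic type hints.

```python
import json
from typing import Any, Callable

# A decoded object as the raw, ordered list of its (name, value) members.
Pairs = list[tuple[str, Any]]


def keep_all(pairs: Pairs) -> Pairs:
    """'Streaming' convention: every member survives, in document order."""
    return list(pairs)


def keep_last(pairs: Pairs) -> dict[str, Any]:
    """The last member with a given name wins (what a plain dict() does)."""
    return dict(pairs)


def keep_first(pairs: Pairs) -> dict[str, Any]:
    """Early-out in the spirit of INSERT OR IGNORE: later duplicates are dropped."""
    result: dict[str, Any] = {}
    for name, value in pairs:
        result.setdefault(name, value)  # only stores a name the first time
    return result


def parse(text: str, policy: Callable[[Pairs], Any]) -> Any:
    # The hook is applied to every object in the document, nested ones included.
    return json.loads(text, object_pairs_hook=policy)


doc = '{"measure":"seconds", "measure":42}'
for policy in (keep_all, keep_last, keep_first):
    print(policy.__name__, parse(doc, policy))
```

Because the hook runs for every object, nested objects get the same policy. Which policy is "right" is still a matter for the interchange relation.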

> If you want to store any JavaScript snippets in the database, use text.

JSON is language-agnostic. I use JSON more from Python and PHP than from
JS, but others do differently ...

> Or perhaps pl/v8 :)
>

Do you mean the "V8 Engine Javascript Procedural Language add-on for
PostgreSQL" (http://code.google.com/p/plv8js/)? I guess so.

I did not want to hijack the thread, as it centered more on what to
escape, where, and in which context (DB vs. client encoding).

As the newly created IETF JSON working group revises the JSON RFC on
its way to the standards track, there are currently also discussions on
what to do with Unicode surrogate pairs. See e.g. this thread, which
starts a summarizing effort:
http://www.ietf.org/mail-archive/web/json/current/msg00675.html
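Background on the surrogate-pair issue: a character outside the Basic Multilingual Plane can only be written as a JSON `\u` escape by using a UTF-16 surrogate pair. That pair has to be combined into one code point before it can be converted to UTF-8 or another server encoding. Below is a minimal Python sketch of the arithmetic. `combine_surrogates` is a hypothetical helper, and the sketch does not describe how PostgreSQL itself handles such escapes. It also shows why a lone surrogate causes trouble: Python's parser accepts one, but the result cannot be encoded as UTF-8.

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Return the code point encoded by a UTF-16 high/low surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits; together they address U+10000..U+10FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# U+1F600 written as a JSON escape sequence: "\ud83d\ude00"
cp = combine_surrogates(0xD83D, 0xDE00)
print(hex(cp))  # 0x1f600

# Python's json decoder performs the same combination for a well-formed pair.
decoded = json.loads('"\\ud83d\\ude00"')
print(decoded == chr(cp))  # True

# A lone high surrogate is accepted by the parser, but it is not a valid
# Unicode scalar value, so it cannot be encoded as UTF-8.
lone = json.loads('"\\ud83d"')
try:
    lone.encode("utf-8")
except UnicodeEncodeError:
    print("lone surrogate: cannot encode as UTF-8")
```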

Just in case it helps make the fresh JSON feature of PostgreSQL
bright, shining and future-proof :-)

Stefan.
