Re: JSON and unicode surrogate pairs

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON and unicode surrogate pairs
Date: 2013-06-11 13:42:06
Message-ID: 51B7292E.3070904@dunslane.net
Lists: pgsql-hackers


On 06/11/2013 09:16 AM, Hannu Krosing wrote:

>>
>> It's a pity that we don't have a non-error-producing conversion function
>> (or if we do, I haven't found it). Then we might adopt a rule for
>> processing unicode escapes that said "convert unicode escapes to the
>> database encoding
> Only when extracting JSON keys or values to text does it make sense to
> unescape to the database encoding.

That's exactly the scenario we are talking about. When emitting JSON the
functions have always emitted unicode escapes as they are in the text,
and will continue to do so.

>
> strings inside JSON itself are by definition utf8

We have deliberately extended that to allow JSON strings to be in any
database server encoding. That was argued back in the 9.2 timeframe and
I am not interested in re-litigating it.

The only issue at hand is how to handle unicode escapes (which in their
string form are pure ASCII) when emitting text strings.
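
As a rough illustration of that point (a Python sketch, not PostgreSQL
code; the variable names are made up for the example), the escaped form
can be stored in any server encoding, and the question only arises once
the escape is resolved to a character:

```python
import json

# The JSON text as stored: a backslash followed by "u20AC", all ASCII.
escaped = '"\\u20AC2.00"'
assert escaped.isascii()

# Storing the escaped form in a Latin1 database is therefore unproblematic.
stored = escaped.encode("latin-1")

# Resolving the escape yields U+20AC (the euro sign) followed by "2.00".
decoded = json.loads(escaped)

# UTF-8 can represent the resolved text; ISO 8859-1 (Latin1) cannot.
decoded.encode("utf-8")
try:
    decoded.encode("latin-1")
    representable = True
except UnicodeEncodeError:
    representable = False

print(representable)  # prints False
```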

>> if possible, and if not then emit them unchanged." which might be a
>> reasonable compromise.
> I'd opt for "... and if not then emit them quoted". The default should
> be not losing any data.

I don't know what this means at all. Quoted how? Let's say I have a
Latin1 database and have the following JSON string: "\u20AC2.00". In a
UTF8 database the text representation of this is €2.00 - what are you
saying it should be in the Latin1 database?
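
For concreteness, the "convert if possible, otherwise emit unchanged"
rule could look roughly like the Python sketch below. This is only an
illustration under stated assumptions, not the server implementation:
the function name unescape_for_encoding is hypothetical, Python codec
names stand in for database encodings, and surrogate pairs and escaped
backslashes are deliberately not handled.

```python
import re

# Matches a single \uXXXX escape. Simplification: does not check whether
# the backslash is itself escaped, and does not combine surrogate pairs.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape_for_encoding(json_string_body: str, encoding: str) -> str:
    """Resolve \\uXXXX escapes that `encoding` can represent; keep the rest."""

    def replace(match: re.Match) -> str:
        ch = chr(int(match.group(1), 16))
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Not representable in the target encoding: emit unchanged.
            return match.group(0)
        return ch

    return _ESCAPE.sub(replace, json_string_body)


# UTF8 database: the escape becomes the euro sign.
print(unescape_for_encoding(r"\u20AC2.00", "utf-8"))    # prints €2.00
# Latin1 database: no euro sign available, so the escape is left as is.
print(unescape_for_encoding(r"\u20AC2.00", "latin-1"))  # prints \u20AC2.00
```

Under this rule no information is lost in the Latin1 case, but the
extracted text still contains the six-character escape sequence rather
than a character.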

cheers

andrew
