Re: JSON for PG 9.2

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-15 00:13:52
Message-ID: 4F121A40.6070508@dunslane.net
Lists: pgsql-hackers

On 01/14/2012 06:11 PM, Joey Adams wrote:
> On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>> Second, what should we do when the database encoding isn't UTF8? I'm
>> inclined to emit a \unnnn escape for any non-ASCII character (assuming it
>> has a unicode code point - are there any code points in the non-unicode
>> encodings that don't have unicode equivalents?). The alternative would be to
>> fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
>> to deal with JSON should be using UTF8 anyway, but we still have to deal
>> with these things. What about SQL_ASCII? If there's a non-ASCII sequence
>> there we really have no way of telling what it should be. There at least I
>> think we should probably error out.
> I don't think there is a satisfying solution to this problem. Things
> working against us:
>
> * Some server encodings support characters that don't map to Unicode
> characters (e.g. unused slots in Windows-1252). Thus, converting to
> UTF-8 and back is lossy in general.
>
> * We want a normalized representation for comparison. This will
> involve a mixture of server and Unicode characters, unless the
> encoding is UTF-8.
>
> * We can't efficiently convert individual characters to and from
> Unicode with the current API.
>
> * What do we do about \u0000 ? TEXT datums cannot contain NUL characters.
>
> I'd say just ban Unicode escapes and non-ASCII characters unless the
> server encoding is UTF-8, and ban all \u0000 escapes. It's easy, and
> whatever we support later will be a superset of this.
>
> Strategies for handling this situation have been discussed in prior
> emails. This is where things got stuck last time.
>

Well, from where I'm coming from, NULs are not a problem. But
escape_json() is currently totally encoding-unaware. It produces \unnnn
escapes for low-ASCII characters, and just passes through characters
with the high bit set. That's possibly OK for EXPLAIN output - we really
don't want EXPLAIN failing. But maybe we should ban JSON output for
EXPLAIN if the encoding isn't UTF8.
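
For context, the behaviour I'm describing is roughly this - a simplified,
self-contained sketch of the escaping rules, not the actual backend code:

    /*
     * Sketch of the escaping behaviour described above: backslash-escape
     * quotes and backslashes, emit \u00nn for control characters, and pass
     * bytes with the high bit set through untouched (i.e. with no knowledge
     * of the database encoding). Illustrative only.
     */
    #include <stdio.h>

    static void
    escape_json_sketch(const char *s)
    {
        const unsigned char *p;

        putchar('"');
        for (p = (const unsigned char *) s; *p; p++)
        {
            if (*p == '"' || *p == '\\')
                printf("\\%c", *p);
            else if (*p < 0x20)
                printf("\\u%04x", *p);  /* low ASCII -> \unnnn */
            else
                putchar(*p);            /* high-bit bytes pass through as-is */
        }
        putchar('"');
    }

    int
    main(void)
    {
        escape_json_sketch("tab:\t quote:\" latin1 e-acute: \xe9");
        putchar('\n');
        return 0;
    }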

Another question in my mind is what to do when the client encoding isn't
UTF8.

None of these is an insurmountable problem, ISTM - we just need to make
some decisions.

cheers

andrew
