Re: JSON and unicode surrogate pairs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON and unicode surrogate pairs
Date: 2013-06-10 14:18:08
Message-ID: 21439.1370873888@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> After thinking about this some more I have come to the conclusion that
> we should only do any de-escaping of \uxxxx sequences, whether or not
> they are for BMP characters, when the server encoding is utf8. For any
> other encoding, which is already a violation of the JSON standard
> anyway, and should be avoided if you're dealing with JSON, we should
> just pass them through even in text output. This will be a simple and
> very localized fix.

Hmm. I'm not sure that users will like this definition --- it will seem
pretty arbitrary to them that conversion of \u sequences happens in some
databases and not others.
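
To make the proposed rule concrete, here is a rough sketch of what it
would mean for text output. This is an illustration only, not the
backend's actual code: the function name deescape_unicode and its
server_encoding parameter are hypothetical, and it handles only \uxxxx
sequences, leaving every other backslash escape alone. It assumes the
input is already a syntactically valid JSON string body.

```python
import re

# Hypothetical illustration of the proposed rule; not PostgreSQL source.
_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')


def deescape_unicode(s: str, server_encoding: str) -> str:
    """De-escape \\uXXXX sequences only when the server encoding is UTF8."""
    if server_encoding.upper() not in ("UTF8", "UTF-8"):
        # Any other encoding: pass the escapes through unchanged.
        return s

    out = []
    pos = 0
    while pos < len(s):
        m = _U_ESCAPE.match(s, pos)
        if m is None:
            if s[pos] == '\\':
                # Some other escape (e.g. \\ or \n): copy both characters
                # so an escaped backslash cannot start a false \u match.
                out.append(s[pos:pos + 2])
                pos += 2
            else:
                out.append(s[pos])
                pos += 1
            continue

        code = int(m.group(1), 16)
        pos = m.end()

        if 0xD800 <= code <= 0xDBFF:
            # High surrogate: must be followed directly by a low surrogate.
            m2 = _U_ESCAPE.match(s, pos)
            low = int(m2.group(1), 16) if m2 else None
            if low is None or not (0xDC00 <= low <= 0xDFFF):
                raise ValueError("high surrogate without low surrogate")
            code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00)
            pos = m2.end()
        elif 0xDC00 <= code <= 0xDFFF:
            raise ValueError("low surrogate without preceding high surrogate")

        out.append(chr(code))

    return ''.join(out)
```

With this sketch, the same input yields different text depending only on
the database encoding: deescape_unicode(r'\u00e9', 'UTF8') gives the
single character U+00E9, while deescape_unicode(r'\u00e9', 'LATIN1')
returns the six-character escape unchanged.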

> We'll still have to deal with this issue when we get to binary storage
> of JSON, but that's not something we need to confront today.

Well, if we have to break backwards compatibility when we try to do
binary storage, we're not going to be happy either. So I think we'd
better have a plan in mind for what will happen then.

regards, tom lane
