Re: JSON and unicode surrogate pairs

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON and unicode surrogate pairs
Date: 2013-06-09 23:47:24
Message-ID: 16817.1370821644@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I did that, but it's evident from the buildfarm that there's more work
> to do. The problem is that we do the de-escaping as we lex the JSON to
> construct the lookahead token, and at that stage we don't know whether
> or not it's really going to be needed. That means we can cause errors to
> be raised in far too many places. It's failing on this line:
> converted = pg_any_to_server(utf8str, utf8len, PG_UTF8);
> even though the operator in use ("->") doesn't even use the de-escaped
> value.

> The real solution is going to be to delay the de-escaping of the string
> until it is known to be wanted. That's unfortunately going to be a bit
> invasive, but I can't see a better solution. I'll work on it ASAP.

Not sure that this idea isn't a dead end. IIUC, you're proposing to
jump through hoops in order to avoid complaining about illegal JSON
data, essentially just for backwards compatibility with 9.2's failure to
complain about it. If we switch over to a pre-parsed (binary) storage
format for JSON values, won't we be forced to throw these errors anyway?
If so, maybe we should just take the compatibility hit now while there's
still a relatively small amount of stored JSON data in the wild.
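
To make the trade-off concrete, here is a minimal sketch in C of what
deferred de-escaping could look like. It is not PostgreSQL's lexer code:
RawStringToken, parse_hex4, combine_surrogates, and decode_unicode_escapes
are hypothetical names invented for illustration. The token records only
where the raw string lies; \uXXXX escapes are decoded, and invalid
surrogate sequences rejected, only when a caller asks for the value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical token: the lexer stores only the raw, still-escaped bytes. */
typedef struct {
    const char *raw;   /* first byte after the opening quote */
    size_t      len;   /* number of bytes before the closing quote */
} RawStringToken;

/* Parse four hex digits; returns -1 if any character is not a hex digit. */
static int parse_hex4(const char *s)
{
    int v = 0;
    for (int i = 0; i < 4; i++) {
        char c = s[i];
        v <<= 4;
        if (c >= '0' && c <= '9')      v |= c - '0';
        else if (c >= 'a' && c <= 'f') v |= c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') v |= c - 'A' + 10;
        else return -1;
    }
    return v;
}

/* Combine a UTF-16 high/low surrogate pair into one code point. */
static uint32_t combine_surrogates(uint32_t hi, uint32_t lo)
{
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00);
}

/*
 * Decode the \uXXXX escapes of a token on demand. Other bytes and other
 * escapes are skipped; a real implementation would copy them through.
 * Returns the number of code points found, or -1 on a malformed escape
 * or an unpaired surrogate. At most cap code points are stored in out.
 */
static int decode_unicode_escapes(const RawStringToken *tok,
                                  uint32_t *out, size_t cap)
{
    size_t i = 0;
    int n = 0;

    while (i < tok->len) {
        if (tok->raw[i] != '\\') { i++; continue; }
        if (i + 1 >= tok->len) return -1;              /* dangling backslash */
        if (tok->raw[i + 1] != 'u') { i += 2; continue; }
        if (i + 6 > tok->len) return -1;               /* truncated \uXXXX */

        int hi = parse_hex4(tok->raw + i + 2);
        if (hi < 0) return -1;
        i += 6;

        uint32_t cp = (uint32_t) hi;
        if (hi >= 0xD800 && hi <= 0xDBFF) {
            /* A high surrogate must be followed by \uDC00..\uDFFF. */
            if (i + 6 > tok->len || tok->raw[i] != '\\' || tok->raw[i + 1] != 'u')
                return -1;
            int lo = parse_hex4(tok->raw + i + 2);
            if (lo < 0xDC00 || lo > 0xDFFF) return -1;
            i += 6;
            cp = combine_surrogates((uint32_t) hi, (uint32_t) lo);
        } else if (hi >= 0xDC00 && hi <= 0xDFFF) {
            return -1;                                 /* low surrogate first */
        }

        if ((size_t) n < cap) out[n] = cp;
        n++;
    }
    return n;
}
```

With this shape, an operator that never needs the de-escaped value never
calls decode_unicode_escapes and so never raises an error for a bad
escape. A pre-parsed storage format would have to run the same checks on
input, which is the point made above.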

regards, tom lane
