JSON for PG 9.2

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: JSON for PG 9.2
Date: 2011-12-05 20:12:35
Message-ID: 201112052012.pB5KCZo18363@momjian.us
Lists: pgsql-hackers

Where are we with adding JSON for Postgres 9.2? We got bogged down in
the data representation last time we discussed this.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-12 19:58:54
Message-ID: CA+TgmoZqsak9Jma-subdOhvPuEVm_L45z=1fg4BHGa+qv1soWg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 5, 2011 at 3:12 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Where are we with adding JSON for Postgres 9.2?  We got bogged down in
> the data representation last time we discussed this.

We're waiting for you to send a patch that resolves all
previously-raised issues. :-)

In all seriousness, I think the right long-term answer here is to have
two data types - one that simply validates JSON and stores it as text,
and the other of which uses some binary encoding. The first would be
similar to our existing xml datatype and would be suitable for cases
when all or nearly all of your storage and retrieval operations will
be full-column operations, and the json type is basically just
providing validation. The second would be optimized for pulling out
(or, perhaps, replacing) pieces of arrays or hashes, but would have
additional serialization/deserialization overhead when working with
the entire value. As far as I can see, these could be implemented
independently of each other and in either order, but no one seems to
have yet found the round tuits.
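
For illustration only, here is how the two might look side by side; the
type names and the extraction operator are hypothetical:

create table events_t (payload json);    -- text type: validate once, store as text
create table events_b (payload jsonb);   -- binary type: cheap access to pieces
select payload from events_t;            -- full-column retrieval, nothing re-parsed
select payload -> 'user' from events_b;  -- pulls out one piece without a full re-parse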

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-12 20:38:52
Message-ID: CA+U5nMKFrZKdO=FFNUMLHTNgsWWGaht0jaaP6eKvShU-2gPyOQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 12, 2011 at 7:58 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Dec 5, 2011 at 3:12 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> Where are we with adding JSON for Postgres 9.2?  We got bogged down in
>> the data representation last time we discussed this.
>
> We're waiting for you to send a patch that resolves all
> previously-raised issues.  :-)
>
> In all seriousness, I think the right long-term answer here is to have
> two data types - one that simply validates JSON and stores it as text,
> and the other of which uses some binary encoding.  The first would be
> similar to our existing xml datatype and would be suitable for cases
> when all or nearly all of your storage and retrieval operations will
> be full-column operations, and the json type is basically just
> providing validation.  The second would be optimized for pulling out
> (or, perhaps, replacing) pieces of arrays or hashes, but would have
> additional serialization/deserialization overhead when working with
> the entire value.  As far as I can see, these could be implemented
> independently of each other and in either order, but no one seems to
> have yet found the round tuits.

Rather than fuss with specific data formats, why not implement
something a little more useful?

At present we can have typmods passed as a cstring, so it should be
possible to add typmods onto the TEXT data type.

e.g. TEXT('JSON'), TEXT('JSONB')

We then invent a new catalog table called pg_text_format which has
oid PRIMARY KEY
textformatname UNIQUE
textformatvalidationproc
textformatstorageproc

The typmod must reduce to a single integer, so we just store the
integer. If no typmod, we store 0, so we have a fastpath for normal
TEXT datatypes.

This would then allow people to have variations of the TEXT type that
support conversions, casts, indexing etc without additional fuss and
without everything else outside the database breaking because it
doesn't know that datatype name.

We could then support JSON (both kinds), YAML, etc
as well as providing a way to add validation into the datatype itself.
We can replace citext with TEXT('CASE_INSENSITIVE')

Think of this as using the object-relational capabilities of Postgres
to extend the TEXT data type.
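
To sketch it concretely (hypothetical syntax and catalog shape; none of
this exists today):

create table pg_text_format (
    oid                      oid primary key,
    textformatname           name unique,
    textformatvalidationproc regproc,
    textformatstorageproc    regproc
);

create table docs (body TEXT('JSON'));  -- the typmod selects the validator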

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-12 20:54:09
Message-ID: CA+TgmobE9D=ERO4zfydvh4Fr0W2axzMjw181uGryTLsRGf24vg@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 12, 2011 at 3:38 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> Rather than fuss with specific data formats, why not implement
> something a little more useful?
>
> At present we can have typmods passed as a cstring, so it should be
> possible to add typmods onto the TEXT data type.
>
> e.g. TEXT('JSON'), TEXT('JSONB')
>
> We then invent a new catalog table called pg_text_format which has
> oid PRIMARY KEY
> textformatname UNIQUE
> textformatvalidationproc
> textformatstorageproc
>
> The typmod must reduce to a single integer, so we just store the
> integer. If no typmod, we store 0, so we have a fastpath for normal
> TEXT datatypes.
>
> This would then allow people to have variations of the TEXT type that
> support conversions, casts, indexing etc without additional fuss and
> without everything else outside the database breaking because it
> doesn't know that datatype name.
>
> We could then support JSON (both kinds), YAML, etc
> as well as providing a way to add validation into the datatype itself.
> We can replace citext with TEXT('CASE_INSENSITIVE')
>
> Think of this as using the object-relational capabilities of Postgres
> to extend the TEXT data type.

Well, it's arguable that text-format JSON and YAML and our existing
XML datatype ought to share some structure with text, but
binary-format JSON is a different animal altogether; you might as well
propose having text('int8'). In any case, I doubt that trying to make
the typmod provide subclassing behavior is going to work out very
well. There are way too many places that assume that the typmod can
just be discarded. I don't think that's going to fly, because
=(text,text) probably has different semantics from =(json,json).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-12 21:08:25
Message-ID: CA+U5nMJkRkZ6BE1hhpfULqVBpDrPJn2vBcmHm1vSQRbbGzNaYw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 12, 2011 at 8:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> There are way too many places that assume that the typmod can
> just be discarded.

If true, that probably ought to be documented cos it sounds fairly important.

Where and when is it true?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-12 21:22:09
Message-ID: CA+TgmobCndvNkBO8QNKUFgVT5e66mEQBKg9P7Jp7EbN==-qpjQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 12, 2011 at 4:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> On Mon, Dec 12, 2011 at 8:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> There are way too many places that assume that the typmod can
>> just be discarded.
>
> If true, that probably ought to be documented cos it sounds fairly important.
>
> Where and when is it true?

I'm not going to go compile an exhaustive list, since that would take
a week and I don't have any particular desire to invest that much time
in it, but just to take a couple of simple examples:

rhaas=# create or replace function wuzzle(numeric(5,2)) returns int as
$$select 1$$ language sql;
CREATE FUNCTION
rhaas=# \df wuzzle
                        List of functions
 Schema |  Name  | Result data type | Argument data types |  Type
--------+--------+------------------+---------------------+--------
 public | wuzzle | numeric          |                     | normal
 public | wuzzle | integer          | numeric             | normal
(2 rows)

rhaas=# select pg_typeof(1.23::numeric(5,2));
 pg_typeof
-----------
 numeric
(1 row)

There are a very large number of others. Possibly grepping for places
where we do getBaseType() rather than getBaseTypeAndTypmod() would be
a way to find some of them.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-12 21:23:44
Message-ID: 1323725024.20924.16.camel@vanquo.pezone.net
Lists: pgsql-hackers

On Mon, 2011-12-12 at 21:08 +0000, Simon Riggs wrote:
> On Mon, Dec 12, 2011 at 8:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> > There are way too many places that assume that the typmod can
> > just be discarded.
>
> If true, that probably ought to be documented cos it sounds fairly important.
>
> Where and when is it true?

Function arguments and return values, for example.


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-12 21:26:00
Message-ID: 80A26E37-4B0E-495C-BEC2-2FFB33D23547@justatheory.com
Lists: pgsql-hackers

On Dec 12, 2011, at 12:54 PM, Robert Haas wrote:

> I don't think that's going to fly, because
> =(text,text) probably has different semantics from =(json,json).

No question:

david=# select '{"foo": 1, "bar": 2}'::json = '{"bar": 2, "foo": 1}'::json;
 ?column?
----------
 t
(1 row)
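
For contrast, the same comparison as plain text is false, since the key
order differs:

david=# select '{"foo": 1, "bar": 2}'::text = '{"bar": 2, "foo": 1}'::text;
 ?column?
----------
 f
(1 row)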

Best,

David


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-12 21:27:42
Message-ID: 4EE671CE.6010304@dunslane.net
Lists: pgsql-hackers

On 12/12/2011 03:54 PM, Robert Haas wrote:
> On Mon, Dec 12, 2011 at 3:38 PM, Simon Riggs<simon(at)2ndquadrant(dot)com> wrote:
>> Rather than fuss with specific data formats, why not implement
>> something a little more useful?
>>
>> At present we can have typmods passed as a cstring, so it should be
>> possible to add typmods onto the TEXT data type.
>>
>> e.g. TEXT('JSON'), TEXT('JSONB')
>>

[...]
> There are way too many places that assume that the typmod can
> just be discarded. I don't think that's going to fly, because
> =(text,text) probably has different semantics from =(json,json).
>

And certain places where they are not allowed at all, I think (unless I
am misremembering the early debates about enum types and output functions).

cheers

andrew


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-12 21:34:02
Message-ID: 4EE6734A.2060604@agliodbs.com
Lists: pgsql-hackers

Bruce,

I thought that Joseph Adams was still working on this, sponsored by
Heroku. Joseph?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Peter van Hardenberg <pvh(at)pvh(dot)ca>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 00:51:54
Message-ID: CAAcg=kUJDjx4QQoUt3Ebgj-mSYp_uGj90aR7x=fQ8ez2DO-o8Q@mail.gmail.com
Lists: pgsql-hackers

We reached out to Joseph to see if we could help sponsor the project,
but never really heard back from him.

Because we haven't heard from him in a while we've been using PL/V8 to
validate a JSON datatype simulated by a DOMAIN with a simple
acceptance function. (See below.) This is not ideally performant but
thanks to V8's JIT the JSON parser is actually reasonably good.

I think releasing something simple and non-performant with reasonable
semantics would be the best next step. If it were up to me, I'd
probably even try to just land PL/V8 as PL/JavaScript for 9.2 if the
crash bugs and deal breakers can be sifted out.

PL/V8 is fast, it's sandboxed, and while it doesn't provide GIN or
GIST operators out of the box, maybe those could be motivated by its
inclusion.

Andrew, you've been down in the guts here, what do you think?

-pvh

(Code sample.)

create or replace function valid_json(json text)
returns bool as $$
try { JSON.parse(json); return true }
catch(e) { return false}
$$ LANGUAGE plv8 IMMUTABLE STRICT;

select valid_json('{"key": "value"}'), valid_json('lol');
valid_json | t
valid_json | f
Time: 0.283 ms

create domain json
as text check(valid_json(VALUE));
create table jsononly(data json);

insert into jsononly values ('lol');
ERROR:  value for domain json violates check constraint "json_check"

insert into jsononly
values ('{"ok": true}');
INSERT 0 1

-p

On Mon, Dec 12, 2011 at 1:34 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Bruce,
>
> I thought that Joseph Adams was still working on this, sponsored by
> Heroku.  Joseph?
>
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com

--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt." -- Kurt Vonnegut


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Peter van Hardenberg <pvh(at)pvh(dot)ca>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Claes Jakobsson <claes(at)gluefinance(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 01:09:49
Message-ID: 33523926-C8CA-4DAE-A709-B0F6D87BA22D@justatheory.com
Lists: pgsql-hackers

On Dec 12, 2011, at 4:51 PM, Peter van Hardenberg wrote:

> Because we haven't heard from him in a while we've been using PL/V8 to
> validate a JSON datatype simulated by a DOMAIN with a simple
> acceptance function. (See below.) This is not ideally performant but
> thanks to V8's JIT the JSON parser is actually reasonably good.
>
> I think releasing something simple and non-performant with reasonable
> semantics would be the best next step. If it were up to me, I'd
> probably even try to just land PL/V8 as PL/JavaScript for 9.2 if the
> crash bugs and deal breakers can be sifted out.

Note that Claes Jakobsson has been working on a JSON data type using the Jansson JSON library.

http://pgxn.org/dist/pg-json/

I’ve submitted a pull request renaming it to jansson-json (though the data type is still "json"):

https://github.com/theory/pg-json/tree/pgxn

Anyway, it seems like a decent start to an extensible type implemented entirely as an extension. Claes tells me he plans to add index support soonish, so it could get to be pretty robust before long.

Just another stab at the problem to alert folks to.

Best,

David


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Peter van Hardenberg <pvh(at)pvh(dot)ca>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 01:36:02
Message-ID: 4EE6AC02.70709@dunslane.net
Lists: pgsql-hackers

On 12/12/2011 07:51 PM, Peter van Hardenberg wrote:
> We reached out to Joseph to see if we could help sponsor the project,
> but never really heard back from him.
>
> Because we haven't heard from him in a while we've been using PL/V8 to
> validate a JSON datatype simulated by a DOMAIN with a simple
> acceptance function. (See below.) This is not ideally performant but
> thanks to V8's JIT the JSON parser is actually reasonably good.
>
> I think releasing something simple and non-performant with reasonable
> semantics would be the best next step. If it were up to me, I'd
> probably even try to just land PL/V8 as PL/JavaScript for 9.2 if the
> crash bugs and deal breakers can be sifted out.
>
> PL/V8 is fast, it's sandboxed, and while it doesn't provide GIN or
> GIST operators out of the box, maybe those could be motivated by its
> inclusion.
>
> Andrew, you've been down in the guts here, what do you think?

The trouble with using JSON.parse() as a validator is that it's probably
doing way too much work. PLV8 is cool, and I keep trying to get enough
time to work on it more, but I don't think it's a substitute for a JSON
type with a purpose built validator and some native operations. I think
these efforts can continue in parallel.

cheers

andrew


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Peter van Hardenberg <pvh(at)pvh(dot)ca>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 01:37:13
Message-ID: CAAZKuFas4ViPODFdmf8Uybx3JuS2E+KyLtHjVu=9BWdjxie=CQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 12, 2011 at 4:51 PM, Peter van Hardenberg <pvh(at)pvh(dot)ca> wrote:
> PL/V8 is fast, it's sandboxed, and while it doesn't provide GIN or
> GIST operators out of the box, maybe those could be motivated by its
> inclusion.

I also feel that a big problem with JSON as a data type is that there
is not a powerful, common navigation method. JSON path is basically
pretty obscure by comparison to XPath. As a result, the common
approach to navigation in a JSON structure is basically "write
procedures". And that is only perfectly supported by a full-blown
interpreter.

So that's why I'm personally more inclined to lend my attention to
embedding JavaScript entirely. Not to say there aren't areas ripe for
improvement:

* It'd be nice to pass intermediate in-memory representations rather
than calling JSON.parse all the time, similar to OPAQUE except sound
(so bogus pointers cannot be passed). Basically, an "ephemeral type".
It could save a lot of work when composing operators. I've needed this
for other projects, but for much the same reason.

* It'd be nice to be able to safely define indexes in a trusted
language somehow, writing the penalty and split functions, et al.
Right now it's my recollection that defining GiST operators in a naive
port to Javascript would give you the power to return garbage that is
not merely wrong, but could also crash Postgres if it uses a bogus
index. Ready and willing to be corrected*

* Some kind of partial toasting of large datums (I think Simon Riggs
briefly glossed over such an idea when we were talking about this
general use case)

But nothing I can quickly identify in Postgres as-is is opposed to
any of these improvements at a design level, so they can be chipped
off into incremental work in the future.

--
fdr

* Madness, you say? http://bellard.org/jslinux/, if your browser is
new enough. The relevant spec:
https://www.khronos.org/registry/typedarray/specs/latest/


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 01:46:41
Message-ID: CAAZKuFavx_pEAE_t5YEiENATh2v=zBq9i5f9NxCp=bkYSSCrmA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 12, 2011 at 5:36 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> The trouble with using JSON.parse() as a validator is that it's probably
> doing way too much work. PLV8 is cool, and I keep trying to get enough time
> to work on it more, but I don't think it's a substitute for a JSON type with
> a purpose built validator and some native operations. I think these efforts
> can continue in parallel.

Hmm. Maybe? While I'm sure things could be faster, we've had results
that are fast enough to be usable even with constant reparsing. Here
are some microbenchmarks I did some time ago where I tried to find the
overhead of calling JSON.parse and doing some really simple stuff in
V8 that I thought would maximize the amount of constant-time overhead:

https://gist.github.com/1150804

On my workstation, one core was able to do 130,000 JSON.parses per
second, plus the other stuff necessary to create an index. One could maybe try to
improve the speed and memory footprint on large documents by having
validators that don't actually build the V8 representation and
possibly defining a space of operators that are known to build, by
induction, valid JSON without rechecks.

But in the end, I think there's already a class of problem where the
performance plv8 provides is already quite sufficient, and provides a
much more complete and familiar approach to the problem of how people
choose to navigate, project, and manipulate JSON documents.

I also haven't tried this for larger documents, as I was trying to get
a sense of how much time was spent in a few primitive operations, and
not testing performance with regard to document length.

--
fdr


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 03:08:28
Message-ID: 4EE6C1AC.8080307@dunslane.net
Lists: pgsql-hackers

On 12/12/2011 08:46 PM, Daniel Farina wrote:
> On Mon, Dec 12, 2011 at 5:36 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>> The trouble with using JSON.parse() as a validator is that it's probably
>> doing way too much work. PLV8 is cool, and I keep trying to get enough time
>> to work on it more, but I don't think it's a substitute for a JSON type with
>> a purpose built validator and some native operations. I think these efforts
>> can continue in parallel.
> Hmm. Maybe? While I'm sure things could be faster, we've had results
> that are fast enough to be usable even with constant reparsing. Here
> are some microbenchmarks I did some time ago where I tried to find the
> overhead of calling JSON.parse and doing some really simple stuff in
> V8 that I thought would maximize the amount of constant-time overhead:
>
> https://gist.github.com/1150804
>
> On my workstation, one core was able to do 130,000 JSON.parses per
> second, plus the other stuff necessary to create an index. One could maybe try to
> improve the speed and memory footprint on large documents by having
> validators that don't actually build the V8 representation and
> possibly defining a space of operators that are known to build, by
> induction, valid JSON without rechecks.
>
> But in the end, I think there's already a class of problem where the
> performance plv8 provides is already quite sufficient, and provides a
> much more complete and familiar approach to the problem of how people
> choose to navigate, project, and manipulate JSON documents.
>
> I also haven't tried this for larger documents, as I was trying to get
> a sense of how much time was spent in a few primitive operations, and
> not testing performance with regard to document length.
>

Yes, I didn't mean to say it's not fast. For many cases I agree it is
probably fast enough.

But in the end PLV8 is likely to remain an add-on - nice as it can be, I
doubt the core team will want to add another PL to the core code,
especially one written in C++. If we want a JSON type built in, which
many people seem to want, we'll need to do it without the support of
PLV8, I think.

cheers

andrew


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 03:42:24
Message-ID: 1323747639-sup-9172@alvh.no-ip.org
Lists: pgsql-hackers


Excerpts from Daniel Farina's message of Mon Dec 12 22:37:13 -0300 2011:

> * It'd be nice to pass intermediate in-memory representations rather
> than calling JSON.parse all the time, similar to OPAQUE except sound
> (so bogus pointers cannot be passed). Basically, an "ephemeral type".
> It could save a lot of work when composing operators. I've needed this
> for other projects, but for much the same reason.

I remember there was the idea of doing something like this for regexes
-- have a specialized data type that saves the trouble of parsing it.
I imagine this is pretty much the same.

Nobody got around to doing anything about it though.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 05:22:29
Message-ID: 19229.1323753749@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 12/12/2011 03:54 PM, Robert Haas wrote:
>> There are way too many places that assume that the typmod can
>> just be discarded. I don't think that's going to fly, because
>> =(text,text) probably has different semantics from =(json,json).

> And certain places where they are not allowed at all, I think (unless I
> am misremembering the early debates about enum types and output functions).

Yeah. The current system design assumes that typmod specifies a
constraint of some sort. It is not possible to use it to change the
semantics of the datatype. The most obvious way in which this is true
is that selection of which operators and functions to apply to values
does not consider typmod of the values. This is not something we should
lightly revisit. We don't even have a handle on how to make domains
behave differently from their underlying datatypes, and those *do* have
their own type OIDs. Injecting typmod into the algorithm seems like a
disaster from here.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Peter van Hardenberg <pvh(at)pvh(dot)ca>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 05:25:59
Message-ID: 1323753959.20924.17.camel@vanquo.pezone.net
Lists: pgsql-hackers

On Mon, 2011-12-12 at 16:51 -0800, Peter van Hardenberg wrote:
> Because we haven't heard from him in a while we've been using PL/V8 to
> validate a JSON datatype simulated by a DOMAIN with a simple
> acceptance function. (See below.) This is not ideally performant but
> thanks to V8's JIT the JSON parser is actually reasonably good.
>
> I think releasing something simple and non-performant with reasonable
> semantics would be the best next step. If it were up to me, I'd
> probably even try to just land PL/V8 as PL/JavaScript for 9.2 if the
> crash bugs and deal breakers can be sifted out.

You don't need a new PL to do that. The existing PLs can also parse
JSON. So that's not nearly enough of a reason to consider adding this
new PL.
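
For instance, a rough PL/Perl equivalent of the valid_json() function
upthread (a sketch; it assumes the untrusted plperlu language and the
CPAN JSON module are installed):

create or replace function valid_json(j text)
returns bool as $$
  use JSON;                     -- CPAN module, hence plperlu rather than plperl
  eval { decode_json($_[0]) };  -- decode_json dies on invalid input
  return $@ ? 0 : 1;
$$ language plperlu immutable strict;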


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 05:43:48
Message-ID: CAHyXU0zQjirwm4wGtv2Wen25s4tYAwVgNpkyEHpQ859Zh_VydQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 12, 2011 at 7:37 PM, Daniel Farina <daniel(at)heroku(dot)com> wrote:
> On Mon, Dec 12, 2011 at 4:51 PM, Peter van Hardenberg <pvh(at)pvh(dot)ca> wrote:
>> PL/V8 is fast, it's sandboxed, and while it doesn't provide GIN or
>> GIST operators out of the box, maybe those could be motivated by its
>> inclusion.
>
> I also feel that a big problem with JSON as a data type is that there
> is not a powerful, common navigation method.  JSON path is basically
> pretty obscure by comparison to XPath.  As a result, the common
> approach to navigation in a JSON structure is basically "write
> procedures".  And that is only perfectly supported by a full-blown
> interpreter.

This. For me, the postgres xml extensions are 'a whole bunch of extra
stuff that comes with the xpath function'. How you get data into and
out of json is much more interesting than how the type is set up
internally or how it's parsed.

There must be some way to avoid iterative set up and tear down of json
objects (maybe as a cast?) -- postgres arrays of composites can set up
data in a way that feels very much like json in its construction (see
the sketch below).
One big reason why people might go to server side json is to try and
avoid tedious marshaling of data between client and server. The xpath
function has had its warts, but it offers very tight coupling between
your database and your documents. In the case of json, I think you
can go even further.
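
For example (a sketch; the composite type and the final cast are
hypothetical):

create type line_item as (sku text, qty int);
-- nested construction that already feels very json-like:
select row('order-1',
           array[row('a', 2)::line_item,
                 row('b', 1)::line_item]);
-- a cast could then serialize the whole thing in one step, e.g.:
-- select row(...)::order_type::json;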

merlin


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Peter van Hardenberg <pvh(at)pvh(dot)ca>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Claes Jakobsson <claes(at)gluefinance(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 07:03:33
Message-ID: 1323759813.6139.2.camel@Nokia-N900-42-11
Lists: pgsql-hackers

----- Original message -----
> On Dec 12, 2011, at 4:51 PM, Peter van Hardenberg wrote:
>
> > Because we haven't heard from him in a while we've been using PL/V8 to
> > validate a JSON datatype simulated by a DOMAIN with a simple
> > acceptance function. (See below.) This is not ideally performant but
> > thanks to V8's JIT the JSON parser is actually reasonably good.
>
> Note that Claes Jakobsson has been working on a JSON data type using the
> Jansson JSON library.
>
>    http://pgxn.org/dist/pg-json/

We recently needed to store/validate JSON data and be able to do some trivial extraction of values from it and went with pg-json.

The great benefit of having JSON as an extension type is being able to use an external library (Jansson is very small, MIT licensed, looks really well written and has been actively maintained for years) and not being tied to a yearly release cycle.

Postgres jumps through a lot of hoops to be extensible and I think a JSON type is the kind of thing that fits the bill of an extension perfectly.

Cheers,
Jan


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 07:06:36
Message-ID: CA+U5nMJm+u8ddJExSQAPPDuZxRJ3OHs01y-i7kUrSX868g1Zww@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 13, 2011 at 5:22 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 12/12/2011 03:54 PM, Robert Haas wrote:
>>> There are way too many places that assume that the typmod can
>>> just be discarded.  I don't think that's going to fly, because
>>> =(text,text) probably has different semantics from =(json,json).
>
>> And certain places where they are not allowed at all, I think (unless I
>> am misremembering the early debates about enum types and output functions).
>
> Yeah.  The current system design assumes that typmod specifies a
> constraint of some sort.  It is not possible to use it to change the
> semantics of the datatype.  The most obvious way in which this is true
> is that selection of which operators and functions to apply to values
> does not consider typmod of the values.  This is not something we should
> lightly revisit.  We don't even have a handle on how to make domains
> behave differently from their underlying datatypes, and those *do* have
> their own type OIDs.  Injecting typmod into the algorithm seems like a
> disaster from here.

I'm glad I didn't think of doing that before then.

Can we agree some wording to put into the docs? Sounds like some clear
warnings are needed.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Peter van Hardenberg <pvh(at)pvh(dot)ca>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 08:06:22
Message-ID: CAAcg=kV+mfHGf7qkgNoLi00aKYE1aF1suV6krHgD1nsYfa1_Gw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 12, 2011 at 9:25 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Mon, 2011-12-12 at 16:51 -0800, Peter van Hardenberg wrote:
> You don't need a new PL to do that.  The existing PLs can also parse
> JSON.  So that's not nearly enough of a reason to consider adding this
> new PL.

PL/V8 is interesting because it is very fast, sandboxed, and well
embedded with little overhead.

My experience with PL/Python and PL/Perl has not been thus, and
although they are handy if you want to break out and run system work,
they're not the kind of thing I'd consider for defining performant
operators with.

I feel PL/V8 has promise in that area.

--
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt." -- Kurt Vonnegut


From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 08:15:40
Message-ID: CAARyMpDzXxNjLOSgUCYu2sxfbLB6ZyNLeVR9CTaEv6xnNYUZSQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Dec 5, 2011 at 3:12 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Where are we with adding JSON for Postgres 9.2?  We got bogged down in
> the data representation last time we discussed this.

We should probably have a wiki page titled "JSON datatype status" to
help break the cycle we're in:

* Someone asks about the status of JSON

* Various ideas are suggested

* Patches are posted (maybe)

* More discussion about fundamental issues ensues

* Nothing is accomplished (as far as adding JSON to Postgres core)

There are several JSON implementations for Postgres floating around, including:

* http://pgxn.org/dist/pg-json/ : Mentioned in previous posts; a JSON
library based on Jansson supporting path subscript and equality
testing

* http://git.postgresql.org/gitweb/?p=json-datatype.git;a=summary :
The JSON datatype I implemented for Google Summer of Code 2010. It
has the most features of any implementation I'm aware of, but:

* Is in the form of a contrib module

* Preserves input text verbatim, a guarantee that will be broken
by more efficient implementations

* http://git.postgresql.org/gitweb/?p=json-datatype.git;a=shortlog;h=refs/heads/json2
: My rewrite of the JSON module that condenses input (but still
stores it as text) and addresses the issue of JSON when either the
server or client encoding is not UTF-8. Needs more features and
documentation, but like my other implementation, may not be quite what
we want.

Issues we've encountered include:

* Should JSON be stored as binary or as text?

* How do we deal with Unicode escapes and characters if the server or
client encoding is not UTF-8? Some (common!) character encodings have
code points that don't map to Unicode. Also, the charset conversion
modules do not provide fast entry points for converting individual
characters; each conversion involves a funcapi call.

---

In an application I'm working on, I store JSON-encoded objects in a
PostgreSQL database (using TEXT). I do so because it allows me to
store non-relational data that is easy for my JavaScript code to work
with.

However, I fail to see much benefit of a JSON type. When I need to
work with the data in PHP, C, or Haskell, I use JSON parsing libraries
available in each programming language. Although being able to
transform or convert JSON data within SQL might be convenient, I can't
think of any compelling reason to do it in my case.

Can someone clarify why a JSON type would be useful, beyond storage
and validation? What is a real-world, *concrete* example of a problem
where JSON manipulation in the database would be much better than:

* Using the application's programming language to manipulate the data
(which it does a lot already) ?

* Using CouchDB or similar instead of PostgreSQL?

- Joey


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 13:42:30
Message-ID: 4EE75646.1030906@2ndQuadrant.com
Lists: pgsql-hackers

On 12/13/2011 03:15 AM, Joey Adams wrote:
> We should probably have a wiki page titled "JSON datatype status" to
> help break the cycle we're in
>

I was about to point you to
http://wiki.postgresql.org/wiki/JSON_API_Brainstorm , only to realize
you created that thing in the first place. There's
http://wiki.postgresql.org/wiki/JSON_datatype_GSoC_2010 too. I don't
think it's completely stuck in a cycle yet--every pass around still
seems to accumulate some better-informed ideas than the last.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 13:44:54
Message-ID: CA+TgmobDSv0a7X8fBe_RA3S2PSUSE6p=61H9iwpaOxq36EF7kA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 13, 2011 at 12:25 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Mon, 2011-12-12 at 16:51 -0800, Peter van Hardenberg wrote:
>> Because we haven't heard from him in a while we've been using PL/V8 to
>> validate a JSON datatype simulated by a DOMAIN with a simple
>> acceptance function. (See below.) This is not ideally performant but
>> thanks to V8's JIT the JSON parser is actually reasonably good.
>>
>> I think releasing something simple and non-performant with reasonable
>> semantics would be the best next step. If it were up to me, I'd
>> probably even try to just land PL/V8 as PL/JavaScript for 9.2 if the
>> crash bugs and deal breakers can be sifted out.
>
> You don't need a new PL to do that.  The existing PLs can also parse
> JSON.  So that's not nearly enough of a reason to consider adding this
> new PL.

Just because all our languages are Turing-complete doesn't mean they
are all equally well-suited to every task. Of course, that doesn't
mean we'd add a whole new language just to get a JSON parser, but I
don't think that's really what Peter was saying. Rather, I think the
point is that embedded Javascript is *extremely* popular, lots and
lots of people are supporting it, and we ought to seriously consider
doing the same. It's hard to think of another PL that we could add
that would give us anywhere near the bang for the buck that Javascript
would.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 13:59:05
Message-ID: CAFj8pRA0JqX+OSgcvMCM=27EXKxhthwCTgQMHjpbfAYcqc+tnw@mail.gmail.com
Lists: pgsql-hackers

2011/12/13 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Tue, Dec 13, 2011 at 12:25 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On Mon, 2011-12-12 at 16:51 -0800, Peter van Hardenberg wrote:
>>> Because we haven't heard from him in a while we've been using PL/V8 to
>>> validate a JSON datatype simulated by a DOMAIN with a simple
>>> acceptance function. (See below.) This is not ideally performant but
>>> thanks to V8's JIT the JSON parser is actually reasonably good.
>>>
>>> I think releasing something simple and non-performant with reasonable
>>> semantics would be the best next step. If it were up to me, I'd
>>> probably even try to just land PL/V8 as PL/JavaScript for 9.2 if the
>>> crash bugs and deal breakers can be sifted out.
>>
>> You don't need a new PL to do that.  The existing PLs can also parse
>> JSON.  So that's not nearly enough of a reason to consider adding this
>> new PL.
>
> Just because all our languages are Turing-complete doesn't mean they
> are all equally well-suited to every task.  Of course, that doesn't
> mean we'd add a whole new language just to get a JSON parser, but I
> don't think that's really what Peter was saying.  Rather, I think the
> point is that embedded Javascript is *extremely* popular, lots and
> lots of people are supporting it, and we ought to seriously consider
> doing the same.  It's hard to think of another PL that we could add
> that would give us anywhere near the bang for the buck that Javascript
> would.

It is true - but there are a few questions:

* Will JSON be supported from SQL?
* Which Javascript engine will be supported?
* Will the integrated JSON type be supported from PL/Perl?

I'd like to see Javascript in pg, but I don't like Javascript just for
JSON. JSON should be independent of Javascript.

Regards

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 14:11:52
Message-ID: 4EE75D28.5030603@2ndQuadrant.com
Lists: pgsql-hackers

On 12/13/2011 08:44 AM, Robert Haas wrote:
> Rather, I think the point is that embedded Javascript is *extremely*
> popular, lots and
> lots of people are supporting it, and we ought to seriously consider
> doing the same. It's hard to think of another PL that we could add
> that would give us anywhere near the bang for the buck that Javascript
> would.
>

Quite. I hate Javascript with a passion, wish it would just go away and
stop meddling with my life. And even with that context, I think in-core
PL/V8 would be a huge advocacy win. PostgreSQL has this great
developer-oriented PL interface, it just doesn't work out of the box
with any of the "pop" languages right now.

Personal story on this. When my book came out, I was trying to take the
#1 spot on Packt's bestseller list, even if it was just for a day.
Never made it higher than #2. The #1 spot the whole time was "jQuery
1.4 Reference Guide", discussing the most popular JavaScript library out
there. And you know what? Over a year later, it's *still there*. At
no point did it ever drop out of that top spot. The number of people
who would consider server-side programming suddenly feasible if PL/V8
were easy to do is orders of magnitude larger than the current
PostgreSQL community.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 14:29:33
Message-ID: CAHyXU0we=+LhFXcdwvHPKrEBKduAqB4=qOQ1K+QC2g9Qb3mdsg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 13, 2011 at 8:11 AM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> On 12/13/2011 08:44 AM, Robert Haas wrote:
>>
>> Rather, I think the point is that embedded Javascript is *extremely*
>> popular, lots and
>> lots of people are supporting it, and we ought to seriously consider
>> doing the same.  It's hard to think of another PL that we could add
>> that would give us anywhere near the bang for the buck that Javascript
>> would.
>>
>
>
> Quite.  I hate Javascript with a passion, wish it would just go away and
> stop meddling with my life.  And even with that context, I think in-core
> PL/V8 would be a huge advocacy win.  PostgreSQL has this great
> developer-oriented PL interface, it just doesn't work out of the box with
> any of the "pop" languages right now.
>
> Personal story on this.  When my book came out, I was trying to take the #1
> spot on Packt's bestseller list, even if it was just for a day.  Never made
> it higher than #2.  The #1 spot the whole time was "jQuery 1.4 Reference
> Guide", discussing the most popular JavaScript library out there.  And you
> know what?  Over a year later, it's *still there*.  At no point did it ever
> drop out of that top spot.  The number of people who would consider
> server-side programming suddenly feasible if PL/V8 were easy to do is orders
> of magnitude larger than the current PostgreSQL community.

Yeah -- javascript is making strides server-side with technologies
like node.js. Like you I have really mixed feelings about javascript
-- there's a lot of nastiness but the asynchronous style of coding
javascript developers tend to like is a great fit for postgres both
inside the backend and in database clients. This is on top of the
already nifty type system synergy I mentioned upthread.

Postgres would in fact make a wonderful 'nosql' backend with some
fancy json support -- document style transmission to/from the backend
without sacrificing relational integrity in storage. Properly done
this would be a fabulous public relations coup (PostgreSQL = better
nosql).

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 14:44:24
Message-ID: 4EE764C8.3020404@dunslane.net
Lists: pgsql-hackers

On 12/13/2011 09:11 AM, Greg Smith wrote:
> On 12/13/2011 08:44 AM, Robert Haas wrote:
>> Rather, I think the point is that embedded Javascript is *extremely*
>> popular, lots and
>> lots of people are supporting it, and we ought to seriously consider
>> doing the same. It's hard to think of another PL that we could add
>> that would give us anywhere near the bang for the buck that Javascript
>> would.
>
> Quite. I hate Javascript with a passion, wish it would just go away
> and stop meddling with my life. And even with that context, I think
> in-core PL/V8 would be a huge advocacy win. PostgreSQL has this great
> developer-oriented PL interface, it just doesn't work out of the box
> with any of the "pop" languages right now.
>
> Personal story on this. When my book came out, I was trying to take
> the #1 spot on Packt's bestseller list, even if it was just for a
> day. Never made it higher than #2. The #1 spot the whole time was
> "jQuery 1.4 Reference Guide", discussing the most popular JavaScript
> library out there. And you know what? Over a year later, it's *still
> there*. At no point did it ever drop out of that top spot. The
> number of people who would consider server-side programming suddenly
> feasible if PL/V8 were easy to do is orders of magnitude larger than
> the current PostgreSQL community.

I think your passion is probably somewhat misdirected. I've long thought
JS would be a good fit for Postgres. It's naturally sandboxed and its
type system fits ours quite well. And, as you say, it's massively
popular and getting a major second wind thanks to things like JQuery,
Ext-JS and node.js. This last one has certainly convinced lots of people
that JS is not just for browsers any more.

Having said that, don't underestimate the complexity of trying to build
in PLV8. In its current incarnation the interface is written in C++ (and
of course so is the underlying V8 engine). I have been doing some
development work for it, even though my C++ is rather rusty (that's an
understatement, folks), which is why I haven't got a lot more done - I'm
just having to go slowly and with reference books by my side. So either
we'd need to rewrite the glue code entirely in C (and it's littered with
C++isms) and handle the difficulties of embedding a C++ library, or we'd
have a major new build infrastructure dependency which could well give
us a major case of developmental indigestion.

cheers

andrew


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 15:22:59
Message-ID: CAFNqd5WzQs+B7W=H8FVHjwH2sQoTiH2p6HHCZp6OQ5suLyH34g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 13, 2011 at 8:44 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Rather, I think the
> point is that embedded Javascript is *extremely* popular, lots and
> lots of people are supporting it, and we ought to seriously consider
> doing the same.  It's hard to think of another PL that we could add
> that would give us anywhere near the bang for the buck that Javascript
> would.

+1 to that.

I'm not a huge fan of JS; wish that one of the Scheme variations had
"made it" instead.

But it's clear that a LOT of fairly successful work has gone into
making JS implementations performant, and it's clearly heavily used.
JS+hstore would probably draw in a bunch of users, and tempt them to
the "SQL dark side" :-).

Wanting a JSON processor isn't quite a good enough reason to add C++
support in order to draw in a JS interpreter. But I don't imagine
things are restricted to just 1 JS implementation, and JSON isn't the
only reason to do so.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 18:05:57
Message-ID: 64DEC5D2-8AC3-4E30-B199-B1D330EC94FB@justatheory.com
Lists: pgsql-hackers

On Dec 12, 2011, at 7:42 PM, Alvaro Herrera wrote:

> I remember there was the idea of doing something like this for regexes
> -- have a specialized data type that saves the trouble of parsing it.
> I imagine this is pretty much the same.
>
> Nobody got around to doing anything about it though.

(regex data type)++

David


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Peter van Hardenberg <pvh(at)pvh(dot)ca>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 20:27:03
Message-ID: 1323808023.16048.7.camel@vanquo.pezone.net
Lists: pgsql-hackers

On Tue, 2011-12-13 at 00:06 -0800, Peter van Hardenberg wrote:
> On Mon, Dec 12, 2011 at 9:25 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> > On Mon, 2011-12-12 at 16:51 -0800, Peter van Hardenberg wrote:
> > You don't need a new PL to do that. The existing PLs can also parse
> > JSON. So that's not nearly enough of a reason to consider adding this
> > new PL.
>
> PL/V8 is interesting because it is very fast, sandboxed, and well
> embedded with little overhead.
>
> My experience with PL/Python and PL/Perl has not been thus, and
> although they are handy if you want to break out and run system work,
> they're not the kind of thing I'd consider for defining performant
> operators with.

Some performance numbers comparing valid_json() functions implemented
in different ways would clarify this. I wouldn't be surprised if PL/V8
won, but we need to work with some facts.
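
Even something crude would be a start, assuming the competing
implementations are installed under different function names:

\timing on
select count(*)
from generate_series(1, 100000) g
where valid_json('{"key": "value", "n": ' || g || '}');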


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 20:33:28
Message-ID: 1323808408.16048.9.camel@vanquo.pezone.net
Lists: pgsql-hackers

On Tue, 2011-12-13 at 09:11 -0500, Greg Smith wrote:
> Personal story on this. When my book came out, I was trying to take
> the #1 spot on Packt's bestseller list, even if it was just for a day.
> Never made it higher than #2. The #1 spot the whole time was "jQuery
> 1.4 Reference Guide", discussing the most popular JavaScript library
> out there. And you know what? Over a year later, it's *still
> there*.

I would guess that that's largely because there are a lot more people
developing web sites than people tuning databases, and also because the
on-board documentation of javascript and jquery is poor, at least for
their audience.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 20:36:04
Message-ID: CAHyXU0y7d0GaV1pnUB9RcXRa8hGLc8RXPAYV2qH8uMQ0KjFMuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 13, 2011 at 2:33 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On tis, 2011-12-13 at 09:11 -0500, Greg Smith wrote:
>> Personal story on this.  When my book came out, I was trying to take
>> the #1 spot on Packt's bestseller list, even if it was just for a day.
>> Never made it higher than #2.  The #1 spot the whole time was "jQuery
>> 1.4 Reference Guide", discussing the most popular JavaScript library
>> out there.  And you know what?  Over a year later, it's *still
>> there*.
>
> I would guess that that's largely because there are a lot more people
> developing web sites than people tuning databases, and also because the
> on-board documentation of javascript and jquery is poor, at least for
> their audience.

jQuery being used in 40% or more of all websites, by some estimates,
is surely a contributing factor.

merlin


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 20:41:12
Message-ID: 1323808872.16048.16.camel@vanquo.pezone.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On tis, 2011-12-13 at 08:44 -0500, Robert Haas wrote:
> Just because all our languages are Turing-complete doesn't mean they
> are all equally well-suited to every task. Of course, that doesn't
> mean we'd add a whole new language just to get a JSON parser, but I
> don't think that's really what Peter was saying.

That was in fact what I was saying.

> Rather, I think the
> point is that embedded Javascript is *extremely* popular, lots and
> lots of people are supporting it, and we ought to seriously consider
> doing the same. It's hard to think of another PL that we could add
> that would give us anywhere near the bang for the buck that Javascript
> would.

If JavaScript (trademark of Oracle, btw.; be careful about calling
anything PL/JavaScript) had a near-canonical implementation with a
stable shared library and a C API, then this might be a no-brainer. But
instead we have lots of implementations, and the one being favored here
is written in C++ and changes the soname every 3 months. I don't think
that's the sort of thing we want to carry around.

The way forward here is to maintain this as an extension, provide debs
and rpms, and show that that is maintainable. I can see numerous
advantages in maintaining a PL outside the core, especially if you are
still starting up and want to iterate quickly.


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 20:57:23
Message-ID: 201112132057.pBDKvN500135@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas wrote:
> On Mon, Dec 12, 2011 at 4:08 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
> > On Mon, Dec 12, 2011 at 8:54 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> There are way too many places that assume that the typmod can
> >> just be discarded.
> >
> > If true, that probably ought to be documented cos it sounds fairly important.
> >
> > Where and when is it true?
>
> I'm not going to go compile an exhaustive list, since that would take
> a week and I don't have any particular desire to invest that much time
> in it, but just to take a couple of simple examples:
>
> rhaas=# create or replace function wuzzle(numeric(5,2)) returns int as
> $$select 1$$ language sql;
> CREATE FUNCTION
> rhaas=# \df wuzzle
>                         List of functions
>  Schema |  Name  | Result data type | Argument data types |  Type
> --------+--------+------------------+---------------------+--------
>  public | wuzzle | numeric          |                     | normal
>  public | wuzzle | integer          | numeric             | normal
> (2 rows)
>
> rhaas=# select pg_typeof(1.23::numeric(5,2));
> pg_typeof
> -----------
> numeric
> (1 row)
>
> There are a very large number of others. Possibly grepping for places
> where we do getBaseType() rather than getBaseTypeAndTypmod() would be
> a way to find some of them.

I think the most common one I see is with concatenation:

test=> select 'abc'::varchar(3) || 'def'::varchar(3);
 ?column?
----------
 abcdef
(1 row)

It is not really clear how the typmod length should be passed in this
example, but passing it unchanged seems wrong:

test=> select ('abc'::varchar(3) || 'def'::varchar(3))::varchar(3);
 varchar
---------
 abc
(1 row)
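
For what it's worth, the concatenation itself comes back with no typmod
at all -- in fact as plain text, since the varchars are coerced to text
for the || operator:

test=> select pg_typeof('abc'::varchar(3) || 'def'::varchar(3));
 pg_typeof
-----------
 text
(1 row)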

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-13 21:13:59
Message-ID: CAHyXU0wedZ7GEDzHb+FV8x73Ur0dGh14R2dfpCWkwRn=X9BoiA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 13, 2011 at 2:41 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On tis, 2011-12-13 at 08:44 -0500, Robert Haas wrote:
>> Just because all our languages are Turing-complete doesn't mean they
>> are all equally well-suited to every task.  Of course, that doesn't
>> mean we'd add a whole new language just to get a JSON parser, but I
>> don't think that's really what Peter was saying.
>
> That was in fact what I was saying.
>
>> Rather, I think the
>> point is that embedded Javascript is *extremely* popular, lots and
>> lots of people are supporting it, and we ought to seriously consider
>> doing the same.  It's hard to think of another PL that we could add
>> that would give us anywhere near the bang for the buck that Javascript
>> would.
>
> If JavaScript (trademark of Oracle, btw.; be careful about calling
> anything PL/JavaScript) had a near-canonical implementation with a
> stable shared library and a C API, then this might be a no-brainer.  But
> instead we have lots of implementations, and the one being favored here
> is written in C++ and changes the soname every 3 months.  I don't think
> that's the sort of thing we want to carry around.

Mozilla SpiderMonkey seems like a good fit: it compiles to a
dependency-free .so, has excellent platform support, has a stable C
API, and while it's C++ internally it makes no use of exceptions (in
fact, it turns them off in the C++ compiler). ISTM it would be a
suitable foundation for an external module, 'in core' parser, pl, or
anything really.

merlin


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-15 08:31:39
Message-ID: CAP7QgmntStLutMRHsWZhpDV=-g3-KprT+SpVisb5kinz2atE_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 13, 2011 at 1:13 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> Mozilla SpiderMonkey seems like a good fit: it compiles to a
> dependency free .so, has excellent platform support, has a stable C
> API, and while it's C++ internally makes no use of exceptions (in
> fact, it turns them off in the c++ compiler).  ISTM to be a suitable
> foundation for an external module, 'in core' parser, pl, or anything
> really.

When I started to think about PL/js, I compared SpiderMonkey,
SquirrelFish, and V8. SpiderMonkey at that time (around 2009) was
neither fast nor small in memory, although the points you raise, as
well as its advanced features like JS1.7 (pure yield!), were
attractive. SpiderMonkey was also a little harder to build on an
arbitrary platform (including Windows) than V8. SquirrelFish was the
fastest of the three, but it's tied to WebKit and also hard to build
by itself. Dunno how they've changed since then.

Thanks,
--
Hitoshi Harada


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-15 18:34:20
Message-ID: CA+U5nMJ5OJCKGqOc2sovtvLo=WtvkTmKSD4iC6nuBg=pXCfNKA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 13, 2011 at 8:15 AM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:

> Issues we've encountered include:
>
>  * Should JSON be stored as binary or as text?

Text

>  * How do we deal with Unicode escapes and characters if the server or
> client encoding is not UTF-8?  Some (common!) character encodings have
> code points that don't map to Unicode.  Also, the charset conversion
> modules do not provide fast entry points for converting individual
> characters; each conversion involves a funcapi call.

Make JSON datatypes only selectable if client encoding is UTF-8.

Let's JFDI

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-15 21:47:30
Message-ID: 4EEA6AF2.3090707@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/15/2011 01:34 PM, Simon Riggs wrote:
> On Tue, Dec 13, 2011 at 8:15 AM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
>
>> Issues we've encountered include:
>>
>> * Should JSON be stored as binary or as text?
> Text

Works for me.

>> * How do we deal with Unicode escapes and characters if the server or
>> client encoding is not UTF-8? Some (common!) character encodings have
>> code points that don't map to Unicode. Also, the charset conversion
>> modules do not provide fast entry points for converting individual
>> characters; each conversion involves a funcapi call.
> Make JSON datatypes only selectable if client encoding is UTF-8.

Yuck. Do we have this sort of restriction for any other data type?

ISTM that the encoding problem is at least as likely to be the reverse
of what's above - i.e. that there's a code point in the stored JSON
that's not represented in the client encoding.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-16 13:52:45
Message-ID: CA+Tgmob+duoYxp2Sp17+DR0O66ezkF+71o14cEZF6J8Y6-NENg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 15, 2011 at 4:47 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Make JSON datatypes only selectable if client encoding is UTF-8.
>
> Yuck. Do we have this sort of restriction for any other data type?

No, and I don't think it's necessary to do it here, either. Nor would
it be a good idea, because then the return value of EXPLAIN (FORMAT
JSON) couldn't unconditionally be json. But I think the important
point is that this is an obscure corner case. Let me say that one
more time: obscure corner case!

The only reason JSON needs to care about this at all is that it allows
\u1234 to mean Unicode code point 0x1234. But for that detail, JSON
would be encoding-agnostic. So I think it's sufficient for us to
simply decide that that particular feature may not work (or even, will
not work) for non-ASCII characters if you use a non-UTF8 encoding.
There's still plenty of useful things that can be done with JSON even
if that particular feature is not available; and that way we don't
have to completely disable the data type just because someone wants to
use EUC-JP or something.
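
To illustrate, the proposed rule would behave roughly like this (a
sketch only; the error wording is invented):

-- in, say, a LATIN1-encoded database:
test=> select '"\u0041"'::json;    -- ASCII escape: accepted
test=> select '"über"'::json;      -- raw non-ASCII character: accepted
test=> select '"\u00fc"'::json;    -- non-ASCII escape: rejected
ERROR:  unsupported Unicode escape sequence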

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-16 13:57:16
Message-ID: CA+U5nMJ9RqyGe0PObUPorr4+SGUMyQ5ZPwFhk=jxo_wUsPD5YA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 16, 2011 at 1:52 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Dec 15, 2011 at 4:47 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> Make JSON datatypes only selectable if client encoding is UTF-8.
>>
>> Yuck. Do we have this sort of restriction for any other data type?
>
> No, and I don't think it's necessary to do it here, either.  Nor would
> it be a good idea, because then the return value of EXPLAIN (FORMAT
> JSON) couldn't unconditionally be json.  But I think the important
> point is that this is an obscure corner case.  Let me say that one
> more time: obscure corner case!
>
> The only reason JSON needs to care about this at all is that it allows
> \u1234 to mean Unicode code point 0x1234.  But for that detail, JSON
> would be encoding-agnostic.  So I think it's sufficient for us to
> simply decide that that particular feature may not work (or even, will
> not work) for non-ASCII characters if you use a non-UTF8 encoding.
> There's still plenty of useful things that can be done with JSON even
> if that particular feature is not available; and that way we don't
> have to completely disable the data type just because someone wants to
> use EUC-JP or something.

Completely agree. I was going to write almost exactly this in reply.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-16 17:13:56
Message-ID: CAARyMpC_rsMgR2=39ExLkN36d8sozA_7TQHK1bJSA5PCDDBeuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 16, 2011 at 8:52 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> But I think the important point is that this is an obscure corner case.
> Let me say that one more time: obscure corner case!

+1

> The only reason JSON needs to care about this at all is that it allows
> \u1234 to mean Unicode code point 0x1234.  But for that detail, JSON
> would be encoding-agnostic.  So I think it's sufficient for us to
> simply decide that that particular feature may not work (or even, will
> not work) for non-ASCII characters if you use a non-UTF8 encoding.
> There's still plenty of useful things that can be done with JSON even
> if that particular feature is not available; and that way we don't
> have to completely disable the data type just because someone wants to
> use EUC-JP or something.

So, if the server encoding is not UTF-8, should we ban Unicode escapes:

"\u00FCber"

or non-ASCII characters?

"über"

Also:

* What if the server encoding is SQL_ASCII?

* What if the server encoding is UTF-8, but the client encoding is
something else (e.g. SQL_ASCII)?

- Joey


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-16 22:39:51
Message-ID: CAAZKuFYrZohS-GCs6KR5xZ6=+pZvhowMa2emXxeFpYyRmKJpZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 13, 2011 at 1:13 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Tue, Dec 13, 2011 at 2:41 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> On tis, 2011-12-13 at 08:44 -0500, Robert Haas wrote:
>>> Just because all our languages are Turing-complete doesn't mean they
>>> are all equally well-suited to every task.  Of course, that doesn't
>>> mean we'd add a whole new language just to get a JSON parser, but I
>>> don't think that's really what Peter was saying.
>>
>> That was in fact what I was saying.
>>
>>> Rather, I think the
>>> point is that embedded Javascript is *extremely* popular, lots and
>>> lots of people are supporting it, and we ought to seriously consider
>>> doing the same.  It's hard to think of another PL that we could add
>>> that would give us anywhere near the bang for the buck that Javascript
>>> would.
>>
>> If JavaScript (trademark of Oracle, btw.; be careful about calling
>> anything PL/JavaScript) had a near-canonical implementation with a
>> stable shared library and a C API, then this might be a no-brainer.  But
>> instead we have lots of implementations, and the one being favored here
>> is written in C++ and changes the soname every 3 months.  I don't think
>> that's the sort of thing we want to carry around.
>
> Mozilla SpiderMonkey seems like a good fit: it compiles to a
> dependency free .so, has excellent platform support, has a stable C
> API, and while it's C++ internally makes no use of exceptions (in
> fact, it turns them off in the c++ compiler).  ISTM to be a suitable
> foundation for an external module, 'in core' parser, pl, or anything
> really.

To the best of my knowledge:

libv8 is also exception-free, and compiled with exceptions off. plv8
does make use of exceptions, though, something that gave me pause when
reading it. At first I thought it was to integrate with libv8, but
that did not seem to be the case, so it probably could learn to use
return codes instead. libv8 also has a light dependency list:

ldd /usr/lib/libv8.so (/lib/ entries and linux omitted)

libicuuc.so.44 => /usr/lib/libicuuc.so.44 (0x00007fc838459000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fc838151000)
libicudata.so.44 => /usr/lib/libicudata.so.44 (0x00007fc836aed000)

So ICU and C++.

In addition, more projects have been successful in embedding libv8;
right now it has the entrenchment advantage over libmozjs in
applications that are not closely tied to XUL/Mozilla, although that
could change in a few years. Institutionally Mozilla has not
historically been quick to prioritize anything not essential to
shipping Firefox, and I would imagine V8 is in a similar situation,
even though they occasionally make concessions for non-browsing use
cases (ex: multi-gigabyte heap sizes).

I would regard either choice as at least equally risky in this way,
given what I know (refinements welcome).

Both libv8 and libmozjs are maintained in Debian, and both are part of
at least one stable release.

In spite of the hazard posed by the aggressive releases and
non-general-purpose focus of the maintainers of both of these runtimes
at this time, I am still in favor of having a binding to at least one
of them in mainline, with the ability to get new or alternative
versions via extensions. If extensions were already pervasive and
everyone was installing them everywhere I'd think otherwise (just
leave it as an extension), but the cost of not being able to index and
manipulate JSON efficiently and with a trusted language is just too
huge to let slide.
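
The sort of thing at stake, sketched with PL/V8 (the table, column, and
function names here are invented):

create function customer_id(doc text) returns text as $$
  return JSON.parse(doc).customer_id;  -- pull one member out of the doc
$$ language plv8 immutable;

create index docs_customer_id_idx on docs (customer_id(body));

That is, a trusted language cheap enough to call in an expression index
over JSON documents.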

Right now the perception of Postgres -- actually, of databases in
general, including virtually all of the newcomers -- is that they are
monolithic systems, and for most people either "9.3" will "have"
javascript and indexing of JSON documents, or it won't. In most cases
I would say "meh, let them eat cake until extensions become so
apparently dominant that we can wave someone aside to extension-land",
but in this case I think that would be a strategic mistake.

--
fdr


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-16 23:44:46
Message-ID: 4EEBD7EE.1060007@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/16/2011 05:39 PM, Daniel Farina wrote:
> To the best of my knowledge:
>
> libv8 is also exception-free, and compiled with exceptions off. plv8
> does make use of exceptions, though, something that gave me pause when
> reading it. At first I thought it was to integrate with libv8, but
> that did not seem to be the case, so it probably could learn to use
> return codes instead.

Yeah. We should look at that.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-17 02:26:31
Message-ID: CA+Tgmoa8phL9tS3g0BwFs5Pgf7EDy-gq78D6QuZP3UDot=PLqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Dec 16, 2011 at 12:13 PM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> So, if the server encoding is not UTF-8, should we ban Unicode escapes:
>
>    "\u00FCber"
>
> or non-ASCII characters?
>
>    "über"

The former. Refusing the escapes makes sense, because it's totally
unclear how we ought to interpret them. Refusing the characters would
be just breaking something for no particular reason. Right now, for
example, EXPLAIN (FORMAT JSON) could easily end up returning non-ASCII
characters in whatever the database encoding happens to be. That
command would be unusable if we arbitrarily chucked an error every
time a non-ASCII character showed up and the database encoding wasn't
UTF-8.
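
For reference, EXPLAIN has emitted JSON since 9.0 regardless of
encoding -- output abbreviated here, and the details will vary:

test=> explain (format json) select 1;
           QUERY PLAN
--------------------------------
 [                             +
   {                           +
     "Plan": {                 +
       "Node Type": "Result",  +
       ...                     +
     }                         +
   }                           +
 ]
(1 row)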

> Also:
>
>  * What if the server encoding is SQL_ASCII?
>
>  * What if the server encoding is UTF-8, but the client encoding is
> something else (e.g. SQL_ASCII)?

It's not clear to me why these cases would require any special handling.

In the spirit of Simon's suggestion that we JFDI, I cooked up a patch
today that JFDI. See attached. This lacks any form of
canonicalization and therefore doesn't support comparison operators.
It also lacks documentation, regression testing, and probably an
almost uncountable number of other bells and whistles that people
would like to have. This is more or less a deliberate decision on my
part: I feel that the biggest problem with this project is that we've
spent far too much time dithering over what exactly the perfect set of
functionality would be, and not enough time getting good basic
functionality committed. So this is as basic as it gets. It does
exactly one thing: validation. If people are happy with it, we can
extend from here incrementally.
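
With the patch applied, that looks roughly like this, and nothing more
(error text approximate):

test=> select '{"a": [1, 2, 3]}'::json;
       json
------------------
 {"a": [1, 2, 3]}
(1 row)

test=> select '{"a": [1, 2'::json;
ERROR:  invalid input syntax for type json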

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
json-rmh-v1.patch application/octet-stream 21.1 KB

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-17 11:53:45
Message-ID: CA+U5nMJSLr1C9Py5OY=+VukticudqBV=fcPNjOy24tG9XL_36Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 17, 2011 at 2:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> In the spirit of Simon's suggestion that we JFDI, I cooked up a patch
> today that JFDI.  See attached.

Which looks very good.

Comments
* Comment for IDENTIFICATION of json.c says contrib/json/json.c
* json.c contains a duplicate of a line from header file "extern Datum
json_in(PG_FUNCTION_ARGS);"

And additionally, a quote from our fine manual...

"Caution: Some XML-related functions may not work at all on non-ASCII
data when the server encoding is not UTF-8. This is known to be an
issue for xpath() in particular." .... so I think this approach works
for JSON too.

Adding tests and docs is a must; nothing else is, right now. Once we
have this, others can add the bells and whistles, possibly in 9.2.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-17 22:02:49
Message-ID: m2obv6501i.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> The way forward here is to maintain this as an extension, provide debs
> and rpms, and show that that is maintainable. I can see numerous
> advantages in maintaining a PL outside the core; especially if you are
> still starting up and want to iterate quickly.

I'd like to add some confusion on the implementation choice, because it
looks damn too easy now… Guile 2.0 offers an implementation of the
ECMAscript language and plscheme already exists as a PostgreSQL PL
extension for integrating with Guile.

http://plscheme.projects.postgresql.org/

http://wingolog.org/archives/2009/02/22/ecmascript-for-guile
http://packages.debian.org/sid/guile-2.0

http://www.gnu.org/software/guile/

Guile is an extension language platform

Guile is an efficient virtual machine that executes a portable
instruction set generated by its optimizing compiler, and integrates
very easily with C and C++ application code. In addition to Scheme,
Guile includes compiler front-ends for ECMAScript and Emacs Lisp
(support for Lua is underway), which means your application can be
extended in the language (or languages) most appropriate for your user
base. And Guile's tools for parsing and compiling are exposed as part
of its standard module set, so support for additional languages can be
added without writing a single line of C.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 00:06:02
Message-ID: CA+TgmoZCyxjhTwt+mDuJFf8hKdg6O4Um=5zpANRVe1i6=TPk6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 17, 2011 at 5:02 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> I'd like to add some confusion on the implementation choice, because it
> looks damn too easy now… Guile 2.0 offers an implementation of the
> ECMAscript language and plscheme already exists as a PostgreSQL PL
> extension for integrating with Guile.

It seems like the licensing there could potentially be problematic.
It's GPL with a linking exception. Not sure we want to go there.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 00:50:11
Message-ID: F087D41D-ABBC-47DF-9DB8-0340554DF954@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 17, 2011, at 3:53 AM, Simon Riggs wrote:

> Which looks very good.

Love having the start here. I forwarded this message to Claes Jakobsson, creator of the jansson-using pg-json extension. He’s a bit less supportive. He gave me permission to quote him here:

> Frankly I see the inclusion of a JSON datatype in core as unnecessary. Stuff should be moved out of core rather than in, as we do in Perl. Also, does this patch mean that the 'json' type is forever claimed and can't be replaced by extensions?
>
> There's little reason to reimplement JSON parsing, comparison and other routines when there's a multitude of already good libraries.

Personally, I think that there really should be a core key/value-type data type, and json is probably the best possible choice, absent a SQL-standard-mandated type (which would probably suck anyway). But I think it worthwhile to hear alternate points of view, and this isn't far from what Jan said last week.

Best,

David


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 03:21:03
Message-ID: CA+TgmoY=YRh4y1UpKtLC+6oEjJ1=VpLEOGcQp5Qb5y=NPdfBOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 17, 2011 at 7:50 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> Love having the start here. I forwarded this message to Claes Jakobsson, creator of the jansson-using pg-json extension. He’s a bit less supportive. He gave me permission to quote him here:
>
>> Frankly I see the inclusion of a JSON datatype in core as unnecessary. Stuff should be moved out of core rather than in, as we do in Perl. Also, does this patch mean that the 'json' type is forever claimed and can't be replaced by extensions?
>>
>> There's little reason to reimplement JSON parsing, comparison and other routines when there's a multitude of already good libraries.

That's fair enough, but we've had *many* requests for this
functionality in core, and I don't see what we lose by having at least
some basic functionality built in. There is always room for people to
provide extensions and add-ons that build on whatever core support we
provide. There must be an order of magnitude more demand for this
data type than there is for any other potential new in-core data type.
Off the top of my head, I can't think of anything else that's even in
the same league; can you?

As for whether to use somebody else's implementation or roll our own,
I'm not convinced there's any value in reusing somebody else's
implementation. Consider a library like json-c, just the first thing
I happened to download. The license is compatible, so that's good.
But the coding style is completely different from ours, the memory
management is not compatible with ours (it uses calloc and it's not
pluggable), the error messages don't follow our style guidelines and
won't work with our translation infrastructure, it has its own
printfbuf which is redundant with our StringInfoData, it has its own
hash table implementation which is also redundant with code we already
have, and of course it won't contain anything like
CHECK_FOR_INTERRUPTS() any place that might be needed. Now, sure,
all of those problems are fixable. But by the time you get done it's
not any less work than writing your own, and probably not as well
adapted to our particular needs.

The jansson library has a pluggable memory allocator, but most of the
other complaints above still apply; and I see, for example, that it
contains its own UTF-8 validator and locale conversion routines, which
is undoubtedly not what we want. jansson also interprets numeric
values as either native integers or floating-point values, which
limits the amount of precision available and means that values may be
output in a manner quite different from how they were put in.
Although this is probably legal, since RFC4627 states that JSON
parsers may set limits on the range of numbers, I think it's an
unnecessary and undesirable limitation. I have always enjoyed the
ability to -- for example -- SELECT 400! in PostgreSQL and get an
exact answer, and I think it would be nice if I could store the result
in a JSON object. I am also quite certain that someone will propose
(or, perhaps, actually write) a function to convert a record to a JSON
object, and I think it would be mighty nice if numeric, int4, and int8
could be transformed into JSON numbers rather than JSON strings, which
isn't going to work - at least for numeric - if there are significant
range or precision limitations.
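
For example -- the record-to-JSON function below is purely
hypothetical, no such function exists yet and the name is invented:

test=> select 20!;
      ?column?
---------------------
 2432902008176640000
(1 row)

-- ideally: record_to_json(row(20!)) => {"f1": 2432902008176640000}
-- a float-backed JSON library would round that value off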

For XML, it makes a lot of sense for us to integrate with an external
library. Consider libxml2, the library we actually do integrate with.
The root directory has over 275,000 lines of code in .c and .h files.
Obviously, it's worth suffering through some pain (and we definitely
have suffered through some pain) to avoid having to rewrite some
substantial portion of that code. Providing some basic JSON support
figures to require about two orders of magnitude less code, which IMHO
makes the calculation completely different. The reason there are so
many JSON libraries out there is because it only takes a day to write
one. If you look at a couple of JSON parsers written by other people
and don't find exactly what you're looking for, you just go write one
of your own. You then have the option to hang out a shingle and
critique the next three people who come along and do the same thing,
but is that really justified? Odds are good that their reasons for
rolling their own were just as good as yours.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 03:40:26
Message-ID: 20680.1324179626@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Dec 17, 2011 at 7:50 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>> Love having the start here. I forwarded this message to Claes Jakobsson, creator of the jansson-using pg-json extension. He's a bit less supportive. He gave me permission to quote him here:

>>> Frankly I see the inclusion of a JSON datatype in core as unnecessary. Stuff should be moved out of core rather than in, as we do in Perl. Also, does this patch mean that the 'json' type is forever claimed and can't be replaced by extensions?
>>> There's little reason to reimplement JSON parsing, comparison and other routines when there's a multitude of already good libraries.

> That's fair enough, but we've had *many* requests for this
> functionality in core, I don't see what we lose by having at least
> some basic functionality built in. There is always room for people to
> provide extensions and add-ons that build on whatever core support we
> provide.

Well, I think that that's exactly the question here: if we do something
in core, will it foreclose options for people who want to do add-ons?

The main thing that's troubling me there is the issue of plain text
representation versus something precompiled (and if the latter, what
exactly). I don't see how you "build on" a core datatype that makes a
decision different from what you wanted for that.

I'm also +1 to Claes' opinion that this can be perfectly well managed as
an add-on. We've sweated blood over many years to make PG extensions
work nicely, and they now work better than they ever have. It's not
clear to me why the push to make something core when it can obviously be
an extension.

regards, tom lane


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 03:42:09
Message-ID: FA1BABEF-426E-4454-B099-A63EB509FAE4@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 17, 2011, at 7:40 PM, Tom Lane wrote:

> Well, I think that that's exactly the question here: if we do something
> in core, will it foreclose options for people who want to do add-ons?

Why would it? They would just have to use a different name.

Best,

David


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 03:46:03
Message-ID: CA+Tgmoan+--uYR59Yc3N9R3a8kSxZMpHcNX6y6a88Xnmh--_Pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 17, 2011 at 10:42 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> On Dec 17, 2011, at 7:40 PM, Tom Lane wrote:
>> Well, I think that that's exactly the question here: if we do something
>> in core, will it foreclose options for people who want to do add-ons?
>
> Why would it? They would just have to use a different name.

Yeah, exactly. Or for that matter, the same name in a different
schema. And as for the question of text vs. binary, that's going to
be two separate data types whether it gets done in core or elsewhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jan Urbański <wulczer(at)wulczer(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 09:49:27
Message-ID: 4EEDB727.5050903@wulczer.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 18/12/11 04:21, Robert Haas wrote:
> On Sat, Dec 17, 2011 at 7:50 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>> Love having the start here. I forwarded this message to Claes Jakobsson, creator of the jansson-using pg-json extension. He’s a bit less supportive. He gave me permission to quote him here:
>>
>>> Frankly I see the inclusion of a JSON datatype in core as unnecessary. Stuff should be moved out of core rather than in, as we do in Perl. Also, does this patch mean that the 'json' type is forever claimed and can't be replaced by extensions?
>>>
>>> There's little reason to reimplement JSON parsing, comparison and other routines when there's a multitude of already good libraries.
>
> That's fair enough, but we've had *many* requests for this
> functionality in core, I don't see what we lose by having at least
> some basic functionality built in.

I think having a JSON data type in core would drastically limit the
exposure third-party JSON extensions would get, and that's bad. There
are tons of interesting features a JSON type could have, and tying its
development to a one-year release cycle might be a disservice both to
people who are willing to provide these features earlier and to users
who are faced with a choice between a fast-moving third-party addon
and a blessed core type; it would also cause overall confusion.

How about we try the tsearch way: let JSON extensions live outside
core for some time, and perhaps if one emerges dominant and would
benefit from inclusion, consider it then?

If we keep treating extensions as second-class citizens, they'll never
get the mindshare and importance we seem to want for them (or otherwise
why go through all the trouble to provide an infrastructure for them).

Cheers,
Jan


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 09:54:51
Message-ID: CAFj8pRBMPXYnPF2GD8SyQa55R9F4fLwO=GcbPB9AwK6bO7uO+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2011/12/18 Jan Urbański <wulczer(at)wulczer(dot)org>:
> On 18/12/11 04:21, Robert Haas wrote:
>> On Sat, Dec 17, 2011 at 7:50 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>>> Love having the start here. I forwarded this message to Claes Jakobsson, creator of the jansson-using pg-json extension. He’s a bit less supportive. He gave me permission to quote him here:
>>>
>>>> Frankly I see the inclusion of a JSON datatype in core as unnecessary. Stuff should be moved out of core rather than in, as we do in Perl. Also, does this patch mean that the 'json' type is forever claimed and can't be replaced by extensions?
>>>>
>>>> There's little reason to reimplement JSON parsing, comparison and other routines when there's a multitude of already good libraries.
>>
>> That's fair enough, but we've had *many* requests for this
>> functionality in core, I don't see what we lose by having at least
>> some basic functionality built in.
>
> I think having a JSON data type in core would drastically limit the
> exposure third-party JSON extensions would get and that's bad. There are
> tons of interesting features a JSON type could have and tying its
> development to a one year release cycle might be a disservice both for
> people who are willing to provide these features earlier, the users
> which are faced with a choice between a fast-moving third-party addon
> and a blessed core type and would cause overall confusion.
>
> How about we try the tsearch way and let JSON extensions live outside
> core for some time and perhaps if one emerges dominant and would benefit
> from inclusion then consider it?

It should be contrib modules.

Pavel

>
> If we keep treating extensions as second-class citizens, they'll never
> get the mindshare and importance we seem to want for them (or otherwise
> why go through all the trouble to provide an infrastructure for them).
>
> Cheers,
> Jan


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Peter van Hardenberg <pvh(at)pvh(dot)ca>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Joseph Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 11:13:35
Message-ID: m2aa6q3zfk.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Dec 17, 2011 at 5:02 PM, Dimitri Fontaine
> <dimitri(at)2ndquadrant(dot)fr> wrote:
>> I'd like to add some confusion on the implementation choice, because it
>> looks damn too easy now… Guile 2.0 offers an implementation of the
>> ECMAscript language and plscheme already exists as a PostgreSQL PL
>> extension for integrating with Guile.
>
> It seems like the licensing there could potentially be problematic.
> It's GPL with a linking exception. Not sure we want to go there.

It's LGPL so it's compatible (only the readline part is subject to GPL,
we're familiar enough with that though).

http://www.gnu.org/software/guile/docs/docs-2.0/guile-ref/Guile-License.html

The Guile library (libguile) and supporting files are published under
the terms of the GNU Lesser General Public License version 3 or later.
See the files COPYING.LESSER and COPYING.

C code linking to the Guile library is subject to terms of that
library. Basically such code may be published on any terms, provided
users can re-link against a new or modified version of Guile.

Scheme level code written to be run by Guile (but not derived from
Guile itself) is not restricted in any way, and may be published on
any terms. We encourage authors to publish on Free terms.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Jan Urbański <wulczer(at)wulczer(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 12:41:16
Message-ID: CABUevEwGGteohK1iut34GZX-U30B2A2t5gZU8R1M-kBO1fh5Rw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 18, 2011 at 10:49, Jan Urbański <wulczer(at)wulczer(dot)org> wrote:
> On 18/12/11 04:21, Robert Haas wrote:
>> On Sat, Dec 17, 2011 at 7:50 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>>> Love having the start here. I forwarded this message to Claes Jakobsson, creator of the jansson-using pg-json extension. He’s a bit less supportive. He gave me permission to quote him here:
>>>
>>>> Frankly I see the inclusion of a JSON datatype in core as unnecessary. Stuff should be moved out of core rather than in, as we do in Perl. Also, does this patch mean that the 'json' type is forever claimed and can't be replaced by extensions?
>>>>
>>>> There's little reason to reimplement JSON parsing, comparison and other routines when there's a multitude of already good libraries.
>>
>> That's fair enough, but we've had *many* requests for this
>> functionality in core, I don't see what we lose by having at least
>> some basic functionality built in.
>
> I think having a JSON data type in core would drastically limit the
> exposure third-party JSON extensions would get and that's bad. There are

The same way that having replication in core is bad for the rest of
the replication engines? While it has certainly decreased the usage
of, for example, Slony, I don't think anybody can say it's a bad thing
that
we have this in core...

And of course, *not* having it in core, we didn't have people claiming
for many years that "postgres has no replication" or anything like
that... The fact is that a *lot* of our users, particularly in large
companies, will never install an extension that's not part of core.
Just look at other discussions about it even being a problem with it
being in *contrib*, which is still maintained and distributed by the
same developers.

We can hopefully get around this for the extensions in contrib (and
reasonably well have already), but few large companies are going to be
happy to go to pgxn and download an extension that has a single
maintainer (not "the team", and in most cases not even "a team"),
usually no defined lifecycle, no support, etc. (I'm pretty sure you
won't get support included for random pgxn modules when you buy a
contract from EDB, or CMD, or us, or PGX, or anybody really - whereas
if the datatype is in core, you *will* get this)

So I'm not sure it would really lessen the exposure much at all -
those that are willing to install such extensions already are surely
capable of finding it themselves (using pgxn for example - or even
google)

> tons of interesting features a JSON type could have and tying its
> development to a one year release cycle might be a disservice both for
> people who are willing to provide these features earlier, the users
> which are faced with a choice between a fast-moving third-party addon
> and a blessed core type and would cause overall confusion.

And the other option would be to *only* have a fast-moving third-party
addon, which simply disqualifies it completely in many environments.

Keeping it as a third-party addon is better for the developer. Keeping
it in core is better for the user (if the user is a large company -
not a hacker).

If we can find a way to have a stable part in core and then have
addons that can provide these "tons of interesting features" (which I
agree there are) until such time that they can be considered stable
enough for core, I think that's the best compromise.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>, dimitri(at)2ndquadrant(dot)fr
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 17:17:45
Message-ID: CAHyXU0zAL+fPi71XFfsgMyEBgX-JFbuYYJ2QQub1sVZh1A1UvQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 17, 2011 at 4:02 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> I'd like to add some confusion on the implementation choice, because it
> looks damn too easy now… Guile 2.0 offers an implementation of the
> ECMAscript language and plscheme already exists as a PostgreSQL PL
> extension for integrating with Guile.

TBH, I think that's PFC (pretty cool).

On Sun, Dec 18, 2011 at 6:41 AM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> We can hopefully get around this for the extensions in contrib (and
> reasonably well has already), but few large companies are going to be
> happy to go to pgxn and download an extension that has a single
> maintainer (not "the team", and in most cases not even "a team"),
> usually no defined lifecycle, no support, etc. (I'm pretty sure you
> won't get support included for random pgxn modules when you buy a
> contract from EDB, or CMD, or us, or PGX, or anybody really - wheras
> if it the datatype is in core, you *will* get this)

100% agree on all points. With the new extension system, contrib
modules that are packaged with the core system can be considered to be
in core because they are:
*) documented in standard docs
*) supported and bugfixed with postgresql releases
*) ready to be used without compiler support or even shell access
through most binary distributions

One small note about the json type being an extension -- this probably
means the json type oid won't be fixed -- not a huge deal but it could
affect some corner cases with binary format consumers.

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, dimitri(at)2ndquadrant(dot)fr, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 17:21:24
Message-ID: 4EEE2114.7000701@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/18/2011 12:17 PM, Merlin Moncure wrote:
> One small note about the json type being an extension -- this probably
> means the json type oid won't be fixed -- not a huge deal but it could
> affect some corner cases with binary format consumers.
>
>

Why would that matter more for JSON than for any other non-core type?

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, dimitri(at)2ndquadrant(dot)fr, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 17:34:21
Message-ID: CAHyXU0y4WxeT1Y+y-yCgj5scNb7nN_tNFTmurpMLFJKoshV5GA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 18, 2011 at 11:21 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 12/18/2011 12:17 PM, Merlin Moncure wrote:
>>
>> One small note about the json type being an extension -- this probably
>> means the json type oid won't be fixed -- not a huge deal but it could
>> affect some corner cases with binary format consumers.
>
> Why would that matter more for JSON than for any other non-core type?

Well, it's a minor headache for all the oid-isn't-in-pg_type.h types,
and only then for high traffic types (which presumably json will be).
A while back we coded up a reworked dblink that was variadic and
could optionally transfer data between databases with the binary wire
format. Any container of a user-defined (by oid) type had to be sent
strictly as text, which is a big performance hit for certain types.
Recent postgres has an undocumented facility to force type oids to a
particular value, but the type definition being inside the create
extension script makes this problematic.

this is a pretty far out objection though, and I could certainly work
around the problem if necessary, but there is some dependency on
pg_types.h in the wild.
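
To make that concrete, here is a minimal sketch of the consumer-side
workaround (nothing json-specific is assumed beyond the type's name):
without a fixed oid, a binary-format client has to look the type up
per connection instead of compiling in a constant from pg_type.h:

    -- run once per connection and cache the result; with a fixed oid
    -- this would just be a hardcoded constant in the client
    SELECT oid FROM pg_type WHERE typname = 'json';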

merlin


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, dimitri(at)2ndquadrant(dot)fr, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-18 18:26:38
Message-ID: m2aa6p3fdt.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>> Why would that matter more for JSON than for any other non-core type?
>
>> Well, it's a minor headache for all the oid-isn't-in-pg_type.h types,
>> and only then for high-traffic types (which presumably json will be).

Extensions are going to be more and more used and “pervasive” in the
next few years, and binary wire transfer is a good goal. What about
creating something like an IANA for PostgreSQL types?

New type authors would register their OID and as a benefit would get
listed on some public reference sheet, and we could add some mechanism
so that default CREATE TYPE calls will not use reserved OID numbers.

Then it would be purely cooperative - not a security thing, just a way
to ease binary and extension co-existence.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-19 04:41:54
Message-ID: CAHyXU0xUPdd1wg3V5fY7TyYCHbny3U-2qVKBLr_3oo4_WH+QgQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Dec 18, 2011 at 12:26 PM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>>> Why would that matter more for JSON than for any other non-core type?
>>
>> Well, it's a minor headache for all the oid-isn't-in-pg_type.h types,
>> and only then for high-traffic types (which presumably json will be).
>
> Extensions are going to be more and more used and “pervasive” in the
> next few years, and binary wire transfer is a good goal.  What about
> creating something like an IANA for PostgreSQL types?
>
> New type authors would register their OID and as a benefit would get
> listed on some public reference sheet, and we could add some mechanism
> so that default CREATE TYPE calls will not use reserved OID numbers.
>
> Then it would be purely cooperative - not a security thing, just a way
> to ease binary and extension co-existence.

I think that's a fabulous idea, although we're drifting off the stated
topic here.

Getting back on point, I'm curious about your statement: "without
writing a single line of C". I took a look at the pl/scheme docs and
was pretty impressed -- what exactly would be involved in getting a
guile-based ECMAScript working on top of the pl/scheme implementation?
How would that interact with the stated topic -- JSON support? Do you
even need a json type if you have strong library-based parsing and
composition features?

merlin


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-19 10:49:03
Message-ID: 8739cgx2e8.fsf@hi-media-techno.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> Getting back on point, I'm curious about your statement: "without
> writing a single line of C". I took a look at the pl/scheme docs and
> was pretty impressed -- what exactly would be involved in getting a
> guile-based ECMAScript working on top of the pl/scheme implementation?
> How would that interact with the stated topic -- JSON support? Do you
> even need a json type if you have strong library-based parsing and
> composition features?

My understanding is that JSON is a subset of ECMAscript, so if you get
the latter you already have the former. Now, someone would have to
check whether plscheme still builds with guile-2.0 and, given that, how
exactly you'd get pl/ecmascript (or pl/js) out of that.

I won't be in a position to spend time on that this year…

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-19 23:26:37
Message-ID: 7B73B08C-4EBF-48B4-8E5A-5EB2C5720660@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 18, 2011, at 4:41 AM, Magnus Hagander wrote:

> We can hopefully get around this for the extensions in contrib (and
> reasonably well has already), but few large companies are going to be
> happy to go to pgxn and download an extension that has a single
> maintainer (not "the team", and in most cases not even "a team"),
> usually no defined lifecycle, no support, etc. (I'm pretty sure you
> won't get support included for random pgxn modules when you buy a
> contract from EDB, or CMD, or us, or PGX, or anybody really - whereas
> if the datatype is in core, you *will* get this)

I support having a JSON type in core, but question the assertions here. *Some* organizations won’t use PGXN, usually because they require things through a different ecosystem (RPMs, .debs, StackBuilder, etc.). But many others will. There are a *lot* of companies out there that use CPAN, easy_install, and Gem. The same sorts of places will use PGXN.

Oh, and at PGX, we’ll happily provide support for random modules, so long as you pay for our time. We’re not picky (and happy to send improvements back upstream), though we might recommend you switch to something better. But such evaluations are based on quality, not simply on what ecosystem it came from.

> If we can find a way to have a stable part in core and then have
> addons that can provide these "tons of interesting features" (which I
> agree there are) until such time that they can be considered stable
> enough for core, I think that's the best compromise.

+1, though I think the core type will at least need some basic operators and indexing support.

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-19 23:39:36
Message-ID: 3E1086B6-636A-461C-82E8-C6FBAA6ABC78@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:

> My understanding is that JSON is a subset of ECMAscript

Well, no, JSON is formally “a lightweight data-interchange format.” It’s derived from JavaScript syntax, but it is not a programming language, so I wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.

http://json.org/

IOW, one does not need a new PL to get this type.

Best,

David


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-19 23:42:38
Message-ID: 19C9A6A1-440B-46FF-B3BB-5E89E2B6C399@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 19, 2011, at 3:39 PM, David E. Wheeler wrote:

> Well, no, JSON is formally “a lightweight data-interchange format.” It’s derived from JavaScript syntax, but it is not a programming language, so I wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.

Bah, it says “It is based on a subset of the JavaScript Programming Language, Standard ECMA-262 3rd Edition - December 1999.” But my point still holds: it is not a programming language, and one does not need a PL to have a JSON data type.

Best,

David


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-20 05:00:37
Message-ID: CA+TgmoYSGKmXyO4Cnu06D-VYDdihpsFsG+6Zh07zJgRPgke3iw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 19, 2011 at 6:26 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> +1, though I think the core type will at least need some basic operators and indexing support.

And I'm willing to do that, but I thought it best to submit a bare
bones patch first, in the hopes of minimizing the number of
objectionable things therein. For example, if you want to be able to
index a JSON column, you have to decide on some collation order that
is consistent with JSON's notion of equality, and it's not obvious
what is most logical. Heck, equality itself isn't 100% obvious. If
there's adequate support for including JSON in core, and nobody
objects to my implementation, then I'll throw some ideas for those
things up against the wall and see what sticks.
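
In the meantime, a minimal sketch of the stopgap (assuming only that a
json-to-text cast exists; tab and jsoncol are placeholder names).
Equality here is textual, so whitespace and key order matter - which is
exactly the ambiguity described above:

    -- expression index over the text form; matches only byte-identical JSON
    CREATE INDEX ON tab ((jsoncol::text));
    SELECT * FROM tab WHERE jsoncol::text = '{"a":1}';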

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-20 10:13:27
Message-ID: CABUevEyV6yBEBNAeHwhNhXmUzpPUfTCWTP-OQreVi8iXKRF8_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 20, 2011 at 00:26, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> On Dec 18, 2011, at 4:41 AM, Magnus Hagander wrote:
>
>> We can hopefully get around this for the extensions in contrib (and
>> reasonably well has already), but few large companies are going to be
>> happy to go to pgxn and download an extension that has a single
>> maintainer (not "the team", and in most cases not even "a team"),
>> usually no defined lifecycle, no support, etc. (I'm pretty sure you
>> won't get support included for random pgxn modules when you buy a
>> contract from EDB, or CMD, or us, or PGX, or anybody really - whereas
>> if the datatype is in core, you *will* get this)
>
> I support having a JSON type in core, but question the assertions here. *Some* organizations won’t use PGXN, usually because they require things through a different ecosystem (RPMs, .debs, StackBuilder, etc.). But many others will. There are a *lot* of companies out there that use CPAN, easy_install, and Gem. The same sorts of places will use PGXN.

Yes, that's why I said "few" not "none".

Though in my experience, most companies are a lot more restrictive
about addons to their database than addons to their development
environments.

And note that it's not PGXN that's the problem I'm pointing at, nor is
it CPAN or easy_install or gem. The problem is the vulnerability of the
addon, and its maintenance: if it has a single maintainer, that's a
whole different thing from being maintained by the PGDG.

> Oh, and at PGX, we’ll happily provide support for random modules, so long as you pay for our time. We’re not picky (and happy to send improvements back upstream), though we might recommend you switch to something better. But such evaluations are based on quality, not simply on what ecosystem it came from.

I think we're talking about different things here. While we can
certainly provide support for specific modules once that is written
into the agreement with the customer, we won't support a customer who
just calls up and says "hey, I'm using module xyz which you've never
heard of, and it crashes my database, please come fix it now". Are you
saying you do that - providing SLAs, 24/7 and similar things, on
modules you didn't even know the customer was using?

And FWIW, I too am talking about the quality, not the ecosystem. I'm
just saying it takes a lot more work to verify the quality and
maintenance of an external module - if it's part of postgresql, you
have *already* got a quality stamp and a maintenance promise.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-20 10:14:23
Message-ID: CABUevEz9Vb3NEEw9J5QkWH=XqAEXSE16-T47RLMqZY0Oxza2pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 20, 2011 at 06:00, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Dec 19, 2011 at 6:26 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>> +1, though I think the core type will at least need some basic operators and indexing support.
>
> And I'm willing to do that, but I thought it best to submit a bare
> bones patch first, in the hopes of minimizing the number of
> objectionable things therein.  For example, if you want to be able to
> index a JSON column, you have to decide on some collation order that
> is consistent with JSON's notion of equality, and it's not obvious
> what is most logical.  Heck, equality itself isn't 100% obvious.  If
> there's adequate support for including JSON in core, and nobody
> objects to my implementation, then I'll throw some ideas for those
> things up against the wall and see what sticks.

+1 for getting the basics in first, and then adding more to it later.
There's still a fair amount of time to do that for 9.2, but not if we
get stuck bikeshedding again...

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-20 18:23:43
Message-ID: m262hbyuds.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"David E. Wheeler" <david(at)justatheory(dot)com> writes:
> holds: it is not a programming language, and one does not need a PL to have
> a JSON data type.

Exactly. That does not contradict the fact that if you have
pl/ecmascript you already have JSON. And we might as well have had an
ecmascript PL for some time now - we just need to check on that.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-20 18:26:33
Message-ID: 201112201926.33782.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday, December 20, 2011 07:23:43 PM Dimitri Fontaine wrote:
> "David E. Wheeler" <david(at)justatheory(dot)com> writes:
> > holds: it is not a programming language, and one does not need a PL to
> > have a JSON data type.
> Exactly. That does not contradict the fact that if you have
> pl/ecmascript you already have JSON. And we might as well have had an
> ecmascript PL for some time now - we just need to check on that.
Not really. You need to be able to "evaluate" json without it possibly
executing code. Many js implementations are likely to have such a feature
though.

Andres


From: Claes Jakobsson <claes(at)surfar(dot)nu>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-20 18:39:54
Message-ID: D1B8E142-2A38-4B02-A021-CA08D13BE78E@surfar.nu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 20, 2011, at 12:39 AM, David E. Wheeler wrote:
> On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:
>
>> My understanding is that JSON is a subset of ECMAscript
>
> Well, no, JSON is formally “a lightweight data-interchange format.” It’s derived from JavaScript syntax, but it is not a programming language, so I wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.
>
> http://json.org/

Are people explicitly asking for a) *JSON* datatype or b) a type that lets you store arbitrary complex semi-untyped data structures?

if b) then this might get a lot more interesting

Cheers,
Claes


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2011-12-20 20:52:04
Message-ID: CAFNqd5W+P+tTrCK52CEf-koxzWrFNJGZAG64zef+z6mbu+e=9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 19, 2011 at 5:49 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> My understanding is that JSON is a subset of ECMAscript, so if you get
> the latter you already have the former.  Now, someone would have to
> check whether plscheme still builds with guile-2.0 and, given that, how
> exactly you'd get pl/ecmascript (or pl/js) out of that.

I don't think so.

I checked it out (still on pgfoundry, still on CVS, and the code hasn't
been touched since 2008), and ran into some issues.

- It looks for libguile.h
#include "libguile.h"
which, on 2.0, has shifted around from /usr/include/libguile.h (1.8)
to /usr/include/guile/2.0/libguile.h

It's not doing enough indirection internally; there is a guile-config
that is analogous to pg_config:

postgres(at)cbbrowne [03:48:43] [~/PostgreSQL/plscheme]
-> % guile-config compile
-pthread -I/usr/include/guile/2.0
postgres(at)cbbrowne [03:48:45] [~/PostgreSQL/plscheme]
-> % guile-config link
-lguile-2.0 -lgc

It looks like there's something PG-related as a next issue:

-> % ./install.sh
pg_config : /var/lib/postgresql/dbs/postgresql-HEAD/bin/pg_config
module-dir : /var/lib/postgresql/dbs/postgresql-HEAD/lib
max-cache-size: 64
dbname : postgres
safe-r5rs : NO
dbacreate : NO
PSQL : /var/lib/postgresql/dbs/postgresql-HEAD/bin/psql postgres
CPPFLAGS : -g -Wall -fpic -c
-I/var/lib/postgresql/dbs/postgresql-HEAD/include/server
-I/usr/include/guile/2.0
LDFLAGS : -shared -lguile

Compiling... failed!

plscheme.c: In function '_PG_init':
plscheme.c:647:2: warning: implicit declaration of function
'DefineCustomStringVariable' [-Wimplicit-function-declaration]
plscheme.c:650:30: error: 'PGC_BACKEND' undeclared (first use in this function)
plscheme.c:650:30: note: each undeclared identifier is reported only
once for each function it appears in
plscheme.c:652:2: warning: implicit declaration of function
'DefineCustomIntVariable' [-Wimplicit-function-declaration]
plscheme.c: In function 'plscheme_func_handler':
plscheme.c:742:2: warning: implicit declaration of function
'GetTopTransactionId' [-Wimplicit-function-declaration]
plscheme.c: In function 'parse_trig_args':
plscheme.c:1623:44: error: dereferencing pointer to incomplete type
plscheme.c:1628:38: error: dereferencing pointer to incomplete type
...
(See error.log for details.)

I'm not sure to what degree this is bitrot relating to:
a) Postgres changes
b) Guile changes
but there's doubtless a bit of both.

I'd think it interesting to get this back to working order, whether
it's useful for JavaScript or not.
--
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-21 02:04:29
Message-ID: A3838829-7DF8-4787-BE11-D69C67492C94@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 19, 2011, at 9:00 PM, Robert Haas wrote:

>> +1, though I think the core type will at least need some basic operators and indexing support.
>
> And I'm willing to do that, but I thought it best to submit a bare
> bones patch first, in the hopes of minimizing the number of
> objectionable things therein. For example, if you want to be able to
> index a JSON column, you have to decide on some collation order that
> is consistent with JSON's notion of equality, and it's not obvious
> what is most logical. Heck, equality itself isn't 100% obvious. If
> there's adequate support for including JSON in core, and nobody
> objects to my implementation, then I'll throw some ideas for those
> things up against the wall and see what sticks.

+1 Sounds good to me.

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, Claes Jakobsson <claes(at)gluefinance(dot)com>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-21 02:05:30
Message-ID: B784010D-3F8B-4E84-B984-2B2FE805623F@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 20, 2011, at 2:13 AM, Magnus Hagander wrote:

> Yes, that's why I said "few" not "none".
>
> Though in my experience, most companies are a lot more restrictive
> about addons to their database than addons to their development
> environments.

Yeah, we’re getting off-topic here, so I’ll just say something we can agree on: We’ll see.

I do still want to see some processes for getting PGXN distributions into RPM/.deb/StackBuilder, though.

Best,

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Claes Jakobsson <claes(at)surfar(dot)nu>
Cc: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-21 02:06:53
Message-ID: 88EBB749-A03C-434C-ADE7-20C1CCE4C887@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote:

> Are people explicitly asking for a) *JSON* datatype or b) a type that lets you store arbitrary complex semi-untyped data structures?

Yes.

> if b) then this might get a lot more interesting

JSON is the most popular/likely way to represent that, I think.

David


From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: Claes Jakobsson <claes(at)surfar(dot)nu>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2011-12-22 08:21:34
Message-ID: 20111222082133.GO7768@ldn-qws-004.delacy.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Let me mention another lightweight data-interchange format.

At http://www.janestreet.com we have developed a small C module to deal
with S-expressions (sexps) as a way to store arbitrary data. As we write
most of our code in OCaml, sexps are a natural way for us to store data
(http://hg.ocaml.info/release/sexplib/ provides automatic ways to convert
"any" OCaml value into a sexp).

The extension is still pretty new, but we use it successfully on a daily
basis. After we have upgraded to 9.x we will package it as an extension
and release it as open source.

API-wise, the module at the moment offers the following:

sexp_validate(text) returns boolean
Validate that the passed-in text is a valid s-expression.

create domain sexp as text check (sexp_validate(value));

BTW: It is a PITA that arrays of domains are not valid types.

And several functions to take apart or modify sexps using a path into
the sexp (similar to what XPath does for XML).

Such as:

sexp_get(sexp, text) returns sexp
Get the sub-sexp of sexp identified by the path.
Returns NULL if the path is not a valid path in sexp.
Example:
path=.a space.b.[1].x
((ignore this) ("a space" ((b (0 ((also ignored) (x "The Value")) )) )))
-> "The Value"

And sexp_get_atom(sexp, text) returns text
Get the sub-atom of sexp identified by the path.
Returns NULL if the path is not a valid path in sexp or
does not identify an atom.
Example:
path=.a space.b.[1].x
((ignore this) ("a space" ((b (0 ((also ignored) (x "The Value")) )) )))
^^^^^^^^^
-> The Value
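
Written as a SQL call, that example would be (a sketch; the exact
quoting of the path argument is our guess):

    select sexp_get_atom(
        '((ignore this) ("a space" ((b (0 ((also ignored) (x "The Value")) )) )))',
        '.a space.b.[1].x');
    -- returns: The Value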

Cheers,

Bene

On 20/12/11 19:39, Claes Jakobsson wrote:
> On Dec 20, 2011, at 12:39 AM, David E. Wheeler wrote:
> > On Dec 19, 2011, at 2:49 AM, Dimitri Fontaine wrote:
> >
> >> My understanding is that JSON is a subset of ECMAscript
> >
> > Well, no, JSON is formally “a lightweight data-interchange format.” It’s derived from JavaScript syntax, but it is not a programming language, so I wouldn’t say it was accurate to describe it as a subset of JS or ECMAScript.
> >
> > http://json.org/
>
> Are people explicitly asking for a) *JSON* datatype or b) a type that lets you store arbitrary complex semi-untyped data structures?
>
> if b) then this might get a lot more interesting
>
> Cheers,
> Claes


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-11 02:04:46
Message-ID: CA+TgmoZOenAPXXj_uy1Un2=Mp00MedypQBDY3hXTahvO_rrnKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 20, 2011 at 9:06 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote:
>> Are people explicitly asking for a) *JSON* datatype or b) a type that lets you store arbitrary complex semi-untyped data structures?
>
> Yes.
>
>> if b) then this might get a lot more interesting
>
> JSON is the most popular/likely way to represent that, I think.

On that note, here's an updated version of the patch I posted
upthread, with some regression tests and minimal documentation.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
json-rmh-v2.patch application/octet-stream 34.7 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-11 06:18:28
Message-ID: CAFj8pRCsG9D2SWgk6PObW8ArkrzfP6KXaydjXhhKr4K-D56hXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/1/11 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Tue, Dec 20, 2011 at 9:06 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>> On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote:
>>> Are people explicitly asking for a) *JSON* datatype or b) a type that lets you store arbitrary complex semi-untyped data structures?
>>
>> Yes.
>>
>>> if b) then this might get a lot more interesting
>>
>> JSON is the most popular/likely way to represent that, I think.
>
> On that note, here's an updated version of the patch I posted
> upthread, with some regression tests and minimal documentation.

I like this patch and this feature.

I see only one issue - there is no functionality that helps generate
JSON in pg.

What do you think about functions: array_to_json(anyarray),
row_to_json(any) and format_json(text, text, ...)?
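
For illustration, roughly the behavior I have in mind (a sketch - none
of these functions exists yet, and the outputs are what I would expect,
not something any patch produces today):

    SELECT array_to_json(ARRAY[1,2,3]);    -- [1,2,3]
    SELECT row_to_json(row(1,'foo'));      -- {"f1":1,"f2":"foo"}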

Regards

Pavel

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-11 13:10:17
Message-ID: 4F0D8A39.7080305@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/11/2012 01:18 AM, Pavel Stehule wrote:
>
> I like this patch and this feature.

I'm about to read the patch in detail - I certainly like the feature.

>
> I see only one issue - there is no functionality that helps generate
> JSON in pg.
>
> What do you think about functions: array_to_json(anyarray),
> row_to_json(any) and format_json(text, text, ...)?
>

Actually, more than these, I (and at least one very interested client)
want query_to_json, which would do something like:

# select q2json('select $$a$$ || x as b, y as c from generate_series(1,3) x, generate_series(4,5) y');
q2json
---------------------------------------------------------------------------------------------------------
[{"b":"a1","c":4},{"b":"a1","c":5},{"b":"a2","c":4},{"b":"a2","c":5},{"b":"a3","c":4},{"b":"a3","c":5}]

No doubt several variants are possible, such as returning a setof json,
one per row, instead of a single json value, or allowing query parameters
as separate arguments (maybe just using variadic functions), but for a
first go probably something as simple as this would meet the case.
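
In signature terms, those variants might look like this (hypothetical,
just to pin the idea down):

    -- q2json(query text) returns json           (as in the example above)
    -- q2json(query text) returns setof json     (one json value per row)
    -- q2json(query text, variadic args "any") returns json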

Given the short time span available before patches must be in, I am
prepared to work on this ASAP.

cheers

andrew


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-11 13:32:06
Message-ID: CAFj8pRAA=J2u2x2aJO-p5a-Yp=8OFf67JGhgqhyCLC70cve4xw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/1/11 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
>
> On 01/11/2012 01:18 AM, Pavel Stehule wrote:
>>
>>
>> I like this patch and this feature.
>
>
> I'm about to read the patch in detail - I certainly like the feature.
>
>
>>
>> I see only one issue - there is no functionality that helps generate
>> JSON in pg.
>>
>> What do you think about functions: array_to_json(anyarray),
>> row_to_json(any) and format_json(text, text, ...)?
>>
>
> Actually, more than these, I (and at least one very interested client) want
> query_to_json, which would do something like:
>
>   # select q2json('select $$a$$ || x as b, y as c from generate_series(1,3)
> x, generate_series(4,5) y');
>                                                     q2json
>
> ---------------------------------------------------------------------------------------------------------
>
> [{"b":"a1","c":4},{"b":"a1","c":5},{"b":"a2","c":4},{"b":"a2","c":5},{"b":"a3","c":4},{"b":"a3","c":5}]
>

we have a query_to_xml - so there should be a similar query_to_json. But
this is not enough for usage from stored procedures. What about two rich
functions:

* query_to_json - per your proposal
* array_to_json - with the possibility of serializing an array of records

This can be a basic set
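
For comparison, the XML counterpart already exists in core, so the shape
of the API is well precedented:

    -- arguments: query, nulls (emit null columns), tableforest, targetns
    SELECT query_to_xml('select 1 as a', true, false, '');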

Regards

Pavel

>
> No doubt several variants are possible, such as returning a setof json, one
> per row, instead of a single json value, or allowing query parameters as
> separate arguments (maybe just using variadic functions), but for a first go
> probably something as simple as this would meet the case.

>
> Given the short time span available before patches must be in, I am prepared
> to work on this ASAP.
>
> cheers
>
> andrew
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-11 13:33:53
Message-ID: CA+TgmoYOqU51_REgjq_LsSzCxwfy0uZ7BogX7gxFwm339Pf2Jw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 11, 2012 at 1:18 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2012/1/11 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> On Tue, Dec 20, 2011 at 9:06 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>>> On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote:
>>>> Are people explicitly asking for a) *JSON* datatype or b) a type that lets you store arbitrary complex semi-untyped data structures?
>>>
>>> Yes.
>>>
>>>> if b) then this might get a lot more interesting
>>>
>>> JSON is the most popular/likely way to represent that, I think.
>>
>> On that note, here's an updated version of the patch I posted
>> upthread, with some regression tests and minimal documentation.
>
> I like this patch and this feature.
>
> I see only one issue - there is no functionality that helps generate
> JSON in pg.
>
> What do you think about functions: array_to_json(anyarray),
> row_to_json(any) and format_json(text, text, ...)?

I think we might want all of that stuff, but I doubt there is time to
do it for 9.2.

Actually, I think the next logical step would be to define equality
(is there an official definition of that for JSON?) and build a btree
opclass. I believe the code I've already written could be extended to
construct an abstract syntax tree for those operations that need it.
But we need to make some decisions first. A btree opclass requires a
total ordering, so we have to arbitrarily define whether 1 < true, 1 <
[1], 1 < "1", etc.
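
For concreteness, a sketch of where that would land (hypothetical: the
json comparison operators and the json_cmp support function do not
exist, and defining their behavior is exactly the open decision):

    CREATE OPERATOR CLASS json_ops
        DEFAULT FOR TYPE json USING btree AS
            OPERATOR 1 <,  OPERATOR 2 <=, OPERATOR 3 =,
            OPERATOR 4 >=, OPERATOR 5 >,
            FUNCTION 1 json_cmp(json, json);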

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-11 13:38:21
Message-ID: CAFj8pRCgwS8OMyS9CDeDN7SCfi95JCTXzDrK6H9cB9MO6uqGTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/1/11 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Wed, Jan 11, 2012 at 1:18 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> 2012/1/11 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>> On Tue, Dec 20, 2011 at 9:06 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
>>>> On Dec 20, 2011, at 10:39 AM, Claes Jakobsson wrote:
>>>>> Are people explicitly asking for a) *JSON* datatype or b) a type that lets you store arbitrary complex semi-untyped data structures?
>>>>
>>>> Yes.
>>>>
>>>>> if b) then this might get a lot more interesting
>>>>
>>>> JSON is the most popular/likely way to represent that, I think.
>>>
>>> On that note, here's an updated version of the patch I posted
>>> upthread, with some regression tests and minimal documentation.
>>
>> I like this patch and this feature.
>>
>> I see only one issue - there is no functionality that helps generate
>> JSON in pg.
>>
>> What do you think about functions: array_to_json(anyarray),
>> row_to_json(any) and format_json(text, text, ...)?
>
> I think we might want all of that stuff, but I doubt there is time to
> do it for 9.2.
>
> Actually, I think the next logical step would be to define equality
> (is there an official definition of that for JSON?) and build a btree
> opclass.  I believe the code I've already written could be extended to
> construct an abstract syntax tree for those operations that need it.
> But we need to make some decisions first.  A btree opclass requires a
> total ordering, so we have to arbitrarily define whether 1 < true, 1 <
> [1], 1 < "1", etc.
>

I don't understand why we have to do it.

We don't support similar functionality for XML, so why for JSON?

Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-11 14:04:25
Message-ID: CA+TgmoZZY5WicSsDGJwgYJU2wRXwcwWftBdC5DQ1WHHcHWuyqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 11, 2012 at 8:38 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> I don't understand why we have to do it.
>
> We don't support similar functionality for XML, so why for JSON?

Hrm. Well, that's an interesting point. Maybe we don't. I assumed
that people would eventually want to optimize queries of the form
SELECT whatever FROM tab WHERE jsoncol = 'constant'. If that's a
sufficiently marginal use case that we don't care, then fine.

One difference between JSON and XML is that XML really has no
well-defined comparison semantics. For example, consider:

<foo><bar>1.0</bar></foo>
<foo><bar>1.0</bar> </foo>

If the XML is being used as a transport mechanism, then the extra
space is semantically insignificant, but if this is markup, then it
might matter a lot. Also, consider:

<foo><bar>1.00</bar></foo>

That one might be equal if we think 1.0 is intended to be a number,
but if it's intended as a string then it's not. We could perhaps do
comparisons in XML relative to some DTD or schema if those provide
details about what the values mean, but in a vacuum it's not
well-defined. On the other hand, in JSON, it's pretty clear that [ 1,
2, 3 ] is the same value as [1,2,3] but "1,2,3" is different from "1,
2, 3". There are some borderline cases that might need some sweat,
like whether 1 = 1.0 = 1.00 = 1e0, but in general the level of
ambiguity seems to me to be much less, making it more feasible here
than it would be for XML.
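
To make the numeric question concrete: all of these are valid JSON
numbers and would be accepted by the validator, and an equality
operator would have to decide whether they compare equal:

    SELECT '1'::json, '1.0'::json, '1.00'::json, '1e0'::json;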

That having been said, uncertainties about whether we want this at all
(and if so in what form) are exactly why I didn't include this kind of
stuff in the patch to begin with, and I think that if we get this much
committed for 9.2 we'll be doing pretty well. If we can agree on and
do more, great; if not, we'll at least have this much, which IMHO
would be an improvement over what we have now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-11 14:41:45
Message-ID: CAFj8pRDVuH7Zq84KbyD6--j3GpRbBv88m8zt1qBmg++uJ3v2ow@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/1/11 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Wed, Jan 11, 2012 at 8:38 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> I don't understand why we have to do it.
>>
>> We don't support similar functionality for XML, so why for JSON?
>
> Hrm.  Well, that's an interesting point.  Maybe we don't.  I assumed
> that people would eventually want to optimize queries of the form
> SELECT whatever FROM tab WHERE jsoncol = 'constant'.  If that's a
> sufficiently marginal use case that we don't care, then fine.
>
> One difference between JSON and XML is that XML really has no
> well-defined comparison semantics.  For example, consider:
>
> <foo><bar>1.0</bar></foo>
> <foo><bar>1.0</bar> </foo>
>
> If the XML is being used as a transport mechanism, then the extra
> space is semantically insignificant, but if this is markup, then it
> might matter a lot.  Also, consider:
>
> <foo><bar>1.00</bar></foo>
>
> That one might be equal if we think 1.0 is intended to be a number,
> but if it's intended as a string then it's not.  We could perhaps do
> comparisons in XML relative to some DTD or schema if those provide
> details about what the values mean, but in a vacuum it's not
> well-defined.  On the other hand, in JSON, it's pretty clear that [ 1,
> 2, 3 ] is the same value as [1,2,3] but "1,2,3" is different from "1,
> 2, 3".  There are some borderline cases that might need some sweat,
> like whether 1 = 1.0 = 1.00 = 1e0, but in general the level of
> ambiguity seems to me to be much less, making it more feasible here
> than it would be for XML.
>
> That having been said, uncertainties about whether we want this at all
> (and if so in what form) are exactly why I didn't include this kind of
> stuff in the patch to begin with, and I think that if we get this much
> committed for 9.2 we'll be doing pretty well.  If we can agree on and
> do more, great; if not, we'll at least have this much, which IMHO
> would be an improvement over what we have now.
>

I understand it now. My opinion is that some operators and index search
can be in 9.2 - so use JSON just as a communication format now.

* we need to build JSON
* we need to check whether something is valid JSON
* we need to store JSON

other steps should be (9.2):
* basic operators: eq, neq
* some tool like XQuery - a simple query on JSON documents, available
from SQL, that can be used for functional indexes (see the sketch below)
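
A sketch of that functional-index use case (json_get is hypothetical -
some immutable extraction function would have to exist first, and tab
and jsoncol are placeholder names):

    -- index a single key pulled out of the document
    CREATE INDEX ON tab ((json_get(jsoncol, 'name')));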

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-11 15:15:39
Message-ID: CA+TgmoZgqUYyq5mgL6BA_6qfT+paAtv6qY4=s5uEoAy7yXDdeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 11, 2012 at 9:41 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> I understand it now. My opinion is that some operators and index search
> can be in 9.2 - so use JSON just as a communication format now.
>
> * we need to build JSON
> * we need to check whether something is valid JSON
> * we need to store JSON
>
> other steps should be (9.2):
> * basic operators: eq, neq
> * some tool like XQuery - a simple query on JSON documents, available
> from SQL, that can be used for functional indexes.

That would be nice, but let's not let the perfect be the enemy of the
good. We don't have a lot of time here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-11 15:21:33
Message-ID: CAFj8pRC1+KUfK52ntgBnW6qLT6txXhhggFrrZy3-dfMK7dfVag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/1/11 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Wed, Jan 11, 2012 at 9:41 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> I understand it now. My opinion is that some operators and index search
>> can be in 9.2 - so use JSON just as a communication format now.
>>
>> * we need to build JSON
>> * we need to check whether something is valid JSON
>> * we need to store JSON
>>
>> other steps should be (9.2):
>> * basic operators: eq, neq
>> * some tool like XQuery - a simple query on JSON documents, available
>> from SQL, that can be used for functional indexes.
>
> That would be nice, but let's not let the perfect be the enemy of the
> good.  We don't have a lot of time here.
>

sorry - replace 9.2 by 9.3 - I am sorry

I am able to write the array_to_json function and Andrew can write query_to_json

Pavel

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-12 00:53:19
Message-ID: 4F0E2EFF.4050501@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/11/2012 10:21 AM, Pavel Stehule wrote:
> 2012/1/11 Robert Haas<robertmhaas(at)gmail(dot)com>:
>> On Wed, Jan 11, 2012 at 9:41 AM, Pavel Stehule<pavel(dot)stehule(at)gmail(dot)com> wrote:
>>> I understand it now. My opinion is so some operators and index search
>>> can be in 9.2 - so use a JSON just as communication format now.
>>>
>>> * we need to build JSON
>>> * we need to check if something is valid JSON
>>> * we need to store JSON
>>>
>>> other steps should be (9.2)
>>> * basic operators eq, neq
>>> * some tool like XQuery - a simple query on JSON documents available from
>>> SQL that can be used for functional indexes.
>> That would be nice, but let's not let the perfect be the enemy of the
>> good. We don't have a lot of time here.
>>
> sorry - replace 9.2 with 9.3
>
> I am able to write the array_to_json function and Andrew can write query_to_json

For those who want to play along, see
<https://bitbucket.org/adunstan/pgdevel> which has Robert's patch and my
additions to it.

I'm actually halfway through writing an array_to_json function, since
it is necessary anyway for query_to_json. I hope to have a fairly
complete working function in about 24 hours.

cheers

andrew


From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-12 14:00:51
Message-ID: CAARyMpAzDLEAQT31QYwTxPJBfKkmXiAZ3+WR6iOFDgQhOj96Jg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote an array_to_json function during GSoC 2010:

http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289

It's not exposed as a procedure called array_to_json: it's part of the
to_json function, which decides what to do based on the argument type.

- Joey


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-12 15:23:23
Message-ID: 4F0EFAEB.4000909@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/12/2012 09:00 AM, Joey Adams wrote:
> I wrote an array_to_json function during GSoC 2010:
>
> http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289
>
> It's not exposed as a procedure called array_to_json: it's part of the
> to_json function, which decides what to do based on the argument type.
>

Excellent, this is just the point at which I stopped work last night, so
with your permission I'll steal this and it will save me a good chunk of
time.

cheers

andrew


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-12 15:44:25
Message-ID: CAFj8pRCtqO6e5cUqK2e249GO=pBp8WcdX9x6rvohBM7Bd0E9yA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/1/12 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
>
> On 01/12/2012 09:00 AM, Joey Adams wrote:
>>
>> I wrote an array_to_json function during GSoC 2010:
>>
>>
>> http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289
>>
>> It's not exposed as a procedure called array_to_json: it's part of the
>> to_json function, which decides what to do based on the argument type.
>>
>
>
> Excellent, this is just the point at which I stopped work last night, so
> with your permission I'll steal this and it will save me a good chunk of
> time.
>

this should be enhanced a little bit more to support row arrays - it
can be merged with some routines from the pst tool
http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html

Regards

Pavel

> cheers
>
> andrew
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-12 15:51:13
Message-ID: 4F0F0171.6070909@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/12/2012 10:44 AM, Pavel Stehule wrote:
> 2012/1/12 Andrew Dunstan<andrew(at)dunslane(dot)net>:
>>
>> On 01/12/2012 09:00 AM, Joey Adams wrote:
>>> I wrote an array_to_json function during GSoC 2010:
>>>
>>>
>>> http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289
>>>
>>> It's not exposed as a procedure called array_to_json: it's part of the
>>> to_json function, which decides what to do based on the argument type.
>>>
>>
>> Excellent, this is just the point at which I stopped work last night, so
>> with your permission I'll steal this and it will save me a good chunk of
>> time.
>>
> this should be enhanced a little bit more to support row arrays - it
> can be merged with some routines from the pst tool
> http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html
>
>

I will be covering composites.

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-12 16:21:16
Message-ID: CAHyXU0y4SM8gzGYbNHM4shU6Qmi_QM8A6EjQMR1j8B6jmD1szw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 12, 2012 at 9:51 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On 01/12/2012 10:44 AM, Pavel Stehule wrote:
>> this should be enhanced a little bit more to support row arrays - it
>> can be merged with some routines from the pst tool
>> http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html
>
> I will be covering composites.

curious, will your function work on unregistered composites? What
would this do?

select array_to_json(array[row('a', 1), row('b', 2)]);

merlin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-12 16:25:39
Message-ID: CAFj8pRC23e-zaj=0Z2_kTTWrxzSTZtU07UgS7de8MMheeyTuAw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/1/12 Merlin Moncure <mmoncure(at)gmail(dot)com>:
> On Thu, Jan 12, 2012 at 9:51 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> On 01/12/2012 10:44 AM, Pavel Stehule wrote:
>>> this should be enhanced a little bit more to support row arrays - it
>>> can be merged with some routines from the pst tool
>>> http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html
>>
>> I will be covering composites.
>
> curious, will your function work on unregistered composites?  What
> would this do?
>
> select array_to_json(array[row('a', 1), row('b', 2)]);

it can do it - but it has to use some defaults for the field names.
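
For illustration, with the f1, f2, ... defaults the backend uses for
anonymous record fields, I would expect something like this (a sketch,
untested against the patch):

   select array_to_json(array[row('a', 1), row('b', 2)]);
   -- expected result: [{"f1":"a","f2":1},{"f1":"b","f2":2}]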

Pavel

>
> merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-12 16:28:10
Message-ID: 4F0F0A1A.5090509@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/12/2012 11:21 AM, Merlin Moncure wrote:
> On Thu, Jan 12, 2012 at 9:51 AM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>> On 01/12/2012 10:44 AM, Pavel Stehule wrote:
>>> this should be enhanced a little bit more to support row arrays - it
>>> can be merged with some routines from the pst tool
>>> http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html
>> I will be covering composites.
> curious, will your function work on unregistered composites? What
> would this do?
>
> select array_to_json(array[row('a', 1), row('b', 2)]);
>

Expected behaviour is something like this:

andrew=# select q2json('
   select $$a$$ || x as b,
          y as c,
          array[row(x.*,array[1,2,3]),
                row(y.*,array[4,5,6])] as z
     from generate_series(1,2) x,
          generate_series(4,5) y');

[{"b":"a1","c":4,"z":[{"f1":1,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]},
{"b":"a1","c":5,"z":[{"f1":1,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]},
{"b":"a2","c":4,"z":[{"f1":2,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]},
{"b":"a2","c":5,"z":[{"f1":2,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}]

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-14 20:06:27
Message-ID: 4F11E043.1040107@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/12/2012 10:51 AM, Andrew Dunstan wrote:
>
>
> On 01/12/2012 10:44 AM, Pavel Stehule wrote:
>> 2012/1/12 Andrew Dunstan<andrew(at)dunslane(dot)net>:
>>>
>>> On 01/12/2012 09:00 AM, Joey Adams wrote:
>>>> I wrote an array_to_json function during GSoC 2010:
>>>>
>>>>
>>>> http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289
>>>>
>>>>
>>>> It's not exposed as a procedure called array_to_json: it's part of the
>>>> to_json function, which decides what to do based on the argument type.
>>>>
>>>
>>> Excellent, this is just the point at which I stopped work last
>>> night, so
>>> with your permission I'll steal this and it will save me a good
>>> chunk of
>>> time.
>>>
>> this should be enhanced a little bit more to support row arrays - it
>> can be merged with some routines from the pst tool
>> http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html
>>
>>
>
> I will be covering composites.
>
>

OK, here's a patch that does both query_to_json and array_to_json, along
with docs and regression tests. It includes Robert's original patch,
although I can produce a differential patch if required. It can also be
pulled from <https://bitbucket.org/adunstan/pgdevel>

A couple of things to note. First, the problem of us losing column
names, which I noted a couple of months ago and Tom did a bit of work on,
is exercised by this. We really need to fix it. Example:

andrew=# select array_to_json(array_agg(row(z.*)))
    from (select $$a$$ || x as b,
             y as c,
             array[row(x.*,array[1,2,3]),
                   row(y.*,array[4,5,6])] as z
          from generate_series(1,1) x,
               generate_series(4,4) y) z;
                              array_to_json
-------------------------------------------------------------------------
 [{"f1":"a1","f2":4,"f3":[{"f1":1,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}]
(1 row)

Here we've lost b, c and z as column names.
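
A possible workaround in the meantime - a sketch, untested against this
patch - is to cast the rows to a registered composite type, so the field
names come from the type declaration instead of being defaulted:

   create type named_row as (b text, c int);
   select array_to_json(array[row('a1', 4)::named_row]);
   -- expected result: [{"b":"a1","c":4}]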

Second, what should we do when the database encoding isn't UTF8? I'm
inclined to emit a \unnnn escape for any non-ASCII character (assuming
it has a unicode code point - are there any code points in the
non-unicode encodings that don't have unicode equivalents?). The
alternative would be to fail on non-ASCII characters, which might be
ugly. Of course, anyone wanting to deal with JSON should be using UTF8
anyway, but we still have to deal with these things. What about
SQL_ASCII? If there's a non-ASCII sequence there we really have no way
of telling what it should be. There at least I think we should probably
error out.

cheers

andrew

Attachment Content-Type Size
adjson.patch text/x-patch 53.1 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-14 20:31:54
Message-ID: CAFj8pRAEebkmg9t2n9Cze+-FKXJwuy9fhhZtQLq8izsuBV8qqQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2012/1/14 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
>
> On 01/12/2012 10:51 AM, Andrew Dunstan wrote:
>>
>>
>>
>> On 01/12/2012 10:44 AM, Pavel Stehule wrote:
>>>
>>> 2012/1/12 Andrew Dunstan<andrew(at)dunslane(dot)net>:
>>>>
>>>>
>>>> On 01/12/2012 09:00 AM, Joey Adams wrote:
>>>>>
>>>>> I wrote an array_to_json function during GSoC 2010:
>>>>>
>>>>>
>>>>>
>>>>> http://git.postgresql.org/gitweb/?p=json-datatype.git;a=blob;f=json_io.c#l289
>>>>>
>>>>> It's not exposed as a procedure called array_to_json: it's part of the
>>>>> to_json function, which decides what to do based on the argument type.
>>>>>
>>>>
>>>> Excellent, this is just the point at which I stopped work last night, so
>>>> with your permission I'll steal this and it will save me a good chunk of
>>>> time.
>>>>
>>> this should be enhanced a little bit more to support row arrays - it
>>> can be merged with some routines from the pst tool
>>> http://okbob.blogspot.com/2010/11/new-version-of-pst-collection-is.html
>>>
>>>
>>
>> I will be covering composites.
>>
>>
>
> OK, here's a patch that does both query_to_json and array_to_json, along
> with docs and regression tests. It includes Robert's original patch, although
> I can produce a differential patch if required. It can also be pulled from
> <https://bitbucket.org/adunstan/pgdevel>
>
> A couple of things to note. First, the problem of us losing column names,
> which I noted a couple of months ago and Tom did a bit of work on, is
> exercised by this. We really need to fix it. Example:
>

Support for SELECT ROW(x AS "real name", y AS "real name") is a good idea
and could be used in more places than just here.

Regards

Pavel

>   andrew=#  select array_to_json(array_agg(row(z.*)))
>       from (select $$a$$ || x as b,
>
>                y as c,
>                array[row(x.*,array[1,2,3]),
>                      row(y.*,array[4,5,6])] as z
>             from generate_series(1,1) x,
>                  generate_series(4,4) y) z;
>                                  array_to_json
>   -------------------------------------------------------------------------
>     [{"f1":"a1","f2":4,"f3":[{"f1":1,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}]
>   (1 row)
>
>
> Here we've lost b, c and z as column names.
>
> Second, what should we do when the database encoding isn't UTF8? I'm
> inclined to emit a \unnnn escape for any non-ASCII character (assuming it
> has a unicode code point - are there any code points in the non-unicode
> encodings that don't have unicode equivalents?). The alternative would be to
> fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
> to deal with JSON should be using UTF8 anyway, but we still have to deal
> with these things. What about SQL_ASCII? If there's a non-ASCII sequence
> there we really have no way of telling what it should be. There at least I
> think we should probably error out.
>
> cheers
>
> andrew
>
>
>


From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-14 23:11:57
Message-ID: CAARyMpDS_4xcwWPH3XXcxBbOqEmGyc9YCkCXcH9q=pka1PQZYg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Second, what should we do when the database encoding isn't UTF8? I'm
> inclined to emit a \unnnn escape for any non-ASCII character (assuming it
> has a unicode code point - are there any code points in the non-unicode
> encodings that don't have unicode equivalents?). The alternative would be to
> fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
> to deal with JSON should be using UTF8 anyway, but we still have to deal
> with these things. What about SQL_ASCII? If there's a non-ASCII sequence
> there we really have no way of telling what it should be. There at least I
> think we should probably error out.

I don't think there is a satisfying solution to this problem. Things
working against us:

* Some server encodings support characters that don't map to Unicode
characters (e.g. unused slots in Windows-1252). Thus, converting to
UTF-8 and back is lossy in general.

* We want a normalized representation for comparison. This will
involve a mixture of server and Unicode characters, unless the
encoding is UTF-8.

* We can't efficiently convert individual characters to and from
Unicode with the current API.

* What do we do about \u0000 ? TEXT datums cannot contain NUL characters.

I'd say just ban Unicode escapes and non-ASCII characters unless the
server encoding is UTF-8, and ban all \u0000 escapes. It's easy, and
whatever we support later will be a superset of this.

Strategies for handling this situation have been discussed in prior
emails. This is where things got stuck last time.

- Joey


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-15 00:13:52
Message-ID: 4F121A40.6070508@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/14/2012 06:11 PM, Joey Adams wrote:
> On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>> Second, what should we do when the database encoding isn't UTF8? I'm
>> inclined to emit a \unnnn escape for any non-ASCII character (assuming it
>> has a unicode code point - are there any code points in the non-unicode
>> encodings that don't have unicode equivalents?). The alternative would be to
>> fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
>> to deal with JSON should be using UTF8 anyway, but we still have to deal
>> with these things. What about SQL_ASCII? If there's a non-ASCII sequence
>> there we really have no way of telling what it should be. There at least I
>> think we should probably error out.
> I don't think there is a satisfying solution to this problem. Things
> working against us:
>
> * Some server encodings support characters that don't map to Unicode
> characters (e.g. unused slots in Windows-1252). Thus, converting to
> UTF-8 and back is lossy in general.
>
> * We want a normalized representation for comparison. This will
> involve a mixture of server and Unicode characters, unless the
> encoding is UTF-8.
>
> * We can't efficiently convert individual characters to and from
> Unicode with the current API.
>
> * What do we do about \u0000 ? TEXT datums cannot contain NUL characters.
>
> I'd say just ban Unicode escapes and non-ASCII characters unless the
> server encoding is UTF-8, and ban all \u0000 escapes. It's easy, and
> whatever we support later will be a superset of this.
>
> Strategies for handling this situation have been discussed in prior
> emails. This is where things got stuck last time.
>

Well, where I'm coming from, NULs are not a problem. But
escape_json() is currently totally encoding-unaware. It produces \unnnn
escapes for low-ASCII characters, and just passes through characters
with the high bit set. That's possibly OK for EXPLAIN output - we really
don't want EXPLAIN failing. But maybe we should ban JSON
output for EXPLAIN if the encoding isn't UTF8.

Another question in my mind is what to do when the client encoding isn't
UTF8.

None of these is an insurmountable problem, ISTM - we just need to make
some decisions.

cheers

andrew


From: Mike Lewis <mikelikespie(at)gmail(dot)com>
To: PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: JSON for PG 9.2
Date: 2012-01-15 00:49:31
Message-ID: CA+foqsR7mTDEBcvtXpgx49t_JEJGfXU6tBYmSDsJSQRAYSNzJg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I am very interested in experimenting with functional indexes into JSON
structures. I think this could be very powerful combined with full text
search as well as constraints. It would allow for using postgres as an
unstructured data store without sacrificing the powerful indexing features,
durability, and transactional semantics that comes with using postgres or
RDBMSes in general.

One use case in particular I have been trying to solve for lately is
persisting and synchronizing client-side state (in a mobile application)
with a server. It would be nice to have a flat, schemaless table (maybe a
table that's like (id, type, owner, data) where data would be a JSON blob).
I could do this now without JSON support, but I think indexing inside that
JSON blob and having validation database side is valuable as well. And as
I mentioned before, I'd rather not throw out the baby with the bathwater by
using a different type of database because ACID, replication, and
constraints are also very important to this. As is being consistent with
the rest of our technology stack. (I'd essentially be using a relational
database to persist an object database)
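
To make that concrete, here is a sketch of the kind of table I mean (the
table and column definitions are just guesses, and json_get below is a
hypothetical stand-in for whatever XPath-like extraction function we end
up with):

   create table client_state (
       id    bigserial primary key,
       type  text      not null,
       owner integer   not null,
       data  json      -- schemaless payload, validated by the type
   );
   -- the functional index I'm after, once extraction exists:
   -- create index on client_state ((json_get(data, 'settings.theme')));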

I'm also not too concerned about storage consumption with this (even though
columnar compression would help a lot in the future) since it's easily
partitionable by user ID.

For my case, the equivalent of Postgres's XPath would work. Also, having it
as a maintained contrib module would be sufficient, although having it in
core, as XPath is, would be even better.

Just my $0.02... even if I'm a bit late to the conversation.

Thanks!
Mike


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-15 16:08:05
Message-ID: 4F12F9E5.3090801@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/14/2012 03:06 PM, Andrew Dunstan wrote:
>
>
>
> OK, here's a patch that does both query_to_json and array_to_json,
> along with docs and regression tests. It includes Robert's original
> patch, although I can produce a differential patch if required. It can
> also be pulled from <https://bitbucket.org/adunstan/pgdevel>
>
>

Here's an update that adds row_to_json, plus a bit more cleanup. Example:

andrew=# SELECT row_to_json(q)
           FROM (SELECT $$a$$ || x AS b,
                        y AS c,
                        ARRAY[ROW(x.*,ARRAY[1,2,3]),
                              ROW(y.*,ARRAY[4,5,6])] AS z
                   FROM generate_series(1,2) x,
                        generate_series(4,5) y) q;
                             row_to_json
--------------------------------------------------------------------
 {"b":"a1","c":4,"z":[{"f1":1,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}
 {"b":"a1","c":5,"z":[{"f1":1,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}
 {"b":"a2","c":4,"z":[{"f1":2,"f2":[1,2,3]},{"f1":4,"f2":[4,5,6]}]}
 {"b":"a2","c":5,"z":[{"f1":2,"f2":[1,2,3]},{"f1":5,"f2":[4,5,6]}]}
(4 rows)

(This might be more to Robert's taste than query_to_json() :-) )

cheers

andrew

Attachment Content-Type Size
adjson3.patch text/x-patch 53.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-19 20:49:48
Message-ID: CA+TgmoYqhmMJtNp_w0MuFA7kL1J1S9EQDx7bZex8ZX+=SJXEQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Jan 14, 2012 at 3:06 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Second, what should we do when the database encoding isn't UTF8? I'm
> inclined to emit a \unnnn escape for any non-ASCII character (assuming it
> has a unicode code point - are there any code points in the non-unicode
> encodings that don't have unicode equivalents?). The alternative would be to
> fail on non-ASCII characters, which might be ugly. Of course, anyone wanting
> to deal with JSON should be using UTF8 anyway, but we still have to deal
> with these things. What about SQL_ASCII? If there's a non-ASCII sequence
> there we really have no way of telling what it should be. There at least I
> think we should probably error out.

I don't see any reason to escape anything more than the minimum
required by the spec, which only requires it for control characters.
If somebody's got a non-ASCII character in there, we can simply allow
it to be represented by itself. That's almost certainly more compact
(and very possibly more readable) than emitting \uXXXX for each such
instance, and it also matches what the current EXPLAIN (FORMAT JSON)
output does.

In other words, let's decree that when the database encoding isn't
UTF-8, *escaping* of non-ASCII characters doesn't work. But
*unescaped* non-ASCII characters should still work just fine.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-19 21:07:30
Message-ID: 4F188612.4010602@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/19/2012 03:49 PM, Robert Haas wrote:
>
> In other words, let's decree that when the database encoding isn't
> UTF-8, *escaping* of non-ASCII characters doesn't work. But
> *unescaped* non-ASCII characters should still work just fine.

The spec only allows unescaped Unicode chars (and for our purposes that
means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
result in something that's not legal JSON. See
<http://www.ietf.org/rfc/rfc4627.txt?number=4627> section 3.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-19 21:12:13
Message-ID: CA+Tgmoa8=cDerpukdhEop-L7ps=AMNc7xA8m5esWxxq=EA6Lfw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On 01/19/2012 03:49 PM, Robert Haas wrote:
>> In other words, let's decree that when the database encoding isn't
>> UTF-8, *escaping* of non-ASCII characters doesn't work.  But
>> *unescaped* non-ASCII characters should still work just fine.
>
> The spec only allows unescaped Unicode chars (and for our purposes that
> means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
> result in something that's not legal JSON. See
> <http://www.ietf.org/rfc/rfc4627.txt?number=4627> section 3.

I understand. I'm proposing that we not care. In other words, if the
server encoding is UTF-8, it'll really be JSON. But if the server
encoding is something else, it'll be almost-JSON. And specifically,
the \uXXXX syntax won't work, and there might be some non-Unicode
characters in there. If that's not the behavior you want, then use
UTF-8.

It seems pretty clear that we're going to have to make some trade-off
to handle non-UTF8 encodings, and I think what I'm suggesting is a lot
less painful than disabling high-bit characters altogether. If we do
that, then what happens if a user runs EXPLAIN (FORMAT JSON) and his
column label has a non-Unicode character in there? Should we say, oh,
sorry, you can't explain that in JSON format? That is mighty
unfriendly, and probably mighty complicated and expensive to figure
out, too. We *do not support* mixing encodings in the same database,
and if we make it the job of this patch to fix that problem, we're
going to be in the same place for 9.2 that we have been for the last
several releases: nowhere.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-19 22:59:03
Message-ID: 4F18A037.1070305@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/19/2012 04:12 PM, Robert Haas wrote:
> On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>> On 01/19/2012 03:49 PM, Robert Haas wrote:
>>> In other words, let's decree that when the database encoding isn't
>>> UTF-8, *escaping* of non-ASCII characters doesn't work. But
>>> *unescaped* non-ASCII characters should still work just fine.
>> The spec only allows unescaped Unicode chars (and for our purposes that
>> means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
>> result in something that's not legal JSON. See
>> <http://www.ietf.org/rfc/rfc4627.txt?number=4627> section 3.
> I understand. I'm proposing that we not care. In other words, if the
> server encoding is UTF-8, it'll really be JSON. But if the server
> encoding is something else, it'll be almost-JSON. And specifically,
> the \uXXXX syntax won't work, and there might be some non-Unicode
> characters in there. If that's not the behavior you want, then use
> UTF-8.
>
> It seems pretty clear that we're going to have to make some trade-off
> to handle non-UTF8 encodings, and I think what I'm suggesting is a lot
> less painful than disabling high-bit characters altogether. If we do
> that, then what happens if a user runs EXPLAIN (FORMAT JSON) and his
> column label has a non-Unicode character in there? Should we say, oh,
> sorry, you can't explain that in JSON format? That is mighty
> unfriendly, and probably mighty complicated and expensive to figure
> out, too. We *do not support* mixing encodings in the same database,
> and if we make it the job of this patch to fix that problem, we're
> going to be in the same place for 9.2 that we have been for the last
> several releases: nowhere.

OK, then we need to say that very clearly and up front (including in the
EXPLAIN docs).

Of course, for data going to the client, if the client encoding is UTF8,
they should get legal JSON, regardless of what the database encoding is,
and conversely too, no?

cheers

andrew
>


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 00:27:01
Message-ID: CA+TgmobEjwpwWE35MXddntgUYves047AC=zuN7gSVhmc8X4ZFw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 19, 2012 at 5:59 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> OK, then we need to say that very clearly and up front (including in the
> EXPLAIN docs.)

Can do.

> Of course, for data going to the client, if the client encoding is UTF8,
> they should get legal JSON, regardless of what the database encoding is, and
> conversely too, no?

Well, that would be nice, but I don't think it's practical. It will
certainly be the case, under the scheme I'm proposing, or probably any
other sensible scheme also, that if a client whose encoding is UTF-8
gets a value of type json back from the database, it's strictly valid
JSON. But it won't be possible to store every legal JSON value in the
database if the database encoding is anything other than UTF-8, even
if the client encoding is UTF-8. The backend will get the client's
UTF-8 bytes and transcode them to the server encoding before calling
the type-input function, so if there are characters in there that
can't be represented in UTF-8, then we'll error out before the JSON
data type ever gets control. In theory, it would be possible to
accept such strings if the client chooses to represent them using a
\uXXXX sequence, but I'm unexcited about doing the work required to
make that happen, because it will still be pretty half-baked: we'll
be able to accept some representations of the same JSON constant but
not others.

I think the real fix for this problem is to introduce an
infrastructure inside the database that allows us to have different
columns stored in different encodings. People use bytea for that
right now, but that's pretty unfriendly: it would be nice to have a
better system. However, I expect that to take a lot of work and break
a lot of things, and until we do it I don't feel that compelled to
provide buggy and incomplete support for it under the guise of
implementing a JSON datatype.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 00:29:41
Message-ID: AB5CF1EC-B8FD-44FB-B67A-3188E70343C0@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jan 19, 2012, at 4:27 PM, Robert Haas wrote:

> I think the real fix for this problem is to introduce an
> infrastructure inside the database that allows us to have different
> columns stored in different encodings. People use bytea for that
> right now, but that's pretty unfriendly: it would be nice to have a
> better system. However, I expect that to take a lot of work and break
> a lot of things, and until we do it I don't feel that compelled to
> provide buggy and incomplete support for it under the guise of
> implementing a JSON datatype.

+1 This seems like a reasonable compromise and course of action, especially if someone is interested in taking on column-level encodings at some point in the next year or two.

Best,

David


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 05:07:06
Message-ID: 21994.1327036026@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 01/19/2012 04:12 PM, Robert Haas wrote:
>> On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>>> The spec only allows unescaped Unicode chars (and for our purposes that
>>> means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
>>> result in something that's not legal JSON.

>> I understand. I'm proposing that we not care. In other words, if the
>> server encoding is UTF-8, it'll really be JSON. But if the server
>> encoding is something else, it'll be almost-JSON.

> Of course, for data going to the client, if the client encoding is UTF8,
> they should get legal JSON, regardless of what the database encoding is,
> and conversely too, no?

Yes. I think this argument has been mostly theologizing, along the
lines of how many JSON characters can dance on the head of a pin.
From a user's perspective, the database encoding is only a constraint on
which characters he can store. He does not know or care what the bit
representation is inside the server. As such, if we store a non-ASCII
character in a JSON string, it's valid JSON as far as the user is
concerned, so long as that character exists in the Unicode standard.
If his client encoding is UTF8, the value will be letter-perfect JSON
when it gets to him; and if his client encoding is not UTF8, then he's
already pretty much decided that he doesn't give a fig about the
Unicode-centricity of the JSON spec, no?

So I'm with Robert: we should just plain not care. I would further
suggest that maybe what we should do with incoming JSON escape sequences
is convert them to Unicode code points and then to the equivalent
character in the database encoding (or throw error if there is none).
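
For instance, that rule would mean something like this (a hypothetical
sketch, assuming a LATIN1 database):

   select '"\u00e9"'::json;   -- stored as the character 'é', which exists
                              -- in LATIN1
   select '"\u0161"'::json;   -- error: 'š' has no LATIN1 equivalent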

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 14:19:51
Message-ID: CA+TgmoaL9ftBac4xY_TfhU1WCQx59NP5YxiDRdR2U0Yt3zSzQw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 20, 2012 at 12:07 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 01/19/2012 04:12 PM, Robert Haas wrote:
>>> On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan<andrew(at)dunslane(dot)net>  wrote:
>>>> The spec only allows unescaped Unicode chars (and for our purposes that
>>>> means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
>>>> result in something that's not legal JSON.
>
>>> I understand.  I'm proposing that we not care.  In other words, if the
>>> server encoding is UTF-8, it'll really be JSON.  But if the server
>>> encoding is something else, it'll be almost-JSON.
>
>> Of course, for data going to the client, if the client encoding is UTF8,
>> they should get legal JSON, regardless of what the database encoding is,
>> and conversely too, no?
>
> Yes.  I think this argument has been mostly theologizing, along the
> lines of how many JSON characters can dance on the head of a pin.
> From a user's perspective, the database encoding is only a constraint on
> which characters he can store.

Bingo.

> He does not know or care what the bit
> representation is inside the server.  As such, if we store a non-ASCII
> character in a JSON string, it's valid JSON as far as the user is
> concerned, so long as that character exists in the Unicode standard.
> If his client encoding is UTF8, the value will be letter-perfect JSON
> when it gets to him; and if his client encoding is not UTF8, then he's
> already pretty much decided that he doesn't give a fig about the
> Unicode-centricity of the JSON spec, no?

Also agreed. Personally, I think it may not have been a great idea to
tie the JSON spec so closely to Unicode, but I understand that it
would have been difficult to define an encoding-agnostic equivalent of
\uXXXX, since it's hard to know for sure whether an arbitrary encoding
even has a (sensible?) definition of code points, and they probably
wanted to avoid ambiguity. But, it's bound to cause problems for any
system that runs in some other encoding, which, when so requested, we
do. Even if we had the ability to support multiple encodings in the
same database, I'm not sure I'd be very excited about insisting that
JSON data always be stored in UTF-8, because that would introduce a
lot of unnecessary transcoding for people using other encodings and
basically unnecessarily handicap the functionality provided by the
datatype. But at least if we had that, people would have the *option*
to use JSON with UTF-8 and get the fully spec-compliant behavior. As
it is, they don't; the system we have forces the database encoding on
all datatypes whether they like it or not, and that ain't changing for
9.2.

> So I'm with Robert: we should just plain not care.  I would further
> suggest that maybe what we should do with incoming JSON escape sequences
> is convert them to Unicode code points and then to the equivalent
> character in the database encoding (or throw error if there is none).

The code I've written so far does no canonicalization of the input
value of any kind, just as we do for XML. I'm inclined to leave it
that way. Eventually, we might want to make the JSON datatype support
equality comparisons and so on, and that will require the system to
know that the letter r can be encoded as some \uXXXX sequence and
that the escape \r is equivalent to some other escape \uXXXX, but
right now all the code does is try to validate that the JSON is legal,
NOT second-guess the user's choice about how to spell things or where
to insert whitespace. I think that's a good choice because (1) AFAIK,
there's no official canonicalization method for JSON, so whatever we
pick will be something we think is best, not an official method
sanctioned by the spec, (2) users might prefer the way they chose to
represent a given value over the way we choose to represent it, and
(3) by simply validating and storing the JSON object, rather than
doing any canonicalization, the input function avoids the need to do
any data copying, hopefully maximizing speed. Canonicalization can be
added on top of what I've done here and people who want or need it can
use it; I have some ideas around how to make that leverage the
existing code that I intend to pursue for 9.3, but right now I'd
rather not go there.

So, given that framework, what the patch does is this: if you're using
UTF-8, then \uXXXX is accepted, provided that XXXX is something that
equates to a legal Unicode code point. It isn't converted to the
corresponding character: it's just validated. If you're NOT using
UTF-8, then it allows \uXXXX for code points up through 127 (which we
assume are the same in all encodings) and anything higher than that is
rejected. If someone knows an easy way to check whether a \uXXXX
sequence for XXXX > 007F is a legal Unicode code point that has an
equivalent in the current server encoding, then we can add logic to
allow that case also, but personally I'm not that excited about it.
Anyone who is using \uXXXX escapes with a non-Unicode encoding is
probably hoping that we can store arbitrary code points, not just the
ones that happen to exist in the server encoding, so they're probably
going to be disappointed whether or not we spend time eating away at
the edges of the problem.
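
To spell that out with a couple of examples (a sketch of the behavior
described above, not output taken from the patch):

   -- with a UTF-8 database encoding:
   select '"\u00e9"'::json;   -- accepted: 00E9 is a legal code point;
                              -- validated only, stored verbatim
   -- with any non-UTF-8 database encoding:
   select '"\u0041"'::json;   -- accepted: 0041 is in the ASCII range
   select '"\u00e9"'::json;   -- rejected: 00E9 is above 007F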

It's awfully interesting to think about how we could actually make
cross-encoding stuff work for real. Would we make it a separate
property that can be associated with each column, like we did for
collations? Or would we handle it more like range types - give me an
encoding, and I'll give you a new type OID that represents text stored
in that encoding? But I guess that's a question for another day.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 15:27:24
Message-ID: 9459.1327073244@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> The code I've written so far does no canonicalization of the input
> value of any kind, just as we do for XML.

Fair enough.

> So, given that framework, what the patch does is this: if you're using
> UTF-8, then \uXXXX is accepted, provided that XXXX is something that
> equates to a legal Unicode code point. It isn't converted to the
> corresponding character: it's just validated. If you're NOT using
> UTF-8, then it allows \uXXXX for code points up through 127 (which we
> assume are the same in all encodings) and anything higher than that is
> rejected.

This seems a bit silly. If you're going to leave the escape sequence as
ASCII, then why not just validate that it names a legal Unicode code
point and be done? There is no reason whatever that that behavior needs
to depend on the database encoding.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 15:31:26
Message-ID: CA+TgmoadO1egz3sq3YPqRmoHYbzhxwmyDif4f8MW5EZ6pttCqw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 20, 2012 at 10:27 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> The code I've written so far does no canonicalization of the input
>> value of any kind, just as we do for XML.
>
> Fair enough.
>
>> So, given that framework, what the patch does is this: if you're using
>> UTF-8, then \uXXXX is accepted, provided that XXXX is something that
>> equates to a legal Unicode code point.  It isn't converted to the
>> corresponding character: it's just validated.  If you're NOT using
>> UTF-8, then it allows \uXXXX for code points up through 127 (which we
>> assume are the same in all encodings) and anything higher than that is
>> rejected.
>
> This seems a bit silly.  If you're going to leave the escape sequence as
> ASCII, then why not just validate that it names a legal Unicode code
> point and be done?  There is no reason whatever that that behavior needs
> to depend on the database encoding.

Mostly because that would prevent us from adding canonicalization in
the future, AFAICS, and I don't want to back myself into a corner.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 15:45:11
Message-ID: 4F198C07.5060605@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/20/2012 09:19 AM, Robert Haas wrote:
> On Fri, Jan 20, 2012 at 12:07 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Andrew Dunstan<andrew(at)dunslane(dot)net> writes:
>>> On 01/19/2012 04:12 PM, Robert Haas wrote:
>>>> On Thu, Jan 19, 2012 at 4:07 PM, Andrew Dunstan<andrew(at)dunslane(dot)net> wrote:
>>>>> The spec only allows unescaped Unicode chars (and for our purposes that
>>>>> means UTF8). An unescaped non-ASCII character in, say, ISO-8859-1 will
>>>>> result in something that's not legal JSON.
>>>> I understand. I'm proposing that we not care. In other words, if the
>>>> server encoding is UTF-8, it'll really be JSON. But if the server
>>>> encoding is something else, it'll be almost-JSON.
>>> Of course, for data going to the client, if the client encoding is UTF8,
>>> they should get legal JSON, regardless of what the database encoding is,
>>> and conversely too, no?
>> Yes. I think this argument has been mostly theologizing, along the
>> lines of how many JSON characters can dance on the head of a pin.
>> From a user's perspective, the database encoding is only a constraint on
>> which characters he can store.
> Bingo.
>
>> He does not know or care what the bit
>> representation is inside the server. As such, if we store a non-ASCII
>> character in a JSON string, it's valid JSON as far as the user is
>> concerned, so long as that character exists in the Unicode standard.
>> If his client encoding is UTF8, the value will be letter-perfect JSON
>> when it gets to him; and if his client encoding is not UTF8, then he's
>> already pretty much decided that he doesn't give a fig about the
>> Unicode-centricity of the JSON spec, no?
> Also agreed. Personally, I think it may not have been a great idea to
> tie the JSON spec so closely to Unicode, but I understand that it
> would have been difficult to define an encoding-agnostic equivalent of
> \uXXXX, since it's hard to know for sure whether an arbitrary encoding
> even has a (sensible?) definition of code points, and they probably
> wanted to avoid ambiguity. But, it's bound to cause problems for any
> system that runs in some other encoding, which, when so requested, we
> do. Even if we had the ability to support multiple encodings in the
> same database, I'm not sure I'd be very excited about insisting that
> JSON data always be stored in UTF-8, because that would introduce a
> lot of unnecessary transcoding for people using other encodings and
> basically unnecessarily handicap the functionality provided by the
> datatype. But at least if we had that, people would have the *option*
> to use JSON with UTF-8 and get the fully spec-compliant behavior. As
> it is, they don't; the system we have forces the database encoding on
> all datatypes whether they like it or not, and that ain't changing for
> 9.2.
>
>> So I'm with Robert: we should just plain not care. I would further
>> suggest that maybe what we should do with incoming JSON escape sequences
>> is convert them to Unicode code points and then to the equivalent
>> character in the database encoding (or throw error if there is none).
> The code I've written so far does no canonicalization of the input
> value of any kind, just as we do for XML. I'm inclined to leave it
> that way. Eventually, we might want to make the JSON datatype support
> equality comparisons and so on, and that will require the system to
> know that the letter r can be encoded as some \uXXXX sequence and
> that the escape \r is equivalent to some other escape \uXXXX, but
> right now all the code does is try to validate that the JSON is legal,
> NOT second-guess the user's choice about how to spell things or where
> to insert whitespace. I think that's a good choice because (1) AFAIK,
> there's no official canonicalization method for JSON, so whatever we
> pick will be something we think is best, not an official method
> sanctioned by the spec, (2) users might prefer the way they chose to
> represent a given value over the way we choose to represent it, and
> (3) by simply validating and storing the JSON object, rather than
> doing any canonicalization, the input function avoids the need to do
> any data copying, hopefully maximizing speed. Canonicalization can be
> added on top of what I've done here and people who want or need it can
> use it; I have some ideas around how to make that leverage the
> existing code that I intend to pursue for 9.3, but right now I'd
> rather not go there.
>
> So, given that framework, what the patch does is this: if you're using
> UTF-8, then \uXXXX is accepted, provided that XXXX is something that
> equates to a legal Unicode code point. It isn't converted to the
> corresponding character: it's just validated. If you're NOT using
> UTF-8, then it allows \uXXXX for code points up through 127 (which we
> assume are the same in all encodings) and anything higher than that is
> rejected. If someone knows an easy way to check whether a \uXXXX
> sequence for XXXX > 007F is a legal Unicode code point that has an
> equivalent in the current server encoding, then we can add logic to
> allow that case also, but personally I'm not that excited about it.
> Anyone who is using \uXXXX escapes with a non-Unicode encoding is
> probably hoping that we can store arbitrary code points, not just the
> ones that happen to exist in the server encoding, so they're probably
> going to be disappointed whether or not we spend time eating away at
> the edges of the problem.
>
> It's awfully interesting to think about how we could actually make
> cross-encoding stuff work for real. Would we make it a separate
> property that can be associated with each column, like we did for
> collations? Or would we handle it more like range types - give me an
> encoding, and I'll give you a new type OID that represents text stored
> in that encoding? But I guess that's a question for another day.

(thinks out loud)

XML's &#nnnn; escape mechanism is more or less the equivalent of JSON's
\unnnn. But XML documents can be encoded in a variety of encodings,
including non-unicode encodings such as Latin-1. However, no matter what
the document encoding, &#nnnn; designates the character with Unicode
code point nnnn, whether or not that is part of the document encoding's
charset.
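
For illustration (a sketch, assuming a --with-libxml build): the
reference below names U+00E9, is accepted no matter what the server
encoding is, and is stored verbatim:

    SELECT xmlparse(content 'caf&#233;');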

Given that precedent, I'm wondering if we do need to enforce anything
other than that it is a valid unicode code point.

Equivalence comparison is going to be difficult anyway if you're not
resolving all \unnnn escapes. Possibly we need some sort of
canonicalization function to apply for comparison purposes. But we're
not providing any comparison ops today anyway, so I don't think we need
to make that decision now. As you say, there doesn't seem to be any
defined canonical form - the spec is a bit light in this respect.
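
To make that concrete: under the validate-and-store-verbatim behavior
described above, both spellings below are legal and are kept exactly as
entered, so any naive textual comparison would call them unequal:

    SELECT '"\u0072"'::json AS escaped, '"r"'::json AS literal;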

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 16:58:07
Message-ID: CA+TgmoZksnjJTN4ejqPXOvZE5hWDEfj5AqTH=yzZYz4PhczL9Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 20, 2012 at 10:45 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> XML's &#nnnn; escape mechanism is more or less the equivalent of JSON's
> \unnnn. But XML documents can be encoded in a variety of encodings,
> including non-unicode encodings such as Latin-1. However, no matter what the
> document encoding, &#nnnn; designates the character with Unicode code point
> nnnn, whether or not that is part of the document encoding's charset.

OK.

> Given that precedent, I'm wondering if we do need to enforce anything other
> than that it is a valid unicode code point.
>
> Equivalence comparison is going to be difficult anyway if you're not
> resolving all \unnnn escapes. Possibly we need some sort of canonicalization
> function to apply for comparison purposes. But we're not providing any
> comparison ops today anyway, so I don't think we need to make that decision
> now. As you say, there doesn't seem to be any defined canonical form - the
> spec is a bit light in this respect.

Well, we clearly have to resolve all \uXXXX to do either comparison or
canonicalization. The current patch does neither, but presumably we
want to leave the door open to such things. If we're using UTF-8 and
comparing two strings, and we get to a position where one of them has
a character and the other has \uXXXX, it's pretty simple to do the
comparison: we just turn XXXX into a wchar_t and test for equality.
That should be trivial, unless I'm misunderstanding. If, however,
we're not using UTF-8, we have to first turn \uXXXX into a Unicode
> code point, then convert that to a character in the database encoding,
and then test for equality with the other character after that. I'm
not sure whether that's possible in general, how to do it, or how
efficient it is. Can you or anyone shed any light on that topic?
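
For concreteness, this is the sort of pair such a comparison would have
to treat as equal in the UTF-8 case (no equality operator exists yet;
the casts below just echo the stored text):

    SELECT '"\u00e9"'::json::text AS escaped,
           '"é"'::json::text AS literal;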

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 17:12:13
Message-ID: AEDD8804-CC75-49FF-97DE-1897CFB4A101@kineticode.com
Lists: pgsql-hackers

On Jan 19, 2012, at 9:07 PM, Tom Lane wrote:

> If his client encoding is UTF8, the value will be letter-perfect JSON
> when it gets to him; and if his client encoding is not UTF8, then he's
> already pretty much decided that he doesn't give a fig about the
> Unicode-centricity of the JSON spec, no?

Don’t entirely agree with this. Some folks are stuck with other encodings and cannot change them for one reason or another. That said, they can convert JSON from their required encoding into UTF-8 on the client side, so there is a workaround.

Not that this changes anything, and I agree with the overall direction of the discussion here. I just want to make sure we keep in mind folks who don’t necessarily have the freedom to switch to UTF-8. (And I say this as someone who *always* uses UTF-8!)

David


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 17:14:35
Message-ID: 21D78B15-863A-4E27-A64D-AEAF62A4C98B@kineticode.com
Lists: pgsql-hackers

On Jan 20, 2012, at 8:58 AM, Robert Haas wrote:

> If, however,
> we're not using UTF-8, we have to first turn \uXXXX into a Unicode
> code point, then convert that to a character in the database encoding,
> and then test for equality with the other character after that. I'm
> not sure whether that's possible in general, how to do it, or how
> efficient it is. Can you or anyone shed any light on that topic?

If it’s like the XML example, it should always represent a Unicode code point, and *not* be converted to the other character set, no?

At any rate, since the JSON standard requires UTF-8, such distinctions having to do with alternate encodings are not likely to be covered, so I suspect we can do whatever we want here. It’s outside the spec.

Best,

David


From: Garick Hamlin <ghamlin(at)isc(dot)upenn(dot)edu>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 17:32:45
Message-ID: 20120120173245.GA12828@isc.upenn.edu
Lists: pgsql-hackers

On Fri, Jan 20, 2012 at 09:12:13AM -0800, David E. Wheeler wrote:
> On Jan 19, 2012, at 9:07 PM, Tom Lane wrote:
>
> > If his client encoding is UTF8, the value will be letter-perfect JSON
> > when it gets to him; and if his client encoding is not UTF8, then he's
> > already pretty much decided that he doesn't give a fig about the
> > Unicode-centricity of the JSON spec, no?
>
> Don’t entirely agree with this. Some folks are stuck with other encodings and
> cannot change them for one reason or another. That said, they can convert
> JSON from their required encoding into UTF-8 on the client side, so there is
> a workaround.

Perhaps in addition to trying to just 'do the right thing by default',
it makes sense to have two canonicalization functions?

Say: json_utf8() and json_ascii().

They could give the same output no matter what encoding was set?

json_utf8() would give nice output, with characters canonicalized to
native UTF-8, and json_ascii() would output only non-control ASCII
characters literally and escape everything else, or something like
that?
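
Roughly like this (both functions are purely hypothetical; the comments
just sketch the intended behavior):

    -- json_utf8('"caf\u00e9"')  would yield  '"café"'
    -- json_ascii('"café"')      would yield  '"caf\u00e9"'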

Garick


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 17:34:44
Message-ID: 4F19A5B4.2020902@dunslane.net
Lists: pgsql-hackers

On 01/20/2012 11:58 AM, Robert Haas wrote:
> On Fri, Jan 20, 2012 at 10:45 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> XML's &#nnnn; escape mechanism is more or less the equivalent of JSON's
>> \unnnn. But XML documents can be encoded in a variety of encodings,
>> including non-unicode encodings such as Latin-1. However, no matter what the
>> document encoding, &#nnnn; designates the character with Unicode code point
>> nnnn, whether or not that is part of the document encoding's charset.
> OK.
>
>> Given that precedent, I'm wondering if we do need to enforce anything other
>> than that it is a valid unicode code point.
>>
>> Equivalence comparison is going to be difficult anyway if you're not
>> resolving all \unnnn escapes. Possibly we need some sort of canonicalization
>> function to apply for comparison purposes. But we're not providing any
>> comparison ops today anyway, so I don't think we need to make that decision
>> now. As you say, there doesn't seem to be any defined canonical form - the
>> spec is a bit light in this respect.
> Well, we clearly have to resolve all \uXXXX to do either comparison or
> canonicalization. The current patch does neither, but presumably we
> want to leave the door open to such things. If we're using UTF-8 and
> comparing two strings, and we get to a position where one of them has
> a character and the other has \uXXXX, it's pretty simple to do the
> comparison: we just turn XXXX into a wchar_t and test for equality.
> That should be trivial, unless I'm misunderstanding. If, however,
> we're not using UTF-8, we have to first turn \uXXXX into a Unicode
> code point, then convert that to a character in the database encoding,
> and then test for equality with the other character after that. I'm
> not sure whether that's possible in general, how to do it, or how
> efficient it is. Can you or anyone shed any light on that topic?

We know perfectly well how to turn two strings from encoding x to utf8
(see mbutils.c::pg_do_encoding_conversion()). Once we've done that,
ISTM we have reduced this to the previous problem, as the mathematicians
like to say.
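
The same machinery is already exposed at the SQL level, e.g.:

    SELECT convert_to('é', 'LATIN1');              -- bytea \xe9
    SELECT convert_from('\xe9'::bytea, 'LATIN1');  -- text in the db encoding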

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 18:08:03
Message-ID: CA+Tgmoa9GLx06S5KiG7YgF_T+3QkY+Dfq9RF2g79STM=LEn1_A@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 20, 2012 at 12:14 PM, David E. Wheeler <david(at)kineticode(dot)com> wrote:
> On Jan 20, 2012, at 8:58 AM, Robert Haas wrote:
>
>> If, however,
>> we're not using UTF-8, we have to first turn \uXXXX into a Unicode
>> code point, then convert that to a character in the database encoding,
>> and then test for equality with the other character after that.  I'm
>> not sure whether that's possible in general, how to do it, or how
>> efficient it is.  Can you or anyone shed any light on that topic?
>
> If it’s like the XML example, it should always represent a Unicode code point, and *not* be converted to the other character set, no?

Well, you can pick which way you want to do the conversion. If the
database encoding is SJIS, and there's an SJIS character in a string
that gets passed to json_in(), and there's another string which also
gets passed to json_in() which contains \uXXXX, then any sort of
canonicalization or equality testing is going to need to convert the
SJIS character to a Unicode code point, or the Unicode code point to
an SJIS character, to see whether they match.

Err, actually, now that I think about it, that might be a problem:
what happens if we're trying to test two characters for equality and
the encoding conversion fails? We really just want to return false -
the strings are clearly not equal if either contains even one
character that can't be converted to the other encoding - so it's not
good if an error gets thrown in there anywhere.
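
The SQL-level conversion functions show the failure mode today (example
assumes a UTF-8 database; error wording approximate):

    SELECT convert_to('あ', 'LATIN1');
    -- ERROR: character with byte sequence 0xe3 0x81 0x82 in encoding
    -- "UTF8" has no equivalent in encoding "LATIN1"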

> At any rate, since the JSON standard requires UTF-8, such distinctions having to do with alternate encodings are not likely to be covered, so I suspect we can do whatever we want here. It’s outside the spec.

I agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 19:21:16
Message-ID: 1559.1327087276@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Err, actually, now that I think about it, that might be a problem:
> what happens if we're trying to test two characters for equality and
> the encoding conversion fails?

This is surely all entirely doable given the encoding infrastructure
we already have. We might need some minor refactoring, eg to have
a way of not throwing an error, but it's not going to be that hard
to achieve if somebody wants to do it. So I still see little reason
for making the JSON type behave visibly differently in non-UTF8 database
encodings.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-20 20:26:27
Message-ID: CA+TgmoZCf5HTqtZKvv__+190DKAyObZPdTc6ym=FvLetXEzmUg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 20, 2012 at 2:21 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Err, actually, now that I think about it, that might be a problem:
>> what happens if we're trying to test two characters for equality and
>> the encoding conversion fails?
>
> This is surely all entirely doable given the encoding infrastructure
> we already have.  We might need some minor refactoring, eg to have
> a way of not throwing an error, but it's not going to be that hard
> to achieve if somebody wants to do it.  So I still see little reason
> for making the JSON type behave visibly differently in non-UTF8 database
> encodings.

OK. It feels a little grotty to me, but I'll go with the flow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-22 04:40:30
Message-ID: CAMkU=1x-9FiYPjae=Ae=-DiZcw2wPvf5Nk8ROu93XOomU+=y2g@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 15, 2012 at 8:08 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 01/14/2012 03:06 PM, Andrew Dunstan wrote:
>>
>>
>>
>>
>> OK, here's a patch that does both query_to_json and array_to_json, along
>> with docs and regression tests. It includes Robert's original patch, although
>> I can produce a differential patch if required. It can also be pulled from
>> <https://bitbucket.org/adunstan/pgdevel>
>>
>>
>
>
> Here's an update that adds row_to_json, plus a bit more cleanup.

This is bit-rotted such that initdb fails

creating template1 database in
/tmp/bar/src/test/regress/./tmp_check/data/base/1 ... FATAL: could
not create unique index "pg_proc_oid_index"
DETAIL: Key (oid)=(3145) is duplicated.

I bumped up those oids in the patch, and it passes make check once I
figure out how to get the test run under UTF-8. Is it supposed to
pass under other encodings? I can't tell from the rest of the thread
whether it is supposed to pass in other encodings or not.

Cheers,

Jeff


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-22 09:28:25
Message-ID: 4F1BD6B9.8010906@dunslane.net
Lists: pgsql-hackers

On 01/21/2012 11:40 PM, Jeff Janes wrote:
> On Sun, Jan 15, 2012 at 8:08 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>
>> On 01/14/2012 03:06 PM, Andrew Dunstan wrote:
>>>
>>>
>>>
>>> OK, here's a patch that does both query_to_json and array_to_json, along
>>> with docs and regression tests. It includes Robert's original patch, although
>>> I can produce a differential patch if required. It can also be pulled from
>>> <https://bitbucket.org/adunstan/pgdevel>
>>>
>>>
>>
>> Here's an update that adds row_to_json, plus a bit more cleanup.
> This is bit-rotted such that initdb fails
>
> creating template1 database in
> /tmp/bar/src/test/regress/./tmp_check/data/base/1 ... FATAL: could
> not create unique index "pg_proc_oid_index"
> DETAIL: Key (oid)=(3145) is duplicated.
>
> I bumped up those oids in the patch, and it passes make check once I
> figure out how to get the test run under UTF-8. Is it supposed to
> pass under other encodings? I can't tell from the rest of the thread
> whether it is supposed to pass in other encodings or not.
>

Yeah, regression tests generally are supposed to run in all encodings.
Either we could knock out the offending test, or we could supply an
alternative result file. If we do the latter, maybe we should modify the
query slightly, so it reads

SELECT getdatabaseencoding() = 'UTF8' as is_utf8, '"\uaBcD"'::json;

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-22 16:43:34
Message-ID: 4F1C3CB6.6090104@dunslane.net
Lists: pgsql-hackers

On 01/22/2012 04:28 AM, Andrew Dunstan wrote:
>
>
> On 01/21/2012 11:40 PM, Jeff Janes wrote:
>> On Sun, Jan 15, 2012 at 8:08 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>> wrote:
>>>
>>> On 01/14/2012 03:06 PM, Andrew Dunstan wrote:
>>>>
>>>>
>>>>
>>>> OK, here's a patch that does both query_to_json and array_to_json,
>>>> along
>>>> with docs and regression tests. It includes Robert's original patch,
>>>> although
>>>> I can produce a differential patch if required. It can also be
>>>> pulled from
>>>> <https://bitbucket.org/adunstan/pgdevel>
>>>>
>>>>
>>>
>>> Here's an update that adds row_to_json, plus a bit more cleanup.
>> This is bit-rotted such that initdb fails
>>
>> creating template1 database in
>> /tmp/bar/src/test/regress/./tmp_check/data/base/1 ... FATAL: could
>> not create unique index "pg_proc_oid_index"
>> DETAIL: Key (oid)=(3145) is duplicated.
>>
>> I bumped up those oids in the patch, and it passes make check once I
>> figure out how to get the test run under UTF-8. Is it supposed to
>> pass under other encodings? I can't tell from the rest of the thread
>> whether it is supposed to pass in other encodings or not.
>>
>
> Yeah, regression tests generally are supposed to run in all encodings.
> Either we could knock out the offending test, or we could supply an
> alternative result file. If we do the latter, maybe we should modify
> the query slightly, so it reads
>
> SELECT getdatabaseencoding() = 'UTF8' as is_utf8, '"\uaBcD"'::json;
>
>

Actually, given recent discussion I think that test should just be
removed from json.c. We don't actually have any test that the code point
is valid (e.g. that it doesn't refer to an unallocated code point). We
don't do that elsewhere either - the unicode_to_utf8() function the
scanner uses to turn \unnnn escapes into utf8 doesn't look for
unallocated code points. I'm not sure how much other validation we
should do - for example on correct use of surrogate pairs. I'd rather
get this as right as possible now - every time we tighten encoding rules
to make sure incorrectly encoded data doesn't get into the database it
causes someone real pain.

cheers

andrew


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-23 20:20:03
Message-ID: 1327350003.12714.0.camel@vanquo.pezone.net
Lists: pgsql-hackers

On sön, 2012-01-22 at 11:43 -0500, Andrew Dunstan wrote:
> Actually, given recent discussion I think that test should just be
> removed from json.c. We don't actually have any test that the code
> point is valid (e.g. that it doesn't refer to an unallocated code
> point). We don't do that elsewhere either - the unicode_to_utf8()
> function the scanner uses to turn \unnnn escapes into utf8 doesn't
> look for unallocated code points. I'm not sure how much other
> validation we should do - for example on correct use of surrogate
> pairs.

We do check the correctness of surrogate pairs elsewhere. Search for
"surrogate" in scan.l; should be easy to copy.


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-23 22:21:18
Message-ID: CAHyXU0yUxxN+HyYmiRosj6=EBsa6p+DPHsdSEPYGCKeGRSEn7w@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jan 15, 2012 at 10:08 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Here's an update that adds row_to_json, plus a bit more cleanup.

why not call all these functions 'to_json' and overload them?

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-23 22:32:24
Message-ID: 4F1DDFF8.1000601@dunslane.net
Lists: pgsql-hackers

On 01/23/2012 05:21 PM, Merlin Moncure wrote:
> On Sun, Jan 15, 2012 at 10:08 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Here's an update that adds row_to_json, plus a bit more cleanup.
> why not call all these functions 'to_json' and overload them?
>

I don't honestly feel that advances clarity much. And we might want to
overload each at some stage with options that are specific to the datum
type. We have various foo_to_xml() functions now.

cheers

andrew


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-23 22:33:21
Message-ID: CAFj8pRAaz9PpHPcGT5DsEBoi1SPt9L7-VdZPGBgRDbonr-QNbQ@mail.gmail.com
Lists: pgsql-hackers

2012/1/23 Merlin Moncure <mmoncure(at)gmail(dot)com>:
> On Sun, Jan 15, 2012 at 10:08 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Here's an update that adds row_to_json, plus a bit more cleanup.
>
> why not call all these functions 'to_json' and overload them?

-1

older proposal is more consistent with xml functions

Pavel

>
> merlin


From: Abhijit Menon-Sen <ams(at)toroid(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-01-27 04:17:05
Message-ID: 20120127041705.GA20868@toroid.org
Lists: pgsql-hackers

At 2012-01-15 11:08:05 -0500, andrew(at)dunslane(dot)net wrote:
>
> Here's an update that adds row_to_json, plus a bit more cleanup.

I've started reviewing this patch, but it'll take me a bit longer to go
through json.c properly. Here are a few preliminary notes:

1. The patch has a lot of whitespace errors (primarily lines ending in
whitespace), but applying with git apply --whitespace=fix isn't safe,
because the test results need some (but not all) of those spaces. I
applied the patch, backed out the changes to expected/json.out, and
created the file from the patch, then removed the superfluous
whitespace.

2. I bumped some function OIDs to avoid conflicts.

3. One documentation typo.

Everything other than json.c (which I haven't read yet) looks fine
(builds, passes tests). I've attached a patch covering the changes
I made.

More later.

-- ams

Attachment: andrew-json-tweaks.diff (text/x-diff, 4.7 KB)

From: Abhijit Menon-Sen <ams(at)toroid(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-01-30 14:54:13
Message-ID: 20120130145413.GA22873@toroid.org
Lists: pgsql-hackers

At 2012-01-27 09:47:05 +0530, ams(at)toroid(dot)org wrote:
>
> I've started reviewing this patch, but it'll take me a bit longer to go
> through json.c properly.

OK, I finished reading json.c. I don't have an answer to the detoasting
question in the XXX comment, but the code looks fine.

Aside: is query_to_json really necessary? It seems rather ugly and
easily avoidable using row_to_json.

-- ams


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Abhijit Menon-Sen <ams(at)toroid(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-01-30 15:37:06
Message-ID: 4F26B922.7000000@dunslane.net
Lists: pgsql-hackers

On 01/30/2012 09:54 AM, Abhijit Menon-Sen wrote:
> At 2012-01-27 09:47:05 +0530, ams(at)toroid(dot)org wrote:
>> I've started reviewing this patch, but it'll take me a bit longer to go
>> through json.c properly.
> OK, I finished reading json.c. I don't have an answer to the detoasting
> question in the XXX comment, but the code looks fine.

Looking at somewhat analogous code in xml.c, it doesn't seem to be done
there. So maybe we don't need to worry about it.

>
> Aside: is query_to_json really necessary? It seems rather ugly and
> easily avoidable using row_to_json.
>

I started with this, again by analogy with query_to_xml(). But I agree
it's a bit ugly. If we're not going to do it, then we definitely need to
look at caching the output funcs in the function info. A closer
approximation is actually:

SELECT array_to_json(array_agg(q))
FROM ( your query here ) q;

But then I'd want the ability to break that up a bit with line feeds, so
we'd need to adjust the interface slightly. (Hint: don't try the above
with "select * from pg_class".)

I'll wait on further comments, but I can probably turn these changes
around very quickly once we're agreed.

Cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Abhijit Menon-Sen <ams(at)toroid(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 14:35:03
Message-ID: CAHyXU0ws+7NYZJ3rFvvO39jiBUL-=S+Mvpsqg_g3bDz0LxA71A@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 30, 2012 at 9:37 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On 01/30/2012 09:54 AM, Abhijit Menon-Sen wrote:
>>
>> At 2012-01-27 09:47:05 +0530, ams(at)toroid(dot)org wrote:
>>>
>>> I've started reviewing this patch, but it'll take me a bit longer to go
>>> through json.c properly.
>>
>> OK, I finished reading json.c. I don't have an answer to the detoasting
>> question in the XXX comment, but the code looks fine.
>
>
>
> Looking at somewhat analogous code in xml.c, it doesn't seem to be done
> there. So maybe we don't need to worry about it.
>
>
>>
>> Aside: is query_to_json really necessary? It seems rather ugly and
>> easily avoidable using row_to_json.
>>
>
> I started with this, again by analogy with query_to_xml(). But I agree it's
> a bit ugly. If we're not going to do it, then we definitely need to look at
> caching the output funcs in the function info. A closer approximation is
> actually:
>
>   SELECT array_to_json(array_agg(q))
>   FROM ( your query here ) q;

yup -- although I'd probably write it like this most of the time:
select array_to_json(array(
<query>
));

if we did have a 'query_to_json', the array() constructor would be a
lot more pleasant to deal with than a textual query for obvious
reasons even though it's highly irregular syntax. however, since
arrays can already handle it, I wouldn't miss it at all.

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 17:04:31
Message-ID: CA+TgmoYg_SdB70gxx2vFW3z+oB8K7aU8XnQwp+sB0_H7c2FehQ@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 23, 2012 at 3:20 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On sön, 2012-01-22 at 11:43 -0500, Andrew Dunstan wrote:
>> Actually, given recent discussion I think that test should just be
>> removed from json.c. We don't actually have any test that the code
>> point is valid (e.g. that it doesn't refer to an unallocated code
>> point). We don't do that elsewhere either - the unicode_to_utf8()
>> function the scanner uses to turn \unnnn escapes into utf8 doesn't
>> look for unallocated code points. I'm not sure how much other
>> validation we should do - for example on correct use of surrogate
>> pairs.
>
> We do check the correctness of surrogate pairs elsewhere.  Search for
> "surrogate" in scan.l; should be easy to copy.

I've committed a version of this that does NOT do surrogate pair
validation. Per discussion elsewhere, I also removed the check for
\uXXXX with XXXX > 007F and database encoding != UTF8. This will
complicate things somewhat when we get around to doing
canonicalization and comparison, but Tom seems confident that those
issues are manageable. I did not commit Andrew's further changes,
either; I'm assuming he'll do that himself.
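
So, as committed, something like this now validates regardless of the
server encoding, and is stored verbatim:

    SELECT '"\u266b"'::json;  -- accepted even in, say, a LATIN1 database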

With respect to the issue of whether we ought to check surrogate
pairs, the JSON spec is not a whole lot of help. RFC4627 says:

To escape an extended character that is not in the Basic Multilingual
Plane, the character is represented as a twelve-character sequence,
encoding the UTF-16 surrogate pair. So, for example, a string
containing only the G clef character (U+1D11E) may be represented as
"\uD834\uDD1E".

That fails to answer the question of what we ought to do if we get an
invalid sequence there. You could make an argument that we ought to
just allow it; it doesn't particularly hinder our ability to
canonicalize or compare strings, because our notion of sort-ordering
for characters that may span multiple encodings is going to be pretty
funky anyway. We can just leave those bits as \uXXXX sequences and
call it good. However, it would hinder our ability to convert a JSON
string to a string in the database encoding: we could find an
invalid surrogate pair that was allowable as JSON but
unrepresentable in the database encoding. On the flip side, given our
decision to allow all \uXXXX sequences even when not using UTF-8, we
could also run across a perfectly valid \uXXXX sequence that's not
representable as a character in the server encoding, so it seems we
have that problem anyway, so maybe it's not much worse to have two
reasons why it can happen rather than one. On the third hand, most
people are probably using UTF-8, and those people aren't going to have
any transcoding issues, so the invalid surrogate pair case may be the
only one they can hit (unless invalid code points are also an issue?),
so maybe it's worth avoiding on that basis.

Anyway, I defer to the wisdom of the collective on this one: how
should we handle this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Abhijit Menon-Sen <ams(at)toroid(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 17:58:08
Message-ID: 4F282BB0.4060309@dunslane.net
Lists: pgsql-hackers

On 01/30/2012 10:37 AM, Andrew Dunstan wrote:
>
>
>> Aside: is query_to_json really necessary? It seems rather ugly and
>> easily avoidable using row_to_json.
>>
>
> I started with this, again by analogy with query_to_xml(). But I agree
> it's a bit ugly. If we're not going to do it, then we definitely need
> to look at caching the output funcs in the function info. A closer
> approximation is actually:
>
> SELECT array_to_json(array_agg(q))
> FROM ( your query here ) q;
>
>
> But then I'd want the ability to break that up a bit with line feeds,
> so we'd need to adjust the interface slightly. (Hint: don't try the
> above with "select * from pg_class".)
>
>
> I'll wait on further comments, but I can probably turn these changes
> around very quickly once we're agreed.
>
>
>

based on Abhijit's feeling and some discussion offline, the consensus
seems to be to remove query_to_json.

cheers

andrew


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 18:15:27
Message-ID: 4F282FBF.7090100@agliodbs.com
Lists: pgsql-hackers

Andrew,

> based on Abhijit's feeling and some discussion offline, the consensus
> seems to be to remove query_to_json.

If we do that, what would getting complete query results back from a
query look like? It's important to make this as simple for developers
as possible.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Abhijit Menon-Sen <ams(at)toroid(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 18:29:59
Message-ID: 20120131182959.GA5577@toroid.org
Lists: pgsql-hackers

At 2012-01-31 12:04:31 -0500, robertmhaas(at)gmail(dot)com wrote:
>
> That fails to answer the question of what we ought to do if we get an
> invalid sequence there.

I think it's best to categorically reject invalid surrogates as early as
possible, considering the number of bugs that are related to them (not
in Postgres, just in general). I can't see anything good coming from
letting them in and leaving them to surprise someone in future.

-- ams


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 18:32:31
Message-ID: CAHyXU0zJG+wDR6BYQfxUoz14NirKaaR2teXQLDwxZSE7G+tTXA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 31, 2012 at 12:15 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Andrew,
>
>> based on Abhijit's feeling and some discussion offline, the consensus
>> seems to be to remove query_to_json.
>
> If we do that, what would getting complete query results back from a
> query look like?  It's important to make this as simple for developers
> as possible.

two options:
1. row_to_json(rowvar)
SELECT row_to_json(foo) from foo;
SELECT row_to_json(row(a,b,c)) from foo;

2. array_to_json(array_agg()/array())
SELECT array_to_json(array(select foo from foo));
SELECT array_to_json(array[1,2,3]);

#1 I expect will be the more used version -- most json-handling
client-side APIs (for example node.js drivers) are optimized for
row-by-row processing, but via #2 you can stuff a whole query into a
single json object if you're so inclined.
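
A quick end-to-end sketch of both (table and data invented):

    CREATE TABLE foo(a int, b text);
    INSERT INTO foo VALUES (1, 'one'), (2, 'two');

    SELECT row_to_json(foo) FROM foo;
    -- {"a":1,"b":"one"}
    -- {"a":2,"b":"two"}

    SELECT array_to_json(array_agg(foo)) FROM foo;
    -- [{"a":1,"b":"one"},{"a":2,"b":"two"}]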

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 18:48:36
Message-ID: 4F283784.8010707@dunslane.net
Lists: pgsql-hackers

On 01/31/2012 01:32 PM, Merlin Moncure wrote:
> On Tue, Jan 31, 2012 at 12:15 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Andrew,
>>
>>> based on Abhijit's feeling and some discussion offline, the consensus
>>> seems to be to remove query_to_json.
>> If we do that, what would getting complete query results back from a
>> query look like? It's important to make this as simple for developers
>> as possible.
> two options:
> 1. row_to_json(rowvar)
> SELECT row_to_json(foo) from foo;
> SELECT row_to_json(row(a,b,c)) from foo;
>
> 2. array_to_json(array_agg()/array())
> SELECT array_to_json(array(select foo from foo));
> SELECT array_to_json(array[1,2,3]);
>
> #1 I expect will be the more used version -- most json-handling
> client-side APIs (for example node.js drivers) are optimized for
> row-by-row processing, but via #2 you can stuff a whole query into a
> single json object if you're so inclined.
>

You could also write a wrapper something like this:

create function query_to_json(qtext text) returns json language
plpgsql as
$$
declare js json;
begin
execute 'select array_to_json(array(' || qtext || '))' into js;
return js;
end;
$$;
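
Usage would then look something like:

    SELECT query_to_json('select a, b from foo');
    -- [{"a":1,"b":"one"},{"a":2,"b":"two"}]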

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 19:49:41
Message-ID: CAHyXU0wK4kObr3GvhAKhhusGQe1=xic67AQhX6zyZ438wMoVRQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 31, 2012 at 12:48 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On 01/31/2012 01:32 PM, Merlin Moncure wrote:
>> On Tue, Jan 31, 2012 at 12:15 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>>
>>> Andrew,
>>>
>>>> based on Abhijit's feeling and some discussion offline, the consensus
>>>> seems to be to remove query_to_json.
>>>
>>> If we do that, what would getting complete query results back from a
>>> query look like?  It's important to make this as simple for developers
>>> as possible.
>>
>> two options:
>> 1. row_to_json(rowvar)
>> SELECT row_to_json(foo) from foo;
>> SELECT row_to_json(row(a,b,c)) from foo;
>>
>> 2. array_to_json(array_agg()/array())
>> SELECT array_to_json(array(select foo from foo));
>> SELECT array_to_json(array[1,2,3]);
>>
>> #1 I expect will be the more used version -- most json-handling
>> client-side APIs (for example node.js drivers) are optimized for
>> row-by-row processing, but via #2 you can stuff a whole query into a
>> single json object if you're so inclined.
>>
>
> You could also write a wrapper something like this:
>
>   create function query_to_json(qtext text) returns json language
>   plpgsql as
>   $$
>   declare js json;
>   begin
>        execute 'select array_to_json(array(' || qtext || '))' into js;
>        return js;
>   end;
>   $$;

right -- then you can leverage execute/using parameterization etc.
and/or rig a variadic version.

The major hole in functionality I see for heavy json users is the
reverse; how do you get json back into the database? With xml, at
least you could (ab)use xpath for that...with json you have to rely on
add-on support and/or ad hoc string parsing (that is, unless I'm
missing something -- I just noted Robert's commit of the JSON type).

since we can do:
select array_to_json(array(select foo from foo));

it seems natural to want to do something like:
WITH foos AS (SELECT a_json_var::foo[] AS f)
INSERT INTO foo SELECT (f).* FROM foos;

Of course, you'd have to have non-anonymous (that is, defined with
CREATE TYPE AS) types defined to receive all the data, but that's not
so bad. Also, could xxx_to_json be hypothetically executed via casts?

e.g. select array(select foo from foo)::json;

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 20:15:42
Message-ID: 4F284BEE.5020903@dunslane.net
Lists: pgsql-hackers

On 01/31/2012 02:49 PM, Merlin Moncure wrote:
> The major hole in functionality I see for heavy json users is the
> reverse; how do you get json back into the database? With xml, at
> least you could (ab)use xpath for that...with json you have to rely on
> add-on support and/or ad hoc string parsing (that is, unless I'm
> missing something -- I just noted Robert's commit of the JSON type).
> since we can do:
> select array_to_json(array(select foo from foo));
> it seems natural to want to do something like:
> WITH foos AS (SELECT a_json_var::foo[] AS f)
> INSERT INTO foo SELECT (f).* FROM foos;
> Of course, you'd have to have non-anonymous (that is, defined
> with CREATE TYPE AS) types defined to receive all the data, but that's
> not so bad. Also, could xxx_to_json be hypothetically executed via
> casts? e.g. select array(select foo from foo)::json;

At some stage there will possibly be some json-processing (as opposed to
json-producing) functions, but not in 9.2 - it's too late for that.
Until then there is PLV8: see
<http://people.planetpostgresql.org/andrew/index.php?/archives/249-Using-PLV8-to-index-JSON.html>
for example. Or someone could write an extension.

cheers

andrew


From: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)toroid(dot)org>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-01-31 20:47:05
Message-ID: CAARyMpAW9N+6_kcob-R=DMP3HyvXXduA3kAecsocLPNZq04CyQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 31, 2012 at 1:29 PM, Abhijit Menon-Sen <ams(at)toroid(dot)org> wrote:
> At 2012-01-31 12:04:31 -0500, robertmhaas(at)gmail(dot)com wrote:
>>
>> That fails to answer the question of what we ought to do if we get an
>> invalid sequence there.
>
> I think it's best to categorically reject invalid surrogates as early as
> possible, considering the number of bugs that are related to them (not
> in Postgres, just in general). I can't see anything good coming from
> letting them in and leaving them to surprise someone in future.
>
> -- ams

+1

Another sequence to beware of is \u0000. While escaped NUL characters
are perfectly valid in JSON, NUL characters aren't allowed in TEXT
values. This means not all JSON strings can be converted to TEXT,
even in UTF-8. This may also complicate collation, if comparison
functions demand null-terminated strings.
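For illustration, the backend already rejects NUL in ordinary string
literals (a quick sketch; exact error text approximate):

select U&'a\0000c';
ERROR:  invalid Unicode escape value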

I'm mostly in favor of allowing \u0000. Banning \u0000 means users
can't use JSON strings to marshal binary blobs, e.g. by escaping
non-printable characters and only using U+0000..U+00FF. Instead, they
have to use base64 or similar.

Banning \u0000 doesn't quite violate the RFC:

An implementation may set limits on the length and character
contents of strings.

-Joey


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-02-01 05:46:55
Message-ID: 4F28D1CF.7040007@dunslane.net
Lists: pgsql-hackers

On 01/31/2012 01:48 PM, Andrew Dunstan wrote:
>
>
> On 01/31/2012 01:32 PM, Merlin Moncure wrote:
>> On Tue, Jan 31, 2012 at 12:15 PM, Josh Berkus<josh(at)agliodbs(dot)com> wrote:
>>> Andrew,
>>>
>>>> based on Abhijit's feeling and some discussion offline, the consensus
>>>> seems to be to remove query_to_json.
>>> If we do that, what would getting complete query results back from a
>>> query look like? It's important to make this as simple for developers
>>> as possible.
>> two options:
>> 1. row_to_json(rowvar)
>> SELECT row_to_json(foo) from foo;
>> SELECT row_to_json(row(a,b,c)) from foo;
>>
>> 2. array_to_json(array_agg()/array())
>> SELECT array_to_json(array(select foo from foo));
>> SELECT array_to_json(array[1,2,3]);
>>
>> #1 I expect will be the more used version -- most json handling client
>> side api (for example node.js drivers) are optimized for row by row
>> processing, but via #2 you can stuff a whole query into single json
>> object if you're so inclined.
>>
>
> You could also write a wrapper something like this:
>
> create function query_to_json(qtext text) returns json language
> plpgsql as
> $$
> declare
>     result json;
> begin
>     -- "return query" is only valid in setof functions, so fetch
>     -- the scalar result with execute ... into instead
>     execute 'select array_to_json(array(' || qtext || '))'
>         into result;
>     return result;
> end;
> $$;

The array(select...) locution turns out to have less flexibility than
the array_agg(record-ref) locution.

Anyway, for those playing along, I have removed query_to_json, and added
optional pretty printing to array_to_json and row_to_json. WIP can be
seen at <https://bitbucket.org/adunstan/pgdevel>. I still have docs and
output function caching to do, and should post a revised patch within
the next 48 hours.

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-02-01 15:49:30
Message-ID: CAHyXU0xiH-1afzY5dT-F83jBhC6D0BjdH+ib2forcA4_+U+NDA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 31, 2012 at 11:46 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> The array(select...) locution turns out to have less flexibility than the
> array_agg(record-ref) locution.

Less flexible maybe, but it can be cleaner for exactly the type of
queries that tend to come up in exactly the kind of functionality
people are looking for with JSON output. libpqtypes does exactly the
same stuff but for C clients -- so I've done tons of this kind of
programming and am maybe a bit ahead of the curve here. Note: while
the following contrived example may seem a bit complex, it has a
certain elegance and shows how the postgres type system can whip out
document-style 'nosql' objects to clients who can handle them.
Perhaps there is more simplification through syntax possible, but as
it stands things are pretty functional. The equivalent production
through array_agg (a sketch of which follows below) I find pretty
awful looking, although it can produce a better plan since it doesn't
force everything through flattened subqueries:

create table foo
(
foo_id serial primary key,
a int
);

create table bar
(
bar_id serial primary key,
foo_id int references foo,
b int
);

create table baz
(
baz_id serial primary key,
bar_id int references bar,
c int
);

create type baz_t as
(
c int
);

create type bar_t as
(
bazs baz_t[],
b int
);

create type foo_t as
(
bars bar_t[],
a int
);

INSERT INTO foo(a) VALUES (1);
INSERT INTO bar(foo_id, b) VALUES (currval('foo_foo_id_seq'), 100);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 1000);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 2000);
INSERT INTO bar(foo_id, b) VALUES (currval('foo_foo_id_seq'), 200);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 3000);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 4000);
INSERT INTO foo(a) VALUES (2);
INSERT INTO bar(foo_id, b) VALUES (currval('foo_foo_id_seq'), 300);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 5000);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 6000);
INSERT INTO bar(foo_id, b) VALUES (currval('foo_foo_id_seq'), 400);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 7000);
INSERT INTO baz(bar_id, c) VALUES (currval('bar_bar_id_seq'), 8000);

-- nosql!
select array(
    select row(
        array(
            select row(
                array(
                    select row(c)::baz_t
                    from baz where baz.bar_id = bar.bar_id
                )::baz_t[],
                b
            )::bar_t
            from bar where bar.foo_id = foo.foo_id
        )::bar_t[],
        a
    )::foo_t
    from foo
)::foo_t[];

foo_t
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
{"(\"{\"\"(\\\\\"\"{(1000),(2000)}\\\\\"\",100)\"\",\"\"(\\\\\"\"{(3000),(4000)}\\\\\"\",200)\"\"}\",1)","(\"{\"\"(\\\\\"\"{(5000),(6000)}\\\\\"\",300)\"\",\"\"(\\\\\"\"{(7000),(8000)}\\\\\"\",400)\"\"}\",2)"}

as you can see, the postgres default escaping format sucks for sending
nested data -- throw even one quote or backslash in there and your
data can explode in size 10+ times -- which is why we insisted on
binary. json, of course, is much better suited for this type of
communication. despite the complicated look of the above, this type
of code is in fact very easy to write once you get the knack. This
style of coding also leads to much simpler code on the client, since
relationships are directly built into the structure and don't have to
be inferred or duplicated.
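For the record, the array_agg equivalent would read something along
these lines (an untested sketch; note also that array_agg yields NULL
rather than an empty array when there are no matching rows):

select array_agg(row(
    (select array_agg(row(
        (select array_agg(row(c)::baz_t)
         from baz where baz.bar_id = bar.bar_id),
        b)::bar_t)
     from bar where bar.foo_id = foo.foo_id),
    a)::foo_t)
from foo;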

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)toroid(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-02-01 16:28:50
Message-ID: CA+TgmoZ2iA=ssFpJetw0UZoOew4SgjdJ-wocYM4YCrx7AtRZgg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 31, 2012 at 3:47 PM, Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com> wrote:
> I'm mostly in favor of allowing \u0000.  Banning \u0000 means users
> can't use JSON strings to marshal binary blobs, e.g. by escaping
> non-printable characters and only using U+0000..U+00FF.  Instead, they
> have to use base64 or similar.

I agree. I mean, representing data using six bytes per source byte is
a bit unattractive from an efficiency point of view, but I'm sure
someone is going to want to do it. It's also pretty clear that JSON
string -> PG text data type is going to admit of a number of error
conditions (transcoding errors and perhaps invalid surrogate pairs) so
throwing one more on the pile doesn't cost much.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Abhijit Menon-Sen <ams(at)toroid(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-02-02 09:54:59
Message-ID: 20120202095459.GB779@toroid.org
Lists: pgsql-hackers

At 2012-02-01 11:28:50 -0500, robertmhaas(at)gmail(dot)com wrote:
>
> It's also pretty clear that JSON
> string -> PG text data type is going to admit of a number of error
> conditions (transcoding errors and perhaps invalid surrogate pairs) so
> throwing one more on the pile doesn't cost much.

Hi Robert.

I'm sorry for being slow, but I don't understand what you're proposing
to do here (if anything). Could I ask you to explain, please?

Are you talking about allowing the six literal bytes "\u0000" to be
present in the JSON? If so, I agree, there seems to be no reason to
disallow it.

Are you also saying we should allow any "\uNNNN" sequence, without
checking for errors (e.g. invalid surrogate pairs or parts thereof)?

And what transcoding errors are you referring to?

-- ams


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)toroid(dot)org>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-02-02 13:54:32
Message-ID: CA+TgmoaJbketfb1yiZwWoiiMa92bgBNkDfkyLfFhzcdV=MVqGg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 2, 2012 at 4:54 AM, Abhijit Menon-Sen <ams(at)toroid(dot)org> wrote:
> At 2012-02-01 11:28:50 -0500, robertmhaas(at)gmail(dot)com wrote:
>> It's also pretty clear that JSON
>> string -> PG text data type is going to admit of a number of error
>> conditions (transcoding errors and perhaps invalid surrogate pairs) so
>> throwing one more on the pile doesn't cost much.
>
> I'm sorry for being slow, but I don't understand what you're proposing
> to do here (if anything). Could I ask you to explain, please?
>
> Are you talking about allowing the six literal bytes "\u0000" to be
> present in the JSON? If so, I agree, there seems to be no reason to
> disallow it.
>
> Are you also saying we should allow any "\uNNNN" sequence, without
> checking for errors (e.g. invalid surrogate pairs or parts thereof)?
>
> And what transcoding errors are you referring to?

Consider the following JSON object:

"abc"

This is a JSON string. Someone is eventually going to propose a
function with a name like json_to_string() which, when given this
JSON object, returns a three-character string with the PostgreSQL text
type. That's useful and I support it. But now suppose we pass this
JSON object to that same function:

"a\u0062c"

The user will quite rightly expect that since this JSON object
represents the same value as the first JSON object, they're going to
get the same answer back from json_to_string(), namely "abc". So far,
so good. But now suppose we pass this JSON object to that same
function:

"a\u0000c"

This is going to have to be an error condition, because PostgreSQL
does not allow values of type text to contain embedded NUL characters.
Now consider this:

"a\uABCDc"

Suppose that \uABCD represents a character that exists in Unicode, but
the server-encoding is SHIFT-JIS or EUC-JP or some other system which
has no equivalent for the character represented by \uABCD. Again,
when json_to_string() is applied to a value of this type, it must
fail.

In other words, we're knowingly allowing JSON strings to contain
characters which might not be representable as PostgreSQL strings,
because JSON allows any Unicode character, and the server encoding
might not be Unicode, and the server doesn't allow embedded NULs in
any encoding.
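To spell that out with the hypothetical function:

select json_to_string('"a\u0062c"'::json);  -- returns 'abc'
select json_to_string('"a\u0000c"'::json);  -- must fail: NUL not allowed
                                            -- in text
select json_to_string('"a\uABCDc"'::json);  -- fails when the server
                                            -- encoding cannot represent
                                            -- the character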

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Abhijit Menon-Sen <ams(at)toroid(dot)org>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joey Adams <joeyadams3(dot)14159(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, "David E(dot) Wheeler" <david(at)kineticode(dot)com>, Claes Jakobsson <claes(at)surfar(dot)nu>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Jan Urbański <wulczer(at)wulczer(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, Jan Wieck <janwieck(at)yahoo(dot)com>
Subject: Re: JSON for PG 9.2
Date: 2012-02-02 14:51:26
Message-ID: 20120202145126.GA5221@toroid.org
Lists: pgsql-hackers

At 2012-02-02 08:54:32 -0500, robertmhaas(at)gmail(dot)com wrote:
>
> Someone is eventually going to propose a function with a name like
> json_to_string() which, when given this JSON object, returns a
> three-character string with the PostgreSQL text type.

Ah, that's the bit I was missing. I thought you were talking about an
immediate error condition.

> That's useful and I support it.

Agreed. Also, now I understand that you are saying that json_to_string()
(json_string_to_text?) would fail if the result couldn't be represented
as text in the current encoding, and that's sensible as well. I had
misunderstood "is going to admit of a number of error…" in your mail.

As for surrogate pairs, just to be clear, what I was proposing earlier
in the thread was to change json.c:json_lex_string() to detect errors
(e.g. only one half of a surrogate pair, which is the commonest error
I've encountered in the wild) and reject such strings.
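For instance (purely illustrative), a lone high surrogate with no
trailing low surrogate is exactly what that check would catch:

select '"\ud834"'::json;  -- should be rejected at input time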

Thanks for the explanation.

-- ams


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Abhijit Menon-Sen <ams(at)toroid(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-04-16 13:34:12
Message-ID: 1334583252.31618.2626.camel@hvost
Lists: pgsql-hackers

On Tue, 2012-01-31 at 12:58 -0500, Andrew Dunstan wrote:
>
> On 01/30/2012 10:37 AM, Andrew Dunstan wrote:
> >
> >
> >> Aside: is query_to_json really necessary? It seems rather ugly and
> >> easily avoidable using row_to_json.
> >>
> >
> > I started with this, again by analogy with query_to_xml(). But I agree
> > it's a bit ugly. If we're not going to do it, then we definitely need
> > to look at caching the output funcs in the function info. A closer
> > approximation is actually:
> >
> > SELECT array_to_json(array_agg(q))
> > FROM ( your query here ) q;
> >
> >
> > But then I'd want the ability to break that up a bit with line feeds,
> > so we'd need to adjust the interface slightly. (Hint: don't try the
> > above with "select * from pg_class".)
> >
> >
> > I'll wait on further comments, but I can probably turn these changes
> > around very quickly once we're agreed.
> >
> >
> >
>
> based on Abhijit's feeling and some discussion offline, the consensus
> seems to be to remove query_to_json.

The only comment I have here is that query_to_json could have been
replaced with json_agg, so that you don't need to do double-buffering
for the results of the array(<yourquery>) call in

SELECT array_to_json(array(<yourquery>));

Or is there some other way to avoid it, except to wrap row_to_json()
calls in one's own aggregate function which adds the enclosing brackets
and comma separators (like this: '['<row1>[,<rowN>]']')?
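Something like this, hypothetically:

SELECT json_agg(q) FROM (<yourquery>) q;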

> cheers
>
> andrew
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Abhijit Menon-Sen <ams(at)toroid(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-04-16 14:10:33
Message-ID: 4F8C2859.6050203@dunslane.net
Lists: pgsql-hackers

On 04/16/2012 09:34 AM, Hannu Krosing wrote:
>> based on Abhijit's feeling and some discussion offline, the consensus
>> seems to be to remove query_to_json.
> The only comment I have here is that query_to_json could have been
> replaced with json_agg, so that you don't need to do double-buffering
> for the results of array(<yourquery>) call in
>
> SELECT array_to_json(array(<yourquery>));
>
> Or is there some other way to avoid it except to wrap row_to_json()
> calls in own aggregate function which adds enclosing brackets and comma
> separator ( like this : '['<row1>[,<rowN>]']' ?
>
>

The way I usually write this is:

select array_to_json(array_agg(q))
from (<yourquery>) q;

It's a pity you didn't make this comment back in January when we were
talking about this. I think it's too late now in this release cycle to
be talking about adding the aggregate function.

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: JSON for PG 9.2
Date: 2012-04-16 14:40:41
Message-ID: CAHyXU0wv0zCbRujf3zXQB8G9oApUeTrpD6j-KizU6WXnjAazaw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 16, 2012 at 9:10 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> On 04/16/2012 09:34 AM, Hannu Krosing wrote:
>>>
>>> based on Abhijit's feeling and some discussion offline, the consensus
>>> seems to be to remove query_to_json.
>>
>> The only comment I have here is that query_to_json could have been
>> replaced with json_agg, so that you don't need to do double-buffering
>> for the results of array(<yourquery>) call in
>>
>> SELECT array_to_json(array(<yourquery>));
>>
>> Or is there some other way to avoid it except to wrap row_to_json()
>> calls in own aggregate function which adds enclosing brackets and comma
>> separator ( like this : '['<row1>[,<rowN>]']' ?
>>
>>
>
> The way I usually write this is:
>
>    select array_to_json(array_agg(q))
>    from (<yourquery>) q;
>
> It's a pity you didn't make this comment back in January when we were
> talking about this. I think it's too late now in this release cycle to be
> talking about adding the aggregate function.

I find array_agg to be pretty consistently slower than
array()...although not by much, say around 5-10%. I use array_agg
only when grouping. Try timing

select array_to_json(array_agg(v))
from (select v from generate_series(1,1000000) v) q;

vs

select array_to_json(array(select v from generate_series(1,1000000) v));

I agree with Hannu, but as things stand, if I'm trying to avoid the
extra buffer I've found myself doing the final aggregation on the
client -- it's easy enough. BTW, I'm using the json stuff heavily and
it's just absolutely fantastic. Finally I can write web applications
without wondering exactly where it was that computer science went off
the rails.

I've already demoed a prototype app that integrates pg directly with
the many high quality js libraries out there and it makes things very
easy and quick by making writing data services trivial. Data pushes
are still quite a pain but I figure something can be worked out.

merlin


From: Hannu Krosing <hannu(at)2ndQuadrant(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Abhijit Menon-Sen <ams(at)toroid(dot)org>, pgsql-hackers(at)postgresql(dot)org, hannu(at)krosing(dot)net
Subject: Re: JSON for PG 9.2
Date: 2012-04-16 16:19:40
Message-ID: 1334593180.31618.2673.camel@hvost
Lists: pgsql-hackers

On Mon, 2012-04-16 at 10:10 -0400, Andrew Dunstan wrote:
>
> On 04/16/2012 09:34 AM, Hannu Krosing wrote:
> >> based on Abhijit's feeling and some discussion offline, the consensus
> >> seems to be to remove query_to_json.
> > The only comment I have here is that query_to_json could have been
> > replaced with json_agg, so that you don't need to do double-buffering
> > for the results of array(<yourquery>) call in
> >
> > SELECT array_to_json(array(<yourquery>));
> >
> > Or is there some other way to avoid it except to wrap row_to_json()
> > calls in own aggregate function which adds enclosing brackets and comma
> > separator ( like this : '['<row1>[,<rowN>]']' ?
> >
> >
>
> The way I usually write this is:
>
> select array_to_json(array_agg(q))
> from (<yourquery>) q;
>
> It's a pity you didn't make this comment back in January when we were
> talking about this. I think it's too late now in this release cycle to
> be talking about adding the aggregate function.

My comment is not meant to propose changing anything in 9.2.

I think what we have here is absolutely fantastic :)

If doing something in 9.3, then what I would like is some way to
express multiple queries. Basically a variant of

query_to_json(query text[])

where the queries would be evaluated in order and then all the results
aggregated into one json object.
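For example (again, purely hypothetical):

select query_to_json(array[
    'select * from foo',
    'select * from bar'
]);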

But "aggregation on client" as suggested by Merlin may be a better way
to do it for larger result(set)s.

Especially as it could enable streaming of the resultsets without having
to first buffer everything on the server.

If we can add something, then perhaps a "deeper" pretty_print feature

samples:

hannu=# \d test
                            Table "public.test"
 Column |            Type             |                     Modifiers
--------+-----------------------------+---------------------------------------------------
 id     | integer                     | not null default nextval('test_id_seq'::regclass)
 data   | text                        |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test from test),true);
-[ RECORD 1 ]-+----------------------------------------------------------------------------
array_to_json | [{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},
              |  {"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"}]

This is OK

hannu=# \d test2
                            Table "public.test2"
 Column |            Type             |                      Modifiers
--------+-----------------------------+----------------------------------------------------
 id     | integer                     | not null default nextval('test2_id_seq'::regclass)
 data2  | test                        |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test2_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test2 from test2),true);
-[ RECORD 1 ]-+-----------------------------------------------------------------------------------------------------------------------------------
array_to_json | [{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},
              |  {"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}]

This is "kind of OK"

hannu=# \d test3
                            Table "public.test3"
 Column |            Type             |                      Modifiers
--------+-----------------------------+----------------------------------------------------
 id     | integer                     | not null default nextval('test3_id_seq'::regclass)
 data3  | test2[]                     |
 tstamp | timestamp without time zone | default now()
Indexes:
    "test3_pkey" PRIMARY KEY, btree (id)

hannu=# select array_to_json(array(select test3 from test3),true);
-[ RECORD 1 ]-+--------------------------------------------------------------------------------
array_to_json | [{"id":1,"data3":[{"id":1,"data2":{"id":1,"data":"0.262814193032682","tstamp":"2012-04-05 13:21:03.235204"},"tstamp":"2012-04-05 13:25:03.644497"},{"id":2,"data2":{"id":2,"data":"0.157406373415142","tstamp":"2012-04-05 13:21:05.2033"},"tstamp":"2012-04-05 13:25:03.644497"}],"tstamp":"2012-04-16 14:40:15.795947"}]

but this would be nicer if printed like python's pprint:

>>> pprint.pprint(d)
[{'data3': [{'data2': {'data': '0.262814193032682',
                       'id': 1,
                       'tstamp': '2012-04-05 13:21:03.235204'},
             'id': 1,
             'tstamp': '2012-04-05 13:25:03.644497'},
            {'data2': {'data': '0.157406373415142',
                       'id': 2,
                       'tstamp': '2012-04-05 13:21:05.2033'},
             'id': 2,
             'tstamp': '2012-04-05 13:25:03.644497'}],
  'id': 1,
  'tstamp': '2012-04-16 14:40:15.795947'}]

:D

Again, I don't expect it anytime soon.

What we will get in 9.2 is wonderful already.

Cheers,
Hannu
--
-------
Hannu Krosing
PostgreSQL Unlimited Scalability and Performance Consultant
2ndQuadrant Nordic
PG Admin Book: http://www.2ndQuadrant.com/books/


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)2ndquadrant(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Abhijit Menon-Sen <ams(at)toroid(dot)org>, pgsql-hackers(at)postgresql(dot)org, hannu(at)krosing(dot)net
Subject: Re: JSON for PG 9.2
Date: 2012-04-16 16:41:30
Message-ID: CAHyXU0yoH=6xj58CRN2_TGkaP-fBiYksjvDaPhnESCNz2JdQkA@mail.gmail.com
Lists: pgsql-hackers

On Mon, Apr 16, 2012 at 11:19 AM, Hannu Krosing <hannu(at)2ndquadrant(dot)com> wrote:
> If doing something in 9.3 then what I would like is some way to express
> multiple queries. Basically a variant of
>
> query_to_json(query text[])
>
> where queries would be evaluated in order and then all the results
> aggregated into one json object.

I personally don't like variants of to_json that push the query in as
text. They defeat parameterization and have other issues. Another
point for client-side processing is the new row-level processing in
libpq, so I'd argue that if the result is big enough to warrant
worrying about buffering (and it'd have to be a mighty big json doc),
the best bet is to extract it as rows. I'm playing around with
node.js for the json serving and the sending code looks like this:

var first = true;

// stream each row's json out as one element of a single json array
query.on('row', function(row) {
  if (first) {
    first = false;
    response.write('[');    // open the array on the first row
  } else {
    response.write(',');    // separate subsequent elements
  }
  response.write(row.jsondata);
});
query.on('end', function() {
  response.write(']');      // close the array once the rows run out
  response.end();
});

-- not too bad

merlin