From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: json api WIP patch
Date: 2012-12-26 20:33:59
Message-ID: 50DB5F37.3000103@dunslane.net
Lists: pgsql-hackers

Here is a patch for the first part of the JSON API that was recently
discussed. It includes the json parser hook infrastructure and
functions for json_get and friends, plus json_keys.

As is, this exposes the json lexer fully for use by the hook functions.
But I could easily be persuaded that this should be an opaque structure
with some constructor and getter functions - I don't think the hook
functions need or should be able to set anything in the lexer.

Work is proceeding on some of the more advanced functionality discussed.

cheers

andrew

Attachment Content-Type Size
jsonapi1.patch text/x-patch 59.0 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-02 21:45:35
Message-ID: CA+TgmoaqyQV+SpvD47zpN8iaYrM_p4mxHEavta0LbLwkpqMpWA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Here is a patch for the first part of the JSON API that was recently
> discussed. It includes the json parser hook infrastructure and functions
> for json_get and friends, plus json_keys.
>
> As is, this exposes the json lexer fully for use by the hook functions. But
> I could easily be persuaded that this should be an opaque structure with
> some constructor and getter functions - I don't think the hook functions
> need or should be able to set anything in the lexer.
>
> Work is proceeding on some of the more advanced functionality discussed.

This seems to contain a large number of spurious whitespace changes.

And maybe some other spurious changes. For example, I'm not sure why
this comment is truncated:

}
}

! /*
! * Check for trailing garbage. As in json_lex(), any alphanumeric stuff
! * here should be considered part of the token for error-reporting
! * purposes.
! */
for (p = s; JSON_ALPHANUMERIC_CHAR(*p); p++)
error = true;
lex->token_terminator = p;
if (error)
report_invalid_token(lex);
--- 730,739 ----
}
}

! /* Check for trailing garbage. */
for (p = s; JSON_ALPHANUMERIC_CHAR(*p); p++)
error = true;
+ lex->prev_token_terminator = lex->token_terminator;
lex->token_terminator = p;
if (error)
report_invalid_token(lex);

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-02 22:51:50
Message-ID: 50E4BA06.5040608@dunslane.net
Lists: pgsql-hackers


On 01/02/2013 04:45 PM, Robert Haas wrote:
> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Here is a patch for the first part of the JSON API that was recently
>> discussed. It includes the json parser hook infrastructure and functions
>> for json_get and friends, plus json_keys.
>>
>> As is, this exposes the json lexer fully for use by the hook functions. But
>> I could easily be persuaded that this should be an opaque structure with
>> some constructor and getter functions - I don't think the hook functions
>> need or should be able to set anything in the lexer.
>>
>> Work is proceeding on some of the more advanced functionality discussed.
> This seems to contain a large number of spurious whitespace changes.

I'm glad you're looking at it :-)

I did do a run of pgindent on the changed files before I cut the patch,
which might have made some of those changes.

I notice a couple of other infelicities too, which are undoubtedly my fault.

The original prototype of this was cut against some older code, and I
then tried to merge it with the current code base, and make sure that
all the regression tests passed. That might well have resulted in a
number of things that need review.

>
> And maybe some other spurious changes. For example, I'm not sure why
> this comment is truncated:
>
>

Another good question.

I'll make another pass over this and try to remove some of what's
annoying you.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-04 20:23:58
Message-ID: 50E73A5E.5080201@dunslane.net
Lists: pgsql-hackers


On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>
> On 01/02/2013 04:45 PM, Robert Haas wrote:
>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>> wrote:
>>> Here is a patch for the first part of the JSON API that was recently
>>> discussed. It includes the json parser hook infrastructure and
>>> functions
>>> for json_get and friends, plus json_keys.

Updated patch that contains most of the functionality I'm after. One
piece left is populate_recordset (populate a set of records from a
single json datum which is an array of objects, in one pass). That
requires a bit of thought.

I hope most of the whitespace issues are fixed.

cheers

andrew

Attachment Content-Type Size
jsonapi2.patch text/x-patch 90.1 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-04 20:36:07
Message-ID: CAFj8pRDuqU-H8Y5PYYF1_a3TFvLCXMW4tiJMwfGpb1wzvcvHaw@mail.gmail.com
Lists: pgsql-hackers

Hello

2013/1/4 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
> On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>>
>>
>> On 01/02/2013 04:45 PM, Robert Haas wrote:
>>>
>>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>> wrote:
>>>>
>>>> Here is a patch for the first part of the JSON API that was recently
>>>> discussed. It includes the json parser hook infrastructure and
>>>> functions
>>>> for json_get and friends, plus json_keys.
>
>
>
> Updated patch that contains most of the functionality I'm after. One piece
> left is populate_recordset (populate a set of records from a single json
> datum which is an array of objects, in one pass). That requires a bit of
> thought.
>
> I hope most of the whitespace issues are fixed.
>

it is looking well

I have one note - is it "json_each" good name?

Regards

Pavel

> cheers
>
> andrew
>
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-04 20:51:54
Message-ID: 50E740EA.3070903@dunslane.net
Lists: pgsql-hackers


On 01/04/2013 03:36 PM, Pavel Stehule wrote:
> Hello
>
>
>
> 2013/1/4 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>> On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>>>
>>> On 01/02/2013 04:45 PM, Robert Haas wrote:
>>>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>>> wrote:
>>>>> Here is a patch for the first part of the JSON API that was recently
>>>>> discussed. It includes the json parser hook infrastructure and
>>>>> functions
>>>>> for json_get and friends, plus json_keys.
>>
>>
>> Updated patch that contains most of the functionality I'm after. One piece
>> left is populate_recordset (populate a set of records from a single json
>> datum which is an array of objects, in one pass). That requires a bit of
>> thought.
>>
>> I hope most of the whitespace issues are fixed.
>>
> it is looking well
>
> I have one note - is it "json_each" good name?
>

Possibly not, although hstore has each(). json_unnest might be even less
felicitous.
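
For context in the naming debate, json_each is meant to do for json what
each() does for hstore, returning a set of key/value pairs. An illustrative
call, assuming the signature listed later in the thread (from_json json,
OUT key text, OUT value json):

    select * from json_each('{"f1": 1, "f2": "bar"}');
    --  key | value
    -- -----+-------
    --  f1  | 1
    --  f2  | "bar"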

I'm expecting a good deal of bikeshedding - I'm trying to ignore those
issues for the most part because the functionality seems much more
important to me than the names.

The more people that pound on it and try to break it the happier I'll be.

cheers

andrew


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-04 21:03:36
Message-ID: CAFj8pRBXTmeOyobRkkJET8OJ6b+Wc5s50qyAGCwtOzSyzHHARA@mail.gmail.com
Lists: pgsql-hackers

2013/1/4 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
> On 01/04/2013 03:36 PM, Pavel Stehule wrote:
>>
>> Hello
>>
>>
>>
>> 2013/1/4 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>>>
>>> On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>>>>
>>>>
>>>> On 01/02/2013 04:45 PM, Robert Haas wrote:
>>>>>
>>>>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>>>> wrote:
>>>>>>
>>>>>> Here is a patch for the first part of the JSON API that was recently
>>>>>> discussed. It includes the json parser hook infrastructure and
>>>>>> functions
>>>>>> for json_get and friends, plus json_keys.
>>>
>>>
>>>
>>> Updated patch that contains most of the functionality I'm after. One piece
>>> left is populate_recordset (populate a set of records from a single json
>>> datum which is an array of objects, in one pass). That requires a bit of
>>> thought.
>>>
>>> I hope most of the whitespace issues are fixed.
>>>
>> it is looking well
>>
>> I have one note - is it "json_each" good name?
>>
>
>
> Possibly not, although hstore has each(). json_unnest might be even less
> felicitous.

I understand - but hstore isn't in core - so it should not be precedent

regexp_split_to_table

I am not native speaker, it sounds little bit strange - but maybe
because I am not native speaker :)

Regards

Pavel

>
> I'm expecting a good deal of bikeshedding - I'm trying to ignore those
> issues for the most part because the functionality seems much more important
> to me than the names.
>
> The more people that pound on it and try to break it the happier I'll be.
>
> cheers
>
> andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-07 15:25:16
Message-ID: CAHyXU0xywF4M3TJL6FGMm=19mjA+9v0dGcxTDv51RzGzFa4s1Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 4, 2013 at 3:03 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> I understand - but hstore isn't in core - so it should not be precedent
>
> regexp_split_to_table
>
> I am not native speaker, it sounds little bit strange - but maybe
> because I am not native speaker :)

it's common usage: http://api.jquery.com/jQuery.each/

the patch looks fabulous. There are a few trivial whitespace issues
yet and I noticed a leaked hstore comment@ 2440:
+ /*
+ * if the input hstore is empty, we can only skip the rest if we were
+ * passed in a non-null record, since otherwise there may be issues with
+ * domain nulls.
+ */

merlin


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-07 19:31:18
Message-ID: CAFj8pRDQEge7kas0NWVSaNn3rgUAzHHZG50fOoKrF8FwuK5Jag@mail.gmail.com
Lists: pgsql-hackers

2013/1/7 Merlin Moncure <mmoncure(at)gmail(dot)com>:
> On Fri, Jan 4, 2013 at 3:03 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> I understand - but hstore isn't in core - so it should not be precedent
>>
>> regexp_split_to_table
>>
>> I am not native speaker, it sounds little bit strange - but maybe
>> because I am not native speaker :)
>
> it's common usage: http://api.jquery.com/jQuery.each/
>

ook

Regards

Pavel

> the patch looks fabulous. There are a few trivial whitespace issues
> yet and I noticed a leaked hstore comment@ 2440:
> + /*
> + * if the input hstore is empty, we can only skip the rest if we were
> + * passed in a non-null record, since otherwise there may be issues with
> + * domain nulls.
> + */
>
> merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-07 22:15:47
Message-ID: 50EB4913.1070003@dunslane.net
Lists: pgsql-hackers


On 01/07/2013 10:25 AM, Merlin Moncure wrote:

>
> the patch looks fabulous. There are a few trivial whitespace issues
> yet and I noticed a leaked hstore comment@ 2440:
> + /*
> + * if the input hstore is empty, we can only skip the rest if we were
> + * passed in a non-null record, since otherwise there may be issues with
> + * domain nulls.
> + */
>

Here is a patch that has all the functionality I'm intending to provide.

The API is improved some, with the parser now keeping track of the
nesting levels instead of callers having to do so, and a constructor
function provided for JsonLexContext objects.

The processing functions have been extended to provide populate_record()
and populate_recordset() functions. The latter in particular could be
useful in decomposing a piece of json representing an array of flat
objects (a fairly common pattern) into a set of Postgres records in a
single pass.
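
To illustrate the intended usage (hypothetical table and data; signatures as
in the function list posted later in the thread):

    create table mytab (a int, b text);

    -- one json object to one record
    select * from json_populate_record(null::mytab, '{"a": 1, "b": "x"}');

    -- a json array of flat objects to a set of records, in a single pass
    select * from json_populate_recordset(null::mytab,
                   '[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]');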

The main thing I'm going to concentrate on now is making sure that this
doesn't leak memory. I'm sure there's some tightening required in that
area. Any eyeballs there will be greatly appreciated. There are also a
couple of very minor things to clean up.

You (Merlin) have kindly volunteered to work on documentation, so before
we go too far with that any bikeshedding on names, or on the
functionality being provided, should now take place.

cheers

andrew

Attachment Content-Type Size
jsonapi3.patch text/x-patch 106.0 KB

From: james <james(at)mansionfamily(dot)plus(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-08 06:45:50
Message-ID: 50EBC09E.8070303@mansionfamily.plus.com
Lists: pgsql-hackers

> The processing functions have been extended to provide populate_record() and populate_recordset() functions. The latter in particular could be useful in decomposing a piece of json representing an array of flat objects (a fairly common pattern) into a set of Postgres records in a single pass.

So this would allow an 'insert into ... select ... from
<unpack-the-JSON>(...)'?

I had been wondering how to do such an insertion efficiently in the
context of SPI, but it seems that there is no SPI_copy equiv that would
allow a query parse and plan to be avoided.

Is this mechanism likely to be as fast as we can get at the moment in
contexts where copy is not feasible?


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: james(at)mansionfamily(dot)plus(dot)com
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-08 14:58:51
Message-ID: 50EC342B.2000106@dunslane.net
Lists: pgsql-hackers


On 01/08/2013 01:45 AM, james wrote:
>> The processing functions have been extended to provide
>> populate_record() and populate_recordset() functions. The latter in
>> particular could be useful in decomposing a piece of json
>> representing an array of flat objects (a fairly common pattern) into
>> a set of Postgres records in a single pass.
>
> So this would allow an 'insert into ... select ... from
> <unpack-the-JSON>(...)'?

Yes.

>
> I had been wondering how to do such an insertion efficiently in the
> context of SPI, but it seems that there is no SPI_copy equiv that
> would allow a query parse and plan to be avoided.

Your query above would need to be planned too, although the plan will be
trivial.
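
Spelled out, the kind of statement under discussion would look something like
this (hypothetical table, using json_populate_recordset from the patch); it is
planned like any other INSERT ... SELECT, but the plan is trivial:

    insert into mytab (a, b)
    select a, b
      from json_populate_recordset(null::mytab,
             '[{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]');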

>
> Is this mechanism likely to be as fast as we can get at the moment in
> contexts where copy is not feasible?
>

You should not try to use it as a general bulk load facility. And it
will not be as fast as COPY for several reasons, including that the Json
parsing routines are necessarily much heavier than the COPY parse
routines, which have in any case been optimized over quite a long
period. Also, a single json datum is limited to no more than 1Gb. If you
have such a datum, parsing it involves having it in memory and then
taking a copy (I wonder if we could avoid that step - will take a look).
Then each object is decomposed into a hash table of key value pairs,
which is then used to construct the record datum. Each field name in
the result record is used to look up the value in the hash table - this
happens once in the case of populate_record() and once per object in the
array in the case of populate_recordset(). In the latter case the
resulting records are put into a tuplestore structure (which spills to
disk if necessary) which is then returned to the caller when all the
objects in the json array are processed. COPY doesn't have these sorts
of issues. It knows without having to look things up where each datum is
in each record, and it stashes the result straight into the target
table. It can read and insert huge numbers of rows without significant
memory implications.

Both these routines and COPY in non-binary mode use the data type input
routines to convert text values. In some cases (very notably timestamps)
these routines can easily be shown to be fantastically expensive
compared to binary input. This is part of what has led to the creation
of utilities like pg_bulkload.

Perhaps if you give us a higher level view of what you're trying to
achieve we can help you better.

cheers

andrew


From: james <james(at)mansionfamily(dot)plus(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-08 20:07:12
Message-ID: 50EC7C70.5010104@mansionfamily.plus.com
Lists: pgsql-hackers

>>
>> I had been wondering how to do such an insertion efficiently in the context of SPI, but it seems that there is no SPI_copy equiv that would allow a query parse and plan to be avoided.
>
> Your query above would need to be planned too, although the plan will be trivial.

Ah yes, I meant that I had not found a way to avoid it (for multi-row
inserts etc) from a stored proc context where I have SPI functions
available.

> You should not try to use it as a general bulk load facility. And it will not be as fast as COPY for several reasons, including that the Json parsing routines are necessarily much heavier than the COPY parse routines, which have in any case been optimized over quite a long period. Also, a single json datum is limited to no more than 1Gb. If you have such a datum, parsing it involves having it in memory and then taking a copy (I wonder if we could avoid that step - will take a look). Then each object is decomposed into a hash table of key value pairs, which is then used to construct the record datum. Each field name in the result record is used to look up the value in the hash table - this happens once in the case of populate_record() and once per object in the array in the case of populate_recordset(). In the latter case the resulting records are put into a tuplestore structure (which spills to disk if necessary) which is then returned to the caller when all the objects in the json array are processed. COPY doesn't have these sorts of issues. It knows without having to look things up where each datum is in each record, and it stashes the result straight into the target table. It can read and insert huge numbers of rows without significant memory implications.

Yes - but I don't think I can use COPY from a stored proc context can I?
If I could use binary COPY from a stored proc that has received a
binary param and unpacked to the data, it would be handy.

If SPI provided a way to perform a copy to a temp table and then some
callback on an iterator that yields rows to it, that would do the trick
I guess.

> Perhaps if you give us a higher level view of what you're trying to achieve we can help you better.

I had been trying to identify a way to work with record sets where the
records might be used for insert, or for updates or deletion statements,
preferably without forming a large custom SQL statement that must then
be parsed and planned (and which would be a PITA if I wanted to use the
SQL-C preprocessor or some language bindings that like to prepare a
statement and execute with params).

The data I work with has a master-detail structure and insertion
performance matters, so I'm trying to limit manipulations to one
statement per table per logical operation even where there are multiple
detail rows.

Sometimes the network latency can be a pain too and that also suggests
an RPC with unpack and insert locally.

Cheers
James


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: james(at)mansionfamily(dot)plus(dot)com
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-08 20:12:44
Message-ID: 50EC7DBC.8020806@dunslane.net
Lists: pgsql-hackers


On 01/08/2013 09:58 AM, Andrew Dunstan wrote:
>
> If you have such a datum, parsing it involves having it in memory and
> then taking a copy (I wonder if we could avoid that step - will take a
> look).

Here is a Proof Of Concept patch against my development tip on what's
involved in getting the JSON lexer not to need a nul-terminated string
to parse. This passes regression, incidentally. The downside is that
processing is very slightly more complex, and that json_in() would need
to call strlen() on its input. The upside would be that the processing
routines I've been working on would no longer need to create copies of
their json arguments using text_to_cstring() just so they can get a
null-terminated string to process.

Consequent changes would modify the signature of makeJsonLexContext() so
its first argument would be a text* instead of a char* (and of course
its logic would change accordingly).

I could go either way. Thoughts?

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: james(at)mansionfamily(dot)plus(dot)com
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-08 20:22:13
Message-ID: 50EC7FF5.9010403@dunslane.net
Lists: pgsql-hackers


On 01/08/2013 03:12 PM, Andrew Dunstan wrote:
>
> On 01/08/2013 09:58 AM, Andrew Dunstan wrote:
>>
>> If you have such a datum, parsing it involves having it in memory and
>> then taking a copy (I wonder if we could avoid that step - will take
>> a look).
>
>
> Here is a Proof Of Concept patch against my development tip on what's
> involved in getting the JSON lexer not to need a nul-terminated string
> to parse. This passes regression, incidentally. The downside is that
> processing is very slightly more complex, and that json_in() would
> need to call strlen() on its input. The upside would be that the
> processing routines I've been working on would no longer need to
> create copies of their json arguments using text_to_cstring() just so
> they can get a null-terminated string to process.
>
> Consequent changes would modify the signature of makeJsonLexContext()
> so its first argument would be a text* instead of a char* (and of
> course its logic would change accordingly).
>
> I could go either way. Thoughts?
>
>

this time with patch ...

Attachment Content-Type Size
jsonparser.patch text/x-patch 6.6 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: james(at)mansionfamily(dot)plus(dot)com
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-08 20:26:36
Message-ID: 50EC80FC.1040706@dunslane.net
Lists: pgsql-hackers


On 01/08/2013 03:07 PM, james wrote:
>
>
> Yes - but I don't think I can use COPY from a stored proc context can
> I? If I could use binary COPY from a stored proc that has received a
> binary param and unpacked to the data, it would be handy.
>

You can use COPY from a stored procedure, but only to and from files.

> If SPI provided a way to perform a copy to a temp table and then some
> callback on an iterator that yields rows to it, that would do the
> trick I guess.

SPI is useful, but it's certainly possible to avoid its use. After all,
that's what almost the whole backend does, including the COPY code. Of
course, it's a lot harder to write that way, which is part of why SPI
exists. Efficiency has its price.

cheers

andrew


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-08 21:19:06
Message-ID: 50EC8D4A.9060308@gmx.net
Lists: pgsql-hackers

On 1/7/13 5:15 PM, Andrew Dunstan wrote:
> You (Merlin) have kindly volunteered to work on documentation, so before
> we go too far with that any bikeshedding on names, or on the
> functionality being provided, should now take place.

Hmm, I was going to say, this patch contains no documentation, so I have
no idea what it is supposed to do. "Recently discussed" isn't a good
substitute for describing what the patch is supposed to accomplish.


From: james <james(at)mansionfamily(dot)plus(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-08 21:31:23
Message-ID: 50EC902B.70205@mansionfamily.plus.com
Lists: pgsql-hackers

> You can use COPY from a stored procedure, but only to and from files.

I think that's in the chocolate fireguard realm though as far as
efficiency for this sort of scenario goes, even if it's handled by
retaining an mmap'd file as workspace.

>
>> If SPI provided a way to perform a copy to a temp table and then some callback on an iterator that yields rows to it, that would do the trick I guess.
>
> SPI is useful, but it's certainly possible to avoid its use. After all, that's what almost the whole backend does, including the COPY code. Of course, it's a lot harder to write that way, which is part of why SPI exists. Efficiency has its price.

So it is possible to use a lower level interface from a C stored proc?
SPI is the (only) documented direct function extension API, isn't it?

Is the issue with using the JSON data-to-record set that the parsing can
be costly? Perhaps it can be achieved with B64 of compressed protobuf,
or such. I don't mind if it seems a bit messy - the code can be
generated from the table easily enough, especially if I can use C++. I
guess an allocator that uses SPI_palloc would solve issues with memory
management on error?


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-08 21:32:46
Message-ID: CAHyXU0wWHezyxAK_HBQX-R5BeEXAzr+R5sQ57z2vyQwDWijkKA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 8, 2013 at 3:19 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On 1/7/13 5:15 PM, Andrew Dunstan wrote:
>> You (Merlin) have kindly volunteered to work on documentation, so before
>> we go too far with that any bikeshedding on names, or on the
>> functionality being provided, should now take place.
>
> Hmm, I was going to say, this patch contains no documentation, so I have
> no idea what it is supposed to do. "Recently discussed" isn't a good
> substitute for describing what the patch is supposed to accomplish.

Why not? There are functional examples in the docs and the purpose of
the various functions was hashed out pretty well a couple weeks back,
deficiencies corrected, etc.

reference: http://postgresql.1045698.n5.nabble.com/json-accessors-td5733929.html

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-08 22:01:00
Message-ID: 50EC971C.3040003@dunslane.net
Lists: pgsql-hackers


On 01/08/2013 04:32 PM, Merlin Moncure wrote:
> On Tue, Jan 8, 2013 at 3:19 PM, Peter Eisentraut<peter_e(at)gmx(dot)net> wrote:
>> On 1/7/13 5:15 PM, Andrew Dunstan wrote:
>>> You (Merlin) have kindly volunteered to work on documentation, so before
>>> we go too far with that any bikeshedding on names, or on the
>>> functionality being provided, should now take place.
>> Hmm, I was going to say, this patch contains no documentation, so I have
>> no idea what it is supposed to do. "Recently discussed" isn't a good
>> substitute for describing what the patch is supposed to accomplish.
> Why not? There are functional examples in the docs and the purpose of
> the various functions was hashed out pretty well a couple weeks back,
> deficiencies corrected, etc.
>
> reference:http://postgresql.1045698.n5.nabble.com/json-accessors-td5733929.html

Well, at a high level the patch is meant to do two things: provide an
API that can be used to build JSON processing functions easily, and
provide some basic json processing functions built on the API. Those
functions provide similar capabilities to the accessor functions that
hstore has.

Perhaps also this will help. Here is the list of functions and operators
as currently implemented. I also have working operators for the get_path
functions which will be in a future patch.

All these are used in the included regression tests.

           Name           | Result data type |                           Argument data types
 -------------------------+------------------+------------------------------------------------------------------------
  json_array_length       | integer          | json
  json_each               | SETOF record     | from_json json, OUT key text, OUT value json
  json_each_as_text       | SETOF record     | from_json json, OUT key text, OUT value text
  json_get                | json             | json, integer
  json_get                | json             | json, text
  json_get_as_text        | text             | json, integer
  json_get_as_text        | text             | json, text
  json_get_path           | json             | from_json json, VARIADIC path_elems text[]
  json_get_path_as_text   | text             | from_json json, VARIADIC path_elems text[]
  json_object_keys        | SETOF text       | json
  json_populate_record    | anyelement       | anyelement, json
  json_populate_recordset | SETOF anyelement | base anyelement, from_json json, use_json_as_text boolean DEFAULT false
  json_unnest             | SETOF json       | from_json json, OUT value json

  Name | Left arg type | Right arg type | Result type |          Description
 ------+---------------+----------------+-------------+--------------------------------
  ->   | json          | integer        | json        | get json array element
  ->   | json          | text           | json        | get json object field
  ->>  | json          | integer        | text        | get json array element as text
  ->>  | json          | text           | text        | get json object field as text
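
A few illustrative calls tying the two tables together (results per the
descriptions above; array indexes are zero-based, and -> yields json so calls
can be chained, while ->> yields text):

    select '[{"f1": 100}, {"f1": 200}]'::json -> 1;          -- {"f1": 200}
    select '{"f1": {"f2": "bar"}}'::json -> 'f1' ->> 'f2';   -- bar
    select '[1,2,3]'::json ->> 0;                            -- 1, as text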

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-10 20:01:21
Message-ID: CAHyXU0xs=K1ksHXVDiRxKQE_Z0cXNj548WfbFsa3qpm1-1RLiw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 7, 2013 at 4:15 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> You (Merlin) have kindly volunteered to work on documentation, so before we
> go too far with that any bikeshedding on names, or on the functionality
> being provided, should now take place.

Barring comment/complaint, I'm just going to roll with what we've got.
It seems pretty good to me.

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-10 23:42:09
Message-ID: 50EF51D1.7020507@dunslane.net
Lists: pgsql-hackers


On 01/04/2013 03:23 PM, Andrew Dunstan wrote:
>
> On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>>
>> On 01/02/2013 04:45 PM, Robert Haas wrote:
>>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan
>>> <andrew(at)dunslane(dot)net> wrote:
>>>> Here is a patch for the first part of the JSON API that was recently
>>>> discussed. It includes the json parser hook infrastructure and
>>>> functions
>>>> for json_get and friends, plus json_keys.
>
>
> Updated patch that contains most of the functionality I'm after. One
> piece left is populate_recordset (populate a set of records from a
> single json datum which is an array of objects, in one pass). That
> requires a bit of thought.
>
> I hope most of the whitespace issues are fixed.
>
>

This updated patch contains all the intended functionality, including
operators for the json_get_path functions, so you can say things like

select jsonval->array['f1','0','f2'] ...

It also removes any requirement to copy the json value before setting up
the lexer by removing the lexer's requirement to have a nul terminated
string. Instead the lexer is told the input length and relies on that.
For this reason, json_in() now calls cstring_get_text() before rather
than after calling the validation routine, but that's really not
something worth bothering about.

A couple of points worth noting: it's a pity that we have to run CREATE
OR REPLACE FUNCTION in system_views.sql in order to set up default
values for builtin functions. That feels very kludgy. Also, making
operators for variadic functions is a bit of a pain. I had to set up
non-variadic version of the same functions (see json_get_path_op and
json_get_path_as_text_op) just so I could set up the operators. Neither
of these are exactly showstopper items, just mild annoyances.

I will continue hunting memory leaks, but when Merlin gets done with
docco I think we'll be far enough advanced to add this to the commitfest.

cheers

andrew

Attachment Content-Type Size
jsonapi4.patch text/x-patch 113.4 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-14 16:32:49
Message-ID: CA+TgmoaSvdKOt=uysy8bAWjigsh4iLp+bgJZLcorcFZTdimSbQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 10, 2013 at 6:42 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Updated patch that contains most of the functionality I'm after. One piece
>> left is populate_recordset (populate a set of records from a single json
>> datum which is an array of objects, in one pass). That requires a bit of
>> thought.
>>
>> I hope most of the whitespace issues are fixed.
>
>
> This updated patch contains all the intended functionality, including
> operators for the json_get_path functions, so you can say things like
>
> select jsonval->array['f1','0','f2'] ...
>
> It also removes any requirement to copy the json value before setting up the
> lexer by removing the lexer's requirement to have a nul terminated string.
> Instead the lexer is told the input length and relies on that. For this
> reason, json_in() now calls cstring_get_text() before rather than after
> calling the validation routine, but that's really not something worth
> bothering about.
>
> A couple of points worth noting: it's a pity that we have to run CREATE OR
> REPLACE FUNCTION in system_views.sql in order to set up default values for
> builtin functions. That feels very kludgy. Also, making operators for
> variadic functions is a bit of a pain. I had to set up non-variadic version
> of the same functions (see json_get_path_op and json_get_path_as_text_op)
> just so I could set up the operators. Neither of these are exactly
> showstopper items, just mild annoyances.
>
> I will continue hunting memory leaks, but when Merlin gets done with docco I
> think we'll be far enough advanced to add this to the commitfest.

So, how much performance does this lose on json_in() on a large
cstring, as compared with master?

I can't shake the feeling that this is adding a LOT of unnecessary
data copying. For one thing, instead of copying every single lexeme
(including the single-character ones?) out of the original object, we
could just store a pointer to the offset where the object starts and a
length, instead of copying it.

This is also remarkably thin on comments.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-14 17:52:11
Message-ID: 50F445CB.90905@dunslane.net
Lists: pgsql-hackers


On 01/14/2013 11:32 AM, Robert Haas wrote:
>
> So, how much performance does this lose on json_in() on a large
> cstring, as compared with master?

That's a good question. I'll try to devise a test.

>
> I can't shake the feeling that this is adding a LOT of unnecessary
> data copying. For one thing, instead of copying every single lexeme
> (including the single-character ones?) out of the original object, we
> could just store a pointer to the offset where the object starts and a
> length, instead of copying it.

In the pure parse case (json_in, json_recv) nothing extra should be
copied. On checking this after reading the above I found that wasn't
quite the case, and some lexemes (scalars and field names, but not
punctuation) were being copied when not needed. I have made a fix (see
<https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>)
which I will include in the next version I publish.

In the case of string lexemes, we are passing back a de-escaped version,
so just handing back pointers to the beginning and end in the input
string doesn't work.

>
> This is also remarkably thin on comments.

Fair criticism. I'll work on that.

Thanks for looking at this.

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-15 00:36:00
Message-ID: CAHyXU0watb0hOpF0DQ1v5Hx2TX=406RS6+OoszZG1Vc88jF85g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 10, 2013 at 5:42 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 01/04/2013 03:23 PM, Andrew Dunstan wrote:
>>
>>
>> On 01/02/2013 05:51 PM, Andrew Dunstan wrote:
>>>
>>>
>>> On 01/02/2013 04:45 PM, Robert Haas wrote:
>>>>
>>>> On Wed, Dec 26, 2012 at 3:33 PM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>>> wrote:
>>>>>
>>>>> Here is a patch for the first part of the JSON API that was recently
>>>>> discussed. It includes the json parser hook infrastructure and
>>>>> functions
>>>>> for json_get and friends, plus json_keys.
>>
>>
>>
>> Updated patch that contains most of the functionality I'm after. One piece
>> left is populate_recordset (populate a set of records from a single json
>> datum which is an array of objects, in one pass). That requires a bit of
>> thought.
>>
>> I hope most of the whitespace issues are fixed.
>>
>>
>
>
> This updated patch contains all the intended functionality, including
> operators for the json_get_path functions, so you can say things like
>
> select jsonval->array['f1','0','f2'] ...
>
> It also removes any requirement to copy the json value before setting up the
> lexer by removing the lexer's requirement to have a nul terminated string.
> Instead the lexer is told the input length and relies on that. For this
> reason, json_in() now calls cstring_get_text() before rather than after
> calling the validation routine, but that's really not something worth
> bothering about.
>
> A couple of points worth noting: it's a pity that we have to run CREATE OR
> REPLACE FUNCTION in system_views.sql in order to set up default values for
> builtin functions. That feels very kludgy. Also, making operators for
> variadic functions is a bit of a pain. I had to set up non-variadic version
> of the same functions (see json_get_path_op and json_get_path_as_text_op)
> just so I could set up the operators. Neither of these are exactly
> showstopper items, just mild annoyances.
>
> I will continue hunting memory leaks, but when Merlin gets done with docco I
> think we'll be far enough advanced to add this to the commitfest.

While testing this I noticed that integer based 'get' routines are
zero based -- was this intentional? Virtually all other aspects of
SQL are 1 based:

postgres=# select json_get('[1,2,3]', 1);
json_get
----------
2
(1 row)

postgres=# select json_get('[1,2,3]', 0);
json_get
----------
1
(1 row)

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-15 00:52:56
Message-ID: 50F4A868.2080303@dunslane.net
Lists: pgsql-hackers


On 01/14/2013 07:36 PM, Merlin Moncure wrote:
> While testing this I noticed that integer based 'get' routines are
> zero based -- was this intentional? Virtually all other aspects of
> SQL are 1 based:
>
> postgres=# select json_get('[1,2,3]', 1);
> json_get
> ----------
> 2
> (1 row)
>
> postgres=# select json_get('[1,2,3]', 0);
> json_get
> ----------
> 1
> (1 row)
>

Yes, it's intentional. SQL arrays might be 1-based by default, but
JavaScript arrays are not. JsonPath and similar gadgets treat the arrays
as zero-based. I suspect the Json-using community would not thank us for
being overly SQL-centric on this - and I say that as someone who has
always thought zero based arrays were a major design mistake,
responsible for countless off-by-one errors.

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-15 04:02:44
Message-ID: 50F4D4E4.70604@dunslane.net
Lists: pgsql-hackers


On 01/14/2013 12:52 PM, Andrew Dunstan wrote:
>
> On 01/14/2013 11:32 AM, Robert Haas wrote:
>>
>> So, how much performance does this lose on json_in() on a large
>> cstring, as compared with master?
>
> That's a good question. I'll try to devise a test.
>
>>
>> I can't shake the feeling that this is adding a LOT of unnecessary
>> data copying. For one thing, instead of copying every single lexeme
>> (including the single-character ones?) out of the original object, we
>> could just store a pointer to the offset where the object starts and a
>> length, instead of copying it.
>
> In the pure parse case (json_in, json_recv) nothing extra should be
> copied. On checking this after reading the above I found that wasn't
> quite the case, and some lexemes (scalars and field names, but not
> punctuation) were being copied when not needed. I have made a fix (see
> <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>)
> which I will include in the next version I publish.
>
> In the case of string lexemes, we are passing back a de-escaped
> version, so just handing back pointers to the beginning and end in the
> input string doesn't work.

After a couple of iterations, some performance enhancements to the json
parser and lexer have ended up with a net performance improvement over
git tip. On our test rig, the json parse test runs at just over 13s per
10000 parses on git tip and approx 12.55s per 10000 parses with the
attached patch.

Truth be told, I think the lexer changes have more than paid for the
small cost of the switch to an RD parser. But since the result is a net
performance win PLUS some enhanced functionality, I think we should be
all good.

cheers

andrew

Attachment Content-Type Size
current_jsonapi.patch text/x-patch 107.3 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-15 16:31:28
Message-ID: 50F58460.6060803@dunslane.net
Lists: pgsql-hackers


On 01/14/2013 11:02 PM, Andrew Dunstan wrote:
>
> On 01/14/2013 12:52 PM, Andrew Dunstan wrote:
>>
>> On 01/14/2013 11:32 AM, Robert Haas wrote:
>>>
>>> So, how much performance does this lose on json_in() on a large
>>> cstring, as compared with master?
>>
>> That's a good question. I'll try to devise a test.
>>
>>>
>>> I can't shake the feeling that this is adding a LOT of unnecessary
>>> data copying. For one thing, instead of copying every single lexeme
>>> (including the single-character ones?) out of the original object, we
>>> could just store a pointer to the offset where the object starts and a
>>> length, instead of copying it.
>>
>> In the pure parse case (json_in, json_recv) nothing extra should be
>> copied. On checking this after reading the above I found that wasn't
>> quite the case, and some lexemes (scalars and field names, but not
>> punctuation) were being copied when not needed. I have made a fix
>> (see
>> <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>)
>> which I will include in the next version I publish.
>>
>> In the case of string lexemes, we are passing back a de-escaped
>> version, so just handing back pointers to the beginning and end in
>> the input string doesn't work.
>
>
> After a couple of iterations, some performance enhancements to the
> json parser and lexer have ended up with a net performance improvement
> over git tip. On our test rig, the json parse test runs at just over
> 13s per 10000 parses on git tip and approx 12.55s per 10000 parses
> with the attached patch.
>
> Truth be told, I think the lexer changes have more than paid for the
> small cost of the switch to an RD parser. But since the result is a
> net performance win PLUS some enhanced functionality, I think we
> should be all good.
>
>

Latest version of this patch, including some documentation, mainly from
Merlin Moncure but tweaked by me.

cheers

andrew

Attachment Content-Type Size
jsonapi5.patchx text/plain 127.2 KB

From: David Fetter <david(at)fetter(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-15 19:04:24
Message-ID: 20130115190424.GB32407@fetter.org
Lists: pgsql-hackers

On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
>
> On 01/14/2013 07:36 PM, Merlin Moncure wrote:
> >While testing this I noticed that integer based 'get' routines are
> >zero based -- was this intentional? Virtually all other aspects of
> >SQL are 1 based:
> >
> >postgres=# select json_get('[1,2,3]', 1);
> > json_get
> >----------
> > 2
> >(1 row)
> >
> >postgres=# select json_get('[1,2,3]', 0);
> > json_get
> >----------
> > 1
> >(1 row)
>
> Yes. it's intentional. SQL arrays might be 1-based by default, but
> JavaScript arrays are not. JsonPath and similar gadgets treat the
> arrays as zero-based. I suspect the Json-using community would not
> thank us for being overly SQL-centric on this - and I say that as
> someone who has always thought zero based arrays were a major design
> mistake, responsible for countless off-by-one errors.

Perhaps we could compromise by making arrays 0.5-based.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-15 19:47:23
Message-ID: CAHyXU0y6GDcTxqD+jRHJ5M39kNZbDNEUXCA1-Mc_HHbL+ebqhQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 15, 2013 at 1:04 PM, David Fetter <david(at)fetter(dot)org> wrote:
> On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
>>
>> On 01/14/2013 07:36 PM, Merlin Moncure wrote:
>> >While testing this I noticed that integer based 'get' routines are
>> >zero based -- was this intentional? Virtually all other aspects of
>> >SQL are 1 based:
>> >
>> >postgres=# select json_get('[1,2,3]', 1);
>> > json_get
>> >----------
>> > 2
>> >(1 row)
>> >
>> >postgres=# select json_get('[1,2,3]', 0);
>> > json_get
>> >----------
>> > 1
>> >(1 row)
>>
>> Yes. it's intentional. SQL arrays might be 1-based by default, but
>> JavaScript arrays are not. JsonPath and similar gadgets treat the
>> arrays as zero-based. I suspect the Json-using community would not
>> thank us for being overly SQL-centric on this - and I say that as
>> someone who has always thought zero based arrays were a major design
>> mistake, responsible for countless off-by-one errors.
>
> Perhaps we could compromise by making arrays 0.5-based.

Well, I'm not prepared to argue with Andrew in this one. It was
surprising behavior to me, but that's sample size one.

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Taras Mitran <taras(at)ivc(dot)com>, Joe Van Dyk <joe(at)tanga(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "David E(dot) Wheeler" <david(dot)wheeler(at)pgexperts(dot)com>, Daniel Farina <daniel(at)heroku(dot)com>
Subject: Re: json api WIP patch
Date: 2013-01-15 20:17:10
Message-ID: 50F5B946.50808@dunslane.net
Lists: pgsql-hackers


On 01/15/2013 02:47 PM, Merlin Moncure wrote:
> On Tue, Jan 15, 2013 at 1:04 PM, David Fetter <david(at)fetter(dot)org> wrote:
>> On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
>>> On 01/14/2013 07:36 PM, Merlin Moncure wrote:
>>>> While testing this I noticed that integer based 'get' routines are
>>>> zero based -- was this intentional? Virtually all other aspects of
>>>> SQL are 1 based:
>>>>
>>>> postgres=# select json_get('[1,2,3]', 1);
>>>> json_get
>>>> ----------
>>>> 2
>>>> (1 row)
>>>>
>>>> postgres=# select json_get('[1,2,3]', 0);
>>>> json_get
>>>> ----------
>>>> 1
>>>> (1 row)
>>> Yes. it's intentional. SQL arrays might be 1-based by default, but
>>> JavaScript arrays are not. JsonPath and similar gadgets treat the
>>> arrays as zero-based. I suspect the Json-using community would not
>>> thank us for being overly SQL-centric on this - and I say that as
>>> someone who has always thought zero based arrays were a major design
>>> mistake, responsible for countless off-by-one errors.
>> Perhaps we could compromise by making arrays 0.5-based.
> Well, I'm not prepared to argue with Andrew in this one. It was
> surprising behavior to me, but that's sample size one.
>

I doubt I'm very representative either. People like David Wheeler, Taras
Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask
than me. I'm quite prepared to change it if that's the consensus.

cheers

andrew


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Taras Mitran <taras(at)ivc(dot)com>, Joe Van Dyk <joe(at)tanga(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "David E(dot) Wheeler" <david(dot)wheeler(at)pgexperts(dot)com>
Subject: Re: json api WIP patch
Date: 2013-01-15 21:23:27
Message-ID: CAAZKuFb6Ho7MgiBuDt4trCf_gqh6bChJO46WnUfhbegOJUn66g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 15, 2013 at 12:17 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 01/15/2013 02:47 PM, Merlin Moncure wrote:
>>
>> On Tue, Jan 15, 2013 at 1:04 PM, David Fetter <david(at)fetter(dot)org> wrote:
>>>
>>> On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
>>>>
>>>> On 01/14/2013 07:36 PM, Merlin Moncure wrote:
>>>>>
>>>>> While testing this I noticed that integer based 'get' routines are
>>>>> zero based -- was this intentional? Virtually all other aspects of
>>>>> SQL are 1 based:
>>>>>
>>>>> postgres=# select json_get('[1,2,3]', 1);
>>>>> json_get
>>>>> ----------
>>>>> 2
>>>>> (1 row)
>>>>>
>>>>> postgres=# select json_get('[1,2,3]', 0);
>>>>> json_get
>>>>> ----------
>>>>> 1
>>>>> (1 row)
>>>>
>>>> Yes. it's intentional. SQL arrays might be 1-based by default, but
>>>> JavaScript arrays are not. JsonPath and similar gadgets treat the
>>>> arrays as zero-based. I suspect the Json-using community would not
>>>> thank us for being overly SQL-centric on this - and I say that as
>>>> someone who has always thought zero based arrays were a major design
>>>> mistake, responsible for countless off-by-one errors.
>>>
>>> Perhaps we could compromise by making arrays 0.5-based.
>>
>> Well, I'm not prepared to argue with Andrew in this one. It was
>> surprising behavior to me, but that's sample size one.
>
> I doubt I'm very representative either. People like David Wheeler, Taras
> Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than
> me. I'm quite prepared to change it if that's the consensus.

Hello.

I'm inclined to go with the same gut feeling you had (zero-based-indexing).

Here is the background for my reasoning:

The downside of zero-based-indexing is that people who want to use
multiple sequential container types will inevitably have to deal with
detailed and not easily type-checked integer coordinates that mean
different things in each domain, which will, no doubt, lead to a number
of off-by-one errors. Nevertheless, this cost is already paid because
one of the first things many people will do in programs generating SQL
queries is try to zero-index a SQL array, swear a bit after figuring
things out (because a NULL will be generated, not an error), and then
adjust all the offsets. So, this is not a new problem. On many
occasions I'm sure this has caused off-by-one bugs, or the NULLs
slipped through testing and delivered funny results, yet the world
moves on.
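
A minimal illustration of that behaviour (stock SQL arrays, nothing
json-specific):

select ('{1,2,3}'::int[])[0];   -- out of bounds yields NULL, not an error
select ('{1,2,3}'::int[])[4];   -- likewise NULL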

On the other hand, the downside of going down the road of 1-based
indexing and attempting to attain relative sameness to SQL arrays is
that one would also feel obliged to implement SQL array infelicities
like 'out of bounds' being SQL NULL rather than an error, along with
other spectres like non-rectangular nested arrays. SQL array
semantics are complex and The Committee can change them or --
slightly more likely -- add interactions, so a general expectation
that Postgres container types that happen to have any reasonable
ordinal addressing will implement some level of sameness with SQL
arrays is a very messy one. As such, if it becomes customary to
implement one-based indexing of containers, I think such customs are
best carefully circumscribed so that attempts to be 'like' SQL arrays
are only as superficial as that.

What made me come down on the side of zero-based indexing in spite of
the weaknesses are these two reasons:

* The number of people who use JSON and zero-based-indexing is very
large, and furthermore, within that set the number that know that
SQL even defines array support -- much less that Postgres implements
it -- is much smaller. Thus, one is targeting cohesion with a fairly
alien concept that is not understood by the audience.

* Maintaining PL integrated code that uses both 1-based indexing in PG
functions and 0-based indexing in embedded languages that are likely
to be combined with JSON -- doesn't sound very palatable, and the
use of such PLs (e.g. plv8) seems pretty likely, too. That can
probably be a rich source of bugs and frustration.

If one wants SQL array semantics, it seems like the right way to get
them is coercion to a SQL array value. Then one will receive SQL
array semantics exactly.
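
To make the contrast concrete, reusing the examples quoted upthread
(no new syntax here, just the behaviour already shown):

select json_get('[1,2,3]', 0);   -- 1: zero-based, per the patch as posted
select ('{1,2,3}'::int[])[1];    -- 1: one-based, plain SQL array subscripting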

--
fdr


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: David Fetter <david(at)fetter(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-15 21:34:38
Message-ID: 50F5CB6E.9070400@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16/01/13 08:04, David Fetter wrote:
> On Mon, Jan 14, 2013 at 07:52:56PM -0500, Andrew Dunstan wrote:
>> On 01/14/2013 07:36 PM, Merlin Moncure wrote:
>>> While testing this I noticed that integer based 'get' routines are
>>> zero based -- was this intentional? Virtually all other aspects of
>>> SQL are 1 based:
>>>
>>> postgres=# select json_get('[1,2,3]', 1);
>>> json_get
>>> ----------
>>> 2
>>> (1 row)
>>>
>>> postgres=# select json_get('[1,2,3]', 0);
>>> json_get
>>> ----------
>>> 1
>>> (1 row)
>> Yes. it's intentional. SQL arrays might be 1-based by default, but
>> JavaScript arrays are not. JsonPath and similar gadgets treat the
>> arrays as zero-based. I suspect the Json-using community would not
>> thank us for being overly SQL-centric on this - and I say that as
>> someone who has always thought zero based arrays were a major design
>> mistake, responsible for countless off-by-one errors.
> Perhaps we could compromise by making arrays 0.5-based.
>
> Cheers,
> David.
I think that is far too rational; perhaps the reciprocal of the golden
ratio (0.618033...) would be more appropriate?

I used to be insistent that arrays should start with 1; now I find
starting at 0 far more natural - because every time you start an array
at 1, the computer has to subtract 1 to calculate the entry's offset.
Also, both Java & C are zero based.

I first learnt FORTRAN IV, which is 1 based, and had a shock when I was
learning Algol and found it was 0 based - many moons ago...

Cheers,
Gavin


From: "David E(dot) Wheeler" <david(dot)wheeler(at)pgexperts(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Taras Mitran <taras(at)ivc(dot)com>, Joe Van Dyk <joe(at)tanga(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Daniel Farina <daniel(at)heroku(dot)com>
Subject: Re: json api WIP patch
Date: 2013-01-15 21:36:30
Message-ID: F1667CA0-A272-4CC9-80F8-BC015641FC16@pgexperts.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jan 15, 2013, at 12:17 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

> I doubt I'm very representative either. People like David Wheeler, Taras Mitran, Joe Van Dyk, and the Heroku guys would be better people to ask than me. I'm quite prepared to change it if that's the consensus.

They’re JSON arrays, not SQL arrays, and JSON arrays are based on JavaScript, where they are 0-based.

Best,

David


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-15 22:45:46
Message-ID: 50F5DC1A.60008@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 01/15/2013 11:31 AM, Andrew Dunstan wrote:
>
> On 01/14/2013 11:02 PM, Andrew Dunstan wrote:
>>
>> On 01/14/2013 12:52 PM, Andrew Dunstan wrote:
>>>
>>> On 01/14/2013 11:32 AM, Robert Haas wrote:
>>>>
>>>> So, how much performance does this lose on json_in() on a large
>>>> cstring, as compared with master?
>>>
>>> That's a good question. I'll try to devise a test.
>>>
>>>>
>>>> I can't shake the feeling that this is adding a LOT of unnecessary
>>>> data copying. For one thing, instead of copying every single lexeme
>>>> (including the single-character ones?) out of the original object, we
>>>> could just store a pointer to the offset where the object starts and a
>>>> length, instead of copying it.
>>>
>>> In the pure parse case (json_in, json_recv) nothing extra should be
>>> copied. On checking this after reading the above I found that wasn't
>>> quite the case, and some lexemes (scalars and field names, but not
>>> punctuation) were being copied when not needed. I have made a fix
>>> (see
>>> <https://bitbucket.org/adunstan/pgdevel/commits/139043dba7e6b15f1f9f7675732bd9dae1fb6497>)
>>> which I will include in the next version I publish.
>>>
>>> In the case of string lexemes, we are passing back a de-escaped
>>> version, so just handing back pointers to the beginning and end in
>>> the input string doesn't work.
>>
>>
>> After a couple of iterations, some performance enhancements to the
>> json parser and lexer have ended up with a net performance
>> improvement over git tip. On our test rig, the json parse test runs
>> at just over 13s per 10000 parses on git tip and approx 12.55s per
>> 10000 parses with the attached patch.
>>
>> Truth be told, I think the lexer changes have more than paid for the
>> small cost of the switch to an RD parser. But since the result is a
>> net performance win PLUS some enhanced functionality, I think we
>> should be all good.
>>
>>
>
> Latest version of this patch, including some documentation, mainly
> from Merlin Moncure but tweaked by me.
>

Now with more comments.

cheers

andrew

Attachment Content-Type Size
jsonapi6.patch text/x-patch 136.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-16 15:28:49
Message-ID: CA+TgmobH0G7EYsuCVGRzvmmXTCAXzhhiYheU3BS6kHfLAU0aXQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 14, 2013 at 11:02 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> After a couple of iterations, some performance enhancements to the json
> parser and lexer have ended up with a net performance improvement over git
> tip. On our test rig, the json parse test runs at just over 13s per 10000
> parses on git tip and approx 12.55s per 10000 parses with the attached
> patch.
>
> Truth be told, I think the lexer changes have more than paid for the small
> cost of the switch to an RD parser. But since the result is a net
> performance win PLUS some enhanced functionality, I think we should be all
> good.

Yeah, that sounds great. Thanks for putting in the effort.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-26 15:27:04
Message-ID: 5103F5C8.5090808@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 01/15/2013 05:45 PM, Andrew Dunstan wrote:
>>>
>>
>> Latest version of this patch, including some documentation, mainly
>> from Merlin Moncure but tweaked by me.
>>
>
>
> Now with more comments.
>

New version attached. The only change is to remove some unnecessary uses
of PG_FUNCTION_INFO_V1 that were the result of overenthusiastic copy and
paste.

cheers

andrew

Attachment Content-Type Size
jsonapi7.patch text/x-patch 135.9 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-31 22:06:15
Message-ID: 510AEAD7.5070909@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 1/10/13 6:42 PM, Andrew Dunstan wrote:
> This updated patch contains all the intended functionality, including
> operators for the json_get_path functions, so you can say things like
>
> select jsonval->array['f1','0','f2'] ...

I would like to not create any -> operators, so that that syntax could
be used in the future for method invocation or something similar (it's
in the SQL standard).

I also don't find the proposed use to be very intuitive. You invented
lots of other function names -- why not invent a few more for this
purpose that are clearer?


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-31 22:20:13
Message-ID: 510AEE1D.6090608@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 01/31/2013 05:06 PM, Peter Eisentraut wrote:
> On 1/10/13 6:42 PM, Andrew Dunstan wrote:
>> This updated patch contains all the intended functionality, including
>> operators for the json_get_path functions, so you can say things like
>>
>> select jsonval->array['f1','0','f2'] ...
> I would like to not create any -> operators, so that that syntax could
> be used in the future for method invocation or something similar (it's
> in the SQL standard).

This is the first time I have heard that we should stay away from this.
We have operators with this name in hstore, which is why I chose it.

Have we officially deprecated '->'? I know we deprecated "=>", but I
simply don't recall anything about '->'.

>
> I also don't find the proposed use to be very intuitive. You invented
> lots of other function names -- why not invent a few more for this
> purpose that are clearer?
>
>

I'm happy to take opinions about this, and I expected some bikeshedding,
but your reaction is contrary to everything others have told me. Mostly
they love the operators.

I guess that '~>' and '~>>' would work as well as '->' and '->>'.

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-01-31 23:35:47
Message-ID: CAHyXU0w4=aUP=54obduNWXUbF4FusoQYff8KsYrLaXgCj=245Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 01/31/2013 05:06 PM, Peter Eisentraut wrote:
>>
>> On 1/10/13 6:42 PM, Andrew Dunstan wrote:
>>>
>>> This updated patch contains all the intended functionality, including
>>> operators for the json_get_path functions, so you can say things like
>>>
>>> select jsonval->array['f1','0','f2'] ...
>>
>> I would like to not create any -> operators, so that that syntax could
>> be used in the future for method invocation or something similar (it's
>> in the SQL standard).
>
>
>
> This is the first time I have heard that we should stay away from this. We
> have operators with this name in hstore, which is why I chose it.
>
> Have we officially deprecated '->'? I know we deprecated "=>", but I simply
> don't recall anything about '->'.
>
>
>>
>> I also don't find the proposed use to be very intuitive. You invented
>> lots of other function names -- why not invent a few more for this
>> purpose that are clearer?
>>
>>
>
>
> I'm happy to take opinions about this, and I expected some bikeshedding, but
> your reaction is contrary to everything others have told me. Mostly they
> love the operators.
>
> I guess that '~>' and '~>>' would work as well as '->' and '->>'.

also hstore implements ->

quick off-topic aside: is colon (:) reserved for any purpose as an
operator in SQL?

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-01 00:12:03
Message-ID: 8716.1359677523@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> On 01/31/2013 05:06 PM, Peter Eisentraut wrote:
>>> I would like to not create any -> operators, so that that syntax could
>>> be used in the future for method invocation or something similar (it's
>>> in the SQL standard).

>> This is the first time I have heard that we should stay away from this. We
>> have operators with this name in hstore, which is why I chose it.

I'm not happy about this either. It's bad enough that we're thinking
about taking away =>, but to disallow -> as well? My inclination is to
just say no, we're not implementing that. Even if we remove the contrib
operators named that way, it's insane to suppose that nobody has chosen
these names for user-defined operators in their applications.

> quick off-topic aside: is colon (:) reserved for any purpose as an
> operator in SQL?

We disallow it as an operator character, because of the conflict with
parameter/variable syntax in ecpg and psql. It was allowed before
PG 7.0, IIRC.
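
(For reference, the psql flavour of that syntax -- a minimal snippet,
with a made-up table name:)

\set id 42
select * from foo where id = :id;   -- psql interpolates :id before sending the query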

regards, tom lane


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-01 00:16:12
Message-ID: F2E3D046-5FA1-4D84-B395-F4B559AEBE28@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jan 31, 2013, at 2:20 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

> I'm happy to take opinions about this, and I expected some bikeshedding, but your reaction is contrary to everything others have told me. Mostly they love the operators.
>
> I guess that '~>' and '~>>' would work as well as '->' and '->>'.

Or +> and +>>, since ~ is set very high and small by some fonts (where the fontmakers thought of it as a kind of superscript character).

I suppose that := is out of the question?

Best,

David


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-01 00:26:58
Message-ID: 510B0BD2.4010000@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 01/31/2013 07:16 PM, David E. Wheeler wrote:
> On Jan 31, 2013, at 2:20 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>> I'm happy to take opinions about this, and I expected some bikeshedding, but your reaction is contrary to everything others have told me. Mostly they love the operators.
>>
>> I guess that '~>' and '~>>' would work as well as '->' and '->>'.
> Or +> and +>>, since ~ is set very high and small by some fonts (where the fontmakers though of it as a kind of superscript character).
>
> I suppose that := is out of the question?
>

Even if it were I would not on any account use it. As an old Ada
programmer my mind just revolts at the idea of using this for anything
but assignment.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-01 00:32:25
Message-ID: 9161.1359678745@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 01/31/2013 07:16 PM, David E. Wheeler wrote:
>> I suppose that := is out of the question?

> Even if it were I would not on any account use it. As an old Ada
> programmer my mind just revolts at the idea of using this for anything
> but assignment.

Ada or no, its use in plpgsql would render that a seriously bad idea.

regards, tom lane


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-01 00:38:15
Message-ID: 9619BB27-99C3-43F4-B060-E42293D2553A@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jan 31, 2013, at 4:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Ada or no, its use in plpgsql would render that a seriously bad idea.

I assumed that its use in function params would be the main reason not to use it.

David


From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-01 00:41:11
Message-ID: 510B0F27.6040508@archidevsys.co.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/02/13 13:26, Andrew Dunstan wrote:
>
> On 01/31/2013 07:16 PM, David E. Wheeler wrote:
>> On Jan 31, 2013, at 2:20 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>
>>> I'm happy to take opinions about this, and I expected some
>>> bikeshedding, but your reaction is contrary to everything others
>>> have told me. Mostly they love the operators.
>>>
>>> I guess that '~>' and '~>>' would work as well as '->' and '->>'.
>> Or +> and +>>, since ~ is set very high and small by some fonts
>> (where the fontmakers though of it as a kind of superscript character).
>>
>> I suppose that := is out of the question?
>>
>
>
> Even if it were I would not on any account use it. As an old Ada
> programmer my mind just revolts at the idea of using this for anything
> but assignment.
>
> cheers
>
> andrew
>
>
Ancient Algol 60 programmer here, otherwise ditto!


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-01 22:08:36
Message-ID: CA+TgmoYjzaErdbZXUDEgsUkF-o2ixMM7N5NGny8Ct8yBE5Ajdg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 31, 2013 at 7:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>> On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> On 01/31/2013 05:06 PM, Peter Eisentraut wrote:
>>>> I would like to not create any -> operators, so that that syntax could
>>>> be used in the future for method invocation or something similar (it's
>>>> in the SQL standard).
>
>>> This is the first time I have heard that we should stay away from this. We
>>> have operators with this name in hstore, which is why I chose it.
>
> I'm not happy about this either. It's bad enough that we're thinking
> about taking away =>, but to disallow -> as well? My inclination is to
> just say no, we're not implementing that. Even if we remove the contrib
> operators named that way, it's insane to suppose that nobody has chosen
> these names for user-defined operators in their applications.

I think it's smarter for us to ship functions, and let users wrap them
in operators if they so choose. It's not difficult for people who
want it to do it, and that way we dodge this whole mess.
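
For illustration, the wrapping is straightforward for anyone who wants
it (a sketch only -- it assumes the json_get(json, text) variant
discussed in this thread):

create operator -> (
    leftarg = json,
    rightarg = text,
    procedure = json_get
);

select '{"f1": {"f2": 7}}'::json -> 'f1';   -- equivalent to json_get(..., 'f1')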

The thing that was really awful about hstore's => operator is that it
was =>(text, text), meaning that if somebody else invented, say,
xstore, they could not do the same thing that hstore did without
colliding with hstore. I think we ought to have an ironclad rule that
any operators introduced in our core distribution for particular types
must have at least one argument of that type. Otherwise, we'll end up
with a free-for-all where everyone tries to grab the "good" operator
names (of which there are only a handful) for their type, and types
we're adding five years from now will be stuck with no operator names
at all, or dumb stuff like ~~~~>.

But even leaving that aside, I'm surprised to hear so many people
dismissing SQL standards compliance so blithely. We've certainly
spent a lot of blood, sweat, and tears on minor standards-compliance
issues over the years - why is it OK to not care about this
particular issue when we've spent so much effort caring about other
ones?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-01 22:39:07
Message-ID: CAAZKuFY9J_0dgiE0F9SGrLcyCHgLsKk=oReEX3xPwcKTdcGAug@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jan 31, 2013 at 7:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
>>> On Thu, Jan 31, 2013 at 4:20 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>>> On 01/31/2013 05:06 PM, Peter Eisentraut wrote:
>>>>> I would like to not create any -> operators, so that that syntax could
>>>>> be used in the future for method invocation or something similar (it's
>>>>> in the SQL standard).
>>
>>>> This is the first time I have heard that we should stay away from this. We
>>>> have operators with this name in hstore, which is why I chose it.
>>
>> I'm not happy about this either. It's bad enough that we're thinking
>> about taking away =>, but to disallow -> as well? My inclination is to
>> just say no, we're not implementing that. Even if we remove the contrib
>> operators named that way, it's insane to suppose that nobody has chosen
>> these names for user-defined operators in their applications.
>
> I think it's smarter for us to ship functions, and let users wrap them
> in operators if they so choose. It's not difficult for people who
> want it to do it, and that way we dodge this whole mess.

Normally I'd agree with you, but I think there's a complexity here
that is very frown-inducing, although I'm not positively inclined to
state that it's so great that your suggested solution is not the least
of all evils:

http://www.postgresql.org/message-id/8551.1331580169@sss.pgh.pa.us

The problem being: even though pg_operator resolves to functions in
pg_proc, they have distinct identities as far as the planner is
concerned w.r.t selectivity estimation and index selection. Already
there is a slight hazard that some ORMs that want to grow hstore
support will spell it "fetchval" and others "->". So far, infix
syntax seems to be the common default, but a minor disagreement among
ORMs or different user preferences will be destructive.

Another way to look at this is that by depriving people of multiple
choices in the default install, that hazard goes away. But it also
means that, practically, CREATE OPERATOR is going to be hazardous to
use, because almost all software is probably not going to assume the
existence of any non-default installed operators for JSON, and those
that do will not receive the benefit of indexes from software using
the other notation. So, if one takes the 'when in doubt, leave it
out' approach you seem to be proposing, I think very little
profitable use of CREATE OPERATOR will take place -- ORMs et al will
choose the lowest common denominator (for good, sensible reason) and
flirting with CREATE OPERATOR will probably cause surprising plans,
so I doubt it'll take hold.

--
fdr


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-01 23:03:46
Message-ID: 25624.1359759826@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Daniel Farina <daniel(at)heroku(dot)com> writes:
> On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> I think it's smarter for us to ship functions, and let users wrap them
>> in operators if they so choose. It's not difficult for people who

> The problem being: even though pg_operator resolves to functions in
> pg_proc, they have distinct identities as far as the planner is
> concerned w.r.t selectivity estimation and index selection.

Yeah, this is surely not a workable policy unless we first move all
those planner smarts to apply to functions not operators. And rewrite
all the index AM APIs to use functions not operators, too. Now this is
something that's been a wish-list item right along, but actually doing
it has always looked like a great deal of work for rather small reward.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 01:20:01
Message-ID: CA+TgmoZxK+uFU_7d5CQTipVqjBRrb67Mur+5-jVdo6qUA-XJSA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 6:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Daniel Farina <daniel(at)heroku(dot)com> writes:
>> On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> I think it's smarter for us to ship functions, and let users wrap them
>>> in operators if they so choose. It's not difficult for people who
>
>> The problem being: even though pg_operator resolves to functions in
>> pg_proc, they have distinct identities as far as the planner is
>> concerned w.r.t selectivity estimation and index selection.
>
> Yeah, this is surely not a workable policy unless we first move all
> those planner smarts to apply to functions not operators. And rewrite
> all the index AM APIs to use functions not operators, too. Now this is
> something that's been a wish-list item right along, but actually doing
> it has always looked like a great deal of work for rather small reward.

Hmm. Well, if the operators are going to be indexable, then I agree
that's an issue, but isn't -> just a key-extraction operator? That
wouldn't be something you could index anyway.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 02:05:22
Message-ID: 510F1762.4030009@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 02/03/2013 08:20 PM, Robert Haas wrote:
> On Fri, Feb 1, 2013 at 6:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Daniel Farina <daniel(at)heroku(dot)com> writes:
>>> On Fri, Feb 1, 2013 at 2:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>>> I think it's smarter for us to ship functions, and let users wrap them
>>>> in operators if they so choose. It's not difficult for people who
>>> The problem being: even though pg_operator resolves to functions in
>>> pg_proc, they have distinct identities as far as the planner is
>>> concerned w.r.t selectivity estimation and index selection.
>> Yeah, this is surely not a workable policy unless we first move all
>> those planner smarts to apply to functions not operators. And rewrite
>> all the index AM APIs to use functions not operators, too. Now this is
>> something that's been a wish-list item right along, but actually doing
>> it has always looked like a great deal of work for rather small reward.
> Hmm. Well, if the operators are going to be indexable, then I agree
> that's an issue, but isn't -> just a key-extraction operator? That
> wouldn't be something you could index anyway.
>

Er, what? It gives you the value corresponding to a key (or the numbered
array element).

With the Json operators I provided you're more likely to use ->> in an
index, because it returns de-escaped text rather than json, but I don't
see any reason in principle why -> couldn't be used.
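
That is, roughly (a small sketch of the difference, given the
semantics described above):

select '{"a": "x y"}'::json -> 'a';    -- json value, still quoted: "x y"
select '{"a": "x y"}'::json ->> 'a';   -- de-escaped text: x y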

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 14:18:04
Message-ID: CAHyXU0zgokK_U=cjKqruTtdjuk72DxkszBkSHpWxEDf=Ck=u6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Feb 1, 2013 at 4:08 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> But even leaving that aside, I'm surprised to hear so many people
> dismissing SQL standards compliance so blithely. We've certainly
> spent a lot of blood, sweat, and tears on minor standards-compliance
> issues over they years - why is it OK to not care about this
> particular issue when we've spent so much effort caring about other
> ones?

Does the SQL Standard suggest you can't extend the language with
operators? Or does it reserve certain characters for future use? And
if so, is there a list?

merlin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 15:47:56
Message-ID: CA+TgmoY78=Z-XspsZ7WCVbGPOHs=HsgbyeUcfFoCjEezN_YpkA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 3, 2013 at 9:05 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> Yeah, this is surely not a workable policy unless we first move all
>>> those planner smarts to apply to functions not operators. And rewrite
>>> all the index AM APIs to use functions not operators, too. Now this is
>>> something that's been a wish-list item right along, but actually doing
>>> it has always looked like a great deal of work for rather small reward.
>>
>> Hmm. Well, if the operators are going to be indexable, then I agree
>> that's an issue, but isn't -> just a key-extraction operator? That
>> wouldn't be something you could index anyway.
>
> Er, what? It gives you the value corresponding to a key (or the numbered
> array element).

That's what I figured.

> With the Json operators I provided you're more likely to use ->> in an
> index, because it returns de-escaped text rather than json, but I don't see
> any reason in principle why -> couldn't be used.

The point is that if you're talking about indexing something like
myeav->'andrew' you could equally well index json_get(myeav,
'andrew'). So there's no real need for it to be an operator rather
than a function.
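
That is, either of these expression indexes ought to work equally well
(a sketch -- table and column names are made up, and it assumes the
extraction function is immutable, as such functions normally are):

create index on docs (json_get(myeav, 'andrew'));   -- functional notation
create index on docs ((myeav -> 'andrew'));         -- operator notation, note the extra parens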

The case in which it would matter is if it were something that could
be used as an index predicate, like:

Index Scan
-> Index Cond: myeav->'andrew'

As of now, the query planner won't consider such a plan if it's only a
function and not an operator. So if we had a case like that, the use
of operator notation could be justified on performance grounds. If we
don't, I argue that it's better to stick with functional notation,
because the number of sensible function names is much larger than the
number of sensible operator names, and if we start using operator
notation every time someone thinks it will look nicer that way, we
will very quickly either run out of nice-looking operator names or
start overloading them heavily.

The SQL standards considerations seem worth thinking about, too.
We've certainly gone through a lot of pain working toward eliminating
=> as an operator name, and if the SQL standard has commandeered ->
for some purpose or other, I'd really rather not add to the headaches
involved should we ever decide to reclaim it.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 16:10:47
Message-ID: 510FDD87.9050305@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 02/04/2013 10:47 AM, Robert Haas wrote:
>
> The SQL standards considerations seem worth thinking about, too.
> We've certainly gone through a lot of pain working toward eliminating
> => as an operator name, and if the SQL standard has commandeered ->
> for some purpose or other, I'd really rather not add to the headaches
> involved should we ever decide to reclaim it.

OK, but I'd like to know what is going to be safe. There's no way to
future-proof the language. I'm quite prepared to replace -> with
something else, and if I do then ->> will need to be adjusted
accordingly, I think.

My suggestion would be ~> and ~>>. I know David Wheeler didn't like that
on the ground that some fonts elevate ~ rather than aligning it in the
middle as most monospaced fonts do, but I'm tempted just to say "then
use a different font." Other possibilities that come to mind are +> and
+>>, although I think they're less attractive. But I'll be guided by the
consensus, assuming there is one ;-)

cheers

andrew


From: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 16:18:02
Message-ID: CADbMkNOENqbBxvJNWxZPLr=b=Sk_GrfsKAVq-kFxoApX-Ewsgw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 4, 2013 at 4:10 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

>
> On 02/04/2013 10:47 AM, Robert Haas wrote:
>
>>
>> The SQL standards considerations seem worth thinking about, too.
>> We've certainly gone through a lot of pain working toward eliminating
>> => as an operator name, and if the SQL standard has commandeered ->
>> for some purpose or other, I'd really rather not add to the headaches
>> involved should we ever decide to reclaim it.
>>
>
>
> OK, but I'd like to know what is going to be safe. There's no way to
> future-proof the language. I'm quite prepared to replace -> with something
> else, and if I do then ->> will need to be adjusted accordingly, I think.
>
> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that
> on the ground that some fonts elevate ~ rather than aligning it in the
> middle as most monospaced fonts do, but I'm tempted just to say "then use a
> different font." Other possibilities that come to mind are +> and +>>,
> although I think they're less attractive. But I'll be guided by the
> consensus, assuming there is one ;-)
>
As a user I would be much in favor of just functions and no additional
operators if the sole difference is syntactical. I think custom operators
are much harder to remember than function names (assuming reasonably well
chosen function names).

Now Robert seems to suggest that there will also be a speed / planner
difference, which seems sad (I would have expected operators to be just
syntactical sugar for specially named functions and once we are past the
parser there should be no difference).


From: "David E(dot) Wheeler" <david(at)justatheory(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 17:50:43
Message-ID: C853A687-E5FD-4508-9C8A-1CE5FBCF2F5D@justatheory.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Feb 4, 2013, at 8:10 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on the ground that some fonts elevate ~ rather than aligning it in the middle as most monospaced fonts do, but I'm tempted just to say "then use a different font." Other possibilities that come to mind are +> and +>>, although I think they're less attractive. But I'll be guided by the consensus, assuming there is one ;-)

On the contrary, I quite like ~>. I've used it in pair.

http://pgxn.org/extension/pair

But others have complained about the font issue when I've suggested it for things in the past.

My fonts don't suck. :-)

I can live with +> and +>>.

David


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 17:57:11
Message-ID: CAHyXU0zzteefx3RP05qKC=6PKjBLHHhrnDfHYD_xCaYyOutvHA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 4, 2013 at 10:18 AM, Benedikt Grundmann
<bgrundmann(at)janestreet(dot)com> wrote:
>
>
>
> On Mon, Feb 4, 2013 at 4:10 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>
>>
>> On 02/04/2013 10:47 AM, Robert Haas wrote:
>>>
>>>
>>> The SQL standards considerations seem worth thinking about, too.
>>> We've certainly gone through a lot of pain working toward eliminating
>>> => as an operator name, and if the SQL standard has commandeered ->
>>> for some purpose or other, I'd really rather not add to the headaches
>>> involved should we ever decide to reclaim it.
>>
>>
>>
>> OK, but I'd like to know what is going to be safe. There's no way to
>> future-proof the language. I'm quite prepared to replace -> with something
>> else, and if I do then ->> will need to be adjusted accordingly, I think.
>>
>> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that
>> on the ground that some fonts elevate ~ rather than aligning it in the
>> middle as most monospaced fonts do, but I'm tempted just to say "then use a
>> different font." Other possibilities that come to mind are +> and +>>,
>> although I think they're less attractive. But I'll be guided by the
>> consensus, assuming there is one ;-)
>>
> As a user I would be much in favor of just functions and no additional
> operators if the sole difference is syntactical. I think custom operators
> are much harder to remember than function names (assuming reasonably well
> chosen function names).

couple quick observations:

*) just about all postgres extension types expose operators -- problem
is not specific to json (and therefore IMO irrelevant to 9.3 release
of enhancements)

*) hstore exposes ->. I use it all over the place. I find the operator
to be terse and readable -- much more so than the function call form.
Ok, an operator such as "@-@" is pretty silly, but "->" for get is
natural. The cat is out of the bag, so removing -> for 9.3 seems
pretty fruitless.

*) Removing -> (breaking all my and countless others' hstore dependent
code) should not happen until there is a very good reason. This was
extensively discussed in development of hstore. Breaking
compatibility sucks -- my company is just wrapping up a 12 month code
overhaul so we could get off 8.1.

*) it's bad enough that we drift from sql naming conventions and prefix
all type manipulation functions (except in hstore) with the type name,
json_get etc. At least using operators allows avoiding some of that
unnecessary verbosity. what's next: text_concatenate('foo', 'bar')?

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 18:07:31
Message-ID: 510FF8E3.1030203@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 02/04/2013 12:57 PM, Merlin Moncure wrote:

> *) it's bad enough that we drift from sql naming conventions and all
> type manipulation functions (except in hstore) with type name.
> json_get etc. at least using operators allow avoiding some of that
> unnecessary verbosity. what's next: text_concatenate('foo', 'bar')?
>

These names aren't set in stone either. I've been expecting some
bikeshedding there, and I'm surprised there hasn't been more.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 19:38:45
Message-ID: CA+TgmoayX576+cSTpSuivsKby+E6J-5AOR47vbeahCBudpsZ7A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> On 02/04/2013 10:47 AM, Robert Haas wrote:
>>
>>
>> The SQL standards considerations seem worth thinking about, too.
>> We've certainly gone through a lot of pain working toward eliminating
>> => as an operator name, and if the SQL standard has commandeered ->
>> for some purpose or other, I'd really rather not add to the headaches
>> involved should we ever decide to reclaim it.
>
> OK, but I'd like to know what is going to be safe. There's no way to
> future-proof the language. I'm quite prepared to replace -> with something
> else, and if I do then ->> will need to be adjusted accordingly, I think.
>
> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on
> the ground that some fonts elevate ~ rather than aligning it in the middle
> as most monospaced fonts do, but I'm tempted just to say "then use a
> different font." Other possibilities that come to mind are +> and +>>,
> although I think they're less attractive. But I'll be guided by the
> consensus, assuming there is one ;-)

I suspect both of those are pretty safe from an SQL standards point of
view. Of course, as Tom is often wont to point out, the SQL standards
committee sometimes does bizarre things, so nothing's perfect, but I'd
be rather shocked if any of those got tapped to mean something else.

That having been said, I still don't see value in adding operators at
all. Good old function call notation seems perfectly adequate from
where I sit. Sure, it's a little more verbose, but when you try too
hard to make things concise then you end up having to explain to your
users why \ditS is a sensible thing for them to type into psql, or why
s@\W@sprintf"%%%02x",ord($&)@e in Perl. I recognize that I may lose
this argument, but I've worked with a couple of languages where
operators can be overloaded (C++) or defined (ML) and it's just never
seemed to work out very well. YMMV, of course.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 19:59:07
Message-ID: CAHyXU0zDNzJwkTa=K=oc==aydMwjO4+6rGTgAKTdrW-dicueeg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 4, 2013 at 12:07 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 02/04/2013 12:57 PM, Merlin Moncure wrote:
>
>> *) it's bad enough that we drift from sql naming conventions and all
>> type manipulation functions (except in hstore) with type name.
>> json_get etc. at least using operators allow avoiding some of that
>> unnecessary verbosity. what's next: text_concatenate('foo', 'bar')?
>>
>
> This names aren't set in stone either. I've been expecting some bikeshedding
> there, and I'm surprised there hasn't been more.

Well -- heh (asked to bikeshed: joy!) -- I felt like my objections
were noted and am more interested in getting said functionality out
the door than splitting hairs over names. Type prefix issue is under
the same umbrella as use of the -> operator, that is, *not
specifically related to this patch, and not worth holding up this
patch over*. In both cases it's essentially crying over spilt milk.

My only remaining nit with the proposal is with json_unnest().

SQL unnest() produces a list of scalars regardless of dimensionality --
json unnest unwraps one level only (contrast: pl/pgsql array 'slice').
So I think 'json_array_each', or perhaps json_slice() is a better fit
there.
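
For reference, the stock unnest() behaviour being contrasted:

select unnest(array[[1,2],[3,4]]);   -- four rows: 1, 2, 3, 4 (all dimensions flattened)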

merlin


From: Will Leinweber <will(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 20:06:25
Message-ID: CAL8LqZR17XjT0FLvz1FHUVmLBcz-kMUL9d3c4hmLcA73vM4pyQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> I suspect both of those are pretty safe from an SQL standards point of
> view. Of course, as Tom is often wont to point out, the SQL standards
> committee sometimes does bizarre things, so nothing's perfect, but I'd
> be rather shocked if any of those got tapped to mean something else.
>
> That having been said, I still don't see value in adding operators at
> all. Good old function call notation seems perfectly adequate from
> where I sit. Sure, it's a little more verbose, but when you try to
> too hard make things concise then you end up having to explain to your
> users why \ditS is a sensible thing for them to type into psql, or why
> s(at)\W@sprintf"%%%02x",ord($&)@e in Perl. I recognize that I may lose
> this argument, but I've worked with a couple of languages where
> operators can be overloaded (C++) or defined (ML) and it's just never
> seemed to work out very well. YMMV, of course.
>

For what my opinion is worth, I absolutely agree with just having function
names. The -> in hstore is kind of nice, but it led me to a whole lot of
grief when I couldn't figure out how to create an index using it (turns out
you have to use _double_ parens, who knew?), while I could create an index on
fetchval and assumed that postgres would figure it out.

Also, for quite a while it felt like pure incantation whether I'd need
parens around those operators or not. Now that I sorta-kinda-not-really
understand the operator precedence rules in postgres/the sql standard, I've
mostly given up on using cute operators because they're much more of a pain
on a day-to-day basis.
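
(Spelling out the gotcha with made-up names: the operator form of an
expression index needs the extra parens, the function form doesn't.)

create index on profiles ((attrs -> 'color'));
create index on profiles (fetchval(attrs, 'color'));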


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 20:10:58
Message-ID: 511015D2.2000005@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 02/04/2013 02:59 PM, Merlin Moncure wrote:
> On Mon, Feb 4, 2013 at 12:07 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> On 02/04/2013 12:57 PM, Merlin Moncure wrote:
>>
>>> *) it's bad enough that we drift from sql naming conventions and all
>>> type manipulation functions (except in hstore) with type name.
>>> json_get etc. at least using operators allow avoiding some of that
>>> unnecessary verbosity. what's next: text_concatenate('foo', 'bar')?
>>>
>> This names aren't set in stone either. I've been expecting some bikeshedding
>> there, and I'm surprised there hasn't been more.
> Well -- heh (asked to bikeshed: joy!) -- I felt like my objections
> were noted and am more interested in getting said functionality out
> the door than splitting hairs over names. Type prefix issue is under
> the same umbrella as use of the -> operator, that is, *not
> specifically related to this patch, and not worth holding up this
> patch over*. In both cases it's essentially crying over spilt milk.
>
> My only remaining nit with the proposal is with json_unnest().
>
> SQL unnest() produces list of scalars regardless of dimensionality --
> json unnest unwraps one level only (contrast: pl/pgsql array 'slice').
> So I think 'json_array_each', or perhaps json_slice() is a better fit
> there.
>

how about json_array_elements()?

cheers

andrew


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 20:16:20
Message-ID: CAAZKuFan9iGR9K16cijHxsPsTvmJLCx0bbry9URFN2ErY+TWqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> On 02/04/2013 10:47 AM, Robert Haas wrote:
>>>
>>>
>>> The SQL standards considerations seem worth thinking about, too.
>>> We've certainly gone through a lot of pain working toward eliminating
>>> => as an operator name, and if the SQL standard has commandeered ->
>>> for some purpose or other, I'd really rather not add to the headaches
>>> involved should we ever decide to reclaim it.
>>
>> OK, but I'd like to know what is going to be safe. There's no way to
>> future-proof the language. I'm quite prepared to replace -> with something
>> else, and if I do then ->> will need to be adjusted accordingly, I think.
>>
>> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on
>> the ground that some fonts elevate ~ rather than aligning it in the middle
>> as most monospaced fonts do, but I'm tempted just to say "then use a
>> different font." Other possibilities that come to mind are +> and +>>,
>> although I think they're less attractive. But I'll be guided by the
>> consensus, assuming there is one ;-)
>
> I suspect both of those are pretty safe from an SQL standards point of
> view. Of course, as Tom is often wont to point out, the SQL standards
> committee sometimes does bizarre things, so nothing's perfect, but I'd
> be rather shocked if any of those got tapped to mean something else.
>
> That having been said, I still don't see value in adding operators at
> all. Good old function call notation seems perfectly adequate from
> where I sit. Sure, it's a little more verbose, but when you try too
> hard to make things concise then you end up having to explain to your
> users why \ditS is a sensible thing for them to type into psql, or why
> s(at)\W@sprintf"%%%02x",ord($&)@e in Perl. I recognize that I may lose
> this argument, but I've worked with a couple of languages where
> operators can be overloaded (C++) or defined (ML) and it's just never
> seemed to work out very well. YMMV, of course.

I also basically feel this way, although I know I tend more towards
notational brutalism than many. I think we shouldn't kid ourselves
that non-default operators will be used, and for
current-implementation reasons (that maybe could be fixed by someone
determined) it's not really at the pleasure of the author to use them
via CREATE OPERATOR either.

So, I basically subscribe to the view that we should investigate what
total reliance on prefix syntax looks like. I guess it'll make nested
navigation horribly ugly, though...positively lisp-esque. That's one
consideration hstore doesn't have, and it may make infix notation
considerably more useful for json than for hstore.
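
To make that concrete, here is roughly how a nested lookup reads in each
style. json_get is the function from the patch, myjson stands for some
json column, and the exact signatures are my guess:

select json_get(json_get(json_get(myjson, 'authors'), 0), 'name');

versus

select myjson->'authors'->0->'name';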

--
fdr


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 20:37:08
Message-ID: 51101BF4.1040802@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 02/04/2013 03:16 PM, Daniel Farina wrote:
> On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> On 02/04/2013 10:47 AM, Robert Haas wrote:
>>>>
>>>> The SQL standards considerations seem worth thinking about, too.
>>>> We've certainly gone through a lot of pain working toward eliminating
>>>> => as an operator name, and if the SQL standard has commandeered ->
>>>> for some purpose or other, I'd really rather not add to the headaches
>>>> involved should we ever decide to reclaim it.
>>> OK, but I'd like to know what is going to be safe. There's no way to
>>> future-proof the language. I'm quite prepared to replace -> with something
>>> else, and if I do then ->> will need to be adjusted accordingly, I think.
>>>
>>> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that on
>>> the ground that some fonts elevate ~ rather than aligning it in the middle
>>> as most monospaced fonts do, but I'm tempted just to say "then use a
>>> different font." Other possibilities that come to mind are +> and +>>,
>>> although I think they're less attractive. But I'll be guided by the
>>> consensus, assuming there is one ;-)
>> I suspect both of those are pretty safe from an SQL standards point of
>> view. Of course, as Tom is often wont to point out, the SQL standards
>> committee sometimes does bizarre things, so nothing's perfect, but I'd
>> be rather shocked if any of those got tapped to mean something else.
>>
>> That having been said, I still don't see value in adding operators at
>> all. Good old function call notation seems perfectly adequate from
>> where I sit. Sure, it's a little more verbose, but when you try too
>> hard to make things concise then you end up having to explain to your
>> users why \ditS is a sensible thing for them to type into psql, or why
>> s(at)\W@sprintf"%%%02x",ord($&)@e in Perl. I recognize that I may lose
>> this argument, but I've worked with a couple of languages where
>> operators can be overloaded (C++) or defined (ML) and it's just never
>> seemed to work out very well. YMMV, of course.
> I also basically feel this way, although I know I tend more towards
> notational brutalism than many. I think we shouldn't kid ourselves
> that non-default operators will be used, and for
> current-implementation reasons (that maybe could be fixed by someone
> determined) it's not really at the pleasure of the author to use them
> via CREATE OPERATOR either.
>
> So, I basically subscribe to the view that we should investigate what
> total reliance on prefix syntax looks like. I guess it'll make nested
> navigation horribly ugly, though...positively lisp-esque. That's one
> consideration hstore doesn't have, and it may make infix notation
> considerably more useful for json than for hstore.
>

We don't have the luxury of designing things like this in or out from
scratch. Creation of operators has been a part of PostgreSQL for a good
while longer than my involvement, and a great many people expect to be
able to use it. I can just imagine the outrage at any suggestion of
removing it.
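
Just to make that concrete, this sort of thing has been possible for a
very long time (the function and operator names below are invented purely
for illustration):

create function int4_absdiff(int, int) returns int
    language sql immutable as 'select abs($1 - $2)';

create operator <-> (leftarg = int, rightarg = int, procedure = int4_absdiff);

select 3 <-> 10;    -- 7

hstore, among other extensions, is built on exactly this facility.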

So, please, let's get real. A "total reliance on prefix syntax" isn't
going to happen, and investigating what it would look like seems to me
just so much wasted time and effort.

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Benedikt Grundmann <bgrundmann(at)janestreet(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Daniel Farina <daniel(at)heroku(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 21:16:01
Message-ID: CAHyXU0yYfaq_tThXSyHpmimT6KP_FcLxTbCDszHGMji5XgtPwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 4, 2013 at 2:10 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>> My only remaining nit with the proposal is with json_unnest().
>>
>> SQL unnest() produces a list of scalars regardless of dimensionality --
>> json unnest unwraps one level only (contrast: pl/pgsql array 'slice').
>> So I think 'json_array_each', or perhaps 'json_slice()', is a better fit
>> there.
>>
>
>
> how about json_array_elements()?

That works (although it's a little verbose for my taste). Maybe
json_unwrap, json_array_unwrap, etc.

merlin


From: Daniel Farina <daniel(at)heroku(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 21:19:32
Message-ID: CAAZKuFaU-eNXrEqNXCdQF9sv+3FO4x32mM2S8gaz7gmSthszoA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Feb 4, 2013 at 12:37 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
> On 02/04/2013 03:16 PM, Daniel Farina wrote:
>>
>> On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas <robertmhaas(at)gmail(dot)com>
>> wrote:
>>>
>>> On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>> wrote:
>>>>
>>>> On 02/04/2013 10:47 AM, Robert Haas wrote:
>>>>>
>>>>>
>>>>> The SQL standards considerations seem worth thinking about, too.
>>>>> We've certainly gone through a lot of pain working toward eliminating
>>>>> => as an operator name, and if the SQL standard has commandeered ->
>>>>> for some purpose or other, I'd really rather not add to the headaches
>>>>> involved should we ever decide to reclaim it.
>>>>
>>>> OK, but I'd like to know what is going to be safe. There's no way to
>>>> future-proof the language. I'm quite prepared to replace -> with
>>>> something
>>>> else, and if I do then ->> will need to be adjusted accordingly, I
>>>> think.
>>>>
>>>> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that
>>>> on
>>>> the ground that some fonts elevate ~ rather than aligning it in the
>>>> middle
>>>> as most monospaced fonts do, but I'm tempted just to say "then use a
>>>> different font." Other possibilities that come to mind are +> and +>>,
>>>> although I think they're less attractive. But I'll be guided by the
>>>> consensus, assuming there is one ;-)
>>>
>>> I suspect both of those are pretty safe from an SQL standards point of
>>> view. Of course, as Tom is often wont to point out, the SQL standards
>>> committee sometimes does bizarre things, so nothing's perfect, but I'd
>>> be rather shocked if any of those got tapped to mean something else.
>>>
>>> That having been said, I still don't see value in adding operators at
>>> all. Good old function call notation seems perfectly adequate from
>>> where I sit. Sure, it's a little more verbose, but when you try too
>>> hard to make things concise then you end up having to explain to your
>>> users why \ditS is a sensible thing for them to type into psql, or why
>>> s(at)\W@sprintf"%%%02x",ord($&)@e in Perl. I recognize that I may lose
>>> this argument, but I've worked with a couple of languages where
>>> operators can be overloaded (C++) or defined (ML) and it's just never
>>> seemed to work out very well. YMMV, of course.
>>
>> I also basically feel this way, although I know I tend more towards
>> notational brutalism than many. I think we shouldn't kid ourselves
>> that non-default operators will be used, and for
>> current-implementation reasons (that maybe could be fixed by someone
>> determined) it's not really at the pleasure of the author to use them
>> via CREATE OPERATOR either.
>>
>> So, I basically subscribe to the view that we should investigate what
>> total reliance on prefix syntax looks like. I guess it'll make nested
>> navigation horribly ugly, though...positively lisp-esque. That's one
>> consideration hstore doesn't have, and it may make infix notation
>> considerably more useful for json than for hstore.
>>
>
>
> We don't have the luxury of designing things like this in or out from
> scratch. Creation of operators has been a part of PostgreSQL for a good
> while longer than my involvement, and a great many people expect to be able
> to use it. I can just imagine the outrage at any suggestion of removing it.

I am only referring to the restriction that the planner
can't understand that fetchval() and '->' mean the same thing for,
say, hstore. Hence, use of non-default CREATE OPERATOR may become
more useful some day, instead of basically being a pitfall when
someone reasonably thinks they could use either spelling of the same
functionality and the optimizer will figure it out.
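
A concrete sketch of what I mean, using today's hstore (the table and key
names are made up):

create extension hstore;
create table t (h hstore);
create index on t ((h -> 'k'));

select * from t where h -> 'k' = 'v';           -- can match the expression index
select * from t where fetchval(h, 'k') = 'v';   -- same function underneath, but
                                                -- the planner sees a different
                                                -- expression and won't match it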

I'm not suggesting removal of any feature.

My reference to "total reliance of prefix syntax" refers only to the
JSON operators, since the previous correspondence from Robert was
about how function call syntax alone may be sufficient. This phrase
refers to the same idea he is proposing.

I also included a weakness of that idea, which is that nesting in JSON
makes the situation worse than in the commonly compared case, hstore.

--
fdr


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 21:29:21
Message-ID: 51102831.40301@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 02/04/2013 04:19 PM, Daniel Farina wrote:
> On Mon, Feb 4, 2013 at 12:37 PM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> On 02/04/2013 03:16 PM, Daniel Farina wrote:
>>> On Mon, Feb 4, 2013 at 11:38 AM, Robert Haas <robertmhaas(at)gmail(dot)com>
>>> wrote:
>>>> On Mon, Feb 4, 2013 at 11:10 AM, Andrew Dunstan <andrew(at)dunslane(dot)net>
>>>> wrote:
>>>>> On 02/04/2013 10:47 AM, Robert Haas wrote:
>>>>>>
>>>>>> The SQL standards considerations seem worth thinking about, too.
>>>>>> We've certainly gone through a lot of pain working toward eliminating
>>>>>> => as an operator name, and if the SQL standard has commandeered ->
>>>>>> for some purpose or other, I'd really rather not add to the headaches
>>>>>> involved should we ever decide to reclaim it.
>>>>> OK, but I'd like to know what is going to be safe. There's no way to
>>>>> future-proof the language. I'm quite prepared to replace -> with
>>>>> something
>>>>> else, and if I do then ->> will need to be adjusted accordingly, I
>>>>> think.
>>>>>
>>>>> My suggestion would be ~> and ~>>. I know David Wheeler didn't like that
>>>>> on
>>>>> the ground that some fonts elevate ~ rather than aligning it in the
>>>>> middle
>>>>> as most monospaced fonts do, but I'm tempted just to say "then use a
>>>>> different font." Other possibilities that come to mind are +> and +>>,
>>>>> although I think they're less attractive. But I'll be guided by the
>>>>> consensus, assuming there is one ;-)
>>>> I suspect both of those are pretty safe from an SQL standards point of
>>>> view. Of course, as Tom is often wont to point out, the SQL standards
>>>> committee sometimes does bizarre things, so nothing's perfect, but I'd
>>>> be rather shocked if any of those got tapped to mean something else.
>>>>
>>>> That having been said, I still don't see value in adding operators at
>>>> all. Good old function call notation seems perfectly adequate from
>>>> where I sit. Sure, it's a little more verbose, but when you try too
>>>> hard to make things concise then you end up having to explain to your
>>>> users why \ditS is a sensible thing for them to type into psql, or why
>>>> s(at)\W@sprintf"%%%02x",ord($&)@e in Perl. I recognize that I may lose
>>>> this argument, but I've worked with a couple of languages where
>>>> operators can be overloaded (C++) or defined (ML) and it's just never
>>>> seemed to work out very well. YMMV, of course.
>>> I also basically feel this way, although I know I tend more towards
>>> notational brutalism than many. I think we shouldn't kid ourselves
>>> that non-default operators will be used, and for
>>> current-implementation reasons (that maybe could be fixed by someone
>>> determined) it's not really at the pleasure of the author to use them
>>> via CREATE OPERATOR either.
>>>
>>> So, I basically subscribe to the view that we should investigate what
>>> total reliance on prefix syntax looks like. I guess it'll make nested
>>> navigation horribly ugly, though...positively lisp-esque. That's one
>>> consideration hstore doesn't have, and it may make infix notation
>>> considerably more useful for json than for hstore.
>>>
>>
>> We don't have the luxury of designing things like this in or out from
>> scratch. Creation of operators has been a part of PostgreSQL for a good
>> while longer than my involvement, and a great many people expect to be able
>> to use it. I can just imagine the outrage at any suggestion of removing it.
> I am only referring to the restriction that the planner
> can't understand that fetchval() and '->' mean the same thing for,
> say, hstore. Hence, use of non-default CREATE OPERATOR may become
> more useful some day, instead of basically being a pitfall when
> someone reasonably thinks they could use either spelling of the same
> functionality and the optimizer will figure it out.
>
> I'm not suggesting removal of any feature.
>
> My reference to "total reliance of prefix syntax" refers only to the
> JSON operators, since the previous correspondence from Robert was
> about how function call syntax alone may be sufficient. This phrase
> refers to the same idea he is proposing.
>
> I also included a weakness of that idea, which is that nesting in JSON
> makes the situation worse than in the commonly compared case, hstore.

I see. OK, sorry for misunderstanding.

I suspect, BTW, that people will mostly use get_path*() (or rather, its
equivalent operator ;-) ) rather than operator chaining:

select myjson->>'{"authors",0,"name"}'::text[];

cheers

andrew


From: Hannu Krosing <hannu(at)krosing(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-04 23:42:45
Message-ID: 51104775.9080106@krosing.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 01/31/2013 11:20 PM, Andrew Dunstan wrote:
>
> I'm happy to take opinions about this, and I expected
> some bikeshedding, but your reaction is contrary to
> everything others have told me. Mostly they love the operators.
What I would really like is if we extended PostgreSQL core and made
a few more constructs definable as overloadable operators:

1) array / dictionary element lookup
a[b]
CREATE OPERATOR [] (...)

2) attribute lookup
a.b
CREATE OPERATOR . (...)

then you could make json lookups either step-by-step using

CREATE OPERATOR [] (
PROCEDURE = json_array_lookup, LEFTARG = json, RIGHTARG = int)

and

CREATE OPERATOR [] (
PROCEDURE = json_dict_lookup, LEFTARG = json, RIGHTARG = text)

fourthname = myjson[4]['name']

or perhaps a single

CREATE OPERATOR [] (
PROCEDURE = json_deep_lookup, LEFTARG = json, RIGHTARG = VARIADIC
"any")

fourthname = myjson[4, 'name']

though I suspect that we do not support type VARIADIC "any" in operator
definitions.
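
If so, the nearest thing CREATE OPERATOR allows today -- it takes a single
LEFTARG and a single RIGHTARG -- would be an array-typed right argument
under some creatable operator name ('#>' here is arbitrary, and
json_deep_lookup is the hypothetical function from above, taking a text[]
path instead of VARIADIC "any"):

CREATE OPERATOR #> (
    PROCEDURE = json_deep_lookup, LEFTARG = json, RIGHTARG = text[])

fourthname = myjson #> '{4,name}'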

---------
Hannu

> I guess that '~>' and '~>>' would work as well as '->' and '->>'.
>
>
> cheers
>
> andrew
>
>


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-05 07:09:39
Message-ID: CAFj8pRCxzarTmRf58O=dZOxonw7wMJMBLgK3gMwst9uyj8n1+g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/2/5 Hannu Krosing <hannu(at)krosing(dot)net>:
> On 01/31/2013 11:20 PM, Andrew Dunstan wrote:
>>
>>
>> I'm happy to take opinions about this, and I expected
>> some bikeshedding, but your reaction is contrary to
>> everything others have told me. Mostly they love the operators.
>
> What I would really like is if we extended PostgreSQL core and made
> a few more constructs definable as overloadable operators:
>
> 1) array / dictionary element lookup
> a[b]
> CREATE OPERATOR [] (...)
>
> 2) attribute lookup
> a.b
> CREATE OPERATOR . (...)
>
> then you could make json lookups either step-by-step using
>
> CREATE OPERATOR [] (
> PROCEDURE = json_array_lookup, LEFTARG = json, RIGHTARG = int)
>
> and
>
> CREATE OPERATOR [] (
> PROCEDURE = json_dict_lookup, LEFTARG = json, RIGHTARG = text)
>
> fourthname = myjson[4]['name']
>
>
> or perhaps a single
>
>
> CREATE OPERATOR [] (
> PROCEDURE = json_deep_lookup, LEFTARG = json, RIGHTARG = VARIADIC "any")
>
> fourthname = myjson[4, 'name']
>

It is close to a full collection implementation - and it could be nice to
have. For now, though, we should return to the topic at hand.

My preference is to use well-named functions (which I prefer to
operators), and any operators that are added should not collide with
current ANSI SQL.

I don't see anything nice in the design select
myjson->>'{"authors",0,"name"}'::text[]; - moreover, it is as ugly as
dinosaurs.

better and more usual

myjson['authors']['0']['name']

or

myjson['authors/0/name']

Regards

Pavel

>
> though I suspect that we do not support type VARIADIC "any" in operator
> definitions
>
> ---------
> Hannu
>
>
>
>> I guess that '~>' and '~>>' would work as well as '->' and '->>'.
>>
>>
>> cheers
>>
>> andrew
>>
>>
>
>
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-05 08:26:41
Message-ID: 5110C241.9020704@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 02/05/2013 02:09 AM, Pavel Stehule wrote:

>
> I don't see anything nice in the design select
> myjson->>'{"authors",0,"name"}'::text[]; - moreover, it is as ugly as
> dinosaurs.

I rather like dinosaurs. Beauty is, as they say, in the eye of the beholder.

Let me also point out that you can say (somewhat less efficiently):

myjson->'authors'->0->>'name'

which is not terribly inelegant.

>
> better and more usual
>
> myjson['authors']['0']['name']
>
> or
>
> myjson['authors/0/name']

Well, nothing like that is going to happen in this release. If you or
someone wants to work on a general subscripting facility for arbitrary
data types then I look forward to seeing it.

Let me also point out that the most important part of this patch is the
part that almost nobody has commented on, namely the parser changes and
API that the actual visible functions are built on. Writing JSON
accessor / transformation functions without this framework is hard, and
often redundant. I'm much more concerned to get this framework and some
basic accessor functions (and preferably operators) added than bothered
about how the latter are precisely spelled.

cheers

andrew


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Hannu Krosing <hannu(at)krosing(dot)net>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: json api WIP patch
Date: 2013-02-05 09:04:37
Message-ID: CAFj8pRBn4Bw_Z2BL38AKP0Aua+Tkf3pRE=CQuof5ghLBtwXM1w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/2/5 Andrew Dunstan <andrew(at)dunslane(dot)net>:
>
> On 02/05/2013 02:09 AM, Pavel Stehule wrote:
>
>
>>
>> I don't see anything nice in the design select
>> myjson->>'{"authors",0,"name"}'::text[]; - moreover, it is as ugly as
>> dinosaurs.
>
>
> I rather like dinosaurs. Beauty is, as they say, in the eye of the beholder.
>
> Let me also point out that you can say (somewhat less efficiently):
>
> myjson->'authors'->0->>'name'
>
> which is not terribly inelegant.
>
>
>>
>> better and more usual
>>
>> myjson['authors']['0']['name']
>>
>> or
>>
>> myjson['authors/0/name']
>
>
>
> Well, nothing like that is going to happen in this release. If you or
> someone wants to work on a general subscripting facility for arbitrary data
> types then I look forward to seeing it.
>
> Let me also point out that the most important part of this patch is the part
> that almost nobody has commented on, namely the parser changes and API that
> the actual visible functions are built on. Writing JSON accessor /
> transformation functions without this framework is hard, and often
> redundant. I'm much more concerned to get this framework and some basic
> accessor functions (and preferably operators) added than bothered about how
> the latter are precisely spelled.

The C API and implementation can be changed or fixed later without much
trouble - that is usual. It is the SQL interface that is the hard discussion.

Regards

Pavel

>
>
> cheers
>
> andrew
>
>
>