Re: [BUGS] BUG #10728: json_to_recordset with nested json objects NULLs columns

Lists: pgsql-bugspgsql-hackers
From: matti(dot)hameister(at)technologygroup(dot)de
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-22 15:34:09
Message-ID: 20140622153409.2618.90346@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

The following bug has been logged on the website:

Bug reference: 10728
Logged by: Matti Hameister
Email address: matti(dot)hameister(at)technologygroup(dot)de
PostgreSQL version: 9.4beta1
Operating system: Linux
Description:

This query:

--
SELECT X.* FROM
json_to_record(
'
{"a":2,"c":3,"b":{"z":4}, "d":6}
',true
) AS X(a int, b json, c int, d int);
--

returns as expected
a: 2
b: {"z":4}
c: 3
d: 6

Now I changed the query a bit (using recordset):

--
SELECT X.* FROM
json_to_recordset(
'[
{"a":2,"c":3,"b":{"z":4}, "d":6}
]
',true
) AS X(a int, b json, c int, d int);
--

the result is surprising:
a: NULL
b: {"z":4}
c: NULL
d: 6


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: matti(dot)hameister(at)technologygroup(dot)de
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-23 05:13:28
Message-ID: CAB7nPqQ2hLVt3o-d8-5=h5MEfPr1sS9jT4CK0GpQXew8PeY=yw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Sun, Jun 22, 2014 at 8:34 AM, <matti(dot)hameister(at)technologygroup(dot)de>
wrote:
> The following bug has been logged on the website:
>
> Bug reference: 10728
> Logged by: Matti Hameister
> Email address: matti(dot)hameister(at)technologygroup(dot)de
> PostgreSQL version: 9.4beta1
> Operating system: Linux
> Description:
>
> This query:
>
> --
> SELECT X.* FROM
> json_to_record(
> '
> {"a":2,"c":3,"b":{"z":4}, "d":6}
> ',true
> ) AS X(a int, b json, c int, d int);
> --
>
> returns as expected
> a: 2
> b: {"z":4}
> c: 3
> d: 6
>
>
> Now I changed the query a bit (using recordset):
>
> --
> SELECT X.* FROM
> json_to_recordset(
> '[
> {"a":2,"c":3,"b":{"z":4}, "d":6}
> ]
> ',true
> ) AS X(a int, b json, c int, d int);
> --
>
> the result is surprising:
> a: NULL
> b: {"z":4}
> c: NULL
> d: 6
Interesting. I would have expected the same result as well. It is worth
noticing that jsonb_to_recordset works as expected:
=# SELECT X.* FROM json_to_recordset('[{"a":2,"c":3,"b":{"z":4}, "d":6}]',
true)
AS X(a int, b json, c int, d int);
a | b | c | d
------+---------+------+---
null | {"z":4} | null | 6
(1 row)
=# SELECT X.* FROM jsonb_to_recordset('[{"a":2,"c":3,"b":{"z":4}, "d":6}]',
true)
AS X(a int, b json, c int, d int);
a | b | c | d
---+----------+---+---
2 | {"z": 4} | 3 | 6
(1 row)

Digging more into it, you can see this error happens iff only one of the
fields is a json itself, and that it deletes all the values prior to it.
For example in this case a json value is set as the 3rd return element,
note that the two ones prior to it get deleted:
=# SELECT X.* FROM json_to_recordset('[{"a":2,"b":3,"c":{"z":4}, "d":6}]',
true) as X(a int, b int, c json, d int);
a | b | c | d
------+------+----------+---
null | null | {"z": 4} | 6
(1 row)
The error is as well independent on the order of the elements in the alias
clause, but in their order in the json field:
=# SELECT X.* FROM json_to_recordset('[{"a":2,"b":3,"c":{"z":4}, "d":6}]',
true)
AS X(a int, c json, b int, d int);
a | c | b | d
------+---------+------+---
null | {"z":4} | null | 6
(1 row)

Finally, the last json value deletes all the prior values, even other json:
=# SELECT X.* FROM
json_to_recordset('[{"a":2,"b":{"v":4},"c":6,"d":{"x":6},"e":7}]', true)
AS X(a int, b json, c int, d json, e int);
a | b | c | d | e
------+------+------+---------+---
null | null | null | {"x":6} | 7
(1 row)

I am guessing that the bug origin is in pg_parse_json in the way nested
json is managed, it is the only code path of populate_recordset_worker
where a switch on JSON[B]OID is used.
Regards,
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: matti(dot)hameister(at)technologygroup(dot)de
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-23 07:36:16
Message-ID: CAB7nPqS0b=1DEKPWTQgzB0L3iV-ZQxTdffnqpc2Qp8_F1U2-6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Mon, Jun 23, 2014 at 2:13 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> I am guessing that the bug origin is in pg_parse_json in the way nested
> json is managed, it is the only code path of populate_recordset_worker
> where a switch on JSON[B]OID is used.
>
Digging into that, I am seeing that the hash table used to find the field
values queried in populate_recordset_object_end in the hash table
(PopulateRecordsetState *)->json_hash has null entries for all the columns
inserted before the last nested json value. For example in my last example
with '[{"a":2,"b":3,"c":{"z":4}, "d":6}]', this results in having null
values for "a" and "b", "c" and "d" remaining correct.
populate_recordset_object_field_end inserts those values correctly within
the hash table though, so something strange is going on when inserting in
json_hash directly a json value.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: matti(dot)hameister(at)technologygroup(dot)de
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-23 12:18:26
Message-ID: CAB7nPqTP1m5H=kNsm6mmv-5f7A99O7AP2X6E9ubb4ShZWq-COQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Mon, Jun 23, 2014 at 4:36 PM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Mon, Jun 23, 2014 at 2:13 PM, Michael Paquier <
> michael(dot)paquier(at)gmail(dot)com> wrote:
>
>> I am guessing that the bug origin is in pg_parse_json in the way nested
>> json is managed, it is the only code path of populate_recordset_worker
>> where a switch on JSON[B]OID is used.
>>
> Digging into that, I am seeing that the hash table used to find the field
> values queried in populate_recordset_object_end in the hash table
> (PopulateRecordsetState *)->json_hash has null entries for all the columns
> inserted before the last nested json value. For example in my last example
> with '[{"a":2,"b":3,"c":{"z":4}, "d":6}]', this results in having null
> values for "a" and "b", "c" and "d" remaining correct.
> populate_recordset_object_field_end inserts those values correctly within
> the hash table though, so something strange is going on when inserting in
> json_hash directly a json value.
>
Digging more into that, I have found the issue and a fix for it. It happens
that populate_recordset_object_start, which is used to initialize the
process for the population of the record, is taken *each* time a json
object is found, re-creating every time the hash table for the parsing
process, hence removing from PopulateRecordsetState all the entries already
parsed and creating the problem reported by Matti. The fix I am proposing
to fix this issue is rather simple: simply bypass the creation of the hash
table if lex_level > 1 as we are in presence of a nested object and rely on
the existing hash table.
Patch is attached, and should be backpatched to REL9_4_STABLE where
json_to_recordset has been introduced.
Regards,
--
Michael

Attachment Content-Type Size
20140623_fix_json_record_hash.patch text/x-diff 3.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-23 15:43:00
Message-ID: 16858.1403538180@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
> Digging more into that, I have found the issue and a fix for it. It happens
> that populate_recordset_object_start, which is used to initialize the
> process for the population of the record, is taken *each* time a json
> object is found, re-creating every time the hash table for the parsing
> process, hence removing from PopulateRecordsetState all the entries already
> parsed and creating the problem reported by Matti. The fix I am proposing
> to fix this issue is rather simple: simply bypass the creation of the hash
> table if lex_level > 1 as we are in presence of a nested object and rely on
> the existing hash table.

Yes, this code is clearly not handling the nested-objects case correctly.
I had written a fix more or less equivalent to yours last night.

However, it seems to me that these functions (json[b]_to_record[set]) are
handling the nested-json-objects case in a fairly brain-dead fashion to
start with. I would like to propose that we should think about getting
rid of the use_json_as_text flag arguments altogether. What purpose do
they serve? If we're going to the trouble of parsing the nested JSON
objects anyway, why don't we just reconstruct from that data?

(IOW, we probably actually should have nested hash tables in this case.
I suspect that the current bug arose from incompletely-written logic
to do it like that.)

Since we've already decided to force an initdb for 9.4beta2, it's not
quite too late to revisit this API, and I think it needs revisiting.

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-23 16:19:05
Message-ID: CAHyXU0wiJ78MD8Sap9RL4-yigC_d5dKdPDvQ+nKUsVPqCUTfZg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Mon, Jun 23, 2014 at 10:43 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>> Digging more into that, I have found the issue and a fix for it. It happens
>> that populate_recordset_object_start, which is used to initialize the
>> process for the population of the record, is taken *each* time a json
>> object is found, re-creating every time the hash table for the parsing
>> process, hence removing from PopulateRecordsetState all the entries already
>> parsed and creating the problem reported by Matti. The fix I am proposing
>> to fix this issue is rather simple: simply bypass the creation of the hash
>> table if lex_level > 1 as we are in presence of a nested object and rely on
>> the existing hash table.
>
> Yes, this code is clearly not handling the nested-objects case correctly.
> I had written a fix more or less equivalent to yours last night.
>
> However, it seems to me that these functions (json[b]_to_record[set]) are
> handling the nested-json-objects case in a fairly brain-dead fashion to
> start with. I would like to propose that we should think about getting
> rid of the use_json_as_text flag arguments altogether. What purpose do
> they serve? If we're going to the trouble of parsing the nested JSON
> objects anyway, why don't we just reconstruct from that data?

I think they should be removed. (I called this out in the feature
level review: http://www.postgresql.org/message-id/CAHyXU0wqadCJk7MMkeARuuY05VrD=AXDn6wDceMtuWo5p4CUiA@mail.gmail.com).
AIUI, the flag was introduced as a workaround to try and deal with
mapping nested structures. Text variant 'json' flags have had them.

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-23 20:01:03
Message-ID: 53A8877F.7060208@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


On 06/23/2014 11:43 AM, Tom Lane wrote:
> Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>> Digging more into that, I have found the issue and a fix for it. It happens
>> that populate_recordset_object_start, which is used to initialize the
>> process for the population of the record, is taken *each* time a json
>> object is found, re-creating every time the hash table for the parsing
>> process, hence removing from PopulateRecordsetState all the entries already
>> parsed and creating the problem reported by Matti. The fix I am proposing
>> to fix this issue is rather simple: simply bypass the creation of the hash
>> table if lex_level > 1 as we are in presence of a nested object and rely on
>> the existing hash table.
> Yes, this code is clearly not handling the nested-objects case correctly.
> I had written a fix more or less equivalent to yours last night.
>
> However, it seems to me that these functions (json[b]_to_record[set]) are
> handling the nested-json-objects case in a fairly brain-dead fashion to
> start with. I would like to propose that we should think about getting
> rid of the use_json_as_text flag arguments altogether. What purpose do
> they serve? If we're going to the trouble of parsing the nested JSON
> objects anyway, why don't we just reconstruct from that data?
>
> (IOW, we probably actually should have nested hash tables in this case.
> I suspect that the current bug arose from incompletely-written logic
> to do it like that.)
>
> Since we've already decided to force an initdb for 9.4beta2, it's not
> quite too late to revisit this API, and I think it needs revisiting.
>
>

Looks like we have some problems in this whole area, not just the new
function, so we need to fix 9.3 also :-(

IIRC, originally, the intention was to disallow nested json objects, but
the use_json_as_text was put in as a possibly less drastic possibility.
If we get rid of it our only recourse is to error out if we encounter
nested json. I was probably remiss in not considering the likelihood of
a json target field.

I currently don't have lots of time to devote to this, sadly, but
Michael's patch looks like a good minimal fix.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [BUGS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-23 23:34:08
Message-ID: 17766.1403566448@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 06/23/2014 11:43 AM, Tom Lane wrote:
>> However, it seems to me that these functions (json[b]_to_record[set]) are
>> handling the nested-json-objects case in a fairly brain-dead fashion to
>> start with. I would like to propose that we should think about getting
>> rid of the use_json_as_text flag arguments altogether. What purpose do
>> they serve? If we're going to the trouble of parsing the nested JSON
>> objects anyway, why don't we just reconstruct from that data?

> Looks like we have some problems in this whole area, not just the new
> function, so we need to fix 9.3 also :-(

> IIRC, originally, the intention was to disallow nested json objects, but
> the use_json_as_text was put in as a possibly less drastic possibility.
> If we get rid of it our only recourse is to error out if we encounter
> nested json. I was probably remiss in not considering the likelihood of
> a json target field.

> I currently don't have lots of time to devote to this, sadly, but
> Michael's patch looks like a good minimal fix.

I can spend some time on it over the next couple of days. I take it you
don't have a problem with the concept of doing recursive processing,
as long as it doesn't add much complication?

I'm not following your comment about 9.3. The json[b]_to_record[set]
functions are new in 9.4, which is what makes me feel it's not too
late to redefine their behavior. But changing behavior of stuff that
was in 9.3 seems a lot more debatable.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-24 00:34:08
Message-ID: 53A8C780.2070701@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


On 06/23/2014 07:34 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 06/23/2014 11:43 AM, Tom Lane wrote:
>>> However, it seems to me that these functions (json[b]_to_record[set]) are
>>> handling the nested-json-objects case in a fairly brain-dead fashion to
>>> start with. I would like to propose that we should think about getting
>>> rid of the use_json_as_text flag arguments altogether. What purpose do
>>> they serve? If we're going to the trouble of parsing the nested JSON
>>> objects anyway, why don't we just reconstruct from that data?
>> Looks like we have some problems in this whole area, not just the new
>> function, so we need to fix 9.3 also :-(
>> IIRC, originally, the intention was to disallow nested json objects, but
>> the use_json_as_text was put in as a possibly less drastic possibility.
>> If we get rid of it our only recourse is to error out if we encounter
>> nested json. I was probably remiss in not considering the likelihood of
>> a json target field.
>> I currently don't have lots of time to devote to this, sadly, but
>> Michael's patch looks like a good minimal fix.
> I can spend some time on it over the next couple of days. I take it you
> don't have a problem with the concept of doing recursive processing,
> as long as it doesn't add much complication?
>
> I'm not following your comment about 9.3. The json[b]_to_record[set]
> functions are new in 9.4, which is what makes me feel it's not too
> late to redefine their behavior. But changing behavior of stuff that
> was in 9.3 seems a lot more debatable.
>
>

This problem is also manifest in json_populate_recordset, which also
uses the function in question, and is in 9.3:

andrew=# create type yyy as (a int, b json, c int, d int);
CREATE TYPE
andrew=# select * from json_populate_recordset(null::yyy, '[
{"a":2,"c":3,"b":{"z":4}, "d":6}
]
',true) x;
a | b | c | d
---+---------+---+---
| {"z":4} | | 6
(1 row)

I don't have any problem with recursive processing, but I'm not sure I
understand how it will work. If you post a patch I will be able to look
it over, though.

cheers

andrew

>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-24 01:43:31
Message-ID: 20561.1403574211@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 06/23/2014 07:34 PM, Tom Lane wrote:
>> I'm not following your comment about 9.3. The json[b]_to_record[set]
>> functions are new in 9.4, which is what makes me feel it's not too
>> late to redefine their behavior. But changing behavior of stuff that
>> was in 9.3 seems a lot more debatable.

> This problem is also manifest in json_populate_recordset, which also
> uses the function in question, and is in 9.3:

Ah, I see the problem.

Here is a first cut suggestion:

* Get rid of the use_json_as_text flag argument for the new functions.
In json_populate_record(set), ignore its value and deprecate using it.
(The fact that it already had a default makes that easier.) The
behavior should always be as below.

* For nested json objects, we'll spit those out in json textual format,
which means they'll successfully convert to either text or json/jsonb.
Compared to the old behavior of json_populate_recordset, this just means
that we don't throw an error anymore regardless of the flag value,
which seems ok (though maybe not something to backpatch into 9.3).

* Nested json arrays are a bit more problematic. What I'd ideally like
is to spit them out in a form that would be successfully parsable as a SQL
array of the appropriate element type. Unfortunately, I think that that
ship has sailed because json_populate_recordset failed to do that in 9.3.
What we should probably do is define this the same as the nested object
case, ie, we spit it out in *json* array format, meaning you can insert it
into a text or json/jsonb field of the result record. Maybe sometime in
the future we can add a json-array-to-SQL-array converter function, but
these functions won't do that.

From a user's standpoint this just boils down to (a) fix the bug with
mishandling of the hash tables, and (b) get rid of the gratuitous
error report.

regards, tom lane


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs(at)postgresql(dot)org, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-24 02:08:18
Message-ID: CAB7nPqSfxZj2qU155tzpvQ7yDWaNwL9jOuXUBqEf7aR_7TLwyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Jun 24, 2014 at 10:43 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > On 06/23/2014 07:34 PM, Tom Lane wrote:
> >> I'm not following your comment about 9.3. The json[b]_to_record[set]
> >> functions are new in 9.4, which is what makes me feel it's not too
> >> late to redefine their behavior. But changing behavior of stuff that
> >> was in 9.3 seems a lot more debatable.
>
> > This problem is also manifest in json_populate_recordset, which also
> > uses the function in question, and is in 9.3:
>
> Ah, I see the problem.
>
> Here is a first cut suggestion:
>
> * Get rid of the use_json_as_text flag argument for the new functions.
> In json_populate_record(set), ignore its value and deprecate using it.
> (The fact that it already had a default makes that easier.) The
> behavior should always be as below.
>
Agreed. This simplifies the interface of the existing functions and will
need a mention in the release notes of 9.3.

> * For nested json objects, we'll spit those out in json textual format,
> which means they'll successfully convert to either text or json/jsonb.
> Compared to the old behavior of json_populate_recordset, this just means
> that we don't throw an error anymore regardless of the flag value,
> which seems ok (though maybe not something to backpatch into 9.3).
>
* Nested json arrays are a bit more problematic. What I'd ideally like
> is to spit them out in a form that would be successfully parsable as a SQL
> array of the appropriate element type. Unfortunately, I think that that
> ship has sailed because json_populate_recordset failed to do that in 9.3.
> What we should probably do is define this the same as the nested object
> case, ie, we spit it out in *json* array format, meaning you can insert it
> into a text or json/jsonb field of the result record. Maybe sometime in
> the future we can add a json-array-to-SQL-array converter function, but
> these functions won't do that.
>
Just a question (lack of coffee): do those two points implicitly mean that
we do not parse the nested json objects, pass them as simple text to the
hash table, and bypass the creation of fresh hash tables with lex_level > 1
in populate_recordset_object_start.
--
Michael


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs(at)postgresql(dot)org, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-24 02:34:38
Message-ID: CAB7nPqT_Tybncy_ixDtpK0SzzqevC_6T-Vf0nOOVdD94G1-8qA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Jun 24, 2014 at 9:34 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:

>
> On 06/23/2014 07:34 PM, Tom Lane wrote:
>
>> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>>
>>> On 06/23/2014 11:43 AM, Tom Lane wrote:
>>>
>>>> However, it seems to me that these functions (json[b]_to_record[set])
>>>> are
>>>> handling the nested-json-objects case in a fairly brain-dead fashion to
>>>> start with. I would like to propose that we should think about getting
>>>> rid of the use_json_as_text flag arguments altogether. What purpose do
>>>> they serve? If we're going to the trouble of parsing the nested JSON
>>>> objects anyway, why don't we just reconstruct from that data?
>>>>
>>> Looks like we have some problems in this whole area, not just the new
>>> function, so we need to fix 9.3 also :-(
>>> IIRC, originally, the intention was to disallow nested json objects, but
>>> the use_json_as_text was put in as a possibly less drastic possibility.
>>> If we get rid of it our only recourse is to error out if we encounter
>>> nested json. I was probably remiss in not considering the likelihood of
>>> a json target field.
>>> I currently don't have lots of time to devote to this, sadly, but
>>> Michael's patch looks like a good minimal fix.
>>>
>> I can spend some time on it over the next couple of days. I take it you
>> don't have a problem with the concept of doing recursive processing,
>> as long as it doesn't add much complication?
>>
>> I'm not following your comment about 9.3. The json[b]_to_record[set]
>> functions are new in 9.4, which is what makes me feel it's not too
>> late to redefine their behavior. But changing behavior of stuff that
>> was in 9.3 seems a lot more debatable.
>>
>>
>>
>
> This problem is also manifest in json_populate_recordset, which also uses
> the function in question, and is in 9.3:
>
> andrew=# create type yyy as (a int, b json, c int, d int);
> CREATE TYPE
> andrew=# select * from json_populate_recordset(null::yyy, '[
>
> {"a":2,"c":3,"b":{"z":4}, "d":6}
> ]
> ',true) x;
> a | b | c | d
> ---+---------+---+---
>
> | {"z":4} | | 6
> (1 row)
>
Yeah, the somewhat-backpatchable patch I sent fixes that as well. I
wouldn't mind writing a more complete patch with new regression tests and
tutti-quanti for 9.3 and 9.4master, but it seems that Tom is already on it.
--
Michael


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-24 12:31:45
Message-ID: CAHyXU0x60k9Ue_HM_FhBCvssh=NXUyFT69Kqov0sOWc4ee-AsA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Mon, Jun 23, 2014 at 8:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> * Nested json arrays are a bit more problematic. What I'd ideally like
> is to spit them out in a form that would be successfully parsable as a SQL
> array of the appropriate element type. Unfortunately, I think that that
> ship has sailed because json_populate_recordset failed to do that in 9.3.
> What we should probably do is define this the same as the nested object
> case, ie, we spit it out in *json* array format, meaning you can insert it
> into a text or json/jsonb field of the result record. Maybe sometime in
> the future we can add a json-array-to-SQL-array converter function, but
> these functions won't do that.

Not quite following your logic here. 9.3 gave an error for an
internally nested array:

postgres=# create type foo as(a int, b int[]);
postgres=# select * from json_populate_recordset(null::foo, '[{"a": 1,
"b": [1,2,3]},{"a": 1, "b": [1,2,3]}]');
ERROR: cannot call json_populate_recordset on a nested object

With your proposal this would still fail? TBH, I'd rather this
function fail as above than implement a behavior we couldn't take back
later.

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-24 14:08:15
Message-ID: 53A9864F.4080305@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


On 06/23/2014 09:43 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 06/23/2014 07:34 PM, Tom Lane wrote:
>>> I'm not following your comment about 9.3. The json[b]_to_record[set]
>>> functions are new in 9.4, which is what makes me feel it's not too
>>> late to redefine their behavior. But changing behavior of stuff that
>>> was in 9.3 seems a lot more debatable.
>> This problem is also manifest in json_populate_recordset, which also
>> uses the function in question, and is in 9.3:
> Ah, I see the problem.
>
> Here is a first cut suggestion:
>
> * Get rid of the use_json_as_text flag argument for the new functions.
> In json_populate_record(set), ignore its value and deprecate using it.
> (The fact that it already had a default makes that easier.) The
> behavior should always be as below.
>
> * For nested json objects, we'll spit those out in json textual format,
> which means they'll successfully convert to either text or json/jsonb.
> Compared to the old behavior of json_populate_recordset, this just means
> that we don't throw an error anymore regardless of the flag value,
> which seems ok (though maybe not something to backpatch into 9.3).
>
> * Nested json arrays are a bit more problematic. What I'd ideally like
> is to spit them out in a form that would be successfully parsable as a SQL
> array of the appropriate element type. Unfortunately, I think that that
> ship has sailed because json_populate_recordset failed to do that in 9.3.
> What we should probably do is define this the same as the nested object
> case, ie, we spit it out in *json* array format, meaning you can insert it
> into a text or json/jsonb field of the result record. Maybe sometime in
> the future we can add a json-array-to-SQL-array converter function, but
> these functions won't do that.
>
> >From a user's standpoint this just boils down to (a) fix the bug with
> mishandling of the hash tables, and (b) get rid of the gratuitous
> error report.
>
>

The big problem is that we have been ignoring the result type when
constructing the hash, even though the info is available. There is some
sense in this in that the field might not even be present in the result
type. And it works except for structured types like records, arrays and
json. Even if we don't have a nested value, the functions will do the
wrong thing for a scalar string destined for a json field (it will be
de-escaped, when it should not be).

w.r.t. json arrays, I think you're chasing a chimera, since they are
heterogenous, unlike SQL arrays.

w.r.t. the use_json_as_text argument, yes, it has a default, but the
default is false. Ignoring it seems to be more than just deprecating it.
I agree it's a mess, though :-(

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-24 16:57:53
Message-ID: CAHyXU0zKHCX-onc-hULCtE9gx=xkB3d7DjNki5_DeHtAhKP=Bg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Jun 24, 2014 at 9:08 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> w.r.t. json arrays, I think you're chasing a chimera, since they are
> heterogenous, unlike SQL arrays.

But, there are many useful cases where the json is known to be well
formed, right? Or do you mean that the difficulties stem from simply
validating the type? Basically, I'm wondering if

SELECT to_json(foo_t[])

is ever going to be able to be reversed by:

SELECT array(json[b]_populate_recordset(null::foo_t[]), '...'::json[b])

...where foo_t is some arbitrarily complex nested type. even simpler
(although not necessarily faster) would be:

SELECT from_json(null::foo_t[], ',,,');

or even

SELECT '...'::foo_t[]::json::foo_t[];

My basic gripe with the json[b] APIs is that there is no convenient
deserialization reverse of to_json. Tom's proposal AIUI, in particular
having internal json arrays force to json, would foreclose the last
two cases from ever being possible.

merlin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-25 00:01:00
Message-ID: 21726.1403654460@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Merlin Moncure <mmoncure(at)gmail(dot)com> writes:
> On Mon, Jun 23, 2014 at 8:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> * Nested json arrays are a bit more problematic. What I'd ideally like
>> is to spit them out in a form that would be successfully parsable as a SQL
>> array of the appropriate element type. Unfortunately, I think that that
>> ship has sailed because json_populate_recordset failed to do that in 9.3.
>> What we should probably do is define this the same as the nested object
>> case, ie, we spit it out in *json* array format, meaning you can insert it
>> into a text or json/jsonb field of the result record. Maybe sometime in
>> the future we can add a json-array-to-SQL-array converter function, but
>> these functions won't do that.

> Not quite following your logic here. 9.3 gave an error for an
> internally nested array:

> postgres=# create type foo as(a int, b int[]);
> postgres=# select * from json_populate_recordset(null::foo, '[{"a": 1,
> "b": [1,2,3]},{"a": 1, "b": [1,2,3]}]');
> ERROR: cannot call json_populate_recordset on a nested object

Yeah, that's the default behavior, with use_json_as_text false.
However, consider what happens with use_json_as_text true:

regression=# select * from json_populate_recordset(null::foo, '[{"a": 1,
"b": [1,2,3]},{"a": 1, "b": [1,2,3]}]', true);
ERROR: missing "]" in array dimensions

That case is certainly useless, but suppose somebody had done

regression=# create type foo2 as(a int, b json);
CREATE TYPE
regression=# select * from json_populate_recordset(null::foo2, '[{"a": 1,
"b": [1,2,3]},{"a": 1, "b": [1,2,3]}]', true);
a | b
---+---------
1 | [1,2,3]
1 | [1,2,3]
(2 rows)

or even just

regression=# create type foo3 as(a int, b text);
CREATE TYPE
regression=# select * from json_populate_recordset(null::foo3, '[{"a": 1,
"b": [1,2,3]},{"a": 1, "b": [1,2,3]}]', true);
a | b
---+---------
1 | [1,2,3]
1 | [1,2,3]
(2 rows)

Since these cases work and do something arguably useful, I doubt we
can break them.

However, I don't see anything wrong with changing the behavior in
cases that currently throw an error, since presumably no application
is depending on them. Perhaps Andrew's comment about looking at the
target type info yields a way forward, ie, we could output in SQL-array
format if the target is an array, or in JSON-array format if the target
is json. Multiply-nested cases might be a pain to get right though.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-25 04:25:53
Message-ID: 75169.1403670353@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>>> I currently don't have lots of time to devote to this, sadly, but
>>> Michael's patch looks like a good minimal fix.

> This problem is also manifest in json_populate_recordset, which also
> uses the function in question, and is in 9.3:

I've pushed this patch back through 9.3, along with a fix to ensure that
json_populate_record destroys the hashtable it creates. I want to do some
more work on this code, but this much is indubitably a bug fix.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, matti(dot)hameister(at)technologygroup(dot)de, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] BUG #10728: json_to_recordset with nested json objects NULLs columns
Date: 2014-06-26 22:31:54
Message-ID: 78528.1403821914@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

I wrote:
> * Get rid of the use_json_as_text flag argument for the new functions.
> In json_populate_record(set), ignore its value and deprecate using it.
> (The fact that it already had a default makes that easier.) The
> behavior should always be as below.

Here's a draft patch that gets rid of the use_json_as_text flag argument
altogether for all the functions newly added in 9.4. The pre-existing
json_populate_record(set) functions still have such an argument, but
it's ignored and undocumented (not that it was well documented before).
The behavior is as if the flag had been specified as true, which AFAICS
is the only useful case.

This does not do anything about the question of possibly printing
JSON arrays in SQL array syntax when the target record field is of
array type. While I think that's definitely do-able, it would require
some significant rewriting of the code, which is probably not something
to be doing at this point for 9.4. I think it's something we can add
as a new feature in 9.5 or later, because it changes the behavior only
in cases that are currently guaranteed-to-fail, so there's not really
any backwards compatibility problem IMO.

BTW, depending on how hard we want to work, one could imagine also
printing JSON objects in SQL record format if the target type is record
rather than JSON, and then making this happen recursively for nested
arrays/composites. Again, any attempt to do that today would fail with
a syntax error from record_in, so there's no backwards compatibility
problem.

regards, tom lane

Attachment Content-Type Size
no-use-json-as-text-flag.patch text/x-patch 69.2 KB