Re: row_to_json bug with index only scans: empty keys!

Lists: pgsql-hackers
From: Ross Reedstrom <reedstrm(at)rice(dot)edu>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: row_to_json bug with index only scans: empty keys!
Date: 2014-11-07 15:51:35
Message-ID: 20141107155135.GA11578@rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This is a serious bug in 9.3.5 and 9.4 beta3:

row_to_json() yields empty strings for json keys if the data is
fulfilled by an index only scan.

Example:

testjson=# select count(*) from document_acl;
count
-------
426
(1 row)

testjson=# SELECT row_to_json(combined_rows) FROM (
SELECT uuid, user_id AS uid, permission
FROM document_acl_text AS acl
WHERE uuid = '8f774048-8936-4d7f-aa38-1974c91bbef2'
ORDER BY user_id ASC, permission ASC
) as combined_rows;
row_to_json
---------------------------------------------------------------------
{"":"8f774048-8936-4d7f-aa38-1974c91bbef2","":"admin","":"publish"}
(1 row)

testjson=# explain SELECT row_to_json(combined_rows) FROM (
SELECT uuid, user_id AS uid, permission
FROM document_acl_text AS acl
WHERE uuid = '8f774048-8936-4d7f-aa38-1974c91bbef2'
ORDER BY user_id ASC, permission ASC
) as combined_rows;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Subquery Scan on combined_rows (cost=0.27..8.30 rows=1 width=76)
-> Index Only Scan using document_acl_text_pkey on document_acl_text acl (cost=0.27..8.29 rows=1 width=52)
Index Cond: (uuid = '8f774048-8936-4d7f-aa38-1974c91bbef2'::text)
Planning time: 0.093 ms
(4 rows)

# set enable_indexonlyscan to off;
SET
testjson=# SELECT row_to_json(combined_rows) FROM (
SELECT uuid, user_id AS uid, permission
FROM document_acl_text AS acl
WHERE uuid = '8f774048-8936-4d7f-aa38-1974c91bbef2'
ORDER BY user_id ASC, permission ASC
) as combined_rows;
row_to_json
------------------------------------------------------------------------------------------
{"uuid":"8f774048-8936-4d7f-aa38-1974c91bbef2","user_id":"admin","permission":"publish"}
(1 row)

tjson=# explain SELECT row_to_json(combined_rows) FROM (
SELECT uuid, user_id AS uid, permission
FROM document_acl_text AS acl
WHERE uuid = '8f774048-8936-4d7f-aa38-1974c91bbef2'
ORDER BY user_id ASC, permission ASC
) as combined_rows;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------
Subquery Scan on combined_rows (cost=0.27..8.30 rows=1 width=76)
-> Index Scan using document_acl_text_pkey on document_acl_text acl (cost=0.27..8.29 rows=1 width=52)
Index Cond: (uuid = '8f774048-8936-4d7f-aa38-1974c91bbef2'::text)
Planning time: 0.095 ms
(4 rows)

We have a table defined as so:

CREATE TYPE permission_type AS ENUM (
'publish'
);

create table "document_acl" (
"uuid" UUID,
"user_id" TEXT,
"permission" permission_type NOT NULL,
PRIMARY KEY ("uuid", "user_id", "permission"),
FOREIGN KEY ("uuid") REFERENCES document_controls ("uuid")
);

The uuid and enums make no difference - I've made an all text version as well,
same problem.

testjson=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 9.4beta3 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
(1 row)

Ross
--
Ross Reedstrom, Ph.D. reedstrm(at)rice(dot)edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-07 16:17:48
Message-ID: 545CF0AC.9040809@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/07/2014 10:51 AM, Ross Reedstrom wrote:
> This is a serious bug in 9.3.5 and 9.4 beta3:
>
> row_to_json() yields empty strings for json keys if the data is
> fulfilled by an index only scan.
>
> Example:
>
> testjson=# select count(*) from document_acl;
> count
> -------
> 426
> (1 row)
>
> testjson=# SELECT row_to_json(combined_rows) FROM (
> SELECT uuid, user_id AS uid, permission
> FROM document_acl_text AS acl
> WHERE uuid = '8f774048-8936-4d7f-aa38-1974c91bbef2'
> ORDER BY user_id ASC, permission ASC
> ) as combined_rows;
> row_to_json
> ---------------------------------------------------------------------
> {"":"8f774048-8936-4d7f-aa38-1974c91bbef2","":"admin","":"publish"}

That seems odd. Here's what the relevant code does:

td = DatumGetHeapTupleHeader(composite);

/* Extract rowtype info and find a tupdesc */
tupType = HeapTupleHeaderGetTypeId(td);
tupTypmod = HeapTupleHeaderGetTypMod(td);
tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);

...
for (i = 0; i < tupdesc->natts; i++)
...

attname = NameStr(tupdesc->attrs[i]->attname);
escape_json(result, attname);

Could this be a bug in lookup_rowtype_tupdesc()?

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-07 20:15:48
Message-ID: 545D2874.2090300@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/07/2014 11:17 AM, Andrew Dunstan wrote:
>
> On 11/07/2014 10:51 AM, Ross Reedstrom wrote:
>> This is a serious bug in 9.3.5 and 9.4 beta3:
>>
>> row_to_json() yields empty strings for json keys if the data is
>> fulfilled by an index only scan.
>>
>> Example:
>>
>> testjson=# select count(*) from document_acl;
>> count
>> -------
>> 426
>> (1 row)
>>
>> testjson=# SELECT row_to_json(combined_rows) FROM (
>> SELECT uuid, user_id AS uid, permission
>> FROM document_acl_text AS acl
>> WHERE uuid = '8f774048-8936-4d7f-aa38-1974c91bbef2'
>> ORDER BY user_id ASC, permission ASC
>> ) as combined_rows;
>> row_to_json
>> ---------------------------------------------------------------------
>> {"":"8f774048-8936-4d7f-aa38-1974c91bbef2","":"admin","":"publish"}
>
>
> That seems odd. Here's what the relevant code does:
>
> td = DatumGetHeapTupleHeader(composite);
>
> /* Extract rowtype info and find a tupdesc */
> tupType = HeapTupleHeaderGetTypeId(td);
> tupTypmod = HeapTupleHeaderGetTypMod(td);
> tupdesc = lookup_rowtype_tupdesc(tupType, tupTypmod);
>
> ...
> for (i = 0; i < tupdesc->natts; i++)
> ...
>
> attname = NameStr(tupdesc->attrs[i]->attname);
> escape_json(result, attname);
>
> Could this be a bug in lookup_rowtype_tupdesc()?
>
>
>

Further data point:

There's nothing json-specific about this, BTW:

andrew=# select hstore(q) from (select * from idxo order by a) q;
hstore
---------
""=>"1"
(1 row)

andrew=# set enable_seqscan = true;
SET
andrew=# select hstore(q) from (select * from idxo order by a) q;
hstore
------------------------------
"a"=>"1", "b"=>"b", "c"=>"c"
(1 row)

So it looks like the index scan only stuff is broken somewhere.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-07 21:59:40
Message-ID: 23206.1415397580@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 11/07/2014 10:51 AM, Ross Reedstrom wrote:
>> row_to_json() yields empty strings for json keys if the data is
>> fulfilled by an index only scan.

> Could this be a bug in lookup_rowtype_tupdesc()?

I think this is probably a variant of bug #11210, in which the problem is
that tupledescs bubbled up from inheritance children never get column
names assigned to them. I've been speculating about ways to fix that
but I've not thought of anything that's not kinda painful.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 00:06:41
Message-ID: 545D5E91.9020705@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/07/2014 04:59 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 11/07/2014 10:51 AM, Ross Reedstrom wrote:
>>> row_to_json() yields empty strings for json keys if the data is
>>> fulfilled by an index only scan.
>> Could this be a bug in lookup_rowtype_tupdesc()?
> I think this is probably a variant of bug #11210, in which the problem is
> that tupledescs bubbled up from inheritance children never get column
> names assigned to them. I've been speculating about ways to fix that
> but I've not thought of anything that's not kinda painful.

Yeah, I've been running this down a bit with a debugger, and it looked
kinda like that. Maybe we should look for this in places we know it
matters (e.g. hstore_from_record() and composite_to_json() and raise an
error if we find empty names, with a hint to do (what?).

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 04:27:56
Message-ID: 31531.1415420876@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 11/07/2014 04:59 PM, Tom Lane wrote:
>> I think this is probably a variant of bug #11210, in which the problem is
>> that tupledescs bubbled up from inheritance children never get column
>> names assigned to them. I've been speculating about ways to fix that
>> but I've not thought of anything that's not kinda painful.

> Yeah, I've been running this down a bit with a debugger, and it looked
> kinda like that.

After chewing on this for awhile, it occurred to me that the problem could
be solved in ExecEvalWholeRowVar, which is really the only place that
produces composite datums "from scratch". We can look up the RTE that
the whole-row Var references and scrape column aliases from it. The
attached draft patch does that, and appears to fix both Ross' example
and the one in bug #11210.

Although this patch doesn't change any existing regression test outputs
as far as I can find, it's not hard to think of cases where it will change
the results. In particular:

regression=# create table foo (f1 int, f2 int);
CREATE TABLE
regression=# insert into foo values(1,2);
INSERT 0 1
regression=# select row_to_json(f) from foo f;
row_to_json
-----------------
{"f1":1,"f2":2}
(1 row)

regression=# select row_to_json(f) from foo f(x,y);
row_to_json
---------------
{"x":1,"y":2}
(1 row)

Without this patch, you get the same result from the second query as
from the first. I argue that using the column aliases is clearly saner
behavior; I think there have been complaints about that in the past,
but am too lazy to trawl the archives right now. However, it's also
clear that people might have applications that are expecting the old
behavior and don't realize that it might be thought a bug.

I'm hesitant to back-patch this anyway, because I'm not sure that there
aren't any other unexpected side-effects. Note in particular the
assertion I had to weaken in ExecEvalConvertRowtype because a whole-row
Var is now capable of returning tuple datums that are marked with a
different rowtype than the parser marked the Var with. That's a little
bit scary ...

On the positive side, this lets us remove some kluges, like the place in
nodeFunctionscan.c where we were hacking the output tupledesc's column
names in pretty much the same way this patch has ExecEvalWholeRowVar do.
I thought that was a crock when it went in, because no other plan node
is doing anything like that, so it's good to be able to remove it.

Thoughts?

regards, tom lane

Attachment Content-Type Size
whole-row-var-column-aliases.patch text/x-diff 11.1 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 14:26:45
Message-ID: CA+TgmoaWHP9tuFkei+pyRhgGwHwQ3pOiO+LHMxjMSmR+U0t++g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Nov 7, 2014 at 11:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thoughts?

I'm not sure whether this is safe enough to back-patch, but it seems
like we should probably plan to back-patch *something*, because the
status quo isn't great either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 15:08:12
Message-ID: 545E31DC.9020009@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/08/2014 09:26 AM, Robert Haas wrote:
> On Fri, Nov 7, 2014 at 11:27 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Thoughts?
> I'm not sure whether this is safe enough to back-patch, but it seems
> like we should probably plan to back-patch *something*, because the
> status quo isn't great either.
>

I confirm that Tom's patch does indeed fix my test case that produces
empty field names.

We should probably not backpatch it, as it is a behaviour change.
However, I do think we should add checks in composite_to_json and
hstore_from_record for empty field names, and error out if they are
found. That seems like an outright bug which we should defend against,
including in the back branches. Giving back json or hstore objects with
made up names like f1 is one thing, giving them back with empty names
(which, in the hstore case will collapse everything to a single field)
is far worse.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 16:24:39
Message-ID: 20558.1415463879@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 11/08/2014 09:26 AM, Robert Haas wrote:
>> I'm not sure whether this is safe enough to back-patch, but it seems
>> like we should probably plan to back-patch *something*, because the
>> status quo isn't great either.

> I confirm that Tom's patch does indeed fix my test case that produces
> empty field names.

> We should probably not backpatch it, as it is a behaviour change.
> However, I do think we should add checks in composite_to_json and
> hstore_from_record for empty field names, and error out if they are
> found.

That seems like a pretty silly move: it wouldn't actually fix anything,
and it would break cases that perhaps are acceptable to users today.

We could reduce the risks involved by narrowing the cases in which
ExecEvalWholeRowVar will replace field names it got from the input.
I'd be inclined to propose:

1. If Var is of a named composite type, use *exactly* the field names
associated with that type. (This avoids the need to possibly produce
RECORD outputs from a named-type Var, thus removing the Assert-weakening
issue.)

2. If Var is of type RECORD, replace only empty field names with aliases
from the RTE. (This might sound inconsistent --- could you end up with
some names coming from point A and some from point B? --- but in practice
I think it would always be all-or-nothing, because the issue is whether
or not the planner bothered to attach column names to a lower-level
targetlist.)

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 16:57:39
Message-ID: 545E4B83.1070309@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/08/2014 11:24 AM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 11/08/2014 09:26 AM, Robert Haas wrote:
>>> I'm not sure whether this is safe enough to back-patch, but it seems
>>> like we should probably plan to back-patch *something*, because the
>>> status quo isn't great either.
>> I confirm that Tom's patch does indeed fix my test case that produces
>> empty field names.
>> We should probably not backpatch it, as it is a behaviour change.
>> However, I do think we should add checks in composite_to_json and
>> hstore_from_record for empty field names, and error out if they are
>> found.
> That seems like a pretty silly move: it wouldn't actually fix anything,
> and it would break cases that perhaps are acceptable to users today.

What evidence do you have that it might be acceptable to today's users?
The only evidence we have that I know of is Ross' complaint that
indicates that it's not acceptable.

However,

>
> We could reduce the risks involved by narrowing the cases in which
> ExecEvalWholeRowVar will replace field names it got from the input.
> I'd be inclined to propose:
>
> 1. If Var is of a named composite type, use *exactly* the field names
> associated with that type. (This avoids the need to possibly produce
> RECORD outputs from a named-type Var, thus removing the Assert-weakening
> issue.)
>
> 2. If Var is of type RECORD, replace only empty field names with aliases
> from the RTE. (This might sound inconsistent --- could you end up with
> some names coming from point A and some from point B? --- but in practice
> I think it would always be all-or-nothing, because the issue is whether
> or not the planner bothered to attach column names to a lower-level
> targetlist.)
>
>

I assume that's what you would propose for just the stable branches, and
that going forward we'd always use the aliases from the RTE? Or maybe
I'm not quite understanding enough which cases will be covered. To the
extent that I do this sounds OK.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 17:14:02
Message-ID: 22095.1415466842@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 11/08/2014 11:24 AM, Tom Lane wrote:
>> That seems like a pretty silly move: it wouldn't actually fix anything,
>> and it would break cases that perhaps are acceptable to users today.

> What evidence do you have that it might be acceptable to today's users?
> The only evidence we have that I know of is Ross' complaint that
> indicates that it's not acceptable.

The fact that we've only gotten two bug reports in however many years the
failure has been possible might mean that few people encounter the case,
or it might mean that it doesn't break things for their usages so badly
that they need to complain. If the latter, throwing an error rather than
fixing it is not going to be an improvement.

> I assume that's what you would propose for just the stable branches, and
> that going forward we'd always use the aliases from the RTE?

That would be my druthers. But given the lack of complaints, maybe we
should just stick to the more-backwards-compatible behavior until someone
does complain. Thoughts?

regards, tom lane


From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 17:22:07
Message-ID: 1415467327.82056.YahooMailNeo@web122302.mail.ne1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:

>> I assume that's what you would propose for just the stable branches, and
>> that going forward we'd always use the aliases from the RTE?
>
> That would be my druthers. But given the lack of complaints, maybe we
> should just stick to the more-backwards-compatible behavior until someone
> does complain. Thoughts?

I think consistent use of the aliases would be less surprising and
should be what we do on master. I agree that we should avoid
breaking anything that might be working as intended on stable
branches.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 17:30:05
Message-ID: 545E531D.6010808@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/08/2014 12:14 PM, Tom Lane wrote:
>> I assume that's what you would propose for just the stable branches, and
>> that going forward we'd always use the aliases from the RTE?
> That would be my druthers. But given the lack of complaints, maybe we
> should just stick to the more-backwards-compatible behavior until someone
> does complain. Thoughts?
>
>

Wouldn't that would mean we might not pick up the expected aliases in

select row_to_json(q) from (select a,b from foo) as q(x,y)

? If so, I'm definitely in favor of fixing this now.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 17:40:28
Message-ID: 22920.1415468428@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 11/08/2014 12:14 PM, Tom Lane wrote:
>> That would be my druthers. But given the lack of complaints, maybe we
>> should just stick to the more-backwards-compatible behavior until someone
>> does complain. Thoughts?

> Wouldn't that would mean we might not pick up the expected aliases in
> select row_to_json(q) from (select a,b from foo) as q(x,y)
> ? If so, I'm definitely in favor of fixing this now.

Well, it's inconsistent now. In existing releases you might or might not
get x,y:

regression=# create table foo (f1 int, f2 int);
CREATE TABLE
regression=# insert into foo values(1,2);
INSERT 0 1
regression=# select row_to_json(q) from (select f1 as a, f2 as b from foo) as q(x,y);
row_to_json
---------------
{"x":1,"y":2}
(1 row)

regression=# select row_to_json(q) from (select f1 as a, f2 as b from foo offset 0) as q(x,y);
row_to_json
-----------------
{"f1":1,"f2":2}
(1 row)

That seems like something that's worth fixing going forward, but it's a
lot harder to make the case that we should change it in back branches.

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 18:03:18
Message-ID: 545E5AE6.9000606@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/08/2014 12:40 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> On 11/08/2014 12:14 PM, Tom Lane wrote:
>>> That would be my druthers. But given the lack of complaints, maybe we
>>> should just stick to the more-backwards-compatible behavior until someone
>>> does complain. Thoughts?
>> Wouldn't that would mean we might not pick up the expected aliases in
>> select row_to_json(q) from (select a,b from foo) as q(x,y)
>> ? If so, I'm definitely in favor of fixing this now.
> Well, it's inconsistent now. In existing releases you might or might not
> get x,y:
>
> regression=# create table foo (f1 int, f2 int);
> CREATE TABLE
> regression=# insert into foo values(1,2);
> INSERT 0 1
> regression=# select row_to_json(q) from (select f1 as a, f2 as b from foo) as q(x,y);
> row_to_json
> ---------------
> {"x":1,"y":2}
> (1 row)
>
> regression=# select row_to_json(q) from (select f1 as a, f2 as b from foo offset 0) as q(x,y);
> row_to_json
> -----------------
> {"f1":1,"f2":2}
> (1 row)
>
> That seems like something that's worth fixing going forward, but it's a
> lot harder to make the case that we should change it in back branches.
>
>

Sure. As long as we fix the empty alias issue in the back branches, just
fixing this prospectively seems fine. But I don't think we should wait
for more complaints.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 18:09:23
Message-ID: 23873.1415470163@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Oh, some more fun: a RowExpr that's labeled with a named composite type
(rather than RECORD) has the same issue of not respecting aliases.
Continuing previous example with table "foo":

regression=# create view vv as select * from foo;
CREATE VIEW
regression=# select row_to_json(q) from vv q;
row_to_json
-----------------
{"f1":1,"f2":2}
(1 row)

regression=# select row_to_json(q) from vv q(a,b);
row_to_json
-----------------
{"f1":1,"f2":2}
(1 row)

So that's another case we probably want to change in HEAD but not the back
branches.

regards, tom lane


From: Ross Reedstrom <reedstrm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 21:19:36
Message-ID: 20141108211935.GA541@rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've no opinion on the not respecting aliases aspect of this, but both
the hstore and json emtpy keys case breaks the format: it's duplicate keys
that collapse to a single value, and expected keynames are missing.

The insidious bit about this bug though is that it works fine during testing
with the developers typically small data sets. It's only triggered in my case
when we the plan switches to index-only. Even an index scan works fine. I can't
imagine that there is code out there that _depends_ on this behavior. Just as
likely to me are that there exist systems that just have "can't reproduce" bugs
that would be fixed by this.

Ross

On Sat, Nov 08, 2014 at 01:09:23PM -0500, Tom Lane wrote:
> Oh, some more fun: a RowExpr that's labeled with a named composite type
> (rather than RECORD) has the same issue of not respecting aliases.
> Continuing previous example with table "foo":
>
> regression=# create view vv as select * from foo;
> CREATE VIEW
> regression=# select row_to_json(q) from vv q;
> row_to_json
> -----------------
> {"f1":1,"f2":2}
> (1 row)
>
> regression=# select row_to_json(q) from vv q(a,b);
> row_to_json
> -----------------
> {"f1":1,"f2":2}
> (1 row)
>
> So that's another case we probably want to change in HEAD but not the back
> branches.
>
> regards, tom lane
>

--
Ross Reedstrom, Ph.D. reedstrm(at)rice(dot)edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Ross Reedstrom <reedstrm(at)rice(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-08 21:32:25
Message-ID: 545E8BE9.7020802@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/08/2014 04:19 PM, Ross Reedstrom wrote:
> I've no opinion on the not respecting aliases aspect of this, but both
> the hstore and json emtpy keys case breaks the format: it's duplicate keys
> that collapse to a single value, and expected keynames are missing.
>
> The insidious bit about this bug though is that it works fine during testing
> with the developers typically small data sets. It's only triggered in my case
> when we the plan switches to index-only. Even an index scan works fine. I can't
> imagine that there is code out there that _depends_ on this behavior. Just as
> likely to me are that there exist systems that just have "can't reproduce" bugs
> that would be fixed by this.
>
>

No, I can't imagine it either - it's utterly broken. That's the piece
that Tom is proposing to fix on the back branches. AIUI.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-09 21:00:45
Message-ID: 11282.1415566845@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> We could reduce the risks involved by narrowing the cases in which
> ExecEvalWholeRowVar will replace field names it got from the input.
> I'd be inclined to propose:

> 1. If Var is of a named composite type, use *exactly* the field names
> associated with that type. (This avoids the need to possibly produce
> RECORD outputs from a named-type Var, thus removing the Assert-weakening
> issue.)

> 2. If Var is of type RECORD, replace only empty field names with aliases
> from the RTE. (This might sound inconsistent --- could you end up with
> some names coming from point A and some from point B? --- but in practice
> I think it would always be all-or-nothing, because the issue is whether
> or not the planner bothered to attach column names to a lower-level
> targetlist.)

Attached are patches meant for HEAD and 9.2-9.4 respectively. The HEAD
patch extends the prior patch to fix the RowExpr case I mentioned
yesterday. The back-branch patch works as suggested above. I added a
bunch of regression tests that document behavior in this area. The two
patches include the same set of tests but have different expected output
for the cases we are intentionally not fixing in back branches. If you
try the back-branch regression tests on an unpatched backend, you can
verify that the only cases that change behavior are ones where current
sources put out empty field names.

The test cases use row_to_json() so they could not be used directly before
9.2. I tried the same cases using hstore(record) in 9.1. While 9.1 does
some pretty darn dubious things, it does not produce empty field names in
any of these cases, so I think we probably should not consider
back-patching further than 9.2.

regards, tom lane

Attachment Content-Type Size
composite-column-aliases-HEAD.patch text/x-diff 25.1 KB
composite-column-aliases-9.4.patch text/x-diff 18.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-10 15:11:25
Message-ID: 25271.1415632285@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Attached are patches meant for HEAD and 9.2-9.4 respectively.

BTW, has anyone got an opinion about whether to stick the full fix into
9.4? The argument for, of course, is that we'd get the full fix out to
the public a year sooner. The argument against is that someone might
have already done compatibility testing of their application against
9.4 betas, and then get blindsided if we change this before release.

I'm inclined to think that since we expect json/jsonb usage to really
take off with 9.4, it might be better if we get row_to_json behaving
unsurprisingly now. But I'm not dead set on it.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-10 15:19:28
Message-ID: CA+TgmoZhtpaGRW3c+M2Y0V3haC6wrK4SA-mLRSjr5-Yom1scqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Nov 10, 2014 at 10:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> Attached are patches meant for HEAD and 9.2-9.4 respectively.
>
> BTW, has anyone got an opinion about whether to stick the full fix into
> 9.4? The argument for, of course, is that we'd get the full fix out to
> the public a year sooner. The argument against is that someone might
> have already done compatibility testing of their application against
> 9.4 betas, and then get blindsided if we change this before release.
>
> I'm inclined to think that since we expect json/jsonb usage to really
> take off with 9.4, it might be better if we get row_to_json behaving
> unsurprisingly now. But I'm not dead set on it.

I think we should put the full fix in 9.4.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-10 15:30:06
Message-ID: 1415633406543-5826338.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane-2 wrote
> I wrote:
>> Attached are patches meant for HEAD and 9.2-9.4 respectively.
>
> BTW, has anyone got an opinion about whether to stick the full fix into
> 9.4? The argument for, of course, is that we'd get the full fix out to
> the public a year sooner. The argument against is that someone might
> have already done compatibility testing of their application against
> 9.4 betas, and then get blindsided if we change this before release.
>
> I'm inclined to think that since we expect json/jsonb usage to really
> take off with 9.4, it might be better if we get row_to_json behaving
> unsurprisingly now. But I'm not dead set on it.

I am not one of those people who would be blindsided but I'd much rather
inconvenience our beta testers in order to get a better product in place for
the masses.

Beta testers read release notes and should be able to identify and fix any
problematic areas of their code while simultaneously being happy for the
improvements.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/row-to-json-bug-with-index-only-scans-empty-keys-tp5826070p5826338.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: Ross Reedstrom <reedstrm(at)rice(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-10 15:30:21
Message-ID: 20141110153021.GA18057@rice.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Yes, it's late beta, but especially if we're pushing json/jsonb usage
as a major feature of this release, I'd say remove surprising cases
now. It's late beta, but that's what beta is for, the last chance for
bug fixes, before we live w/ it for the support life of the release.

The affected class are people with an app that already uses json,
so 9.2 or better, have ran acceptance tests against an early beta of
9.4, and have SQL that returns madeup fieldnames instead of the appropriate
aliases? Which they were probably annoyed at when they wrote the code that
consumes that output in the first place. I think an item in the list of
compatability changes/gotchas should catch anyone who is that on the ball
as to be testing the betas anyway. Anyone pushing that envelope is going to
do the same acceptance testing against the actual release before rolling
to production.

Ross

On Mon, Nov 10, 2014 at 10:11:25AM -0500, Tom Lane wrote:
> I wrote:
> > Attached are patches meant for HEAD and 9.2-9.4 respectively.
>
> BTW, has anyone got an opinion about whether to stick the full fix into
> 9.4? The argument for, of course, is that we'd get the full fix out to
> the public a year sooner. The argument against is that someone might
> have already done compatibility testing of their application against
> 9.4 betas, and then get blindsided if we change this before release.
>
> I'm inclined to think that since we expect json/jsonb usage to really
> take off with 9.4, it might be better if we get row_to_json behaving
> unsurprisingly now. But I'm not dead set on it.
>
> regards, tom lane
>

--
Ross Reedstrom, Ph.D. reedstrm(at)rice(dot)edu
Systems Engineer & Admin, Research Scientist phone: 713-348-6166
Connexions http://cnx.org fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-10 15:30:53
Message-ID: 5460DA2D.9010906@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 11/10/2014 10:19 AM, Robert Haas wrote:
> On Mon, Nov 10, 2014 at 10:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I wrote:
>>> Attached are patches meant for HEAD and 9.2-9.4 respectively.
>> BTW, has anyone got an opinion about whether to stick the full fix into
>> 9.4? The argument for, of course, is that we'd get the full fix out to
>> the public a year sooner. The argument against is that someone might
>> have already done compatibility testing of their application against
>> 9.4 betas, and then get blindsided if we change this before release.
>>
>> I'm inclined to think that since we expect json/jsonb usage to really
>> take off with 9.4, it might be better if we get row_to_json behaving
>> unsurprisingly now. But I'm not dead set on it.
> I think we should put the full fix in 9.4.
>

+1

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Ross Reedstrom <reedstrm(at)rice(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: row_to_json bug with index only scans: empty keys!
Date: 2014-11-10 20:28:02
Message-ID: 18383.1415651282@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Mon, Nov 10, 2014 at 10:11 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> BTW, has anyone got an opinion about whether to stick the full fix into
>> 9.4?

> I think we should put the full fix in 9.4.

Since nobody spoke against that, I've put the full fix into HEAD and 9.4,
and the restricted version into 9.3 and 9.2.

regards, tom lane