Quick Links

Re: Confusing EXPLAIN output in case of inherited tables

Lists:	pgsql-hackers

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Confusing EXPLAIN output in case of inherited tables
Date:	2012-01-11 11:43:10
Message-ID:	CAFjFpRcoE1poEOZoYYpN7c2aZsDziVhzf2mP3HwuiQvem+VLuw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,
After running regression, I ran EXPLAIN on one of the queries in regression
(test create_misc) and got following output
regression=# explain verbose select * into table ramp from road where name
~ '.*Ramp';
QUERY
PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..154.00 rows=841 width=67)
Output: public.road.name, public.road.thepath
-> Append (cost=0.00..154.00 rows=841 width=67)
-> Seq Scan on public.road (cost=0.00..135.05 rows=418 width=67)
Output: public.road.name, public.road.thepath
Filter: (public.road.name ~ '.*Ramp'::text)
-> Seq Scan on public.ihighway road (cost=0.00..14.99 rows=367
width=67)
^^^^^
Output: public.road.name, public.road.thepath
^^^^^^^^^^, ^^^^^^
Filter: (public.road.name ~ '.*Ramp'::text)
^^^^^^^^^^^
-> Seq Scan on public.shighway road (cost=0.00..3.96 rows=56
width=67)
Output: public.road.name, public.road.thepath
Filter: (public.road.name ~ '.*Ramp'::text)
(12 rows)

Table "road" has children "ihighway" and "shighway" as seen in the \d+
output above. The EXPLAIN output of Seq Scan node on children has
"public.road" as prefix for variables. "public.road" could imply the parent
table "road" and thus can cause confusion, as to what's been referreed, the
columns of parent table or child table. In the EXPLAIN output children
tables have "road" as alias (as against "public.road"). The alias comes
from RangeTblEntry->eref->aliasname. It might be better to have "road" as
prefix in the variable names over "public.road".

The reason why this happens is the code in get_variable()
3865 /* Exceptions occur only if the RTE is alias-less */
3866 if (rte->alias == NULL)
3867 {
3868 if (rte->rtekind == RTE_RELATION)
3869 {
3870 /*
3871 * It's possible that use of the bare refname would find
another
3872 * more-closely-nested RTE, or be ambiguous, in which case
we need
3873 * to specify the schemaname to avoid these errors.
3874 */
3875 if (find_rte_by_refname(rte->eref->aliasname, context) !=
rte)
3876 schemaname =
get_namespace_name(get_rel_namespace(rte->relid));
3877 }

If there is no alias, we find out the schema name and later add it to the
prefix. In the inherited table case, we are actually creating a "kind of"
alias for the children table and thus we should not find out the schema
name and add it to the prefix. This case has been taken care of in
get_from_clause_item(),
6505 else if (rte->rtekind == RTE_RELATION &&
6506 strcmp(rte->eref->aliasname,
get_relation_name(rte->relid)) != 0)
6507 {
6508 /*
6509 * Apparently the rel has been renamed since the rule was
made.
6510 * Emit a fake alias clause so that variable references
will still
6511 * work. This is not a 100% solution but should work in
most
6512 * reasonable situations.
6513 */
6514 appendStringInfo(buf, " %s",
6515 quote_identifier(rte->eref->aliasname));
6516 gavealias = true;
6517 }

I see similar code in ExplainTargetRel()
1778 if (objectname == NULL ||
1779 strcmp(rte->eref->aliasname, objectname) != 0)
1780 appendStringInfo(es->str, " %s",
1781 quote_identifier(rte->eref->aliasname));

Based on this, here is patch to not add schemaname in the prefix for a
variable.

I have run make check. All except inherit.sql passed. The expected output
change is included in the patch.

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company

Attachment	Content-Type	Size
pg_inh_prefix.patch	text/x-diff	4.4 KB

From:	Chetan Suttraway <chetan(dot)suttraway(at)enterprisedb(dot)com>
To:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-01-11 11:55:21
Message-ID:	CAPtHcnHr3EFr0yWOWzNByvfNHrvG6qH_Bn50F0B_Ch8_NGSrkA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 11, 2012 at 5:13 PM, Ashutosh Bapat <
ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:

> Hi,
> After running regression, I ran EXPLAIN on one of the queries in
> regression (test create_misc) and got following output
> regression=# explain verbose select * into table ramp from road where name
> ~ '.*Ramp';
> QUERY
> PLAN
>
> ------------------------------------------------------------------------------------
> Result (cost=0.00..154.00 rows=841 width=67)
> Output: public.road.name, public.road.thepath
> -> Append (cost=0.00..154.00 rows=841 width=67)
> -> Seq Scan on public.road (cost=0.00..135.05 rows=418 width=67)
> Output: public.road.name, public.road.thepath
> Filter: (public.road.name ~ '.*Ramp'::text)
> -> Seq Scan on public.ihighway road (cost=0.00..14.99 rows=367
> width=67)
> ^^^^^
> Output: public.road.name, public.road.thepath
> ^^^^^^^^^^, ^^^^^^
> Filter: (public.road.name ~ '.*Ramp'::text)
> ^^^^^^^^^^^
> -> Seq Scan on public.shighway road (cost=0.00..3.96 rows=56
> width=67)
> Output: public.road.name, public.road.thepath
> Filter: (public.road.name ~ '.*Ramp'::text)
> (12 rows)
>
> regression=# \d+ road
> Table "public.road"
> Column | Type | Modifiers | Storage | Stats target | Description
> ---------+------+-----------+----------+--------------+-------------
> name | text | | extended | |
> thepath | path | | extended | |
> Indexes:
> "rix" btree (name)
> Child tables: ihighway,
> shighway
> Has OIDs: no
>
> Table "road" has children "ihighway" and "shighway" as seen in the \d+
> output above. The EXPLAIN output of Seq Scan node on children has
> "public.road" as prefix for variables. "public.road" could imply the parent
> table "road" and thus can cause confusion, as to what's been referreed, the
> columns of parent table or child table. In the EXPLAIN output children
> tables have "road" as alias (as against "public.road"). The alias comes
> from RangeTblEntry->eref->aliasname. It might be better to have "road" as
> prefix in the variable names over "public.road".
>
> The reason why this happens is the code in get_variable()
> 3865 /* Exceptions occur only if the RTE is alias-less */
> 3866 if (rte->alias == NULL)
> 3867 {
> 3868 if (rte->rtekind == RTE_RELATION)
> 3869 {
> 3870 /*
> 3871 * It's possible that use of the bare refname would find
> another
> 3872 * more-closely-nested RTE, or be ambiguous, in which
> case we need
> 3873 * to specify the schemaname to avoid these errors.
> 3874 */
> 3875 if (find_rte_by_refname(rte->eref->aliasname, context) !=
> rte)
> 3876 schemaname =
> get_namespace_name(get_rel_namespace(rte->relid));
> 3877 }
>
> If there is no alias, we find out the schema name and later add it to the
> prefix. In the inherited table case, we are actually creating a "kind of"
> alias for the children table and thus we should not find out the schema
> name and add it to the prefix. This case has been taken care of in
> get_from_clause_item(),
> 6505 else if (rte->rtekind == RTE_RELATION &&
> 6506 strcmp(rte->eref->aliasname,
> get_relation_name(rte->relid)) != 0)
> 6507 {
> 6508 /*
> 6509 * Apparently the rel has been renamed since the rule was
> made.
> 6510 * Emit a fake alias clause so that variable references
> will still
> 6511 * work. This is not a 100% solution but should work in
> most
> 6512 * reasonable situations.
> 6513 */
> 6514 appendStringInfo(buf, " %s",
> 6515 quote_identifier(rte->eref->aliasname));
> 6516 gavealias = true;
> 6517 }
>
> I see similar code in ExplainTargetRel()
> 1778 if (objectname == NULL ||
> 1779 strcmp(rte->eref->aliasname, objectname) != 0)
> 1780 appendStringInfo(es->str, " %s",
> 1781 quote_identifier(rte->eref->aliasname));
>
> Based on this, here is patch to not add schemaname in the prefix for a
> variable.
>
> I have run make check. All except inherit.sql passed. The expected output
> change is included in the patch.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EntepriseDB Corporation
> The Enterprise Postgres Company
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>
A table can inherit from one or more parent table. So in that case,
qualifying schema/table name
helps in finding out where the column is coming from.

Regards,
Chetan

--
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Website: www.enterprisedb.com
EnterpriseDB Blog : http://blogs.enterprisedb.com
Follow us on Twitter : http://www.twitter.com/enterprisedb

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Chetan Suttraway <chetan(dot)suttraway(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-01-12 04:28:55
Message-ID:	CAFjFpReknaRjdBN3NMYYsGNeaVCtVfL-2WXfoCH5gPcBaNvPgg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 11, 2012 at 5:25 PM, Chetan Suttraway <
chetan(dot)suttraway(at)enterprisedb(dot)com> wrote:

>
>
> On Wed, Jan 11, 2012 at 5:13 PM, Ashutosh Bapat <
> ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
>
>> Hi,
>> After running regression, I ran EXPLAIN on one of the queries in
>> regression (test create_misc) and got following output
>> regression=# explain verbose select * into table ramp from road where
>> name ~ '.*Ramp';
>> QUERY
>> PLAN
>>
>> ------------------------------------------------------------------------------------
>> Result (cost=0.00..154.00 rows=841 width=67)
>> Output: public.road.name, public.road.thepath
>> -> Append (cost=0.00..154.00 rows=841 width=67)
>> -> Seq Scan on public.road (cost=0.00..135.05 rows=418
>> width=67)
>> Output: public.road.name, public.road.thepath
>> Filter: (public.road.name ~ '.*Ramp'::text)
>> -> Seq Scan on public.ihighway road (cost=0.00..14.99 rows=367
>> width=67)
>> ^^^^^
>> Output: public.road.name, public.road.thepath
>> ^^^^^^^^^^, ^^^^^^
>> Filter: (public.road.name ~ '.*Ramp'::text)
>> ^^^^^^^^^^^
>> -> Seq Scan on public.shighway road (cost=0.00..3.96 rows=56
>> width=67)
>> Output: public.road.name, public.road.thepath
>> Filter: (public.road.name ~ '.*Ramp'::text)
>> (12 rows)
>>
>> regression=# \d+ road
>> Table "public.road"
>> Column | Type | Modifiers | Storage | Stats target | Description
>> ---------+------+-----------+----------+--------------+-------------
>> name | text | | extended | |
>> thepath | path | | extended | |
>> Indexes:
>> "rix" btree (name)
>> Child tables: ihighway,
>> shighway
>> Has OIDs: no
>>
>> Table "road" has children "ihighway" and "shighway" as seen in the \d+
>> output above. The EXPLAIN output of Seq Scan node on children has
>> "public.road" as prefix for variables. "public.road" could imply the parent
>> table "road" and thus can cause confusion, as to what's been referreed, the
>> columns of parent table or child table. In the EXPLAIN output children
>> tables have "road" as alias (as against "public.road"). The alias comes
>> from RangeTblEntry->eref->aliasname. It might be better to have "road" as
>> prefix in the variable names over "public.road".
>>
>> The reason why this happens is the code in get_variable()
>> 3865 /* Exceptions occur only if the RTE is alias-less */
>> 3866 if (rte->alias == NULL)
>> 3867 {
>> 3868 if (rte->rtekind == RTE_RELATION)
>> 3869 {
>> 3870 /*
>> 3871 * It's possible that use of the bare refname would find
>> another
>> 3872 * more-closely-nested RTE, or be ambiguous, in which
>> case we need
>> 3873 * to specify the schemaname to avoid these errors.
>> 3874 */
>> 3875 if (find_rte_by_refname(rte->eref->aliasname, context)
>> != rte)
>> 3876 schemaname =
>> get_namespace_name(get_rel_namespace(rte->relid));
>> 3877 }
>>
>> If there is no alias, we find out the schema name and later add it to the
>> prefix. In the inherited table case, we are actually creating a "kind of"
>> alias for the children table and thus we should not find out the schema
>> name and add it to the prefix. This case has been taken care of in
>> get_from_clause_item(),
>> 6505 else if (rte->rtekind == RTE_RELATION &&
>> 6506 strcmp(rte->eref->aliasname,
>> get_relation_name(rte->relid)) != 0)
>> 6507 {
>> 6508 /*
>> 6509 * Apparently the rel has been renamed since the rule
>> was made.
>> 6510 * Emit a fake alias clause so that variable references
>> will still
>> 6511 * work. This is not a 100% solution but should work in
>> most
>> 6512 * reasonable situations.
>> 6513 */
>> 6514 appendStringInfo(buf, " %s",
>> 6515 quote_identifier(rte->eref->aliasname));
>> 6516 gavealias = true;
>> 6517 }
>>
>> I see similar code in ExplainTargetRel()
>> 1778 if (objectname == NULL ||
>> 1779 strcmp(rte->eref->aliasname, objectname) != 0)
>> 1780 appendStringInfo(es->str, " %s",
>> 1781 quote_identifier(rte->eref->aliasname));
>>
>> Based on this, here is patch to not add schemaname in the prefix for a
>> variable.
>>
>> I have run make check. All except inherit.sql passed. The expected output
>> change is included in the patch.
>>
>> --
>> Best Wishes,
>> Ashutosh Bapat
>> EntepriseDB Corporation
>> The Enterprise Postgres Company
>>
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>>
> A table can inherit from one or more parent table. So in that case,
> qualifying schema/table name
> helps in finding out where the column is coming from.
>

Do you have any example of this case? From the code it does not look like
that. Even if a table is inherited from more than one parent, the aliasname
is set to the name of the parent on which the scan is executed, it will
have only one alias. The case you are talking about can happen in case more
than one of these parents is included in the query. But, in such case there
will be multiple RTEs for same child table each coming from corresponding
parent, and thus will have corresponding aliasname.

>
> Regards,
> Chetan
>
> --
> EnterpriseDB Corporation
> The Enterprise PostgreSQL Company
>
> Website: www.enterprisedb.com
> EnterpriseDB Blog : http://blogs.enterprisedb.com
> Follow us on Twitter : http://www.twitter.com/enterprisedb
>
>
>
>
>

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-01-27 17:56:19
Message-ID:	CA+Tgmobqtszrhpsj0-ES7R4B+R0UtAzJCi2D+N-5fFjErn6icw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Jan 11, 2012 at 6:43 AM, Ashutosh Bapat
<ashutosh(dot)bapat(at)enterprisedb(dot)com> wrote:
> Hi,
> After running regression, I ran EXPLAIN on one of the queries in regression
> (test create_misc) and got following output
> regression=# explain verbose select * into table ramp from road where name ~
> '.*Ramp';
>                                      QUERY
> PLAN
> ------------------------------------------------------------------------------------
> Result (cost=0.00..154.00 rows=841 width=67)
>    Output: public.road.name, public.road.thepath
>    -> Append (cost=0.00..154.00 rows=841 width=67)
>          -> Seq Scan on public.road (cost=0.00..135.05 rows=418 width=67)
>                Output: public.road.name, public.road.thepath
>                Filter: (public.road.name ~ '.*Ramp'::text)
>          -> Seq Scan on public.ihighway road (cost=0.00..14.99 rows=367
> width=67)
>                                                         ^^^^^
>                Output: public.road.name, public.road.thepath
>                            ^^^^^^^^^^,           ^^^^^^
>                Filter: (public.road.name ~ '.*Ramp'::text)
>                          ^^^^^^^^^^^
>          -> Seq Scan on public.shighway road (cost=0.00..3.96 rows=56
> width=67)
>                Output: public.road.name, public.road.thepath
>                Filter: (public.road.name ~ '.*Ramp'::text)
> (12 rows)
>
> regression=# \d+ road
>                         Table "public.road"
> Column | Type | Modifiers | Storage | Stats target | Description
> ---------+------+-----------+----------+--------------+-------------
> name    | text |           | extended |              |
> thepath | path |           | extended |              |
> Indexes:
>     "rix" btree (name)
> Child tables: ihighway,
>               shighway
> Has OIDs: no
>
> Table "road" has children "ihighway" and "shighway" as seen in the \d+
> output above. The EXPLAIN output of Seq Scan node on children has
> "public.road" as prefix for variables. "public.road" could imply the parent
> table "road" and thus can cause confusion, as to what's been referreed, the
> columns of parent table or child table. In the EXPLAIN output children
> tables have "road" as alias (as against "public.road"). The alias comes from
> RangeTblEntry->eref->aliasname. It might be better to have "road" as prefix
> in the variable names over "public.road".

It's a feature, not a bug, that we schema-qualify names when VERBOSE
is specified. That was done on purpose for the benefit of external
tools that might need this information to disambiguate which object is
being referenced.

Table *aliases*, of course, should not be schema-qualified, but I
don't think that's what we're doing. You could make it more clear by
including an alias in the query, like this:

explain verbose select * into table ramp from road hwy where name ~ '.*Ramp';

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-01-28 02:38:17
Message-ID:	29791.1327718297@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> It's a feature, not a bug, that we schema-qualify names when VERBOSE
> is specified. That was done on purpose for the benefit of external
> tools that might need this information to disambiguate which object is
> being referenced.

> Table *aliases*, of course, should not be schema-qualified, but I
> don't think that's what we're doing. You could make it more clear by
> including an alias in the query, like this:

> explain verbose select * into table ramp from road hwy where name ~ '.*Ramp';

I think you are both focusing on the wrong thing. There is a lot of
squishiness in what EXPLAIN prints out, since SQL notation is not always
well suited to what an execution plan actually does. But this code has
a hard and fast requirement that it dump view definitions correctly,
else pg_dump doesn't work. And after looking at this I think Ashutosh
has in fact found a bug. Consider this example:

regression=# create schema s1;
CREATE SCHEMA
regression=# create schema s2;
CREATE SCHEMA
regression=# create table s1.t1 (f1 int);
CREATE TABLE
regression=# create table s2.t1 (f1 int);
CREATE TABLE
regression=# create view v1 as
regression-# select * from s1.t1 where exists (
regression(# select 1 from s2.t1 where s2.t1.f1 = s1.t1.f1
regression(# );
CREATE VIEW
regression=# \d+ v1
View "public.v1"
Column | Type | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
f1 | integer | | plain |
View definition:
SELECT t1.f1
FROM s1.t1
WHERE (EXISTS ( SELECT 1
FROM s2.t1
WHERE t1.f1 = s1.t1.f1));

Both of the above displays of the view are formally correct, in that the
variables will be taken to refer to the correct upper or lower RTE.
But let's change that back and rename the other table:

regression=# alter table s2.tx rename to t1;
ALTER TABLE
regression=# alter table s1.t1 rename to tx;
ALTER TABLE
regression=# \d+ v1
View "public.v1"
Column | Type | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
f1 | integer | | plain |
View definition:
SELECT t1.f1
FROM s1.tx t1
WHERE (EXISTS ( SELECT 1
FROM s2.t1
WHERE t1.f1 = s1.t1.f1));

This is just plain wrong, as you'll see if you try to execute that
query:

regression=# SELECT t1.f1
regression-# FROM s1.tx t1
regression-# WHERE (EXISTS ( SELECT 1
regression(# FROM s2.t1
regression(# WHERE t1.f1 = s1.t1.f1));
ERROR: invalid reference to FROM-clause entry for table "t1"
LINE 5: WHERE t1.f1 = s1.t1.f1));
^
HINT: There is an entry for table "t1", but it cannot be referenced
from this part of the query.

(The HINT is a bit confused here, but the query is certainly invalid.)

So what we have here is a potential failure to dump and reload view
definitions, which is a lot more critical in my book than whether
EXPLAIN's output is confusing.

If we stick with the existing rule for attaching a fake alias to renamed
RTEs, I think that Ashutosh's patch or something like it is probably
appropriate, because the variable-printing code ought to be in step with
the RTE-printing code. Unfortunately, I think the hack to attach a fake
alias to renamed RTEs creates some issues of its own. Consider

select * from s1.t1
where exists (select 1 from s2.t2 t1 where t1.f1 = s1.t1.f1);

If s1.t1 is now renamed to s1.tx, it is still possible to express
the same semantics:

select * from s1.tx
where exists (select 1 from s2.t2 t1 where t1.f1 = s1.tx.f1);

But when we attach a fake alias, it's broken:

select * from s1.tx t1
where exists (select 1 from s2.t2 t1 where t1.f1 = ?.f1);

There is no way to reference the outer RTE anymore from the subquery,
because the conflicting lower alias masks it.

We may be between a rock and a hard place though, because it's not that
hard to demonstrate cases where not adding a fake alias breaks it too:

select * from s1.t1 tx
where exists (select 1 from s2.t1 where s2.t1.f1 = tx.f1);

If s2.t1 is renamed to s2.tx, there's no longer any way to reference the
upper alias tx, unless you alias the lower RTE to some different name.
I think that when we put in the fake-alias behavior, we made a value
judgment that this type of situation was more common than the other,
but I'm not really sure why.

Maybe what we need to do instead is create totally-made-up, unique
aliases when something like this happens.

regards, tom lane

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-01-30 11:07:42
Message-ID:	CAFjFpRcBVtVo260n-LGyFYsZTHTG+njDnorGZzCsLt3A5HfELQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Thanks Tom for giving a stronger case. I found the problem whille looking
at inherited tables, and didn't think beyond inherited tables. Since
inherited tables are expanded when subquery planner is invoked, I thought
the problem will occur only in Explain output as we won't generate queries,
that can be used elsewhere after/during planning.

So, as I understand we have two problems here
1. Prefixing schemaname to the fake alises if there is another RTE with
same name. There may not be a relation with that name (fake alias name
given) in the schema chosen as prefix.
2. Fake aliases themselves can be conflicting.

If I understand correctly, if we solve the second problem, first problem
will not occur. Is that correct?

On Sat, Jan 28, 2012 at 8:08 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> > It's a feature, not a bug, that we schema-qualify names when VERBOSE
> > is specified. That was done on purpose for the benefit of external
> > tools that might need this information to disambiguate which object is
> > being referenced.
>
> > Table *aliases*, of course, should not be schema-qualified, but I
> > don't think that's what we're doing. You could make it more clear by
> > including an alias in the query, like this:
>
> > explain verbose select * into table ramp from road hwy where name ~
> '.*Ramp';
>
> I think you are both focusing on the wrong thing. There is a lot of
> squishiness in what EXPLAIN prints out, since SQL notation is not always
> well suited to what an execution plan actually does. But this code has
> a hard and fast requirement that it dump view definitions correctly,
> else pg_dump doesn't work. And after looking at this I think Ashutosh
> has in fact found a bug. Consider this example:
>
> regression=# create schema s1;
> CREATE SCHEMA
> regression=# create schema s2;
> CREATE SCHEMA
> regression=# create table s1.t1 (f1 int);
> CREATE TABLE
> regression=# create table s2.t1 (f1 int);
> CREATE TABLE
> regression=# create view v1 as
> regression-# select * from s1.t1 where exists (
> regression(# select 1 from s2.t1 where s2.t1.f1 = s1.t1.f1
> regression(# );
> CREATE VIEW
> regression=# \d+ v1
> View "public.v1"
> Column | Type | Modifiers | Storage | Description
> --------+---------+-----------+---------+-------------
> f1 | integer | | plain |
> View definition:
> SELECT t1.f1
> FROM s1.t1
> WHERE (EXISTS ( SELECT 1
> FROM s2.t1
> WHERE t1.f1 = s1.t1.f1));
>
> regression=# alter table s2.t1 rename to tx;
> ALTER TABLE
> regression=# \d+ v1
> View "public.v1"
> Column | Type | Modifiers | Storage | Description
> --------+---------+-----------+---------+-------------
> f1 | integer | | plain |
> View definition:
> SELECT t1.f1
> FROM s1.t1
> WHERE (EXISTS ( SELECT 1
> FROM s2.tx t1
> WHERE t1.f1 = s1.t1.f1));
>
> Both of the above displays of the view are formally correct, in that the
> variables will be taken to refer to the correct upper or lower RTE.
> But let's change that back and rename the other table:
>
> regression=# alter table s2.tx rename to t1;
> ALTER TABLE
> regression=# alter table s1.t1 rename to tx;
> ALTER TABLE
> regression=# \d+ v1
> View "public.v1"
> Column | Type | Modifiers | Storage | Description
> --------+---------+-----------+---------+-------------
> f1 | integer | | plain |
> View definition:
> SELECT t1.f1
> FROM s1.tx t1
> WHERE (EXISTS ( SELECT 1
> FROM s2.t1
> WHERE t1.f1 = s1.t1.f1));
>
> This is just plain wrong, as you'll see if you try to execute that
> query:
>
> regression=# SELECT t1.f1
> regression-# FROM s1.tx t1
> regression-# WHERE (EXISTS ( SELECT 1
> regression(# FROM s2.t1
> regression(# WHERE t1.f1 = s1.t1.f1));
> ERROR: invalid reference to FROM-clause entry for table "t1"
> LINE 5: WHERE t1.f1 = s1.t1.f1));
> ^
> HINT: There is an entry for table "t1", but it cannot be referenced
> from this part of the query.
>
> (The HINT is a bit confused here, but the query is certainly invalid.)
>
> So what we have here is a potential failure to dump and reload view
> definitions, which is a lot more critical in my book than whether
> EXPLAIN's output is confusing.
>
> If we stick with the existing rule for attaching a fake alias to renamed
> RTEs, I think that Ashutosh's patch or something like it is probably
> appropriate, because the variable-printing code ought to be in step with
> the RTE-printing code. Unfortunately, I think the hack to attach a fake
> alias to renamed RTEs creates some issues of its own. Consider
>
> select * from s1.t1
> where exists (select 1 from s2.t2 t1 where t1.f1 = s1.t1.f1);
>
> If s1.t1 is now renamed to s1.tx, it is still possible to express
> the same semantics:
>
> select * from s1.tx
> where exists (select 1 from s2.t2 t1 where t1.f1 = s1.tx.f1);
>
> But when we attach a fake alias, it's broken:
>
> select * from s1.tx t1
> where exists (select 1 from s2.t2 t1 where t1.f1 = ?.f1);
>
> There is no way to reference the outer RTE anymore from the subquery,
> because the conflicting lower alias masks it.
>
> We may be between a rock and a hard place though, because it's not that
> hard to demonstrate cases where not adding a fake alias breaks it too:
>
> select * from s1.t1 tx
> where exists (select 1 from s2.t1 where s2.t1.f1 = tx.f1);
>
> If s2.t1 is renamed to s2.tx, there's no longer any way to reference the
> upper alias tx, unless you alias the lower RTE to some different name.
> I think that when we put in the fake-alias behavior, we made a value
> judgment that this type of situation was more common than the other,
> but I'm not really sure why.
>
> Maybe what we need to do instead is create totally-made-up, unique
> aliases when something like this happens.
>
> regards, tom lane
>

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-01-30 16:56:27
Message-ID:	16176.1327942587@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
> So, as I understand we have two problems here
> 1. Prefixing schemaname to the fake alises if there is another RTE with
> same name. There may not be a relation with that name (fake alias name
> given) in the schema chosen as prefix.
> 2. Fake aliases themselves can be conflicting.

Well, the issue is more that a fake alias might unintentionally collide
with a regular alias elsewhere in the query. There's no guard against
that in the current behavior, and ISTM there needs to be.

The one possibly-simplifying thing about this whole issue is that we
needn't cater for references that couldn't have been made in the
original query. For instance, if the inner and outer queries both have
explicit aliases "tx", it's impossible for the inner query to have
referred to any columns of the outer "tx" --- so we don't have to try to
make it possible in the dumped form.

> If I understand correctly, if we solve the second problem, first problem
> will not occur. Is that correct?

I don't believe that the problem has anything to do with the names of
other tables that might happen to exist in the database. It's a matter
of what RTE names/aliases are exposed for variable references in
different parts of the query.

regards, tom lane

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-01-31 03:56:47
Message-ID:	CAFjFpRf-Tzo=eO_EFn07ZmsRrmqi2X06Pv6G8hg0JaHAzeB5iQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I don't believe that the problem has anything to do with the names of

> other tables that might happen to exist in the database. It's a matter
> of what RTE names/aliases are exposed for variable references in
> different parts of the query.
>
>
Names of other tables come into picture when we schema qualify the fake
aliases in the generated query. See examples in first post.

> regards, tom lane
>

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-02-01 12:38:06
Message-ID:	CAFjFpRfzR0t0KcVQ8cL_bnvxMhP1sLdsyxGJMXvDbp=DSi3CBw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Looking at the code, it seems that the fake aliases (eref) for relations
(may be views as well) are not generated per say, but they do not get
changed when the relation name changes OR in case of inherited tables, they
do not get changed when the inheritance is expanded
(expand_inherited_rtentry). So, there is not question of generating them so
as to not collide with other aliases in the query. However I did not find
answers to these questions
1. What is the use of eref in case of relation when the relation name
itself can be provided by the reloid?
2. Can we use schema qualified relation name in get_from_clause_item() and
get_variable() instead of use eref->aliasname. I have noticed that the
logic in get_rte_attribute_name() gives preference to the column names in
catalog tables over eref->colnames.

Anyone?

On Mon, Jan 30, 2012 at 10:26 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
> > So, as I understand we have two problems here
> > 1. Prefixing schemaname to the fake alises if there is another RTE with
> > same name. There may not be a relation with that name (fake alias name
> > given) in the schema chosen as prefix.
> > 2. Fake aliases themselves can be conflicting.
>
> Well, the issue is more that a fake alias might unintentionally collide
> with a regular alias elsewhere in the query. There's no guard against
> that in the current behavior, and ISTM there needs to be.
>
> The one possibly-simplifying thing about this whole issue is that we
> needn't cater for references that couldn't have been made in the
> original query. For instance, if the inner and outer queries both have
> explicit aliases "tx", it's impossible for the inner query to have
> referred to any columns of the outer "tx" --- so we don't have to try to
> make it possible in the dumped form.
>
> > If I understand correctly, if we solve the second problem, first problem
> > will not occur. Is that correct?
>
> I don't believe that the problem has anything to do with the names of
> other tables that might happen to exist in the database. It's a matter
> of what RTE names/aliases are exposed for variable references in
> different parts of the query.
>
> regards, tom lane
>

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-02-01 17:23:45
Message-ID:	17524.1328117025@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
> Looking at the code, it seems that the fake aliases (eref) for relations
> (may be views as well) are not generated per say, but they do not get
> changed when the relation name changes OR in case of inherited tables, they
> do not get changed when the inheritance is expanded
> (expand_inherited_rtentry). So, there is not question of generating them so
> as to not collide with other aliases in the query.

Well, what I was considering was exactly generating new aliases that
don't collide with anything else in the query. The fact that the code
doesn't do that now doesn't mean we can't make it do that.

> However I did not find answers to these questions
> 1. What is the use of eref in case of relation when the relation name
> itself can be provided by the reloid?

eref is stored mainly so that parsing code doesn't have to repeatedly
look up what the effective RTE name is. The alias field is meant to
represent whether there was an AS clause or not, and if so exactly what
it said. So eref is a derived result whereas alias is essentially raw
grammar output. Because of the possibility that the relation gets
renamed, it's probably best if we don't rely on eref anymore after
initial parsing of a query, ie ruleutils.c probably shouldn't use it.
(Too lazy to go check right now if that's already true, but it seems
like a good goal to pursue if we're going to change this code.)

> 2. Can we use schema qualified relation name in get_from_clause_item() and
> get_variable() instead of use eref->aliasname.

No. If there is an alias, it is flat wrong to use the relation name
instead, with or without schema name. You might want to go study the
SQL spec a bit in this area.

> I have noticed that the
> logic in get_rte_attribute_name() gives preference to the column names in
> catalog tables over eref->colnames.

Hm. What it should probably do is look at alias first, and if the alias
field doesn't specify a column name, then go to the catalogs to get the
current name.

Thinking about this some more, it seems like there are ways for a user
to shoot himself in the foot pretty much irretrievably. Consider

CREATE TABLE t (x int);
CREATE VIEW v AS SELECT y FROM t t(y);
ALTER TABLE t ADD COLUMN y int;

On dump and reload, we'll have

CREATE TABLE t (x int, y int);
CREATE VIEW v AS SELECT y FROM t t(y);

and now the CREATE VIEW will fail, complaining (correctly) that the
column reference "y" is ambiguous. Should ruleutils be expected to take
it upon itself to prevent that? We could conceive of "fixing" it by
inventing column aliases out of whole cloth:

CREATE VIEW v AS SELECT y FROM t t(y, the_other_y);

but that seems a little much, not to mention that such a view definition
would be horribly confusing to work with. On the other hand it isn't
all that far beyond what I had in mind of inventing relation aliases
to cure relation-name conflicts. Should we take the existence of such
cases as evidence that we shouldn't try hard in this area? It seems
reasonable to me to try to handle relation renames but draw the line
at disambiguating column names. But others might find that distinction
artificial.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-02-01 17:31:04
Message-ID:	CA+TgmoZsEp7Os5yAcTTKHD2gJf6Yt9-Vqe2eAXeOd07G7xRT6g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 1, 2012 at 12:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> On the other hand it isn't
> all that far beyond what I had in mind of inventing relation aliases
> to cure relation-name conflicts. Should we take the existence of such
> cases as evidence that we shouldn't try hard in this area? It seems
> reasonable to me to try to handle relation renames but draw the line
> at disambiguating column names. But others might find that distinction
> artificial.

I sure do.

I mean, in Oracle, if you rename a table or column involved in a view,
then the view breaks. Blammo! The reference is by object name, not
by some internal identifier a la OID. If you put back an object with
the correct name (either the original one or a different one), you can
re-enable the view.

We've decide that we don't want that behavior: instead, our references
are to the object itself rather than to the name of the object.
Renaming the object doesn't change what the reference points to. But
given that position, it seems to me that we ought to be willing to
work pretty hard to make sure that when we dump-and-reload the
database, things stay sane. Otherwise, we're sort of in this
unsatisfying in-between place where references are *mostly* by
internal identifier but everyone once in a while it falls apart and
name collisions can break everything. Yech!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-02-02 01:44:22
Message-ID:	CAFjFpRdxUXzq_C6TGV9g5yLAd7tmrVeWk9AZfV_yZo_YYTff_w@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 1, 2012 at 10:53 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
> > Looking at the code, it seems that the fake aliases (eref) for relations
> > (may be views as well) are not generated per say, but they do not get
> > changed when the relation name changes OR in case of inherited tables,
> they
> > do not get changed when the inheritance is expanded
> > (expand_inherited_rtentry). So, there is not question of generating them
> so
> > as to not collide with other aliases in the query.
>
> Well, what I was considering was exactly generating new aliases that
> don't collide with anything else in the query. The fact that the code
> doesn't do that now doesn't mean we can't make it do that.
>
> > However I did not find answers to these questions
> > 1. What is the use of eref in case of relation when the relation name
> > itself can be provided by the reloid?
>
> eref is stored mainly so that parsing code doesn't have to repeatedly
> look up what the effective RTE name is. The alias field is meant to
> represent whether there was an AS clause or not, and if so exactly what
> it said. So eref is a derived result whereas alias is essentially raw
> grammar output. Because of the possibility that the relation gets
> renamed, it's probably best if we don't rely on eref anymore after
> initial parsing of a query, ie ruleutils.c probably shouldn't use it.
> (Too lazy to go check right now if that's already true, but it seems
> like a good goal to pursue if we're going to change this code.)
>
> > 2. Can we use schema qualified relation name in get_from_clause_item()
> and
> > get_variable() instead of use eref->aliasname.
>
> No. If there is an alias, it is flat wrong to use the relation name
> instead, with or without schema name. You might want to go study the
> SQL spec a bit in this area.
>

To clarify matters a bit, item 2 is in conjunction with item 1. Aliases, if
provided, are output irrespective of whether we get the relation name from
eref or catalogs. ruleutils should just ignore eref (for RTE_RELATION
only) and get the relation name from given OID.

>
> > I have noticed that the
> > logic in get_rte_attribute_name() gives preference to the column names in
> > catalog tables over eref->colnames.
>
> Hm. What it should probably do is look at alias first, and if the alias
> field doesn't specify a column name, then go to the catalogs to get the
> current name.
>

It does give preference to aliases today. I compared preferences of
colnames in eref and that obtained from catalogs.

>
>
> Thinking about this some more, it seems like there are ways for a user
> to shoot himself in the foot pretty much irretrievably. Consider
>
> CREATE TABLE t (x int);
> CREATE VIEW v AS SELECT y FROM t t(y);
> ALTER TABLE t ADD COLUMN y int;
>
> On dump and reload, we'll have
>
> CREATE TABLE t (x int, y int);
> CREATE VIEW v AS SELECT y FROM t t(y);
>
> and now the CREATE VIEW will fail, complaining (correctly) that the
> column reference "y" is ambiguous. Should ruleutils be expected to take
> it upon itself to prevent that? We could conceive of "fixing" it by
> inventing column aliases out of whole cloth:
>
> CREATE VIEW v AS SELECT y FROM t t(y, the_other_y);
>
> but that seems a little much, not to mention that such a view definition
> would be horribly confusing to work with. On the other hand it isn't
> all that far beyond what I had in mind of inventing relation aliases
> to cure relation-name conflicts. Should we take the existence of such
> cases as evidence that we shouldn't try hard in this area? It seems
> reasonable to me to try to handle relation renames but draw the line
> at disambiguating column names. But others might find that distinction
> artificial.
>

I agree. The example of the colnames was only to show that the preference
alias > relation information from catalogs > eref exists somewhere in the
code.

>
> regards, tom lane
>

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company

From:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-02-02 01:56:18
Message-ID:	CAFjFpRfY95-ybCpoWKEeNx+jG7xrRuM8wWK2fY2W_mpSKELrWw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, Feb 1, 2012 at 11:01 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Feb 1, 2012 at 12:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > On the other hand it isn't
> > all that far beyond what I had in mind of inventing relation aliases
> > to cure relation-name conflicts. Should we take the existence of such
> > cases as evidence that we shouldn't try hard in this area? It seems
> > reasonable to me to try to handle relation renames but draw the line
> > at disambiguating column names. But others might find that distinction
> > artificial.
>
> I sure do.
>
> I mean, in Oracle, if you rename a table or column involved in a view,
> then the view breaks. Blammo! The reference is by object name, not
> by some internal identifier a la OID. If you put back an object with
> the correct name (either the original one or a different one), you can
> re-enable the view.
>
> We've decide that we don't want that behavior: instead, our references
> are to the object itself rather than to the name of the object.
> Renaming the object doesn't change what the reference points to. But
> given that position, it seems to me that we ought to be willing to
> work pretty hard to make sure that when we dump-and-reload the
> database, things stay sane. Otherwise, we're sort of in this
> unsatisfying in-between place where references are *mostly* by
> internal identifier but everyone once in a while it falls apart and
> name collisions can break everything. Yech!
>
>
For me the relation names problem and column aliases problems are two
independent problems. While the first one looks easy to fix, the other
problem may be hard to solve. We can solve the first problem and things
will be "better" than what we have today. If you agree, I will provide a
patch to fix the relation names problems by ignoring the eref (for
RTE_RELATION only) in ruleutils.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

--
Best Wishes,
Ashutosh Bapat
EntepriseDB Corporation
The Enterprise Postgres Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-09-18 17:50:47
Message-ID:	23128.1347990647@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I got interested in this problem again now that we have a user complaint
about it (bug #7553).

Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com> writes:
> On Wed, Feb 1, 2012 at 11:01 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Wed, Feb 1, 2012 at 12:23 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> ... It seems
>>> reasonable to me to try to handle relation renames but draw the line
>>> at disambiguating column names. But others might find that distinction
>>> artificial.

>> I sure do.

> For me the relation names problem and column aliases problems are two
> independent problems.

I think they are independent problems, and I also think that people are
far less likely to trip over column-name problems in practice. Columns
of a table are not independent objects and so people aren't so likely
to think they can just rename them freely. Moreover, if you rename
columns that are used in views, you can get breakage of things like
USING or NATURAL joins, and that is something we *cannot* provide a
workaround for --- it's a failure inherent in the language definition.

As far as the relation-rename problem goes, I propose that what we
should do is have ruleutils.c invent nonconflicting fake aliases for
each RTE in the query tree. This would allow getting rid of some of the
dubious heuristics in get_variable: it should just print the chosen
alias and be done. (It still has to do something different for unnamed
joins, but we can leave that part alone I think.)

We can do this as follows:

1. If there's a user-assigned alias, use that. (It's possible this is
not unique within the query, but that's okay because any actual
variable reference must be to the most closely nested such RTE.)

2. Otherwise, if the relation's current name doesn't conflict with
any previously-assigned alias, use that.

3. Otherwise, append something (underscore and some digits probably)
to the relation's current name to construct a string not matching any
previously-assigned alias.

This might result in printouts that are a bit uglier than the old way
in such cases, but anybody who's offended can select their own aliases.

regards, tom lane

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-09-20 18:55:08
Message-ID:	3138.1348167308@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
> I got interested in this problem again now that we have a user complaint
> about it (bug #7553).
> ...
> As far as the relation-rename problem goes, I propose that what we
> should do is have ruleutils.c invent nonconflicting fake aliases for
> each RTE in the query tree. This would allow getting rid of some of the
> dubious heuristics in get_variable: it should just print the chosen
> alias and be done. (It still has to do something different for unnamed
> joins, but we can leave that part alone I think.)

Attached is a draft patch for this. It fixes the view-dumping problems
that I exhibited in
http://archives.postgresql.org/message-id/29791.1327718297@sss.pgh.pa.us
as well as nicely cleaning up Ashutosh's original complaint at
http://archives.postgresql.org/pgsql-hackers/2012-01/msg00505.php
There are quite a few more changes in the regression-test plan printouts
than was originally discussed, but they seem to be generally for the
better IMO: for instance there is no longer any problem with different
RTEs being printed with identical names in EXPLAIN.

One thing I found while working on this is that some of the
inconsistency is not really EXPLAIN's fault but the planner's: the
planner does not take any trouble to avoid duplicate RTE aliases when it
manufactures additional RTEs, which it does in at least two places
(inheritance expansion and min/max aggregate optimization). In my first
version of the patch I was getting EXPLAIN printouts like this for
inheritance append-plans:

Nested Loop
-> Limit
-> Seq Scan on int4_tbl
-> Append
! -> Index Scan using patest0i on patest0 patest0_1
Index Cond: (id = int4_tbl.f1)
! -> Index Scan using patest1i on patest1
Index Cond: (id = int4_tbl.f1)
! -> Index Scan using patest2i on patest2
Index Cond: (id = int4_tbl.f1)

That happened because the original inheritance-root RTE got the
"patest0" alias, and then the inheritance-child RTE for the parent
relation got stuck with "patest0_1". This isn't terribly desirable
since the inheritance-root RTE isn't actually visible anywhere in
the EXPLAIN printout, so giving it the preferred name isn't ideal.

In the attached I've hacked around this by causing the planner to
assign new aliases to RTEs that it replaces in this way (see planagg.c
and prepunion.c diffs). This seems like a bit of a kluge, but it
doesn't take much code. An alternative that I'm considering is to
have EXPLAIN make a pre-pass over the plan tree to identify which
RTEs will actually be referenced, and then consider only those RTEs
while assigning aliases. This would be a great deal more code though,
and code which would require maintenance every time we add plan node
types etc. So I'm not sure it's really a better answer. Thoughts?

regards, tom lane

Attachment	Content-Type	Size
fix-ruleutils-alias-selection-1.patch	text/x-patch	56.0 KB

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Confusing EXPLAIN output in case of inherited tables
Date:	2012-09-21 17:12:09
Message-ID:	25855.1348247529@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I wrote:
> ... In the attached I've hacked around this by causing the planner to
> assign new aliases to RTEs that it replaces in this way (see planagg.c
> and prepunion.c diffs). This seems like a bit of a kluge, but it
> doesn't take much code. An alternative that I'm considering is to
> have EXPLAIN make a pre-pass over the plan tree to identify which
> RTEs will actually be referenced, and then consider only those RTEs
> while assigning aliases. This would be a great deal more code though,
> and code which would require maintenance every time we add plan node
> types etc. So I'm not sure it's really a better answer. Thoughts?

Attached is a second draft that does it like that. This adds about 130
lines to explain.c compared to the other way, but on reflection it's
probably a better solution compared to trying to kluge things in the
planner. The change in the select_views results shows that there's
at least one other case of duplicated RTE names that I'd not covered
with the two planner kluges.

I think the next question is whether we want to back-patch this.
Although the problem with incorrect view dumping is arguably a data
integrity issue (cf bug #7553), few enough people have hit it that
I'm not sure it's worth taking risks for. I'd feel better about
this code once it'd got through a beta test cycle. Comments?

regards, tom lane

Attachment	Content-Type	Size
fix-ruleutils-alias-selection-2.patch	text/x-patch	57.4 KB