Re: ruleutils vs. empty targetlists

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: ruleutils vs. empty targetlists
Date: 2013-12-03 23:37:23
Message-ID: 32067.1386113843@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thinking some more about bug #8648, it occurred to me that ruleutils.c
isn't exactly prepared for the case either:

regression=# create table nocols();
CREATE TABLE
regression=# create view vv1 as select exists (select * from nocols);
CREATE VIEW
regression=# \d+ vv1
View "public.vv1"
Column | Type | Modifiers | Storage | Description
--------+---------+-----------+---------+-------------
exists | boolean | | plain |
View definition:
SELECT (EXISTS ( SELECT
FROM nocols)) AS "exists";

which of course is illegal syntax. I thought at first that this could be
fixed trivially by emitting "*" if get_target_list found no columns.
However, that doesn't quite work; consider

create view vv2 as select exists (select nocols.* from nocols, somecols);

If we regurgitate this as "exists (select * from nocols, somecols)", it
wouldn't mean the same thing.

But on the third hand, at least in the context of an EXISTS() subquery,
it doesn't really matter because the tlist is irrelevant, at least as long
as it only contains Vars. So you could argue that emitting "*" is plenty
good enough even in the above example. I'm not certain that that applies
everywhere that a tlist could appear, though.

I experimented with code that would attempt to regurgitate "nocols.*"
in the above example; see attached draft patch. I don't like this patch
much: it's ugly, it'd be a pain to backport (because it's digging into
data structures that have changed repeatedly), and I'm not sure how much
I trust it anyway. So I'm leaning towards just doing

+ if (colno == 0)
+ appendStringInfoString(buf, " *");

at least till such time as somebody shows a case where this isn't good
enough.

Note that I wouldn't be thinking of this as something to back-patch
at all, except that if someone did have a view that looked like this,
they'd find that pg_dump output wouldn't restore. You could construct
scenarios where that could seem like a denial of service, particularly
if the DBA wasn't smart enough to figure out what was wrong with his
dump. (And who wants to deal with such a thing at 4AM anyway ...)

Comments?

regards, tom lane

Attachment Content-Type Size
empty-exists-patch-too-complicated.patch text/x-diff 1.6 KB

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ruleutils vs. empty targetlists
Date: 2013-12-04 08:21:26
Message-ID: CAEZATCVU=7ERtVMTL5ThBVYiWUyYHRNMfpgYrz=YJ8jSfBCfcA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 3 December 2013 23:37, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Thinking some more about bug #8648, it occurred to me that ruleutils.c
> isn't exactly prepared for the case either:
>
> regression=# create table nocols();
> CREATE TABLE
> regression=# create view vv1 as select exists (select * from nocols);
> CREATE VIEW
> regression=# \d+ vv1
> View "public.vv1"
> Column | Type | Modifiers | Storage | Description
> --------+---------+-----------+---------+-------------
> exists | boolean | | plain |
> View definition:
> SELECT (EXISTS ( SELECT
> FROM nocols)) AS "exists";
>
> which of course is illegal syntax. I thought at first that this could be
> fixed trivially by emitting "*" if get_target_list found no columns.
> However, that doesn't quite work; consider
>
> create view vv2 as select exists (select nocols.* from nocols, somecols);
>
> If we regurgitate this as "exists (select * from nocols, somecols)", it
> wouldn't mean the same thing.
>
> But on the third hand, at least in the context of an EXISTS() subquery,
> it doesn't really matter because the tlist is irrelevant, at least as long
> as it only contains Vars. So you could argue that emitting "*" is plenty
> good enough even in the above example. I'm not certain that that applies
> everywhere that a tlist could appear, though.
>
> I experimented with code that would attempt to regurgitate "nocols.*"
> in the above example; see attached draft patch. I don't like this patch
> much: it's ugly, it'd be a pain to backport (because it's digging into
> data structures that have changed repeatedly), and I'm not sure how much
> I trust it anyway. So I'm leaning towards just doing
>
> + if (colno == 0)
> + appendStringInfoString(buf, " *");
>
> at least till such time as somebody shows a case where this isn't good
> enough.
>

Well here's a contrived example with grouping:

create table nocols();
create table somecols(a int, b int);
create view v as select exists(select nocols.* from nocols, somecols
group by somecols.a);

Simply turning that tlist into "*" makes the query invalid because
somecols.b doesn't appear in the group by clause. So in this case, it
needs to try to output a tlist that contains no columns.

Regards,
Dean

> Note that I wouldn't be thinking of this as something to back-patch
> at all, except that if someone did have a view that looked like this,
> they'd find that pg_dump output wouldn't restore. You could construct
> scenarios where that could seem like a denial of service, particularly
> if the DBA wasn't smart enough to figure out what was wrong with his
> dump. (And who wants to deal with such a thing at 4AM anyway ...)
>
> Comments?
>
> regards, tom lane
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ruleutils vs. empty targetlists
Date: 2013-12-04 13:46:24
Message-ID: 15973.1386164784@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On 3 December 2013 23:37, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Thinking some more about bug #8648, it occurred to me that ruleutils.c
>> isn't exactly prepared for the case either:
>> ... So I'm leaning towards just doing
>>
>> + if (colno == 0)
>> + appendStringInfoString(buf, " *");
>>
>> at least till such time as somebody shows a case where this isn't good
>> enough.

> Well here's a contrived example with grouping:

> create table nocols();
> create table somecols(a int, b int);
> create view v as select exists(select nocols.* from nocols, somecols
> group by somecols.a);

Hm, that's cute :-(. And I realized last night that my patch's approach
of seizing on some random zero-column RTE isn't too bulletproof against
ALTER TABLE ADD COLUMN operations, either, ie, the referenced table might
have some columns by now. Which is more or less the entire reason we
expand "*" at parse time in the first place.

What I'm thinking about this today is that really the *right* solution
is to allow syntactically-empty SELECT lists; once we've bought into the
notion of zero-column tables, the notion that you can't have an empty
select list is just fundamentally at odds with that. And since you can
already have semantically-empty SELECT lists, this should in theory not
create much risk of new bugs. If we did that, the existing ruleutils
code is just fine, as are any existing dump files containing this sort
of query.

That change might still be thought too aggressive for a back-patch,
though. Comments?

regards, tom lane


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ruleutils vs. empty targetlists
Date: 2013-12-04 14:01:32
Message-ID: 20131204140132.GM5158@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane escribió:

> What I'm thinking about this today is that really the *right* solution
> is to allow syntactically-empty SELECT lists; once we've bought into the
> notion of zero-column tables, the notion that you can't have an empty
> select list is just fundamentally at odds with that. And since you can
> already have semantically-empty SELECT lists, this should in theory not
> create much risk of new bugs. If we did that, the existing ruleutils
> code is just fine, as are any existing dump files containing this sort
> of query.

Wow, as strange-sounding as that is, you're probably correct.

This might probably be seen as a deviation from the standard, but then
so are zero-column tables. Of course, syntactically-empty select lists
would also work with (standard-conforming) tables containing columns,
but it's hard to see that that would be a problem in practice.

> That change might still be thought too aggressive for a back-patch,
> though. Comments?

Well, no correct query will start failing due to this change; the only
visible change would be queries that previously throw errors would start
working. It's hard to see that as a backward-incompatibility.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ruleutils vs. empty targetlists
Date: 2013-12-13 01:14:53
Message-ID: 21007.1386897293@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Tom Lane escribi:
>> What I'm thinking about this today is that really the *right* solution
>> is to allow syntactically-empty SELECT lists; once we've bought into the
>> notion of zero-column tables, the notion that you can't have an empty
>> select list is just fundamentally at odds with that. And since you can
>> already have semantically-empty SELECT lists, this should in theory not
>> create much risk of new bugs. If we did that, the existing ruleutils
>> code is just fine, as are any existing dump files containing this sort
>> of query.

> Wow, as strange-sounding as that is, you're probably correct.

I experimented with this a bit, and found that things seem to pretty much
work, although unsurprisingly there are a couple of regression tests that
expected to get a syntax error and no longer do.

The only thing I've come across that arguably doesn't work is SELECT
DISTINCT:

regression=# select distinct from pg_database;
--
(8 rows)

The reason this says "8 rows" is that the produced plan is just a seqscan
of pg_database returning no columns. The parser produced a Query with an
empty distinctClause, so the planner thinks no unique-ification is
required, so it doesn't produce any sort/uniq or hashagg node. This seems
a bit bogus to me; mathematically, it seems like what this ought to mean
is that all rows are equivalent and so you get just one empty row out.
However, I don't see any practical use-case for that behavior, so I'm
disinclined to spend time on inventing a solution --- and any solution
would probably require a change in Query representation, making it not
back-patchable anyway. Instead I suggest we add an error check in parse
analysis that throws an error for DISTINCT with no columns. If anyone is
ever motivated to make this case do something sane, they can remove that
check and fix up whatever needs fixing up. The attached first-draft patch
doesn't yet have such a prohibition, though.

Another point is that there is a second use of target_list in the grammar,
which is "RETURNING target_list". I did not change that one into
"RETURNING opt_target_list", because if I had, it would have an issue
similar to DISTINCT's: RETURNING with no arguments would be represented
the same as no RETURNING at all, leading to probably not the behavior
the user expected. We've already got that problem if you say "RETURNING
zero_column_table.*", making me wonder if a parse-analysis-time error
for that case wouldn't be a good thing.

Comments?

regards, tom lane

Attachment Content-Type Size
allow-empty-target-list-1.patch text/x-diff 5.6 KB

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ruleutils vs. empty targetlists
Date: 2013-12-13 09:36:00
Message-ID: CAEZATCWENAvwOrJ7+Yj4_gNzy2fj5QGL0tS+w+oL-fKAJSFPyw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13 December 2013 01:14, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The only thing I've come across that arguably doesn't work is SELECT
> DISTINCT:
>
> regression=# select distinct from pg_database;
> --
> (8 rows)
>
> The reason this says "8 rows" is that the produced plan is just a seqscan
> of pg_database returning no columns. The parser produced a Query with an
> empty distinctClause, so the planner thinks no unique-ification is
> required, so it doesn't produce any sort/uniq or hashagg node. This seems
> a bit bogus to me; mathematically, it seems like what this ought to mean
> is that all rows are equivalent and so you get just one empty row out.
> However, I don't see any practical use-case for that behavior, so I'm
> disinclined to spend time on inventing a solution --- and any solution
> would probably require a change in Query representation, making it not
> back-patchable anyway. Instead I suggest we add an error check in parse
> analysis that throws an error for DISTINCT with no columns. If anyone is
> ever motivated to make this case do something sane, they can remove that
> check and fix up whatever needs fixing up. The attached first-draft patch
> doesn't yet have such a prohibition, though.
>
> Another point is that there is a second use of target_list in the grammar,
> which is "RETURNING target_list". I did not change that one into
> "RETURNING opt_target_list", because if I had, it would have an issue
> similar to DISTINCT's: RETURNING with no arguments would be represented
> the same as no RETURNING at all, leading to probably not the behavior
> the user expected. We've already got that problem if you say "RETURNING
> zero_column_table.*", making me wonder if a parse-analysis-time error
> for that case wouldn't be a good thing.
>
> Comments?
>

I can't think of any practical uses for this kind of query, so I don't
think it's worth worrying too much about its results until/unless
someone comes up with a real use-case.

However, given that we currently support queries like "select distinct
* from nocols" (albeit with rather odd results), I don't think we
should start throwing new errors for them. Perhaps the actual risk of
a backwards-compatibility break is small, but so too is any benefit
from adding such new errors.

So +1 for the patch as-is, with no new errors.

Regards,
Dean


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ruleutils vs. empty targetlists
Date: 2013-12-13 15:07:25
Message-ID: 8576.1386947245@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> I can't think of any practical uses for this kind of query, so I don't
> think it's worth worrying too much about its results until/unless
> someone comes up with a real use-case.

> However, given that we currently support queries like "select distinct
> * from nocols" (albeit with rather odd results), I don't think we
> should start throwing new errors for them. Perhaps the actual risk of
> a backwards-compatibility break is small, but so too is any benefit
> from adding such new errors.

> So +1 for the patch as-is, with no new errors.

How about as-is in the back branches, and throw the new errors only
in HEAD?

regards, tom lane


From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ruleutils vs. empty targetlists
Date: 2013-12-13 15:15:00
Message-ID: CAEZATCVyCvpqxVEbcDytEKr2Z0ZHARt8Ak-L9hzKSKgUhVBCfA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 13 December 2013 15:07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
>> I can't think of any practical uses for this kind of query, so I don't
>> think it's worth worrying too much about its results until/unless
>> someone comes up with a real use-case.
>
>> However, given that we currently support queries like "select distinct
>> * from nocols" (albeit with rather odd results), I don't think we
>> should start throwing new errors for them. Perhaps the actual risk of
>> a backwards-compatibility break is small, but so too is any benefit
>> from adding such new errors.
>
>> So +1 for the patch as-is, with no new errors.
>
> How about as-is in the back branches, and throw the new errors only
> in HEAD?
>

Seems reasonable.

Regards,
Dean


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ruleutils vs. empty targetlists
Date: 2013-12-15 00:20:27
Message-ID: 15051.1387066827@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com> writes:
> On 13 December 2013 15:07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> How about as-is in the back branches, and throw the new errors only
>> in HEAD?

> Seems reasonable.

After further experimentation I've concluded that maybe we'd better
not back-patch this change. The reason for my change of heart is
that in 8.4, the patch made plpgsql stop complaining that "return ;"
was missing an expression. While we could possibly fix that, or
just decide not to patch 8.4, it occurs to me that there might be
applications out there that are expecting "SELECT ;" to fail, too.
So the risk of undesirable behavior changes seems a little larger
than I'd previously believed, and I'm feeling that fixing this
corner case in the back branches is not worth that risk.

regards, tom lane