Re: Variadic aggregates vs. project policy

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Variadic aggregates vs. project policy
Date: 2013-08-29 19:55:13
Message-ID: 28755.1377806113@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

So I was hacking away at supporting variadic aggregates (per an internal
request at Salesforce), and had it pretty much working, when I came across
this old comment in opr_sanity.sql:

-- Check that there are not aggregates with the same name and different
-- numbers of arguments. While not technically wrong, we have a project policy
-- to avoid this because it opens the door for confusion in connection with
-- ORDER BY: novices frequently put the ORDER BY in the wrong place.
-- See the fate of the single-argument form of string_agg() for history.
-- The only aggregates that should show up here are count(x) and count(*).

While a variadic-using aggregate doesn't actually trip the associated test
query, it surely violates the spirit of this policy: if you put ORDER BY
in the wrong place the parser will be unable to detect that that wasn't
what you meant.

For context see the thread starting here:
http://www.postgresql.org/message-id/AANLkTikV5ok2tS8t6V+gsAPtE3N6TJq1JpPhMZhG2XL0@mail.gmail.com
In that thread we agreed that this "policy" might be rather squishy,
but we should at least think hard about whether it would be wise to create
built-in aggregates with the same name and different numbers of arguments.

So the question I'm now wondering about is whether this consideration
makes variadic aggregates a bad idea all around, even if we don't have
any built-in ones. Is the risk of user confusion (in the use of their
own aggregate) sufficient reason to reject such a feature?

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-29 20:26:18
Message-ID: CAFj8pRDVg3NJywy5+4ZMp3PHR+vEKzc95fqW92hTq3Hyx5VfdQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/8/29 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> So I was hacking away at supporting variadic aggregates (per an internal
> request at Salesforce), and had it pretty much working, when I came across
> this old comment in opr_sanity.sql:
>
> -- Check that there are not aggregates with the same name and different
> -- numbers of arguments. While not technically wrong, we have a project
> policy
> -- to avoid this because it opens the door for confusion in connection with
> -- ORDER BY: novices frequently put the ORDER BY in the wrong place.
> -- See the fate of the single-argument form of string_agg() for history.
> -- The only aggregates that should show up here are count(x) and count(*).
>
> While a variadic-using aggregate doesn't actually trip the associated test
> query, it surely violates the spirit of this policy: if you put ORDER BY
> in the wrong place the parser will be unable to detect that that wasn't
> what you meant.
>
> For context see the thread starting here:
>
> http://www.postgresql.org/message-id/AANLkTikV5ok2tS8t6V+gsAPtE3N6TJq1JpPhMZhG2XL0@mail.gmail.com
> In that thread we agreed that this "policy" might be rather squishy,
> but we should at least think hard about whether it would be wise to create
> built-in aggregates with the same name and different numbers of arguments.
>
> So the question I'm now wondering about is whether this consideration
> makes variadic aggregates a bad idea all around, even if we don't have
> any built-in ones. Is the risk of user confusion (in the use of their
> own aggregate) sufficient reason to reject such a feature?
>

can be this issue solved by syntax?

In September commitfest is patch for "WITHIN GROUP" where ORDER BY clause
is safety separated from parameters.

Regards

Pavel

> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-29 20:47:30
Message-ID: 4162.1377809250@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> 2013/8/29 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
>> So the question I'm now wondering about is whether this consideration
>> makes variadic aggregates a bad idea all around, even if we don't have
>> any built-in ones. Is the risk of user confusion (in the use of their
>> own aggregate) sufficient reason to reject such a feature?

> can be this issue solved by syntax?
> In September commitfest is patch for "WITHIN GROUP" where ORDER BY clause
> is safety separated from parameters.

That might not be the ugliest syntax the SQL committee ever invented, but
it's right up there. I don't want to go that way, especially not when the
existing precedent for the same feature with regular functions doesn't use
any weird special syntax.

On further reflection, what the "policy" was actually about was not that
we should forbid users from creating potentially-confusing aggregates
themselves, but only that we'd avoid having any *built in* aggregates with
this hazard. So maybe I'm overthinking this, and the correct reading is
just that we should have a policy against built-in variadic aggregates.

regards, tom lane


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-29 20:55:32
Message-ID: CAFj8pRCXJo79usnVGgu3WEjNqSVVX6fSg31qXyqAkgPNg_JZcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/8/29 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> > 2013/8/29 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
> >> So the question I'm now wondering about is whether this consideration
> >> makes variadic aggregates a bad idea all around, even if we don't have
> >> any built-in ones. Is the risk of user confusion (in the use of their
> >> own aggregate) sufficient reason to reject such a feature?
>
> > can be this issue solved by syntax?
> > In September commitfest is patch for "WITHIN GROUP" where ORDER BY
> clause
> > is safety separated from parameters.
>
> That might not be the ugliest syntax the SQL committee ever invented, but
> it's right up there. I don't want to go that way, especially not when the
> existing precedent for the same feature with regular functions doesn't use
> any weird special syntax.
>

It is maybe not nice, but it is long years supported by almost all SQL
servers.

When I talked with Atri, he mentioned, so variadic aggregates are supported
there too.

Regards

Pavel

> On further reflection, what the "policy" was actually about was not that
> we should forbid users from creating potentially-confusing aggregates
> themselves, but only that we'd avoid having any *built in* aggregates with
> this hazard. So maybe I'm overthinking this, and the correct reading is
> just that we should have a policy against built-in variadic aggregates.
>
>
can be this potentially strange situation identified? - and some warning
can be raised.

> regards, tom lane
>


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-29 21:37:31
Message-ID: 521FBF1B.3030305@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom,

> On further reflection, what the "policy" was actually about was not that
> we should forbid users from creating potentially-confusing aggregates
> themselves, but only that we'd avoid having any *built in* aggregates with
> this hazard. So maybe I'm overthinking this, and the correct reading is
> just that we should have a policy against built-in variadic aggregates.

Yes. I think we can assume that anyone smart enough to create a
variadic aggregate is smart enough to put ORDER BY in the right place.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-29 22:04:37
Message-ID: 20130829220436.GA7540@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
> Tom,
>
> > On further reflection, what the "policy" was actually about was not that
> > we should forbid users from creating potentially-confusing aggregates
> > themselves, but only that we'd avoid having any *built in* aggregates with
> > this hazard. So maybe I'm overthinking this, and the correct reading is
> > just that we should have a policy against built-in variadic aggregates.
>
> Yes. I think we can assume that anyone smart enough to create a
> variadic aggregate is smart enough to put ORDER BY in the right place.

I don't think it's a question of "smart enough", but rather how easy it
is to make a mistake. In the referenced thread email, I'm sure many of
us looked at the failing query without realizing what the problem was.

That said, if the question is whether or not to offer variadic
aggregates if they give you the chance to misuse them in this way, the
decision seems quite clear to me. Presumably, in the problematic case
the arguments to the aggregate are going to be constructed
programatically in client code; how would an ORDER BY be accidentally
inserted by such code? If you end up having

someagg('foo','bar','baz' ORDER BY 'qux', 'blarg')
instead of
someagg('foo','bar','baz', 'blarg' ORDER BY 'qux')

I think you're going to realize the problem quickly enough.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-29 22:16:32
Message-ID: 20130829221632.GA5908@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-08-29 15:55:13 -0400, Tom Lane wrote:
> So I was hacking away at supporting variadic aggregates (per an internal
> request at Salesforce), and had it pretty much working, when I came across
> this old comment in opr_sanity.sql:
>
> -- Check that there are not aggregates with the same name and different
> -- numbers of arguments. While not technically wrong, we have a project policy
> -- to avoid this because it opens the door for confusion in connection with
> -- ORDER BY: novices frequently put the ORDER BY in the wrong place.
> -- See the fate of the single-argument form of string_agg() for history.
> -- The only aggregates that should show up here are count(x) and count(*).
>
> While a variadic-using aggregate doesn't actually trip the associated test
> query, it surely violates the spirit of this policy: if you put ORDER BY
> in the wrong place the parser will be unable to detect that that wasn't
> what you meant.
>
> For context see the thread starting here:
> http://www.postgresql.org/message-id/AANLkTikV5ok2tS8t6V+gsAPtE3N6TJq1JpPhMZhG2XL0@mail.gmail.com
> In that thread we agreed that this "policy" might be rather squishy,
> but we should at least think hard about whether it would be wise to create
> built-in aggregates with the same name and different numbers of arguments.
>
> So the question I'm now wondering about is whether this consideration
> makes variadic aggregates a bad idea all around, even if we don't have
> any built-in ones. Is the risk of user confusion (in the use of their
> own aggregate) sufficient reason to reject such a feature?

I vote for abolishing that policy or maybe weakinging it. As you comment
somewhere downthread the policy just prohibits core functions, but even
for those it looks too strong for me. There are some useful variadic
aggregates I'd like to see and I don't think that the kind of errors
prevented by the policy are frequent enough to warrant a blanket
prohibition.
I'd say we let the check in there but have a list of exceptions in it so
that one has to explicitly think about the issue before adding the
function. Some functions are worth the risk, others are not. E.g. the 1
argument form of string_agg() doesn't have enough benefits, but other
functions might.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-29 22:29:34
Message-ID: 10483.1377815374@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-08-29 15:55:13 -0400, Tom Lane wrote:
>> For context see the thread starting here:
>> http://www.postgresql.org/message-id/AANLkTikV5ok2tS8t6V+gsAPtE3N6TJq1JpPhMZhG2XL0@mail.gmail.com
>> In that thread we agreed that this "policy" might be rather squishy,
>> but we should at least think hard about whether it would be wise to create
>> built-in aggregates with the same name and different numbers of arguments.

> I vote for abolishing that policy or maybe weakinging it. As you comment
> somewhere downthread the policy just prohibits core functions, but even
> for those it looks too strong for me. There are some useful variadic
> aggregates I'd like to see and I don't think that the kind of errors
> prevented by the policy are frequent enough to warrant a blanket
> prohibition.

Well, I dunno. We had two different "bug reports" caused by this type of
confusion before string_agg even got out of beta, both from intelligent
people. So I'm not about to discount the potential for confusion.

As we said originally, this is a policy that might be broken for
sufficiently strong cause --- but I don't want to just forget about
the risks.

> I'd say we let the check in there but have a list of exceptions in it so
> that one has to explicitly think about the issue before adding the
> function.

That's pretty much how the tests in opr_sanity work now.

regards, tom lane


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-29 22:37:25
Message-ID: 20130829223725.GE4283@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-08-29 18:29:34 -0400, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > On 2013-08-29 15:55:13 -0400, Tom Lane wrote:
> >> For context see the thread starting here:
> >> http://www.postgresql.org/message-id/AANLkTikV5ok2tS8t6V+gsAPtE3N6TJq1JpPhMZhG2XL0@mail.gmail.com
> >> In that thread we agreed that this "policy" might be rather squishy,
> >> but we should at least think hard about whether it would be wise to create
> >> built-in aggregates with the same name and different numbers of arguments.
>
> > I vote for abolishing that policy or maybe weakinging it. As you comment
> > somewhere downthread the policy just prohibits core functions, but even
> > for those it looks too strong for me. There are some useful variadic
> > aggregates I'd like to see and I don't think that the kind of errors
> > prevented by the policy are frequent enough to warrant a blanket
> > prohibition.
>
> Well, I dunno. We had two different "bug reports" caused by this type of
> confusion before string_agg even got out of beta, both from intelligent
> people. So I'm not about to discount the potential for confusion.
>
> As we said originally, this is a policy that might be broken for
> sufficiently strong cause --- but I don't want to just forget about
> the risks.

> > I'd say we let the check in there but have a list of exceptions in it so
> > that one has to explicitly think about the issue before adding the
> > function.
>
> That's pretty much how the tests in opr_sanity work now.

I basically mean that we should adapt the paragraph you quoted upthread
to roughly say something like:

-- Check that there are not aggregates with the same name and different
-- numbers of arguments. For many aggregates - look for string_agg in
-- the archives for an example - the risk of confusing novices, which
-- place ORDER BY in the wrong place, seems too big. If an aggregate is
-- deemed not to be likely to cause such a problem or provides a feature
-- which doesn't seem possibly to provide in another way that, add an
-- excception for it.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-29 22:44:57
Message-ID: 521FCEE9.1070405@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 08/29/2013 05:37 PM, Josh Berkus wrote:
> Tom,
>
>> On further reflection, what the "policy" was actually about was not that
>> we should forbid users from creating potentially-confusing aggregates
>> themselves, but only that we'd avoid having any *built in* aggregates with
>> this hazard. So maybe I'm overthinking this, and the correct reading is
>> just that we should have a policy against built-in variadic aggregates.
> Yes. I think we can assume that anyone smart enough to create a
> variadic aggregate is smart enough to put ORDER BY in the right place.
>
>

It's not the creator who is in danger, though, it's the user of the
aggregate function, AIUI. So unless you're saying that anyone smart
enough to use a variadic aggregate can be assumed to be smart enough to
put ORDER BY in the right place, I don't think this argument holds.

cheers

andrew


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-30 04:57:13
Message-ID: CAFj8pRBHwLm0_JuHiejuyOK=yeUu6KzSwcYJF1d2kxdKkp7=WA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/8/30 Andrew Dunstan <andrew(at)dunslane(dot)net>

>
> On 08/29/2013 05:37 PM, Josh Berkus wrote:
>
>> Tom,
>>
>> On further reflection, what the "policy" was actually about was not that
>>> we should forbid users from creating potentially-confusing aggregates
>>> themselves, but only that we'd avoid having any *built in* aggregates
>>> with
>>> this hazard. So maybe I'm overthinking this, and the correct reading is
>>> just that we should have a policy against built-in variadic aggregates.
>>>
>> Yes. I think we can assume that anyone smart enough to create a
>> variadic aggregate is smart enough to put ORDER BY in the right place.
>>
>>
>>
> It's not the creator who is in danger, though, it's the user of the
> aggregate function, AIUI. So unless you're saying that anyone smart enough
> to use a variadic aggregate can be assumed to be smart enough to put ORDER
> BY in the right place, I don't think this argument holds.
>

I was one who sent a bug report - this error is not too dangerous, but it
is hidden, and difficult to find, if you don't know what can be happen.
Same as bug with plpgsql and SQL identifier collisions. If you understand,
then you can protect self well and simply. If not, then it is a magic
error. So still I am thing so best solution is

a) a warning when detect ORDER BY in variadic aggregates

b) disallow ORDER BY in variadic aggregates in classic syntax, and enable
it only in WITHIN GROUP syntax where is safe , btw a aggregates that needs
a ORDER BY is better to evaluate different for minimise risk of OOP killer.
This is a good solution without any risks.

Regards

Pavel

>
> cheers
>
> andrew
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/**mailpref/pgsql-hackers<http://www.postgresql.org/mailpref/pgsql-hackers>
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-30 13:13:51
Message-ID: 4510.1377868431@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
> I was one who sent a bug report - this error is not too dangerous, but it
> is hidden, and difficult to find, if you don't know what can be happen.
> Same as bug with plpgsql and SQL identifier collisions. If you understand,
> then you can protect self well and simply. If not, then it is a magic
> error. So still I am thing so best solution is

> a) a warning when detect ORDER BY in variadic aggregates

Such a warning would never be tolerated by users, because it would appear
even when the query is perfectly correct.

> b) disallow ORDER BY in variadic aggregates in classic syntax, and enable
> it only in WITHIN GROUP syntax where is safe ,

And we're *not* inventing randomly different syntax for variadic
aggregates. That ship sailed when we did it this way for regular
functions.

regards, tom lane


From: David Johnston <polobo(at)yahoo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-30 13:34:47
Message-ID: 1377869687956-5769106.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane-2 wrote
> Pavel Stehule &lt;

> pavel.stehule@

> &gt; writes:
>> I was one who sent a bug report - this error is not too dangerous, but it
>> is hidden, and difficult to find, if you don't know what can be happen.
>> Same as bug with plpgsql and SQL identifier collisions. If you
>> understand,
>> then you can protect self well and simply. If not, then it is a magic
>> error. So still I am thing so best solution is
>
>> a) a warning when detect ORDER BY in variadic aggregates
>
> Such a warning would never be tolerated by users, because it would appear
> even when the query is perfectly correct.
>
>> b) disallow ORDER BY in variadic aggregates in classic syntax, and enable
>> it only in WITHIN GROUP syntax where is safe ,
>
> And we're *not* inventing randomly different syntax for variadic
> aggregates. That ship sailed when we did it this way for regular
> functions.

In the example case the problem is that ORDER BY constant is a valid, if
not-very-useful, construct. Can we warn on this specific usage and thus
mitigate many of the potential avenues of mis-use?

If we alter syntax for mitigation purposes I'd want to consider requiring
parentheses around the columns that belong to the ORDER BY instead of using
the full extended syntax of WITHIN GROUP.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Variadic-aggregates-vs-project-policy-tp5768980p5769106.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-30 13:50:30
Message-ID: 20130830135030.GH5019@alap2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2013-08-30 06:34:47 -0700, David Johnston wrote:
> Tom Lane-2 wrote
> >> I was one who sent a bug report - this error is not too dangerous, but it
> >> is hidden, and difficult to find, if you don't know what can be happen.
> >> Same as bug with plpgsql and SQL identifier collisions. If you
> >> understand,
> >> then you can protect self well and simply. If not, then it is a magic
> >> error. So still I am thing so best solution is
> >
> >> a) a warning when detect ORDER BY in variadic aggregates
> >
> > Such a warning would never be tolerated by users, because it would appear
> > even when the query is perfectly correct.
> >
> >> b) disallow ORDER BY in variadic aggregates in classic syntax, and enable
> >> it only in WITHIN GROUP syntax where is safe ,
> >
> > And we're *not* inventing randomly different syntax for variadic
> > aggregates. That ship sailed when we did it this way for regular
> > functions.
>
> In the example case the problem is that ORDER BY constant is a valid, if
> not-very-useful, construct. Can we warn on this specific usage and thus
> mitigate many of the potential avenues of mis-use?

That doesn't help against something like »SELECT string_agg(somecol
ORDER BY bar, separator)« where separator is a column.

> If we alter syntax for mitigation purposes I'd want to consider requiring
> parentheses around the columns that belong to the ORDER BY instead of using
> the full extended syntax of WITHIN GROUP.

I think that ship has sailed. The syntax is there and it's not going
away. Requiring different syntaxes for variadic/nonvariadic usages is
going to be a way much bigger pitfall for users.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: David Johnston <polobo(at)yahoo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-30 14:02:11
Message-ID: 1377871331145-5769119.post@n5.nabble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund-3 wrote
> On 2013-08-30 06:34:47 -0700, David Johnston wrote:
>> Tom Lane-2 wrote
>> >> I was one who sent a bug report - this error is not too dangerous, but
>> it
>> >> is hidden, and difficult to find, if you don't know what can be
>> happen.
>> >> Same as bug with plpgsql and SQL identifier collisions. If you
>> >> understand,
>> >> then you can protect self well and simply. If not, then it is a magic
>> >> error. So still I am thing so best solution is
>> >
>> >> a) a warning when detect ORDER BY in variadic aggregates
>> >
>> > Such a warning would never be tolerated by users, because it would
>> appear
>> > even when the query is perfectly correct.
>> >
>> >> b) disallow ORDER BY in variadic aggregates in classic syntax, and
>> enable
>> >> it only in WITHIN GROUP syntax where is safe ,
>> >
>> > And we're *not* inventing randomly different syntax for variadic
>> > aggregates. That ship sailed when we did it this way for regular
>> > functions.
>>
>> In the example case the problem is that ORDER BY constant is a valid, if
>> not-very-useful, construct. Can we warn on this specific usage and thus
>> mitigate many of the potential avenues of mis-use?
>
> That doesn't help against something like »SELECT string_agg(somecol
> ORDER BY bar, separator)« where separator is a column.
>
>> If we alter syntax for mitigation purposes I'd want to consider requiring
>> parentheses around the columns that belong to the ORDER BY instead of
>> using
>> the full extended syntax of WITHIN GROUP.
>
> I think that ship has sailed. The syntax is there and it's not going
> away. Requiring different syntaxes for variadic/nonvariadic usages is
> going to be a way much bigger pitfall for users.

Neither suggestion (nor any suggestion I would imagine) is going to "solve"
the problem. The goal is to minimize the size of the exposure.

For the second ORDER BY (col1, col2) suggestion it would be added and
recommended so those using that syntax would have less to worry about. This
would apply to ALL invocations, not just variadic.

David J.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Variadic-aggregates-vs-project-policy-tp5768980p5769119.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-30 14:05:44
Message-ID: CAFj8pRAXZEB6BEDT=EOfsoeQwxwZy19Xu46gY=hMA5fBPcJcag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2013/8/30 David Johnston <polobo(at)yahoo(dot)com>

> Andres Freund-3 wrote
> > On 2013-08-30 06:34:47 -0700, David Johnston wrote:
> >> Tom Lane-2 wrote
> >> >> I was one who sent a bug report - this error is not too dangerous,
> but
> >> it
> >> >> is hidden, and difficult to find, if you don't know what can be
> >> happen.
> >> >> Same as bug with plpgsql and SQL identifier collisions. If you
> >> >> understand,
> >> >> then you can protect self well and simply. If not, then it is a
> magic
> >> >> error. So still I am thing so best solution is
> >> >
> >> >> a) a warning when detect ORDER BY in variadic aggregates
> >> >
> >> > Such a warning would never be tolerated by users, because it would
> >> appear
> >> > even when the query is perfectly correct.
> >> >
> >> >> b) disallow ORDER BY in variadic aggregates in classic syntax, and
> >> enable
> >> >> it only in WITHIN GROUP syntax where is safe ,
> >> >
> >> > And we're *not* inventing randomly different syntax for variadic
> >> > aggregates. That ship sailed when we did it this way for regular
> >> > functions.
> >>
> >> In the example case the problem is that ORDER BY constant is a valid, if
> >> not-very-useful, construct. Can we warn on this specific usage and thus
> >> mitigate many of the potential avenues of mis-use?
> >
> > That doesn't help against something like »SELECT string_agg(somecol
> > ORDER BY bar, separator)« where separator is a column.
> >
> >> If we alter syntax for mitigation purposes I'd want to consider
> requiring
> >> parentheses around the columns that belong to the ORDER BY instead of
> >> using
> >> the full extended syntax of WITHIN GROUP.
> >
> > I think that ship has sailed. The syntax is there and it's not going
> > away. Requiring different syntaxes for variadic/nonvariadic usages is
> > going to be a way much bigger pitfall for users.
>
> Neither suggestion (nor any suggestion I would imagine) is going to "solve"
> the problem. The goal is to minimize the size of the exposure.
>
> For the second ORDER BY (col1, col2) suggestion it would be added and
> recommended so those using that syntax would have less to worry about.
> This
> would apply to ALL invocations, not just variadic.
>

you cannot use this syntax - probably, because (col1, col2) is same as
ROW(col1, col2) and this syntax is supported now. So there is a collision.
I had a same idea, but I don't think so it is possible

Regards

Pavel

>
> David J.
>
>
>
>
>
>
> --
> View this message in context:
> http://postgresql.1045698.n5.nabble.com/Variadic-aggregates-vs-project-policy-tp5768980p5769119.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: David Johnston <polobo(at)yahoo(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-30 14:13:01
Message-ID: 6214.1377871981@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Johnston <polobo(at)yahoo(dot)com> writes:
>>> If we alter syntax for mitigation purposes I'd want to consider requiring
>>> parentheses around the columns that belong to the ORDER BY instead of
>>> using the full extended syntax of WITHIN GROUP.

Unfortunately, that ORDER BY syntax is specified by the SQL standard,
and they didn't say anything about parentheses. We don't get to
require parens there.

The particular case that's standardized is only array_agg():

<array aggregate function> ::=
ARRAY_AGG
<left paren> <value expression> [ ORDER BY <sort specification list> ]
<right paren>

but, as we customarily do, we didn't restrict the feature to be used only
with that aggregate.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-30 23:56:26
Message-ID: 23790.1377906986@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> On 2013-08-29 18:29:34 -0400, Tom Lane wrote:
>> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>>> I'd say we let the check in there but have a list of exceptions in it so
>>> that one has to explicitly think about the issue before adding the
>>> function.

>> That's pretty much how the tests in opr_sanity work now.

> I basically mean that we should adapt the paragraph you quoted upthread
> to roughly say something like:

Attached is a complete draft patch for this. I modified the text about
the opr_sanity test a bit (though not exactly as you suggest) and also
added some text in xaggr.sgml pointing out the hazard, so that users can
make informed decisions about whether they want to use the feature.

My basic approach was to share the existing grammar and catalog support
for declaring variadic function arguments. That made it practically free
to add support for storing argument names for aggregates, so I did that
too. However, it was and remains true that aggregates don't support
default values for arguments, nor do we support writing calls in named or
mixed parameter style for them. The same is true of window functions.
I felt that fixing those things was out of scope for this patch, but it
might be something somebody else wants to work on. (I think that fixing
this might be relatively easy for window functions --- just need to get
the planner to apply its existing code for fixing up regular FuncExprs.
But it's less than trivial for aggregates because of our use of
TargetEntry list representation for aggregate parameter lists.)
Addition of default values for aggregates would run up against the same
number-of-arguments ambiguity we were discussing for VARIADIC, but I don't
see why we shouldn't adopt the same we-won't-do-this-but-users-can stance
as for VARIADIC.

Another point is that variadic window functions could stand some love.
I think it partially works, but ruleutils.c at least doesn't know how to
reverse-list such calls correctly. Again, that seemed out of scope for
this patch.

Comments?

regards, tom lane

Attachment Content-Type Size
variadic-aggregates-1.patch text/x-diff 68.1 KB

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-31 02:44:55
Message-ID: 20130831024455.GB5931@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Uh, the pg_dump part checks for version 80400, shouldn't it be 90400?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Variadic aggregates vs. project policy
Date: 2013-08-31 03:02:05
Message-ID: 28790.1377918125@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
> Uh, the pg_dump part checks for version 80400, shouldn't it be 90400?

The reasoning there is that 8.4 is where we added
pg_get_function_arguments(), so this dumping code should work against that
server version or later. (Oh, memo to self: test that.) It's true that
pre-9.4 servers are not going to produce any useful extra info by using
pg_get_function_arguments() on an aggregate; but the general idea of this
patch is to make the aggregate-related code look more like the
plain-function-related code, so using the same version cutoffs in both
cases seemed to make sense.

regards, tom lane