WIP Patch for GROUPING SETS phase 1

Lists: pgsql-hackers
From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-13 18:37:03
Message-ID: CAOeZVidmVRe2jU6aMk_5qkxnB7dfmPROzM7Ur8JPW5j8Y5X-Lw@mail.gmail.com
Lists: pgsql-hackers

This is phase 1 (of either 2 or 3) of the implementation of the standard
GROUPING SETS feature, done by Andrew Gierth and myself.

Unlike previous attempts at this feature, we make no attempt to do
any serious work in the parser; we perform some minor syntactic
simplifications described in the spec, such as removing excess parens,
but the original query structure is preserved in views and so on.

So far, we have done most of the actual work in the executor, but
further phases will concentrate on the planner. We have not yet
tackled the hard problem of generating plans that require multiple
passes over the same input data; see below regarding design issues.

What works so far:

- all the standard syntax is accepted (but many combinations are not
plannable yet)

- while the spec only allows column references in GROUP BY, we
continue to allow arbitrary expressions

- grouping sets which can be computed in a single pass over sorted
data (i.e. anything that can be reduced to simple columns plus one
ROLLUP clause, regardless of how it was specified in the query) are
implemented as part of the existing GroupAggregate executor node (see
the example after this list)

- all kinds of aggregate functions, including ordered set functions
and user-defined aggregates, are supported in conjunction with
grouping sets (no API changes, other than one caveat about fn_extra)

- the GROUPING() operation defined in the spec is implemented,
including support for multiple arguments; as an extension to the spec,
it also accepts arbitrary expressions
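
For example (hypothetical table sales(region, product, amount)), a
query like

    SELECT region, product,
           GROUPING(region, product) AS grouping_level,
           sum(amount) AS total
      FROM sales
     GROUP BY ROLLUP (region, product);

is equivalent to GROUPING SETS ((region, product), (region), ()) and
can therefore be evaluated in a single pass over input sorted on
(region, product).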

Changes/incompatibilities:

- the big compatibility issue: CUBE and ROLLUP are now partially
reserved (col_name_keyword), which breaks contrib/cube. A separate
patch for contrib/ is attached that quotes the cube type and its
functions as "cube" throughout; a new name really needs to be chosen.

- GROUPING is now a fully reserved word, and SETS is an unreserved keyword

- GROUP BY (a,b) now means GROUP BY a,b (as required by the spec);
GROUP BY ROW(a,b) still has the old meaning (see the example after
this list).

- GROUP BY () is now supported too.

- fn_extra for aggregate calls is per-call-site and NOT
per-transition-value - the same fn_extra will be used for interleaved
calls to the transition function with different transition values.
fn_extra, if used at all, should be used only for per-call-site info
such as data types, as clarified in the 9.4beta changes to the ordered
set function implementation.
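
To illustrate the GROUP BY changes, with a hypothetical table t(a, b):

    SELECT a, b, count(*) FROM t GROUP BY (a, b);          -- same as GROUP BY a, b
    SELECT ROW(a, b), count(*) FROM t GROUP BY ROW(a, b);  -- composite grouping, as before
    SELECT count(*) FROM t GROUP BY ();                    -- whole input as a single group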

Future work:

We envisage that handling of arbitrary grouping sets will be best
done by having the planner generate an Append of multiple
aggregation paths, presumably with some way of moving the original
input path to a CTE. We have not really explored yet how hard this
will be; suggestions are welcome.

In the executor, it is obviously possible to extend HashAggregate to
handle arbitrary collections of grouping sets, but even if the memory
usage issue were solved, this would leave the question of what to do
with non-hashable data types, so it seems that the planner work
probably can't be avoided.

A new name needs to be found for the "cube" data type.

At this point we are more interested in design review rather than
necessarily committing this patch in its current state. However,
committing it may make future work easier; we leave that question
open.

Regards,

Atri

Attachment Content-Type Size
groupingsets_ver1.patch text/x-diff 168.3 KB

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-13 18:43:31
Message-ID: CAOeZVif-GbQkjGT28tcqAD-C41o_exzcSTmHG7Ppeuhxp1KYMQ@mail.gmail.com
Lists: pgsql-hackers

Sorry, I forgot to attach the patch for fixing cube in contrib, which
breaks since we now reserve the "cube" keyword. Please find it attached.

Regards,

Atri

Attachment Content-Type Size
cube_contribfixes.patch text/x-diff 228.3 KB

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 06:32:36
Message-ID: 53F59284.1050801@vmware.com
Lists: pgsql-hackers

On 08/13/2014 09:43 PM, Atri Sharma wrote:
> Sorry, forgot to attach the patch for fixing cube in contrib, which breaks
> since we now reserve "cube" keyword. Please find attached the same.

Ugh, that will make everyone using the cube extension unhappy. After
this patch, they will have to quote contrib's cube type and functions
every time.
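
For instance, with CUBE made a col_name_keyword as in the posted patch,
uses like these (hypothetical examples) stop parsing unquoted, since a
col_name_keyword cannot appear as a function or type name:

    SELECT cube(0.0, 1.0);         -- now needs: SELECT "cube"(0.0, 1.0);
    CREATE TABLE boxes (c cube);   -- now needs: CREATE TABLE boxes (c "cube");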

I think we should bite the bullet and rename the extension, and its
"cube" type and functions. For an application, having to suddenly quote
it has the same effect as renaming it; you'll have to find all the
callers and change them. And in the long run, it's clearly better to
have an unambiguous name.

- Heikki


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 07:37:42
Message-ID: 87r40aqph5.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Heikki" == Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:

> On 08/13/2014 09:43 PM, Atri Sharma wrote:
>> Sorry, forgot to attach the patch for fixing cube in contrib,
>> which breaks since we now reserve "cube" keyword. Please find
>> attached the same.

Heikki> Ugh, that will make everyone using the cube extension
Heikki> unhappy. After this patch, they will have to quote contrib's
Heikki> cube type and functions every time.

Heikki> I think we should bite the bullet and rename the extension,

I agree, the contrib/cube patch as posted is purely so we could test
everything without having to argue over the new name first. (And it
is posted separately from the main patch because of its length and
utter boringness.)

However, even if/when a new name is chosen, there's the question of
how to make the upgrade path easiest. Once CUBE is reserved,
up-to-date pg_dump will quote all uses of the "cube" type and function
when dumping an older database (except inside function bodies of
course), so there may be merit in keeping a "cube" domain over the new
type, and maybe also merit in keeping the extension name.

So what's the new type name going to be? cuboid? hypercube?
geometric_cube? n_dimensional_box?

--
Andrew (irc:RhodiumToad)


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 10:28:36
Message-ID: 87mwayqjv6.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers


A progress update:

Atri> We envisage that handling of arbitrary grouping sets will be
Atri> best done by having the planner generate an Append of
Atri> multiple aggregation paths, presumably with some way of moving
Atri> the original input path to a CTE. We have not really explored
Atri> yet how hard this will be; suggestions are welcome.

This idea was abandoned.

Instead, we have implemented full support for arbitrary grouping sets
by means of a chaining system:

explain (verbose, costs off) select four, ten, hundred, count(*) from onek group by cube(four,ten,hundred);

                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------
 GroupAggregate
   Output: four, ten, hundred, count(*)
   Grouping Sets: (onek.hundred, onek.four, onek.ten), (onek.hundred, onek.four), (onek.hundred), ()
   ->  Sort
         Output: four, ten, hundred
         Sort Key: onek.hundred, onek.four, onek.ten
         ->  ChainAggregate
               Output: four, ten, hundred
               Grouping Sets: (onek.ten, onek.hundred), (onek.ten)
               ->  Sort
                     Output: four, ten, hundred
                     Sort Key: onek.ten, onek.hundred
                     ->  ChainAggregate
                           Output: four, ten, hundred
                           Grouping Sets: (onek.four, onek.ten), (onek.four)
                           ->  Sort
                                 Output: four, ten, hundred
                                 Sort Key: onek.four, onek.ten
                                 ->  Seq Scan on public.onek
                                       Output: four, ten, hundred
(20 rows)

The ChainAggregate nodes use a tuplestore to communicate with the
GroupAggregate node at the top of the chain; they pass through input
tuples unchanged, and write aggregated result rows to the tuplestore,
which the top node then returns once it has finished its own result.

The organization of the planner code seems to be actively hostile to
any attempt to break out new CTEs on the fly, or to plan parts of the
query more than once; the method above seems to be the easiest way to
avoid those issues.

Atri> At this point we are more interested in design review rather
Atri> than necessarily committing this patch in its current state.

This no longer applies; we expect to post within a day or two an
updated patch with full functionality.

--
Andrew (irc:RhodiumToad)


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 10:50:09
Message-ID: 53F5CEE1.6030205@vmware.com
Lists: pgsql-hackers

On 08/21/2014 01:28 PM, Andrew Gierth wrote:
>
> A progress update:
>
> Atri> We envisage that handling of arbitrary grouping sets will be
> Atri> best done by having the planner generate an Append of
> Atri> multiple aggregation paths, presumably with some way of moving
> Atri> the original input path to a CTE. We have not really explored
> Atri> yet how hard this will be; suggestions are welcome.
>
> This idea was abandoned.
>
> Instead, we have implemented full support for arbitrary grouping sets
> by means of a chaining system:
>
> explain (verbose, costs off) select four, ten, hundred, count(*) from onek group by cube(four,ten,hundred);
>
>                                               QUERY PLAN
> -----------------------------------------------------------------------------------------------------
>  GroupAggregate
>    Output: four, ten, hundred, count(*)
>    Grouping Sets: (onek.hundred, onek.four, onek.ten), (onek.hundred, onek.four), (onek.hundred), ()
>    ->  Sort
>          Output: four, ten, hundred
>          Sort Key: onek.hundred, onek.four, onek.ten
>          ->  ChainAggregate
>                Output: four, ten, hundred
>                Grouping Sets: (onek.ten, onek.hundred), (onek.ten)
>                ->  Sort
>                      Output: four, ten, hundred
>                      Sort Key: onek.ten, onek.hundred
>                      ->  ChainAggregate
>                            Output: four, ten, hundred
>                            Grouping Sets: (onek.four, onek.ten), (onek.four)
>                            ->  Sort
>                                  Output: four, ten, hundred
>                                  Sort Key: onek.four, onek.ten
>                                  ->  Seq Scan on public.onek
>                                        Output: four, ten, hundred
> (20 rows)

Uh, that's ugly. The EXPLAIN output I mean; as an implementation detail
chaining the nodes might be reasonable. But the above gets unreadable if
you have more than a few grouping sets.

> The ChainAggregate nodes use a tuplestore to communicate with the
> GroupAggregate node at the top of the chain; they pass through input
> tuples unchanged, and write aggregated result rows to the tuplestore,
> which the top node then returns once it has finished its own result.

Hmm, so there's a "magic link" between the GroupAggregate at the top and
all the ChainAggregates, via the tuplestore. That may be fine, we have
special rules in passing information between bitmap scan nodes too.

But rather than chain multiple ChainAggregate nodes, how about just
doing all the work in the top GroupAggregate node?

> Atri> At this point we are more interested in design review rather
> Atri> than necessarily committing this patch in its current state.
>
> This no longer applies; we expect to post within a day or two an
> updated patch with full functionality.

Ok, cool

- Heikki


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 15:00:25
Message-ID: 14962.1408633225@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Heikki" == Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> Heikki> I think we should bite the bullet and rename the extension,

> I agree, the contrib/cube patch as posted is purely so we could test
> everything without having to argue over the new name first.

I wonder if you've tried hard enough to avoid reserving the keyword.

I think that the cube extension is not going to be the only casualty
if "cube" becomes a reserved word --- that seems like a name that
could be in use in lots of applications. ("What do you mean, 9.5
breaks our database for tracking office space?") It would be worth
quite a bit of effort to avoid that.

regards, tom lane


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 15:01:16
Message-ID: 87d2btrkdz.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Heikki" == Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:

Heikki> Uh, that's ugly. The EXPLAIN output I mean; as an implementation
Heikki> detail chaining the nodes might be reasonable. But the above
Heikki> gets unreadable if you have more than a few grouping sets.

It's good for highlighting performance issues in EXPLAIN, too.

4096 grouping sets takes about a third of a second to plan and execute,
but something like a minute to generate the EXPLAIN output. However,
for more realistic sizes, plan time is not significant and explain
takes only about 40ms for 256 grouping sets.

(To avoid resource exhaustion issues, we have set a limit of,
currently, 4096 grouping sets per query level. Without such a limit,
it is easy to write queries that would take TBs of memory to parse or
plan. MSSQL and DB2 have similar limits, I'm told.)
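
(For a sense of scale, a single twelve-column cube already reaches that
limit, since 2^12 = 4096; e.g., with a hypothetical table t having
columns c1 through c12:

    SELECT count(*) FROM t
     GROUP BY CUBE (c1, c2, c3, c4, c5, c6, c7, c8, c9, c10, c11, c12);

Each additional column doubles the count, and multiple CUBE clauses in
one GROUP BY multiply their counts together.)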

>> The ChainAggregate nodes use a tuplestore to communicate with the
>> GroupAggregate node at the top of the chain; they pass through input
>> tuples unchanged, and write aggregated result rows to the tuplestore,
>> which the top node then returns once it has finished its own result.

Heikki> Hmm, so there's a "magic link" between the GroupAggregate at
Heikki> the top and all the ChainAggregates, via the tuplestore. That
Heikki> may be fine, we have special rules in passing information
Heikki> between bitmap scan nodes too.

Eh. It's far from a perfect solution, but the planner doesn't lend itself
to perfect solutions.

Heikki> But rather than chain multiple ChainAggregate nodes, how
Heikki> about just doing all the work in the top GroupAggregate node?

It was easier this way. (How would you expect to do it all in the top
node when each subset of the grouping sets list needs to see the data
in a different order?)

--
Andrew (irc:RhodiumToad)


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 15:05:54
Message-ID: CAFj8pRAAH5tnQEj8OyikoUGNo-UWFvULquNFs+J6kRFkCZGL0w@mail.gmail.com
Lists: pgsql-hackers

2014-08-21 17:00 GMT+02:00 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:

> Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> > "Heikki" == Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> > Heikki> I think we should bite the bullet and rename the extension,
>
> > I agree, the contrib/cube patch as posted is purely so we could test
> > everything without having to argue over the new name first.
>
> I wonder if you've tried hard enough to avoid reserving the keyword.
>
> I think that the cube extension is not going to be the only casualty
> if "cube" becomes a reserved word --- that seems like a name that
> could be in use in lots of applications. ("What do you mean, 9.5
> breaks our database for tracking office space?") It would be worth
> quite a bit of effort to avoid that.
>

My prototypes worked without reserved keywords, if I remember correctly,
but the analyzer was relatively complex.

Pavel



From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 15:58:14
Message-ID: 874mx5rim4.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> I agree, the contrib/cube patch as posted is purely so we could test
>> everything without having to argue over the new name first.

Tom> I wonder if you've tried hard enough to avoid reserving the keyword.

GROUP BY cube(a,b) is currently legal syntax and means something completely
incompatible with what the spec requires.

GROUP BY GROUPING SETS (cube(a,b), c) -- is that cube(a,b) an expression
to group on, or a list of grouping sets to expand?

GROUP BY (cube(a,b)) -- should that be an error, or be silently treated
as a function call rather than a grouping set? What about GROUP BY
GROUPING SETS ((cube(a,b))) ? (both are errors in our patch)

Accepting those as valid implies a degree of possible confusion that I
personally regard as quite questionable. Previous discussion seemed to
have accepted that contrib/cube was going to have to be renamed.
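
To spell out the clash (hypothetical table t, plus the contrib function):

    SELECT cube(1.0, 2.0);         -- contrib/cube: an ordinary function call
    SELECT a, b, count(*)
      FROM t
     GROUP BY CUBE (a, b);         -- spec: shorthand for
                                   -- GROUPING SETS ((a, b), (a), (b), ())

Inside GROUP BY the same spelling has to mean one or the other.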

Tom> I think that the cube extension is not going to be the only
Tom> casualty if "cube" becomes a reserved word --- that seems like a
Tom> name that could be in use in lots of applications. ("What do
Tom> you mean, 9.5 breaks our database for tracking office space?")
Tom> It would be worth quite a bit of effort to avoid that.

It has been a reserved word in the spec since, what, 1999? and it is a
reserved word in mssql, oracle, db2, etc.?

It only needs to be a col_name_keyword, so it still works as a table
or column name (as usual we are less strict than the spec in that
respect). I'm looking into whether it can be made unreserved, but I
have serious doubts about this being a good idea.

--
Andrew (irc:RhodiumToad)


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 16:04:15
Message-ID: CAFj8pRBEzT7QdgtckgK6oL_-LnpkdPbFYJKrLvwgigtgvANKaQ@mail.gmail.com
Lists: pgsql-hackers

2014-08-21 17:58 GMT+02:00 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:

> >>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> >> I agree, the contrib/cube patch as posted is purely so we could test
> >> everything without having to argue over the new name first.
>
> Tom> I wonder if you've tried hard enough to avoid reserving the keyword.
>
> GROUP BY cube(a,b) is currently legal syntax and means something
> completely
> incompatible to what the spec requires.
>
> GROUP BY GROUPING SETS (cube(a,b), c) -- is that cube(a,b) an expression
> to group on, or a list of grouping sets to expand?
>
> GROUP BY (cube(a,b)) -- should that be an error, or silently treat it
> as a function call rather than a grouping set? What about GROUP BY
> GROUPING SETS ((cube(a,b)) ? (both are errors in our patch)
>
> Accepting those as valid implies a degree of possible confusion that I
> personally regard as quite questionable. Previous discussion seemed to
> have accepted that contrib/cube was going to have to be renamed.
>
> Tom> I think that the cube extension is not going to be the only
> Tom> casualty if "cube" becomes a reserved word --- that seems like a
> Tom> name that could be in use in lots of applications. ("What do
> Tom> you mean, 9.5 breaks our database for tracking office space?")
> Tom> It would be worth quite a bit of effort to avoid that.
>
> It has been a reserved word in the spec since, what, 1999? and it is a
> reserved word in mssql, oracle, db2, etc.?
>
> It only needs to be a col_name_keyword, so it still works as a table
> or column name (as usual we are less strict than the spec in that
> respect). I'm looking into whether it can be made unreserved, but I
> have serious doubts about this being a good idea.
>

+1

The contrib module should be renamed; what's more, the current name is
confusing given the usual functionality associated with the words CUBE
and ROLLUP.

Pavel



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 18:13:30
Message-ID: 848.1408644810@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Tom> I wonder if you've tried hard enough to avoid reserving the keyword.

> GROUP BY cube(a,b) is currently legal syntax and means something completely
> incompatible to what the spec requires.

Well, if there are any extant applications that use that exact phrasing,
they're going to be broken in any case. That does not mean that we have
to break every other appearance of "cube". I think that special-casing
appearances of cube(...) in GROUP BY lists might be a feasible approach.

Basically, I'm afraid that unilaterally renaming cube is going to break
enough applications that there will be more people who flat out don't
want this patch than there will be who get benefit from it, and we end
up voting to revert the feature altogether. If you'd like to take that
risk then feel free to charge full steam ahead, but don't say you were
not warned. And don't bother arguing that CUBE is reserved according to
the standard, because that will not make one damn bit of difference
to the people who will be unhappy.

regards, tom lane


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 18:48:52
Message-ID: CAHyXU0yvJQR473bb5u90UVF5oRX4Z2ACt3m8VmU_qdiOwdz1zg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 21, 2014 at 1:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Tom> I wonder if you've tried hard enough to avoid reserving the keyword.
>
>> GROUP BY cube(a,b) is currently legal syntax and means something completely
>> incompatible to what the spec requires.
>
> Well, if there are any extant applications that use that exact phrasing,
> they're going to be broken in any case. That does not mean that we have
> to break every other appearance of "cube". I think that special-casing
> appearances of cube(...) in GROUP BY lists might be a feasible approach.
>
> Basically, I'm afraid that unilaterally renaming cube is going to break
> enough applications that there will be more people who flat out don't
> want this patch than there will be who get benefit from it, and we end
> up voting to revert the feature altogether. If you'd like to take that
> risk then feel free to charge full steam ahead, but don't say you were
> not warned. And don't bother arguing that CUBE is reserved according to
> the standard, because that will not make one damn bit of difference
> to the people who will be unhappy.

I have to respectfully disagree. Certainly, if there is some
reasonable way to not have to change 'cube' then great. But the
tonnage rule applies here: even considering compatibility issues, when
weighing the importance of standard SQL (and, I might add,
exceptionally useful) syntax against a niche extension, 'cube' is going
to have to get out of the way. There are few valid reasons to break
compatibility, but blocking standard syntax is definitely one of them.

merlin


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 20:43:04
Message-ID: 53F659D8.4030107@dunslane.net
Lists: pgsql-hackers


On 08/21/2014 02:48 PM, Merlin Moncure wrote:

>> Basically, I'm afraid that unilaterally renaming cube is going to break
>> enough applications that there will be more people who flat out don't
>> want this patch than there will be who get benefit from it, and we end
>> up voting to revert the feature altogether. If you'd like to take that
>> risk then feel free to charge full steam ahead, but don't say you were
>> not warned. And don't bother arguing that CUBE is reserved according to
>> the standard, because that will not make one damn bit of difference
>> to the people who will be unhappy.
> I have to respectfully disagree. Certainly, if there is some
> reasonable way to not have to change 'cube' then great. But the
> tonnage rule applies here: even considering compatibility issues, when
> weighing the importance of standard SQL (and, I might add,
> exceptionally useful) syntax against a niche extension, 'cube' is going
> to have to get out of the way. There are few valid reasons to break
> compatibility, but blocking standard syntax is definitely one of them.
>

I'm inclined to think that the audience for this is far larger than the
audience for the cube extension, which I have not once encountered in
the field.

But I guess we all have different experiences.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 21:13:33
Message-ID: 9819.1408655613@sss.pgh.pa.us
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> I'm inclined to think that the audience for this is far larger than the
> audience for the cube extension, which I have not once encountered in
> the field.

Perhaps so. I would really prefer not to have to get into estimating
how many people will be inconvenienced how badly. It's clear to me
that not a lot of sweat has been put into seeing if we can avoid
reserving the keyword, and I think we need to put in that effort.
We've jumped through some pretty high hoops to avoid reserving keywords in
the past, so I don't think this patch should get a free pass on the issue.

Especially considering that renaming the cube extension isn't exactly
going to be zero work: there is no infrastructure for such a thing.
A patch consisting merely of s/cube/foobar/g isn't going to cut it.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 22:15:33
Message-ID: 20140821221533.GR16422@tamriel.snowman.net
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > I'm inclined to think that the audience for this is far larger than the
> > audience for the cube extension, which I have not once encountered in
> > the field.

+1

> Perhaps so. I would really prefer not to have to get into estimating
> how many people will be inconvenienced how badly. It's clear to me
> that not a lot of sweat has been put into seeing if we can avoid
> reserving the keyword, and I think we need to put in that effort.

I'm with Merlin on this one: it's going to end up happening, and I don't
know that 9.5 is any worse than post-9.5 to make this change.

> We've jumped through some pretty high hoops to avoid reserving keywords in
> the past, so I don't think this patch should get a free pass on the issue.

This doesn't feel like an attempt to get a free pass on anything; it's
not being proposed as fully reserved, and there is spec-defined syntax
which needs to be supported. If we can get away with keeping it
unreserved while not making it utterly confusing for users and
convoluting the code, great, but that doesn't seem likely to pan out.

> Especially considering that renaming the cube extension isn't exactly
> going to be zero work: there is no infrastructure for such a thing.
> A patch consisting merely of s/cube/foobar/g isn't going to cut it.

This is a much more interesting challenge to deal with, but perhaps we
could include a perl script or pg_upgrade snippet for users to run to
see if they have the extension installed and to do some magic before the
actual upgrade to handle the rename..?
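
(Detecting it is at least easy; something along the lines of

    SELECT extname, extversion FROM pg_extension WHERE extname = 'cube';

run in each database would cover the "is it installed" check; the magic
for the rename itself is the harder part.)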

Thanks,

Stephen


From: David Fetter <david(at)fetter(dot)org>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-21 23:11:02
Message-ID: 20140821231102.GA4967@fetter.org
Lists: pgsql-hackers

On Thu, Aug 21, 2014 at 06:15:33PM -0400, Stephen Frost wrote:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> > Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> > > I'm inclined to think that the audience for this is far larger than the
> > > audience for the cube extension, which I have not once encountered in
> > > the field.
>
> +1

I haven't seen it in the field either.

I'd also like to mention that the mere presence of a module in our
contrib/ directory can reflect a bad decision that needs reversing just
as easily as it can a vitally important utility that needs to be
preserved. I'm pretty sure the cube extension arrived after the CUBE
keyword in SQL, which makes that an error on our part if true.

> > Especially considering that renaming the cube extension isn't
> > exactly going to be zero work: there is no infrastructure for such
> > a thing. A patch consisting merely of s/cube/foobar/g isn't going
> > to cut it.
>
> This is a much more interesting challenge to deal with, but perhaps
> we could include a perl script or pg_upgrade snippet for users to
> run to see if they have the extension installed and to do some magic
> before the actual upgrade to handle the rename..?

+1 for doing this. Do we want to make some kind of generator for such
things? It doesn't seem hard in principle, but I haven't tried coding
it up yet.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 01:54:56
Message-ID: 871ts9pd4h.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Stephen" == Stephen Frost <sfrost(at)snowman(dot)net> writes:

>>> I'm inclined to think that the audience for this is far larger
>>> than the audience for the cube extension, which I have not once
>>> encountered in the field.

Stephen> +1

Most of my encounters with cube have been me suggesting it to people
on IRC as a possible approach for solving certain kinds of performance
problems by converting them to N-dimensional spatial containment
queries. Sometimes this works well, but it doesn't seem to be an
approach that many people discover on their own.
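
(The typical shape is something like this, with a hypothetical table
items having a cube column c and a GiST index on it:

    CREATE INDEX ON items USING gist (c);
    SELECT * FROM items
     WHERE c <@ cube(array[0,0,0], array[1,1,1]);

i.e. an indexable containment test over several dimensions at once.)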

>> We've jumped through some pretty high hoops to avoid reserving
>> keywords in the past, so I don't think this patch should get a
>> free pass on the issue.

Stephen> This doesn't feel like an attempt to get a free pass on
Stephen> anything- it's not being proposed as fully reserved and
Stephen> there is spec-defined syntax which needs to be supported.
Stephen> If we can get away with keeping it unreserved while not
Stephen> making it utterly confusing for users and convoluting the
Stephen> code, great, but that doesn't seem likely to pan out.

Having now spent some more time looking, I believe there is a solution
which makes it unreserved and does not require any significant pain
in the code. I'm not entirely convinced that this is the right
approach in the long term, but it might allow for a more planned
transition.

The absolute minimum seems to be:

GROUPING as a col_name_keyword (since GROUPING(x,y,...) in the select
list as a <grouping operation> looks like a function call for any
argument types)

CUBE, ROLLUP, SETS as unreserved_keyword

--
Andrew (irc:RhodiumToad)


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 07:18:13
Message-ID: 87tx55nhh7.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> Perhaps so. I would really prefer not to have to get into
Tom> estimating how many people will be inconvenienced how badly.
Tom> It's clear to me that not a lot of sweat has been put into
Tom> seeing if we can avoid reserving the keyword, and I think we
Tom> need to put in that effort.

So, the first nontrivial issue that crops up is this: if CUBE is
unreserved, then ruleutils will output the string "cube(a,b)" for a
function call to a function named "cube", on the assumption that it
will parse back as a single unit (which inside a GROUP BY is not
true).

Options:

1) when outputting GROUP BY clauses (and nothing else), put parens
around anything that isn't provably atomic; or put parens around
anything that might look like a function call; or put parens around
anything that looks like a function call with a keyword name

2) when outputting any function call, add parens if the name is an
unreserved keyword

3) when outputting any function call, quote the name if it is an
unreserved keyword

4) something else?

(This of course means that if someone has a cube() function call in
a group by clause of a view, then upgrading will change the meaning
of the view and possibly fail to create it; there seems to be no fix
for this, not even using the latest pg_dump, since pg_dump relies on
the old server's ruleutils)
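
(Concretely, with a hypothetical user-defined function cube(int, int)
and table t:

    CREATE VIEW v AS
      SELECT cube(a, b) AS c, count(*)
        FROM t
       GROUP BY (cube(a, b));   -- option 1: extra parens keep it a plain expression

Printed back without the parens, GROUP BY cube(a, b) would instead be
read as the CUBE grouping construct when the view is reloaded.)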

--
Andrew (irc:RhodiumToad)


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 12:54:27
Message-ID: 20140822125427.GX16422@tamriel.snowman.net
Lists: pgsql-hackers

* Andrew Gierth (andrew(at)tao11(dot)riddles(dot)org(dot)uk) wrote:
> Having now spent some more time looking, I believe there is a solution
> which makes it unreserved which does not require any significant pain
> in the code. I'm not entirely convinced that this is the right
> approach in the long term, but it might allow for a more planned
> transition.
>
> The absolute minimum seems to be:
>
> GROUPING as a col_name_keyword (since GROUPING(x,y,...) in the select
> list as a <grouping operation> looks like a function call for any
> argument types)
>
> CUBE, ROLLUP, SETS as unreserved_keyword

This means

GROUP BY cube(x,y)
GROUP BY (cube(x,y))
GROUP BY cube(x)

all end up with different meanings though, right?

I'm not sure that's really a better situation.

Thanks,

Stephen


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 14:29:42
Message-ID: 20140822142942.GK6343@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Andrew Gierth wrote:

> (This of course means that if someone has a cube() function call in
> a group by clause of a view, then upgrading will change the meaning
> of the view and possibly fail to create it; there seems to be no fix
> for this, not even using the latest pg_dump, since pg_dump relies on
> the old server's ruleutils)

This sucks. Can we tweak pg_dump to check for presence of the cube
extension, and if found refuse to dump unless a minor version older than
some hardcoded version (known to have fixed ruleutils) is used?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 15:34:11
Message-ID: 87fvgoo80r.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Alvaro" == Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:

>> (This of course means that if someone has a cube() function call
>> in a group by clause of a view, then upgrading will change the
>> meaning of the view and possibly fail to create it; there seems to
>> be no fix for this, not even using the latest pg_dump, since
>> pg_dump relies on the old server's ruleutils)

Alvaro> This sucks. Can we tweak pg_dump to check for presence of
Alvaro> the cube extension, and if found refuse to dump unless a
Alvaro> minor version older than some hardcoded version (known to
Alvaro> have fixed ruleutils) is used?

I honestly don't think it's worth it. cube() is not a function that
really makes any sense in a GROUP BY, though of course someone could
have written their own function called cube() that does something
else; while this case is a problem, it is also likely to be
vanishingly rare.

--
Andrew (irc:RhodiumToad)


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 17:31:48
Message-ID: CA+TgmoZ-+yZCKaHo3ejj2VqTwJicBb_TcTZCOpehyJPW_W=W-Q@mail.gmail.com
Lists: pgsql-hackers

On Thu, Aug 21, 2014 at 2:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Tom> I wonder if you've tried hard enough to avoid reserving the keyword.
>
>> GROUP BY cube(a,b) is currently legal syntax and means something completely
>> incompatible to what the spec requires.
>
> Well, if there are any extant applications that use that exact phrasing,
> they're going to be broken in any case. That does not mean that we have
> to break every other appearance of "cube". I think that special-casing
> appearances of cube(...) in GROUP BY lists might be a feasible approach.

Not really. As pointed out downthread, you can't distinguish "cube"
from CUBE. We could fix that with a big enough hammer, of course, but
it would be a mighty big hammer.

More generally, I think it makes a lot of sense to "work harder" to
reserve keywords less when there's no fundamental semantic conflict,
but when there is, trying to do things like special case stuff
depending on context results in a situation where some keywords are a
little more reserved than others. As you pointed out in discussions
of CREATE INDEX CONCURRENTLY, that's confusing:

http://www.postgresql.org/message-id/10769.1261775601@sss.pgh.pa.us
(refer second paragraph)

I think we should:

(1) Rename the cube extension. With a bat. I have yet to encounter a
single user who is using it, but there probably are some. They'll
have to get over it; GROUPING SETS is roughly a hundred times more
important than the cube extension. The most I'd do to cater to
existing users of the extension is provide an SQL script someplace
that renames the extension and all of its containing objects so that
you can do that before running pg_dump/pg_upgrade.

(2) Reserve CUBE to the extent necessary to implement this feature.
Some people won't like this, but that's always true when we reserve a
keyword, and I don't think refusing to implement an SQL-standard
feature for which there is considerable demand is the right way to fix
that. Here are the last five keywords we partially or fully reserved:
LATERAL (fully reserved), COLLATION (type_func_name), XMLEXISTS
(col_name), BETWEEN (was type_func_name, became col_name),
CONCURRENTLY (type_func_name). That takes us back to December 2009,
so the rate at which we do this is just under one a year, and there
haven't been many screams about it. I grant you that "cube" is a
slightly more plausible identifier than any of those, but I don't
think we should let the fact that we happen to have an extension with
that name prejudice us too much about how common it really is.

Mind you, I'm not trying to say that we don't need to be judicious in
reserving keywords, or even adding them at all: I've argued against
those things on numerous occasions, and have done work to let us get
rid of keywords we've previously had. I just think that this is a big
enough, important enough feature that we'll please more people than we
disappoint. And I think trying to walk some middle way where we
distinguish on context is going to be a mess.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 18:02:42
Message-ID: 4652.1408730562@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Aug 21, 2014 at 2:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Well, if there are any extant applications that use that exact phrasing,
>> they're going to be broken in any case. That does not mean that we have
>> to break every other appearance of "cube". I think that special-casing
>> appearances of cube(...) in GROUP BY lists might be a feasible approach.

> Not really. As pointed out downthread, you can't distinguish "cube"
> from CUBE. We could fix that with a big enough hammer, of course, but
> it would be a mighty big hammer.

I'm not convinced of that; I think some creative hackery in the grammar
might be able to deal with this. It would be a bit ugly, for sure, but
if it works it would be a localized fix. Meanwhile, I don't believe
that it's going to be possible to rename the cube extension in any way
that's even remotely acceptable for its users ("remotely acceptable"
here means "pg_upgrade works", never mind what's going to be needed
to fix their applications). So the proposal you are pushing is going
to result in seriously teeing off some fraction of our userbase;
and the argument why that would be acceptable seems to boil down to
"I think there are few enough of them that we don't have to care"
(an opinion based on little evidence IMO). I think it's worth investing
some work, and perhaps accepting some ugly code, to try to avoid that.

regards, tom lane


From: Greg Stark <stark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 18:42:50
Message-ID: CAM-w4HNi7Qq19erLYitiZE0fdy=0h6s68Kc+Y_MRUFhQf_P8bw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 22, 2014 at 7:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> So the proposal you are pushing is going
> to result in seriously teeing off some fraction of our userbase;
> and the argument why that would be acceptable seems to boil down to
> "I think there are few enough of them that we don't have to care"
> (an opinion based on little evidence IMO

FWIW here's some evidence... Craig Kerstiens did a talk on the
statistics across the Heroku fleet: Here are the slides from 2013
though I think there's an updated slide deck with more recent numbers
out there:
https://speakerdeck.com/craigkerstiens/postgres-what-they-really-use

Cube shows up as the number 9 most popular extension with about 1% of
databases having it installed (tied with pgcrypto and earthdistance).
That's a lot more than I would have expected actually.

Personally I would love to change the name because I always found the
name the most confusing thing about it. It took me forever to figure
out what on earth a "cube" was. It's actually a vector data type, which
is a pretty useful idea.

--
greg


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 18:52:50
Message-ID: CA+TgmoZp5tDw_K_ichmrdxCXjDqmsWovWMoZ2KAzFZFYc0Q75w@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 22, 2014 at 2:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Aug 21, 2014 at 2:13 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Well, if there are any extant applications that use that exact phrasing,
>>> they're going to be broken in any case. That does not mean that we have
>>> to break every other appearance of "cube". I think that special-casing
>>> appearances of cube(...) in GROUP BY lists might be a feasible approach.
>
>> Not really. As pointed out downthread, you can't distinguish "cube"
>> from CUBE. We could fix that with a big enough hammer, of course, but
>> it would be a mighty big hammer.
>
> I'm not convinced of that; I think some creative hackery in the grammar
> might be able to deal with this. It would be a bit ugly, for sure, but
> if it works it would be a localized fix.

Well, I have no idea how to do that. I think the only way you'd be
able to is if you make productions like ColId and ColLabel return
something different for a keyword than they do for an IDENT. And
that's not going to be a localized change. If you've got another
proposal, I'm all ears...

> Meanwhile, I don't believe
> that it's going to be possible to rename the cube extension in any way
> that's even remotely acceptable for its users ("remotely acceptable"
> here means "pg_upgrade works", never mind what's going to be needed
> to fix their applications).
>
> So the proposal you are pushing is going
> to result in seriously teeing off some fraction of our userbase;
> and the argument why that would be acceptable seems to boil down to
> "I think there are few enough of them that we don't have to care"
> (an opinion based on little evidence IMO).

The only hard statistics I am aware of are from Heroku. Peter
Geoghegan was kind enough to find me the link:

https://www.youtube.com/watch?v=MT2gzzbyWpw

At around 8 minutes, he shows utilization statistics for cube at
around 1% across their install base. That's higher than I would have
guessed, so, eh, shows what I know. In any case, I'm not so much
advocating not caring at all as confining the level of caring to what
can be done without moving the earth.

> I think it's worth investing
> some work, and perhaps accepting some ugly code, to try to avoid that.

I can accept ugly code, but I feel strongly that we shouldn't accept
ugly semantics. Forcing cube to get out of the way may not be pretty,
but I think it will be much worse if we violate the rule that quoting
a keyword strips it of its special meaning; or the rule that there are
four kinds of keywords and, if a keyword of a particular class is
accepted as an identifier in a given context, all other keywords in
that class will also be accepted as identifiers in that context.
Violating those rules could have not-fun-at-all consequences like
needing to pass additional context information to ruleutils.c's
quote_identifier() function, or not being able to dump and restore a
query from an older version even with --quote-all-identifiers.
Renaming the cube type will suck for people who are using it; but it
will only have to be done once; weird stuff like the above will be
with us forever.
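
To illustrate the first of those rules with a trivial example (nothing
patch-specific, just existing behavior that has to keep working; the
column names are placeholders):

    -- quoting strips the special meaning, so this is a plain identifier:
    SELECT "select" FROM (VALUES (1)) AS t("select");

    -- and by the same rule, GROUP BY "cube"(x, y) must always mean
    -- "call the function named cube", never a grouping-set construct.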

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 21:33:26
Message-ID: 53F7B726.2040206@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 08/22/2014 02:42 PM, Greg Stark wrote:
> On Fri, Aug 22, 2014 at 7:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> So the proposal you are pushing is going
>> to result in seriously teeing off some fraction of our userbase;
>> and the argument why that would be acceptable seems to boil down to
>> "I think there are few enough of them that we don't have to care"
>> (an opinion based on little evidence IMO
> FWIW here's some evidence... Craig Kersteins did a talk on the
> statistics across the Heroku fleet: Here are the slides from 2013
> though I think there's an updated slide deck with more recent numbers
> out there:
> https://speakerdeck.com/craigkerstiens/postgres-what-they-really-use
>
> Cube shows up as the number 9 most popular extension with about 1% of
> databases having it installed (tied with pg_crypto and earthdistance).
> That's a lot more than I would have expected actually.

That's an interesting statistic. What I'd be more interested in is
finding out how many of those are actually using it as opposed to having
loaded it into a database.

cheers

andrew


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 21:36:39
Message-ID: CAHyXU0x-961dSA9eYtf4DVs0aUVbH53PYoxCd_wFvY+3LX54XA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 22, 2014 at 1:52 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Aug 22, 2014 at 2:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> https://www.youtube.com/watch?v=MT2gzzbyWpw
>
> At around 8 minutes, he shows utilization statistics for cube at
> around 1% across their install base. That's higher than I would have
> guessed, so, eh, shows what I know. In any case, I'm not so much
> advocating not caring at all as confining the level of caring to what
> can be done without moving the earth.

cube is a dependency for earthdistance and it's gotten some light
advocacy throughout the years as the 'way to do it'. I tried it
myself way back in the day and concluded a while back that the 'box'
type + gist was better than earthdistance for bounding box operations
-- it just seemed easier to understand and use. If you search the
archives you'll probably find a couple of examples of me suggesting as
much.

merlin


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Greg Stark <stark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 21:37:51
Message-ID: 20140822213751.GE16422@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Andrew Dunstan (andrew(at)dunslane(dot)net) wrote:
>
> On 08/22/2014 02:42 PM, Greg Stark wrote:
> >On Fri, Aug 22, 2014 at 7:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >>So the proposal you are pushing is going
> >>to result in seriously teeing off some fraction of our userbase;
> >>and the argument why that would be acceptable seems to boil down to
> >>"I think there are few enough of them that we don't have to care"
> >>(an opinion based on little evidence IMO
> >FWIW here's some evidence... Craig Kersteins did a talk on the
> >statistics across the Heroku fleet: Here are the slides from 2013
> >though I think there's an updated slide deck with more recent numbers
> >out there:
> >https://speakerdeck.com/craigkerstiens/postgres-what-they-really-use
> >
> >Cube shows up as the number 9 most popular extension with about 1% of
> >databases having it installed (tied with pg_crypto and earthdistance).
> >That's a lot more than I would have expected actually.
>
>
> That's an interesting statistic. What I'd be more interested in is
> finding out how many of those are actually using it as opposed to
> having loaded it into a database.

Agreed- and how many of those have *every extension available* loaded...

Thanks,

Stephen


From: Greg Stark <stark(at)mit(dot)edu>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 21:48:58
Message-ID: CAM-w4HPX5FVNGQTVciu5CwiYGiUazMSQPjrVNWcAx46+fzdCsQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Aug 22, 2014 at 10:37 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> Agreed- and how many of those have *every extension available* loaded...

Actually that was also in the talk, a few slides later: 0.7%.

--
greg


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 22:01:20
Message-ID: 20140822220120.GF16422@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Greg Stark (stark(at)mit(dot)edu) wrote:
> On Fri, Aug 22, 2014 at 10:37 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > Agreed- and how many of those have *every extension available* loaded...
>
> Actually that was also in the talk.a few slides later. 0.7%

So, 0.3% install cube w/o installing *every* extension..? That seems
like the more relevant number then, to me anyway.

Admittedly, it's non-zero, but it's also a rather small percentage..

Thanks!

Stephen


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-22 23:38:59
Message-ID: 87wqa0m7ev.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

Tom> I'm not convinced of that; I think some creative hackery in the
Tom> grammar might be able to deal with this.

Making GROUP BY CUBE(a,b) parse as grouping sets rather than as a
function turned out to be the easy part: give CUBE a lower precedence
than '(' (equal to that of IDENT and various other unreserved
keywords), so that a rule with an explicit CUBE '(' is preferred over
one that reduces CUBE to an unreserved_keyword.

The (relatively minor) ugliness required is mostly in the ruleutils
logic to decide how to output a cube(...) function call in such a way
that it doesn't get misparsed as a grouping set. See my other mail on
that.
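
For example (an illustrative sketch with hypothetical table and column
names, not the patch's actual output): given a view like

    CREATE VIEW v AS
      SELECT cube(a, b) AS c, count(*) FROM t GROUP BY cube(a, b);

deparsing the GROUP BY clause verbatim would now be re-read as a
grouping-set construct, so ruleutils has to emit the call in some
unambiguous form such as GROUP BY "cube"(a, b).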

--
Andrew (irc:RhodiumToad)


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-08-25 05:21:13
Message-ID: 87iolhkvsq.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Here is the new version of our grouping sets patch. This version
supersedes the previous post.

We believe the functionality of this version to be substantially
complete, providing all the standard grouping set features except T434
(GROUP BY DISTINCT). (Additional tweaks, such as extra variants on
GROUPING(), could be added for compatibility with other databases.)

Since the debate regarding reserved keywords has not produced any
useful answer, the main patch here makes CUBE and ROLLUP into
col_name_keywords, but a separate small patch is attached to
make them unreserved_keywords instead.

So there are now 5 files:

gsp1.patch - phase 1 code patch (full syntax, limited functionality)
gsp2.patch - phase 2 code patch (adds full functionality using the
new chained aggregate mechanism)
gsp-doc.patch - docs
gsp-contrib.patch - quote "cube" in contrib/cube and contrib/earthdistance,
intended primarily for testing pending a decision on
renaming contrib/cube or unreserving keywords
gsp-u.patch - proposed method to unreserve CUBE and ROLLUP

--
Andrew (irc:RhodiumToad)

Attachment Content-Type Size
gsp1.patch text/x-patch 179.9 KB
gsp2.patch text/x-patch 45.1 KB
gsp-doc.patch text/x-patch 13.6 KB
gsp-contrib.patch text/x-patch 228.3 KB
gsp-u.patch text/x-patch 5.4 KB

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-25 05:35:02
Message-ID: 87d2bpkuhs.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Robert" == Robert Haas <robertmhaas(at)gmail(dot)com> writes:

Robert> I can accept ugly code, but I feel strongly that we shouldn't
Robert> accept ugly semantics. Forcing cube to get out of the way
Robert> may not be pretty, but I think it will be much worse if we
Robert> violate the rule that quoting a keyword strips it of its
Robert> special meaning; or the rule that there are four kinds of
Robert> keywords and, if a keyword of a particular class is accepted
Robert> as an identifier in a given context, all other keywords in
Robert> that class will also be accepted as identifiers in that
Robert> context. Violating those rules could have not-fun-at-all
Robert> consequences like needing to pass additional context
Robert> information to ruleutils.c's quote_identifier() function, or
Robert> not being able to dump and restore a query from an older
Robert> version even with --quote-all-identifiers. Renaming the cube
Robert> type will suck for people who are using it; but it will only
Robert> have to be done once; weird stuff like the above will be with
Robert> us forever.

If you look at the latest patch post, there's a small patch in it that
does nothing but unreserve the keywords and fix ruleutils to make
deparse/parse work. The required fix to ruleutils is an example of
violating your "four kinds of keywords" principle, but quoting
keywords still works.

--
Andrew (irc:RhodiumToad)


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-08-25 21:05:40
Message-ID: CAFj8pRBUO78OcfRBLMkC_RUUzTa=pN49r8TMZinzV4QQM-gmAQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi

I checked this patch, and it is working very well.

I found only two issues - I am not sure if they are really issues

with data from https://wiki.postgresql.org/wiki/Grouping_Sets

postgres=# select name, place, sum(count), grouping(name), grouping(place)
from cars group by rollup(name, place);
 name  |   place    |  sum  | grouping | grouping
-------+------------+-------+----------+----------
 bmw   | czech rep. |   100 |        0 |        0
 bmw   | germany    |  1000 |        0 |        0
 bmw   |            |  1100 |        0 |        1
 opel  | czech rep. |  7000 |        0 |        0
 opel  | germany    |  7000 |        0 |        0
 opel  |            | 14000 |        0 |        1
 skoda | czech rep. | 10000 |        0 |        0
 skoda | germany    |  5000 |        0 |        0
 skoda |            | 15000 |        0 |        1
       |            | 30100 |        1 |        1
(10 rows)

* redundant sets should be ignored

postgres=# select name, place, sum(count), grouping(name), grouping(place)
from cars group by rollup(name, place), name;
 name  |   place    |  sum  | grouping | grouping
-------+------------+-------+----------+----------
 bmw   | czech rep. |   100 |        0 |        0
 bmw   | germany    |  1000 |        0 |        0
 bmw   |            |  1100 |        0 |        1
 bmw   |            |  1100 |        0 |        1
 opel  | czech rep. |  7000 |        0 |        0
 opel  | germany    |  7000 |        0 |        0
 opel  |            | 14000 |        0 |        1
 opel  |            | 14000 |        0 |        1
 skoda | czech rep. | 10000 |        0 |        0
 skoda | germany    |  5000 |        0 |        0
 skoda |            | 15000 |        0 |        1
 skoda |            | 15000 |        0 |        1
(12 rows)

It duplicates rows:

postgres=# explain select name, place, sum(count), grouping(name),
grouping(place) from cars group by rollup(name, place), name;
                               QUERY PLAN
------------------------------------------------------------------------
 GroupAggregate  (cost=10000000001.14..10000000001.38 rows=18 width=68)
   Grouping Sets: (name, place), (name), (name)
   ->  Sort  (cost=10000000001.14..10000000001.15 rows=6 width=68)
         Sort Key: name, place
         ->  Seq Scan on cars  (cost=0.00..1.06 rows=6 width=68)
 Planning time: 0.235 ms
(6 rows)

postgres=# select name, place, sum(count), grouping(name), grouping(place)
from cars group by grouping sets((name, place), (name), (name),(place), ());
 name  |   place    |  sum  | grouping | grouping
-------+------------+-------+----------+----------
 bmw   | czech rep. |   100 |        0 |        0
 bmw   | germany    |  1000 |        0 |        0
 bmw   |            |  1100 |        0 |        1
 bmw   |            |  1100 |        0 |        1
 opel  | czech rep. |  7000 |        0 |        0
 opel  | germany    |  7000 |        0 |        0
 opel  |            | 14000 |        0 |        1
 opel  |            | 14000 |        0 |        1
 skoda | czech rep. | 10000 |        0 |        0
 skoda | germany    |  5000 |        0 |        0
 skoda |            | 15000 |        0 |        1
 skoda |            | 15000 |        0 |        1
       |            | 30100 |        1 |        1
       | czech rep. | 17100 |        1 |        0
       | germany    | 13000 |        1 |        0
(15 rows)

Fantastic work

Regards

Pavel

2014-08-25 7:21 GMT+02:00 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:

> Here is the new version of our grouping sets patch. This version
> supersedes the previous post.
>
> We believe the functionality of this version to be substantially
> complete, providing all the standard grouping set features except T434
> (GROUP BY DISTINCT). (Additional tweaks, such as extra variants on
> GROUPING(), could be added for compatibility with other databases.)
>
> Since the debate regarding reserved keywords has not produced any
> useful answer, the main patch here makes CUBE and ROLLUP into
> col_name_reserved keywords, but a separate small patch is attached to
> make them unreserved_keywords instead.
>
> So there are now 5 files:
>
> gsp1.patch - phase 1 code patch (full syntax, limited
> functionality)
> gsp2.patch - phase 2 code patch (adds full functionality using the
> new chained aggregate mechanism)
> gsp-doc.patch - docs
> gsp-contrib.patch - quote "cube" in contrib/cube and
> contrib/earthdistance,
> intended primarily for testing pending a decision on
> renaming contrib/cube or unreserving keywords
> gsp-u.patch - proposed method to unreserve CUBE and ROLLUP
>
> --
> Andrew (irc:RhodiumToad)
>


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-08-26 00:45:40
Message-ID: 87tx50je1p.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Pavel" == Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:

Pavel> Hi
Pavel> I checked this patch, and it working very well

Pavel> I found only two issue - I am not sure if it is issue

Pavel> It duplicate rows

Pavel> postgres=# explain select name, place, sum(count), grouping(name),
Pavel> grouping(place) from cars group by rollup(name, place), name;
Pavel> QUERY PLAN
Pavel> ------------------------------------------------------------------------
Pavel> GroupAggregate (cost=10000000001.14..10000000001.38 rows=18 width=68)
Pavel> Grouping Sets: (name, place), (name), (name)

I think I can safely claim from the spec that our version is correct.
Following the syntactic transformations given in 7.9 <group by clause>
of sql2008, we have:

GROUP BY rollup(name,place), name;

parses as GROUP BY <rollup list>, <ordinary grouping set>

Syntax rule 13 replaces the <rollup list> giving:

GROUP BY GROUPING SETS ((name,place), (name), ()), name;

Syntax rule 16b gives:

GROUP BY GROUPING SETS ((name,place), (name), ()), GROUPING SETS (name);

Syntax rule 16c takes the cartesian product of the two sets:

GROUP BY GROUPING SETS ((name,place,name), (name,name), (name));

Syntax rule 17 gives:

SELECT ... GROUP BY name,place,name
UNION ALL
SELECT ... GROUP BY name,name
UNION ALL
SELECT ... GROUP BY name

Obviously at this point the extra "name" columns become redundant so
we eliminate them (this doesn't correspond to a spec rule, but doesn't
change the semantics). So we're left with:

SELECT ... GROUP BY name,place
UNION ALL
SELECT ... GROUP BY name
UNION ALL
SELECT ... GROUP BY name

Running a quick test on sqlfiddle with Oracle 11 suggests that Oracle's
behavior agrees with my interpretation.
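
(As a quick cross-check, not part of the patch: the explicit UNION ALL
form over the same cars table,

    select name, place, sum(count) from cars group by name, place
    union all
    select name, null, sum(count) from cars group by name
    union all
    select name, null, sum(count) from cars group by name;

likewise produces the (name)-level rows twice.)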

Nothing in the spec that I can find licenses the elimination of
duplicate grouping sets except indirectly via feature T434 (GROUP BY
DISTINCT ...), which we did not attempt to implement.

--
Andrew (irc:RhodiumToad)


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-08-26 04:21:59
Message-ID: CAFj8pRBGAjt5_BZu-fyKsrgARB-aLeYZK7Vp6WH1=cd2NcLGxQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-08-26 2:45 GMT+02:00 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:

> >>>>> "Pavel" == Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>
> Pavel> Hi
> Pavel> I checked this patch, and it working very well
>
> Pavel> I found only two issue - I am not sure if it is issue
>
> Pavel> It duplicate rows
>
> Pavel> postgres=# explain select name, place, sum(count), grouping(name),
> Pavel> grouping(place) from cars group by rollup(name, place), name;
> Pavel> QUERY PLAN
> Pavel>
> ------------------------------------------------------------------------
> Pavel> GroupAggregate (cost=10000000001.14..10000000001.38 rows=18
> width=68)
> Pavel> Grouping Sets: (name, place), (name), (name)
>
> I think I can safely claim from the spec that our version is correct.
> Following the syntactic transformations given in 7.9 <group by clause>
> of sql2008, we have:
>
> GROUP BY rollup(name,place), name;
>
> parses as GROUP BY <rollup list>, <ordinary grouping set>
>
> Syntax rule 13 replaces the <rollup list> giving:
>
> GROUP BY GROUPING SETS ((name,place), (name), ()), name;
>
> Syntax rule 16b gives:
>
> GROUP BY GROUPING SETS ((name,place), (name), ()), GROUPING SETS (name);
>
> Syntax rule 16c takes the cartesian product of the two sets:
>
> GROUP BY GROUPING SETS ((name,place,name), (name,name), (name));
>
> Syntax rule 17 gives:
>
> SELECT ... GROUP BY name,place,name
> UNION ALL
> SELECT ... GROUP BY name,name
> UNION ALL
> SELECT ... GROUP BY name
>
> Obviously at this point the extra "name" columns become redundant so
> we eliminate them (this doesn't correspond to a spec rule, but doesn't
> change the semantics). So we're left with:
>
> SELECT ... GROUP BY name,place
> UNION ALL
> SELECT ... GROUP BY name
> UNION ALL
> SELECT ... GROUP BY name
>
> Running a quick test on sqlfiddle with Oracle 11 suggests that Oracle's
> behavior agrees with my interpretation.
>
> Nothing in the spec that I can find licenses the elimination of
> duplicate grouping sets except indirectly via feature T434 (GROUP BY
> DISTINCT ...), which we did not attempt to implement.
>
>
OK, I'll try to search my memory for some references suggesting that
redundant columns should be reduced.

Regards

Pavel

> --
> Andrew (irc:RhodiumToad)
>


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Andrew Gierth" <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Atri Sharma" <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-26 08:25:18
Message-ID: d19e1a7a64a631edd9ce0cc0da9bbd20.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, August 25, 2014 07:21, Andrew Gierth wrote:
> Here is the new version of our grouping sets patch. This version
> supersedes the previous post.

The patches did not apply anymore, so I applied them at 73eba19aebe0. There they applied OK, and make && make check was OK.

drop table if exists items_sold;
create table items_sold as
select * from (
values
('Foo', 'L', 10)
, ('Foo', 'M', 20)
, ('Bar', 'M', 15)
, ('Bar', 'L', 5)
) as f(brand, size, sales) ;

select brand, size, grouping(brand, size), sum(sales) from items_sold group by rollup(brand, size);
--> WARNING: unrecognized node type: 347

I suppose that's not correct.

thanks,

Erik Rijkers


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: "Erik Rijkers" <er(at)xs4all(dot)nl>
Cc: "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Atri Sharma" <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-26 09:02:38
Message-ID: 87mwark4jb.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Erik" == Erik Rijkers <er(at)xs4all(dot)nl> writes:

Erik> The patches did not apply anymore so I applied at 73eba19aebe0.
Erik> There they applied OK, and make && make check was OK.

I'll look and rebase if need be.

--> WARNING: unrecognized node type: 347

Can't reproduce this - are you sure it's not a mis-build?

--
Andrew (irc:RhodiumToad)


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: "Erik Rijkers" <er(at)xs4all(dot)nl>
Cc: "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Atri Sharma" <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-26 09:13:38
Message-ID: 87iolfk4ae.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:

>>>>> "Erik" == Erik Rijkers <er(at)xs4all(dot)nl> writes:

Erik> The patches did not apply anymore so I applied at 73eba19aebe0.
Erik> There they applied OK, and make && make check was OK.

Andrew> I'll look and rebase if need be.

They apply cleanly for me at 2bde297 whether with git apply or patch,
except for the contrib one (which you don't need unless you want to
run the contrib regression tests without applying the gsp-u patch).

--
Andrew (irc:RhodiumToad)


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Andrew Gierth" <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Atri Sharma" <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-26 10:42:40
Message-ID: 17d43a75a8d9c1b112a5c6f5f54b52c5.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, August 26, 2014 11:13, Andrew Gierth wrote:
>>>>>> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
>
>>>>>> "Erik" == Erik Rijkers <er(at)xs4all(dot)nl> writes:
>
> Erik> The patches did not apply anymore so I applied at 73eba19aebe0.
> Erik> There they applied OK, and make && make check was OK.
>
> Andrew> I'll look and rebase if need be.
>
> They apply cleanly for me at 2bde297 whether with git apply or patch,
> except for the contrib one (which you don't need unless you want to
> run the contrib regression tests without applying the gsp-u patch).
>

Ah, I had not realised that. Excluding that contrib-patch and only applying these three:

gsp1.patch
gsp2.patch
gsp-doc.patch

does indeed work (applies, compiles).

Thank you,

Erik Rijkers


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-08-26 11:21:38
Message-ID: 87bnr7jy9y.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:

Andrew> gsp-contrib.patch - quote "cube" in contrib/cube and
Andrew> contrib/earthdistance, intended primarily for testing pending
Andrew> a decision on renaming contrib/cube or unreserving keywords

Here's a rebase of this one patch. Note that you only need this if
you're NOT applying the gsp-u patch to unreserve keywords, and you
also don't need it if you're not planning to test the cube extension
compatibility with grouping sets.

--
Andrew (irc:RhodiumToad)

Attachment Content-Type Size
gsp-contrib.patch text/x-patch 228.3 KB

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: "Erik Rijkers" <er(at)xs4all(dot)nl>
Cc: "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Atri Sharma" <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-26 12:24:14
Message-ID: 8761hfjv8j.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Erik" == Erik Rijkers <er(at)xs4all(dot)nl> writes:

>> They apply cleanly for me at 2bde297 whether with git apply or
>> patch, except for the contrib one (which you don't need unless you
>> want to run the contrib regression tests without applying the
>> gsp-u patch).

Erik> Ah, I had not realised that. Excluding that contrib-patch and
Erik> only applying these three:

Erik> gsp1.patch
Erik> gsp2.patch
Erik> gsp-doc.patch

Erik> does indeed work (applies, compiles).

I put up a rebased contrib patch anyway (linked off the CF).

Did the "unrecognized node type" error go away, or do we still need to
look into that?

--
Andrew (irc:RhodiumToad)


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Andrew Gierth" <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Atri Sharma" <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-26 12:34:20
Message-ID: 14490819f585c0575747068db5a797d9.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, August 26, 2014 14:24, Andrew Gierth wrote:
>>>>>> "Erik" == Erik Rijkers <er(at)xs4all(dot)nl> writes:
>
> >> They apply cleanly for me at 2bde297 whether with git apply or
> >> patch, except for the contrib one (which you don't need unless you
> >> want to run the contrib regression tests without applying the
> >> gsp-u patch).
>
> Erik> Ah, I had not realised that. Excluding that contrib-patch and
> Erik> only applying these three:
>
> Erik> gsp1.patch
> Erik> gsp2.patch
> Erik> gsp-doc.patch
>
> Erik> does indeed work (applies, compiles).
>
> I put up a rebased contrib patch anyway (linked off the CF).
>
> Did the "unrecognized node type" error go away, or do we still need to
> look into that?
>

Yes, it did go away; looks fine now:

select brand , size , grouping(brand, size) , sum(sales) from items_sold group by rollup(brand, size) ;
 brand | size | grouping | sum
-------+------+----------+-----
 Bar   | L    |        0 |   5
 Bar   | M    |        0 |  15
 Bar   |      |        1 |  20
 Foo   | L    |        0 |  10
 Foo   | M    |        0 |  20
 Foo   |      |        1 |  30
       |      |        3 |  50
(7 rows)

I'm a bit unclear why the bottom-row 'grouping' value is 3. Shouldn't that be 2?

But I'm still reading the documentation so it's perhaps too early to ask...
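
(One reading that would explain it, in case it's the intended one: if
grouping(brand, size) is a bit mask with one bit per argument, leftmost
argument in the most significant bit, then the output above works out as

    grouping(brand, size) = 0   -- binary 00: both columns grouped
    grouping(brand, size) = 1   -- binary 01: size aggregated away
    grouping(brand, size) = 3   -- binary 11: both aggregated, the total row

which would make 3, not 2, the expected value for the last row.)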

Thanks,

Erik Rijkers


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-08-27 13:19:15
Message-ID: CA+TgmoaMcSog=cVB+j84KZreoZs1MkBR2Tx_W_cJLYivdTQ-EQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 25, 2014 at 1:35 AM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
> If you look at the latest patch post, there's a small patch in it that
> does nothing but unreserve the keywords and fix ruleutils to make
> deparse/parse work. The required fix to ruleutils is an example of
> violating your "four kinds of keywords" principle, but quoting
> keywords still works.

I think it would be intolerable to lose the ability to quote keywords.
That could easily create situations where there's no reasonable way to
dump an older database in such a fashion that it can be reloaded into
a newer database. So it's good that you avoided that.

The "four kinds of keywords" principle is obviously much less
absolute. We've talked before about introducing additional categories
of keywords, and that might be a good thing to do for one reason or
another. But I think it's not good to do it in a highly idiosyncratic
way: I previously proposed reserving concurrently only when it follows
CREATE INDEX, and not in any other context, but Tom argued that it had
to become a type_func_name_keyword since users would be confused to
find that concurrently (but not any other keyword) needed quoting
there. In retrospect, I tend to think he probably had it right.
There is a good amount of third-party software out there that tries to
be smart about quoting PostgreSQL keywords - for example, pgAdmin has
code for that, or did last I looked - so by making things more
complicated, we run the risk not only of bugs in our own software but
also bugs in other people's software, as well as user confusion. So I
still think the right solution is probably to reserve CUBE across the
board, and not just in the narrowest context that we can get away
with.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Andrew Gierth" <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>, "Atri Sharma" <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-31 15:37:50
Message-ID: 956da6e246ccfe5cd6305b5be9b6497c.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, August 26, 2014 14:24, Andrew Gierth wrote:
>>>>>> "Erik" == Erik Rijkers <er(at)xs4all(dot)nl> writes:
>
> >> They apply cleanly for me at 2bde297 whether with git apply or
> >> patch, except for the contrib one (which you don't need unless you
> >> want to run the contrib regression tests without applying the
> >> gsp-u patch).
>
> Erik> Ah, I had not realised that. Excluding that contrib-patch and
> Erik> only applying these three:
>
> Erik> gsp1.patch
> Erik> gsp2.patch
> Erik> gsp-doc.patch
>
> Erik> does indeed work (applies, compiles).
>
> I put up a rebased contrib patch anyway (linked off the CF).
>
> Did the "unrecognized node type" error go away, or do we still need to
> look into that?
>

I have found that the "unrecognized node type" error is caused by:

shared_preload_libraries = pg_stat_statements

in postgresql.conf (as my default compile script was doing).

If I disable that line the error goes away.

I don't know exactly what that means for the grouping sets patches, but I thought I'd mention it here.

Otherwise I've not run into any problems with GROUPING SETS.

Erik Rijkers


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-31 15:39:59
Message-ID: CAOeZVifp_EorxHSKpFp8Zgmi93+5Tz=KbfFk6s=V1bx=QR3mog@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers <er(at)xs4all(dot)nl> wrote:

> On Tue, August 26, 2014 14:24, Andrew Gierth wrote:
> >>>>>> "Erik" == Erik Rijkers <er(at)xs4all(dot)nl> writes:
> >
> > >> They apply cleanly for me at 2bde297 whether with git apply or
> > >> patch, except for the contrib one (which you don't need unless you
> > >> want to run the contrib regression tests without applying the
> > >> gsp-u patch).
> >
> > Erik> Ah, I had not realised that. Excluding that contrib-patch and
> > Erik> only applying these three:
> >
> > Erik> gsp1.patch
> > Erik> gsp2.patch
> > Erik> gsp-doc.patch
> >
> > Erik> does indeed work (applies, compiles).
> >
> > I put up a rebased contrib patch anyway (linked off the CF).
> >
> > Did the "unrecognized node type" error go away, or do we still need to
> > look into that?
> >
>
> I have found that the "unrecognized node type" error is caused by:
>
> shared_preload_libraries = pg_stat_statements
>
> in postgresql.conf (as my default compile script was doing).
>
> If I disable that line the error goes away.
>
>
I think that's more of a library linking problem than a problem with
the patch. I couldn't reproduce it, though.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-31 16:04:25
Message-ID: 20140831160425.GA29536@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-08-31 21:09:59 +0530, Atri Sharma wrote:
> On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
> > I have found that the "unrecognized node type" error is caused by:

It's a warning, not an error, right?

> > shared_preload_libraries = pg_stat_statements
> >
> > in postgresql.conf (as my default compile script was doing).
> >
> > If I disable that line the error goes away.
> >
> >
> I think thats more of a library linking problem rather than a problem with
> the patch. I couldnt reproduce it,though.

I think it's vastly more likely that the patch simply didn't add the new
expression types to pg_stat_statements.c:JumbleExpr().

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-31 16:11:38
Message-ID: CAOeZViczFdGyb4F9Eqig9cbhCDQfBa8cCJg5WuXiFRF2YmUAvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sunday, August 31, 2014, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:

> On 2014-08-31 21:09:59 +0530, Atri Sharma wrote:
> > On Sun, Aug 31, 2014 at 9:07 PM, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
> > > I have found that the "unrecognized node type" error is caused by:
>
> It's a warning, not an error, right?
>
> > > shared_preload_libraries = pg_stat_statements
> > >
> > > in postgresql.conf (as my default compile script was doing).
> > >
> > > If I disable that line the error goes away.
> > >
> > >
> > I think thats more of a library linking problem rather than a problem
> with
> > the patch. I couldnt reproduce it,though.
>
> I think it's vastly more likely that the patch simply didn't add the new
> expression types to pg_stat_statements.c:JumbleExpr().
>
>
>
I must have run the above diagnosis the wrong way then; I will
check. Thanks for the heads up!

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>,Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-08-31 20:52:14
Message-ID: 874mwsfkta.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Recut patches:

gsp1.patch - phase 1 code patch (full syntax, limited functionality)
gsp2.patch - phase 2 code patch (adds full functionality using the
new chained aggregate mechanism)
gsp-doc.patch - docs
gsp-contrib.patch - quote "cube" in contrib/cube and contrib/earthdistance,
intended primarily for testing pending a decision on
renaming contrib/cube or unreserving keywords
gsp-u.patch - proposed method to unreserve CUBE and ROLLUP

(the contrib patch is not necessary if the -u patch is used; the
contrib/pg_stat_statements fixes are in the phase1 patch)

--
Andrew (irc:RhodiumToad)

Attachment Content-Type Size
gsp1.patch text/x-patch 181.2 KB
gsp2.patch text/x-patch 45.1 KB
gsp-doc.patch text/x-patch 13.6 KB
gsp-contrib.patch text/x-patch 228.3 KB
gsp-u.patch text/x-patch 5.4 KB

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-09-06 20:04:46
Message-ID: 540B68DE.404@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 31.8.2014 22:52, Andrew Gierth wrote:
> Recut patches:
>
> gsp1.patch - phase 1 code patch (full syntax, limited functionality)
> gsp2.patch - phase 2 code patch (adds full functionality using the
> new chained aggregate mechanism)
> gsp-doc.patch - docs
> gsp-contrib.patch - quote "cube" in contrib/cube and contrib/earthdistance,
> intended primarily for testing pending a decision on
> renaming contrib/cube or unreserving keywords
> gsp-u.patch - proposed method to unreserve CUBE and ROLLUP
>
> (the contrib patch is not necessary if the -u patch is used; the
> contrib/pg_stat_statements fixes are in the phase1 patch)

Hi,

I looked at the patch today.

The good news is it seems to apply cleanly on HEAD (with some small
offsets, but no conflicts). The code generally seems OK to me, although
the patch is quite massive. I've also done a considerable amount of
testing and I've been unable to cause failures.

I have significant doubts about the whole design, though. Especially the
decision not to use HashAggregate, and the whole chaining idea. I
haven't noticed any discussion about this (at least in this thread), and
the chaining idea was not mentioned until 21/8, so I'd appreciate some
reasoning behind this choice.

I assume the "no HashAggregate" decision was done because of fear of
underestimates, and the related OOM issues. I don't see how this is
different from the general HashAggregate, though. Or is there another
reason for this?

Now, the chaining only makes this worse, because it effectively forces a
separate sort of the whole table for each grouping set.

We're doing a lot of analytics on large tables, where large means tens
of GBs and hundreds of millions of rows. What we do at the moment is
basically the usual ROLAP approach - create a cube with aggregated data,
which is usually much smaller than the source table, and then compute
the rollups for the interesting slices in a second step.
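
For concreteness, a sketch of that workflow (table and column names are
made up for the example):

    -- step 1: aggregate the big fact table once into a much smaller cube
    CREATE TABLE sales_cube AS
      SELECT region, product, month, sum(amount) AS amount
      FROM sales_fact
      GROUP BY region, product, month;

    -- step 2: compute the rollups for the interesting slices from it
    SELECT region, sum(amount) FROM sales_cube GROUP BY region;
    SELECT product, sum(amount) FROM sales_cube GROUP BY product;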

I was hoping that maybe we could eventually replace this with the GROUP
BY CUBE functionality provided by this patch, but these design decisions
make it pretty much impossible. I believe most other users processing
non-trivial amounts of data (pretty much everyone with just a few
million rows) will be in a similar position :-(

What I envisioned when considering hacking on this a few months back
was extending the aggregate API with a "merge state" function, doing the
aggregation just like today and merging the groups (for each cell) at
the end. Yeah, we don't have this infrastructure, but maybe it'd be a
better way than the current chaining approach. And it was repeatedly
mentioned as necessary for parallel aggregation (and even mentioned in
the memory-bounded hashagg batching discussion). I'm ready to spend some
time on this, if it makes the grouping sets useful for us.

regards
Tomas


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-09-06 21:34:15
Message-ID: 874mwka4hd.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tomas" == Tomas Vondra <tv(at)fuzzy(dot)cz> writes:

Tomas> I have significant doubts about the whole design,
Tomas> though. Especially the decision not to use HashAggregate,

There is no "decision not to use HashAggregate". There is simply no
support for HashAggregate yet.

Having it be able to work with GroupAggregate is essential, because
there are always cases where HashAggregate is simply not permitted
(e.g. when using distinct or sorted aggs; or unhashable types; or with
the current code, when the estimated memory usage exceeds work_mem).
HashAggregate may be a performance improvement, but it's something
that can be added afterwards rather than an essential part of the
feature.
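
For example (an illustrative query only, with hypothetical table and
columns), something like

    select a, b, array_agg(c order by c), count(distinct d)
    from t
    group by rollup (a, b);

cannot use hashed aggregation because of the ordered and DISTINCT
aggregates, so the sort-based path has to handle grouping sets in any
case.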

Tomas> Now, the chaining only makes this worse, because it
Tomas> effectively forces a separate sort of the whole table for each
Tomas> grouping set.

It's not one sort per grouping set, it's the minimal number of sorts
needed to express the result as a union of ROLLUP clauses. The planner
code will (I believe) always find the smallest number of sorts needed.

Each aggregate node can process any number of grouping sets as long as
they represent a single rollup list (and therefore share a single sort
order).
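
For example, CUBE(a,b) expands to the four grouping sets (a,b), (a),
(b) and (); a single sort on (a,b) covers the rollup chain
(a,b) -> (a) -> (), and one more sort on (b) covers the remaining set,
so two sorts suffice rather than four.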

Yes, this is slower than using one hashagg. But it solves the general
problem in a way that does not interfere with future optimization.

(HashAggregate can be added to the current implementation by first
adding executor support for hashagg with multiple grouping sets, then
in the planner, extracting as many hashable grouping sets as possible
from the list before looking for rollup lists. The chained aggregate
code can work just fine with a HashAggregate as the chain head.

We have not actually tackled this, since I'm not going to waste any
time adding optimizations before the basic idea is accepted.)

Tomas> What I envisioned when considering hacking on this a few
Tomas> months back, was extending the aggregate API with "merge
Tomas> state" function,

That's not really on the cards for arbitrary non-trivial aggregate
functions.

Yes, it can be done for simple ones, and if you want to use that as a
basis for adding optimizations that's fine. But a solution that ONLY
works in simple cases isn't sufficient, IMO.

--
Andrew (irc:RhodiumToad)


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-09-07 12:33:05
Message-ID: 540C5081.90205@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6.9.2014 23:34, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>
> Tomas> I have significant doubts about the whole design,
> Tomas> though. Especially the decision not to use HashAggregate,
>
> There is no "decision not to use HashAggregate". There is simply no
> support for HashAggregate yet.
>
> Having it be able to work with GroupAggregate is essential, because
> there are always cases where HashAggregate is simply not permitted
> (e.g. when using distinct or sorted aggs; or unhashable types; or with
> the current code, when the estimated memory usage exceeds work_mem).
> HashAggregate may be a performance improvement, but it's something
> that can be added afterwards rather than an essential part of the
> feature.

Ah, OK. I got confused by the "final patch" subject, and so the
possibility of additional optimization somehow didn't occur to me.

> Tomas> Now, the chaining only makes this worse, because it
> Tomas> effectively forces a separate sort of the whole table for each
> Tomas> grouping set.
>
> It's not one sort per grouping set, it's the minimal number of sorts
> needed to express the result as a union of ROLLUP clauses. The planner
> code will (I believe) always find the smallest number of sorts needed.

You're probably right. Although when doing GROUP BY CUBE (a,b,c,a) I get
one more ChainAggregate than with CUBE(a,b,c), and we seem to compute
all the aggregates twice. Not sure if we need to address this though,
because it's mostly the user's fault.

> Each aggregate node can process any number of grouping sets as long as
> they represent a single rollup list (and therefore share a single sort
> order).
>
> Yes, this is slower than using one hashagg. But it solves the general
> problem in a way that does not interfere with future optimization.
>
> (HashAggregate can be added to the current implementation by first
> adding executor support for hashagg with multiple grouping sets, then
> in the planner, extracting as many hashable grouping sets as possible
> from the list before looking for rollup lists. The chained aggregate
> code can work just fine with a HashAggregate as the chain head.
>
> We have not actually tackled this, since I'm not going to waste any
> time adding optimizations before the basic idea is accepted.)

OK, understood.

>
> Tomas> What I envisioned when considering hacking on this a few
> Tomas> months back, was extending the aggregate API with "merge
> Tomas> state" function,
>
> That's not really on the cards for arbitrary non-trivial aggregate
> functions.
>
> Yes, it can be done for simple ones, and if you want to use that as a
> basis for adding optimizations that's fine. But a solution that ONLY
> works in simple cases isn't sufficient, IMO.

I believe it can be done for most aggregates, assuming you have access
to the internal state somehow (not just the final output). Adding it
for in-core aggregates would not be difficult, in most cases. But
you're right that we don't have this now at all.

regards
Tomas


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-09-07 13:11:49
Message-ID: 87lhpv8u9q.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tomas" == Tomas Vondra <tv(at)fuzzy(dot)cz> writes:

>> It's not one sort per grouping set, it's the minimal number of
>> sorts needed to express the result as a union of ROLLUP
>> clauses. The planner code will (I believe) always find the
>> smallest number of sorts needed.

Tomas> You're probably right. Although when doing GROUP BY CUBE
Tomas> (a,b,c,a) I get one more ChainAggregate than with
Tomas> CUBE(a,b,c). and we seem to compute all the aggregates
Tomas> twice. Not sure if we need to address this though, because
Tomas> it's mostly user's fault.

Hm. Yeah, you're right that the number of sorts is not optimal there.
We can look into that.

As for computing it all twice, there's currently no attempt to
optimize multiple identical grouping sets into multiple projections of
a single grouping set result. CUBE(a,b,c,a) has twice as many grouping
sets as CUBE(a,b,c) does, even though all the extra ones are duplicates.
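
(Concretely: CUBE(a,b,c) expands to the 2^3 = 8 sets (a,b,c), (a,b),
(a,c), (b,c), (a), (b), (c) and (); CUBE(a,b,c,a) expands to 2^4 = 16
sets which, once the repeated column within each set is dropped, are
just those same 8 groupings, each occurring twice.)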

--
Andrew (irc:RhodiumToad)


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-09-07 14:14:03
Message-ID: 540C682B.2070707@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7.9.2014 15:11, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>
> >> It's not one sort per grouping set, it's the minimal number of
> >> sorts needed to express the result as a union of ROLLUP
> >> clauses. The planner code will (I believe) always find the
> >> smallest number of sorts needed.
>
> Tomas> You're probably right. Although when doing GROUP BY CUBE
> Tomas> (a,b,c,a) I get one more ChainAggregate than with
> Tomas> CUBE(a,b,c). and we seem to compute all the aggregates
> Tomas> twice. Not sure if we need to address this though, because
> Tomas> it's mostly user's fault.
>
> Hm. Yeah, you're right that the number of sorts is not optimal
> there. We can look into that.

I don't think it's very critical, though. I was worried about it because
of the sorts, but if that gets tackled in patches following this
commitfest it seems OK.

> As for computing it all twice, there's currently no attempt to
> optimize multiple identical grouping sets into multiple projections
> of a single grouping set result. CUBE(a,b,c,a) has twice as many
> grouping sets as CUBE(a,b,c) does, even though all the extra ones are
> duplicates.

Shouldn't this be solved by eliminating the excess ChainAggregate?
Although it probably changes GROUPING(...), so it's not just about
removing the duplicate column(s) from the CUBE.

Maybe preventing this completely (i.e. raising an ERROR with "duplicate
columns in CUBE/ROLLUP/... clauses") would be appropriate. Does the
standard say anything about this?

But arguably this is a minor issue, happening only when the user uses
the same column/expression twice. Hopefully the users don't do that too
often.

regards
Tomas


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-09-07 16:52:21
Message-ID: 87egvn8k0e.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tomas" == Tomas Vondra <tv(at)fuzzy(dot)cz> writes:

>> As for computing it all twice, there's currently no attempt to
>> optimize multiple identical grouping sets into multiple
>> projections of a single grouping set result. CUBE(a,b,c,a) has
>> twice as many grouping sets as CUBE(a,b,c) does, even though all
>> the extra ones are duplicates.

Tomas> Shouldn't this be solved by eliminating the excessive
Tomas> ChainAggregate? Although it probably changes GROUPING(...),
Tomas> so it's not just about removing the duplicate column(s) from
Tomas> the CUBE.

Eliminating the excess ChainAggregate would not change the number of
grouping sets, only where they are computed.

Tomas> Maybe preventing this completely (i.e. raising an ERROR with
Tomas> "duplicate columns in CUBE/ROLLUP/... clauses") would be
Tomas> appropriate. Does the standard says anything about this?

The spec does not say anything explicitly about duplicates, so they
are allowed (and duplicate grouping _sets_ can't be removed, only
duplicate columns within a single GROUP BY clause after the grouping
sets have been eliminated by transformation). I have checked my
reading of the spec against oracle 11 and MSSQL using sqlfiddle.

The way the spec handles grouping sets is to define a sequence of
syntactic transforms that result in a query which is a UNION ALL of
ordinary GROUP BY queries. (We haven't tried to implement the
additional optional feature of GROUP BY DISTINCT.) Since it's UNION
ALL, any duplicates must be preserved, so a query with GROUPING SETS
((a),(a)) reduces to:

SELECT ... GROUP BY a UNION ALL SELECT ... GROUP BY a;

and therefore has duplicates of all its result rows.
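
For instance (a sketch against a hypothetical table t(a, x)),

    select a, sum(x) from t group by grouping sets ((a), (a));

is required to return every a-group twice, exactly as the UNION ALL
above would.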

I'm quite prepared to concede that I may have read the spec wrong
(wouldn't be the first time), but in this case I require any such
claim to be backed up by an example from some other db showing an
actual difference in behavior.

--
Andrew (irc:RhodiumToad)


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-09-07 18:27:07
Message-ID: 540CA37B.2080308@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 7.9.2014 18:52, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>
> Tomas> Maybe preventing this completely (i.e. raising an ERROR with
> Tomas> "duplicate columns in CUBE/ROLLUP/... clauses") would be
> Tomas> appropriate. Does the standard says anything about this?
>
> The spec does not say anything explicitly about duplicates, so they
> are allowed (and duplicate grouping _sets_ can't be removed, only
> duplicate columns within a single GROUP BY clause after the grouping
> sets have been eliminated by transformation). I have checked my
> reading of the spec against oracle 11 and MSSQL using sqlfiddle.
>
> The way the spec handles grouping sets is to define a sequence of
> syntactic transforms that result in a query which is a UNION ALL of
> ordinary GROUP BY queries. (We haven't tried to implement the
> additional optional feature of GROUP BY DISTINCT.) Since it's UNION
> ALL, any duplicates must be preserved, so a query with GROUPING SETS
> ((a),(a)) reduces to:
>
> SELECT ... GROUP BY a UNION ALL SELECT ... GROUP BY a;
>
> and therefore has duplicates of all its result rows.
>
> I'm quite prepared to concede that I may have read the spec wrong
> (wouldn't be the first time), but in this case I require any such
> claim to be backed up by an example from some other db showing an
> actual difference in behavior.

I think you read the spec right. Apparently duplicate grouping sets are
allowed, and it's supposed to output that grouping set twice.

The section on ROLLUP/CUBE does not mention duplicates at all; it only
explains how to generate all the possible grouping sets, so if you have
duplicate columns there, you'll get duplicate sets (which is allowed).

If we can get rid of the excess ChainAggregate, that's certainly
enough for now.

Optimizing it could be simple, though - you don't need to keep the
duplicate groups; you only need a counter for how many times to output
each group. But the more I think about it, the more I think we can
ignore that. There are far more important pieces to implement, and if
you write bad SQL there's no help anyway.

regards
Tomas


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-09-09 14:01:16
Message-ID: CA+TgmoYMBsOkwF-Y3gbXckdboFe_-r27xDBeLzezNpvjum3G6A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
>>>>>> "Heikki" == Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> Heikki> Uh, that's ugly. The EXPLAIN output, I mean; as an implementation
> Heikki> detail chaining the nodes might be reasonable. But the above
> Heikki> gets unreadable if you have more than a few grouping sets.
>
> It's good for highlighting performance issues in EXPLAIN, too.

Perhaps so, but that doesn't take away from Heikki's point: it's still
ugly. I don't understand why the sorts can't all be nested under the
GroupAggregate nodes. We have a number of nodes already (e.g. Append)
that support an arbitrary number of children, and I don't see why we
can't do the same thing here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-09-09 15:19:16
Message-ID: CAFj8pRA+uLOQtdko5odT57g1nL-Og_KtY_h2m45sjKKfmd9M0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2014-09-09 16:01 GMT+02:00 Robert Haas <robertmhaas(at)gmail(dot)com>:

> On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth
> <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
> >>>>>> "Heikki" == Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> > Heikki> Uh, that's ugly. The EXPLAIN output, I mean; as an implementation
> > Heikki> detail chaining the nodes might be reasonable. But the above
> > Heikki> gets unreadable if you have more than a few grouping sets.
> >
> > It's good for highlighting performance issues in EXPLAIN, too.
>
> Perhaps so, but that doesn't take away from Heikki's point: it's still
> ugly. I don't understand why the sorts can't all be nested under the
> GroupAggregate nodes. We have a number of nodes already (e.g. Append)
> that support an arbitrary number of children, and I don't see why we
> can't do the same thing here.
>

I don't think showing sort and aggregation is a bad idea. Both can have
different performance impacts.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-09-09 15:25:07
Message-ID: CA+TgmoYnKMik3uBLibfFzNUBEyT-Ws-i5gzO5yBNzWpAARP85Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 9, 2014 at 11:19 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> 2014-09-09 16:01 GMT+02:00 Robert Haas <robertmhaas(at)gmail(dot)com>:
>> On Thu, Aug 21, 2014 at 11:01 AM, Andrew Gierth
>> <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
>> >>>>>> "Heikki" == Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>> > Heikki> Uh, that's ugly. The EXPLAIN output, I mean; as an implementation
>> > Heikki> detail chaining the nodes might be reasonable. But the above
>> > Heikki> gets unreadable if you have more than a few grouping sets.
>> >
>> > It's good for highlighting performance issues in EXPLAIN, too.
>>
>> Perhaps so, but that doesn't take away from Heikki's point: it's still
>> ugly. I don't understand why the sorts can't all be nested under the
>> GroupAggregate nodes. We have a number of nodes already (e.g. Append)
>> that support an arbitrary number of children, and I don't see why we
>> can't do the same thing here.
>
> I don't think showing sort and aggregation is a bad idea. Both can have
> different performance impacts.

Sure, showing the sort and aggregation steps is fine. But I don't see
what advantage we get out of showing them like this:

Aggregate
-> Sort
-> ChainAggregate
-> Sort
-> ChainAggregate
-> Sort

When we could show them like this:

Aggregate
-> Sort
-> Sort
-> Sort

From both a display perspective and an implementation-complexity
perspective, it seems appealing to have the Aggregate node feed the
data to one sort after another, rather having it send the data down a
very deep pipe.

I might be missing something, of course. I don't want to presume that
I'm smarter than Andrew, because Andrew is pretty smart. :-) But it
seems odd to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-09-09 16:01:26
Message-ID: 87r3zk6bc8.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Robert" == Robert Haas <robertmhaas(at)gmail(dot)com> writes:

Robert> Sure, showing the sort and aggregation steps is fine. But I
Robert> don't see what advantage we get out of showing them like
Robert> this:

Robert> Aggregate
Robert> -> Sort
Robert> -> ChainAggregate
Robert> -> Sort
Robert> -> ChainAggregate
Robert> -> Sort

The advantage is that this is how the plan tree is actually
structured.

Robert> When we could show them like this:

Robert> Aggregate
Robert> -> Sort
Robert> -> Sort
Robert> -> Sort

And we can't structure the plan tree like this, because then it
wouldn't be a _tree_ any more.

The Sort node expects to have a child node to fetch rows from, and it
expects all the usual plan tree mechanics (initialization, rescan,
etc.) to work on that child node. There's no way for the parent to
feed data to the child.
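
As a toy illustration of the pull model (not executor code; the class and
row format are made up), a Sort node only ever pulls rows from its child:

class Sort:
    # Pull-model node: at first fetch it pulls its child dry, sorts, and then
    # hands rows upward on demand.  There is no entry point through which a
    # parent node could push rows down into it instead.
    def __init__(self, child, key):
        self.child, self.key = child, key
    def __iter__(self):
        yield from sorted(self.child, key=self.key)

rows = [{"a": 2}, {"a": 1}, {"a": 3}]
print(list(Sort(iter(rows), key=lambda r: r["a"])))
# [{'a': 1}, {'a': 2}, {'a': 3}]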

Robert> From both a display perspective and an
Robert> implementation-complexity perspective,

... says the person who has never tried implementing it.

Honestly, ChainAggregate is _trivial_ compared to trying to make the
GroupAggregate code deal with multiple inputs, or trying to make some
new sort of plumbing node to feed input to those sorts. (You'd think
that it should be possible to use the existing CTE mechanics to do it,
but noooo... the existing code is actively and ferociously hostile to
the idea of adding new CTEs from within the planner.)

--
Andrew (irc:RhodiumToad)


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-09-09 16:11:30
Message-ID: CA+TgmobZVd1qannT1NyXU25u5Q-jEeY_Wnh9rOjjUgNs_WTTBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Sep 9, 2014 at 12:01 PM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
>>>>>> "Robert" == Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>
> Robert> Sure, showing the sort and aggregation steps is fine. But I
> Robert> don't see what advantage we get out of showing them like
> Robert> this:
>
> Robert> Aggregate
> Robert> -> Sort
> Robert> -> ChainAggregate
> Robert> -> Sort
> Robert> -> ChainAggregate
> Robert> -> Sort
>
> The advantage is that this is how the plan tree is actually
> structured.

I do understand that. I am questioning (as I believe Heikki was also)
whether it's structured correctly. Nobody is arguing for displaying
the plan tree in a way that doesn't mirror its actual structure, or
at least I am not.

> The Sort node expects to have a child node to fetch rows from, and it
> expects all the usual plan tree mechanics (initialization, rescan,
> etc.) to work on that child node. There's no way for the parent to
> feed data to the child.

OK, good point. So we do need something that can feed data from one
part of the plan tree to another, like a CTE does. I still think it
would be worth trying to see if there's a reasonable way to structure
the plan tree so that it's flatter.

> Robert> From both a display perspective and an
> Robert> implementation-complexity perspective,
>
> ... says the person who has never tried implementing it.

This comment to me reads rather sharply, and I don't feel that the
tone of my email is such as to merit a rebuke.

> Honestly, ChainAggregate is _trivial_ compared to trying to make the
> GroupAggregate code deal with multiple inputs, or trying to make some
> new sort of plumbing node to feed input to those sorts. (You'd think
> that it should be possible to use the existing CTE mechanics to do it,
> but noooo... the existing code is actively and ferociously hostile to
> the idea of adding new CTEs from within the planner.)

That's unfortunate.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-09-09 16:21:22
Message-ID: 10590.1410279682@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Sep 9, 2014 at 12:01 PM, Andrew Gierth
> <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
>> Honestly, ChainAggregate is _trivial_ compared to trying to make the
>> GroupAggregate code deal with multiple inputs, or trying to make some
>> new sort of plumbing node to feed input to those sorts. (You'd think
>> that it should be possible to use the existing CTE mechanics to do it,
>> but noooo... the existing code is actively and ferociously hostile to
>> the idea of adding new CTEs from within the planner.)

> That's unfortunate.

I'm less than convinced that it's true ... I've been meaning to find
time to review this patch, but it sounds like it's getting to the point
where I need to.

regards, tom lane


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2014-09-09 16:43:05
Message-ID: 87lhps69jg.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>>> Honestly, ChainAggregate is _trivial_ compared to trying to make the
>>> GroupAggregate code deal with multiple inputs, or trying to make some
>>> new sort of plumbing node to feed input to those sorts. (You'd think
>>> that it should be possible to use the existing CTE mechanics to do it,
>>> but noooo... the existing code is actively and ferociously hostile to
>>> the idea of adding new CTEs from within the planner.)

>> That's unfortunate.

Tom> I'm less than convinced that it's true ...

Maybe you can figure out how, but I certainly didn't see a reasonable way.

I would also question one aspect of the desirability - using the CTE
mechanism has the downside of needing an extra tuplestore with the
full input data set in it, whereas the chain mechanism only has
aggregated data in its tuplestore which should be much smaller.

--
Andrew (irc:RhodiumToad)


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Final Patch for GROUPING SETS - unrecognized node type: 347
Date: 2014-09-10 16:16:21
Message-ID: 87iokv4g2y.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tomas" == Tomas Vondra <tv(at)fuzzy(dot)cz> writes:

Tomas> If we can get rid of the excessive ChainAggregate, that's
Tomas> certainly enough for now.

I found an algorithm that should provably give the minimal number of sorts
(I was afraid that problem would turn out to be NP-hard, but not so - it's
solvable in P by reducing it to a problem of maximal matching in bipartite
graphs).
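
A minimal Python sketch of the reduction (not the patch's code), assuming
that a single sort can serve a collection of grouping sets exactly when
they form a chain under set inclusion, so that each set is a prefix of the
sort order; the minimum number of chains is then n minus the size of a
maximum bipartite matching:

def min_sort_chains(sets):
    # Minimum chain cover via minimum path cover: n - maximum matching.
    n = len(sets)
    fs = [frozenset(s) for s in sets]
    # Edge i -> j when set i can precede set j in one sort chain; ties between
    # equal sets are broken by index to keep the relation acyclic.
    adj = [[j for j in range(n)
            if i != j and fs[i] <= fs[j] and (fs[i] != fs[j] or i < j)]
           for i in range(n)]
    match_to = [-1] * n    # right-hand vertex -> matched left-hand vertex

    def augment(i, seen):
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                if match_to[j] == -1 or augment(match_to[j], seen):
                    match_to[j] = i
                    return True
        return False

    matching = sum(augment(i, set()) for i in range(n))
    return n - matching

# CUBE(a, b) expands to (a,b), (a), (b), (); three of those nest into one
# chain, so two sorts are enough.
print(min_sort_chains([{"a", "b"}, {"a"}, {"b"}, set()]))   # 2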

Updated patch should be forthcoming in a day or two.

--
Andrew (irc:RhodiumToad)


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-12 18:41:24
Message-ID: 87ha0c3cvl.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Changes since previous post:

gsp2.patch: code to generate sort chains updated to guarantee minimal
number of sort steps

Recut patches:

gsp1.patch - phase 1 code patch (full syntax, limited functionality)
gsp2.patch - phase 2 code patch (adds full functionality using the
new chained aggregate mechanism)
gsp-doc.patch - docs
gsp-contrib.patch - quote "cube" in contrib/cube and contrib/earthdistance,
intended primarily for testing pending a decision on
renaming contrib/cube or unreserving keywords
gsp-u.patch - proposed method to unreserve CUBE and ROLLUP

(the contrib patch is not necessary if the -u patch is used; the
contrib/pg_stat_statements fixes are in the phase1 patch)

--
Andrew (irc:RhodiumToad)

Attachment Content-Type Size
gsp1.patch text/x-patch 181.2 KB
gsp2.patch text/x-patch 59.3 KB
gsp-doc.patch text/x-patch 13.6 KB
gsp-contrib.patch text/x-patch 228.3 KB
gsp-u.patch text/x-patch 5.4 KB

From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-17 22:02:39
Message-ID: CABRT9RDeRjCMFnZo0c51pF-H9wT1F=EE9zwYJaKkFFLPaw6rVA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 12, 2014 at 9:41 PM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
> gsp1.patch - phase 1 code patch (full syntax, limited functionality)
> gsp2.patch - phase 2 code patch (adds full functionality using the
> new chained aggregate mechanism)

I gave these a try by converting my current CTE-based queries into
CUBEs and it works as expected; query time is cut in half and the line
count is 1/4 of the original. Thanks!

I only have a few trivial observations; if I'm getting too nitpicky
let me know. :)

----
Since you were asking for feedback on the EXPLAIN output on IRC, I'd
weigh in and say that having the groups on separate lines would be
significantly more readable. It took me a while to understand what's
going on in my queries due to longer table and column names and
wrapping; the comma separators between groups are hard to distinguish.
That is, if it can be made to work with the EXPLAIN printer without too
much trouble.

So instead of:
GroupAggregate
Output: four, ten, hundred, count(*)
Grouping Sets: (onek.four, onek.ten, onek.hundred), (onek.four, onek.ten), (onek.four), ()

Perhaps print:
Grouping Sets: (onek.four, onek.ten, onek.hundred)
(onek.four, onek.ten)
(onek.four)
()

Or maybe:
Grouping Set: (onek.four, onek.ten, onek.hundred)
Grouping Set: (onek.four, onek.ten)
Grouping Set: (onek.four)
Grouping Set: ()

Both seem to work with the explain.depesz.com parser, although the 1st
won't be aligned as nicely.

----
Do you think it would be reasonable to normalize single-set grouping
sets into a normal GROUP BY? Such queries would be capable of using
HashAggregate, but the current code doesn't allow that. For example:

set enable_sort=off;
explain select two, count(*) from onek group by grouping sets (two);
Could be equivalent to:
explain select two, count(*) from onek group by two;

----
I'd expect GROUP BY () to be fully equivalent to having no GROUP BY
clause, but there's a difference in explain output. The former
displays "Grouping Sets: ()" which is odd, since none of the grouping
set keywords were used.

# explain select count(*) from onek group by ();
Aggregate (cost=77.78..77.79 rows=1 width=0)
Grouping Sets: ()
-> Index Only Scan using onek_stringu1 on onek (cost=0.28..75.28 rows=1000 width=0)

Regards,
Marti


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-18 00:21:35
Message-ID: 541A258F.7070701@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/17/2014 03:02 PM, Marti Raudsepp wrote:
> So instead of:
> GroupAggregate
> Output: four, ten, hundred, count(*)
> Grouping Sets: (onek.four, onek.ten, onek.hundred), (onek.four, onek.ten), (onek.four), ()
>
> Perhaps print:
> Grouping Sets: (onek.four, onek.ten, onek.hundred)
> (onek.four, onek.ten)
> (onek.four)
> ()

So:

Grouping Sets: [
[ onek.four, onek.ten, onek.hundred ],
[ onek.four, onek.ten ],
[ onek.four ],
[]
]

.. in JSON?

Seems to me that we need a better way to display the grand total
grouping set.

>
> Or maybe:
> Grouping Set: (onek.four, onek.ten, onek.hundred)
> Grouping Set: (onek.four, onek.ten)
> Grouping Set: (onek.four)
> Grouping Set: ()

The latter won't work with JSON and YAML output.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, josh(at)agliodbs(dot)com, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-19 01:45:29
Message-ID: 878ulgpfnz.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Marti" == Marti Raudsepp <marti(at)juffo(dot)org> writes:

Marti> Since you were asking for feedback on the EXPLAIN output on
Marti> IRC, I'd weigh in and say that having the groups on separate
Marti> lines would be significantly more readable.

I revisited the explain output a bit and have come up with these
(surrounding material trimmed for clarity):

(text format)

GroupAggregate (cost=1122.39..1197.48 rows=9 width=8)
Group Key: two, four
Group Key: two
Group Key: ()
-> ...

(xml format)

<Plan>
<Node-Type>Aggregate</Node-Type>
<Strategy>Sorted</Strategy>
<Startup-Cost>1122.39</Startup-Cost>
<Total-Cost>1197.48</Total-Cost>
<Plan-Rows>9</Plan-Rows>
<Plan-Width>8</Plan-Width>
<Grouping-Sets>
<Group-Key>
<Item>two</Item>
<Item>four</Item>
</Group-Key>
<Group-Key>
<Item>two</Item>
</Group-Key>
<Group-Key>
</Group-Key>
</Grouping-Sets>
<Plans>...

(json format)

"Plan": {
"Node Type": "Aggregate",
"Strategy": "Sorted",
"Startup Cost": 1122.39,
"Total Cost": 1197.48,
"Plan Rows": 9,
"Plan Width": 8,
"Grouping Sets": [
["two", "four"],
["two"],
[]
],
"Plans": [...]

(yaml format)

- Plan:
Node Type: "Aggregate"
Strategy: "Sorted"
Startup Cost: 1122.39
Total Cost: 1197.48
Plan Rows: 9
Plan Width: 8
Grouping Sets:
- - "two"
- "four"
- - "two"
-
Plans: ...

Opinions? Any improvements?

I'm not entirely happy with what I had to do with the json and
(especially) the YAML output code in order to make this work. There
seemed no obvious way to generate nested unlabelled structures in
either using the existing Explain* functions, and for the YAML case
the best output structure to produce was entirely non-obvious (and
trying to read the YAML spec made my head explode).

Marti> Do you think it would be reasonable to normalize single-set
Marti> grouping sets into a normal GROUP BY?

It's certainly possible, though it would seem somewhat odd to write
queries that way. Either the parser or the planner could do that;
would you want the original syntax preserved in views, or wouldn't
that matter?

Marti> I'd expect GROUP BY () to be fully equivalent to having no
Marti> GROUP BY clause, but there's a difference in explain
Marti> output. The former displays "Grouping Sets: ()" which is odd,
Marti> since none of the grouping set keywords were used.

That's an implementation artifact, in the sense that we preserve the
fact that GROUP BY () was used by using an empty grouping set. Is it
a problem, really, that it shows up that way in explain?

--
Andrew (irc:RhodiumToad)


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Josh Berkus <josh(at)agliodbs(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-19 14:46:36
Message-ID: CABRT9RC=O5qXjD4U0kWtmZRovjK6CJChczk6PmjZJ+S=QYUrgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Sep 19, 2014 at 4:45 AM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
> GroupAggregate (cost=1122.39..1197.48 rows=9 width=8)
> Group Key: two, four
> Group Key: two
> Group Key: ()

> "Grouping Sets": [
> ["two", "four"],
> ["two"],
> []

+1 looks good to me.

> (yaml format)
> Grouping Sets:
> - - "two"
> - "four"
> - - "two"
> -

Now this is weird. But is anyone actually using YAML output format, or
was it implemented simply "because we can"?

> Marti> Do you think it would be reasonable to normalize single-set
> Marti> grouping sets into a normal GROUP BY?
> It's certainly possible, though it would seem somewhat odd to write
> queries that way.

The reason I bring this up is that queries are frequently dynamically
generated by programs. Coders are unlikely to special-case SQL
generation when there's just a single grouping set. And that's the
power of relational databases: the optimization work is done in the
database pretty much transparently to the coder (when it works, that
is).

> would you want the original syntax preserved in views

Doesn't matter IMO.

> Marti> I'd expect GROUP BY () to be fully equivalent to having no
> Marti> GROUP BY clause, but there's a difference in explain
> Marti> output. The former displays "Grouping Sets: ()" which is odd,
> Marti> since none of the grouping set keywords were used.
> That's an implementation artifact, in the sense that we preserve the
> fact that GROUP BY () was used by using an empty grouping set. Is it
> a problem, really, that it shows up that way in explain?

No, not really a problem. :)

Regards,
Marti


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Josh Berkus <josh(at)agliodbs(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-19 15:35:52
Message-ID: 871tr7pshe.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Marti" == Marti Raudsepp <marti(at)juffo(dot)org> writes:

>> (yaml format)
>> Grouping Sets:
>> - - "two"
>> - "four"
>> - - "two"
>> -

Marti> Now this is weird.

You're telling me. Also, feeding it to an online yaml-to-json
converter gives the result as [["two","four"],["two"],null] which is
not quite the same as the json version. An alternative would be:

Grouping Sets:
- - "two"
- "four"
- - "two"
- []

or

Grouping Sets:
-
- "two"
- "four"
-
- "two"
- []

though I haven't managed to get that second one to work yet.

Marti> But is anyone actually using YAML output format, or was it
Marti> implemented simply "because we can"?

Until someone decides to dike it out, I think we are obligated to make
it produce something resembling correct output.

Marti> The reason I bring this up is that queries are frequently
Marti> dynamically generated by programs.

Good point.

>> would you want the original syntax preserved in views

Marti> Doesn't matter IMO.

I think it's fairly consistent for the parser to do this, since we do
a number of other normalization steps there (removing excess nesting
and so on). This turns out to be quite trivial.

--
Andrew (irc:RhodiumToad)


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Josh Berkus <josh(at)agliodbs(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-19 15:52:03
Message-ID: 20140919155203.GA25887@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2014-09-19 16:35:52 +0100, Andrew Gierth wrote:
> Marti> But is anyone actually using YAML output format, or was it
> Marti> implemented simply "because we can"?
>
> Until someone decides to dike it out, I think we are obligated to make
> it produce something resembling correct output.

I vote for ripping it out. There really isn't any justification for it
and it broke more than once.

Greg: Did you actually ever end up using the yaml output?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Josh Berkus <josh(at)agliodbs(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-19 15:52:23
Message-ID: 87vbojobj5.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:

Andrew> You're telling me. Also, feeding it to an online yaml-to-json
Andrew> converter gives the result as [["two","four"],["two"],null]
Andrew> which is not quite the same as the json version. An
Andrew> alternative would be:

Oh, another YAML alternative would be:

Grouping Sets:
- ["two","four"]
- ["two"]
- []

Would that be better? (It's not consistent with other YAML outputs like
sort/group keys, but it's equally legal as far as I can tell and seems
more readable.)

--
Andrew (irc:RhodiumToad)


From: Petr Jelinek <petr(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Josh Berkus <josh(at)agliodbs(dot)com>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-19 15:58:40
Message-ID: 541C52B0.5050802@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 19/09/14 17:52, Andres Freund wrote:
> On 2014-09-19 16:35:52 +0100, Andrew Gierth wrote:
>> Marti> But is anyone actually using YAML output format, or was it
>> Marti> implemented simply "because we can"?
>>
>> Until someone decides to dike it out, I think we are obligated to make
>> it produce something resembling correct output.
>
> I vote for ripping it out. There really isn't any justification for it
> and it broke more than once.
>

Even though I really like YAML I say +1, mainly because any YAML 1.2
parser should be able to parse JSON output without problem...

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-19 17:37:40
Message-ID: 541C69E4.7070206@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/19/2014 08:52 AM, Andres Freund wrote:
>> Until someone decides to dike it out, I think we are obligated to make
>> it produce something resembling correct output.
> I vote for ripping it out. There really isn't any justification for it
> and it broke more than once.

(a) I personally use it all the time to produce human-readable output,
sometimes also working via markdown. It's easier to read than the
"standard format" or JSON, especially when combined with grep or other
selective filtering. Note that this use would not at all preclude
having the YAML output look "weird" as long as it was readable.

(b) If we're going to discuss ripping out YAML format, please let's do
that as a *separate* patch and discussion, and not as a side effect of
Grouping Sets. Otherwise this will be one of those things where people
pitch a fit during beta because the people who care about YAML aren't
necessarily reading this thread.

On 09/19/2014 08:52 AM, Andrew Gierth wrote:
> Oh, another YAML alternative would be:
>
> Grouping Sets:
> - ["two","four"]
> - ["two"]
> - []
>
> Would that be better? (It's not consistent with other YAML outputs like
> sort/group keys, but it's equally legal as far as I can tell and seems
> more readable.)

That works for me.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-19 18:04:11
Message-ID: 87oaubo5nu.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Josh" == Josh Berkus <josh(at)agliodbs(dot)com> writes:

Josh> (b) If we're going to discuss ripping out YAML format, please
Josh> let's do that as a *separate* patch and discussion,

+infinity

>> Grouping Sets:
>> - ["two","four"]
>> - ["two"]
>> - []
>>
>> Would that be better? (It's not consistent with other YAML outputs
>> like sort/group keys, but it's equally legal as far as I can tell
>> and seems more readable.)

Josh> That works for me.

I prefer that one to any of the others I've come up with, so unless anyone
has a major objection, I'll go with it.

--
Andrew (irc:RhodiumToad)


From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-24 14:08:59
Message-ID: 5422D07B.8060800@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

There's been a lot of discussion and I haven't followed it in detail.
Andrew, there were some open questions, but have you gotten enough
feedback so that you know what to do next? I'm trying to get this
commitfest to an end, and this is still in "Needs Review" state...

- Heikki


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-24 16:44:52
Message-ID: 87vbodhtvb.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Heikki" == Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:

Heikki> There's been a lot of discussion and I haven't followed it in
Heikki> detail. Andrew, there were some open questions, but have you
Heikki> gotten enough feedback so that you know what to do next?

I was holding off on posting a recut patch with the latest EXPLAIN
formatting changes (which are basically cosmetic) until it became
clear whether RLS was likely to be reverted or kept (we have a tiny
but irritating conflict with it, in the regression test schedule file
where we both add to the same list of tests).

Other than that there is nothing for Atri and me to do next but wait
on a proper review. The feedback and discussion has been almost all
about cosmetic details; the only actual issues found have been a
trivial omission from pg_stat_statements, and a slightly suboptimal
planning of sort steps, both long since fixed.

What we have not had:

- anything more than a superficial review

- any feedback over the acceptability of our chained-sorts approach
for doing aggregations with differing sort orders

- any decision about the question of reserved words and/or possibly
renaming contrib/cube (and what new name to use if so)

--
Andrew (irc:RhodiumToad)


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-09-27 05:37:38
Message-ID: 87fvfdhbnn.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:

Andrew> I was holding off on posting a recut patch with the latest
Andrew> EXPLAIN formatting changes (which are basically cosmetic)
Andrew> until it became clear whether RLS was likely to be reverted
Andrew> or kept (we have a tiny but irritating conflict with it, in
Andrew> the regression test schedule file where we both add to the
Andrew> same list of tests).

And here is that recut patch set.

Changes since last posting (other than conflict removal):

- gsp1.patch: clearer EXPLAIN output as per discussion

Recut patches:

gsp1.patch - phase 1 code patch (full syntax, limited functionality)
gsp2.patch - phase 2 code patch (adds full functionality using the
new chained aggregate mechanism)
gsp-doc.patch - docs
gsp-contrib.patch - quote "cube" in contrib/cube and contrib/earthdistance,
intended primarily for testing pending a decision on
renaming contrib/cube or unreserving keywords
gsp-u.patch - proposed method to unreserve CUBE and ROLLUP

(the contrib patch is not necessary if the -u patch is used; the
contrib/pg_stat_statements fixes are in the phase1 patch)

--
Andrew (irc:RhodiumToad)

Attachment Content-Type Size
gsp1.patch text/x-patch 183.4 KB
gsp2.patch text/x-patch 59.3 KB
gsp-doc.patch text/x-patch 13.6 KB
gsp-contrib.patch text/x-patch 228.3 KB
gsp-u.patch text/x-patch 5.4 KB

From: David Fetter <david(at)fetter(dot)org>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Tom Lane <tgl(at)postgresql(dot)org>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-11-12 06:43:09
Message-ID: 20141112064309.GN6982@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Sep 27, 2014 at 06:37:38AM +0100, Andrew Gierth wrote:
> >>>>> "Andrew" == Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
>
> Andrew> I was holding off on posting a recut patch with the latest
> Andrew> EXPLAIN formatting changes (which are basically cosmetic)
> Andrew> until it became clear whether RLS was likely to be reverted
> Andrew> or kept (we have a tiny but irritating conflict with it, in
> Andrew> the regression test schedule file where we both add to the
> Andrew> same list of tests).
>
> And here is that recut patch set.
>
> Changes since last posting (other than conflict removal):
>
> - gsp1.patch: clearer EXPLAIN output as per discussion
>
> Recut patches:
>
> gsp1.patch - phase 1 code patch (full syntax, limited functionality)
> gsp2.patch - phase 2 code patch (adds full functionality using the
> new chained aggregate mechanism)
> gsp-doc.patch - docs
> gsp-contrib.patch - quote "cube" in contrib/cube and contrib/earthdistance,
> intended primarily for testing pending a decision on
> renaming contrib/cube or unreserving keywords
> gsp-u.patch - proposed method to unreserve CUBE and ROLLUP
>
> (the contrib patch is not necessary if the -u patch is used; the
> contrib/pg_stat_statements fixes are in the phase1 patch)
>
> --
> Andrew (irc:RhodiumToad)
>

Tom, any word on this?

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-10 18:36:48
Message-ID: 24515.1418236608@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> And here is that recut patch set.

I started looking over this patch, but eventually decided that it needs
more work to be committable than I'm prepared to put in right now.

My single biggest complaint is about the introduction of struct
GroupedVar. If we stick with that, we're going to have to teach an
extremely large number of places that know about Vars to also know
about GroupedVars. This will result in code bloat and errors of
omission. If you think the latter concern is hypothetical, note that
you can't get 40 lines into gsp1.patch without finding such an omission,
namely the patch fails to teach pg_stat_statements.c about GroupedVars.
(That also points up that some of the errors of omission will be in
third-party code that we can't fix easily.)

I think you should get rid of that concept and instead implement the
behavior by having nodeAgg.c set the relevant fields of the representative
tuple slot to NULL, so that a regular Var does the right thing.
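
As a conceptual sketch only (Python with invented names, nothing like the
real slot machinery), the idea is roughly:

def project_group_row(representative_row, grouping_set, targetlist):
    # Null out the columns that are not grouped in the current grouping set,
    # then evaluate ordinary column references against the masked row.
    masked = {col: (val if col in grouping_set else None)
              for col, val in representative_row.items()}
    return [expr(masked) for expr in targetlist]

row = {"four": 1, "ten": 3, "hundred": 42}
targetlist = [lambda r: r["four"], lambda r: r["ten"]]
print(project_group_row(row, {"four"}, targetlist))   # [1, None]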

I'm also not happy about the quality of the internal documentation.
The big problem here is the seriously lacking documentation of the new
parse node types, eg

+/*
+ * Node representing substructure in GROUPING SETS
+ *
+ * This is not actually executable, but it's used in the raw parsetree
+ * representation of GROUP BY, and in the groupingSets field of Query, to
+ * preserve the original structure of rollup/cube clauses for readability
+ * rather than reducing everything to grouping sets.
+ */
+
+typedef enum
+{
+ GROUPING_SET_EMPTY,
+ GROUPING_SET_SIMPLE,
+ GROUPING_SET_ROLLUP,
+ GROUPING_SET_CUBE,
+ GROUPING_SET_SETS
+} GroupingSetKind;
+
+typedef struct GroupingSet
+{
+ Expr xpr;
+ GroupingSetKind kind;
+ List *content;
+ int location;
+} GroupingSet;

The only actual documentation there is a long-winded excuse for having
put the struct declaration in the wrong place. (Since it's not an
executable expression, it should be in parsenodes.h not primnodes.h.)
Good luck figuring out what "content" is a list of, or indeed anything
at all except that this has got something to do with grouping sets.
If one digs around in the patch long enough, some useful information can
be found in the header comments for various functions --- but there should
be a spec for what this struct means, what its fields are, what the
relevant invariants are *in the .h file*. Poking around in parsenodes.h,
eg the description of SortGroupClause, should give you an idea of the
standard here.

I'm not too happy about struct Grouping either. If one had to guess, one
would probably guess that this was part of the representation of a GROUP
BY clause; a guess led on by the practice of the patch of dealing with
this and struct GroupingSet together, as in eg pg_stat_statements.c and
nodes.h. Reading enough of the patch will eventually clue you that this
is the representation of a call of the GROUPING() pseudo-function, but
that's not exactly clear from either the name of the struct or its random
placement between Var and Const in primnodes.h. And the comment is oh so
helpful:

+/*
+ * Grouping
+ */

I'd be inclined to call it GroupingFunc and put it after
Aggref/WindowFunc. Also please note that there is an attempt throughout
the system to order code stanzas that deal with assorted node types in an
order matching the order in which they're declared in the *nodes.h files.
You should never be flipping a coin to decide where to add such code, and
"put it at the end of the existing list" is usually not the best answer
either.

Some other random examples of inadequate attention to commenting:

@@ -243,7 +243,7 @@ typedef struct AggStatePerAggData
* rest.
*/

- Tuplesortstate *sortstate; /* sort object, if DISTINCT or ORDER BY */
+ Tuplesortstate **sortstate; /* sort object, if DISTINCT or ORDER BY */

This change didn't even bother to pluralize the comment, let alone explain
the length of the array or what it's indexed according to, let alone
explain why we now need multiple tuplesort objects in what is still
apparently a "per aggregate" state struct. (BTW, as a matter of good
engineering I think it's useful to change a field's name when you change
its meaning and representation so fundamentally. In this case, renaming
to "sortstates" would have been clearer and would have helped ensure that
you didn't miss fixing any referencing code.)

@@ -338,81 +339,101 @@ static Datum GetAggInitVal(Datum textInitVal, Oid transtype);
static void
initialize_aggregates(AggState *aggstate,
AggStatePerAgg peragg,
- AggStatePerGroup pergroup)
+ AggStatePerGroup pergroup,
+ int numReinitialize)
{
int aggno;

I wonder what numReinitialize is, or why it's needed, or (having read
more code than I should have had to in order to guess at what it is)
why it is that only the first N sortstates need to be reset. The comments
at the call sites are no more enlightening.

I don't really have any comments on the algorithms yet, having spent too
much time trying to figure out underdocumented data structures to get to
the algorithms. However, noting the addition of list_intersection_int()
made me wonder whether you'd not be better off reducing the integer lists
to bitmapsets a lot sooner, perhaps even at parse analysis.
list_intersection_int() is going to be O(N^2) by nature. Maybe N can't
get large enough to matter in this context, but I do see places that
seem to be concerned about performance.
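
To illustrate the complexity point only (a sketch, not the Bitmapset API):

def list_intersection_int(a, b):
    # O(len(a) * len(b)): every membership test scans the second list.
    return [x for x in a if x in b]

def to_bitmapset(ints):
    # Small non-negative integers packed into one integer; intersection is
    # then a single AND over the words.
    mask = 0
    for i in ints:
        mask |= 1 << i
    return mask

a, b = [1, 3, 5, 7], [3, 4, 5]
print(list_intersection_int(a, b))              # [3, 5]
print(bin(to_bitmapset(a) & to_bitmapset(b)))   # 0b101000, i.e. {3, 5}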

I've not spent any real effort looking at gsp2.patch yet, but it seems
even worse off comment-wise: if there's any explanation in there at all
of what a "chained aggregate" is, I didn't find it. I'd also counsel you
to find some other way to do it than putting bool chain_head fields in
Aggref nodes; that looks like a mess, eg, it will break equal() tests
for expression nodes that probably should still be seen as equal.

I took a quick look at gsp-u.patch. It seems like that approach should
work, with of course the caveat that using CUBE/ROLLUP as function names
in a GROUP BY list would be problematic. I'm not convinced by the
commentary in ruleutils.c suggesting that extra parentheses would help
disambiguate: aren't extra parentheses still going to contain grouping
specs according to the standard? Forcibly schema-qualifying such function
names seems like a less fragile answer on that end.

regards, tom lane


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-11 02:19:00
Message-ID: 87tx13t9y6.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

More comment on this later, but I want to highlight these specific
points since we need clear answers here to avoid wasting unnecessary
time and effort:

Tom> I've not spent any real effort looking at gsp2.patch yet, but it
Tom> seems even worse off comment-wise: if there's any explanation in
Tom> there at all of what a "chained aggregate" is, I didn't find it.

(Maybe "stacked" would have been a better term.)

What that code does is produce plans that look like this:

GroupAggregate
-> Sort
-> ChainAggregate
-> Sort
-> ChainAggregate

in much the same way that WindowAgg nodes are generated.

Where would you consider the best place to comment this? The WindowAgg
equivalent seems to be discussed primarily in the header comment of
nodeWindowAgg.c.

Tom> I'd also counsel you to find some other way to do it than
Tom> putting bool chain_head fields in Aggref nodes;

There are no chain_head fields in Aggref nodes.

Agg.chain_head is true for the Agg node at the top of the chain (the
GroupAggregate node in the above example), while AggState.chain_head
is set on the ChainAggregate nodes to point to the AggState of the
GroupAggregate node.

What we need to know before doing any further work on this is whether
this idea of stacking up aggregate and sort nodes is a viable one.

(The feedback I've had so far suggests that the performance is
acceptable, even if there are still optimization opportunities that
can be tackled later, like adding HashAggregate support.)

Tom> I took a quick look at gsp-u.patch. It seems like that approach
Tom> should work, with of course the caveat that using CUBE/ROLLUP as
Tom> function names in a GROUP BY list would be problematic. I'm not
Tom> convinced by the commentary in ruleutils.c suggesting that extra
Tom> parentheses would help disambiguate: aren't extra parentheses
Tom> still going to contain grouping specs according to the standard?

The spec is of minimal help here since it does not allow expressions in
GROUP BY at all, last I looked; only column references.

The extra parens do actually disambiguate because CUBE(x) and
(CUBE(x)) are not equivalent anywhere; while CUBE(x) can appear inside
GROUPING SETS (...), it cannot appear inside a (...) list nested inside
a GROUPING SETS list (or anywhere else).

As the comments in gram.y explain, the productions used are intended
to follow the spec with the exception of using a_expr where the spec
requires <ordinary grouping set>. So CUBE and ROLLUP are recognized as
special only as part of a group_by_item (<grouping element> in the
spec), and as soon as we see a paren that isn't part of the "GROUPING
SETS (" opener, we're forced into parsing an a_expr, in which CUBE()
would become a function call.

(The case of upgrading from an old pg version seems to require the use
of --quote-all-identifiers in pg_dump.)

Tom> Forcibly schema-qualifying such function names seems like a less
Tom> fragile answer on that end.

That I guess would require keeping more state, unless you applied it
everywhere to any function with a keyword for a name? I dunno.

The question that needs deciding here is less whether the approach
_could_ work but whether we _want_ it. The objection has been made
that we are in effect introducing a new category of "unreserved almost
everywhere" keyword, which I think has a point; on the other hand,
reserving CUBE is a seriously painful prospect.

--
Andrew (irc:RhodiumToad)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-11 19:37:08
Message-ID: 5544.1418326628@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Tom> I've not spent any real effort looking at gsp2.patch yet, but it
> Tom> seems even worse off comment-wise: if there's any explanation in
> Tom> there at all of what a "chained aggregate" is, I didn't find it.

> (Maybe "stacked" would have been a better term.)

> What that code does is produce plans that look like this:

> GroupAggregate
> -> Sort
> -> ChainAggregate
> -> Sort
> -> ChainAggregate

> in much the same way that WindowAgg nodes are generated.

That seems pretty messy, especially given your further comments that these
plan nodes are interconnected and know about each other (though you failed
to say exactly how). The claimed analogy to WindowAgg therefore seems
bogus since stacked WindowAggs are independent, AFAIR anyway. I'm also
wondering about performance: doesn't this imply more rows passing through
some of the plan steps than really necessary?

Also, how would this extend to preferring hashed aggregation in some of
the grouping steps?

ISTM that maybe what we should do is take a cue from the SQL spec, which
defines these things in terms of UNION ALL of plain-GROUP-BY operations
reading from a common CTE. Abstractly, that is, we'd have

Append
-> GroupAggregate
-> Sort
-> source data
-> GroupAggregate
-> Sort
-> source data
-> GroupAggregate
-> Sort
-> source data
...

(or some of the arms could be HashAgg without a sort). Then the question
is what exactly the aggregates are reading from. We could do worse than
make it a straight CTE, I suppose.

> Tom> I'd also counsel you to find some other way to do it than
> Tom> putting bool chain_head fields in Aggref nodes;

> There are no chain_head fields in Aggref nodes.

Oh, I mistook "struct Agg" for "struct Aggref". (That's another pretty
poorly chosen struct name, though I suppose it's far too late to change
that choice.) Still, interconnecting plan nodes that aren't adjacent in
the plan tree doesn't sound like a great idea to me.

> Tom> I took a quick look at gsp-u.patch. It seems like that approach
> Tom> should work, with of course the caveat that using CUBE/ROLLUP as
> Tom> function names in a GROUP BY list would be problematic. I'm not
> Tom> convinced by the commentary in ruleutils.c suggesting that extra
> Tom> parentheses would help disambiguate: aren't extra parentheses
> Tom> still going to contain grouping specs according to the standard?

> The extra parens do actually disambiguate because CUBE(x) and
> (CUBE(x)) are not equivalent anywhere; while CUBE(x) can appear inside
> GROUPING SETS (...), it cannot appear inside a (...) list nested inside
> a GROUPING SETS list (or anywhere else).

Maybe, but this seems very fragile and non-future-proof. I think
double-quoting or schema-qualifying such function names would be safer
when you think about the use-case of dumping views that may get loaded
into future Postgres versions.

> The question that needs deciding here is less whether the approach
> _could_ work but whether we _want_ it. The objection has been made
> that we are in effect introducing a new category of "unreserved almost
> everywhere" keyword, which I think has a point;

True, but I think that ship has already sailed. We already have similar
behavior for PARTITION, RANGE, and ROWS (see the opt_existing_window_name
production), and I think PRECEDING, FOLLOWING, and UNBOUNDED are
effectively reserved-in-certain-very-specific-contexts as well. And there
are similar behaviors in plpgsql's parser.

> on the other hand,
> reserving CUBE is a seriously painful prospect.

Precisely. I think renaming or getting rid of contrib/cube would have
to be something done in a staged fashion over multiple release cycles.
Waiting several years to get GROUPING SETS doesn't seem appealing at all
compared to this alternative.

regards, tom lane


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-11 23:30:15
Message-ID: 8761dhu9ts.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> What that code does is produce plans that look like this:

>> GroupAggregate
>> -> Sort
>> -> ChainAggregate
>> -> Sort
>> -> ChainAggregate

>> in much the same way that WindowAgg nodes are generated.

Tom> That seems pretty messy, especially given your further comments
Tom> that these plan nodes are interconnected and know about each
Tom> other (though you failed to say exactly how).

I'd already explained in more detail way back when we posted the
patch. But to reiterate: the ChainAggregate nodes pass through their
input data unchanged, but on group boundaries they write aggregated
result rows to a tuplestore shared by the whole chain. The top node
returns the data from the tuplestore after its own output is
completed.

The chain_head pointer in the ChainAggregate nodes is used for:

- obtaining the head node's targetlist and qual, to use to project
rows into the tuplestore (the ChainAggregate nodes don't do
ordinary projection so they have dummy targetlists like the Sort
nodes do)

- obtaining the pointer to the tuplestore itself

- on rescan without parameter change, to inform the parent node
whether or not the child nodes are also being rescanned (since
the Sort nodes may or may not block this)
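
To make the data flow above concrete, here is a toy Python model
(illustration only, not the patch's C code): a plain list stands in for the
shared tuplestore, counting stands in for the aggregation, and the chain
head's own grouping is written out inline.

def chain_aggregate(source, cols, store):
    """Pass rows through unchanged; on each group boundary (input assumed
    sorted on cols), append an aggregated row -- here a count -- to the
    shared store."""
    cur, n = None, 0
    for r in source:
        k = tuple(r[c] for c in cols)
        if k != cur and n:
            store.append((cols, cur, n))
            n = 0
        cur = k
        n += 1
        yield r
    if n:
        store.append((cols, cur, n))

# Plan shape for GROUPING SETS ((a), (c)):
#   GroupAggregate (a) <- Sort (a) <- ChainAggregate (c) <- Sort (c) <- scan
rows = [{'a': a, 'c': c} for a in (1, 2) for c in ('x', 'y')]
store = []                                     # the shared tuplestore
lower = chain_aggregate(sorted(rows, key=lambda r: r['c']), ('c',), store)
head_in = sorted(lower, key=lambda r: r['a'])  # the upper Sort drains the chain
head_groups = {}
for r in head_in:                              # the head's own aggregation
    head_groups[(r['a'],)] = head_groups.get((r['a'],), 0) + 1
result = [(('a',), k, n) for k, n in sorted(head_groups.items())]
result += store            # the head returns the tuplestore rows at the end
print(result)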

Tom> The claimed analogy to WindowAgg therefore seems bogus since
Tom> stacked WindowAggs are independent, AFAIR anyway.

The analogy is only in that they need to see the same input rows but
in different sort orders.

Tom> I'm also wondering about performance: doesn't this imply more
Tom> rows passing through some of the plan steps than really
Tom> necessary?

There's no way to cut down the number of rows seen by intermediate
nodes unless you implement (and require) associative aggregates, which
we do not do in this patch (that's left for possible future
optimization efforts). Our approach makes no new demands on the
implementation of aggregate functions.

Tom> Also, how would this extend to preferring hashed aggregation in
Tom> some of the grouping steps?

My suggestion for extending it to hashed aggs is: by having a (single)
HashAggregate node keep multiple hash tables, per grouping set, then
any arbitrary collection of grouping sets can be handled in one node
provided that memory permits and no non-hashable features are used.
So the normal plan for CUBE(a,b) under this scheme would be just:

HashAggregate
Grouping Sets: (), (a), (b), (a,b)
-> (input path in unsorted order)
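
As a toy illustration of "one node, multiple hash tables" (plain Python, not
executor code, with counting standing in for the aggregates):

def hashagg_grouping_sets(rows, grouping_sets):
    tables = [dict() for _ in grouping_sets]      # one hash table per set
    for r in rows:                                # single pass, unsorted input
        for cols, table in zip(grouping_sets, tables):
            key = tuple(r[c] for c in cols)
            table[key] = table.get(key, 0) + 1
    return [(cols, key, n)
            for cols, table in zip(grouping_sets, tables)
            for key, n in table.items()]

rows = [{'a': 1, 'b': 'x'}, {'a': 1, 'b': 'y'}, {'a': 2, 'b': 'x'}]
print(hashagg_grouping_sets(rows, [(), ('a',), ('b',), ('a', 'b')]))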

If a mixture of hashable and non-hashable data types is used, for
example CUBE(hashable,unhashable), then a plan of this form could be
constructed:

HashAggregate
Grouping Sets: (), (hashable)
-> ChainAggregate
Grouping Sets: (unhashable), (unhashable,hashable)
-> (input path sorted by (unhashable,hashable))

Likewise, plans of this form could be considered for cases like
CUBE(low_card, high_card) where hashed grouping on high_card would
require excessive memory:

HashAggregate
Grouping Sets: (), (low_card)
-> ChainAggregate
Grouping Sets: (high_card), (high_card, low_card)
-> (input path sorted by (high_card, low_card))

Tom> ISTM that maybe what we should do is take a cue from the SQL
Tom> spec, which defines these things in terms of UNION ALL of
Tom> plain-GROUP-BY operations reading from a common CTE.

I looked at that, in fact that was our original plan, but it became
clear quite quickly that it was not going to be easy.

I tried two different approaches. First was to actually re-plan the
input (i.e. running query_planner more than once) for different sort
orders; that crashed and burned quickly thanks to the extent to which
the planner assumes that it'll be run once only on any given input.

Second was to generate a CTE for the input data. This didn't get very
far because everything that already exists to handle CTE nodes assumes
that they are explicit in the planner's input (that they have their
own Query node, etc.) and I was not able to determine a good solution.
If you have any suggestions for how to approach the problem, I'm happy
to have another go at it.

--
Andrew (irc:RhodiumToad)


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-12 21:33:38
Message-ID: 8479.1418420018@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
> Tom> That seems pretty messy, especially given your further comments
> Tom> that these plan nodes are interconnected and know about each
> Tom> other (though you failed to say exactly how).

> I'd already explained in more detail way back when we posted the
> patch. But to reiterate: the ChainAggregate nodes pass through their
> input data unchanged, but on group boundaries they write aggregated
> result rows to a tuplestore shared by the whole chain. The top node
> returns the data from the tuplestore after its own output is
> completed.

That seems pretty grotty from a performance+memory consumption standpoint.
At peak memory usage, each one of the Sort nodes will contain every input
row, and the shared tuplestore will contain every output row. That will
lead to either a lot of memory eaten, or a lot of temp-file I/O, depending
on how big work_mem is.

In principle, with the CTE+UNION approach I was suggesting, the peak
memory consumption would be one copy of the input rows in the CTE's
tuplestore plus one copy in the active branch's Sort node. I think a
bit of effort would be needed to get there (ie, shut down one branch's
Sort node before starting the next, something I'm pretty sure doesn't
happen today). But it's doable whereas I don't see how we dodge the
multiple-active-sorts problem with the chained implementation.

> Tom> ISTM that maybe what we should do is take a cue from the SQL
> Tom> spec, which defines these things in terms of UNION ALL of
> Tom> plain-GROUP-BY operations reading from a common CTE.

> I looked at that, in fact that was our original plan, but it became
> clear quite quickly that it was not going to be easy.

> I tried two different approaches. First was to actually re-plan the
> input (i.e. running query_planner more than once) for different sort
> orders; that crashed and burned quickly thanks to the extent to which
> the planner assumes that it'll be run once only on any given input.

Well, we'd not want to rescan the input multiple times, so I don't think
that generating independent plan trees for each sort order would be the
thing to do anyway. I suppose ideally it would be nice to check the costs
of getting the different sort orders, so that the one Sort we elide is the
one that gets the best cost savings. But the WindowAgg code isn't that
smart either and no one's really complained, so I think this can wait.
(Eventually I'd like to make such cost comparisons possible as part of the
upper-planner Pathification that I keep nattering about. But it doesn't
seem like a prerequisite for getting GROUPING SETS in.)

> Second was to generate a CTE for the input data. This didn't get very
> far because everything that already exists to handle CTE nodes assumes
> that they are explicit in the planner's input (that they have their
> own Query node, etc.) and I was not able to determine a good solution.

Seems like restructuring that wouldn't be *that* hard. We probably don't
want it to be completely like a CTE for planning purposes anyway --- that
would foreclose passing down any knowledge of desired sort order, which
we don't want. But it seems like we could stick a variant of CtePath
atop the chosen result path of the scan/join planning phase. If you like
I can poke into this a bit.

regards, tom lane


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-13 04:37:48
Message-ID: 87fvcks1og.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

>> I'd already explained in more detail way back when we posted the
>> patch. But to reiterate: the ChainAggregate nodes pass through
>> their input data unchanged, but on group boundaries they write
>> aggregated result rows to a tuplestore shared by the whole
>> chain. The top node returns the data from the tuplestore after its
>> own output is completed.

Tom> That seems pretty grotty from a performance+memory consumption
Tom> standpoint. At peak memory usage, each one of the Sort nodes
Tom> will contain every input row,

Has this objection ever been raised for WindowAgg, which has the same
issue?

Tom> and the shared tuplestore will contain every output row.

Every output row except those produced by the top node, and since this
is after grouping, that's expected to be smaller than the input.

Tom> That will lead to either a lot of memory eaten, or a lot of
Tom> temp-file I/O, depending on how big work_mem is.

Yes. Though note that this code only kicks in when dealing with
grouping sets more complex than a simple rollup. A CUBE of two
dimensions uses only one Sort node above whatever is needed to produce
sorted input, and a CUBE of three dimensions uses only two. (It does
increase quite a lot for large cubes though.)

Tom> In principle, with the CTE+UNION approach I was suggesting, the
Tom> peak memory consumption would be one copy of the input rows in
Tom> the CTE's tuplestore plus one copy in the active branch's Sort
Tom> node. I think a bit of effort would be needed to get there (ie,
Tom> shut down one branch's Sort node before starting the next,
Tom> something I'm pretty sure doesn't happen today).

Correct, it doesn't.

However, I notice that having ChainAggregate shut down its input would
also have the effect of bounding the memory usage (to two copies,
which is as good as the append+sorts+CTE case).

Is shutting down and reinitializing parts of the plan really feasible
here? Or would it be a case of forcing a rescan?

>> Second was to generate a CTE for the input data. This didn't get
>> very far because everything that already exists to handle CTE
>> nodes assumes that they are explicit in the planner's input (that
>> they have their own Query node, etc.) and I was not able to
>> determine a good solution.

Tom> Seems like restructuring that wouldn't be *that* hard. We
Tom> probably don't want it to be completely like a CTE for planning
Tom> purposes anyway --- that would foreclose passing down any
Tom> knowledge of desired sort order, which we don't want. But it
Tom> seems like we could stick a variant of CtePath atop the chosen
Tom> result path of the scan/join planning phase. If you like I can
Tom> poke into this a bit.

Please do.

That seems to cover the high-priority issues from our point of view.

We will continue working on the other issues, on the assumption that
when we have some idea how to do it your way, we will rip out the
ChainAggregate stuff in favour of an Append-based solution.

--
Andrew (irc:RhodiumToad)


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-13 12:40:53
Message-ID: 874mszsy9g.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

With the high-priority questions out of the way, time to tackle the
rest:

Tom> My single biggest complaint is about the introduction of struct
Tom> GroupedVar. If we stick with that, we're going to have to teach
Tom> an extremely large number of places that know about Vars to also
Tom> know about GroupedVars. This will result in code bloat and
Tom> errors of omission. If you think the latter concern is
Tom> hypothetical, note that you can't get 40 lines into gsp1.patch
Tom> without finding such an omission, namely the patch fails to
Tom> teach pg_stat_statements.c about GroupedVars. (That also points
Tom> up that some of the errors of omission will be in third-party
Tom> code that we can't fix easily.)

Except that GroupedVar is created only late in planning, and so only a
small proportion of places need to know about it (and certainly
pg_stat_statements does not). It also can't end up attached to any
foreign scan or otherwise potentially third-party plan node.

Tom> I think you should get rid of that concept and instead implement
Tom> the behavior by having nodeAgg.c set the relevant fields of the
Tom> representative tuple slot to NULL, so that a regular Var does
Tom> the right thing.

We did consider that. Messing with the null flags of the slot didn't
seem like an especially clean approach. But if that's how you want
it...

Tom> I don't really have any comments on the algorithms yet, having
Tom> spent too much time trying to figure out underdocumented data
Tom> structures to get to the algorithms. However, noting the
Tom> addition of list_intersection_int() made me wonder whether you'd
Tom> not be better off reducing the integer lists to bitmapsets a lot
Tom> sooner, perhaps even at parse analysis.

list_intersection_int should not be time-critical; common queries do
not call it at all (simple cube or rollup clauses always have an empty
grouping set, causing the intersection test to bail immediately). Even
in pathological worst-case constructions, like putting a dozen
individually grouped columns in front of a 12-d cube (thus calling it
4096 times on lists at least 12 nodes long), it doesn't account for
more than a small percentage even with optimization off and debugging
and asserts enabled.

The code uses the list representation almost everywhere in parsing and
planning because in some places the order of elements matters, and I
didn't want to keep swapping between a bitmap and a list
representation.

(We _do_ use bitmapsets where we're potentially going to be doing an
O(N^2) number of subset comparisons to build the graph adjacency
list for computing the minimal set of sort operations, and at
execution time.)

I didn't even consider using bitmaps for the output of parse analysis
because at that stage we want to preserve most of the original query
substructure (otherwise view deparse won't look anything like the
original query did).

Tom> list_intersection_int() is going to be O(N^2) by nature. Maybe
Tom> N can't get large enough to matter in this context, but I do see
Tom> places that seem to be concerned about performance.

My main feeling on performance is that simple cube and rollup clauses
or short lists of grouping sets should parse and plan very quickly;
more complex cases should parse and plan fast enough that execution
time on any nontrivial input will swamp the parse/plan time; and the
most complex cases that aren't outright rejected should plan in no
more than a few seconds extra. (We're limiting to 4096 grouping sets
in any one query level, which is comparable to other databases and seems
far higher than what people are actually likely to need.)

(Don't be fooled by the excessive EXPLAIN time on some queries; there
are performance issues in EXPLAIN output generation that have nothing
to do with this patch and which I've not yet pinned down.)

--
Andrew (irc:RhodiumToad)


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-15 01:32:35
Message-ID: CAB7nPqTdvn6Z8+ttmyiHAZ5YeL1M3R34nHNhWz__aHTVOj3KKQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 11, 2014 at 3:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I don't really have any comments on the algorithms yet, having spent too
> much time trying to figure out underdocumented data structures to get to
> the algorithms. However, noting the addition of list_intersection_int()
> made me wonder whether you'd not be better off reducing the integer lists
> to bitmapsets a lot sooner, perhaps even at parse analysis.
> list_intersection_int() is going to be O(N^2) by nature. Maybe N can't
> get large enough to matter in this context, but I do see places that
> seem to be concerned about performance.
>
> I've not spent any real effort looking at gsp2.patch yet, but it seems
> even worse off comment-wise: if there's any explanation in there at all
> of what a "chained aggregate" is, I didn't find it. I'd also counsel you
> to find some other way to do it than putting bool chain_head fields in
> Aggref nodes; that looks like a mess, eg, it will break equal() tests
> for expression nodes that probably should still be seen as equal.
>
> I took a quick look at gsp-u.patch. It seems like that approach should
> work, with of course the caveat that using CUBE/ROLLUP as function names
> in a GROUP BY list would be problematic. I'm not convinced by the
> commentary in ruleutils.c suggesting that extra parentheses would help
> disambiguate: aren't extra parentheses still going to contain grouping
> specs according to the standard? Forcibly schema-qualifying such function
> names seems like a less fragile answer on that end.
Based on those comments, I am marking this patch as "Returned with
Feedback" on the CF app for 2014-10. Andrew, feel free to move this
entry to CF 2014-12 if you are planning to continue working on it, so
that it gets additional review. (Note that this patch's status was
"Waiting on Author" at the time of writing.)
Regards,
--
Michael


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-15 03:28:36
Message-ID: 87iohdppa3.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Michael" == Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:

Michael> Based on those comments, I am marking this patch as
Michael> "Returned with Feedback" on the CF app for 2014-10. Andrew,
Michael> feel free to move this entry to CF 2014-12 if you are
Michael> planning to continue working on it so as it would get
Michael> additional review. (Note that this patch status was "Waiting
Michael> on Author" when writing this text).

Moved it to 2014-12 and set it back to "waiting on author". We expect to
submit a revised version, though I have no timescale yet.

--
Andrew (irc:RhodiumToad)


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-15 03:29:28
Message-ID: CAB7nPqTHgW5Wgz+xq_U=2w56idQpmxU_N4-FyvYgmu5eBeBAQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 15, 2014 at 12:28 PM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
>>>>>> "Michael" == Michael Paquier <michael(dot)paquier(at)gmail(dot)com> writes:
>
> Michael> Based on those comments, I am marking this patch as
> Michael> "Returned with Feedback" on the CF app for 2014-10. Andrew,
> Michael> feel free to move this entry to CF 2014-12 if you are
> Michael> planning to continue working on it so as it would get
> Michael> additional review. (Note that this patch status was "Waiting
> Michael> on Author" when writing this text).
>
> Moved it to 2014-12 and set it back to "waiting on author". We expect to
> submit a revised version, though I have no timescale yet.
OK thanks for the update.
--
Michael


From: Noah Misch <noah(at)leadboat(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-21 21:00:05
Message-ID: 20141221210005.GA1864976@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 13, 2014 at 04:37:48AM +0000, Andrew Gierth wrote:
> >>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> >> I'd already explained in more detail way back when we posted the
> >> patch. But to reiterate: the ChainAggregate nodes pass through
> >> their input data unchanged, but on group boundaries they write
> >> aggregated result rows to a tuplestore shared by the whole
> >> chain. The top node returns the data from the tuplestore after its
> >> own output is completed.
>
> Tom> That seems pretty grotty from a performance+memory consumption
> Tom> standpoint. At peak memory usage, each one of the Sort nodes
> Tom> will contain every input row,
>
> Has this objection ever been raised for WindowAgg, which has the same
> issue?

I caution against using window function performance as the template for
GROUPING SETS performance goals. The benefit of GROUPING SETS compared to its
UNION ALL functional equivalent is 15% syntactic pleasantness, 85% performance
opportunities. Contrast that with window functions, which are great even with
naive performance, because they enable tasks that are otherwise too hard in SQL.

> Tom> In principle, with the CTE+UNION approach I was suggesting, the
> Tom> peak memory consumption would be one copy of the input rows in
> Tom> the CTE's tuplestore plus one copy in the active branch's Sort
> Tom> node. I think a bit of effort would be needed to get there (ie,
> Tom> shut down one branch's Sort node before starting the next,
> Tom> something I'm pretty sure doesn't happen today).
>
> Correct, it doesn't.
>
> However, I notice that having ChainAggregate shut down its input would
> also have the effect of bounding the memory usage (to two copies,
> which is as good as the append+sorts+CTE case).

Agreed, and I find that more promising than the CTE approach. Both strategies
require temporary space covering two copies of the input data. (That, or you
accept rescanning the original input.) The chained approach performs less
I/O. Consider "SELECT count(*) FROM t GROUP BY GROUPING SETS (a, b)", where
pg_relation_size(t) >> RAM. I/O consumed with the chained approach:

read table
write tuplesort 1
read tuplesort 1
write tuplesort 2
read tuplesort 2

I/O consumed with the CTE approach:

read table
write CTE
read CTE
write tuplesort 1
read tuplesort 1
read CTE
write tuplesort 2
read tuplesort 2
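
In units of one sequential pass over the data (and ignoring that tuplesort
rows may be narrower than heap rows), that tallies up as below; a trivial
Python sketch just to make the comparison explicit:

def chained_io(n_sorts):
    # read table once, then write + read one tuplesort per sort
    return 1 + 2 * n_sorts

def cte_io(n_sorts):
    # read table, write CTE, then per sort: read CTE, write + read a tuplesort
    return 2 + 3 * n_sorts

print(chained_io(2), cte_io(2))   # 5 vs. 8 passes for GROUPING SETS (a, b)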

Tom rightly brought up the space requirements for result rows. The CTE
approach naturally avoids reserving space for that. However, I find it a safe
bet to optimize GROUPING SETS for input >> result. Reserving temporary space
for result rows to save input data I/O is a good trade. We don't actually
need to compromise; one can imagine a GroupAggregateChain plan node with a
sortChain list that exhibits the efficiencies of both. I'm fine moving
forward with the cross-node tuplestore, though.

The elephant in the performance room is the absence of hash aggregation. I
agree with your decision to make that a follow-on patch, but the project would
be in an awkward PR situation if 9.5 has GroupAggregate-only GROUPING SETS. I
may argue to #ifdef-out the feature rather than release that way. We don't
need to debate that prematurely, but keep it in mind while planning.

Thanks,
nm


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-22 15:46:16
Message-ID: 19548.1419263176@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Noah Misch <noah(at)leadboat(dot)com> writes:
> On Sat, Dec 13, 2014 at 04:37:48AM +0000, Andrew Gierth wrote:
> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Tom> That seems pretty grotty from a performance+memory consumption
>> Tom> standpoint. At peak memory usage, each one of the Sort nodes
>> Tom> will contain every input row,

>> Has this objection ever been raised for WindowAgg, which has the same
>> issue?

> I caution against using window function performance as the template for
> GROUPING SETS performance goals. The benefit of GROUPING SETS compared to its
> UNION ALL functional equivalent is 15% syntactic pleasantness, 85% performance
> opportunities. Contrast that having window functions is great even with naive
> performance, because they enable tasks that are otherwise too hard in SQL.

The other reason that's a bad comparison is that I've not seen many
queries that use more than a couple of window frames, whereas we have
to expect that the number of grouping sets in typical queries will be
significantly more than "a couple". So we do have to think about what
the performance will be like with a lot of sort steps. I'm also worried
that this use-case may finally force us to do something about the "one
work_mem per sort node" behavior, unless we can hack things so that only
one or two sorts reach max memory consumption concurrently.

I still find the ChainAggregate approach too ugly at a system structural
level to accept, regardless of Noah's argument about number of I/O cycles
consumed. We'll be paying for that in complexity and bugs into the
indefinite future, and I wonder if it isn't going to foreclose some other
"performance opportunities" as well.

regards, tom lane


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-22 16:19:57
Message-ID: 8761d3is6c.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

[Noah]
>> I caution against using window function performance as the
>> template for GROUPING SETS performance goals. The benefit of
>> GROUPING SETS compared to its UNION ALL functional equivalent is
>> 15% syntactic pleasantness, 85% performance opportunities.
>> Contrast that having window functions is great even with naive
>> performance, because they enable tasks that are otherwise too hard
>> in SQL.

Yes, this is a reasonable point.

Tom> The other reason that's a bad comparison is that I've not seen
Tom> many queries that use more than a couple of window frames,
Tom> whereas we have to expect that the number of grouping sets in
Tom> typical queries will be significantly more than "a couple".

I would be interested in seeing more good examples of the size and
type of grouping sets used in typical queries.

Tom> So we do have to think about what the performance will be like
Tom> with a lot of sort steps. I'm also worried that this use-case
Tom> may finally force us to do something about the "one work_mem per
Tom> sort node" behavior, unless we can hack things so that only one
Tom> or two sorts reach max memory consumption concurrently.

Modifying ChainAggregate so that only two sorts reach max memory
consumption concurrently seems to have been quite simple to implement,
though I'm still testing some aspects of it.

--
Andrew (irc:RhodiumToad)


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-22 19:38:28
Message-ID: CA+TgmoZxocVRLVnWJRTSwHnc9ZCgp97zFsABA16jNSKrQPs_bQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 22, 2014 at 11:19 AM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
> Tom> The other reason that's a bad comparison is that I've not seen
> Tom> many queries that use more than a couple of window frames,
> Tom> whereas we have to expect that the number of grouping sets in
> Tom> typical queries will be significantly more than "a couple".
>
> I would be interested in seeing more good examples of the size and
> type of grouping sets used in typical queries.

From what I have seen, there is interest in being able to do things
like GROUP BY CUBE(a, b, c, d) and have that be efficient. That will
require 16 different groupings, and we really want to minimize the
number of times we have to re-sort to get all of those done. For
example, if we start by sorting on (a, b, c, d), we want to then make
a single pass over the data computing the aggregates with (a, b, c,
d), (a, b, c), (a, b), (a), and () as the grouping columns.
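
Illustration only (toy Python, not planner code): the groupings covered by
that single sorted pass are exactly the prefixes of the chosen sort order,
five of the sixteen in the cube.

cols = ('a', 'b', 'c', 'd')
one_pass = [cols[:i] for i in range(len(cols), -1, -1)]
print(one_pass)        # (a,b,c,d), (a,b,c), (a,b), (a), ()
print(2 ** len(cols))  # 16 groupings in the full cube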

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-22 19:40:26
Message-ID: CAOeZVid-wruEF3NJgtag24uJnYCQ4TG_rMQCpGZnXfEqV86ODQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday, December 23, 2014, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Mon, Dec 22, 2014 at 11:19 AM, Andrew Gierth
> <andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
> > Tom> The other reason that's a bad comparison is that I've not seen
> > Tom> many queries that use more than a couple of window frames,
> > Tom> whereas we have to expect that the number of grouping sets in
> > Tom> typical queries will be significantly more than "a couple".
> >
> > I would be interested in seeing more good examples of the size and
> > type of grouping sets used in typical queries.
>
> From what I have seen, there is interest in being able to do things
> like GROUP BY CUBE(a, b, c, d) and have that be efficient. That will
> require 16 different groupings, and we really want to minimize the
> number of times we have to re-sort to get all of those done. For
> example, if we start by sorting on (a, b, c, d), we want to then make
> a single pass over the data computing the aggregates with (a, b, c,
> d), (a, b, c), (a, b), (a), and () as the grouping columns.
>
>
>
That is exactly what the ChainAggregate node does. A set of groupings that
fits into a single ROLLUP list (like your example) is processed in a single
pass.

--
Regards,

Atri
*l'apprenant*


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-22 23:57:43
Message-ID: 87wq5jgv1c.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Robert" == Robert Haas <robertmhaas(at)gmail(dot)com> writes:

>> I would be interested in seeing more good examples of the size and
>> type of grouping sets used in typical queries.

Robert> From what I have seen, there is interest in being able to do
Robert> things like GROUP BY CUBE(a, b, c, d) and have that be
Robert> efficient.

Yes, but that's not telling me anything I didn't already know.

What I'm curious about is things like:

- what's the largest cube(...) people actually make use of in practice

- do people make much use of the ability to mix cube and rollup, or
take the cross product of multiple grouping sets

- what's the most complex GROUPING SETS clause anyone has seen in
common use

Robert> That will require 16 different groupings, and we really want
Robert> to minimize the number of times we have to re-sort to get all
Robert> of those done. For example, if we start by sorting on (a, b,
Robert> c, d), we want to then make a single pass over the data
Robert> computing the aggregates with (a, b, c, d), (a, b, c), (a,
Robert> b), (a), and () as the grouping columns.

In the case of cube(a,b,c,d), our code currently gives:

b,d,a,c: (b,d,a,c),(b,d)
a,b,d: (a,b,d),(a,b)
d,a,c: (d,a,c),(d,a),(d)
c,d: (c,d),(c)
b,c,d: (b,c,d),(b,c),(b)
a,c,b: (a,c,b),(a,c),(a),()

There is no solution in less than 6 sorts. (There are many possible
solutions in 6 sorts, but we don't attempt to prefer one over
another. The minimum number of sorts for a cube of N dimensions is
obviously N! / (r! * (N-r)!) where r = floor(N/2).)

If you want the theory: the set of grouping sets is a poset ordered by
set inclusion; what we want is a minimal partition of this poset into
chains (since any chain can be processed in one pass), which happens
to be equivalent to the problem of maximum cardinality matching in a
bipartite graph, which we solve in polynomial time with the
Hopcroft-Karp algorithm. This guarantees us a minimal solution for
any combination of grouping sets however specified, not just for
cubes.
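
Purely as an illustration of that idea (toy Python, not the patch's
implementation, and using a plain augmenting-path search rather than
Hopcroft-Karp; both find the same maximum matching, Hopcroft-Karp just gets
there asymptotically faster):

from itertools import combinations

def cube(*cols):
    # the grouping sets generated by CUBE(...): every subset of cols
    return [frozenset(c) for r in range(len(cols) + 1)
            for c in combinations(cols, r)]

def min_chain_partition(sets):
    """Partition grouping sets, ordered by set inclusion, into a minimum
    number of chains via maximum bipartite matching."""
    n = len(sets)
    succ = [[v for v in range(n) if sets[u] < sets[v]] for u in range(n)]
    match_to = [None] * n            # match_to[v]: the set matched below v

    def augment(u, seen):
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                if match_to[v] is None or augment(match_to[v], seen):
                    match_to[v] = u
                    return True
        return False

    for u in range(n):
        augment(u, set())

    # a matched edge u -> v means u and v share a chain (u nested inside v)
    next_of = {match_to[v]: v for v in range(n) if match_to[v] is not None}
    heads = set(range(n)) - set(next_of.values())
    chains = []
    for u in sorted(heads):
        c = [sets[u]]
        while u in next_of:
            u = next_of[u]
            c.append(sets[u])
        chains.append(c)
    return chains                    # each chain = one sort order, one pass

chains = min_chain_partition(cube('a', 'b', 'c', 'd'))
print(len(chains))                   # 6, i.e. C(4,2), as above
for c in chains:
    print([tuple(sorted(s)) for s in c])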

--
Andrew (irc:RhodiumToad)


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-23 07:29:58
Message-ID: 20141223072958.GB1900132@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 22, 2014 at 10:46:16AM -0500, Tom Lane wrote:
> I still find the ChainAggregate approach too ugly at a system structural
> level to accept, regardless of Noah's argument about number of I/O cycles
> consumed. We'll be paying for that in complexity and bugs into the
> indefinite future, and I wonder if it isn't going to foreclose some other
> "performance opportunities" as well.

Among GROUPING SETS GroupAggregate implementations, I bet there's a nonempty
intersection between those having maintainable design and those having optimal
I/O usage, optimal memory usage, and optimal number of sorts. Let's put more
effort into finding it. I'm hearing that the shared tuplestore is
ChainAggregate's principal threat to system structure; is that right?


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-23 13:48:40
Message-ID: CA+TgmoYLgenTkWH0_qrU4VoOwy_kMrsm449dyJEcnrHT8oR+cw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Dec 22, 2014 at 6:57 PM, Andrew Gierth
<andrew(at)tao11(dot)riddles(dot)org(dot)uk> wrote:
> In the case of cube(a,b,c,d), our code currently gives:
>
> b,d,a,c: (b,d,a,c),(b,d)
> a,b,d: (a,b,d),(a,b)
> d,a,c: (d,a,c),(d,a),(d)
> c,d: (c,d),(c)
> b,c,d: (b,c,d),(b,c),(b)
> a,c,b: (a,c,b),(a,c),(a),()

That's pretty cool.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Noah Misch <noah(at)leadboat(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-31 08:58:45
Message-ID: 20141231085845.GA2148306@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Dec 23, 2014 at 02:29:58AM -0500, Noah Misch wrote:
> On Mon, Dec 22, 2014 at 10:46:16AM -0500, Tom Lane wrote:
> > I still find the ChainAggregate approach too ugly at a system structural
> > level to accept, regardless of Noah's argument about number of I/O cycles
> > consumed. We'll be paying for that in complexity and bugs into the
> > indefinite future, and I wonder if it isn't going to foreclose some other
> > "performance opportunities" as well.
>
> Among GROUPING SETS GroupAggregate implementations, I bet there's a nonempty
> intersection between those having maintainable design and those having optimal
> I/O usage, optimal memory usage, and optimal number of sorts. Let's put more
> effort into finding it. I'm hearing that the shared tuplestore is
> ChainAggregate's principal threat to system structure; is that right?

The underlying algorithm, if naively expressed in terms of our executor
concepts, would call ExecProcNode() on a SortState, then feed the resulting
slot to both a GroupAggregate and to another Sort. That implies a non-tree
graph of executor nodes, which isn't going to fly anytime soon. The CTE
approach bypasses that problem by eliminating cooperation between sorts,
instead reading 2N+1 copies of the source data for N sorts. ChainAggregate is
a bit like a node having two parents, a Sort and a GroupAggregate. However,
the graph edge between ChainAggregate and its GroupAggregate is a tuplestore
instead of the usual, synchronous ExecProcNode().

Suppose one node orchestrated all sorting and aggregation. Call it a
MultiGroupAggregate for now. It wouldn't harness Sort nodes, because it
performs aggregation between tuplesort_puttupleslot() calls. Instead, it
would directly manage two Tuplesortstate, CUR and NEXT. The node would have
an initial phase similar to ExecSort(), in which it drains the outer node to
populate the first CUR. After that, it looks more like agg_retrieve_direct(),
except that CUR is the input source, and each tuple drawn is also put into
NEXT. When done with one CUR, swap CUR with NEXT and reinitialize NEXT. This
design does not add I/O consumption or require a nonstandard communication
channel between executor nodes. Tom, Andrew, does that look satisfactory?
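
To make the control flow concrete, here is a toy Python model of that scheme
(an illustration only; the real thing would live in nodeAgg.c and manage
Tuplesortstates rather than Python lists, and each pass would presumably
handle a whole rollup chain of groupings rather than the single grouping per
sort order shown here):

def multi_group_aggregate(rows, orderings):
    key_for = lambda cols: (lambda r: tuple(r[c] for c in cols))
    cur = sorted(rows, key=key_for(orderings[0]))  # initial phase: drain input
    out = []
    for i, cols in enumerate(orderings):
        key = key_for(cols)
        nxt = []                                   # plays the role of NEXT
        group_key, count = None, 0
        for r in cur:                              # CUR is sorted for this pass
            if i + 1 < len(orderings):
                nxt.append(r)                      # each tuple drawn also goes to NEXT
            k = key(r)
            if k != group_key and count:
                out.append((cols, group_key, count))
                count = 0
            group_key = k
            count += 1
        if count:
            out.append((cols, group_key, count))
        if i + 1 < len(orderings):
            cur = sorted(nxt, key=key_for(orderings[i + 1]))  # swap CUR and NEXT
    return out

rows = [{'a': 1, 'b': 'x'}, {'a': 1, 'b': 'y'}, {'a': 2, 'b': 'x'}]
print(multi_group_aggregate(rows, [('a',), ('b',)]))   # GROUPING SETS (a, b)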

Thanks,
nm


From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-31 09:15:29
Message-ID: CAOeZVidbB5v0FPG2=_QcROGvYJe+41LMiJ1Xy_ptqFZjNnp0Hw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> ChainAggregate is a bit like a node having two parents, a Sort and a
> GroupAggregate. However, the graph edge between ChainAggregate and its
> GroupAggregate is a tuplestore instead of the usual, synchronous
> ExecProcNode().

Well, I don't buy the two-parents theory. The Sort nodes are interleaved
with the ChainAggregate nodes, so there is still a single edge. However, as
you rightly said, there is a shared tuplestore; note that only the
head-of-chain ChainAggregate has the top GroupAggregate as its parent.

>
> Suppose one node orchestrated all sorting and aggregation. Call it a
> MultiGroupAggregate for now. It wouldn't harness Sort nodes, because it
> performs aggregation between tuplesort_puttupleslot() calls. Instead, it
> would directly manage two Tuplesortstate, CUR and NEXT. The node would have
> an initial phase similar to ExecSort(), in which it drains the outer node to
> populate the first CUR. After that, it looks more like agg_retrieve_direct(),
> except that CUR is the input source, and each tuple drawn is also put into
> NEXT. When done with one CUR, swap CUR with NEXT and reinitialize NEXT. This
> design does not add I/O consumption or require a nonstandard communication
> channel between executor nodes. Tom, Andrew, does that look satisfactory?
>
>
So you are essentially proposing merging ChainAggregate and its
corresponding Sort node?

So the structure would be something like:

GroupAggregate
--> MultiGroupAgg (a,b)
----> MultiGroupAgg (c,d) ...

I am not sure I understand you correctly. Only the top-level GroupAggregate
node projects the result of the entire operation. The key to the
ChainAggregate nodes is that each one handles the grouping sets that fit a
single ROLLUP list, i.e. that can be computed with a single sort order.
There can be multiple such lists in a single grouping-sets operation;
however, our current design has only a single top GroupAggregate node, plus
one ChainAggregate node + Sort node per sort order. If you are proposing to
replace the GroupAggregate node and the entire ChainAggregate + Sort stack
with a single MultiGroupAggregate node, I am not able to understand how it
will handle all the multiple sort orders. If you are proposing to replace
only a ChainAggregate + Sort pair with a single MultiGroupAgg node, that
node still shares the tuplestore with the top-level GroupAggregate node.

I am pretty sure I have messed up my understanding of your proposal. Please
correct me if I am wrong.

Regards,

Atri

--
Regards,

Atri
*l'apprenant*


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-31 17:33:43
Message-ID: 87d26zd9k8.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Noah" == Noah Misch <noah(at)leadboat(dot)com> writes:

Noah> Suppose one node orchestrated all sorting and aggregation.

Well, that has the downside of making it into an opaque blob, without
actually gaining much.

Noah> Call it a MultiGroupAggregate for now. It wouldn't harness
Noah> Sort nodes, because it performs aggregation between
Noah> tuplesort_puttupleslot() calls. Instead, it would directly
Noah> manage two Tuplesortstate, CUR and NEXT. The node would have
Noah> an initial phase similar to ExecSort(), in which it drains the
Noah> outer node to populate the first CUR. After that, it looks
Noah> more like agg_retrieve_direct(),

agg_retrieve_direct is already complex enough, and this would be
substantially more so, whereas agg_retrieve_chained is substantially
simpler.

A more serious objection is that this forecloses (or at least makes
much more complex) the future possibility of doing some grouping sets
by sorting and others by hashing. The chained approach specifically
allows for the future possibility of using a HashAggregate as the
chain head, so that for example cube(a,b) can be implemented as a
sorted agg for (a,b) and (a) and a hashed agg for (b) and (), allowing
it to be done with one sort even if the result size for (a,b) is too
big to hash.

Noah> Tom, Andrew, does that look satisfactory?

Not to me.

--
Andrew (irc:RhodiumToad)


From: Noah Misch <noah(at)leadboat(dot)com>
To: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-31 21:01:59
Message-ID: 20141231210159.GA2159277@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 31, 2014 at 02:45:29PM +0530, Atri Sharma wrote:
> > Suppose one node orchestrated all sorting and aggregation. Call it a
> > MultiGroupAggregate for now. It wouldn't harness Sort nodes, because it
> > performs aggregation between tuplesort_puttupleslot() calls. Instead, it
> > would directly manage two Tuplesortstate, CUR and NEXT. The node would have
> > an initial phase similar to ExecSort(), in which it drains the outer node to
> > populate the first CUR. After that, it looks more like agg_retrieve_direct(),
> > except that CUR is the input source, and each tuple drawn is also put into
> > NEXT. When done with one CUR, swap CUR with NEXT and reinitialize NEXT. This
> > design does not add I/O consumption or require a nonstandard communication
> > channel between executor nodes. Tom, Andrew, does that look satisfactory?
> >
> >
> So you are essentially proposing merging ChainAggregate and its
> corresponding Sort node?
>
> So the structure would be something like:
>
> GroupAggregate
> --> MultiGroupAgg (a,b)
> ----> MultiGroupAgg (c,d) ...

No.

> If you are proposing
> replacing GroupAggregate node + entire ChainAggregate + Sort nodes stack
> with a single MultiGroupAggregate node, I am not able to understand how it
> will handle all the multiple sort orders.

Yes, I was proposing that. My paragraph that you quoted above was the attempt
to explain how the node would manage multiple sort orders. If you have
specific questions about it, feel free to ask.


From: Noah Misch <noah(at)leadboat(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2014-12-31 21:05:53
Message-ID: 20141231210553.GB2159277@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Dec 31, 2014 at 05:33:43PM +0000, Andrew Gierth wrote:
> >>>>> "Noah" == Noah Misch <noah(at)leadboat(dot)com> writes:
>
> Noah> Suppose one node orchestrated all sorting and aggregation.
>
> Well, that has the downside of making it into an opaque blob, without
> actually gaining much.

The opaque-blob criticism is valid. As for not gaining much, well, the gain I
sought was to break this stalemate. You and Tom have expressed willingness to
accept the read I/O multiplier of the CTE approach. You and I are willing to
swallow an architecture disruption, namely a tuplestore acting as a side
channel between executor nodes. Given your NACK, I agree that it fails to
move us toward consensus and therefore does not gain much. Alas.

> A more serious objection is that this forecloses (or at least makes
> much more complex) the future possibility of doing some grouping sets
> by sorting and others by hashing. The chained approach specifically
> allows for the future possibility of using a HashAggregate as the
> chain head, so that for example cube(a,b) can be implemented as a
> sorted agg for (a,b) and (a) and a hashed agg for (b) and (), allowing
> it to be done with one sort even if the result size for (a,b) is too
> big to hash.

That's a fair criticism, too. Ingesting nodeSort.c into nodeAgg.c wouldn't be
too bad, because nodeSort.c is a thin wrapper around tuplesort.c. Ingesting
nodeHash.c is not so tidy; that could entail extracting a module similar in
level to tuplesort.c, to be consumed by both executor nodes. This does raise
the good point that the GROUPING SETS _design_ ought to consider group and
hash aggregation together. Designing one in isolation carries too high of a
risk of painting the other into a corner.

Thanks,
nm


From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-01-02 21:55:23
Message-ID: 54A713CB.6020604@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 12/31/14, 3:05 PM, Noah Misch wrote:
> On Wed, Dec 31, 2014 at 05:33:43PM +0000, Andrew Gierth wrote:
>> >>>>> "Noah" == Noah Misch <noah(at)leadboat(dot)com> writes:
>>
>> Noah> Suppose one node orchestrated all sorting and aggregation.
>>
>> Well, that has the downside of making it into an opaque blob, without
>> actually gaining much.
> The opaque-blob criticism is valid. As for not gaining much, well, the gain I
> sought was to break this stalemate. You and Tom have expressed willingness to
> accept the read I/O multiplier of the CTE approach. You and I are willing to
> swallow an architecture disruption, namely a tuplestore acting as a side
> channel between executor nodes. Given your NACK, I agree that it fails to
> move us toward consensus and therefore does not gain much. Alas.

I haven't read the full discussion in depth, but is what we'd want here the
ability to feed tuples to more than one node simultaneously? That would
allow things like

GroupAggregate
--> Sort(a) \
------------+--> Sort(a,b) -\
--> Hash(b) ----------------+
\--> SeqScan

That would allow the planner to trade off things like total memory consumption vs IO.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


From: Noah Misch <noah(at)leadboat(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-01-03 17:53:20
Message-ID: 20150103175320.GB2241798@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jan 02, 2015 at 03:55:23PM -0600, Jim Nasby wrote:
> On 12/31/14, 3:05 PM, Noah Misch wrote:
> >On Wed, Dec 31, 2014 at 05:33:43PM +0000, Andrew Gierth wrote:
> >> >>>>> "Noah" == Noah Misch <noah(at)leadboat(dot)com> writes:
> >>
> >> Noah> Suppose one node orchestrated all sorting and aggregation.
> >>
> >> Well, that has the downside of making it into an opaque blob, without
> >> actually gaining much.
> >The opaque-blob criticism is valid. As for not gaining much, well, the gain I
> >sought was to break this stalemate. You and Tom have expressed willingness to
> >accept the read I/O multiplier of the CTE approach. You and I are willing to
> >swallow an architecture disruption, namely a tuplestore acting as a side
> >channel between executor nodes. Given your NACK, I agree that it fails to
> >move us toward consensus and therefore does not gain much. Alas.
>
> I haven't read the full discussion in depth, but is what we'd want here is the ability to feed tuples to more than one node simultaneously?

A similar comment appeared shortly upthread. Given a planner and executor
capable of that, we would do so here. Changing the planner and executor
architecture to support it is its own large, open-ended project.


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-01-06 20:21:56
Message-ID: 87h9w38zhg.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Herewith the latest version of the patch.

Stuff previously discussed but NOT changed in this version:

1. Still uses the ChainAggregate mechanism. As mentioned before, I'm
happy to abandon this given a better option, but I've not been given
anything to work with yet.

2. Still has GroupedVar. I have no objection in principle to taking
this out, but the details of doing so depend on the chain-agg vs.
possible Append/CTE approach (or other approach), so I don't want to
make unnecessary work by doing this prematurely and having to re-do
it.

Stuff changed:

1. Lotsa comments.

2. Node functions and declarations are re-ordered for consistency.
Renamed "Grouping" expression node to "GroupingFunc".

3. Memory usage is now constrained so that no more than 2, or in some
cases 3, sort nodes in a chain are active at one time. (I've tested
this by monitoring the memory usage for large cubes). (The case of 3
active sorts is if REWIND was requested and we added a sort node to
the input plan; while we have to re-sort the intermediate sorts on a
rewind if there are more than two of them, we keep the originally
sorted input to avoid rewinding the input plan.)

4. A problem of incorrect size estimation was corrected (thinko).

5. Tested, but provisionally rejected, the approach of preferring to
use Bitmapsets rather than integer lists. While there is a slight code
simplification (offset by greater confusion over whether we're dealing
with lists or bitmaps at any given point), and a very minor performance
gain on contrived cases, the big drawback is that it destroys the order
of clauses as written in the query, even where that would be surprising
to the user.

6. CUBE and ROLLUP are unreserved, and ruleutils is modified to
schema-qualify uses of them as plain functions in group by clauses, in
addition to adding extra parens as the previous patch did (see the
sketch after this list). Accordingly there are no changes to
contrib/cube or contrib/earthdistance now. (Upgrading from older
versions may require --quote-all-identifiers if for some bizarre reason
cube() appears in a group by clause of a view.)

7. Fixed a bug in handling direct args of ordered-set aggs.

8. Some variable name cleanup and general tidying.
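
To illustrate item 6, here is a minimal sketch; the table, view and
schema names are hypothetical, and the deparsed text shown is only
approximate:

    -- assumes contrib/cube is installed in schema public; quoting "cube"
    -- selects the function rather than the CUBE (...) grouping construct
    CREATE TABLE t (x float8);
    CREATE VIEW v AS
      SELECT "cube"(x) AS c, count(*)
      FROM t
      GROUP BY "cube"(x);
    -- pg_get_viewdef('v') now renders the grouping clause roughly as
    --   GROUP BY (public.cube(x))
    -- i.e. schema-qualified and parenthesized, so the deparsed view text
    -- cannot be re-read as a CUBE grouping construct.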

This is now one big patch (per Tom's gripe about the previous one
being split up, even though there were reasons for that).

One possible new issue is that the memory usage constraint now means
that the explain analyze output shows no memory stats for most of the
sort nodes. This is arguably more accurate, since if each node
displayed its actual memory usage it would look like the plan uses
more memory than it actually does; but it's still a bit odd. (It
happens because the preemptive rescan call discards the actual
statistics)

--
Andrew (irc:RhodiumToad)

Attachment Content-Type Size
gsp-all.patch text/x-patch 269.6 KB

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-01-20 21:02:50
Message-ID: 8761c1mbgw.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Updated patch (mostly just conflict resolution):

- fix explain code to track changes to deparse context handling

- tiny expansion of some comments (clarify in nodeAgg header
comment that aggcontexts are now EContexts rather than just
memory contexts)

- declare support for features in sql_features.txt, which had been
previously overlooked

--
Andrew (irc:RhodiumToad)

Attachment Content-Type Size
gsp-all.patch text/x-patch 271.0 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-02-13 07:12:35
Message-ID: CAB7nPqQFG_0DAuaQ1fjL4OzjOi5++k73HaM+KqJdX5vboX_doQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 21, 2015 at 6:02 AM, Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
wrote:

> Updated patch (mostly just conflict resolution):
>
> - fix explain code to track changes to deparse context handling
>
> - tiny expansion of some comments (clarify in nodeAgg header
> comment that aggcontexts are now EContexts rather than just
> memory contexts)
>
> - declare support for features in sql_features.txt, which had been
> previously overlooked
>
>
Patch moved to CF 2015-02.
--
Michael


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Cc: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-03-09 18:02:18
Message-ID: 87d24iukc5.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Updated patch:

- updated to latest head

- Removed MemoryContextDeleteChildren calls made redundant by the
recent change to MemoryContextReset

--
Andrew (irc:RhodiumToad)

Attachment Content-Type Size
gsp-all-latest.patch text/x-patch 271.1 KB

From: Svenne Krap <svenne(at)krap(dot)dk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2015-03-13 12:43:26
Message-ID: 20150313124326.2548.55507.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The patch from message (87d24iukc5(dot)fsf(at)news-spur(dot)riddles(dot)org(dot)uk) fails to apply (tried on top of ebc0f5e01d2f), as commit b55722692 has changed this line (in src/backend/optimizer/util/pathnode.c):

pathnode->path.rows = estimate_num_groups(root, uniq_exprs, rel->rows);

After patching the added parameter (NULL) in by hand, the build still fails because src/backend/optimizer/path/indxpath.c:1953 is missing the new argument as well - that change is not in the patch.

/Svenne

The new status of this patch is: Waiting on Author


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-03-13 15:46:15
Message-ID: 87d24cq4w4.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Updated patch:

- updated to latest head fixing conflicts with b5572269

--
Andrew (irc:RhodiumToad)

Attachment Content-Type Size
gsp-all-latest.patch text/x-patch 271.6 KB

From: Svenne Krap <svenne(at)krap(dot)dk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2015-03-18 16:18:02
Message-ID: 20150318161802.4007.56450.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

This is a midway review; a later one will complete it.

The patch applies against 8d1f239003d0245dda636dfa6cf0add13bee69d6 and builds correctly. Make installcheck-world fails, but the failure seems to be somewhere totally unrelated (PL/Tcl)...

The documentation is very well-written and the patch implements the documented syntax.

I still need to check against the standard and I will run it against a non-trivial production load... hopefully I will finish up my review shortly after the weekend...


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Svenne Krap <svenne(at)krap(dot)dk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2015-03-26 22:17:27
Message-ID: 87y4mj2yxq.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Svenne" == Svenne Krap <svenne(at)krap(dot)dk> writes:

Svenne> I still need to check against the standard and I will run it
Svenne> against a non-trivial production load... hopefully I will
Svenne> finish up my review shortly after the weekend...

Thanks for the review so far; any progress? I'm quite interested in
collecting samples of realistic grouping sets queries and their
performance, for use in possible further optimization work. (I don't
need full data or anything like that, just "this query ran in x seconds
on N million rows, which is fast enough/not fast enough/too slow to be
any use")

Let me know if there's anything you need...

--
Andrew (irc:RhodiumToad)


From: Svenne Krap <svenne(dot)lists(at)krap(dot)dk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2015-03-27 08:03:59
Message-ID: 55150EEF.9010005@krap.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 18-03-2015 17:18, Svenne Krap wrote:
>
> I still need to check against the standard and I will run it against a
> non-trivial production load... hopefully I will finish up my review
> shortly after the weekend...

I am still on it, but a little delayed. I hope to get it done this weekend.

Svenne


From: Svenne Krap <svenne(at)krap(dot)dk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2015-04-20 08:36:58
Message-ID: 20150420083658.2543.70336.pgcf@coridan.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The following review has been posted through the commitfest application:
make installcheck-world: tested, failed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: tested, passed

Hi,

I have (finally) found time to review this.

The syntax is as per spec as far as I can see, and the queries I have tested have all produced the correct output.

The documentation looks good and is clear.

I think it is spec compliant, but I am not familiar enough with the spec to be sure. Also, I have not understood the function of the <set quantifier> (DISTINCT, ALL) part of the GROUP BY clause (and hence have not tested it), so I haven't marked the spec-compliance part.

The installcheck-world fails, but in src/pl/tcl/results/pltcl_queries.out (the diff shows a sorting problem), which should be unrelated to GSP. I don't know enough about the check to know whether the GSP tests had already run at that point.

I have also been running a few tests on some real data. This is run on my laptop with 32 GB of memory and a fast SSD.

The first dataset is a join between a data table of 472 MB (4.3 M rows) and a tiny multi-column lookup table. I am returning a count(*).
Here the data is hierarchical so CUBE does not make sense. GROUPING SETS and ROLLUP both work fine, and if work_mem is large enough the GS-query slightly beats the handwritten "union all" equivalent (runtimes of 7.6 vs. 7.7 seconds). If work_mem is at the default 4MB, the union-all equivalent (UAE) beats the GS-query almost 2:1 due to disk spill (14.3 (GS) vs. 8.2 (UAE) seconds).

The other query is on the same data table as before, but with three "columns" (two calculated and one natural) for a cube. I am returning a count(*).
First column is "extract year from date column".
Second column is "divide a value by something and truncate" (i.e. make buckets).
Third column is a literal integer column.
Here the GS-version is slightly slower than the UAE-version (17.5 vs. 14.2 seconds). Nothing obvious about why in the explain (analyze, buffers, costs, timing).
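
For reference, a rough sketch of the kind of cube query described above
(all table and column names here are hypothetical, not the actual
dataset):

    SELECT extract(year FROM some_date)   AS yr,      -- calculated
           trunc(some_value / 1000)       AS bucket,  -- calculated
           some_int,                                  -- natural column
           count(*)
    FROM   datatable
    GROUP  BY CUBE (extract(year FROM some_date),
                    trunc(some_value / 1000),
                    some_int);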

I have the explains, but as the dataset is semi-private and I don't have any easy way to edit out names in it, I will send them on request (non-disclosure from the recipient is of course a must) rather than post them on the list.

I think the feature is ready to be committed, but am unsure whether I am qualified to gauge that :)

/Svenne

The new status of this patch is: Ready for Committer


From: Svenne Krap <svenne(dot)lists(at)krap(dot)dk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2015-04-20 08:40:28
Message-ID: 5534BB7C.7020507@krap.dk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Oh, and I built it on top of f92fc4c95ddcc25978354a8248d3df22269201bc


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Svenne Krap <svenne(at)krap(dot)dk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP Patch for GROUPING SETS phase 1
Date: 2015-04-21 15:06:58
Message-ID: 87k2x55xpk.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Svenne" == Svenne Krap <svenne(at)krap(dot)dk> writes:

Svenne> I have the explains,

Can you post the explain analyze outputs?

If need be, you can anonymize the table and column names and any
identifiers by using the anonymization option of explain.depesz.com, but
please only do that if you actually need to.

--
Andrew (irc:RhodiumToad)


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-04-30 00:07:30
Message-ID: 20150430000730.GB16366@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi,

This is not a real review. I'm just scanning through the patch, without
reading the thread, to understand if I see something "worthy" of
controversy. While scanning I might have a couple observations or
questions.

On 2015-03-13 15:46:15 +0000, Andrew Gierth wrote:

> + * A list of grouping sets which is structurally equivalent to a ROLLUP
> + * clause (e.g. (a,b,c), (a,b), (a)) can be processed in a single pass over
> + * ordered data. We do this by keeping a separate set of transition values
> + * for each grouping set being concurrently processed; for each input tuple
> + * we update them all, and on group boundaries we reset some initial subset
> + * of the states (the list of grouping sets is ordered from most specific to
> + * least specific). One AGG_SORTED node thus handles any number of grouping
> + * sets as long as they share a sort order.

Found "initial subset" not very clear, even if I probably guessed the
right meaning.

> + * To handle multiple grouping sets that _don't_ share a sort order, we use
> + * a different strategy. An AGG_CHAINED node receives rows in sorted order
> + * and returns them unchanged, but computes transition values for its own
> + * list of grouping sets. At group boundaries, rather than returning the
> + * aggregated row (which is incompatible with the input rows), it writes it
> + * to a side-channel in the form of a tuplestore. Thus, a number of
> + * AGG_CHAINED nodes are associated with a single AGG_SORTED node (the
> + * "chain head"), which creates the side channel and, when it has returned
> + * all of its own data, returns the tuples from the tuplestore to its own
> + * caller.

This paragraph deserves to be expanded imo.

> + * In order to avoid excess memory consumption from a chain of alternating
> + * Sort and AGG_CHAINED nodes, we reset each child Sort node preemptively,
> + * allowing us to cap the memory usage for all the sorts in the chain at
> + * twice the usage for a single node.

What does resetting 'preemptively' mean?

> + * From the perspective of aggregate transition and final functions, the
> + * only issue regarding grouping sets is this: a single call site (flinfo)
> + * of an aggregate function may be used for updating several different
> + * transition values in turn. So the function must not cache in the flinfo
> + * anything which logically belongs as part of the transition value (most
> + * importantly, the memory context in which the transition value exists).
> + * The support API functions (AggCheckCallContext, AggRegisterCallback) are
> + * sensitive to the grouping set for which the aggregate function is
> + * currently being called.

Hm. I've seen a bunch of aggregates do this.

> + * TODO: AGG_HASHED doesn't support multiple grouping sets yet.

Are you intending to resolve this before an eventual commit? Possibly
after the 'structural' issues are resolved? Or do you think this can
safely be put off for another release?

> @@ -534,11 +603,13 @@ static void
> advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
> {
> int aggno;
> + int setno = 0;
> + int numGroupingSets = Max(aggstate->numsets, 1);
> + int numAggs = aggstate->numaggs;
>
> - for (aggno = 0; aggno < aggstate->numaggs; aggno++)
> + for (aggno = 0; aggno < numAggs; aggno++)
> {
> AggStatePerAgg peraggstate = &aggstate->peragg[aggno];
> - AggStatePerGroup pergroupstate = &pergroup[aggno];
> ExprState *filter = peraggstate->aggrefstate->aggfilter;
> int numTransInputs = peraggstate->numTransInputs;
> int i;
> @@ -582,13 +653,16 @@ advance_aggregates(AggState *aggstate, AggStatePerGroup pergroup)
> continue;
> }
>
> - /* OK, put the tuple into the tuplesort object */
> - if (peraggstate->numInputs == 1)
> - tuplesort_putdatum(peraggstate->sortstate,
> - slot->tts_values[0],
> - slot->tts_isnull[0]);
> - else
> - tuplesort_puttupleslot(peraggstate->sortstate, slot);
> + for (setno = 0; setno < numGroupingSets; setno++)
> + {
> + /* OK, put the tuple into the tuplesort object */
> + if (peraggstate->numInputs == 1)
> + tuplesort_putdatum(peraggstate->sortstates[setno],
> + slot->tts_values[0],
> + slot->tts_isnull[0]);
> + else
> + tuplesort_puttupleslot(peraggstate->sortstates[setno], slot);
> + }
> }

Hm. So a normal GROUP BY is just a subcase of grouping sets. Seems to
make sense, but worthwhile to mention somewhere in the intro.

> + if (!node->agg_done)
> + {
> + /* Dispatch based on strategy */
> + switch (((Agg *) node->ss.ps.plan)->aggstrategy)
> + {
> + case AGG_HASHED:
> + if (!node->table_filled)
> + agg_fill_hash_table(node);
> + result = agg_retrieve_hash_table(node);
> + break;
> + case AGG_CHAINED:
> + result = agg_retrieve_chained(node);
> + break;
> + default:
> + result = agg_retrieve_direct(node);
> + break;
> + }
> +
> + if (!TupIsNull(result))
> + return result;
> + }

Maybe it's just me, but I get twitchy if I see a default being used like
this. I'd much, much rather see the two remaining AGG_* types and get a
warning from the compiler if a new one is added.

> + /*-
> + * If a subgroup for the current grouping set is present, project it.
> + *
> + * We have a new group if:
> + * - we're out of input but haven't projected all grouping sets
> + * (checked above)
> + * OR
> + * - we already projected a row that wasn't from the last grouping
> + * set
> + * AND
> + * - the next grouping set has at least one grouping column (since
> + * empty grouping sets project only once input is exhausted)
> + * AND
> + * - the previous and pending rows differ on the grouping columns
> + * of the next grouping set
> + */
> + if (aggstate->input_done
> + || (node->aggstrategy == AGG_SORTED
> + && aggstate->projected_set != -1
> + && aggstate->projected_set < (numGroupingSets - 1)
> + && nextSetSize > 0
> + && !execTuplesMatch(econtext->ecxt_outertuple,
> + tmpcontext->ecxt_outertuple,
> + nextSetSize,
> + node->grpColIdx,
> + aggstate->eqfunctions,
> + tmpcontext->ecxt_per_tuple_memory)))

I'll bet this will look absolutely horrid after a pgindent run :/

> +/*
> + * We want to produce the absolute minimum possible number of lists here to
> + * avoid excess sorts. Fortunately, there is an algorithm for this; the problem
> + * of finding the minimal partition of a poset into chains (which is what we
> + * need, taking the list of grouping sets as a poset ordered by set inclusion)
> + * can be mapped to the problem of finding the maximum cardinality matching on
> + * a bipartite graph, which is solvable in polynomial time with a worst case of
> + * no worse than O(n^2.5) and usually much better. Since our N is at most 4096,
> + * we don't need to consider fallbacks to heuristic or approximate methods.
> + * (Planning time for a 12-d cube is under half a second on my modest system
> + * even with optimization off and assertions on.)

I think using the long form of poset once would be a good thing.

> + * We use the Hopcroft-Karp algorithm for the graph matching; it seems to work
> + * well enough for our purposes. This implementation is based on pseudocode
> + * found at:
> + *
> + * http://en.wikipedia.org/w/index.php?title=Hopcroft%E2%80%93Karp_algorithm&oldid=593898016
> + *
> + * This implementation uses the same indices for elements of U and V (the two
> + * halves of the graph) because in our case they are always the same size, and
> + * we always know whether an index represents a u or a v. Index 0 is reserved
> + * for the NIL node.
> + */
> +
> +struct hk_state
> +{
> + int graph_size; /* size of half the graph plus NIL node */
> + int matching;
> + short **adjacency; /* adjacency[u] = [n, v1,v2,v3,...,vn] */
> + short *pair_uv; /* pair_uv[u] -> v */
> + short *pair_vu; /* pair_vu[v] -> u */
> + float *distance; /* distance[u], float so we can have +inf */
> + short *queue; /* queue storage for breadth search */
> +};

I wonder if it'd not be better to put this in a separate file. Most
readers just won't care about this bit and the file is long enough.

> - if (!parse->hasAggs && !parse->groupClause && !root->hasHavingQual &&
> + if (!parse->hasAggs && !parse->groupClause && !parse->groupingSets && !root->hasHavingQual &&
> !parse->hasWindowFuncs)
> {

Looks like that's well above 80 columns.

> %nonassoc UNBOUNDED /* ideally should have same precedence as IDENT */
> -%nonassoc IDENT NULL_P PARTITION RANGE ROWS PRECEDING FOLLOWING
> +%nonassoc IDENT NULL_P PARTITION RANGE ROWS PRECEDING FOLLOWING CUBE ROLLUP
> %left Op OPERATOR /* multi-character ops and user-defined operators */

> +/*
> + * These hacks rely on setting precedence of CUBE and ROLLUP below that of '(',
> + * so that they shift in these rules rather than reducing the conflicting
> + * unreserved_keyword rule.
> + */
> +
> +rollup_clause:
> + ROLLUP '(' expr_list ')'
> + {
> + $$ = (Node *) makeGroupingSet(GROUPING_SET_ROLLUP, $3, @1);
> + }
> + ;
> +
> +cube_clause:
> + CUBE '(' expr_list ')'
> + {
> + $$ = (Node *) makeGroupingSet(GROUPING_SET_CUBE, $3, @1);
> + }
> + ;
> +
> +grouping_sets_clause:
> + GROUPING SETS '(' group_by_list ')'
> + {
> + $$ = (Node *) makeGroupingSet(GROUPING_SET_SETS, $4, @1);
> + }
> + ;
> +

This is somewhat icky. I've not really thought about this very much, but
it's imo something to pay attention to.

So, having quickly scanned through the patch, do I understand correctly
that the contentious problems are:

* Arguably this converts the execution *tree* into a DAG. Tom seems to
be rather uncomfortable with that. I am wondering whether this really
is a big deal - essentially this only happens in a relatively
'isolated' part of the tree right? I.e. if those chained together
nodes were considered one node, there would not be any loops?
Additionally, the way parameterized scans work already essentially
"violates" the tree paradigm somewhat.
There still might be better representations, about which I want to
think, don't get me wrong. I'm "just" not seeing this as a fundamental
problem.
* The whole grammar/keyword issue. To me this seems to be a problem of
swallowing one out of several similarly coloured poisonous
pills. Which we can't really avoid, i.e. it isn't really this patch's
fault that we have to make them.

Are those the two bigger controversial areas? Or are there others in
your respective views?

Greetings,

Andres Freund


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-04-30 04:35:26
Message-ID: 87oam6tjux.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>>>>> "Andres" == Andres Freund <andres(at)anarazel(dot)de> writes:

Andres> This is not a real review. I'm just scanning through the
Andres> patch, without reading the thread, to understand if I see
Andres> something "worthy" of controversy. While scanning I might have
Andres> a couple observations or questions.

>> + * A list of grouping sets which is structurally equivalent to a ROLLUP
>> + * clause (e.g. (a,b,c), (a,b), (a)) can be processed in a single pass over
>> + * ordered data. We do this by keeping a separate set of transition values
>> + * for each grouping set being concurrently processed; for each input tuple
>> + * we update them all, and on group boundaries we reset some initial subset
>> + * of the states (the list of grouping sets is ordered from most specific to
>> + * least specific). One AGG_SORTED node thus handles any number of grouping
>> + * sets as long as they share a sort order.

Andres> Found "initial subset" not very clear, even if I probably
Andres> guessed the right meaning.

How about:

* [...], and on group boundaries we reset those states
* (starting at the front of the list) whose grouping values have
* changed (the list of grouping sets is ordered from most specific to
* least specific). One AGG_SORTED node thus handles any number [...]
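
For a concrete (hypothetical) example: the three sets below are listed
from most specific to least specific and share the sort order (a, b, c),
so a single AGG_SORTED node over one sort handles all of them:

    SELECT a, b, c, count(*)
    FROM t
    GROUP BY GROUPING SETS ((a, b, c), (a, b), (a));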

>> + * To handle multiple grouping sets that _don't_ share a sort order, we use
>> + * a different strategy. An AGG_CHAINED node receives rows in sorted order
>> + * and returns them unchanged, but computes transition values for its own
>> + * list of grouping sets. At group boundaries, rather than returning the
>> + * aggregated row (which is incompatible with the input rows), it writes it
>> + * to a side-channel in the form of a tuplestore. Thus, a number of
>> + * AGG_CHAINED nodes are associated with a single AGG_SORTED node (the
>> + * "chain head"), which creates the side channel and, when it has returned
>> + * all of its own data, returns the tuples from the tuplestore to its own
>> + * caller.

Andres> This paragraph deserves to be expanded imo.

OK, but what in particular needs clarifying?

>> + * In order to avoid excess memory consumption from a chain of alternating
>> + * Sort and AGG_CHAINED nodes, we reset each child Sort node preemptively,
>> + * allowing us to cap the memory usage for all the sorts in the chain at
>> + * twice the usage for a single node.

Andres> What does resetting 'preemptively' mean?

Plan nodes are normally not reset (in the sense of calling ExecReScan)
just because they finished, but rather it's done before a subsequent new
scan is done. Doing the rescan call after all the sorted output has
been read means we discard the data from each sort node as soon as it is
transferred to the next one.

There is a more specific comment in agg_retrieve_chained where this
actually happens.

>> + * From the perspective of aggregate transition and final functions, the
>> + * only issue regarding grouping sets is this: a single call site (flinfo)
>> + * of an aggregate function may be used for updating several different
>> + * transition values in turn. So the function must not cache in the flinfo
>> + * anything which logically belongs as part of the transition value (most
>> + * importantly, the memory context in which the transition value exists).
>> + * The support API functions (AggCheckCallContext, AggRegisterCallback) are
>> + * sensitive to the grouping set for which the aggregate function is
>> + * currently being called.

Andres> Hm. I've seen a bunch of aggregates do this.

Such as? This was discussed about a year ago in the context of WITHIN
GROUP:

http://www.postgresql.org/message-id/87r424i24w.fsf@news-spur.riddles.org.uk

>> + * TODO: AGG_HASHED doesn't support multiple grouping sets yet.

Andres> Are you intending to resolve this before an eventual commit?

Original plan was to tackle AGG_HASHED after a working implementation
was committed; we figured that we'd have two commitfests to get the
basics right, and then have a chance to get AGG_HASHED done for the
third one. Also, there was talk of other people working on hashagg
memory usage issues, and we didn't want to conflict with that.

Naturally the extended delays rather put paid to that plan. Going ahead
and writing code for AGG_HASHED anyway wasn't really an option, since
with the overall structural questions unresolved there was too much
chance of it being wasted effort.

Andres> Possibly after the 'structural' issues are resolved? Or do you
Andres> think this can safely be put off for another release?

I think the feature is useful even without AGG_HASHED, even though that
means it can sometimes be beaten on performance by using UNION ALL of
many separate GROUP BYs; but I'd defer to the opinions of others on that
point.

Andres> Maybe it's just me, but I get twitchy if I see a default being
Andres> used like this. I'd much, much rather see the two remaining
Andres> AGG_* types and get a warning from the compiler if a new one is
Andres> added.

Meh. It also needs a bogus initialization of "result" to avoid compiler
warnings if done that way.

Andres> I'll bet this will look absolutely horrid after a pgindent run :/

pgindent doesn't touch it, I just checked.

[making CUBE and ROLLUP work without being reserved]

Andres> This is somewhat icky. I've not really thought about this very
Andres> much, but it's imo something to pay attention to.

This one was discussed in December or so - all the arguments were
thrashed out then.

The bottom line is that reserving "cube" is really painful due to
contrib/cube, and of the possible workarounds, using precedence rules to
resolve it in the grammar is already being used for some other
constructs.

Andres> So, having quickly scanned through the patch, do I understand
Andres> correctly that the contentious problems are:

Andres> * Arguably this converts the execution *tree* into a DAG. Tom
Andres> seems to be rather uncomfortable with that. I am wondering
Andres> whether this really is a big deal - essentially this only
Andres> happens in a relatively 'isolated' part of the tree right?
Andres> I.e. if those chained together nodes were considered one node,
Andres> there would not be any loops? Additionally, the way
Andres> parameterized scans work already essentially "violates" the
Andres> tree paradigm somewhat.

The major downsides as I see them with the current approach are:

1. It makes plans (and hence explain output) nest very deeply if you
have complex grouping sets (especially cubes with high dimensionality).

This can make explain very slow in the most extreme cases (explain seems
to perform about O(N^3) in the nesting depth of the plan, I don't know
why). But it's important not to exaggerate this effect: if anyone
actually has a real-world example of a 12-d cube I'll eat the headgear
of their choice, and for an 8-d cube the explain overhead is only on the
order of ~40ms. (A 12-d cube generates more than 35 megs of explain
output, nested about 1850 levels deep, and takes about 45 seconds to
explain, though only about 200ms to plan.)

In more realistic cases, the nesting isn't too bad (4-d cube gives a
12-deep plan: 6 sorts and 6 agg nodes), but it's still somewhat less
readable than a union-based plan would be. But honestly I think that
explain output aesthetics should not be a major determining factor for
implementations.
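
(For a sense of scale, a hedged example over a hypothetical table t: a
4-d cube expands to 2^4 = 16 grouping sets, which the chain approach
covers with the six sort orders behind the 6 sorts / 6 agg nodes
mentioned above.)

    SELECT a, b, c, d, count(*)
    FROM t
    GROUP BY CUBE (a, b, c, d);   -- 16 grouping sets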

2. A union-based approach would have a chance of including AGG_HASHED
support without any significant code changes, just by using one HashAgg
node per qualifying grouping set. However, this would be potentially
significantly slower than teaching HashAgg to do multiple grouping sets,
and memory usage would be an issue.

(The current approach is specifically intended to allow the use of an
AGG_HASHED node as the head of the chain, once it has been extended to
support multiple grouping sets.)

Andres> There still might be better representations, about which I want
Andres> to think, don't get me wrong. I'm "just" not seing this as a
Andres> fundamental problem.

The obvious alternative is this:

-> CTE x
-> entire input subplan here
-> Append
-> GroupAggregate
-> Sort
-> CTE Scan x
-> GroupAggregate
-> Sort
-> CTE Scan x
-> HashAggregate
-> CTE Scan x
[...]

which was basically what we expected to do originally. But all of the
existing code to deal with CTE / CTEScan is based on the assumption that
each CTE has a rangetable entry before planning starts, and it is
completely non-obvious how to arrange to generate such CTEs on the fly
while planning. Tom offered in December to look into that aspect for
us, and of course we've heard nothing about it since.
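
At the SQL level, that union-based shape corresponds roughly to the
following (a hedged illustration over a hypothetical table, not anything
the planner generates today):

    WITH x AS (SELECT a, b, v FROM t)        -- the entire input subplan
    SELECT a,    b,    sum(v) FROM x GROUP BY a, b
    UNION ALL
    SELECT a,    NULL, sum(v) FROM x GROUP BY a
    UNION ALL
    SELECT NULL, NULL, sum(v) FROM x;
    -- same grouping sets as GROUP BY GROUPING SETS ((a, b), (a), ()),
    -- but each branch rescans x, which is the read I/O multiplier
    -- objected to upthread.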

Andres> * The whole grammar/keyword issue. To me this seems to be a
Andres> problem of swallowing one out of several similarly coloured
Andres> poisonous pills.

Right. Which is why this issue was thrashed out months ago and the
current approach decided on. I consider this question closed.

Andres> Are those the two bigger controversial areas? Or are there
Andres> others in your respective views?

Another controversial item was the introduction of GroupedVar. The need
for this can be avoided by explicitly setting to NULL the relevant
columns of the representative group tuple when evaluating result rows,
but (a) I don't think that's an especially clean approach (though I'm
not pushing back very hard on it) and (b) the logic needed in its
absence is different between the current chaining implementation and a
possible union implementation, so I decided against making any changes
on wasted-effort grounds.
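
For context, the user-visible behaviour that either mechanism implements
(a hedged example over a hypothetical table t): output columns that are
not part of the current grouping set come out as NULL, and GROUPING()
distinguishes those from NULLs present in the data:

    SELECT a, b, grouping(a, b) AS g, count(*)
    FROM t
    GROUP BY GROUPING SETS ((a, b), (a), ());
    -- rows from the (a) set have b IS NULL and g = 1 (binary 01);
    -- the () row has both a and b NULL and g = 3 (binary 11).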

--
Andrew (irc:RhodiumToad)


From: Noah Misch <noah(at)leadboat(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-01 01:17:52
Message-ID: 20150501011752.GA3416161@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Apr 30, 2015 at 05:35:26AM +0100, Andrew Gierth wrote:
> >>>>> "Andres" == Andres Freund <andres(at)anarazel(dot)de> writes:

> >> + * TODO: AGG_HASHED doesn't support multiple grouping sets yet.
>
> Andres> Are you intending to resolve this before an eventual commit?
>
> Original plan was to tackle AGG_HASHED after a working implementation
> was committed;

+1 for that plan.

> Andres> Possibly after the 'structural' issues are resolved? Or do you
> Andres> think this can safely be put off for another release?
>
> I think the feature is useful even without AGG_HASHED, even though that
> means it can sometimes be beaten on performance by using UNION ALL of
> many separate GROUP BYs; but I'd defer to the opinions of others on that
> point.

It will be a tough call, and PostgreSQL has gone each way on some recent
feature. I recommend considering both GroupAggregate and HashAggregate in all
design discussion but continuing to work toward a first commit implementing
GroupAggregate alone. With that in the tree, we'll be in a better position to
decide whether to release a feature paused at that stage in its development.
Critical facts are uncertain, so a discussion today would be unproductive.

> Andres> So, having quickly scanned through the patch, do I understand
> Andres> correctly that the contentious problems are:
>
> Andres> * Arguably this converts the execution *tree* into a DAG. Tom
> Andres> seems to be rather uncomfortable with that. I am wondering
> Andres> whether this really is a big deal - essentially this only
> Andres> happens in a relatively 'isolated' part of the tree right?
> Andres> I.e. if those chained together nodes were considered one node,
> Andres> there would not be any loops? Additionally, the way
> Andres> parameterized scans work already essentially "violates" the
> Andres> tree paradigm somewhat.

I agree with your assessment. That has been contentious.

> The major downsides as I see them with the current approach are:
>
> 1. It makes plans (and hence explain output) nest very deeply if you
> have complex grouping sets (especially cubes with high dimensionality).
>
> This can make explain very slow in the most extreme cases

I'm not worried about that. If anything, the response is to modify explain to
more-quickly/compactly present affected plan trees.

> 2. A union-based approach would have a chance of including AGG_HASHED
> support without any significant code changes,

> -> CTE x
> -> entire input subplan here
> -> Append
> -> GroupAggregate
> -> Sort
> -> CTE Scan x
> -> GroupAggregate
> -> Sort
> -> CTE Scan x
> -> HashAggregate
> -> CTE Scan x
> [...]

This uses 50-67% more I/O than the current strategy, which makes it a dead end
from my standpoint. Details:
http://www.postgresql.org/message-id/20141221210005.GA1864976@tornado.leadboat.com

> Andres> Are those the two bigger controversial areas? Or are there
> Andres> others in your respective views?

> Another controversial item was the introduction of GroupedVar.

I know of no additional controversies to add to this list.

Thanks,
nm


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-12 03:36:19
Message-ID: 20150512033619.GD32151@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2015-04-30 05:35:26 +0100, Andrew Gierth wrote:
> >>>>> "Andres" == Andres Freund <andres(at)anarazel(dot)de> writes:
>
> Andres> This is not a real review. I'm just scanning through the
> Andres> patch, without reading the thread, to understand if I see
> Andres> something "worthy" of controversy. While scanning I might have
> Andres> a couple observations or questions.
>
> >> + * A list of grouping sets which is structurally equivalent to a ROLLUP
> >> + * clause (e.g. (a,b,c), (a,b), (a)) can be processed in a single pass over
> >> + * ordered data. We do this by keeping a separate set of transition values
> >> + * for each grouping set being concurrently processed; for each input tuple
> >> + * we update them all, and on group boundaries we reset some initial subset
> >> + * of the states (the list of grouping sets is ordered from most specific to
> >> + * least specific). One AGG_SORTED node thus handles any number of grouping
> >> + * sets as long as they share a sort order.
>
> Andres> Found "initial subset" not very clear, even if I probably
> Andres> guessed the right meaning.
>
> How about:
>
> * [...], and on group boundaries we reset those states
> * (starting at the front of the list) whose grouping values have
> * changed (the list of grouping sets is ordered from most specific to
> * least specific). One AGG_SORTED node thus handles any number [...]

sounds good.

> >> + * To handle multiple grouping sets that _don't_ share a sort order, we use
> >> + * a different strategy. An AGG_CHAINED node receives rows in sorted order
> >> + * and returns them unchanged, but computes transition values for its own
> >> + * list of grouping sets. At group boundaries, rather than returning the
> >> + * aggregated row (which is incompatible with the input rows), it writes it
> >> + * to a side-channel in the form of a tuplestore. Thus, a number of
> >> + * AGG_CHAINED nodes are associated with a single AGG_SORTED node (the
> >> + * "chain head"), which creates the side channel and, when it has returned
> >> + * all of its own data, returns the tuples from the tuplestore to its own
> >> + * caller.
>
> Andres> This paragraph deserves to be expanded imo.
>
> OK, but what in particular needs clarifying?

I'm not sure ;). I obviously was a bit tired...

> Andres> Are you intending to resolve this before an eventual commit?
...
> Andres> Possibly after the 'structural' issues are resolved? Or do you
> Andres> think this can safely be put off for another release?
>
> I think the feature is useful even without AGG_HASHED, even though that
> means it can sometimes be beaten on performance by using UNION ALL of
> many separate GROUP BYs; but I'd defer to the opinions of others on that
> point.

I agree.

> Andres> * Arguably this converts the execution *tree* into a DAG. Tom
> Andres> seems to be rather uncomfortable with that. I am wondering
> Andres> whether this really is a big deal - essentially this only
> Andres> happens in a relatively 'isolated' part of the tree right?
> Andres> I.e. if those chained together nodes were considered one node,
> Andres> there would not be any loops? Additionally, the way
> Andres> parameterized scans work already essentially "violates" the
> Andres> tree paradigm somewhat.
>
> The major downsides as I see them with the current approach are:
>
> 1. It makes plans (and hence explain output) nest very deeply if you
> have complex grouping sets (especially cubes with high dimensionality).

That doesn't concern me overly much. If we feel the need to fudge the
explain output we certainly can.

> 2. A union-based approach would have a chance of including AGG_HASHED
> support without any significant code changes, just by using one HashAgg
> node per qualifying grouping set. However, this would be potentially
> significantly slower than teaching HashAgg to do multiple grouping sets,
> and memory usage would be an issue.

Your "however" imo pretty much disqualifies that as an argument.

> The obvious alternative is this:
>
> -> CTE x
> -> entire input subplan here
> -> Append
> -> GroupAggregate
> -> Sort
> -> CTE Scan x
> -> GroupAggregate
> -> Sort
> -> CTE Scan x
> -> HashAggregate
> -> CTE Scan x
> [...]
>
> which was basically what we expected to do originally. But all of the
> existing code to deal with CTE / CTEScan is based on the assumption that
> each CTE has a rangetable entry before planning starts, and it is
> completely non-obvious how to arrange to generate such CTEs on the fly
> while planning. Tom offered in December to look into that aspect for
> us, and of course we've heard nothing about it since.

I find Noah's argument about this kind of structure pretty
convincing. We'd either increase the number of reads, or balloon the
amount of memory if we managed to find a structure that allowed several
of the aggregates to be computed at the same time.

Looking at this some more, I do think the current structure makes sense.
I do think we could flatten this into the toplevel aggregation node, but
the increase in complexity doesn't seem to have corresponding benefits
to me.

> Andres> Are those the two bigger controversial areas? Or are there
> Andres> others in your respective views?
>
> Another controversial item was the introduction of GroupedVar. The need
> for this can be avoided by explicitly setting to NULL the relevant
> columns of the representative group tuple when evaluating result rows,
> but (a) I don't think that's an especially clean approach (though I'm
> not pushing back very hard on it) and (b) the logic needed in its
> absence is different between the current chaining implementation and a
> possible union implementation, so I decided against making any changes
> on wasted-effort grounds.

Seems like a fairly minor point to me. I very tentatively lean towards
setting the columns in the group tuple to NULL.

I've rebased the patch to
http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summary
branch int/grouping_sets . There were some minor conflicts.

What I dislike so far:
* Minor formatting things. Just going to fix and push the ones I
dislike.
* The Hopcroft-Karp stuff not being separate
* The increased complexity of grouping_planner. It'd imo be good if some
of that could be refactored into a separate function. Specifically the
else if (parse->hasAggs || (parse->groupingSets && parse->groupClause))
block.
* I think it'd not hurt to add rule deparse check for the function in
GROUPING SETS case. I didn't see one at least.

Do you have some nicer demo data set you worked with during development?

FWIW, expensive explains seem to be in stack traces like:

#25 0x00000000008a0ebe in get_variable (var=0x2326d90, levelsup=0, istoplevel=0 '\000', context=0x7ffcd4b386a0)
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:5813
#26 0x00000000008a2ff6 in get_rule_expr (node=0x2326d90, context=0x7ffcd4b386a0, showimplicit=1 '\001')
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:6933
#27 0x00000000008a0ebe in get_variable (var=0x2326338, levelsup=0, istoplevel=0 '\000', context=0x7ffcd4b386a0)
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:5813
#28 0x00000000008a2ff6 in get_rule_expr (node=0x2326338, context=0x7ffcd4b386a0, showimplicit=1 '\001')
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:6933
#29 0x00000000008a0ebe in get_variable (var=0x23258b0, levelsup=0, istoplevel=0 '\000', context=0x7ffcd4b386a0)
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:5813
#30 0x00000000008a2ff6 in get_rule_expr (node=0x23258b0, context=0x7ffcd4b386a0, showimplicit=1 '\001')
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:6933
#31 0x00000000008a0ebe in get_variable (var=0x2324e58, levelsup=0, istoplevel=0 '\000', context=0x7ffcd4b386a0)
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:5813
#32 0x00000000008a2ff6 in get_rule_expr (node=0x2324e58, context=0x7ffcd4b386a0, showimplicit=1 '\001')
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:6933
#33 0x00000000008a0ebe in get_variable (var=0x2324400, levelsup=0, istoplevel=0 '\000', context=0x7ffcd4b386a0)
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:5813
#34 0x00000000008a2ff6 in get_rule_expr (node=0x2324400, context=0x7ffcd4b386a0, showimplicit=1 '\001')
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:6933
#35 0x00000000008a0ebe in get_variable (var=0x2323978, levelsup=0, istoplevel=0 '\000', context=0x7ffcd4b386a0)
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:5813
#36 0x00000000008a2ff6 in get_rule_expr (node=0x2323978, context=0x7ffcd4b386a0, showimplicit=1 '\001')
at /home/andres/src/postgresql/src/backend/utils/adt/ruleutils.c:6933
#37 0x00000000008a0ebe in get_variable (var=0x2322f20, levelsup=0, istoplevel=0 '\000', context=0x7ffcd4b386a0)

- several thousand frames deep. Something seems off here. It's all
below show_grouping_set_keys(), which in turn is below a couple
ExplainNode()s.

I think the problem is "just" that for each variable, in each grouping
set - a very large number in a large cube - we're recursing through the
whole ChainAggregate tree, as each Var just points to a var one level
lower.

It might be worthwhile to add a little hack that deparses the variables
against the "lowest" relevant node (i.e. the one below the last chain
agg). Since they'll all have the same targetlist that ought to be safe.

I'll look some more, but dinner is calling now.

Greetings,

Andres Freund


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-12 03:43:36
Message-ID: 61362087-28F3-49E5-873C-0DBCE58FD0B5@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


>I think the problem is "just" that for each variable, in each grouping
>set - a very large number in a large cube - we're recursing through the
>whole ChainAggregate tree, as each Var just points to a var one level
>lower.

For small values of very large, that is. Had a little thinko there. It's still the fault of recursing down all these levels, doing nontrivial work each time.

--
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-12 18:40:49
Message-ID: 20150512184049.GZ12950@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2015-05-12 05:36:19 +0200, Andres Freund wrote:
> What I dislike so far:
> * Minor formatting things. Just going to fix and push the ones I
> dislike.
> * The Hopcroft-Karp stuff not being separate
> * The increased complexity of grouping_planner. It'd imo be good if some
> of that could be refactored into a separate function. Specifically the
> else if (parse->hasAggs || (parse->groupingSets && parse->groupClause))
> block.
> * I think it'd not hurt to add rule deparse check for the function in
> GROUPING SETS case. I didn't see one at least.

* The code in nodeAgg.c isn't pretty in places. Stuff like if
(node->chain_depth > 0) estate->agg_chain_head = save_chain_head;...
Feels like a good bit of cleanup would be possible there.

> I think the problem is "just" that for each variable, in each grouping
> set - a very large number in a large cube - we're recursing through the
> whole ChainAggregate tree, as each Var just points to a var one level
> lower.
>
> It might be worthwhile to add a little hack that deparses the variables
> against the "lowest" relevant node (i.e. the one below the last chain
> agg). Since they'll all have the same targetlist, that ought to be safe.

I've prototyped a hack for this, and indeed, adding a shortcut from the
intermediate chain nodes to the 'leaf chain node' cuts the explain time
from 11 to 2 seconds on some arbitrary statement. The remaining time is
the equivalent problem in the sort nodes...

I'm not terribly bothered by this. We can relatively easily fix this up
if required.

Greetings,

Andres Freund


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-12 23:48:41
Message-ID: 20150512234841.GE12950@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-05-12 20:40:49 +0200, Andres Freund wrote:
> On 2015-05-12 05:36:19 +0200, Andres Freund wrote:
> > What I dislike so far:
> > * Minor formatting things. Just going to fix and push the ones I
> > dislike.
> > * The Hopcroft-Karp stuff not being separate
> > * The increased complexity of grouping_planner. It'd imo be good if some
> > of that could be refactored into a separate function. Specifically the
> > else if (parse->hasAggs || (parse->groupingSets && parse->groupClause))
> > block.
> > * I think it'd not hurt to add rule deparse check for the function in
> > GROUPING SETS case. I didn't see one at least.
>
> * The code in nodeAgg.c isn't pretty in places. Stuff like if
> (node->chain_depth > 0) estate->agg_chain_head = save_chain_head;...
> Feels like a good bit of cleanup would be possible there.

In the executor I'd further like:
* to split agg_retrieve_direct into a version for grouping sets and one
without. I think that'll be a pretty clear win for clarity.
* to spin out common code between agg_retrieve_direct (in both the
functions it's split into), agg_retrieve_hashed and
agg_retrieve_chained. It should e.g. be fairly simple to spin out the
tail end processing of an input group (finalize_aggregate loop,
ExecQual) into a separate function.
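
To illustrate the shape I have in mind (the helper name and exact
signature are invented; the body is just the tail-end code the retrieval
functions already share, so treat this as a sketch only):

/*
 * Hypothetical helper shared by the retrieval functions: finalize every
 * aggregate for the group that just ended, apply the HAVING qual, and
 * project a result row, or return NULL if the qual rejects the group.
 */
static TupleTableSlot *
finalize_and_project_group(AggState *aggstate, AggStatePerGroup pergroup)
{
    ExprContext *econtext = aggstate->ss.ps.ps_ExprContext;
    int         aggno;

    /* the usual finalize_aggregate loop */
    for (aggno = 0; aggno < aggstate->numaggs; aggno++)
        finalize_aggregate(aggstate,
                           &aggstate->peragg[aggno],
                           &pergroup[aggno],
                           &econtext->ecxt_aggvalues[aggno],
                           &econtext->ecxt_aggnulls[aggno]);

    /* HAVING clause, then projection */
    if (ExecQual(aggstate->ss.ps.qual, econtext, false))
        return ExecProject(aggstate->ss.ps.ps_ProjInfo, NULL);

    return NULL;                /* group removed by HAVING */
}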

Andrew, are you going to be working on any of these?

Greetings,

Andres Freund


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-13 07:04:41
Message-ID: 87y4ktc7za.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Andres" == Andres Freund <andres(at)anarazel(dot)de> writes:

Andres> Andrew, are you going to be working on any of these?

As discussed on IRC, current status is:

>>> * The increased complexity of grouping_planner. It'd imo be good if some
>>> of that could be refactored into a separate function. Specifically the
>>> else if (parse->hasAggs || (parse->groupingSets && parse->groupClause))
>>> block.

done and pushed at you

>>> * The Hopcroft-Karp stuff not being separate

done and pushed

Andres> * to split agg_retrieve_direct into a version for grouping sets
Andres> and one without. I think that'll be a pretty clear win for
Andres> clarity.

I don't see how this helps given that the grouping sets version will be
exactly as complex as the current code.

Andres> * to spin out common code between agg_retrieve_direct (in both
Andres> the functions it's split into), agg_retrieve_hashed and
Andres> agg_retrieve_chained. It should e.g. be fairly simple to spin
Andres> out the tail end processing of an input group
Andres> (finalize_aggregate loop, ExecQual) into a separate function.

This isn't _quite_ as simple as it sounds but I'll have a go.

>> * The code in nodeAgg.c isn't pretty in places. Stuff like if
>> (node->chain_depth > 0) estate->agg_chain_head = save_chain_head;...
>> Feels like a good bit of cleanup would be possible there.

I'll look.

--
Andrew (irc:RhodiumToad)


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-13 20:51:15
Message-ID: 20150513205115.GE29246@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-05-12 05:36:19 +0200, Andres Freund wrote:
> > Another controversial item was the introduction of GroupedVar. The need
> > for this can be avoided by explicitly setting to NULL the relevant
> > columns of the representative group tuple when evaluating result rows,
> > but (a) I don't think that's an especially clean approach (though I'm
> > not pushing back very hard on it) and (b) the logic needed in its
> > absence is different between the current chaining implementation and a
> > possible union implementation, so I decided against making any changes
> > on wasted-effort grounds.
>
> Seems like fairly minor point to me. I very tentatively lean towards
> setting the columns in the group tuple to NULL.

I'm pretty sure by now that I dislike the introduction of GroupedVar,
and not just tentatively. While I can see why you found its
introduction to be nicer than fiddling with the result tuple, for me the
disadvantages seem to outweigh the advantage. For one, it's rather weird
to have Var nodes be changed into GroupedVar in setrefs.c. The number
of places that need to be touched even when it's a 'planned stmt only'
type of node is still pretty large.
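
For concreteness, the fiddle-with-the-result-tuple alternative would
look roughly like the sketch below. Everything grouping-set specific
here (the grouped_cols array in particular, and the function name) is
made up for illustration; only the slot accessors are existing code:

/*
 * Before projecting a result row for grouping set "currentSet", clear
 * every column that is not a grouping column of that set, so that plain
 * Vars in the target list read them as NULL - no GroupedVar required.
 */
static void
prepare_projection_slot(AggState *aggstate, TupleTableSlot *slot, int currentSet)
{
    Bitmapset  *grouped_cols = aggstate->grouped_cols[currentSet];  /* assumed */
    int         attno;

    slot_getallattrs(slot);

    for (attno = 1; attno <= slot->tts_tupleDescriptor->natts; attno++)
    {
        if (!bms_is_member(attno, grouped_cols))
        {
            slot->tts_values[attno - 1] = (Datum) 0;
            slot->tts_isnull[attno - 1] = true;
        }
    }
}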

Andrew: I'll work on changing this in a couple hours unless you're
speaking up about doing it yourself.

Greetings,

Andres Freund


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-14 05:50:31
Message-ID: 20150514055031.GE9584@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-05-13 22:51:15 +0200, Andres Freund wrote:
> I'm pretty sure by now that I dislike the introduction of GroupedVar,
> and not just tentatively. While I can see why you found its
> introduction to be nicer than fiddling with the result tuple, for me the
> disadvantages seem to outweigh the advantage. For one, it's rather weird
> to have Var nodes be changed into GroupedVar in setrefs.c. The number
> of places that need to be touched even when it's a 'planned stmt only'
> type of node is still pretty large.
>
> Andrew: I'll work on changing this in a couple hours unless you're
> speaking up about doing it yourself.

I took a stab at removing it, and imo it definitely ends up looking
better.

The code for the GroupedVar replacement isn't perfect yet, but I think
it'd be possible to clean that up by Friday. Unfortunately, after
prolonged staring out of the window, I came to the conclusion that I
don't think the current tree structure is right.

I still believe that the general approach of chaining vs. a union or CTE
is correct due to the efficiency arguments upthread. My problem is
that, unless I very much misunderstand something, the current
implementation can end up requiring roughly #sets * #input of additional
space for the "sidechannel tuplestore" in some bad cases. That happens
if you group by a couple clauses that each lead to a high number of
groups.

That happens because the aggregated rows produced in the chain nodes
can't be returned up-tree, because the next chain (or final group
aggregate) node will expect unaggregated tuples. The current solution
for that is to move the aggregated rows produced in chain nodes into a
tuplestore that's then drained when the top level aggregate node has
done its own job.
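
(To spell that out: once the top node has exhausted its own input, it
currently ends with something like the following. The field name is from
memory and may well not match the patch exactly:)

/* Drain the rows that the chain nodes pushed into the shared tuplestore. */
if (!tuplestore_gettupleslot(aggstate->chain_tuplestore,    /* assumed field */
                             true,      /* forward */
                             false,     /* caller doesn't need a copy */
                             slot))
    return NULL;    /* sidechannel exhausted as well, we're done */

return slot;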

While that's probably not too bad in many cases, because in most use
cases aggregation will be relatively effective, it does seem to be
further evidence that the sidechannel tuplestore isn't the perfect idea.

What I think we should/need to do instead is to do the chaining locally
inside one aggregation node. That way the aggregated tuples can be
returned directly, without the sidechannel. While that will require
inlining part of the code from nodeSort.c, it doesn't seem too bad.

Besides the advantage of getting rid of that tuplestore, it'll also fix
the explain performance problems (as there's no deep tree to traverse
via ruleutils.c) and get rid of the preemptive ExecReScan() to control
memory usage. I think it might also make combined hashing/sorting
easier.

A rough sketch of what I'm thinking of is:
ExecAgg()
{
    ...
    while (!aggstate->consumed_input)
    {
        outerslot = ExecProcNode(outerPlanState(aggstate));

        if (TupIsNull(outerslot))
        {
            consumed_input = true;
            break;
        }

        if (aggstate->doing_hashing)
        {
            entry = lookup_hash_entry(aggstate, outerslot);

            /* Advance the aggregates */
            advance_aggregates(aggstate, entry->pergroup);
        }

        if (aggstate->presorted_input || AGG_PLAIN)
        {
            /* handle aggregation, return if done with group */
        }

        if (aggstate->doing_chaining)
        {
            tuplesort_puttupleslot(tuplesortstate, slot);
        }
    }

    if (aggstate->doing_hashing && !done)
        agg_retrieve_hashed();

    /*
     * Feed data from one sort to the next, to compute grouping sets that
     * need differing sort orders.
     */
    last_sort = tuplesortstate[0];
    current_sort = numGroupingSets > 0 ? tuplesortstate[1] : NULL;

    while (aggstate->doing_chaining && !done_sorting)
    {
        tuplesort_gettupleslot(last_sort, tmpslot);

        /* exhausted all tuples from a particular sort order, move on */
        if (TupIsNull(tmpslot))
        {
            finalize_aggregates(...);
            tuplesort_end(last_sort);   /* maybe save stats somewhere? */
            last_sort = current_sort;
            current_sort = tuplesortstate[...];
            if (all_sorts_done)
                done_sorting = true;

            return aggregated;
        }

        if (current_sort != NULL)
            tuplesort_puttupleslot(current_sort, slot);

        /* check if we crossed a boundary */
        if (!execTuplesMatch(...))
        {
            finalize_aggregates(...);
            aggstate->grp_firstTuple = ...
            return aggregated;
        }

        advance_aggregates();
        tuplesort_puttupleslot(current_sort, slot);
    }
}

I think this is quite doable and seems likely to actually end up with
easier to understand code. But unfortunately it seems to be big enough
of a change to make it unlikely to be done in sufficient quality before
the freeze. I'll nonetheless work a couple hours on it tomorrow.

Andrew, is that a structure you could live with, or not?

Others, what do you think?

Greetings,

Andres Freund


From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-14 06:32:04
Message-ID: 20150514063204.GA3707771@tornado.leadboat.com
Lists: pgsql-hackers

On Thu, May 14, 2015 at 07:50:31AM +0200, Andres Freund wrote:
> I still believe that the general approach of chaining vs. a union or CTE
> is correct due to the efficiency arguments upthread. My problem is
> that, unless I very much misunderstand something, the current
> implementation can end up requiring roughly #sets * #input of additional
> space for the "sidechannel tuplestore" in some bad cases. That happens
> if you group by a couple clauses that each lead to a high number of
> groups.

Correct.

> Andrew, is that a structure you could live with, or not?
>
> Others, what do you think?

Andrew and I discussed that very structure upthread:

http://www.postgresql.org/message-id/20141231085845.GA2148306@tornado.leadboat.com
http://www.postgresql.org/message-id/87d26zd9k8.fsf@news-spur.riddles.org.uk
http://www.postgresql.org/message-id/20141231210553.GB2159277@tornado.leadboat.com

I still believe the words I wrote in my two messages cited.


From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-14 06:38:07
Message-ID: 20150514063807.GF9584@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-05-14 02:32:04 -0400, Noah Misch wrote:
> On Thu, May 14, 2015 at 07:50:31AM +0200, Andres Freund wrote:
> > Andrew, is that a structure you could live with, or not?
> >
> > Others, what do you think?
>
> Andrew and I discussed that very structure upthread:

> http://www.postgresql.org/message-id/87d26zd9k8.fsf@news-spur.riddles.org.uk

I don't really believe that that'd necessarily be true. I think if done
like I sketched, it'll likely end up being simpler than the currently
proposed code. I also don't see why this would make combining hashing
and sorting any more complex than now. If anything, the contrary.

> http://www.postgresql.org/message-id/20141231085845.GA2148306@tornado.leadboat.com
> http://www.postgresql.org/message-id/20141231210553.GB2159277@tornado.leadboat.com
>
> I still believe the words I wrote in my two messages cited.

I.e. that you think it's a sane approach, despite the criticism?

Greetings,

Andres Freund


From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-14 06:51:42
Message-ID: 20150514065142.GB3707771@tornado.leadboat.com
Lists: pgsql-hackers

On Thu, May 14, 2015 at 08:38:07AM +0200, Andres Freund wrote:
> On 2015-05-14 02:32:04 -0400, Noah Misch wrote:
> > On Thu, May 14, 2015 at 07:50:31AM +0200, Andres Freund wrote:
> > > Andrew, is that a structure you could live with, or not?
> > >
> > > Others, what do you think?
> >
> > Andrew and I discussed that very structure upthread:
>
> > http://www.postgresql.org/message-id/87d26zd9k8.fsf@news-spur.riddles.org.uk
>
> I don't really believe that that'd necessarily be true. I think if done
> like I sketched, it'll likely end up being simpler than the currently
> proposed code. I also don't see why this would make combining hashing
> and sorting any more complex than now. If anything, the contrary.
>
> > http://www.postgresql.org/message-id/20141231085845.GA2148306@tornado.leadboat.com
> > http://www.postgresql.org/message-id/20141231210553.GB2159277@tornado.leadboat.com
> >
> > I still believe the words I wrote in my two messages cited.
>
> I.e. that you think it's a sane approach, despite the criticism?

Yes. I won't warrant that it proves better, but it looks promising. Covering
hash aggregation might entail a large preparatory refactoring of nodeHash.c,
but beyond development cost I can't malign that.


From: Andres Freund <andres(at)anarazel(dot)de>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-14 06:59:45
Message-ID: 20150514065945.GG9584@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-05-14 02:51:42 -0400, Noah Misch wrote:
> Covering hash aggregation might entail a large preparatory refactoring
> of nodeHash.c, but beyond development cost I can't malign that.

You mean execGrouping.c? Afaics nodeHash.c isn't involved, and it
doesn't look very interesting to make it so?

Isn't that just calling BuildTupleHashTable() for each
to-be-hash-aggregated set, and then making agg_fill_hash_table() target
multiple hashtables? This mostly seems to be adding a couple loops and
parameters.
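
Roughly this shape, I think. The per-set "perhash" struct, the extra
AggState fields and build_hash_tables() itself are invented for
illustration; BuildTupleHashTable() is the existing execGrouping.c
function:

/*
 * Build one hash table per hashable grouping set, keyed on that set's
 * grouping columns.  agg_fill_hash_table() would then feed every input
 * tuple to each of these tables.
 */
static void
build_hash_tables(AggState *aggstate, long nbuckets, Size entrysize)
{
    int         setno;

    for (setno = 0; setno < aggstate->num_hashed_sets; setno++) /* assumed */
    {
        AggStatePerHash perhash = &aggstate->perhash[setno];    /* assumed */

        perhash->hashtable =
            BuildTupleHashTable(perhash->numCols,
                                perhash->grpColIdx,
                                perhash->eqfunctions,
                                perhash->hashfunctions,
                                nbuckets,
                                entrysize,
                                aggstate->hashcontexts[setno],  /* assumed */
                                aggstate->tmpcontext->ecxt_per_tuple_memory);
    }
}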

Greetings,

Andres Freund


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-14 08:16:10
Message-ID: 87wq0bbpht.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Andres" == Andres Freund <andres(at)anarazel(dot)de> writes:

Andres> My problem is that, unless I very much misunderstand something,
Andres> the current implementation can end up requiring roughly #sets *
Andres> #input of additional space for the "sidechannel tuplestore" in
Andres> some bad cases. That happens if you group by a couple clauses
Andres> that each lead to a high number of groups.

The actual upper bound for the tuplestore size is the size of the
_result_ of the grouping, less one or two rows. You get that in cases
like grouping sets (unique_col, rollup(constant_col)), which seems
sufficiently pathological not to be worth worrying about greatly.

In normal cases, the size of the tuplestore is the size of the result
minus the rows processed directly by the top node. So the only way the
size can be an issue is if the result set size itself is also an issue,
and in that case I don't really think that this is going to be a matter
of significant concern.

Andres> A rough sketch of what I'm thinking of is:

I'm not sure I'd do it quite like that. Rather, have a wrapper function
get_outer_tuple that calls ExecProcNode and, if appropriate, writes the
tuple to a tuplesort before returning it; use that in place of
ExecProcNode in agg_retrieve_direct and when building the hash table.
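
Sketch only - the sort_in field, standing in for "the tuplesort we are
currently feeding, if any", is hypothetical:

/*
 * Fetch the next tuple from the outer plan and, if we are also building
 * the input for a later, differently-sorted grouping-set pass, copy it
 * into that tuplesort on the way past.
 */
static TupleTableSlot *
get_outer_tuple(AggState *aggstate)
{
    TupleTableSlot *slot = ExecProcNode(outerPlanState(aggstate));

    if (!TupIsNull(slot) && aggstate->sort_in != NULL)
        tuplesort_puttupleslot(aggstate->sort_in, slot);

    return slot;
}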

The problem with trying to turn agg_retrieve_direct inside-out (to make
it look more like agg_retrieve_chained) is that it potentially projects
multiple output groups (not just multiple-result projections) from a
single input tuple, so it has to have some control over whether a tuple
is read or not. (agg_retrieve_chained avoids this problem because it can
loop over the projections, since it's writing to the tuplestore rather
than returning to the caller.)

Andres> I think this is quite doable and seems likely to actually end
Andres> up with easier to understand code. But unfortunately it seems
Andres> to be big enough of a change to make it unlikely to be done in
Andres> sufficient quality before the freeze. I'll nonetheless work a
Andres> couple hours on it tomorrow.

Andres> Andrew, is that a structure you could live with, or not?

Well, I still think the opaque-blob-ness isn't nice, but I retract some
of my previous concerns; I can see a way to do it that doesn't
significantly impinge on the difficulty of adding hash support.

It sounds like I have more time immediately available than you do. As
discussed on IRC, I'll take the first shot, and we'll see how far I can
get.

--
Andrew (irc:RhodiumToad)


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-14 17:18:11
Message-ID: 20150514171811.GI9584@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-05-14 09:16:10 +0100, Andrew Gierth wrote:
> Andres> A rough sketch of what I'm thinking of is:
>
> I'm not sure I'd do it quite like that.

It was meant as a sketch, so there's lots of things it's probably
missing ;)

> Rather, have a wrapper function get_outer_tuple that calls
> ExecProcNode and, if appropriate, writes the tuple to a tuplesort
> before returning it; use that in place of ExecProcNode in
> agg_retrieve_direct and when building the hash table.

Hm. I'd considered that, but thought it might end up being more complex
for hashing support. I'm not exactly sure why I thought that, though.


From: Noah Misch <noah(at)leadboat(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-15 00:25:53
Message-ID: 20150515002553.GA3730890@tornado.leadboat.com
Lists: pgsql-hackers

On Thu, May 14, 2015 at 08:59:45AM +0200, Andres Freund wrote:
> On 2015-05-14 02:51:42 -0400, Noah Misch wrote:
> > Covering hash aggregation might entail a large preparatory refactoring
> > of nodeHash.c, but beyond development cost I can't malign that.
>
> You mean execGrouping.c? Afaics nodeHash.c isn't involved, and it
> doesn't look very interesting to make it so?

That particular comment of mine was comprehensively wrong.


From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-15 22:06:12
Message-ID: 20150515220612.GP9584@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-05-14 09:16:10 +0100, Andrew Gierth wrote:
> It sounds like I have more time immediately available than you do. As
> discussed on IRC, I'll take the first shot, and we'll see how far I can
> get.

Andrew (and I) have been working on this since then. Here's the updated
and rebased patch.

It still lacks a decent commit message and another beautification
read-through. I've spent the last hour going through the thing again,
and all I hit was a disturbing number of newline "errors" and two minor
comment additions.

Greetings,

Andres Freund

Attachment: 0001-Support-GROUPING-SETS.patch (text/x-patch, 271.8 KB)

From: Andres Freund <andres(at)anarazel(dot)de>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>, Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, Marti Raudsepp <marti(at)juffo(dot)org>, Noah Misch <noah(at)leadboat(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Svenne Krap <svenne(at)krap(dot)dk>
Subject: Re: Final Patch for GROUPING SETS
Date: 2015-05-16 01:53:39
Message-ID: 20150516015339.GR9584@alap3.anarazel.de
Lists: pgsql-hackers

On 2015-05-16 00:06:12 +0200, Andres Freund wrote:
> Andrew (and I) have been working on this since then. Here's the updated
> and rebased patch.
>
> It still lacks a decent commit message and another beautification
> read-through. I've spent the last hour going through the thing again,
> and all I hit was a disturbing number of newline "errors" and two minor
> comment additions.

And committed. Thanks Andrew, everyone.

Despite some unhappiness all around I do think the patch has improved
due to the discussions in this thread.