Re: Aggregate ORDER BY patch

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Aggregate ORDER BY patch
Date: 2009-11-15 11:42:53
Message-ID: e08cc0400911150342v5ca55ddm843a2c8232d8abf5@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Here's my first review.

The patch was in context diff format and applied cleanly with a little
3 hunks in parse_expr.c. make succeeded without any warnings and make
check passed all 121 tests.

It implements as advertised, following SQL spec with extension of both
DISTINCT clause and ORDER BY clause are available in any aggregate
functions including user defined ones. It supports VIEWs by adding
code in ruleutils.c.

Questions here:
- agglevelsup?
We have aggregate capability that all arguments from upper level query
in downer level aggregate makes aggregate call itself to upper level
call, as a constant value in downer level. What if ORDER BY clause has
downer level Vars? For example:

regression=# select (select count(t1.four order by unique1) from tenk1
limit 1) from tenk1 t1 where unique1 < 10;
?column?
----------
10000
10000
10000
10000
10000
10000
10000
10000
10000
10000
(10 rows)

regression=# select (select count(t1.four order by t1.unique1) from
tenk1 limit 1) from tenk1 t1 where unique1 < 10;
?column?
----------
10
(1 row)

Is it sane? The result is consistent but surprised me a little. No
need to raise an error?

- order by 1?
Normal ORDER BY clause accepts constant integer as TargetEntry's
resno. The patch seems not to support it.

regression=# select array_agg(four order by 1) from tenk1 where unique1 < 10;
array_agg
-----------------------
{0,2,1,2,1,0,1,3,3,0}
(1 row)

Shouldn't it be the same as normal ORDER BY?

Performance doesn't seem slowing down, though I don't have
quantitative test result.

Coding, almost all sane. Since its syntax and semantics are similar to
existing DISTINCT and ORDER BY features, parsing and transformation
code are derived from those area. The executor has few issues:

- #include in nodeAgg.c
executor/tuptable.h is added in the patch but required really?
I removed that line but still build without any warnings.

- process_ordered_aggregate_(single|multi)
It seems that the patch left process_sorted_aggregate() function as
process_ordered_aggregate_single() and added
process_ordered_aggregate_multi() for more than one input arguments
(actually target entries) case. Why have those two? Could we combine
them? Or I couldn't find convincing reason in comments.

And ParseFuncOrColumn() in parse_func.c now gets more complicated.
Since feature / semantics are similar, I bet we may share some code to
transform DISTINCT and ORDER BY with traditional code in
parse_clause.c, though I'm not sure nor don't have clear idea.
Especially, code around here

save_next_resno = pstate->p_next_resno;
pstate->p_next_resno = attno + 1;

cheats pstate to transform clauses and I felt a bit fear.

- SortGroupClause.implicit
"implicit" member was added in SortGroupClause. I didn't find clear
reason to add this. Could you show a case to clarify this?

That's it for now.

Regards,

--
Hitoshi Harada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2009-11-15 12:32:31 Re: Hot standby, race condition between recovery snapshot and commit
Previous Message Simon Riggs 2009-11-15 10:25:04 Re: Hot standby, overflowed snapshots, testing