Re: Aggregate ORDER BY patch

From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Aggregate ORDER BY patch
Date: 2009-11-16 01:40:33
Message-ID: e08cc0400911151740w440a1a70w9a53e2e8c265ef3f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2009/11/16 Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>:
>>>>>> "Hitoshi" == Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com> writes:
>
>  Hitoshi> Questions here:
>  Hitoshi> - agglevelsup?
> What case exactly would you consider an error? When an order by
> expression references a lower (more deeply nested) query level than
> any of the actual arguments?

It's only that I felt not intuitive. To me, arguments are regarded as
aggregate's "member" while ORDER BY clause expressions didn't hit me.
If it's only me, no problem. Maybe additional document in
#syntax-aggregates will do.

>  Hitoshi> - order by 1?
>
>  Hitoshi> Normal ORDER BY clause accepts constant integer as
>  Hitoshi> TargetEntry's resno. The patch seems not to support it.
>  Hitoshi> Shouldn't it be the same as normal ORDER BY?
>
> Specifically documented. The SQL spec doesn't allow ordinal positions
> in ORDER BY any more (those are a holdover from SQL92) and we don't
> support them in, for example, window ORDER BY clauses.

Clear.

>  Hitoshi> Coding, almost all sane. Since its syntax and semantics are
>  Hitoshi> similar to existing DISTINCT and ORDER BY features, parsing
>  Hitoshi> and transformation code are derived from those area. The
>  Hitoshi> executor has few issues:
>
>  Hitoshi> - #include in nodeAgg.c
>  Hitoshi> executor/tuptable.h is added in the patch but required really?
>  Hitoshi> I removed that line but still build without any warnings.
>
> The code is making explicit use of various Slot calls declared in
> tuptable.h. The only reason why it builds without error when you
> remove that is that utils/tuplesort.h happens to include tuptable.h
> indirectly.
>
> The code is making explicit use of various Slot calls declared in
> tuptable.h. The only reason why it builds without error when you
> remove that is that utils/tuplesort.h happens to include tuptable.h
> indirectly.

My C skill is not so good to determine if that #include is needed.
Simply we'd not needed it, it seems to me that it isn't needed still.

>  Hitoshi> - process_ordered_aggregate_(single|multi)
>  Hitoshi> It seems that the patch left process_sorted_aggregate()
>  Hitoshi> function as process_ordered_aggregate_single() and added
>  Hitoshi> process_ordered_aggregate_multi() for more than one input
>  Hitoshi> arguments (actually target entries) case. Why have those
>  Hitoshi> two? Could we combine them? Or I couldn't find convincing
>  Hitoshi> reason in comments.
>
> Performance.
>
> tuplesort_getdatum etc. seems to be substantially faster than
> tuplesort_gettupleslot especially for the case where you're sorting a
> pass-by-value datum such as an integer (since the datum is then stored
> only in the sort tuple header and doesn't require a separate space
> allocation for itself). Using a slot in all cases would have slowed
> down some common cases like count(distinct id) by a measurable amount.
>
> Cases like array_agg(x order by x) benefit from the faster code path
> too.
>
> The memory management between the two cases is sufficiently different
> that combining them into one function while still maintaining the
> slot vs. datum distinction would be ugly and probably error-prone.
> The relatively minor duplication of logic seemed much clearer to me.

I see the reason. But if we allow this, shouldn't we apply the same
logic to other sort depend parts? Or maybe refactor tuplesort in the
future?

>
>  Hitoshi>  - SortGroupClause.implicit
>  Hitoshi> "implicit" member was added in SortGroupClause. I didn't
>  Hitoshi> find clear reason to add this. Could you show a case to
>  Hitoshi> clarify this?
>
> Without that flag or something like it, when you do
>
> create view foo as select count(distinct x) from table;
>
> and then display the view definition, you would get back the query as
> "select count(distinct x order by x) from table" which would be
> confusing and unnecessarily backwards- and forwards-incompatible.
>
> So the code sets "implicit" for any SortGroupClause that is added for
> some internal reason rather than being present in the original query,
> and the deparse code in ruleutils skips such clauses.

I don't have much experiences in VIEW systems, but isn't it enough to
let "order by x" omitted? "select count(distinct x order by x) from
table" means the same as "select count(distinct x) from table"
currently. ruleutils can handle it if distinct expressions are equal
to order by expressions.

Regards,

--
Hitoshi Harada

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2009-11-16 01:40:50 Re: named parameters in SQL functions
Previous Message James Pye 2009-11-16 01:39:33 Re: Python 3.1 support