Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pavel(dot)stehule(at)gmail(dot)com (Pavel Stehule), Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support
Date: 2009-12-20 22:48:33
Message-ID: 87zl5de1vc.fsf@news-spur.riddles.org.uk
Lists: pgsql-hackers

>>>>> "Pavel" == Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:

> 2009/12/20 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>> I think that we've already expanded the capabilities of aggregates
>> a great deal for 8.5, and we should let it sit as-is for a release
>> or two and see what the real user demand is for additional
>> features.
>>
>> I'm particularly concerned by the fact that the feature set is
>> already far out in front of what the planner can optimize
>> effectively (e.g., there's no ability to combine the work when
>> multiple aggregates need the same sorted data).  The more features
>> we add on speculation, the harder it's going to be to close that
>> gap.

I absolutely agree with Tom here and for some quite specific reasons.

An optimal (or at least more efficient than is currently possible)
implementation of median() on top of the ordered-agg code as it stands
requires additions to the aggregate function interface: at a minimum,
the median agg implementation would have to know how many rows of
sorted input are available. In addition, it would be desirable for it
to have direct (and possibly bidirectional) access to the tuplesort.
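
To illustrate why the row count and direct access matter, here is a
rough sketch in Python rather than backend C. The list `store` is only
a stand-in for an already-sorted tuplesort, and the function name is
made up for this example; none of this is the actual aggregate API.

```python
def median_from_sorted_store(store):
    """Median of an already-sorted, randomly accessible store.

    `store` is a hypothetical stand-in for a finished tuplesort: its
    length (the row count) is known and any position can be read.
    """
    n = len(store)          # the "how many rows" information
    if n == 0:
        return None         # no input rows: no median
    mid = (n - 1) // 2
    if n % 2 == 1:
        return store[mid]   # odd count: the single middle row
    # even count: average the two middle rows
    return (store[mid] + store[mid + 1]) / 2
```

With the count and positional access, only one or two rows are ever
read; nothing else has to pass through the aggregate.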

Now, if we look at how ordered aggs ought to be optimized, it's clear
that the planner should take the ordering costs into account and
consider plans that supply the input already ordered instead of
sorting it inside the aggregate. Once you do this, there's no longer
any pre-computed count or tuplesort object available, so if you'd
implemented a better median() before the optimizations, you'd end up
either having to forgo the optimization or break the median() agg
again; clearly not something we want.
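
For contrast, a sketch of the same aggregate when the rows simply
arrive in order from the plan, one at a time, with no count and no
sort object to inspect. Again this is illustrative Python, and the
class and method names (`StreamingMedian`, `transition`, `final`) are
invented for the example, not taken from the backend.

```python
class StreamingMedian:
    """Hypothetical median aggregate fed pre-sorted rows one by one."""

    def __init__(self):
        self.rows = []      # must keep every row: the count is unknown

    def transition(self, value):
        # Called once per input row, in sorted order.
        self.rows.append(value)

    def final(self):
        # Only at end of input is the count known.
        n = len(self.rows)
        if n == 0:
            return None
        mid = (n - 1) // 2
        if n % 2 == 1:
            return self.rows[mid]
        return (self.rows[mid] + self.rows[mid + 1]) / 2


agg = StreamingMedian()
for v in [10, 20, 30, 40]:
    agg.transition(v)
result = agg.final()
```

Here the aggregate has to buffer the whole input itself, which is
exactly the work a count-aware version was trying to avoid.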

Plus, a feature that I intentionally omitted from the ordered-aggs
patch is the ability to do ordered-aggs as window functions. This is
in the spec, but (a) there were conflicting patches for window
functions on the table and (b) in my opinion, much of the work needed
to implement ordered-agg-as-window-func in an effective manner is
dependent on doing more work on optimization first (or at least will
potentially become easier as a result of that work).

So I think both the optimization issue and the window functions issue
would be best addressed before trying to build any additional features
on top of what we have so far.

--
Andrew (irc:RhodiumToad)
