Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

From: Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Date: 2013-07-18 05:02:13
Message-ID: CAOeZVienct7pONTeLrA9d6twcj5CDF3KAtRn5myGwre-Jr-1uA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 18, 2013 at 10:02 AM, David Fetter <david(at)fetter(dot)org> wrote:
> On Thu, Jul 18, 2013 at 03:15:14AM +0000, Andrew Gierth wrote:
>> The spec defines two types of aggregate function classed as "ordered set
>> function", as follows:
>>
>> 1. An "inverse distribution function" taking one argument (which must be
>> a grouped column or otherwise constant within groups) plus a sorted
>> group with exactly one column:
>>
>> =# SELECT (func(p) WITHIN GROUP (ORDER BY q)) from ...
>>
>> The motivating example for this (and the only ones in the spec) are
>> percentile_cont and percentile_disc, to return a percentile result
>> from a continuous or discrete distribution. (Thus
>> percentile_cont(0.5) within group (order by x) is the spec's version
>> of a median(x) function.)
>>
>> 2. A "hypothetical set function" taking N arguments of arbitrary types
>> (a la VARIADIC "any", rather than a fixed list) plus a sorted group
>> with N columns of matching types:
>>
>> =# SELECT (func(p1,p2,...) WITHIN GROUP (ORDER BY q1,q2,...)) from ...
>>
>> (where typeof(p1)==typeof(q1) and so on, at least up to trivial
>> conversions)
>>
>> The motivating example here is to be able to do rank(p1,p2,...) to
>> return the rank that the specified values would have had if they were
>> added to the group.
>>
>> As usual, we do not want to constrain ourselves to supporting only the
>> specific cases in the spec, but would prefer a general solution.
>>
>> We (meaning myself and Atri) have an implementation that basically
>> works, though it is not yet complete, but before taking it any further
>> we need to resolve the design question of how to represent these two
>> types of function in the system catalogs. The fact that there are in
>> effect two parts to the parameter list, which are either independent
>> (for inverse distribution funcs) or closely related (for hypothetical
>> set functions), doesn't seem to point to an obvious way to represent
>> this in pg_proc/pg_aggregate.
>>
>> I'm not yet satisfied with the method used in our implementation,
>
> What is that method?

We currently represent ordered set functions with a new bool flag in
pg_aggregate. The flag is set to true for ordered set
functions(obviously) and false for all others. The currently
implemented functions i.e. percentile_disc, percentile_cont and
percentile_cont for intervals have their finalfns present in
pg_aggregate.

The aggregate functions take in two arguments, one for the percentile
value and other for the input row set. So, percentile_cont's entry in
pg_proc has float8 and float8 as its parameters and another entry of
percentile_cont (with the interval version as the finalfn) has float8
and interval as its parameter types.

As you can see, there isn't a way right now to resolve the return type
of the aggregate for polymorphic cases. This is something we wish to
resolve.

Regards,

Atri

--
Regards,

Atri
l'apprenant

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2013-07-18 05:04:14 Re: [PATCH] pgbench --throttle (submission 7 - with lag measurement)
Previous Message Amit Kapila 2013-07-18 04:53:51 Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: Proposal for Allow postgresql.conf values to be changed via SQL [review])