Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>
Subject: Re: Proposal/design feedback needed: WITHIN GROUP (sql standard ordered set aggregate functions)
Date: 2013-07-23 01:21:52
Message-ID: 6b7ac5e465c972afa1392892fd434f4f@news-out.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Ok, since Atri posted our work-so-far and there's not been much
comment, I'll outline here my proposed plan of attack.

Rather than, as in the WIP patch, using the agg finalfn to validate
the split between normal args and ORDER BY args, I propose this:

Firstly, as in the WIP patch,

func(a) within group (order by b)

is looked up as though it were func(a,b). The result must be marked as
an ordered set function. A new pg_aggregate integer column,
aggordnargs (?), must be equal to the number of normal args (i.e. (a)
in this case). Note that this may be 0; one can see a legitimate use
case for mode() within group (order by anyelement) for example.

The finalfn must be defined to have the same signature, so its args
are processed as if it were func_final(a,b) - but only a dummy arg is
passed for b. (Similar to the case for window functions.) Resolution
of polymorphic parameters and result types therefore works as normal.

For hypothetical set functions we add a special case, aggordnargs=-1,
for which both the aggregate and the finalfn must be defined as
(variadic "any") and parse analysis detects this case and unifies the
types of the normal args with those of the ORDER BY args.

I propose this new syntax:

create aggregate func(argtypes...) within group (argtypes...) (
[ STYPE = ... , ]
[ SORTOP = ... , ]
[ INITCOND = ... , ]
FINALFUNC = func_final
);

Ordered set functions will typically not need STYPE etc., but
hypothetical set functions will be declared as, e.g.:

create aggregate rank(variadic "any") within group (variadic "any") (
STYPE = boolean,
INITCOND = 'f',
SORTOP = >,
FINALFUNC = rank_hypothetical_final
);

(I'm open to comment as to whether to simply overload the aggsortop
column in pg_aggregate or add a new one. I'm inclined to do the latter.)

The idea here is that a column of type STYPE will be appended to the
list of columns to be sorted, using SORTOP as sort operator, and all
input rows will have this column initialized to the INITCOND value.
This is to make it easy to implement rank() and friends by simply
inserting the hypothetical row into the sort, with a true value to
flag it, and finding its position in the sort result. (Better that
than comparing the hypothetical row against the whole group.)

(Security caveat: it will be necessary for the finalfn in such cases
to validate that the additional column exists with the right
type. Producing the wrong result is acceptable if the values in it are
unexpected; crashing is not.)

Any comment before we get back to coding?

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Greg Smith 2013-07-23 01:36:00 Re: [PATCH] Revive line type
Previous Message Noah Misch 2013-07-23 00:58:12 Re: REINDEX checking of index constraints