Re: [BUGS] BUG #4916: wish: more statistical functions (median, percentiles etc)

Lists: pgsql-bugspgsql-general
From: "Richard Neill" <rn214(at)cam(dot)ac(dot)uk>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-10 23:37:46
Message-ID: 200907102337.n6ANbkDk086971@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general


The following bug has been logged online:

Bug reference: 4916
Logged by: Richard Neill
Email address: rn214(at)cam(dot)ac(dot)uk
PostgreSQL version: 8.4
Operating system: Linux
Description: wish: more statistical functions (median, percentiles
etc)
Details:

In addition to the existing aggregate functions (avg, stddev etc), it would
be nice if postgres could return further information. For example, the
quartiles, percentiles, and median.

[mode would also be useful, as an explicit function, though we can get it
easily enough using count(1) order by count desc].

According to google, this has been a wish since at least year 2000 for
various people, but doesn't seem to be implemented.

Thanks - Richard


From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: Richard Neill <rn214(at)cam(dot)ac(dot)uk>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-11 00:52:34
Message-ID: 20090711005234.GC31174@eddie
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

On Fri, Jul 10, 2009 at 11:37:46PM +0000, Richard Neill wrote:
> In addition to the existing aggregate functions (avg, stddev etc), it would
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.
>
> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].
>
> According to google, this has been a wish since at least year 2000 for
> various people, but doesn't seem to be implemented.

That's because no one has yet taken the time. However, patches are welcome, if
you'd like it enough to implement it. It's on my list of things that might be
interesting to write, for example, but there are other things higher up on
that list.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


From: Richard Neill <rn214(at)hermes(dot)cam(dot)ac(dot)uk>
To: Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc: Richard Neill <rn214(at)cam(dot)ac(dot)uk>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-11 01:12:22
Message-ID: 4A57E6F6.9050200@hermes.cam.ac.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

Dear Joshua,

Thanks for your reply. Sadly, I haven't the time (or expertise) to write
this myself. However, the feature would be really useful to have. I'd
certainly be willing to make a £200 payment or donation in return.

I'm aware that this number is problematic, as it undervalues developer
time (£200 is the value, to me, of that feature; which is probably
uncorrelated with the value of his time to anyone who might implement it).

The offer is there if anyone wants to take it; please feel free to
contact me off-list.

Richard

Joshua Tolley wrote:
> On Fri, Jul 10, 2009 at 11:37:46PM +0000, Richard Neill wrote:
>> In addition to the existing aggregate functions (avg, stddev etc), it would
>> be nice if postgres could return further information. For example, the
>> quartiles, percentiles, and median.
>>
>> [mode would also be useful, as an explicit function, though we can get it
>> easily enough using count(1) order by count desc].
>>
>> According to google, this has been a wish since at least year 2000 for
>> various people, but doesn't seem to be implemented.
>
> That's because no one has yet taken the time. However, patches are welcome, if
> you'd like it enough to implement it. It's on my list of things that might be
> interesting to write, for example, but there are other things higher up on
> that list.
>
> --
> Joshua Tolley / eggyknap
> End Point Corporation
> http://www.endpoint.com


From: Joshua Tolley <eggyknap(at)gmail(dot)com>
To: Richard Neill <rn214(at)hermes(dot)cam(dot)ac(dot)uk>
Cc: Richard Neill <rn214(at)cam(dot)ac(dot)uk>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-11 01:21:00
Message-ID: 20090711012100.GD31174@eddie
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

On Sat, Jul 11, 2009 at 02:12:22AM +0100, Richard Neill wrote:
> Thanks for your reply. Sadly, I haven't the time (or expertise) to write
> this myself. However, the feature would be really useful to have. I'd
> certainly be willing to make a £200 payment or donation in return.

That's very nice of you to make the offer. Pending someone taking you up on
it, you might consider your ability to write the functions in some procedural
language. They would probably be easier to write, and you'd only have to make
them handle data types you're planning to use them with. For instance, there's
an example of PL/Perl versions available embedded in the code here:

http://tr.im/rPDA

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Richard Neill <rn214(at)cam(dot)ac(dot)uk>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-11 06:34:12
Message-ID: 162867790907102334r71db0227jfa0e4bd96f48b8e4@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

2009/7/11 Richard Neill <rn214(at)cam(dot)ac(dot)uk>:
>
> The following bug has been logged online:
>
> Bug reference:      4916
> Logged by:          Richard Neill
> Email address:      rn214(at)cam(dot)ac(dot)uk
> PostgreSQL version: 8.4
> Operating system:   Linux
> Description:        wish: more statistical functions (median, percentiles
> etc)
> Details:
>
> In addition to the existing aggregate functions (avg, stddev etc),  it would
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.
>
> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].
>
> According to google, this has been a wish since at least year 2000 for
> various people, but doesn't seem to be implemented.
>
> Thanks - Richard

hello

you can use following tricks:

create or replace function nth_percentil(anyarray, int)
returns anyelement as $$
select $1[$2/100.0 * array_upper($1,1) + 1];
$$ language sql immutable strict;

pagila=# select nth_percentil(array(select length from film order by 1),90);
nth_percentil
---------------
173
(1 row)

pagila=# select count(case when length < 173 then 1 end)::float /
count(*) * 100.0
from film;
?column?
----------
89.6
(1 row)

create or replace function median(anyarray)
returns float as $$
select ($1[round(array_upper($1,1)/2.0)] + $1[array_upper($1,1) -
round(array_upper($1,1)/2.0) + 1]) / 2.0::float;
$$ language sql immutable strict;

pagila=# select median(array[1,2]), median(array[1,2,3]),
median(array[1,2,3,4]);
median | median | median
--------+--------+--------
1.5 | 2 | 2.5
(1 row)

pagila=# select median(array(select length from film order by 1));
median
--------
114
(1 row)

tested on pagila database

regards
Pavel Stehule
>
> --
> Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs
>


From: Jasen Betts <jasen(at)xnet(dot)co(dot)nz>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-11 14:12:42
Message-ID: h3a6kq$ekn$1@reversiblemaps.ath.cx
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

On 2009-07-10, Richard Neill <rn214(at)cam(dot)ac(dot)uk> wrote:
>
> The following bug has been logged online:
>
> Bug reference: 4916
> Logged by: Richard Neill
> Email address: rn214(at)cam(dot)ac(dot)uk
> PostgreSQL version: 8.4
> Operating system: Linux
> Description: wish: more statistical functions (median, percentiles
> etc)
> Details:
>
> In addition to the existing aggregate functions (avg, stddev etc), it would
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.
>
> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].
>
> According to google, this has been a wish since at least year 2000 for
> various people, but doesn't seem to be implemented.
>
> Thanks - Richard

percentile isn't really an agregate function as its value is mainly
determined by two (or fewer) values in the data set.

here's a function that works on arbitrarily named tables,
by building queries to get the needed statistics and then
using order by, offset, limit. Exploiting
an index (if present) to retrtieve the data values.

it won't work (well) on subqueries, but as 'col' and tbl are substitutd in
unescaped you can use joins and expressions if needed (expect a
performance hit if you do!)

-- percentile function:
-- usage example: select percentile('sometable','somecolumn','true', 0.45);
-- 'true' is a where constraint, 0.45 gets the 45th percentile.
--
-- for best preformance have an index that covers the where condition
-- and col.

CREATE OR REPLACE FUNCTION percentile ( tbl text, col text, whr text, frc double precision ) returns double precision as $$
DECLARE
cnt integer; -- count of records
rat double precision; -- ratio
pos double precision; -- postion
res double precision; -- result
BEGIN
EXECUTE 'select count( '|| col ||' ) FROM '|| tbl ||' WHERE '|| whr INTO cnt;
pos=frc*(cnt-1);
rat=mod(pos::numeric,1);
EXECUTE 'SELECT (SELECT '||col||' * '||(1-rat)::text
|| ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC '
|| 'OFFSET ' || floor(pos)::text || ' LIMIT 1 )'
|| ' + (SELECT '||col||' * '||rat::text
|| ' FROM '||tbl||' WHERE '|| whr || ' ORDER BY '||col||' ASC '
|| 'OFFSET ' || ceil(pos)::text || ' LIMIT 1 )' into RES;
RETURN res;
END;
$$ LANGUAGE PLPGSQL;


From: Joe Conway <mail(at)joeconway(dot)com>
To: Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc: Richard Neill <rn214(at)hermes(dot)cam(dot)ac(dot)uk>, Richard Neill <rn214(at)cam(dot)ac(dot)uk>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-12 09:51:39
Message-ID: 4A59B22B.2070107@joeconway.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

Joshua Tolley wrote:
> On Sat, Jul 11, 2009 at 02:12:22AM +0100, Richard Neill wrote:
>> Thanks for your reply. Sadly, I haven't the time (or expertise) to write
>> this myself. However, the feature would be really useful to have. I'd
>> certainly be willing to make a £200 payment or donation in return.
>
> That's very nice of you to make the offer. Pending someone taking you up on
> it, you might consider your ability to write the functions in some procedural
> language. They would probably be easier to write, and you'd only have to make
> them handle data types you're planning to use them with. For instance, there's
> an example of PL/Perl versions available embedded in the code here:

This stuff is pretty trivial to do with PL/R

Joe


From: David Fetter <david(at)fetter(dot)org>
To: Richard Neill <rn214(at)cam(dot)ac(dot)uk>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: [BUGS] BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-12 15:53:50
Message-ID: 20090712155350.GE18589@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

On Fri, Jul 10, 2009 at 11:37:46PM +0000, Richard Neill wrote:
>
> The following bug has been logged online:
>
> Bug reference: 4916
> Logged by: Richard Neill
> Email address: rn214(at)cam(dot)ac(dot)uk
> PostgreSQL version: 8.4
> Operating system: Linux
> Description: wish: more statistical functions (median, percentiles etc)

This really should go to the pgsql-general list, where I'm redirecting
follow-ups. :)

> Details:
>
> In addition to the existing aggregate functions (avg, stddev etc), it would
> be nice if postgres could return further information. For example, the
> quartiles, percentiles, and median.

You can get ntiles in 8.4 with the windowing functions. As to median,
it is very precisely defined...several different and incompatible
ways. Any ideas as to how to handle this?

> [mode would also be useful, as an explicit function, though we can get it
> easily enough using count(1) order by count desc].

You can get that with windowing functions, too. :)

> According to google, this has been a wish since at least year 2000
> for various people, but doesn't seem to be implemented.

Patches are welcome :)

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Grzegorz Jaśkiewicz <gryzman(at)gmail(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Richard Neill <rn214(at)cam(dot)ac(dot)uk>, PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Re: [BUGS] BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-13 08:17:15
Message-ID: 2f4958ff0907130117k66578d01n89a16a011f49dfbf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

On Sun, Jul 12, 2009 at 5:53 PM, David Fetter<david(at)fetter(dot)org> wrote:
>> [mode would also be useful, as an explicit function, though we can get it
>> easily enough using count(1) order by count desc].
>
> You can get that with windowing functions, too. :)
>
>> According to google, this has been a wish since at least year 2000
>> for various people, but doesn't seem to be implemented.
>
> Patches are welcome :)

Trouble in writing such aggregate, would be that it has to keep full
set, in order to sort it, and choose n/2 element (in case set contains
odd number of elements), or ([n/2]+[(n/2)+1])/2 otherwise.

I usually, if in need to calculate it , I usually do it like that (but
that's pretty slow on large sets):

pg84(at)atlantic:~$ psql
psql (8.4beta2)
Type "help" for help.

pg84=# create table foo(a int not null);
CREATE TABLE
pg84=# insert into foo(a) select random()*666 from generate_series(1,666);
INSERT 0 666
pg84=# select a from foo order by a limit 1 offset (select count(*)/2 from foo);
a
-----
321
(1 row)

(yeah, I know it is lame).

So , I think in order to create such patch, the aggregate would have
to secretly create some temporary table, to store the set first...

--
GJ


From: "Chris Spotts" <rfusca(at)gmail(dot)com>
To: "'David Fetter'" <david(at)fetter(dot)org>, "'Richard Neill'" <rn214(at)cam(dot)ac(dot)uk>
Cc: "'PostgreSQL General'" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Re: [BUGS] BUG #4916: wish: more statistical functions (median, percentiles etc)
Date: 2009-07-13 13:09:49
Message-ID: 00db01ca03bb$350f18f0$9f2d4ad0$@com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-general

> > Details:
> >
> > In addition to the existing aggregate functions (avg, stddev etc),
> it would
> > be nice if postgres could return further information. For example,
> the
> > quartiles, percentiles, and median.
[Spotts, Christopher]
If you're interested in doing real stat work in postgres, try PL/R.