Re: Add min and max execute statement time in pg_stat_statement

From: Arne Scheffer <arne(dot)scheffer(at)uni-muenster(dot)de>
To: David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, <andrew(at)dunslane(dot)net>
Subject: Re: Add min and max execute statement time in pg_stat_statement
Date: 2015-01-21 12:18:04
Message-ID: permail-20150121121804fe5316b600007a2b-scheffa@message-id.uni-muenster.de
Lists: pgsql-hackers

David G Johnston wrote on 2015-01-21:
> Andrew Dunstan wrote
> > On 01/20/2015 01:26 PM, Arne Scheffer wrote:

> >> And a very minor aspect:
> >> The term "standard deviation" in your code stands for the
> >> (corrected) sample standard deviation, I think,
> >> because you divide by n-1 instead of n to keep the
> >> estimator unbiased.
> >> How about adding the prefix "sample"
> >> to indicate that this is the estimator?

> > I don't understand. I'm following pretty exactly the calculations
> > stated at <http://www.johndcook.com/blog/standard_deviation/>
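For reference, here is a minimal Python sketch of the running recurrence as I read it on the johndcook.com page: a running mean M_k and a running sum of squared deviations S_k. The class name RunningStat, the sample values, and returning 0.0 for fewer than two values are my own illustrative assumptions, not code from the pg_stat_statements patch.

```python
import math


class RunningStat:
    """Welford-style running mean/variance (hypothetical sketch)."""

    def __init__(self) -> None:
        self.n = 0        # number of values seen so far
        self.mean = 0.0   # M_k: running mean
        self.s = 0.0      # S_k: running sum of squared deviations from the mean

    def push(self, x: float) -> None:
        self.n += 1
        old_mean = self.mean
        # M_k = M_{k-1} + (x_k - M_{k-1}) / k
        self.mean = old_mean + (x - old_mean) / self.n
        # S_k = S_{k-1} + (x_k - M_{k-1}) * (x_k - M_k); no n-1 appears here
        self.s += (x - old_mean) * (x - self.mean)

    def variance(self) -> float:
        # The n-1 divisor is applied only at the very end, which makes this
        # the (Bessel-corrected) sample variance. Returning 0.0 for n < 2 is
        # an assumption of this sketch.
        return self.s / (self.n - 1) if self.n > 1 else 0.0

    def stddev(self) -> float:
        return math.sqrt(self.variance())


stat = RunningStat()
for t in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # made-up timings
    stat.push(t)
print(stat.n, stat.mean, stat.variance(), stat.stddev())
```

Because the update needs only the previous mean and S, a per-statement entry has to store just n, the mean and S, rather than every individual timing.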

> > I'm not a statistician. Perhaps others who are more literate in
> > statistics can comment on this paragraph.

> I'm largely in the same boat as Andrew but...

> I take it that Arne is referring to:

> http://en.wikipedia.org/wiki/Bessel's_correction

Yes, it is.

> but the mere presence of an (n-1) divisor does not mean that is what
> is happening. In this particular situation I believe the (n-1) simply
> is a necessary part of the recurrence formula and not any attempt to
> correct for sampling bias when estimating a population's variance.

That's wrong: the (n-1) is applied at the very end, to the accumulated
sum of squared differences, so by definition the result is the
corrected sample standard deviation estimator.
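Written out with the notation M_k (running mean) and S_k (running sum of squared deviations), as I recall it from the linked page, the recurrence never divides by n-1. The accumulator S_n is exactly the sum of squared deviations from the mean, and the choice between n-1 and n is made only in the final division:

```latex
\begin{aligned}
M_1 &= x_1, \qquad S_1 = 0,\\
M_k &= M_{k-1} + \frac{x_k - M_{k-1}}{k}, \qquad
S_k = S_{k-1} + (x_k - M_{k-1})(x_k - M_k) \qquad (2 \le k \le n),\\
S_n &= \sum_{i=1}^{n} (x_i - M_n)^2,\\
s^2 &= \frac{S_n}{n-1} \quad \text{(sample, Bessel-corrected)}, \qquad
\sigma^2 = \frac{S_n}{n} \quad \text{(population)}.
\end{aligned}
```

The identity for S_n follows by induction from \sum_{i=1}^{k} (x_i - M_k)^2 = \sum_{i=1}^{k} x_i^2 - k M_k^2 together with x_k = k M_k - (k-1) M_{k-1}, which give S_k - S_{k-1} = x_k^2 - k M_k^2 + (k-1) M_{k-1}^2 = (x_k - M_{k-1})(x_k - M_k).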

> In fact, as far as the database knows, the values provided to this
> function do represent an entire population and such a correction
> would be unnecessary. I

That would probably be an exotic assumption for a working database,
and it is not what is computed here!

> guess it boils down to whether "future" queries are considered part
> of the population or whether the population changes upon each query
> being run and thus we are calculating the ever-changing population
> variance.

Yes, exactly.
And it was precisely to avoid that misunderstanding that I suggested
using the term "sample".
In PostgreSQL terms: what Andrew's (i.e. Welford's) algorithm computes
is stddev_samp(le), not stddev_pop(ulation).
That is also why the plain name stddev is kept in PostgreSQL only as a
historical alias for stddev_samp; see Table 9-43 at
http://www.postgresql.org/docs/9.4/static/functions-aggregate.html
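The same distinction can be sketched outside SQL. Python's standard statistics module happens to offer both estimators: statistics.stdev divides by n-1 and statistics.pstdev divides by n, so they play the roles of stddev_samp and stddev_pop. This is an analogy only; the timing values are made up.

```python
import math
import statistics

# Hypothetical execution times (ms) for one statement; illustrative only.
times = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(times)

# Divides by n-1: the analogue of stddev_samp (and of plain stddev).
samp = statistics.stdev(times)
# Divides by n: the analogue of stddev_pop.
pop = statistics.pstdev(times)

# Both come from the same sum of squared deviations; only the divisor differs.
mean = statistics.fmean(times)
ss = sum((t - mean) ** 2 for t in times)
assert math.isclose(samp, math.sqrt(ss / (n - 1)))
assert math.isclose(pop, math.sqrt(ss / n))

# One can be converted into the other given only n.
print(samp, pop, samp * math.sqrt((n - 1) / n))
```

Since the two differ only by the factor sqrt((n-1)/n), labelling which one is reported matters mostly for small call counts. For large n they are nearly equal.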

Kind regards,
Arne

> Note point 3 in the linked Wikipedia article.

> David J.

