Quick Links

Re: Add min and max execute statement time in pg_stat_statement

From:	Arne Scheffer <scheffa(at)uni-muenster(dot)de>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	Arne Scheffer <arne(dot)scheffer(at)uni-muenster(dot)de>, David G Johnston <david(dot)g(dot)johnston(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Add min and max execute statement time in pg_stat_statement
Date:	2015-01-21 16:21:01
Message-ID:	alpine.DEB.2.02.1501211650370.2867@zivarne
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Wed, 21 Jan 2015, Andrew Dunstan wrote:

>
> On 01/21/2015 09:27 AM, Arne Scheffer wrote:
>> Sorry, corrected second try because of copy&paste mistakes:
>> VlG-Arne
>>
>>> Comments appreciated.
>>> Definition var_samp = Sum of squared differences /n-1
>>> Definition stddev_samp = sqrt(var_samp)
>>> Example N=4
>>> 1.) Sum of squared differences
>>> 1_4Sum(Xi-XM4)²
>>> =
>>> 2.) adding nothing
>>> 1_4Sum(Xi-XM4)²
>>> +0
>>> +0
>>> +0
>>> =
>>> 3.) nothing changed
>>> 1_4Sum(Xi-XM4)²
>>> +(-1_3Sum(Xi-XM3)²+1_3Sum(Xi-XM3)²)
>>> +(-1_2Sum(Xi-XM2)²+1_2Sum(Xi-XM2)²)
>>> +(-1_1Sum(Xi-XM1)²+1_1Sum(Xi-XM1)²)
>>> =
>>> 4.) parts reordered
>>> (1_4Sum(Xi-XM4)²-1_3Sum(Xi-XM3)²)
>>> +(1_3Sum(Xi-XM3)²-1_2Sum(Xi-XM2)²)
>>> +(1_2Sum(Xi-XM2)²-1_1Sum(Xi-XM1)²)
>>> +1_1Sum(X1-XM1)²
>>> =
>>> 5.)
>>> (X4-XM4)(X4-XM3)
>>> + (X3-XM3)(X3-XM2)
>>> + (X2-XM2)(X2-XM1)
>>> + (X1-XM1)²
>>> =
>>> 6.) XM1=X1 => There it is - The iteration part of Welfords Algorithm
>>> (in
>>> reverse order)
>>> (X4-XM4)(X4-XM3)
>>> + (X3-XM3)(X3-XM2)
>>> + (X2-XM2)(X2-X1)
>>> + 0
>>> The missing piece is 4.) to 5.)
>>> it's algebra, look at e.g.:
>>> http://jonisalonen.com/2013/deriving-welfords-method-for-computing-variance/
>>
>>
>
>
>
> I have no idea what you are saying here.

I'm sorry for that statistics stuff,
my attempt was only to visualize in detail
the mathematical reason for
the iterating part of Welfords algorithm
being computing the current sum of squared differences in every step

- therefore it's in my opinion better to call the variable sum_of_squared_diffs
(every statistician will be confused bei "sum_of_variances",
because: sample variance = sum_of_squared_diffs / n-1,
have a look at Mr. Cooks explanation)

- therefore deviding by n-1 is the unbiased estimator by definition.
(have a look at Mr. Cooks explanation)

- therefore I suggested (as a minor nomenclature issue) to call the column/description
stdev_samp (PostgreSQL-nomenclature) / sample_.... to indicate that information.
(have a look at the PostgreSQL aggregate functions, it's doing that the same way)

>
> Here are comments in email to me from the author of
> <http://www.johndcook.com/blog/standard_deviation> regarding the divisor
> used:
>
> My code is using the unbiased form of the sample variance, dividing
> by n-1.
>

I am relieved, now we are at least two persons saying that. :-)
Insert into the commonly known definition

>>> Definition stddev_samp = sqrt(var_samp)

from above, and it's exactly my point.

> Maybe I should add that in the code comments. Otherwise, I don't think we
> need a change.

Huh?

Why is it a bad thing to call the column "stddev_samp" analog to the
aggregate function or make a note in the documentation,
that the sample stddev is used to compute the solution?

I really think it not a good strategy having the user to make a test or dive
into the source code to determine the divisor used.

E.g. David expected stdev_pop, so there is a need for documentation for cases with a small sample.

VlG-Arne

In response to

Re: Add min and max execute statement time in pg_stat_statement at 2015-01-21 14:46:53 from Andrew Dunstan

Responses

Re: Add min and max execute statement time in pg_stat_statement at 2015-01-21 16:32:57 from Andrew Dunstan

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Adam Brightwell	2015-01-21 16:27:31	Re: Additional role attributes && superuser review
Previous Message	Sawada Masahiko	2015-01-21 16:13:51	Re: Merging postgresql.conf and postgresql.auto.conf