Re: Add min and max execute statement time in pg_stat_statement

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Peter Geoghegan <pg(at)heroku(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add min and max execute statement time in pg_stat_statement
Date: 2013-10-22 11:03:23
Message-ID: 52665B7B.4090106@archidevsys.co.nz
Lists: pgsql-hackers

On 22/10/13 22:56, Dimitri Fontaine wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> Hm. It's been a long time since college statistics, but doesn't the
>> entire concept of standard deviation depend on the assumption that the
>> underlying distribution is more-or-less normal (Gaussian)? Is there a
> I just had a quick chat with a statistician friend of mine on that
> topic, and it seems that the only way to make sense of an average is if
> you know already the distribution.
>
> In our case, what I keep experiencing with tuning queries is that we
> have like 99% of them running under acceptable threshold and 1% of them
> taking more and more time.
>
> In a normal (Gaussian) distribution, there would be no query time
> farther away from the average than any other, so my experience tells me
> that the query time distribution is anything BUT normal (Gaussian).
>
>> good reason to suppose that query runtime is Gaussian? (I'd bet not;
>> in particular, multimodal behavior seems very likely due to things like
>> plan changes.) If not, how much does that affect the usefulness of
>> a standard-deviation calculation?
> I don't know what multi-modal is.
>
[...]

Multi-modal is basically having more than one hump when you graph the frequencies of values.

If you gave a series of mathematical questions of varying degrees of difficulty, drawn from diverse areas of mathematics, to a group of people between the ages of 20 & 25 selected at random in New Zealand, then you would have at least 2 humps. One hump would be those who had little mathematical training and/or no interest, and the other would be those who had more advanced mathematical training and/or were interested in mathematics.

You would also get at least 2 humps if you plotted the number of people under the age of 50 against the number of visits to medical practitioners. Basically, you have those people with chronic illnesses versus those who tend not to have extended periods of illness. This implies 2 humps, but it may be more complicated.

Grabbing people at random and getting them to fire a rifle at targets would also be multi-modal. You would see a lot of people with low scores and a smaller percentage with reasonable scores. I would expect this to be quite pronounced, since people with lots of rifle practice will tend to do significantly better.
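
To tie this back to query times, here is a minimal Python sketch using made-up numbers. The timings, the 20 ms bin width, and the helper names histogram and count_humps are all assumptions for illustration, not measurements and not anything from pg_stat_statements. It imagines one query that usually runs a fast plan but sometimes flips to a slow one, counts the humps in a crude histogram, and shows that the mean lands where no query actually ran.

```python
from collections import Counter
import statistics

# Hypothetical query times in milliseconds (invented data, not measurements).
fast_plan = [8, 9, 9, 10, 10, 10, 11, 11, 12] * 10   # 90 executions near 10 ms
slow_plan = [190, 195, 200, 200, 205, 210] * 5       # 30 executions near 200 ms
times_ms = fast_plan + slow_plan


def histogram(values, bin_width):
    """Return a list of counts, one per fixed-width bin starting at 0."""
    counts = Counter(int(v // bin_width) for v in values)
    n_bins = max(counts) + 1
    return [counts.get(i, 0) for i in range(n_bins)]


def count_humps(bins):
    """Count local maxima: bins higher than the left neighbour and
    at least as high as the right neighbour (missing neighbours count as 0)."""
    humps = 0
    for i, c in enumerate(bins):
        left = bins[i - 1] if i > 0 else 0
        right = bins[i + 1] if i + 1 < len(bins) else 0
        if c > left and c >= right:
            humps += 1
    return humps


bins = histogram(times_ms, bin_width=20)
mean = statistics.mean(times_ms)
stdev = statistics.pstdev(times_ms)

# Executions whose time is within 20 ms of the mean.
near_mean = [t for t in times_ms if abs(t - mean) <= 20]

print("humps:", count_humps(bins))
print("mean (ms):", mean)
print("population std dev (ms):", round(stdev, 1))
print("executions within 20 ms of the mean:", len(near_mean))
```

With data shaped like this, the mean and standard deviation describe neither the fast hump nor the slow one, which is the concern Tom raised about plan changes.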

Cheers,
Gavin
