Re: Review: Revise parallel pg_restore's scheduling heuristic

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Sam Mason" <sam(at)samason(dot)me(dot)uk>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Review: Revise parallel pg_restore's scheduling heuristic
Date: 2009-08-07 19:08:21
Message-ID: 4A7C3555020000250002967D@gw.wicourts.gov
Lists: pgsql-hackers

Sam Mason <sam(at)samason(dot)me(dot)uk> wrote:

> All we're saying is that we're less than 90% confident that there's
> something "significant" going on. All the fiddling with standard
> deviations and sample sizes is just the easiest way (that I know of)
> that statistics currently gives us of determining this more formally
> than a hand-wavy "it looks OK to me". Science tells us that humans
> are liable to say things are OK when they're not, as well as vice
> versa; statistics gives us a way to work past these limitations in
> some common and useful situations.

Following up, I took the advice offered in the referenced article and
used a spreadsheet with a TDIST function, which gives more accurate
results than the table included in the article. That allows what I
think is a more meaningful number: the probability that a sample that
size would have produced a t-statistic at least as large as the one
actually observed if there were no real difference.

With the 20 samples from that last round of tests, the answer (rounded
to the nearest percent) is 60%, so "probably noise" is a good summary.
Combined with the 12 samples from earlier comparable runs against the
prior version of the patch, the probability that noise alone would
generate a difference at least that large rises to 90%, so I think
we've gotten to "almost certainly noise". :-)

To me, that seems more valuable for this situation than saying "we
haven't reached 90% confidence that it's a real difference." I used
the same calculations up through the t-statistic.
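
Since the timings are already sitting in a database, the t-statistic
part is easy to get right there. Just as a sketch, assuming a made-up
table results(branch text, seconds float8) holding a 'patched' and a
'head' group of equal size (the table and labels are hypothetical):

  -- pooled stddev below assumes equal variances and equal
  -- sample sizes in the two groups
  SELECT (p.mean - h.mean)
         / (sqrt((p.var + h.var) / 2.0) * sqrt(2.0 / p.n)) AS t_stat
    FROM (SELECT avg(seconds) AS mean, var_samp(seconds) AS var,
                 count(*) AS n
            FROM results WHERE branch = 'patched') p,
         (SELECT avg(seconds) AS mean, var_samp(seconds) AS var,
                 count(*) AS n
            FROM results WHERE branch = 'head') h;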

The one question I have left for this technique is why you went with

((avg1 - avg2) / (stddev * sqrt(2/samples)))

instead of

((avg1 - avg2) / (stddev / sqrt(samples)))

I assume that it's because the baseline was a set of samples rather
than a fixed mark, but I couldn't pick out a specific justification
for this in the literature (although I might have just missed it), so
I'd feel more comfy if you could clarify.
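
For what it's worth, for the same stddev and sample count the two
forms always differ by exactly a factor of sqrt(2), as a quick check
with made-up numbers shows:

  -- toy inputs: means 10.0 and 9.5, stddev 0.8, n = 20
  SELECT (10.0 - 9.5) / (0.8 * sqrt(2.0 / 20)) AS two_sample_form,
         (10.0 - 9.5) / (0.8 / sqrt(20))       AS one_sample_form;
  -- gives roughly 1.98 vs 2.80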

Given the convenience of capturing benchmarking data in a database,
has anyone tackled implementation of something like the spreadsheet
TDIST function within PostgreSQL?
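
Absent that, I suppose a thin wrapper over SciPy would do wherever
PL/Python happens to be installed; just a sketch, with plpythonu and
a server-side scipy being assumptions about the environment:

  -- two-sided tail probability, like the spreadsheet TDIST(t, df, 2)
  CREATE FUNCTION tdist(t float8, df float8) RETURNS float8 AS $$
      from scipy.stats import t as t_dist
      return 2 * t_dist.sf(abs(t), df)
  $$ LANGUAGE plpythonu;

  -- e.g. SELECT tdist(t_stat, 2 * n - 2) against the query above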

-Kevin
