From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Peter Geoghegan <pg(at)heroku(dot)com>
Subject: Re: Abbreviated keys for Numeric
Date: 2015-02-23 16:56:08
Message-ID: 54EB5BA8.8050700@2ndquadrant.com
Lists: pgsql-hackers

Hi,

On 23.2.2015 11:59, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:
>
> Tomas> Interesting, but I think Gavin was asking about how much
> Tomas> variation was there for each tested case (e.g. query executed on
> Tomas> the same code / dataset). And in those cases the padding /
> Tomas> alignment won't change, and thus the effects you describe won't
> Tomas> influence the results, no?
>
> My point is exactly the fact that since the result is not affected,
> this variation between runs of the same code is of no real relevance
> to the question of whether a given change in performance can properly
> be attributed to a patch.
>
> Put it this way: suppose I have a test that when run repeatedly with no
> code changes takes 6.10s (s=0.025s), and I apply a patch that changes
> that to 6.26s (s=0.025s). Did the patch have an impact on performance?
>
> Now suppose that instead of applying the patch I insert random amounts
> of padding in an unused function and find that my same test now takes a
> mean of 6.20s (s=0.058s) when I take the best timing for each padding
> size and calculate stats across sizes. Now it looks obvious that the
> actual code of the patch probably wasn't responsible for any change...
>
> The numbers used here aren't theoretical; they are obtained by testing a
> single query - "select * from d_flt order by v offset 10000000" where
> d_flt contains 5 million float8 values - over 990 times with 33
> different random padding sizes (uniform in 0-32767). Here's a scatter
> plot, with 3 runs of each padding size so you can see the repeatability:
> http://tinyurl.com/op9qg8a
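(A minimal sketch, not from the thread, of the point Andrew is making: the
two hypothetical timing samples from his example - 6.10s vs 6.26s, both with
s=0.025s - look wildly "significant" under a naive Welch's t-test, even
though random padding alone can shift the mean by a comparable amount. The
sample counts and simulated data are assumptions for illustration only.)

```python
import math
import random

random.seed(42)

def welch_t(a, b):
    """Welch's t statistic for two independent samples."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / math.sqrt(va / len(a) + vb / len(b))

# Simulated timings matching the figures quoted above: means of 6.10s and
# 6.26s, both with s = 0.025s; 30 runs each is an assumption.
base    = [random.gauss(6.10, 0.025) for _ in range(30)]
patched = [random.gauss(6.26, 0.025) for _ in range(30)]

t = welch_t(patched, base)
# The t value is very large, so a naive test declares the patch "slower" -
# yet per Andrew's padding experiment, alignment noise alone produces shifts
# of similar magnitude, so the test answers the wrong question.
print(f"t = {t:.1f}")
```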

I think we're talking about slightly different things, then.

I believe Gavin was asking about variability across executions of a
particular build (i.e. with a fixed amount of padding), to decide whether
it even makes sense to compare results for different patches, or whether
the differences are just random noise.

Interpreting those differences - whether they are due to changes in the
algorithm or to padding somewhere else in the code - is of course
important too.

I believe the small regressions (1-10%) for small data sets might be
caused by this 'random padding' effect, because that's where the L1/L2
caches matter most. For large data sets the caches are probably not as
effective anyway, so the random padding makes no difference and the
speedup is just as good as for the other queries.
See for example this:

http://www.postgresql.org/message-id/54EB580C.2000904@2ndquadrant.com

But I'm speculating here ... time for a profiler, I guess.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
