Re: gaussian distribution pgbench

From: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench
Date: 2014-07-13 06:27:19
Message-ID: CADupcHX=WBDBEsqeQCWkGOLX=e=YM6JtL4M2cbW+aoYqh-dczQ@mail.gmail.com
Lists: pgsql-hackers

Hi,

2014-07-04 19:05 GMT+09:00 Andres Freund <andres(at)2ndquadrant(dot)com>:

> On 2014-07-04 11:59:23 +0200, Fabien COELHO wrote:
> >
> > >Yea. I certainly disagree with the patch in its current state because it
> > >copies the same 15 lines several times with a two word difference.
> > >Independent of whether we want those options, I don't think that's going
> > >to fly.
> >
> > I liked a simple static string for the different variants, which means
> > replication. Factorizing out the (large) common part will mean malloc &
> > sprintf. Well, why not.
>
> It sucks from a maintenance POV. And I don't see the overhead of malloc
> being relevant here...
>
> > >>OTOH, we've almost reached the consensus that supporting gaussian
> > >>and exponential options in \setrandom. So I think that you should
> > >>separate those two features into two patches, and we should apply
> > >>the \setrandom one first. Then we can discuss whether the other patch
> > >>should be applied or not.
> >
> > >Sounds like a good plan.
> >
> > Sigh. I'll do that as it seems to be a blocker...
>
I still agree with Fabien-san. I cannot understand why our logical proposal
isn't accepted...

> I think we also need documentation about the actual mathematical
> behaviour of the randomness generators.
> > + <para>
> > + With the gaussian option, the larger the <replaceable>threshold</>,
> > + the more frequently values close to the middle of the interval are drawn,
> > + and the less frequently values close to the <replaceable>min</> and
> > + <replaceable>max</> bounds.
> > + In other words, the larger the <replaceable>threshold</>,
> > + the narrower the access range around the middle.
> > + The smaller the threshold, the smoother the access pattern
> > + distribution. The minimum threshold is 2.0 for performance.
> > + </para>
>
> The only way to actually understand the distribution here is to create a
> table, insert random values, and then look at the result. That's not a
> good thing.
>
That's right. That is why we added command-line options that make the
parametrized distributions (Gaussian and exponential) easy to understand.
When you want to see how a threshold shapes the distribution, you can run
pgbench with the corresponding option, as follows:

[nttcom(at)localhost postgresql]$ contrib/pgbench/pgbench --exponential=10
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 10.00000
decile percents: 63.2% 23.3% 8.6% 3.1% 1.2% 0.4% 0.2% 0.1% 0.0% 0.0%
highest/lowest percent of the range: 9.5% 0.0%

[nttcom(at)localhost postgresql]$ contrib/pgbench/pgbench --exponential=5
starting vacuum...end.
transaction type: Exponential distribution TPC-B (sort of)
scaling factor: 1
exponential threshold: 5.00000
decile percents: 39.6% 24.0% 14.6% 8.8% 5.4% 3.3% 2.0% 1.2% 0.7% 0.4%
highest/lowest percent of the range: 4.9% 0.0%
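
The decile percents printed above can be cross-checked by hand. The following
is only a minimal sketch, not the patch's code: it assumes the exponential
option draws from a density proportional to exp(-threshold * x), truncated to
the unit-scaled range [0, 1). The names exponential_share and decile_percents
are hypothetical helpers introduced for this illustration.

```python
import math


def exponential_share(threshold, lo, hi):
    """Fraction of draws falling in [lo, hi) of the unit-scaled range.

    Assumption: density proportional to exp(-threshold * x) on [0, 1).
    """
    norm = 1.0 - math.exp(-threshold)  # total mass before normalization
    return (math.exp(-threshold * lo) - math.exp(-threshold * hi)) / norm


def decile_percents(threshold):
    """Percent of draws in each tenth of the range, first decile first."""
    return [100.0 * exponential_share(threshold, i / 10, (i + 1) / 10)
            for i in range(10)]


if __name__ == "__main__":
    for t in (10.0, 5.0):
        deciles = " ".join(f"{p:.1f}%" for p in decile_percents(t))
        first = 100.0 * exponential_share(t, 0.0, 0.01)   # first 1% of range
        last = 100.0 * exponential_share(t, 0.99, 1.0)    # last 1% of range
        print(f"threshold {t}: {deciles}")
        print(f"  first/last 1% of the range: {first:.1f}% {last:.1f}%")
```

Under that assumption, thresholds 10 and 5 give values that round to the same
decile percents as the two pgbench runs shown above.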

If you have a better method than ours, please share it with us.

> > The caveat that I have is that without these options there is:
> >
> > (1) no return about the actual distributions in the final summary, which
> > depend on the threshold value, and
> >
> > (2) no included means to test the feature, so the first patch is less
> > meaningful if the feature cannot be used simply and requires a custom script.
>
> I personally agree that we likely want that as an additional
> feature. Even if just because it makes the results easier to compare.
>
If we can have a constructive, logical discussion, I will agree with the
proposal to separate the patches.
However, I think the hacker most opposed to it decided based on his feelings...
Actually, he did not respond to our proposal for making the parametrized
distribution understandable...
So I also think this is a blocker. The command-line feature is needed as well.
Besides, is there another good method? Please share it with us.

Best regards,
--
Mitsumasa KONDO
