Re: gaussian distribution pgbench

From: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench
Date: 2014-03-15 08:50:43
Message-ID: CADupcHWUDkgKbMa1K=Z5kgVShH91ipHVJzx1+gypf09RxNzRbw@mail.gmail.com
Lists: pgsql-hackers

Hi

2014-03-15 15:53 GMT+09:00 Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>:

>
> Hello Heikki,
>
>
> A couple of comments:
>>
>> * There should be an explicit "\setrandom ... uniform" option too, even
>> though you get that implicitly if you don't specify the distribution
>>
>
> Indeed. I agree. I suggested it, but it got lost.

OK. If we follow SQL grammar conventions, you are right. I will add it.
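To make the point about the explicit option concrete, here is a minimal sketch of how the arguments of a \setrandom command could be parsed so that spelling out "uniform" gives exactly the same result as omitting the distribution. The command shape "\setrandom var min max [uniform | gaussian T | exponential T]", the function name parse_setrandom and its tuple result are hypothetical illustrations, not the patch's actual code. The 2.0 minimum threshold is the constraint Fabien mentions further down.

```python
def parse_setrandom(args):
    """Parse hypothetical '\\setrandom' arguments: var min max [distribution [threshold]].

    Returns (var, min, max, distribution, threshold); threshold is None for uniform.
    """
    if len(args) < 3:
        raise ValueError("expected at least: var min max")
    var, lo, hi = args[0], int(args[1]), int(args[2])
    if lo > hi:
        raise ValueError("min must not exceed max")
    rest = args[3:]
    # An explicit "uniform" is accepted and means the same as no distribution.
    if not rest or rest == ["uniform"]:
        return (var, lo, hi, "uniform", None)
    if len(rest) == 2 and rest[0] in ("gaussian", "exponential"):
        threshold = float(rest[1])  # parsed as a double, not an integer
        if threshold < 2.0:
            raise ValueError("threshold must be at least 2.0")
        return (var, lo, hi, rest[0], threshold)
    raise ValueError("unexpected arguments: " + " ".join(rest))
```

For example, parse_setrandom(["aid", "1", "100000", "uniform"]) and parse_setrandom(["aid", "1", "100000"]) return the same tuple, so the explicit keyword only documents the default.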

>> * What exactly does the "threshold" mean? The docs informally explain
>> that "the larger the threshold, the more frequent values close to the middle
>> of the interval are drawn", but that's pretty vague.
>>
>
> There are explanations and computations as comments in the code. If it is
> about the documentation, I'm not sure that a very precise mathematical
> definition will help a lot of people, and might rather hinder
> understanding, so the doc focuses on an intuitive explanation instead.

Yeah, I think we should explain only the information needed to use this
feature. If we add the mathematical theory to the docs, it will be too
difficult for users, and that effort would be wasted.

>> * Does min and max really make sense for gaussian and exponential
>> distributions? For gaussian, I would expect mean and standard deviation as
>> the parameters, not min/max/threshold.
>>
>
> Yes... and no:-) The aim is to draw an integer primary key from a table,
> so it must be in a specified range. This is approximated by drawing a
> double value with the expected distribution (gaussian or exponential) and
> projecting it carefully onto integers. If it is out of range, there is a loop
> and another value is drawn. The minimal threshold constraint (2.0) ensures
> that the probability of looping is low.

I think it is difficult to understand this from the text alone, so I have
made a figure to help explain it.
Please see the attached PDF.
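For readers without the PDF, the scheme Fabien describes (draw a double from the target distribution, redraw while it falls outside the allowed span, then project it onto the integer range) can be sketched as below. This is a minimal illustration under stated assumptions, not the patch's code: the threshold is assumed to be the truncation point of a standard normal (in standard deviations on each side of the middle) or of a unit-rate exponential, and all function names are hypothetical. rejection_probability gives the chance that one gaussian draw must be retried, which shows why a minimum threshold of 2.0 keeps looping rare.

```python
import math
import random

MIN_THRESHOLD = 2.0  # minimum stated in the thread; the rest is an assumed model


def _project(lo, hi, r):
    """Map r in [0, 1) onto lo..hi; min() guards float rounding that yields r == 1."""
    return min(hi, lo + int((hi - lo + 1) * r))


def gaussian_rand(rng, lo, hi, threshold):
    """Integer in [lo, hi], values near the middle drawn more often (sketch)."""
    if threshold < MIN_THRESHOLD:
        raise ValueError("threshold must be at least 2.0")
    while True:
        z = rng.gauss(0.0, 1.0)  # standard normal double
        if -threshold <= z < threshold:
            break  # inside the truncation window; otherwise loop and redraw
    return _project(lo, hi, (z + threshold) / (2.0 * threshold))


def exponential_rand(rng, lo, hi, threshold):
    """Integer in [lo, hi], values near lo drawn more often (sketch)."""
    if threshold < MIN_THRESHOLD:
        raise ValueError("threshold must be at least 2.0")
    while True:
        x = rng.expovariate(1.0)  # unit-rate exponential double
        if x < threshold:
            break  # otherwise loop and redraw
    return _project(lo, hi, x / threshold)


def rejection_probability(threshold):
    """Chance that one standard normal draw falls outside [-threshold, threshold]."""
    return math.erfc(threshold / math.sqrt(2.0))
```

Under this model, rejection_probability(2.0) is about 0.0455, so on average roughly 1.05 draws are needed per gaussian value, and larger thresholds make retries rarer still.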

>
>> * How about setting the variable as a float instead of integer? Would
>> seem more natural to me. At least as an option.
>>
>
> Which variable? The values set by setrandom are mostly used for primary
> keys. We really want integers in a range.

I think he meant the threshold parameter. The threshold is a very sensitive
parameter, so it needs to be a double. I think you will agree when you see
the attached picture.
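To see why an integer-only threshold would be too coarse, the share of draws landing in the central 10% of the interval can be computed directly. This sketch assumes, as a hypothetical model, that the gaussian threshold is the truncation point of a standard normal measured in standard deviations on each side of the interval's middle, mapped linearly onto [min, max]; central_share is an illustrative name, not part of pgbench.

```python
import math


def central_share(threshold, width=0.10):
    """Fraction of draws in the central `width` of [min, max] (hypothetical model).

    Assumes a standard normal truncated to [-threshold, threshold] and mapped
    linearly onto the interval, so the central band is |z| < width * threshold.
    """
    half = width * threshold
    inside = math.erf(half / math.sqrt(2.0))     # P(|z| < half)
    kept = math.erf(threshold / math.sqrt(2.0))  # P(|z| < threshold)
    return inside / kept


for t in (2.0, 2.5, 3.0, 4.0):
    print(f"threshold={t}: {central_share(t):.3f} of draws in the middle 10%")
```

Under this model the central share rises from roughly 0.17 at 2.0 to 0.20 at 2.5 and 0.24 at 3.0, so even half-steps change the access skew noticeably.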

regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

Attachment Content-Type Size
explain_gaussian_and_exponential_generating_algorithm2.pdf application/pdf 110.6 KB
