Re: gaussian distribution pgbench -- splits v4

From: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench -- splits v4
Date: 2014-08-01 07:26:53
Message-ID: alpine.DEB.2.10.1408010905040.9457@sto
Lists: pgsql-hackers


Hello,

>> Version one is "k' = 1 + (a * k + b) modulo n" with "a" coprime with
>> "n", "n" being the number of keys. This is nearly possible already,
>> except for the modulo operator, which is currently missing and which I'm
>> planning to submit for this very reason, though probably another time.
>
> That's pretty crude,

Yep. It is very simple, but it is much better than nothing, and for a
database test it may be "good enough".
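A minimal sketch of the "version one" mapping, written in Python rather than pgbench's expression syntax (which, as noted above, lacks a modulo operator). The helper name and the sample values a = 3, b = 7 are hypothetical. The mapping is a permutation of keys 1..n only when a and n are coprime, which the code checks:

```python
from math import gcd


def affine_permute(k: int, n: int, a: int, b: int) -> int:
    """Map key k in 1..n to 1 + (a*k + b) mod n (hypothetical helper).

    This is a permutation of 1..n only if gcd(a, n) == 1.
    """
    if not 1 <= k <= n:
        raise ValueError("k must be in 1..n")
    if gcd(a, n) != 1:
        raise ValueError("a must be coprime with n")
    return 1 + (a * k + b) % n


if __name__ == "__main__":
    n, a, b = 10, 3, 7
    print([affine_permute(k, n, a, b) for k in range(1, n + 1)])
    # prints [1, 4, 7, 10, 3, 6, 9, 2, 5, 8]
```

Every key appears exactly once in the output. It also shows why the scheme is crude: consecutive keys always land "a" positions apart (mod n), so the shuffle is very regular.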

> although I don't object to a modulo operator. It would be nice to be
> able to use a truly random permutation, which is not hard to generate
> but probably requires O(n) storage, likely a problem for large scale
> factors.

That is indeed the actual issue in my mind. I was thinking of permutations
defined by a formula, which are not so easy to find and may end up looking
like "(a*k+b)%n" anyway. I had the same issue when generating random data
for a schema (see http://www.coelho.net/datafiller.html).
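For comparison, a truly random permutation can be drawn with the Fisher-Yates shuffle, sketched below in Python (function names and the seed parameter are hypothetical, as is the idea of seeding so that every client builds the same table). It is uniform and simple, but the lookup table holds one entry per key: pgbench_accounts has 100,000 rows per scale unit, so at scale factor 1000 the table would need 100 million entries, which is the O(n) storage problem.

```python
import random


def random_permutation(n: int, seed: int) -> list[int]:
    """Return a uniformly random permutation of keys 1..n (Fisher-Yates).

    The returned list is the O(n) table a client would need to hold.
    """
    rng = random.Random(seed)  # assumption: a shared seed gives every client the same table
    perm = list(range(1, n + 1))
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive: 0 <= j <= i
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def permute(k: int, perm: list[int]) -> int:
    """Look up the permuted key for k in 1..n."""
    return perm[k - 1]
```

A closed-form mapping such as "(a*k+b)%n" needs only a and b in memory, which is why a formula is attractive even if the result is less random.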

> Maybe somebody who knows more math than I do (like you, probably!) can
> come up with something more clever.

I can certainly suggest other formulas, but that would not necessarily make
for beautiful code, and thus it would probably be rejected. I'll see.

An alternative to this whole process may be to hash/modulo a non-uniform
random value:

id = 1 + hash(some-random()) % n

But hashing changes the distribution, as it adds collisions, so I would have
to think about how to control the distribution in that case, and about
which hash function to use.
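A rough sketch of the hash/modulo alternative, again in Python. SHA-256 stands in for the hash function, which is left open here, and the helper names are hypothetical. Pushing a distribution over random values through the mapping shows the collision effect: the probability of every value that hashes to the same id is summed, and some ids receive none, so even a uniform input does not come out uniform.

```python
import hashlib
from collections import Counter


def hash64(x: int) -> int:
    """Stand-in hash: first 8 bytes of SHA-256 of the decimal text of x."""
    digest = hashlib.sha256(str(x).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def hashed_id(r: int, n: int) -> int:
    """id = 1 + hash(r) % n, with hash64 standing in for hash()."""
    return 1 + hash64(r) % n


def id_distribution(weights: dict[int, float], n: int) -> Counter:
    """Map a distribution over random values r to a distribution over ids.

    Values whose hashes collide modulo n have their weights added together.
    """
    out: Counter = Counter()
    for r, w in weights.items():
        out[hashed_id(r, n)] += w
    return out


if __name__ == "__main__":
    n = 1000
    uniform = {r: 1.0 / n for r in range(1, n + 1)}
    dist = id_distribution(uniform, n)
    # For a random-looking hash, typically only about 63% of the ids are
    # reached (1 - 1/e), and some ids collect two or more values' weight.
    print(len(dist), "of", n, "ids reached; max weight", max(dist.values()))
```

Controlling the final distribution would therefore require accounting for these collisions, not just choosing the shape of the input distribution.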

--
Fabien.
