Re: gaussian distribution pgbench -- splits v4

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>
Cc: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: gaussian distribution pgbench -- splits v4
Date: 2014-07-31 16:48:02
Message-ID: CA+TgmoZjsB-UuzWiaVNSsgwC0bsZnJ1vDfCkqn3vHOeuT45m0A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jul 31, 2014 at 10:01 AM, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr> wrote:
>> One of the concerns that I have about the proposal of simply slapping a
>> gaussian or exponential modifier onto \setrandom aid 1 :naccounts is that,
>> while it will allow you to make part of the relation hot and another part of
>> the relation cold, you really can't get any more fine-grained than that. If
>> you use exponential, all the hot accounts will be near the beginning of the
>> relation, and if you use gaussian, they'll all be in the middle.
>
> That is a very good remark. Although I thought of it, I do not have a very
> good solution yet:-)
>
> From a testing perspective, if we assume that keys have no semantics, a
> reasonable assumption is that the distribution of access for actual
> realistic workloads is probably exponential (of gaussian, anyway hardly
> uniform), but without direct correlation between key values.
>
> In order to simulate that, we would have to apply a fixed (pseudo-)random
> permutation to the exponential-drawn key values. This is a non trivial
> problem. The version zero of solving it is to do nothing... it is the
> current status;-) Version one is "k' = 1 + (a * k + b) modulo n" with "a"
> prime with respect to "n", "n" being the number of keys. This is nearly
> possible, but for the modulo operator which is currently missing, and that
> I'm planning to submit for this very reason, but probably another time.

That's pretty crude, although I don't object to a modulo operator. It
would be nice to be able to use a truly random permutation, which is
not hard to generate but probably requires O(n) storage, likely a
problem for large scale factors. Maybe somebody who knows more math
than I do (like you, probably!) can come up with something more
clever.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2014-07-31 16:49:18 Re: [COMMITTERS] pgsql: Move log_newpage and log_newpage_buffer to xlog.c.
Previous Message Heikki Linnakangas 2014-07-31 15:07:16 Re: [COMMITTERS] pgsql: Move log_newpage and log_newpage_buffer to xlog.c.