Re: random() generates collisions too early

From: Honza Horak <hhorak(at)redhat(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: random() generates collisions too early
Date: 2013-10-22 12:41:18
Message-ID: 5266726E.5030300@redhat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On 10/21/2013 05:14 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
>> On 18.10.2013 14:55, Honza Horak wrote:
>>> The results covered only 181383 distinct values, and 68 values
>>> repeated four or five times each. We should at least consider using a
>>> higher-entropy seed.
>
>> Interesting. PostgreSQL's random() function just calls the underlying
>> libc random() function. I assume you tested this on with Linux and glibc.
>
> I agree with the theory that this probably isn't the fault of the random()
> function as such, but with our code to reset the random seed when forking
> a postmaster child process. Note that the test case is only examining the
> first random value created in each process. So basically what this is
> measuring is the number of different seed values we use.
>
> regards, tom lane
>

After playing a bit more with random() value and the seed defined in
src/backend/postmaster/postmaster.c:4038 it seems really like the seed
doesn't have very good characteristic. The PID number used there ensures
the value to be different in child processes, but when only xor-ed with
usecs, it doesn't enlarge the total data domain of the seed, which stays
1M values. Then having in mind the birthday paradox and probably not
ideal PRNG, it doesn't seem unrealistic to get two same random numbers
in only hundreds/thousands of tries.

In order to enlarge possible data domain of the seed we can include sec
count as well and shift some xor-ed variables to use the whole range or
unsigned int. The attached patch uses a way that gave me much better
results (collision found approx. after a hundred thousands of values):

- srandom((unsigned int) (MyProcPid ^ usecs));
+ srandom((unsigned int) (MyProcPid ^ (usecs << 12) ^ secs));

I'm also thinking of a reproducer -- does the test-suite allow to
"reconnect" the client during a test?

Regards,
Honza

Attachment Content-Type Size
random-seed.patch text/x-patch 603 bytes

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Josh Kupershmidt 2013-10-23 01:37:13 Re: Re: [BUGS] BUG #7873: pg_restore --clean tries to drop tables that don't exist
Previous Message Honza Horak 2013-10-22 11:55:43 Re: random() generates collisions too early