Re: Obtaining random rows from a result set

From: Alban Hertroys <alban(at)magproductions(dot)nl>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Postgres General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Obtaining random rows from a result set
Date: 2007-09-01 15:44:27
Message-ID: 22987471-824E-4D9C-B494-CE7E3E3C3EB6@magproductions.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Sep 1, 2007, at 14:44, Martijn van Oosterhout wrote:

> On Sat, Sep 01, 2007 at 02:24:25PM +0200, Alban Hertroys wrote:
>> Oh, now I see... The first time guarantees that v has a value (as
>> random() < 1/1), and after that there is a decreasing chance that a
>> new row gets re-assigned to v. That means the last row has a chance
>> of 1/n, which would be it's normal chance if the distribution were
>> linear, but doesn't the first row have a chance of 1/(n!) to be
>> returned?
>
> No. Consider at the first row it has chance 1 of being selected. At
> the
> second row it has chance 1/2 of being *kept*. At the third row it has
> chance 2/3 of being kept. At row four it's 3/4. As you see, the
> numerators and denominators cancel, leaving 1/n at the end...

Ah, now I see where I went wrong. If the first row got through to any
next iteration, of course there's no chance anymore that it didn't.

> Neat huh?

Neat from an algorithmic point of view, yes. But it also means that
it's calculating random() for every record just like the rest of the
suggested solutions :(

I'm still convinced doing that isn't the right approach to the problem.

I think I'll do some experimenting with the set returning function on
Monday to see how that performs comparative to ordering by random().

The problem with the approaches that use pre-calculated random values
is I need my result to be truly random each time. If I'd return a
number of records starting at a record that has a certain random
value, I'd end up returning the records directly after that in the
same order every time because their order is pre-determined.

I realise the chances of that happening are slim provided enough
records to choose from, and chance dictates that it could happen
anyway, but you couldn't conscientiously sell that as an equal
chance... My boss thinks otherwise though, maybe I'll have to settle
for almost fair :P

--
Alban Hertroys
alban(at)magproductions(dot)nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
7500 AK Enschede

// Integrate Your World //

!DSPAM:737,46d983fa289902833059189!

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2007-09-01 17:09:32 Re: JOIN issues (Left vs Right for sorting), and "Nested Loop" problem
Previous Message Richard Broersma Jr 2007-09-01 15:26:18 Re: Export data to MS Excel