Re: ANALYZE sampling is too good

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-10 14:32:14
Message-ID: CAGTBQpbnhsc7h4fCHBG63kSYt3-DmyeTZ-QjGf30dN-xacrK4Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 10, 2013 at 11:02 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
>
> On 10 Dec 2013 08:28, "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:
>>
>>
>> Doesn't all that assume a normally distributed random variable?
>
> I don't think so, because of the law of large numbers. If you have a large
> population and sample it, the sample behaves like a normal distribution even
> if the distribution of the population isn't.

No, the law of large numbers (strictly speaking, the central limit
theorem) says that if you take the AVERAGE of many samples of a random
variable, that AVERAGE, which is itself a random variable, behaves like
a normal.

The variable itself doesn't.

And for n_distinct, you need to know the variable itself.
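
To illustrate the distinction, here is a minimal sketch (not anything from
ANALYZE itself). It assumes an exponential distribution as a stand-in for a
skewed column, and the names skewness and demo are made up for the example.
It compares the skewness of the raw values with the skewness of batch
averages; skewness is roughly 0 for a normal distribution.

```python
import random
import statistics


def skewness(xs):
    """Sample skewness: roughly 0 for a normal, about 2 for an exponential."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


def demo(batches=2000, batch_size=100, seed=0):
    """Return (skewness of raw values, skewness of per-batch averages)."""
    rng = random.Random(seed)
    # Raw draws from a skewed, clearly non-normal distribution (assumption).
    raw = [rng.expovariate(1.0) for _ in range(batches * batch_size)]
    # One average per batch of batch_size consecutive raw values.
    means = [
        statistics.fmean(raw[i:i + batch_size])
        for i in range(0, len(raw), batch_size)
    ]
    return skewness(raw), skewness(means)


if __name__ == "__main__":
    raw_skew, mean_skew = demo()
    print(f"skewness of raw values:     {raw_skew:.2f}")
    print(f"skewness of batch averages: {mean_skew:.2f}")
```

The averages come out close to symmetric while the raw values stay strongly
skewed. Averaging discards exactly the per-value information that an
n_distinct estimate depends on.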
