Re: On Distributions In 7.2.1

From: Mark kirkwood <markir(at)slingshot(dot)co(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: On Distributions In 7.2.1
Date: 2002-05-04 04:49:24
Message-ID: 1020487765.1259.23.camel@spikey.slithery.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

>On Fri, 2002-05-03 at 02:11, Tom Lane wrote:
> Mark kirkwood <markir(at)slingshot(dot)co(dot)nz> writes:
> > However Tom's observation is still valid (in spite of my math) - all the
> > frequencies are overestimated, rather than the expected "some bigger,
> > some smaller" sort of thing.
>
> No, that makes sense. The values that get into the most-common-values
> list are only going to be ones that are significantly more common (in
> the sample) than the estimated average frequency. So if the thing makes
> a good estimate of the average frequency, you'll only see upside
> outliers in the MCV list. The relevant logic is in analyze.c:
>
>(snippage)

oh I see... I am thinking that a larger sample may reduce the likelihood
of sample values *not* being included in my MCV list. A quick ALTER ....
SET STATISTICS 500 + ANALYZE results in frequencies of around the 0.0004
area for each MCV - closer to the actual of 0.00033.

So the frequency estimation algorithm is behaving well in this case, and
appears to be coverging to the correct results (or within a sensible
neighbourhood of them...). So for this type of data (uniformly
distributed keys) one can ANALYZE with confidence...

I notice that for a resonably large number of keys + big table
conbination using 100 quantiles gives much more accurate frequencies -
I wonder if its worth a mention in the docs to the effect :

"ANALYZE with the default (10 or so) is ok for most cases, but for big
tables consider using ALTER ... SET STATISTICS 100 for commonly JOINed
or WHEREed columns"

(Next ...the logarithmic guy...)

regards

Mark

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Jurgen Defurne 2002-05-04 05:39:44 Re: Foxpro
Previous Message Tom Lane 2002-05-04 00:34:27 Re: Subject: bool / vacuum full bug followup part 2