Re: [PERFORM] Bad n_distinct estimation; hacks suggested?


  • From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
  • To: josh(at)agliodbs(dot)com
  • Cc: Greg Stark <gsstark(at)mit(dot)edu>, Marko Ristola <marko(dot)ristola(at)kolumbus(dot)fi>, pgsql-perform <pgsql-performance(at)postgresql(dot)org>, pgsql-hackers(at)postgresql(dot)org
  • Subject: Re: [PERFORM] Bad n_distinct estimation; hacks suggested?
  • Date: Sun, 24 Apr 2005 00:48:59 -0400
  • Message-id: <25382(dot)1114318139(at)sss(dot)pgh(dot)pa(dot)us>

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Overall, our formula is inherently conservative of n_distinct.   That is, I 
> believe that it is actually computing the *smallest* number of distinct 
> values which would reasonably produce the given sample, rather than the 
> *median* one.  This is contrary to the notes in analyze.c, which seem to 
> think that we're *overestimating* n_distinct.  

Well, the notes are there because the early tests I ran on that formula
did show it overestimating n_distinct more often than not.  Greg is
correct that this is inherently a hard problem :-(

I have nothing against adopting a different formula, if you can find
something with a comparable amount of math behind it ... but I fear
it'd only shift the failure cases around.

			regards, tom lane
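
For readers without analyze.c at hand, here is a rough sketch of the kind of estimator under discussion. It assumes the formula in question is the Haas and Stokes style estimator n*d / (n - f1 + f1*n/N), where n is the sample size, N the total row count, d the number of distinct values seen in the sample, and f1 the number of values seen exactly once. The function name, the clamping, and the sample data are illustrative assumptions, not code from PostgreSQL, and the real code may handle special cases differently.

```python
from collections import Counter


def estimate_n_distinct(sample, total_rows):
    """Illustrative n_distinct estimate from a row sample (hypothetical helper).

    sample     -- list of sampled column values (n rows)
    total_rows -- estimated number of rows in the whole table (N)
    """
    n = len(sample)
    if n == 0:
        return 0.0

    counts = Counter(sample)
    d = len(counts)                                   # distinct values in sample
    f1 = sum(1 for c in counts.values() if c == 1)    # values seen exactly once

    # Haas-Stokes style estimate: n*d / (n - f1 + f1*n/N)
    denom = n - f1 + f1 * n / total_rows
    estimate = n * d / denom

    # Assumption: keep the result between what was observed and the table size.
    return min(max(estimate, float(d)), float(total_rows))


if __name__ == "__main__":
    # Six sampled rows out of a 600-row table; two values occur only once.
    print(estimate_n_distinct([1, 1, 2, 2, 3, 4], 600))
```

The estimate depends heavily on f1: a sample with few singletons pulls the result toward d, while a sample made mostly of singletons pushes it toward N. That sensitivity is one way to see why any single formula tends to move the failure cases around rather than remove them.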


