Re: default_statistics_target WAS: max_wal_senders must die

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: default_statistics_target WAS: max_wal_senders must die
Date: 2010-10-21 02:45:07
Message-ID: 10155.1287629107@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> A while back I did a fair bit of reading on ndistinct and large tables
> from the academic literature. The consensus of many papers was that it
> took a sample of at least 3% (or 5% for block-based) of the table in
> order to have 95% confidence in ndistinct of 3X. I can't imagine that
> MCV is easier than this.

You've got that entirely backwards. ndistinct is hard because there
could be a whole lot of values with very small occurrence counts,
and sampling is not going to help you distinguish between things
that occur but once and things that occur only two or three times.
But whether the table has a lot of the first or a lot of the second
means a difference of 2X or 3X on ndistinct.

MCVs, on the other hand, are MCVs precisely because they occur a lot,
and so they are highly likely to show up in a sample. You *can* get a
decent estimate of the first few MCVs from a sample, you just have to be
cautious about not believing something is an MCV if it's only a small
proportion of your sample.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Kupershmidt 2010-10-21 02:56:07 psql autocompletion for \z and \dg
Previous Message Joshua D. Drake 2010-10-21 02:32:21 Re: default_statistics_target WAS: max_wal_senders must die