From: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: estimating # of distinct values |
Date: | 2011-01-20 08:10:22 |
Message-ID: | 4D37EDEE.9070906@enterprisedb.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 20.01.2011 04:36, Robert Haas wrote:
> ... Even better, the
> code changes would be confined to ANALYZE rather than spread out all
> over the system, which has positive implications for robustness and
> likelihood of commit.
Keep in mind that the administrator can already override the ndistinct
estimate with ALTER TABLE. If he needs to manually run a special ANALYZE
command to make it scan the whole table, he might as well just use ALTER
TABLE to tell the system what the real (or good enough) value is. A DBA
should have a pretty good feeling of what the distribution of his data
is like.
And how good does the estimate need to be? For a single-column, it's
usually not that critical, because if the column has only a few distinct
values then we'll already estimate that pretty well, and OTOH if
ndistinct is large, it doesn't usually affect the plans much if it's 10%
of the number of rows or 90%.
It seems that the suggested multi-column selectivity estimator would be
more sensitive to ndistinct of the individual columns. Is that correct?
How is it biased? If we routinely under-estimate ndistinct of individual
columns, for example, does the bias accumulate or cancel itself in the
multi-column estimate?
I'd like to see some testing of the suggested selectivity estimator with
the ndistinct estimates we have. Who knows, maybe it works fine in practice.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
From | Date | Subject | |
---|---|---|---|
Next Message | Dimitri Fontaine | 2011-01-20 08:59:50 | Re: Extending opfamilies for GIN indexes |
Previous Message | Pavel Stehule | 2011-01-20 07:55:20 | Re: REVIEW: patch: remove redundant code from pl_exec.c |