Re: Estimating geometric distributions

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Estimating geometric distributions
Date: 2008-03-11 20:43:11
Message-ID: F0238EBA67824444BC1CB4700960CB4804DD9399@dmpeints002.isotach.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> > I have a field whose distribution of frequencies of values is
> > roughly geometric, rather than flat.

> My problem is frequent
> > over-estimation of rows when restricting by this field with
> > values not known at plan time.

> Is there any facility already in PostgreSQL to help me here?
>
> Hopefully an index type that I don't know about yet?
> (Geometric distributions are similar to those found in word
> count distributions).
>
> If not... is there any merit in this idea:
>
> During the analyze process, the geometric mean of sampled
> rows was calculated, and if determined to be significantly
> different from the arithmetic mean, stored in a new stats
> column. When estimating the number of rows that will be
> returned by queries of the form shown above, if there is a
> geometric mean stored, use it instead of the arithmetic mean.

I came up with another (much easier) means of adjusting the planners estimation of how many rows will be returned:

Increase the number of distinct values in the statistics.
For example:
update pg_statistic set stadistinct=2691 where starelid=29323 and staattnum=2;

I can then pick a number of distinct values such that the effective arithmetic mean is equal to what I calculated the geometric mean to be.

Stephen Denne.

Disclaimer:
At the Datamail Group we value team commitment, respect, achievement, customer focus, and courage. This email with any attachments is confidential and may be subject to legal privilege. If it is not intended for you please advise by reply immediately, destroy it and do not copy, disclose or use it in any way.

__________________________________________________________________
This email has been scanned by the DMZGlobal Business Quality
Electronic Messaging Suite.
Please see http://www.dmzglobal.com/services/bqem.htm for details.
__________________________________________________________________

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-03-11 20:44:27 Re: Autovacuum vs statement_timeout
Previous Message Bruce Momjian 2008-03-11 20:36:21 Re: Autovacuum vs statement_timeout