Re: Maximum statistics target

From: "Stephen Denne" <Stephen(dot)Denne(at)datamail(dot)co(dot)nz>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc: "Peter Eisentraut" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Maximum statistics target
Date: 2008-03-09 21:24:18
Message-ID: F0238EBA67824444BC1CB4700960CB4804D2F049@dmpeints002.isotach.com
Lists: pgsql-hackers

Tom Lane wrote:
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > On Fri, Mar 07, 2008 at 07:25:25PM +0100, Peter Eisentraut wrote:
> >> What's the problem with setting it to ten million if I have ten
> >> million values in the table and I am prepared to spend the resources
> >> to maintain those statistics?
>
> > That it'll probably take 10 million seconds to calculate the plans
> > using it? I think Tom pointed out there are a few places that are
> > O(n^2) in the number of entries...
>
> I'm not wedded to the number 1000 in particular --- obviously that's
> just a round number. But it would be good to see some performance
> tests with larger settings before deciding that we don't need a limit.

I recently encountered a situation where I would have liked to be able to try a larger limit (amongst other ideas for improving my situation):

I have a field whose distribution of value frequencies is roughly geometric rather than flat:
Total rows = 36 million
relpages = 504864
Distinct field values in use = 169
10 values account for 50% of the rows.
41 values account for 90% of the rows.
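
For reference, a distribution like this can be measured with a query along the following lines ("mytable" and "myfield" are placeholder names, not my real schema):

    -- Per-value frequencies, most common first (placeholder names):
    SELECT myfield, count(*) AS freq
    FROM mytable
    GROUP BY myfield
    ORDER BY freq DESC;

Summing the top rows of that output gives the 50%/90% coverage figures above.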

After setting the statistics target to 1000 for that field and analyzing the table, the statistics row for that field had 75 most-frequent values and a histogram with 76 entries, for an estimated 151 distinct values in total.
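
The steps were essentially these (placeholder names again), with pg_stats showing how many MCV and histogram entries ANALYZE actually kept:

    -- Raise the per-column statistics target and re-gather stats:
    ALTER TABLE mytable ALTER COLUMN myfield SET STATISTICS 1000;
    ANALYZE mytable;

    -- How much did ANALYZE store? array_upper() gives the entry counts.
    SELECT n_distinct,
           array_upper(most_common_vals, 1) AS mcv_entries,
           array_upper(histogram_bounds, 1) AS histogram_entries
    FROM pg_stats
    WHERE tablename = 'mytable'
      AND attname = 'myfield';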

In this situation, using a larger statistics target should result in more pages being read during ANALYZE and a more accurate set of statistics. It shouldn't result in significantly more work for the planner, since the number of stored MCV and histogram entries is bounded by the 169 distinct values actually present.

It wouldn't solve my problem, though, which is frequent over-estimation of row counts when restricting by this field with values not known at plan time.
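
For example, with a prepared statement the parameter value isn't known when the plan is built, so the planner can't consult the most-common-values list and falls back to an average-frequency estimate, roughly total rows divided by the estimated number of distinct values. With 36 million rows and ~151 estimated distinct values that's around 240,000 rows, far more than most of the rarer values actually match. A sketch, again with placeholder names (and assuming an integer field):

    -- Planned once, before $1 is known; EXPLAIN shows the generic
    -- row estimate rather than one based on an actual value:
    PREPARE q(integer) AS
        SELECT * FROM mytable WHERE myfield = $1;
    EXPLAIN EXECUTE q(42);
    DEALLOCATE q;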

Regards,
Stephen Denne.

