Re: Maximum statistics target

From: Decibel! <decibel(at)decibel(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Gregory Stark <stark(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Subject: Re: Maximum statistics target
Date: 2008-03-20 16:17:10
Message-ID: 588963E3-CCD4-4549-B46E-56F3365F014C@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mar 10, 2008, at 1:26 PM, Peter Eisentraut wrote:
> Am Montag, 10. März 2008 schrieb Gregory Stark:
>>> It's not possible to believe that you'd not notice O(N^2)
>>> behavior for N
>>> approaching 800000 ;-). Perhaps your join columns were unique
>>> keys, and
>>> thus didn't have any most-common-values?
>>
>> We could remove the hard limit on statistics target and impose the
>> limit
>> instead on the actual size of the arrays. Ie, allow people to specify
>> larger sample sizes and discard unreasonably large excess data
>> (possibly
>> warning them when that happens).
>
> I have run some more useful tests now with more distinct values.
> The planning
> times do increase, but this is not the primary worry. If you want
> to spend
> 20 seconds of planning to speed up your query by 40 seconds, this
> could
> surely be a win in some scenarios, and not a catastrophic loss if
> not. The
> practical problems lie with memory usage in ANALYZE, in two ways.
> First, at
> some point it will try to construct pg_statistic rows that don't
> fit into the
> 1GB limit, as mentioned upthread. You get a funny error message
> and it
> aborts. This is fixable with some cosmetics. Second, ANALYZE
> appears to
> temporarily leak memory (it probably doesn't bother to free things
> along the
> way, as most of the code does), and so some not so large statistics
> targets
> (say, 40000) can get your system swapping like crazy. A crafty
> user could
> probably kill the system that way, perhaps even with the restricted
> settings
> we have now. I haven't inspected the code in detail yet, but I
> imagine a few
> pfree() calls and/or a counter that checks the current memory usage
> against
> maintenance_work_mem could provide additional safety. If we could get
> ANALYZE under control, then I imagine this would provide a more
> natural upper
> bound for the statistics targets, and it would be controllable by the
> administrator.

At some point I think it makes a lot more sense to just have VACUUM
gather stats as it goes, rather than have ANALYZE generate a bunch of
random IO.

BTW, when it comes to the case of the OP, perhaps we can build enough
intelligence for the system to understand when the stats follow some
type of pattern (ie: a geometric distribution), and store the stats
differently.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2008-03-20 17:17:22 Sort Refinement
Previous Message Decibel! 2008-03-20 15:59:13 Re: [Fwd: Re: [PATCHES] 64-bit CommandIds]