From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Optimizer improvements: to do or not to do?
Date: 2006-09-13 18:44:15
Message-ID: 877j07txmo.fsf@enterprisedb.com
Lists: pgsql-hackers
Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
> It's common here for queries to vastly overestimate the
> number of pages that would need to be read because
> postgresql's guess at the correlation being practically 0
> despite the fact that the distinct values for any given
> column are closely packed on a few pages.
I think we need a serious statistics jock to pipe up with some standard
metrics that do what we need. Otherwise we'll never have a solid footing for
the predictions we make, and we'll never know how much we can trust them.
That said, I'm now going to do exactly what I just said we should stop doing
and brainstorm about an ad-hoc metric that might help:
I wonder if what we need is something like: sort the sampled values by value
and count up the average number of distinct blocks per value. That might let
us predict how many pages a fetch of a specific value would retrieve. Or
perhaps we need a second histogram where the quantities are of distinct pages
rather than total records.
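A rough sketch of that first metric, assuming (purely for illustration) that
the analyzer's sample is available as (value, block_number) pairs:

```python
from collections import defaultdict

def avg_distinct_blocks_per_value(sample):
    """sample: iterable of (value, block_number) pairs from a table sample.

    Returns the average number of distinct blocks a single value's rows
    occupy. A low average means the rows for any given value are packed
    onto few pages, so an index fetch touches few pages even when the
    classic block-order correlation statistic is near zero.
    """
    blocks_by_value = defaultdict(set)
    for value, block in sample:
        blocks_by_value[value].add(block)
    return sum(len(b) for b in blocks_by_value.values()) / len(blocks_by_value)

# "a" sits on blocks {1, 2}, "b" on blocks {2, 3}: 2 distinct blocks each.
sample = [("a", 1), ("a", 1), ("a", 2), ("b", 2), ("b", 3), ("b", 3)]
print(avg_distinct_blocks_per_value(sample))  # 2.0
```

The per-value distinct-block counts here are also exactly what a second
histogram over distinct pages (rather than total records) would bin.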
We might also need a separate "average number of n-block spans per value"
metric to predict how sequential the I/O will be, in addition to how many
pages will be fetched.
--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com