Re: benchmarking the query planner

From: "Greg Stark" <stark(at)enterprisedb(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: benchmarking the query planner
Date: 2008-12-12 19:08:38
Message-ID: 4136ffa0812121108i7b75490cq93f599adfd6f564a@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 12, 2008 at 6:31 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> Why not keep the random algorithm we have now, but scan the block into a
> separate hash table for ndistinct estimation. That way we keep the
> correct random rows for other purposes.
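
(As a concrete illustration of that proposal -- a minimal sketch in
Python rather than the backend's C, with made-up names and input shape:)

    # Sketch: feed every value on each sampled block into a hash table
    # (a Python set here) used only for ndistinct estimation; the random
    # row sample for the other statistics is kept separately, unchanged.
    def sample_ndistinct(sampled_blocks):
        seen = set()                  # the separate hash table
        for block in sampled_blocks:  # block = iterable of column values
            for value in block:
                seen.add(value)
        return len(seen)              # distinct values seen in the sample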

It seems to me that what you have to do is look at a set of blocks and
judge a) how many duplicates there are in the typical block and b) how
much overlap there is between blocks. Then extrapolate to other blocks
based on those two values.
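
(Sketched in Python, the two measurements might look like this; the
input shape -- one set of values per sampled block -- and the
pairwise |intersection| / |union| definition of overlap are my
assumptions, not anything from the paper:)

    from itertools import combinations

    # (a) distinct values in the typical sampled block, and
    # (b) average pairwise overlap between the sampled blocks.
    # Assumes at least two sampled blocks, each a non-empty set.
    def block_stats(blocks):
        avg_distinct = sum(len(b) for b in blocks) / len(blocks)
        pairs = list(combinations(blocks, 2))
        avg_overlap = (sum(len(a & b) / len(a | b) for a, b in pairs)
                       / len(pairs))
        return avg_distinct, avg_overlap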

So for example if you look at 1% of the blocks and find there are 27
distinct values on each of those blocks, then you extrapolate that the
table-wide count is somewhere between 27 distinct values total (if
those blocks intersect 100% -- ie, they all contain the same 27
distinct values) and 100*27 = 2700 distinct values (if those blocks
have no intersection at all).
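
(The same arithmetic in Python, with a naive linear interpolation
between the two extremes thrown in -- the interpolation is my
invention for illustration, not a proposed estimator:)

    sampled_fraction = 0.01             # looked at 1% of the blocks
    distinct_per_block = 27
    scale = 1 / sampled_fraction        # 100x as many blocks table-wide

    upper = distinct_per_block * scale  # 2700: blocks share nothing
    lower = distinct_per_block          # 27: blocks intersect 100%

    def interpolate(overlap):           # overlap in [0, 1] from the sample
        return lower + (1.0 - overlap) * (upper - lower)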

I haven't had a chance to read that paper; it looks extremely dense.
Is this the same idea?

--
greg
