Re: benchmarking the query planner

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: benchmarking the query planner
Date: 2008-12-12 14:54:02
Message-ID: 5616.1229093642@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

"Robert Haas" <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Dec 11, 2008 at 10:12 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Maybe so. If we stick to the other design (end both lists at a preset
>> frequency threshold) then the math clearly goes through the same as
>> before, just with num_mcvs that are determined differently. But can
>> we prove anything about the maximum error added from that?

> I don't think so, because in that design, it's entirely possible that
> you'll throw away the entire MCV list if all of the entries are below
> the threshold (as in the example we were just benchmarking, supposing
> a threshold of 0.001).

Right, but the question is how much that really hurts. It's not like
we are going to pick a completely clueless number for the ignored MCVs;
rather, we are going to assume that they have the same stats as the
remainder of the population. If the threshold frequency isn't very
large then the error involved should be bounded. As an example, in the
perfectly flat distribution set up by the speed tests we were just
doing, there actually wouldn't be any error at all (assuming we got
ndistinct right, which of course is a pretty big assumption). I haven't
consumed enough caffeine yet to try to do the math, but I think that if
you set the threshold as something a bit more than the assumed frequency
of a non-MCV value then it could work.

> An alternative is to pick a threshold T for the maximum number of
> equality probes that you're willing to suffer through.

I'd like to get there from the other direction, ie figure out what
T has to be to get known maximum error.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2008-12-12 14:56:56 Re: WIP: default values for function parameters
Previous Message David E. Wheeler 2008-12-12 14:51:37 Re: WIP: default values for function parameters