Re: benchmarking the query planner

From: "Robert Haas" <robertmhaas(at)gmail(dot)com>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Greg Stark" <stark(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>, "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: benchmarking the query planner
Date: 2008-12-12 11:44:21
Message-ID: 603c8f070812120344m67ef2c1fs41806cfb4ff9e396@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 12, 2008 at 4:04 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> The existing sampling mechanism is tied to solid statistics. It
>> provides the correct sample size to get a consistent confidence range
>> for range queries. This is the same mathematics which governs election
>> polling and other surveys. The sample size you need to get +/- 5% 19
>> times out of 20 increases as the population increases, but not by very
>> much.
>
> Sounds great, but it's not true. The sample size is not linked to data
> volume, so how can it possibly give a consistent confidence range?

I'm not 100% sure how relevant it is to this case, but I think what
Greg is referring to is:

http://en.wikipedia.org/wiki/Margin_of_error#Effect_of_population_size

It is a pretty well-known mathematical fact that for something like an
opinion poll, your margin of error depends almost entirely on the size
of your sample and hardly at all on the size of the population, as long
as the population is much larger than the sample.
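
To put rough numbers on that, here is a minimal sketch of the textbook
survey arithmetic, assuming simple random sampling, a worst-case
proportion of p = 0.5, and z = 1.96 for "19 times out of 20". The
function names are made up for illustration; this is not how ANALYZE
picks its sample, only the polling math the Wikipedia section describes.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Half-width of the confidence interval for a sampled proportion.

    n          -- sample size
    population -- population size, or None to treat it as infinite
    p          -- assumed true proportion (0.5 is the worst case)
    z          -- normal quantile (1.96 is roughly 95% confidence)
    """
    se = math.sqrt(p * (1.0 - p) / n)
    if population is not None:
        # Finite population correction: only matters when the sample
        # is a noticeable fraction of the population.
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

def required_sample_size(error, population=None, p=0.5, z=1.96):
    """Smallest sample size giving at most +/- `error` at the chosen z."""
    n0 = z * z * p * (1.0 - p) / (error * error)
    if population is not None:
        # Same correction, solved for n instead of for the error.
        n0 = n0 / (1.0 + (n0 - 1.0) / population)
    return math.ceil(n0)

if __name__ == "__main__":
    for population in (10_000, 1_000_000, 100_000_000):
        moe = margin_of_error(1000, population)
        size = required_sample_size(0.05, population)
        print(f"population={population:>11,}  "
              f"error at n=1000: {moe:.4f}  "
              f"n needed for +/-5%: {size}")
```

Under those assumptions the sample needed for +/- 5% goes from 370 rows
at a population of 10,000 to 385 for an effectively infinite one, which
is the "increases, but not by very much" behavior Greg described.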

...Robert
