Re: benchmarking the query planner

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Gregory Stark <stark(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "jd(at)commandprompt(dot)com" <jd(at)commandprompt(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: benchmarking the query planner
Date: 2008-12-12 18:18:49
Message-ID: 8020.1229105929@sss.pgh.pa.us
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> writes:
> As I said, we would only increase sample for ndistinct, not for others.

How will you do that? Keep in mind that one of the things we have to do
to compute ndistinct is to sort the sample. ISTM that the majority of
the cost of a larger sample is going to get expended anyway ---
certainly we could form the histogram using the more accurate data at
precisely zero extra cost, and I think we have also pretty much done all
the work for MCV collection by the time we finish counting the number of
distinct values.
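
As an illustration of the point above (a minimal sketch, not PostgreSQL's actual ANALYZE code), the following shows how a single sort of the sample can feed all three statistics. The function name `stats_from_sample` and its parameters are hypothetical. The scale-up formula is a Haas-Stokes style estimator, included here on the assumption that this is the kind of estimator under discussion. The histogram is simplified to plain equi-depth bounds over the whole sample.

```python
from itertools import groupby


def stats_from_sample(sample, total_rows, num_mcv=3, num_buckets=4):
    """Derive ndistinct, MCVs and histogram bounds from one sorted sample.

    Hypothetical helper for illustration only; names and defaults are
    assumptions, not PostgreSQL identifiers.
    """
    if not sample:
        raise ValueError("sample must not be empty")

    ordered = sorted(sample)  # the expensive step, done exactly once
    n = len(ordered)

    # One pass over the sorted data gives (value, run length) pairs.
    runs = [(value, sum(1 for _ in group)) for value, group in groupby(ordered)]
    d = len(runs)                                   # distinct values in sample
    f1 = sum(1 for _, count in runs if count == 1)  # values seen exactly once

    # Haas-Stokes style scale-up from sample to table (assumed estimator).
    ndistinct = n * d / (n - f1 + f1 * n / total_rows)

    # MCV candidates fall out of the same run lengths; this only reorders
    # the (much smaller) list of distinct values, not the sample itself.
    mcv = sorted(runs, key=lambda r: (-r[1], r[0]))[:num_mcv]

    # Equi-depth histogram bounds are read straight off the sorted sample.
    bounds = [ordered[(i * (n - 1)) // num_buckets]
              for i in range(num_buckets + 1)]

    return ndistinct, mcv, bounds
```

Once `ordered` exists, the MCV list and histogram bounds need no further pass over the table data, which is why a larger sample taken for ndistinct makes the other statistics more accurate at little additional cost.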

I seem to recall Greg suggesting that there were ways to estimate
ndistinct without sorting, but short of a fundamental algorithm change
there's not going to be a win here.

> Right now we may as well use a random number generator.

Could we skip the hyperbole please?

regards, tom lane
