Re: ANALYZE sampling is too good

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Jim Nasby <jim(at)nasby(dot)net>, Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-11 15:16:50
Message-ID: CA+U5nMLW0yZ3JyuLc5=gcBj4RtV-BdC4zewmwaPu-tFABXaBqA@mail.gmail.com
Lists: pgsql-hackers

On 11 December 2013 12:08, Greg Stark <stark(at)mit(dot)edu> wrote:

> So there is something clearly wonky in the histogram stats that's
> affected by the distribution of the sample.

...in the case where the avg width changes in a consistent manner
across the table.

Well spotted.

ISTM we can add a specific cross-check for bias of that nature in the
sample. We just calculate the avg width per block and then check for
correlation between the avg width and the block number. If we find bias,
we can calculate how many extra blocks to sample and from where.
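
A minimal sketch of that cross-check, written in Python for brevity rather than as ANALYZE code. It assumes the sample is available as (block number, row width) pairs. The function names and the choice of Pearson correlation are illustrative assumptions, not anything that exists in PostgreSQL, and the follow-up step of deciding how many extra blocks to sample is left out.

```python
from math import sqrt


def avg_width_per_block(sample):
    """Return {block_number: average row width} for the sampled rows.

    `sample` is an iterable of (block_number, row_width) pairs
    (a hypothetical input format chosen for this sketch).
    """
    totals = {}
    for block, width in sample:
        width_sum, count = totals.get(block, (0, 0))
        totals[block] = (width_sum + width, count + 1)
    return {b: s / n for b, (s, n) in totals.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(xs)
    if n < 2:
        return 0.0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)


def width_block_correlation(sample):
    """Correlation of per-block average width against block number."""
    averages = avg_width_per_block(sample)
    blocks = sorted(averages)
    return pearson(blocks, [averages[b] for b in blocks])
```

A coefficient near +1 or -1 would suggest the avg width drifts consistently across the table, which is the case described above. Where to put the cutoff for "biased" is a tuning question this sketch does not answer.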

There may be other biases also, so we can check for them and respond
accordingly.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
