Re: ANALYZE sampling is too good

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-10 21:45:29
Message-ID: 52A78B79.6060904@agliodbs.com

On 12/10/2013 01:33 PM, Mark Kirkwood wrote:
> Yeah - and we seem to be back to Josh's point about needing 'some math'
> to cope with the rows within a block not being a purely random selection.

Well, sometimes they are effectively random. But sometimes they are
not. The Chaudhuri et al. paper had a formula for estimating randomness
based on the grouping of rows in each block, assuming that the sampled
blocks were widely spaced (if they aren't, there's not much you can do).
This is where you get up to needing a 5% sample; you need to take
enough blocks that you're confident that the blocks you sampled are
representative of the population.
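
To make the "grouping of rows in each block" idea concrete, here is a
rough sketch. It is not the formula from the paper, just a simple
stand-in: it compares the variance of per-block means against what you
would expect if rows landed in blocks at random. The function names and
the choice of statistic are assumptions made for illustration only.

```python
import random
import statistics


def make_blocks(values, rows_per_block):
    """Split a list of column values into fixed-size 'blocks' (hypothetical helper)."""
    return [values[i:i + rows_per_block]
            for i in range(0, len(values), rows_per_block)]


def clustering_ratio(blocks):
    """Observed variance of block means divided by the variance expected
    under random row placement (overall variance / rows per block).

    A ratio near 1 suggests rows within a block are effectively random;
    a ratio near the block size suggests rows are heavily grouped.
    This is an illustrative statistic, not the estimator from the paper.
    """
    rows = [v for block in blocks for v in block]
    overall_var = statistics.pvariance(rows)
    if overall_var == 0:
        return 1.0  # constant column: block grouping cannot matter
    rows_per_block = len(rows) / len(blocks)
    block_means = [statistics.fmean(b) for b in blocks]
    observed = statistics.pvariance(block_means)
    expected = overall_var / rows_per_block
    return observed / expected


if __name__ == "__main__":
    values = list(range(1000))

    # Physically sorted column: each block holds 10 adjacent values.
    clustered = make_blocks(values, 10)
    print(clustering_ratio(clustered))  # close to the block size (10)

    # Same values placed in blocks at random.
    shuffled = values[:]
    random.Random(0).shuffle(shuffled)
    print(clustering_ratio(make_blocks(shuffled, 10)))  # roughly 1
```

The higher the ratio, the less new information each extra row from an
already-sampled block gives you, which is why a clustered table pushes
you toward sampling more blocks.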

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
