Re: ANALYZE sampling is too good

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ANALYZE sampling is too good
Date: 2013-12-10 21:33:29
Message-ID: 52A788A9.4010609@catalyst.net.nz
Lists: pgsql-hackers

On 11/12/13 09:19, Heikki Linnakangas wrote:
> On 12/10/2013 10:00 PM, Simon Riggs wrote:
>> On 10 December 2013 19:54, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> On 12/10/2013 11:49 AM, Peter Geoghegan wrote:
>>>> On Tue, Dec 10, 2013 at 11:23 AM, Simon Riggs
>>>> <simon(at)2ndquadrant(dot)com> wrote:
>>>> I don't think that anyone believes that not doing block sampling is
>>>> tenable, fwiw. Clearly some type of block sampling would be preferable
>>>> for most or all purposes.
>>>
>>> As discussed, we need math though. Does anyone have an ACM subscription
>>> and time to do a search? Someone must. We can buy one with community
>>> funds, but no reason to do so if we don't have to.
>>
>> We already have that, just use Vitter's algorithm at the block level
>> rather than the row level.
>
> And what do you do with the blocks? How many blocks do you choose?
> Details, please.
>
>

Yeah - and we seem to be back to Josh's point about needing 'some math'
to cope with the rows within a block not being a purely random selection.
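
A minimal sketch of why that matters, under loudly stated assumptions: the
table layout, the helper names, and the sample sizes below are all
hypothetical, and the reservoir shown is the simple Algorithm R rather than
Vitter's optimized variant or anything ANALYZE actually does. It takes the
same number of rows two ways, once as whole blocks and once as individual
rows, from a perfectly clustered table and compares how many distinct values
each sample sees.

```python
import random

def reservoir_sample(items, k, rng):
    """Simple reservoir sampling (Algorithm R): uniform k-subset in one pass."""
    reservoir = []
    for i, item in enumerate(items):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir

def make_table(n_blocks, rows_per_block):
    """Hypothetical worst case: every row in block b holds the value b."""
    return [[b] * rows_per_block for b in range(n_blocks)]

def block_sample_rows(table, n_blocks_wanted, rng):
    """Choose whole blocks with the reservoir, then keep every row in them."""
    chosen = reservoir_sample(range(len(table)), n_blocks_wanted, rng)
    return [row for b in chosen for row in table[b]]

def row_sample_rows(table, n_rows_wanted, rng):
    """Choose individual rows with the reservoir, ignoring block boundaries."""
    all_rows = (row for block in table for row in block)
    return reservoir_sample(all_rows, n_rows_wanted, rng)

rng = random.Random(0)
table = make_table(n_blocks=1000, rows_per_block=100)

by_block = block_sample_rows(table, 10, rng)  # 10 blocks, 1000 rows
by_row = row_sample_rows(table, 1000, rng)    # 1000 individual rows

# Same sample size, very different view of the column's distinct values.
print(len(set(by_block)), len(set(by_row)))
```

In this contrived layout the block sample can only ever see as many distinct
values as blocks it read, however many rows it returns. Real tables are
rarely this extreme, which is exactly why an estimator that corrects for
within-block correlation is needed.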

Regards

Mark
