Re: a fast bloat measurement tool (was Re: Measuring relation free space)

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: a fast bloat measurement tool (was Re: Measuring relation free space)
Date: 2015-02-23 02:20:17
Message-ID: 54EA8E61.9080306@BlueTreble.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/22/15 5:41 PM, Tomas Vondra wrote:
> Otherwise, the code looks OK to me. Now, there are a few features I'd
> like to have for production use (to minimize the impact):
>
> 1) no index support:-(
>
> I'd like to see support for more relation types (at least btree
> indexes). Are there any plans for that? Do we have an idea on how to
> compute that?

It'd be cleaner if had actual an actual am function for this, but see below.

> 2) sampling just a portion of the table
>
> For example, being able to sample just 5% of blocks, making it less
> obtrusive, especially on huge tables. Interestingly, there's a
> TABLESAMPLE patch in this CF, so maybe it's possible to reuse some
> of the methods (e.g. functions behind SYSTEM sampling)?
>
> 3) throttling
>
> Another feature minimizing impact of running this on production might
> be some sort of throttling, e.g. saying 'limit the scan to 4 MB/s'
> or something along those lines.
>
> 4) prefetch
>
> fbstat_heap is using visibility map to skip fully-visible pages,
> which is nice, but if we skip too many pages it breaks readahead
> similarly to bitmap heap scan. I believe this is another place where
> effective_io_concurrency (i.e. prefetch) would be appropriate.

All of those wishes are solved in one way or another by vacuum and/or
analyze. If we had a hook in the tuple scanning loop and at the end of
vacuum you could just piggy-back on it. But really all we'd need for
vacuum to be able to report this info is one more field in LVRelStats, a
call to GetRecordedFreeSpace for all-visible pages, and some logic to
deal with pages skipped because we couldn't get the vacuum lock.

Should we just add this to vacuum instead?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-02-23 02:24:56 Re: Redesigning checkpoint_segments
Previous Message Tomas Vondra 2015-02-23 01:41:03 Re: a fast bloat measurement tool (was Re: Measuring relation free space)