Re: Measuring relation free space

From: Noah Misch <noah(at)leadboat(dot)com>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: Bernd Helmle <mailings(at)oopsware(dot)de>, jaime(at)2ndquadrant(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Measuring relation free space
Date: 2011-12-18 16:56:25
Message-ID: 20111218165625.GB6393@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Dec 16, 2011 at 02:02:03AM -0500, Greg Smith wrote:
> On 12/15/2011 04:11 PM, Noah Misch wrote:
>> Is one of those index figures simply wrong, or do they measure two senses of
>> free space, both of which are interesting to DBAs?
>
> I think the bigger one--the one I was aiming to measure--also includes
> fill-factor space. It should be possible to isolate whether that's true
> by running the function against a fresh index, or by trying tests with a
> table where there's no useful fill. I need to add some of those to the
> test example suite.

No, both measures include fillfactor space. From a brief look at the code, the
proposed function counts space in non-leaf pages, while pgstattuple does not.
Also, the proposed function counts half-dead pages like live pages, while
pgstattuple counts them like dead pages.

One could perhaps justify those choices either way, but they seem too esoteric
for DBA exposure. I recommend choosing a policy on each and making both
pgstattuple() and any new code respect that policy.

> Shaking out the alternate implementation ideas was really my goal for
> this CF here. The major goal of the next revision is to present the
> options with a measure of their respective accuracy and runtime. If I
> have to give up just a of bit of accuracy and make it much faster,
> that's probably what most people want as an option. When Jaime and I
> come back with an update, it really needs to have benchmarks and
> accuracy numbers for each option. That may be complicated a bit
> depending on how much of the table or index is cached, so isolating that
> out will be a pain.

The previous submission seemed to boil down to a speedier version of "SELECT
free_percent FROM pgstattuple('foo')". (Some of the other statistics aren't
cheap.) Considering that, the code does belong in the pgstattuple module.

The sampling approach you have mentioned sounds promising, especially for
indexes. For heap bloat, it may be hard to improve on pg_freespacemap-based and
check_postgres-style estimates with anything less than a full heap scan.

Thanks,
nm

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2011-12-18 16:56:33 Re: unite recovery.conf and postgresql.conf
Previous Message Jim Nasby 2011-12-18 16:22:11 Re: Autonomous subtransactions