Could ANALYZE estimate bloat?

Lists: pgsql-hackers
From: Josh Berkus <josh(at)agliodbs(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Could ANALYZE estimate bloat?
Date: 2013-09-20 18:59:00
Message-ID: 523C9AF4.8090402@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hackers,

I've been tinkering with a number of table bloat checks, and it occurred
to me that the problm is that these are all approximations based on
overall gross statistics, and as such highly inaccurate.

It seems like would could have ANALYZE, while sampling from the table,
also check the amount of dead space per page and use that as an estimate
of the % of dead space overall. We'd still need something else for
indexes, but this seems like it would be a good start.

No?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Could ANALYZE estimate bloat?
Date: 2013-09-20 19:21:33
Message-ID: 523CA03D.9000200@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 09/20/2013 11:59 AM, Josh Berkus wrote:
>
> Hackers,
>
> I've been tinkering with a number of table bloat checks, and it occurred
> to me that the problm is that these are all approximations based on
> overall gross statistics, and as such highly inaccurate.
>
> It seems like would could have ANALYZE, while sampling from the table,
> also check the amount of dead space per page and use that as an estimate
> of the % of dead space overall. We'd still need something else for
> indexes, but this seems like it would be a good start.
>
> No?

I think this is a great idea.

>

--
Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms
a rose in the deeps of my heart. - W.B. Yeats


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Could ANALYZE estimate bloat?
Date: 2013-09-20 21:17:03
Message-ID: 20130920211703.GK2706@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh,

* Josh Berkus (josh(at)agliodbs(dot)com) wrote:
> I've been tinkering with a number of table bloat checks, and it occurred
> to me that the problm is that these are all approximations based on
> overall gross statistics, and as such highly inaccurate.

Greg Smith and I discussed some improvements around this area @Open.
I'd suggest you talk with him about the ideas that he has.

> It seems like would could have ANALYZE, while sampling from the table,
> also check the amount of dead space per page and use that as an estimate
> of the % of dead space overall. We'd still need something else for
> indexes, but this seems like it would be a good start.
>
> No?

Err, I thought that was one of the things like ANALYZE *did* collect
through the 'live tuples' number?

Thanks,

Stephen


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Could ANALYZE estimate bloat?
Date: 2013-09-20 22:56:24
Message-ID: 523CD298.5060800@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 09/20/2013 02:17 PM, Stephen Frost wrote:
>> No?
>
> Err, I thought that was one of the things like ANALYZE *did* collect
> through the 'live tuples' number?

Nope. "live tuples" is updated by the stats collector, NOT by analyze.
Also, live vs. dead tuples doesn't tell me how much free *space* is
available on pages.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Could ANALYZE estimate bloat?
Date: 2013-09-21 01:29:21
Message-ID: 20130921012921.GM2706@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

* Josh Berkus (josh(at)agliodbs(dot)com) wrote:
> Also, live vs. dead tuples doesn't tell me how much free *space* is
> available on pages.

I'm not really sure that you'd get much better from ANALYZE than you get
from tracking the inserted/updated/deleted tuples. Collecting that
information when VACUUM'ing the table would certainly provide much more
accurate results, which could possibly be stored in a page-level bitmap
of "completely empty pages" at the beginning of each 1G segment.
Alternatively, the bitmap could be updated during processing instead of
waiting for a VACUUM.

Greg and I hypothesized that such a bitmap might be used to truncate
individual 1G segments in the middle of a relation rather than only at
the end, perhaps all the way down to a point where only a header plus
the page-level bitmap in the 1G segment are left. This was discussed in
context of a VACUUM which used the try-to-elevate-the-lock approach
already used to truncate the last 1G segment of the relations, though
I've also wondered if it could take page-level locks starting at the end
of the 1G segment and walking backwards until it's unable to acquire a
lock or a non-empty page is found.

Of course, we're aware of the issues around the storage management
system and interfaces which might make this entirely unrealistic but
it's getting to a point where, I think (not sure about Greg), we need to
deal with that in some way to improve on issues like this.

Thanks,

Stephen