Re: Measuring relation free space

From: Noah Misch <noah(at)leadboat(dot)com>
To: Jaime Casanova <jaime(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Bernd Helmle <mailings(at)oopsware(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Measuring relation free space
Date: 2012-01-26 02:47:29
Message-ID: 20120126024729.GA15670@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 24, 2012 at 11:24:08AM -0500, Jaime Casanova wrote:
> On Mon, Jan 23, 2012 at 7:18 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
> > If someone feels like
> > doing it, +1 for making pgstattuple() count non-leaf free space.
>
> actually i agreed that non-leaf pages are irrelevant... i just
> confirmed that in a production system with 300GB none of the indexes
> in an 84M rows table nor in a heavily updated one has more than 1 root
> page, all the rest are deleted, half_dead or leaf. so the posibility
> of bloat coming from non-leaf pages seems very odd

FWIW, the number to look at is internal_pages from pgstatindex():

[local] test=# create table t4 (c) as select * from generate_series(1,1000000);
SELECT 1000000
[local] test=# alter table t4 add primary key(c);
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "t4_pkey" for table "t4"
ALTER TABLE
[local] test=# select * from pgstatindex('t4_pkey');
-[ RECORD 1 ]------+---------
version | 2
tree_level | 2
index_size | 22478848
root_block_no | 290
internal_pages | 10
leaf_pages | 2733
empty_pages | 0
deleted_pages | 0
avg_leaf_density | 90.06
leaf_fragmentation | 0

So, 0.4% of this index. They appear in proportion to the logarithm of the
total index size. I agree that bloat centered on them is unlikely. Counting
them would be justified, but that is a question of formal accuracy rather than
practical importance.

> but the possibility of bloat coming from the meta page doesn't exist,
> AFAIUI at least
>
> we need the most accurate value about usable free space, because the
> idea is to add a sampler mode to the function so we don't scan the
> whole relation. that's why we still need the function.

I doubt we'd add this function solely on the basis that a future improvement
will make it useful. For the patch to go in now, it needs to be useful now.
(This is not a universal principle, but it mostly holds for low-complexity
patches like this one.)

All my comments below would also apply to such a broader patch.

> btw... pgstattuple also has the problem that it's not using a ring buffer
>
>
> attached are two patches:
> - v5: is the same original patch but only track space in leaf, deleted
> and half_dead pages
> - v5.1: adds the same for all kind of indexes (problem is that this is
> inconsistent with the fact that pageinspect only manages btree indexes
> for everything else)

Let's take a step back. Again, what you're proposing is essentially a faster
implementation of "SELECT free_percent FROM pgstattuple(rel)". If this code
belongs in core at all, it belongs in the pgstattuple module. Share as much
infrastructure as is reasonable between the user-visible functions of that
module. For example, I'm suspecting that the pgstat_index() call tree should
be shared, with pgstat_index_page() checking a flag to decide whether to
gather per-tuple stats.

Next, compare the bits of code that differ between pgstattuple() and
relation_free_space(), convincing yourself that the differences are justified.
Each difference will yield one of the following conclusions:

1. Your code contains an innovation that would apply to both functions. Where
not too difficult, merge these improvements into pgstattuple(). In order for
a demonstration of your new code's better performance to be interesting, we
must fix the same low-hanging fruit in its competitor. One example is the use
of the bulk read strategy. Another is the support for SP-GiST.

2. Your code is missing an essential behavior of pgstattuple(). Add it to
your code. One example is the presence of CHECK_FOR_INTERRUPTS() calls.

3. Your code behaves differently from pgstattuple() due to a fundamental
difference in their tasks. These are the only functional differences that
ought to remain in your finished patch; please point them out in comments.
For example, pgstat_heap() visits every tuple in the heap. You'll have no
reason to do that; pgstattuple() only needs it to calculate statistics other
than free_percent.

In particular, I call your attention to the fact that pgstattuple() takes
shared buffer content locks before examining pages. Your proposed patch does
not do so. I do not know with certainty whether that falls under #1 or #2.
The broad convention is to take such locks, because we elsewhere want an exact
answer. These functions are already inexact; they make no effort to observe a
consistent snapshot of the table. If you convince yourself that the error
arising from not locking buffers is reasonably bounded, we can lose the locks
(in both functions -- conclusion #1). Comments would then be in order.

With all that done, run some quick benchmarks: see how "SELECT free_percent
FROM pgstattuple(rel)" fares compared to "SELECT relation_free_space(rel)" for
a large heap and for a large B-tree index. If the timing difference is too
small to be interesting to you, remove relation_free_space() and submit your
pgstattuple() improvements alone. Otherwise, submit as written.

I suppose I didn't make all of this clear enough earlier; sorry for that.

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2012-01-26 02:51:13 Re: Second thoughts on CheckIndexCompatible() vs. operator families
Previous Message Robert Haas 2012-01-26 02:25:33 Re: Should I implement DROP INDEX CONCURRENTLY?