PROPOSAL: tracking aggregated numbers from pg_stat_database

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: PROPOSAL: tracking aggregated numbers from pg_stat_database
Date: 2013-04-06 19:51:30
Message-ID: 51607CC2.9080702@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I'm regularly using pg_stat_database view to analyze various aspects of
behavior of the cluster. The #1 issue I'm constantly running into is
that to get cluster-level view (across all the databases), the table
needs to be aggregated like this:

SELECT
SUM(blks_hit) blks_hit,
SUM(blks_read) blks_read
FROM pg_stat_database

This more or less works in stable environments, but once you start
dropping databases (think of hosting with shared DB server) it gets
unusable because after DROP DATABASE the database suddenly disappears
from the sum.

Therefore I do propose tracking the aggregated stats, similar to the
pg_stat_bgwriter view. This does not require new messages (thanks to
reuse of the existing messages), and I expect the overhead to be
negligible (a few bytes of storage, minimal CPU).

I think it does not make sense to merge this into pg_stat_bgwriter,
creating a new view (can't think of a good name though), seems like a
much better choice to me.

And now a bit more detailed explanation of the issues ...

Analysis is usually based on comparing two snapshots (say a few minutes
apart), and this makes is rather much more difficult because the dropped
databases suddenly disappear from the second snapshot.

Say for example there are two databases, A and B, with stats snapshotted
at T1 and T2. The database B is dropped sometimes between the snapshots.

So the snaphots look like this:

time | db | blks_read | bkls_hit
-----------------------------------
T1 | A | 1000000 | 500000
T1 | B | 1000000 | 500000
T2 | A | 1500000 | 750000

Now, the aggregated data look like this:

time | blks_read | bkls_hit
------------------------------
T1 | 2000000 | 1000000
T2 | 1500000 | 750000

So the difference (T2-T1) is

blks_read | bkls_hit
----------------------
-500000 | -250000

Yes, negative values do not make much sense. It's very difficult to
detect such behavior and account for that.

It might be possible to solve (some of) the issues with elaborate
snapshotting system, but it's awkward / difficult to use. Adding a new
system view works much nicer.

regards
Tomas

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2013-04-06 20:20:51 Process title for autovac
Previous Message Jeff Janes 2013-04-06 19:24:25 Re: [BUGS] BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog