PROPOSAL: tracking aggregated numbers from pg_stat_database

Lists: pgsql-hackers
From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: PROPOSAL: tracking aggregated numbers from pg_stat_database
Date: 2013-04-06 19:51:30
Message-ID: 51607CC2.9080702@fuzzy.cz

Hi,

I'm regularly using the pg_stat_database view to analyze various aspects
of the cluster's behavior. The #1 issue I'm constantly running into is
that to get a cluster-level view (across all the databases), the table
needs to be aggregated like this:

SELECT
    SUM(blks_hit)  AS blks_hit,
    SUM(blks_read) AS blks_read
FROM pg_stat_database;

This more or less works in stable environments, but once you start
dropping databases (think of hosting with shared DB server) it gets
unusable because after DROP DATABASE the database suddenly disappears
from the sum.
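For illustration, the kind of snapshot-based analysis I mean looks
roughly like this (the stats_snapshot table and its name are just an
example for this email, not part of any proposal):

```sql
-- First snapshot: create the table from the current counters.
CREATE TABLE stats_snapshot AS
    SELECT now() AS ts, datname, blks_read, blks_hit
      FROM pg_stat_database;

-- Every few minutes, append another snapshot.
INSERT INTO stats_snapshot
    SELECT now(), datname, blks_read, blks_hit
      FROM pg_stat_database;

-- Cluster-level diff between the oldest and newest snapshots. If a
-- database was dropped in between, its counters are missing from the
-- newer SUM, so the delta can come out negative.
SELECT s2.blks_read - s1.blks_read AS blks_read_delta
  FROM (SELECT SUM(blks_read) AS blks_read FROM stats_snapshot
         WHERE ts = (SELECT min(ts) FROM stats_snapshot)) s1,
       (SELECT SUM(blks_read) AS blks_read FROM stats_snapshot
         WHERE ts = (SELECT max(ts) FROM stats_snapshot)) s2;
```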

Therefore I do propose tracking the aggregated stats, similar to the
pg_stat_bgwriter view. This does not require new messages (thanks to
reuse of the existing messages), and I expect the overhead to be
negligible (a few bytes of storage, minimal CPU).

I don't think it makes sense to merge this into pg_stat_bgwriter;
creating a new view (I can't think of a good name, though) seems like a
much better choice to me.

And now a bit more detailed explanation of the issues ...

Analysis is usually based on comparing two snapshots (say a few minutes
apart), and this becomes much more difficult because the dropped
databases suddenly disappear from the second snapshot.

Say, for example, there are two databases, A and B, with stats
snapshotted at T1 and T2. The database B is dropped sometime between the
snapshots.

So the snapshots look like this:

 time | db | blks_read | blks_hit
 ---------------------------------
  T1  | A  |   1000000 |   500000
  T1  | B  |   1000000 |   500000
  T2  | A  |   1500000 |   750000

Now, the aggregated data looks like this:

 time | blks_read | blks_hit
 ----------------------------
  T1  |   2000000 |  1000000
  T2  |   1500000 |   750000

So the difference (T2 - T1) is:

 blks_read | blks_hit
 ---------------------
   -500000 |  -250000

Obviously, negative values do not make much sense, and it's very
difficult to detect such behavior and account for it.

It might be possible to solve (some of) the issues with an elaborate
snapshotting system, but that's awkward and difficult to use. Adding a
new system view works much better.

regards
Tomas


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PROPOSAL: tracking aggregated numbers from pg_stat_database
Date: 2013-04-07 18:55:02
Message-ID: 5161C106.4060304@fuzzy.cz
Lists: pgsql-hackers

On 6.4.2013 21:51, Tomas Vondra wrote:
> Hi,
>
> I'm regularly using the pg_stat_database view to analyze various aspects
> of the cluster's behavior. The #1 issue I'm constantly running into is
> that to get a cluster-level view (across all the databases), the table
> needs to be aggregated like this:
>
> SELECT
>     SUM(blks_hit)  AS blks_hit,
>     SUM(blks_read) AS blks_read
> FROM pg_stat_database;
>
> This more or less works in stable environments, but once you start
> dropping databases (think of hosting with shared DB server) it gets
> unusable because after DROP DATABASE the database suddenly disappears
> from the sum.
>
> Therefore I do propose tracking the aggregated stats, similar to the
> pg_stat_bgwriter view. This does not require new messages (thanks to
> reuse of the existing messages), and I expect the overhead to be
> negligible (a few bytes of storage, minimal CPU).
>
> I don't think it makes sense to merge this into pg_stat_bgwriter;
> creating a new view (I can't think of a good name, though) seems like a
> much better choice to me.

Attached is a first version of the patch, just to illustrate the idea.
It creates a new system view pg_stat_agg_database with aggregated data.
There are no docs, no regression tests etc.

Now, I'm wondering whether we should do something similar with database
object (table/index) stats, i.e. maintain aggregated data for those too.
This might seem silly at first, but consider for example a partitioned
table. It's common to have db-level metrics on idx/seq_scans, but when
you drop a partition (e.g. the oldest one), you may run into the same
trouble as with database stats (see my previous post).
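To make the partition case concrete, here's a sketch (the "events" table
and its partition names are made up):

```sql
-- Sum the scan counters across all partitions of a hypothetical
-- partitioned table "events":
SELECT SUM(seq_scan) AS seq_scan, SUM(idx_scan) AS idx_scan
  FROM pg_stat_user_tables
 WHERE relname LIKE 'events\_%';

-- After, say, DROP TABLE events_2013_01, that partition's counters
-- vanish from the sum, so a later snapshot can show smaller totals
-- than an earlier one, just like the database-level case.
```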

So I think it would be handy to define table/index stats aggregated at
the db-level, i.e. there would be one row for each database. I don't
think it makes much sense to aggregate the vacuum/analyze info (counts
and timestamps), which leaves 18 BIGINT counters from
pg_stat[io]_tables and 10 BIGINT counters from pg_stat[io]_indexes.
That's 224 B for each database, which I believe is negligible.

OTOH it would be very handy to have this info aggregated per-schema and
per-tablespace, but I'd say we do that later, in a separate patch.

Opinions?

Tomas

Attachment Content-Type Size
agg-db-stats-v0.patch text/x-diff 16.2 KB

From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PROPOSAL: tracking aggregated numbers from pg_stat_database
Date: 2013-04-13 13:01:43
Message-ID: 1365858103.20086.4.camel@vanquo.pezone.net
Lists: pgsql-hackers

On Sat, 2013-04-06 at 21:51 +0200, Tomas Vondra wrote:
> This more or less works in stable environments, but once you start
> dropping databases (think of hosting with shared DB server) it gets
> unusable because after DROP DATABASE the database suddenly disappears
> from the sum.
>
> Therefore I do propose tracking the aggregated stats, similar to the
> pg_stat_bgwriter view.

It seems like this will open a can of worms. Maybe someone wants
aggregated stats for pg_stat_user_tables? Or maybe instead of the sum,
someone wants to track the average? etc. I don't think we should turn
the statistics collector into a poor man's data warehouse or statistics
engine. Couldn't you transfer the data to some other system for
long-term analysis? Maybe you could even use event triggers to have
DROP DATABASE do that automatically.


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PROPOSAL: tracking aggregated numbers from pg_stat_database
Date: 2013-04-13 16:44:26
Message-ID: 51698B6A.4060200@fuzzy.cz
Lists: pgsql-hackers

On 13.4.2013 15:01, Peter Eisentraut wrote:
> On Sat, 2013-04-06 at 21:51 +0200, Tomas Vondra wrote:
>> This more or less works in stable environments, but once you start
>> dropping databases (think of hosting with shared DB server) it gets
>> unusable because after DROP DATABASE the database suddenly disappears
>> from the sum.
>>
>> Therefore I do propose tracking the aggregated stats, similar to the
>> pg_stat_bgwriter view.
>
> It seems like this will open a can of worms. Maybe someone wants
> aggregated stats for pg_stat_user_tables? Or maybe instead of the sum,
> someone wants to track the average? etc. I don't think we should turn

What I propose is a simple cumulative counter, just like the other
counters we already have. I don't think tracking an average (or any
other statistic) makes much sense here. And as the number of objects
changes over time (e.g. databases may be created/dropped), I'm wondering
what the definition of an average would even be.

BTW I've proposed tracking aggregated table/index stats in my second
message in this thread.

> the statistics collector into a poor man's data warehouse or statistics
> engine. Couldn't you transfer the data to some other system for

I certainly don't want to overcomplicate the stats system, and I don't
think I'm turning it into a DWH or statistics engine. Even with these
aggregated counters, it still requires snapshotting and additional
processing. It's just a bunch of counters.

I'm currently struggling with (quite uncommon) deployments where
databases are created/dropped regularly (not to mention tables and
indexes), and it's surprisingly difficult to process such stats to get
reasonable values.

The point is that this allows tracking data that is otherwise
effectively lost. With the current stats you have to discard intervals
in which databases were dropped (because, well, the data disappears, so
you don't know what the actual snapshot diff is). Depending on the
number of DB drops and the snapshot interval, this may very well be most
of the time.
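As an illustration, spotting the intervals that have to be discarded
means looking for databases present in one snapshot but missing from the
next; assuming the snapshots are saved into a table, say
stats_snapshot(ts, datname, blks_read, blks_hit), it might look like
this:

```sql
-- Databases present at snapshot time :t1 but gone at :t2, i.e. the
-- interval [t1, t2] has to be thrown away (illustrative schema):
SELECT s1.datname
  FROM stats_snapshot s1
 WHERE s1.ts = :t1
   AND NOT EXISTS (SELECT 1
                     FROM stats_snapshot s2
                    WHERE s2.ts = :t2
                      AND s2.datname = s1.datname);
```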

> long-term analysis? Maybe you could even use event triggers to have
> DROP DATABASE do that automatically.

I don't think event triggers are a good solution, although I'm wondering
how that's supposed to work.

Tomas


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PROPOSAL: tracking aggregated numbers from pg_stat_database
Date: 2013-04-14 03:44:38
Message-ID: 20130414034438.GK30671@eldon.alvh.no-ip.org
Lists: pgsql-hackers

Tomas Vondra wrote:
> On 13.4.2013 15:01, Peter Eisentraut wrote:

> > long-term analysis? Maybe you could even use event triggers to have
> > DROP DATABASE do that automatically.
>
> I don't think event triggers are a good solution, although I'm wondering
> how that's supposed to work.

It doesn't, because event triggers do not support shared objects
(databases, tablespaces, roles).

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PROPOSAL: tracking aggregated numbers from pg_stat_database
Date: 2013-04-15 03:20:18
Message-ID: CA+TgmoYyuwaLh=SH5gWeZuJ_3Wqf9hOjexaxPwPMQ1ys6G_-CA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 13, 2013 at 9:01 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Sat, 2013-04-06 at 21:51 +0200, Tomas Vondra wrote:
>> This more or less works in stable environments, but once you start
>> dropping databases (think of hosting with shared DB server) it gets
>> unusable because after DROP DATABASE the database suddenly disappears
>> from the sum.
>>
>> Therefore I do propose tracking the aggregated stats, similar to the
>> pg_stat_bgwriter view.
>
> It seems like this will open a can of worms. Maybe someone wants
> aggregated stats for pg_stat_user_tables? Or maybe instead of the sum,
> someone wants to track the average? etc. I don't think we should turn
> the statistics collector into a poor man's data warehouse or statistics
> engine. Couldn't you transfer the data to some other system for
> long-term analysis? Maybe you could even use event triggers to have
> DROP DATABASE do that automatically.

I think Tomas's point about objects getting dropped is a pretty fair
one, although I agree there are potential cans of worms here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PROPOSAL: tracking aggregated numbers from pg_stat_database
Date: 2013-04-16 22:18:58
Message-ID: 516DCE52.10706@2ndQuadrant.com
Lists: pgsql-hackers

On 4/13/13 12:44 PM, Tomas Vondra wrote:
> I'm currently struggling with (quite uncommon) deployments where
> databases are created/dropped regularly (not to mention tables and
> indexes), and it's surprisingly difficult to process such stats to get
> reasonable values.

Yes, it's a pain. If you aggregate the table-level data available now,
you'll end up with some missing activity. Work done between the last
snapshot and the drop is simply gone right now, whereas your aggregated
stats view would preserve that activity. The real fun starts when the
new table has the same name as the old one, which gives you all the
negative-value headaches a pg_stat_reset introduces, too.

It's possible to make a case for why database-level aggregate statistics
are useful. I don't know that yours is compelling enough to justify
itself, though, given that, as you say, this is an uncommon situation.
In something similar to your setup I've just accepted that I have to
save the snapshots into a table, will occasionally lose some
mid-snapshot data, and I use a combination of correction updates to that
table and SQL window functions to provide a useful view. It's a pain to
do, and I end up having to customize this approach for seemingly every
install, but it can be made accurate enough for most uses.
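The window-function part of that approach might look something like this
(the stats_snapshot table is just an illustrative name for wherever the
snapshots get saved):

```sql
-- Per-database deltas between consecutive snapshots. A negative delta
-- flags a counter reset or a dropped-and-recreated database, and those
-- are exactly the rows that need a correction update (illustrative
-- schema, not a recipe):
SELECT ts, datname,
       blks_read - lag(blks_read)
           OVER (PARTITION BY datname ORDER BY ts) AS blks_read_delta
  FROM stats_snapshot;
```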

The gigantic hole in this area I was most interested in for 9.4
development was the lack of write statistics. Not having
pg_stat_database.blks_write, pg_statio_user_tables.heap_blks_write or
pg_statio_user_indexes.idx_blks_write is a crippling operations/planning
limitation of the database. From that perspective, now isn't quite the
right time to add more aggregation on top of that data, since the
aggregation will make adding additional counters a bit more complicated.
It's not a big difference, but thinking in that direction doesn't help
your suggestion.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com