Re: keeping a timestamp of the last stats reset (for a db, table and function)

Lists: pgsql-hackers
From: tv(at)fuzzy(dot)cz
To: pgsql-hackers(at)postgresql(dot)org
Subject: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-11 18:58:53
Message-ID: 7177d0cd40b82409024e7c495e9d6992.squirrel@sq.gransy.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi everyone,

I just wrote my first patch, and I need to know whether I missed something
or not. I haven't used C for a really long time, so sickbags on standby,
and if you notice something really stupid don't hesitate to call me an
asshole (according to Simon Phipps that proves we are a healthy open
community).

So what the patch does (or should do)? It tracks when were the stats for a
given object (database, table or function) reset for the last time. This
is useful when you do snapshots of the stats for analysis - when comparing
two snapshots, you have to know whether the stats were reset (in that case
the analysis usually yields random noise and automatic tools get confused
by this).

Tom Lane already recommended a workaround - firing the DBA who randomly
resets statistics, but that's not a good solution I think. First, you have
to be superior to the DBA to be able to fire him ;-) Second, everyone
makes a mistake from time to time. Third, when there are functions to
reset stats, it's nice to provide such info as it makes life much easier.

And there are cases when you don't reset the stats explicitly but the data
are actually gone - e.g. when after a restore or upgrade (OK, this is
solvable using pg_postmaster_start_time).

In short, I think it's a useful feature (I need it and I think there are
others). And I think it's not disruptive.

So what the patch actually does:

- extends PgStat_StatDBEntry, PgStat_StatTableEntry and
PgStat_StatFuncEntry with a new field (stat_reset_timestamp)

- adds functions to read current value from these fields
(pg_stat_get_db_last_stat_reset_time, pg_stat_get_last_stat_reset_time and
pg_stat_get_function_last_stat_reset_time)

- extends the system views with calls to these functions
(pg_stat_database, pg_stat_user_functions and pg_stat_all_tables)

The values are set like this:

- when a database is created, current timestamp is stored in
PgStat_StatDBEntry.stat_reset_timestamp
- by default all tables/functions inherit this timestamp
- when stats for a given table / function are reset, current timestamp is
stored in the stat_reset_timestamp (this happens in
pgstat_recv_resetsinglecounter)
- when stats for the whole database are reset, everything starts from
scratch (this happens in pgstat_recv_resetcounter)

What I'm not sure about:

- I really am not sure about the changes made in pg_proc.h. I'm not sure
how to assign OIDs to the new functions (I've simply chosen values that
are were not used in this file), and I'm not sure about the other columns
(I've copied and modified another function with the same parameter/return
types)

- I'm not sure if there are any other ways how the stat entries can be
created. I've found two ways - directly (when asked for the stats e.g.
from pgstat_get_tab_entry), and indirectly (when processing stats from a
backend e.g. in pgstat_recv_tabstat).

regards
Tomas

Attachment Content-Type Size
stat-reset-timestamp.diff text/plain 13.6 KB

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: tv(at)fuzzy(dot)cz
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-11 21:26:13
Message-ID: AANLkTi=-dUhvgj5+u_oNW11c_nTv65YO9m_=DwLhvHWf@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hello

you have to respect pg coding style:

a) not too long lines
b) not C++ line comments

Zdar

Pavel

2010/12/11 <tv(at)fuzzy(dot)cz>:
> Hi everyone,
>
> I just wrote my first patch, and I need to know whether I missed something
> or not. I haven't used C for a really long time, so sickbags on standby,
> and if you notice something really stupid don't hesitate to call me an
> asshole (according to Simon Phipps that proves we are a healthy open
> community).
>
> So what the patch does (or should do)? It tracks when were the stats for a
> given object (database, table or function) reset for the last time. This
> is useful when you do snapshots of the stats for analysis - when comparing
> two snapshots, you have to know whether the stats were reset (in that case
> the analysis usually yields random noise and automatic tools get confused
> by this).
>
> Tom Lane already recommended a workaround - firing the DBA who randomly
> resets statistics, but that's not a good solution I think. First, you have
> to be superior to the DBA to be able to fire him ;-) Second, everyone
> makes a mistake from time to time. Third, when there are functions to
> reset stats, it's nice to provide such info as it makes life much easier.
>
> And there are cases when you don't reset the stats explicitly but the data
> are actually gone - e.g. when after a restore or upgrade (OK, this is
> solvable using pg_postmaster_start_time).
>
> In short, I think it's a useful feature (I need it and I think there are
> others). And I think it's not disruptive.
>
> So what the patch actually does:
>
> - extends PgStat_StatDBEntry, PgStat_StatTableEntry and
> PgStat_StatFuncEntry with a new field (stat_reset_timestamp)
>
> - adds functions to read current value from these fields
> (pg_stat_get_db_last_stat_reset_time, pg_stat_get_last_stat_reset_time and
> pg_stat_get_function_last_stat_reset_time)
>
> - extends the system views with calls to these functions
> (pg_stat_database, pg_stat_user_functions and pg_stat_all_tables)
>
> The values are set like this:
>
> - when a database is created, current timestamp is stored in
> PgStat_StatDBEntry.stat_reset_timestamp
> - by default all tables/functions inherit this timestamp
> - when stats for a given table / function are reset, current timestamp is
> stored in the stat_reset_timestamp (this happens in
> pgstat_recv_resetsinglecounter)
> - when stats for the whole database are reset, everything starts from
> scratch (this happens in pgstat_recv_resetcounter)
>
> What I'm not sure about:
>
> - I really am not sure about the changes made in pg_proc.h. I'm not sure
> how to assign OIDs to the new functions (I've simply chosen values that
> are were not used in this file), and I'm not sure about the other columns
> (I've copied and modified another function with the same parameter/return
> types)
>
> - I'm not sure if there are any other ways how the stat entries can be
> created. I've found two ways - directly (when asked for the stats e.g.
> from pgstat_get_tab_entry), and indirectly (when processing stats from a
> backend e.g. in pgstat_recv_tabstat).
>
> regards
> Tomas
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


From: tv(at)fuzzy(dot)cz
To: "Pavel Stehule" <pavel(dot)stehule(at)gmail(dot)com>
Cc: tv(at)fuzzy(dot)cz, pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-11 21:40:57
Message-ID: 35c703482ec3ef9f6c425b9b523ceec6.squirrel@sq.gransy.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Hello
>
> you have to respect pg coding style:
>
> a) not too long lines
> b) not C++ line comments

OK, thanks for the notice. I've fixed those two problems.

regards
Tomas

Attachment Content-Type Size
stat-reset-timestamp.diff text/plain 13.7 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: tv(at)fuzzy(dot)cz
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-12 02:47:49
Message-ID: AANLkTikZ7FYTnDG-Mbr6DW_roiFCzt0EBWFBKvQk7UNu@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Dec 11, 2010 at 4:40 PM, <tv(at)fuzzy(dot)cz> wrote:
>> Hello
>>
>> you have to respect pg coding style:
>>
>> a) not too long lines
>> b) not C++ line comments
>
> OK, thanks for the notice. I've fixed those two problems.

Please add this to the currently-open commitfest.

https://commitfest.postgresql.org/action/commitfest_view/open

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 01:23:09
Message-ID: 4D0D5E7D.6090703@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 12.12.2010 03:47, Robert Haas napsal(a):
> On Sat, Dec 11, 2010 at 4:40 PM, <tv(at)fuzzy(dot)cz> wrote:
>>> Hello
>>>
>>> you have to respect pg coding style:
>>>
>>> a) not too long lines
>>> b) not C++ line comments
>>
>> OK, thanks for the notice. I've fixed those two problems.
>
> Please add this to the currently-open commitfest.
>
> https://commitfest.postgresql.org/action/commitfest_view/open

I've done several small changes to the patch, namely

- added docs for the functions (in SGML)
- added the same thing for background writer

So I think now it's 'complete' and I'll add it to the commit fest in a
few minutes.

regards
Tomas

Attachment Content-Type Size
stat-reset-timestamp.diff text/plain 19.9 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 02:51:45
Message-ID: 7626.1292727105@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
> I've done several small changes to the patch, namely

> - added docs for the functions (in SGML)
> - added the same thing for background writer

> So I think now it's 'complete' and I'll add it to the commit fest in a
> few minutes.

Please split this into separate patches for database-level and
table-level tracking, because I think it's highly likely that the latter
will get rejected. We have had multiple complaints already about the
size of the stats file for databases with many tables. I don't believe
that it's worth bloating the per-table entries even further with this
information. Tracking reset time it per-database doesn't seem like a
problem though.

regards, tom lane


From: tv(at)fuzzy(dot)cz
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 11:36:20
Message-ID: 49d436b8802f7d507662014720a83055.squirrel@sq.gransy.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>> I've done several small changes to the patch, namely
>
>> - added docs for the functions (in SGML)
>> - added the same thing for background writer
>
>> So I think now it's 'complete' and I'll add it to the commit fest in a
>> few minutes.
>
> Please split this into separate patches for database-level and
> table-level tracking, because I think it's highly likely that the latter
> will get rejected. We have had multiple complaints already about the
> size of the stats file for databases with many tables. I don't believe
> that it's worth bloating the per-table entries even further with this
> information. Tracking reset time it per-database doesn't seem like a
> problem though.

Sorry, it's not possible to split this patch into two, and then choose
just one or both pieces. I could split it, but using just the
'per-database' part would not make sense.

The problems is that right now, when stat's are reset for a single table
or function, the timestamp is set just for that one item, not for the
whole database. So just a blind split of the patch (and using just the
per-database part) would mean the info about tables/functions is lost.

I can prepare an alternative patch, using just per-database timestamps. So
even a reset for a single table/function would set the timestamp for the
whole database.

Tomas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: tv(at)fuzzy(dot)cz
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 16:26:11
Message-ID: 24812.1292775971@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

tv(at)fuzzy(dot)cz writes:
> I can prepare an alternative patch, using just per-database timestamps. So
> even a reset for a single table/function would set the timestamp for the
> whole database.

That seems like quite a bizarre definition. What I was envisioning was
that we'd track only the time of the last whole-database stats reset.

regards, tom lane


From: Jim Nasby <jim(at)nasby(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 18:13:49
Message-ID: 1F8EECEB-9C01-449A-8BBF-BFF6D553B74F@nasby.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Dec 18, 2010, at 8:51 PM, Tom Lane wrote:
> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>> I've done several small changes to the patch, namely
>
>> - added docs for the functions (in SGML)
>> - added the same thing for background writer
>
>> So I think now it's 'complete' and I'll add it to the commit fest in a
>> few minutes.
>
> Please split this into separate patches for database-level and
> table-level tracking, because I think it's highly likely that the latter
> will get rejected. We have had multiple complaints already about the
> size of the stats file for databases with many tables. I don't believe
> that it's worth bloating the per-table entries even further with this
> information. Tracking reset time it per-database doesn't seem like a
> problem though.

Is there a reason this info needs to be tracked in the stats table? I know it's the most obvious place, but since we're worried about the size of it, what about tracking it in pg_class or somewhere else?
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 19:17:40
Message-ID: 4D0E5A54.3060302@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 19.12.2010 17:26, Tom Lane napsal(a):
> That seems like quite a bizarre definition. What I was envisioning was
> that we'd track only the time of the last whole-database stats reset.

Well, but that does not quite work. I need is to know whether the
snapshot is 'consistent' i.e. whether I can compare it to the previous
snapshot and get meaningful results (so that I can perform some analysis
on the difference).

And by 'consistent' I mean that none of the counters was reset since the
previous snapshot. The most obvious and most flexible way to do this is
keeping track of the last reset for each of the counters, which is
exactly what I've done in the patch.

The other possibility I've offered in my previous post is to keep just a
per-database timestamp, and set it whenever some stats in the database
are reset (table/index/function counters or all stats). It definitely is
not as flexible as the previous solution, but it gives me at least some
info that something was reset. But I'd have to throw away the whole
snapshot - in the previous case I could do analysis at least on the
counters that were not reset.

The solution you've offered - keeping only the per-database timestamp,
and not updating it when a table/index/function stats are reset, that's
completely useless for me. It simply does not provide an answer to the
question "Is this snapshot consistent?"

Tomas


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 19:27:00
Message-ID: 4D0E5C84.1010200@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 19.12.2010 19:13, Jim Nasby napsal(a):
> Is there a reason this info needs to be tracked in the stats table?
> I know it's the most obvious place, but since we're worried about the
> size of it, what about tracking it in pg_class or somewhere else?

I guess this is the best place for this kind of info, as it is directly
related to the stats items. Well, maybe we could place it to pg_class,
but is it a wise idea?

Tomas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 19:28:34
Message-ID: 4630.1292786914@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
> Dne 19.12.2010 17:26, Tom Lane napsal(a):
>> That seems like quite a bizarre definition. What I was envisioning was
>> that we'd track only the time of the last whole-database stats reset.

> Well, but that does not quite work. I need is to know whether the
> snapshot is 'consistent' i.e. whether I can compare it to the previous
> snapshot and get meaningful results (so that I can perform some analysis
> on the difference).

Oh, I see. Yeah, if that's what you want it for then partial resets
have to change the timestamp too. I guess if we are careful to document
it properly, this won't be too horrid.

regards, tom lane


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 21:54:37
Message-ID: 4D0E7F1D.9000007@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 19.12.2010 20:28, Tom Lane napsal(a):
> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>> Dne 19.12.2010 17:26, Tom Lane napsal(a):
>>> That seems like quite a bizarre definition. What I was envisioning was
>>> that we'd track only the time of the last whole-database stats reset.
>
>> Well, but that does not quite work. I need is to know whether the
>> snapshot is 'consistent' i.e. whether I can compare it to the previous
>> snapshot and get meaningful results (so that I can perform some analysis
>> on the difference).
>
> Oh, I see. Yeah, if that's what you want it for then partial resets
> have to change the timestamp too. I guess if we are careful to document
> it properly, this won't be too horrid.
>
> regards, tom lane
>

Well, maybe. I'd prefer the timestamp for each item, as that allows me
to determine which stats were not reset and analyze at least those data.

Plus I won't have time to write the other patch for at least a week, so
let's see whether there are serious objections agains the current patch.

I've never had problems with the pgstat.dat file, but I understand
others might, although this adds "just 8 bytes" to each item.

regards
Tomas


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 22:58:11
Message-ID: 17782.1292799491@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
> Plus I won't have time to write the other patch for at least a week, so
> let's see whether there are serious objections agains the current patch.

If you think this objection is not serious, you're mistaken.

> I've never had problems with the pgstat.dat file, but I understand
> others might, although this adds "just 8 bytes" to each item.

That is not the number of interest. The number of interest is that it's
8 bytes added onto a struct that currently contains 11 of 'em; in other
words a 9% increase in the size of the stats file, and consequently
about a 9% increase in the cost of updating it.

What is bothering me about that is this: how many of our users ever
reset the stats at all? Of those, how many reset the stats for just
some tables and not all of them? And of those, how many care about
having the database remember when they did it, at a per-table level?
I think the answer to each of those questions is "not many", and so
the fraction of our users who will get a benefit from that added
overhead is epsilon cubed. But approximately 100% of our users will
be paying the overhead.

That just doesn't look like a good tradeoff from here.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-19 23:03:30
Message-ID: 17858.1292799810@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> That is not the number of interest. The number of interest is that it's
> 8 bytes added onto a struct that currently contains 11 of 'em; in other
> words a 9% increase in the size of the stats file, and consequently
> about a 9% increase in the cost of updating it.

Wups, sorry, I was looking at the wrong struct. It's actually an
addition of 1 doubleword to a struct of 21 of 'em, or about 5%.
That's still an awful lot in comparison to the prospective usefulness
of the information.

regards, tom lane


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-20 00:46:06
Message-ID: 4D0EA74E.6060701@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 19.12.2010 23:58, Tom Lane napsal(a):
> Tomas Vondra <tv(at)fuzzy(dot)cz> writes:
>> > Plus I won't have time to write the other patch for at least a week, so
>> > let's see whether there are serious objections agains the current patch.
> If you think this objection is not serious, you're mistaken.

I know there were problems with pgstats.stat and I/O (for example this
thread discussing an I/O storm
http://archives.postgresql.org/pgsql-hackers/2008-01/msg01095.php).

But I thought those issues were already resolved and I have not noticed
any such issue recently. Am I missing something?

> What is bothering me about that is this: how many of our users ever
> reset the stats at all? Of those, how many reset the stats for just
> some tables and not all of them? And of those, how many care about
> having the database remember when they did it, at a per-table level?
> I think the answer to each of those questions is "not many", and so
> the fraction of our users who will get a benefit from that added
> overhead is epsilon cubed. But approximately 100% of our users will
> be paying the overhead.

Sure, I understand that only a fraction of the users will use the patch
I've proposed. That's why I'm saying that the per-database timestamp
would be sufficient (although I'd prefer the per-record timestamp).

regards
Tomas


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-23 17:47:53
Message-ID: 4D138B49.6010108@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 20.12.2010 00:03, Tom Lane napsal(a):
> I wrote:
>> That is not the number of interest. The number of interest is that it's
>> 8 bytes added onto a struct that currently contains 11 of 'em; in other
>> words a 9% increase in the size of the stats file, and consequently
>> about a 9% increase in the cost of updating it.
>
> Wups, sorry, I was looking at the wrong struct. It's actually an
> addition of 1 doubleword to a struct of 21 of 'em, or about 5%.
> That's still an awful lot in comparison to the prospective usefulness
> of the information.
>
> regards, tom lane
>

OK, so here goes the simplified patch - it tracks one reset timestamp
for a background writer and for each database.

regards
Tomas

PS: I've noticed Magnus posted a patch to track recovery conflicts,
adding a new view pg_stat_database_conflicts. I have not checked it yet
but it should not influence this patch.


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-23 19:09:13
Message-ID: AANLkTikXWxsfqOGHPqWann=kxWAmnREEUgnYP_P3Z-mp@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/12/23 Tomas Vondra <tv(at)fuzzy(dot)cz>:
> Dne 20.12.2010 00:03, Tom Lane napsal(a):
>> I wrote:
>>> That is not the number of interest.  The number of interest is that it's
>>> 8 bytes added onto a struct that currently contains 11 of 'em; in other
>>> words a 9% increase in the size of the stats file, and consequently
>>> about a 9% increase in the cost of updating it.
>>
>> Wups, sorry, I was looking at the wrong struct.  It's actually an
>> addition of 1 doubleword to a struct of 21 of 'em, or about 5%.
>> That's still an awful lot in comparison to the prospective usefulness
>> of the information.
>>
>>                       regards, tom lane
>>
>
> OK, so here goes the simplified patch - it tracks one reset timestamp
> for a background writer and for each database.

I think you forgot the attachment.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2010-12-23 19:41:14
Message-ID: 4D13A5DA.2040408@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 23.12.2010 20:09, Robert Haas napsal(a):
> 2010/12/23 Tomas Vondra <tv(at)fuzzy(dot)cz>:
>> Dne 20.12.2010 00:03, Tom Lane napsal(a):
>>> I wrote:
>>>> That is not the number of interest. The number of interest is that it's
>>>> 8 bytes added onto a struct that currently contains 11 of 'em; in other
>>>> words a 9% increase in the size of the stats file, and consequently
>>>> about a 9% increase in the cost of updating it.
>>>
>>> Wups, sorry, I was looking at the wrong struct. It's actually an
>>> addition of 1 doubleword to a struct of 21 of 'em, or about 5%.
>>> That's still an awful lot in comparison to the prospective usefulness
>>> of the information.
>>>
>>> regards, tom lane
>>>
>>
>> OK, so here goes the simplified patch - it tracks one reset timestamp
>> for a background writer and for each database.
>
> I think you forgot the attachment.

Yes, I did. Thanks!

Tomas

Attachment Content-Type Size
stat-reset-timestamp-2.diff text/plain 0 bytes

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: tv(at)fuzzy(dot)cz
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2011-01-04 12:05:16
Message-ID: 4D230CFC.2030205@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

tv(at)fuzzy(dot)cz wrote:
> - I really am not sure about the changes made in pg_proc.h. I'm not sure
> how to assign OIDs to the new functions (I've simply chosen values that
> are were not used in this file), and I'm not sure about the other columns
> (I've copied and modified another function with the same parameter/return
> types)
>

The description of the columns is at the beginning of pg_proc.h, the
part that begins with CATALOG(pg_proc,1255)... The descriptions of some
of the first 11 fields are mostly straighforward. The first fun part is
that how may times the information expected in the second "VARIABLE
LENGTH FIELDS" section repeats varies based on the parameters listed.
The other thing that's usually confusing is that the types for the
values are all expressed as type OID numbers. For example, if you see
"25", that's the OID of the "text" type. You can see the whole list with:

select oid,typname from pg_type;

And if you go back to the file with that list in handle, a lot more of
it should make sense. If you see a multiple parameter type list like
"23 21", that's a function whose input values are of types (int4,int2),

As for getting a new OID, if you go into src/include/catalog/ and run
the unused_oids script, it will give you some help in figuring which
have been used and which not. It's not worth getting too stressed about
the number you choose in the patch submission, because commits between
when you got your free number and when your patch is considered for
commit can make your choice worthless anyway. There's a process
referred to as "catversion bump", where the database catalog version get
updated to reflect things like new pg_proc information, that committers
take care of as one of the last adjustments before final commit. Doing
a final correction to the OID choice is a part every committer knows to
look at.

I wrote a talk that covered some additional trivia in this area, as well
as other things people tend to get confused about in the source code,
that you can find at
http://www.pgcon.org/2010/schedule/attachments/142_HackingWithUDFs.pdf ;
that might be helpful for some other things you might wonder about
eventually.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2011-01-04 12:22:33
Message-ID: 4D231109.6@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tomas Vondra wrote:
> OK, so here goes the simplified patch - it tracks one reset timestamp
> for a background writer and for each database.
>

Adding timestamps like this was something I wanted to do after adding
pg_stat_reset_shared to 9.0, so since you've beaten me to it I'll review
your patch and make sure it all works the way I was hoping instead. The
whole per-table idea was never going to fly given how few people touch
this area at all, but the way you're tracking things now seems reasonable.

When you post an updated version of a patch that's already being tracked
on the CommitFest app, please try to remember to add that update to the
tracker there. I just did that for this 12/23 update so that's covered
already.

Next problem is that the preferred method for submitted patches uses
context diffs. See http://wiki.postgresql.org/wiki/Working_with_Git for
some information about the somewhat annoying way you have to setup git
to generate those. Don't worry about that for this round though. I
don't care about the diff formatting given the code involved, but it's
something you should sort out if you do another update.

> PS: I've noticed Magnus posted a patch to track recovery conflicts,
> adding a new view pg_stat_database_conflicts. I have not checked it yet
> but it should not influence this patch.
>

I need to do some testing of that anyway, so I'll take a look at any
potential clash as part of my review. I want to check how this
interacts with track_functions resets too.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2011-01-30 22:22:50
Message-ID: AANLkTimrp8FjHoTCZygxZzriZXxdOf+tqdy3gxRpc6Lc@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Dec 23, 2010 at 2:41 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>> OK, so here goes the simplified patch - it tracks one reset timestamp
>>> for a background writer and for each database.
>>
>> I think you forgot the attachment.
>
> Yes, I did. Thanks!

This patch no longer applies. Please update.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2011-02-03 22:06:10
Message-ID: 4D4B26D2.6040807@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 30.1.2011 23:22, Robert Haas napsal(a):
> On Thu, Dec 23, 2010 at 2:41 PM, Tomas Vondra <tv(at)fuzzy(dot)cz> wrote:
>>>> OK, so here goes the simplified patch - it tracks one reset timestamp
>>>> for a background writer and for each database.
>>>
>>> I think you forgot the attachment.
>>
>> Yes, I did. Thanks!
>
> This patch no longer applies. Please update.

I've updated the patch - rebased to the current HEAD.

regards
Tomas

Attachment Content-Type Size
stat-time-reset-3.diff text/plain 10.3 KB

From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2011-02-04 02:37:54
Message-ID: 4D4B6682.2080406@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thinking I should start with why I think this patch is neat...most of
the servers I deal with are up 24x7 minus small amounts of downtime,
presuming everyone does their job right that is. In that environment,
having a starting timestamp for when the last stats reset happened lets
you quickly compute some figures in per-second terms that are pretty
close to actual average activity on the server. Some examples of how I
would use this:

psql -c "
SELECT
CAST(buffers_backend * block_size AS numeric) / seconds_uptime /
(1024*1024)
AS backend_mb_per_sec
FROM
(SELECT *,extract(epoch from now()-stats_reset) AS seconds_uptime,
(SELECT cast(current_setting('block_size') AS int8)) AS block_size
FROM pg_stat_bgwriter) AS raw WHERE raw.seconds_uptime > 0
"
backend_mb_per_sec
--------------------
4.27150807681618

psql -c "
SELECT
datname,CAST(xact_commit AS numeric) / seconds_uptime
AS commits_per_sec
FROM
(SELECT *,extract(epoch from now()-stats_reset) AS seconds_uptime
FROM pg_stat_database) AS raw WHERE raw.seconds_uptime > 0
"

datname | commits_per_sec
-----------+--------------------
template1 | 0.0338722604313051
postgres | 0.0363144438470267
gsmith | 0.0820573653236174
pgbench | 0.059147072347085

Now I reset, put some load on the system and check the same stats
afterward; watch how close these match up:

$ psql -d pgbench -c "select pg_stat_reset()"
$ pgbench -j 4 -c 32 -T 30 pgbench
transaction type: TPC-B (sort of)
scaling factor: 100
query mode: simple
number of clients: 32
number of threads: 4
duration: 30 s
number of transactions actually processed: 6604
tps = 207.185627 (including connections establishing)
tps = 207.315043 (excluding connections establishing)

datname | commits_per_sec
-----------+--------------------
pgbench | 183.906308135572

Both these examples work as I expected, and some playing around with the
patch didn't find any serious problems with the logic it implements.
One issue though, an oversight I think can be improved upon; watch what
happens when I create a new database:

$ createdb blank
$ psql -c "select datname,stats_reset from pg_stat_database where
datname='blank'"
datname | stats_reset
---------+-------------
blank |

That's not really what I would hope for here. One major sort of
situation I'd like this feature to work against is the one where someone
asks for help but has never touched their database stats before, which
is exactly what I'm simulating here. In this case that person would be
out of luck, the opposite of the experience I'd like a newbie to have at
this point.

The logic Tomas put in here to initialize things in the face of never
having a stat reset is reasonable. But I think to really be complete,
this needs to hook database creation and make sure the value gets
initialized with the current timestamp, not just be blank. Do that, and
I think this will make a nice incremental feature on top of the existing
stats structure.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2011-02-04 18:06:27
Message-ID: 4D4C4023.60206@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dne 4.2.2011 03:37, Greg Smith napsal(a):
> Thinking I should start with why I think this patch is neat...most of
> the servers I deal with are up 24x7 minus small amounts of downtime,
> presuming everyone does their job right that is. In that environment,
> having a starting timestamp for when the last stats reset happened lets
> you quickly compute some figures in per-second terms that are pretty
> close to actual average activity on the server. Some examples of how I
> would use this:
>
> psql -c "
> SELECT
> CAST(buffers_backend * block_size AS numeric) / seconds_uptime /
> (1024*1024)
> AS backend_mb_per_sec
> FROM
> (SELECT *,extract(epoch from now()-stats_reset) AS seconds_uptime,
> (SELECT cast(current_setting('block_size') AS int8)) AS block_size
> FROM pg_stat_bgwriter) AS raw WHERE raw.seconds_uptime > 0
> "
> backend_mb_per_sec
> --------------------
> 4.27150807681618
>
> psql -c "
> SELECT
> datname,CAST(xact_commit AS numeric) / seconds_uptime
> AS commits_per_sec
> FROM
> (SELECT *,extract(epoch from now()-stats_reset) AS seconds_uptime
> FROM pg_stat_database) AS raw WHERE raw.seconds_uptime > 0
> "
>
> datname | commits_per_sec -----------+--------------------
> template1 | 0.0338722604313051
> postgres | 0.0363144438470267
> gsmith | 0.0820573653236174
> pgbench | 0.059147072347085
>
> Now I reset, put some load on the system and check the same stats
> afterward; watch how close these match up:
>
> $ psql -d pgbench -c "select pg_stat_reset()"
> $ pgbench -j 4 -c 32 -T 30 pgbench
> transaction type: TPC-B (sort of)
> scaling factor: 100
> query mode: simple
> number of clients: 32
> number of threads: 4
> duration: 30 s
> number of transactions actually processed: 6604
> tps = 207.185627 (including connections establishing)
> tps = 207.315043 (excluding connections establishing)
>
> datname | commits_per_sec -----------+--------------------
> pgbench | 183.906308135572
>
> Both these examples work as I expected, and some playing around with the
> patch didn't find any serious problems with the logic it implements.
> One issue though, an oversight I think can be improved upon; watch what
> happens when I create a new database:
>
> $ createdb blank
> $ psql -c "select datname,stats_reset from pg_stat_database where
> datname='blank'"
> datname | stats_reset
> ---------+-------------
> blank |
>
> That's not really what I would hope for here. One major sort of
> situation I'd like this feature to work against is the one where someone
> asks for help but has never touched their database stats before, which
> is exactly what I'm simulating here. In this case that person would be
> out of luck, the opposite of the experience I'd like a newbie to have at
> this point.

Are you sure about it? Because when I create a database, the field is
NULL - that's true. But once I connect to the database, the stats are
updated and the field is set (thanks to the logic in pgstat.c).

In that case 'stat_reset=NULL' would mean 'no-one ever touched this db'
which seems quite reasonable to me.

========================================================================

$ createdb testdb1
$ createdb testdb2

$ psql -d testdb1 -c "select stats_reset from pg_stat_database where
datname = 'testdb2'"
stats_reset
-------------

(1 row)

$ psql -d testdb2 -c "\q"

$ psql -d testdb1 -c "select stats_reset from pg_stat_database where
datname = 'testdb2'"
stats_reset
-------------------------------
2011-02-04 19:03:23.938929+01
(1 row)

========================================================================

But maybe I've missed something and it does not work the way I think it
does.

regards
Tomas


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tomas Vondra <tv(at)fuzzy(dot)cz>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2011-02-06 07:17:27
Message-ID: 4D4E4B07.8060102@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tomas Vondra wrote:
> Because when I create a database, the field is
> NULL - that's true. But once I connect to the database, the stats are
> updated and the field is set (thanks to the logic in pgstat.c).
>

OK--so it does what I was hoping for, I just didn't test it the right
way. Let's call that a documentation issue and move on.

Attached is an updated patch that fixes the docs and some other random
bits. Looks ready for committer to me now. Make sure to adjust
PGSTAT_FILE_FORMAT_ID, do a cat version bump, and set final OIDs for the
new functions.

Below is what changed since the last posted version, mainly as feedback
for Tomas:

-Explained more clearly that pg_stat_reset and
pg_stat_reset_single_counters will both touch the database reset time,
and that it's initialized upon first connection to the database.

-Added the reset time to the list of fields in pg_stat_database and
pg_stat_bgwriter.

-Fixed some tab/whitespace issues. It looks like you had tab stops set
at 8 characters during some points when you were editing non-code
files. Also, there were a couple of spot where you used a tab while
text in the area used spaces. You can normally see both types of errors
if you read a patch, they showed up as misaligned things in the context
diff.

-Removed some extra blank lines that didn't fit the style of the
surrounding code.

Basically, all the formatting bits I'm nitpicking about I found just by
reading the patch itself; they all stuck right out. I'd recommend a
pass of that before submitting things if you want to try and avoid those
in the future.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

Attachment Content-Type Size
stat-time-reset-4.diff text/x-patch 12.3 KB

From: Tomas Vondra <tv(at)fuzzy(dot)cz>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2011-02-06 16:50:06
Message-ID: 4D4ED13E.106@fuzzy.cz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 6.2.2011 08:17, Greg Smith wrote:
> Below is what changed since the last posted version, mainly as feedback
> for Tomas:
>
> -Explained more clearly that pg_stat_reset and
> pg_stat_reset_single_counters will both touch the database reset time,
> and that it's initialized upon first connection to the database.
>
> -Added the reset time to the list of fields in pg_stat_database and
> pg_stat_bgwriter.
>
> -Fixed some tab/whitespace issues. It looks like you had tab stops set
> at 8 characters during some points when you were editing non-code
> files. Also, there were a couple of spot where you used a tab while
> text in the area used spaces. You can normally see both types of errors
> if you read a patch, they showed up as misaligned things in the context
> diff.
>
> -Removed some extra blank lines that didn't fit the style of the
> surrounding code.
>
> Basically, all the formatting bits I'm nitpicking about I found just by
> reading the patch itself; they all stuck right out. I'd recommend a
> pass of that before submitting things if you want to try and avoid those
> in the future.

Thanks for fixing the issues and for the review in general. I have to
tune the IDE a bit to produce the right format and be a bit more careful
when creating the patches.

regards
Tomas


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: keeping a timestamp of the last stats reset (for a db, table and function)
Date: 2011-02-10 14:15:18
Message-ID: AANLkTina_BPL4n1A1hVryZPMA3RbXsb0ndrj3txTUR2V@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sun, Feb 6, 2011 at 08:17, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> Tomas Vondra wrote:
>>
>> Because when I create a database, the field is
>> NULL - that's true. But once I connect to the database, the stats are
>> updated and the field is set (thanks to the logic in pgstat.c).
>>
>
> OK--so it does what I was hoping for, I just didn't test it the right way.
>  Let's call that a documentation issue and move on.
>
> Attached is an updated patch that fixes the docs and some other random bits.
>  Looks ready for committer to me now.  Make sure to adjust
> PGSTAT_FILE_FORMAT_ID, do a cat version bump, and set final OIDs for the new
> functions.

... and the regression tests expected output.

> -Fixed some tab/whitespace issues.  It looks like you had tab stops set at 8

I added a few more whitespace fixes, mainly in the "whitespace at end
of line" category (git diff shows it pretty clearly)

With that, applied. Thanks!

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/