Re: Patch to send transaction commit/rollback stats to the stats collector unconditionally.

From: Kevin Grittner <kgrittn(at)ymail(dot)com>
To: Gurjeet Singh <gurjeet(at)singh(dot)im>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Patch to send transaction commit/rollback stats to the stats collector unconditionally.
Date: 2014-06-29 15:58:04
Message-ID: 1404057484.64539.YahooMailNeo@web122305.mail.ne1.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I have reviewed this patch, and think we should do what the patch
is trying to do, but I don't think the submitted patch would
actually work.  I have attached a suggested patch which I think
would work.  Gurjeet, could you take a look at it?

My comments on prior discussion embedded below.

Gurjeet Singh <gurjeet(at)singh(dot)im> wrote:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> writes:
>>> I'm not sure I understand the point of this whole thing.
>>> Realistically, how many transactions are there that do not
>>> access any database tables?
>>
>> I think that something like "select * from pg_stat_activity"
>> might not bump any table-access counters, once the relevant
>> syscache entries had gotten loaded.  You could imagine that a
>> monitoring app would do a long series of those and nothing else
>> (whether any actually do or not is a different question).
>
> +1. This is exactly what I was doing; querying pg_stat_database
> from a psql session, to track transaction counters.

+1

A monitoring application might very well do exactly that.  Having
the history graphs show large spikes might waste someone's time
tracking down the cause.  (In fact, that seems to be exactly what
happened to Gurjeet, prompting this patch~.)  I've been in similar
situations -- enough to appreciate how user-unfriendly such
behavior is.

>> But still, it's a bit hard to credit that this patch is solving
>> any real problem.  Where's the user complaints about the
>> existing behavior?
>
> Consider my original email a user complaint, albeit with a patch
> attached. I was trying to ascertain TPS on a customer's instance
> when I noticed this behaviour; it violated POLA. Had I not
> decided to dig through the code to find the source of this
> behaviour, as a regular user I would've reported this anomaly as
> a bug, or maybe turned a blind eye to it labeling it as a quirky
> behaviour.

... or assumed that it was real transaction load during that
interval.  There have probably been many bewildered users who
thought they missed some activity storm from their own software,
and run around trying to identify the source -- never realizing it
was a measurement anomaly caused by surprising behavior of the
statistics gathering.

>> That is, even granting that anybody has a workload that acts
>> like this, why would they care ...
>
> To avoid surprises!
>
> This behaviour, at least in my case, causes Postgres to misreport
> the very thing I am trying to calculate.

Yup.  What's the point of reporting these numbers if we're going to
allow that kind of distortion?

>> and are they prepared to take a performance hit
>> to avoid the counter jump after the monitoring app exits?
>
> It doesn't look like we know how much of a  performance hit this
> will cause. I don't see any reasons cited in the code, either.
> Could this be a case of premature optimization. The commit log
> shows that, during the overhaul (commit 77947c5), this check was
> put in place to emulate what the previous code did; code before
> that commit reported stats, including transaction stats, only if
> there were any regular or shared table stats to report.

More than that, this only causes new activity for a connection
which, within a single PGSTAT_STAT_INTERVAL, ran queries -- but
none of the queries run accessed any tables.  That should be a
pretty small number of connections, since there only
special-purpose connections (e.g., monitoring) are likely to do
that.  And when it does happen, the new activity is just the same
as what any connection which does access a table would generate.
There's nothing special or more intense about this.  And an idle
connection won't generate any new activity.  This concern seem like
much ado about nothing (or about as close to nothing as you can
get).

That said, if you *did* actually have a workload where you were
using the database engine as a calculator, running such queries at
volume on a lot of connections, wouldn't you want the statistics to
represent that accurately?

> Besides, there's already a throttle built in using the
> PGSTAT_STAT_INTERVAL limit.

This is a key point; neither the original patch nor my proposed
alternative bypasses that throttling.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
report_xact_stats_unconditionally-v2.diff text/x-diff 1.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-06-29 16:20:02 Re: PostgreSQL for VAX on NetBSD/OpenBSD
Previous Message Tom Lane 2014-06-29 15:31:22 Re: PostgreSQL for VAX on NetBSD/OpenBSD