Re: Add min and max execute statement time in pg_stat_statement

From: Mitsumasa KONDO <kondo(dot)mitsumasa(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Rajeev rastogi <rajeev(dot)rastogi(at)huawei(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Add min and max execute statement time in pg_stat_statement
Date: 2014-01-31 13:07:59
Message-ID: CADupcHUKHUua4Tbqi6QcAjsDrns4V6b2cw3HAepgwvB1o4e1JQ@mail.gmail.com
Lists: pgsql-hackers

2014-01-31 Peter Geoghegan <pg(at)heroku(dot)com>

> On Thu, Jan 30, 2014 at 12:32 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> In reality, actual applications
> >> could hardly be further from the perfectly uniform distribution of
> >> distinct queries presented here.
> >
> > Yeah, I made the same point in different words. I think any realistic
> > comparison of this code to what we had before needs to measure a workload
> > with a more plausible query frequency distribution.
>
> Even though that distribution just doesn't square with anybody's
> reality, you can still increase the pg_stat_statements.max setting to
> 10k and the problem goes away at little cost (a lower setting is
> better, but a setting high enough to cache everything is best). But
> you're not going to have terribly much use for pg_stat_statements
> anyway....if you really do experience churn at that rate with 5,000
> possible entries, the module is ipso facto useless, and should be
> disabled.

I ran an extra test of your patch and mine with pg_stat_statements.max set to
10k, with the other settings and servers the same as before. Both are faster
than the past results.

method   |  try1 |  try2 |  try3
---------+-------+-------+-------
peter 3  | 6.769 | 6.784 | 6.785
method 5 | 6.770 | 6.774 | 6.761

I think the most significant overhead in pg_stat_statements is the cost of
deleting and inserting entries when the hash table is updated, not the LWLocks.
If the LWLocks were the biggest overhead, we would see it with pgbench -S,
because it has a single SELECT pattern, which is the worst case for lock
conflicts. But we don't see such a result.
I'm not sure about dynahash.c, but we can see the hash-collision handling in
that code. IMHO, it might be heavy because it has to walk the bucket list and
compare entries until it finds one that does not conflict.
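
To illustrate what I mean, here is a minimal sketch of a chained hash lookup
(this is not the actual dynahash.c code; the struct and function names are
made up for illustration). Each lookup has to compare entries in the bucket
one by one until it finds a match or reaches the end of the chain:

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical chained hash-table entry; not PostgreSQL's dynahash.c. */
    typedef struct HashEntry
    {
        struct HashEntry *next;     /* next entry in the same bucket */
        uint32_t          hash;     /* cached hash value of the key */
        char              key[64];  /* lookup key */
        /* ... counters such as calls and total_time would follow ... */
    } HashEntry;

    /*
     * Walk one bucket's collision chain.  The cost grows with the number of
     * colliding entries, because each one needs a hash comparison and, on a
     * hash match, a full key comparison.
     */
    static HashEntry *
    bucket_lookup(HashEntry *bucket_head, uint32_t hash, const char *key)
    {
        for (HashEntry *e = bucket_head; e != NULL; e = e->next)
        {
            if (e->hash == hash && strcmp(e->key, key) == 0)
                return e;           /* found the matching entry */
        }
        return NULL;                /* not present: caller must insert, and
                                     * possibly evict an old entry first */
    }

And when the table is already full, an insert additionally has to choose a
victim entry and delete it first, which is the delete-and-insert cost I mean
above.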

And the past results show that your patch's weakest point is that the cost of
deleting the oldest statement and inserting a new one is very high, as you
know. It amplifies the update (delete and insert) cost in the
pg_stat_statements hash table. So you proposed 10k as the new default for the
max setting. But that is not an essential solution, because the old
pg_stat_statements also performs well at that setting. And when we set
max=10k with your patch but only want the 1000 most-used queries from
pg_stat_statements, we have to use an ORDER BY query with LIMIT 1000. The
sort cost is relatively high, so the monitoring query will be slow and
expensive. With the old code we can simply set pg_stat_statements.max=1000,
and its performance is not bad. That will be the best setting for getting
information about the 1000 most-used queries.
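
For example, to get only the 1000 most frequently executed statements out of a
10k-entry pg_stat_statements, the monitoring query would look something like
the following (just an illustration assuming we rank by the calls column;
other columns such as total_time could be used instead), and the ORDER BY has
to sort all 10,000 cached entries just to return 1,000 of them:

    -- rank the cached entries and keep only the 1000 most executed statements
    SELECT query, calls, total_time
      FROM pg_stat_statements
     ORDER BY calls DESC
     LIMIT 1000;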

That's all my assumption.

Sorry for running only a few extra tests; I had no time in my office today.
If we want it, I can run a 1/N-distribution pgbench test next week; I will
modify my perl script a little to create multiple SQL files with various
sleep times.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center
