Quick Links

Re: Improvement of pg_stat_statement usage about buffer hit ratio

Lists:	pgsql-hackers

From:	KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-10-18 08:05:22
Message-ID:	5260EBC2.9000808@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

I submit improvement of pg_stat_statement usage patch in CF3.

In pg_stat_statement, I think buffer hit ratio is very important value. However,
it is difficult to calculate it, and it need complicated SQL. This patch makes it
more simple usage and documentation.

> -bench=# SELECT query, calls, total_time, rows, 100.0 * shared_blks_hit /
> - nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
> +bench=# SELECT query, calls, total_time, rows, shared_blks_hit_percent
> FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;

It will be very simple:-)

This patch conflicts pg_stat_statement_min_max_exectime patch which I submitted,
and pg_stat_statement_min_max_exectime patch also adds new columns which are
min_time and max_time. So I'd like to change it in this opportunity.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

Attachment	Content-Type	Size
pg_stat_statements-blkhit.patch	text/x-diff	11.5 KB

From:	Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>
To:	KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-18 11:16:36
Message-ID:	8977CB36860C5843884E0A18D8747B0372BED351@szxeml558-mbs.china.huawei.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 18 October 2013 13:35 KONDO Mitsumasa wrote:
> Hi,
>
> I submit improvement of pg_stat_statement usage patch in CF3.
>
> In pg_stat_statement, I think buffer hit ratio is very important value.
> However, it is difficult to calculate it, and it need complicated SQL.
> This patch makes it more simple usage and documentation.
>
> > -bench=# SELECT query, calls, total_time, rows, 100.0 *
> shared_blks_hit /
> > - nullif(shared_blks_hit + shared_blks_read, 0) AS
> hit_percent
> > +bench=# SELECT query, calls, total_time, rows,
> > +shared_blks_hit_percent
> > FROM pg_stat_statements ORDER BY total_time DESC LIMIT 5;
>
> It will be very simple:-)
>
> This patch conflicts pg_stat_statement_min_max_exectime patch which I
> submitted, and pg_stat_statement_min_max_exectime patch also adds new
> columns which are min_time and max_time. So I'd like to change it in
> this opportunity.

This patch adds another column "shared_blks_hit_percent" to pg_stat_statements view
Which is very beneficial to the user to know how much percentage of blks are hit.

All changes are fine and working as described. Marked as ready for committer.

Regards,
Hari babu

From:	KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To:	Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-18 11:36:43
Message-ID:	5289FBCB.5000601@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

(2013/11/18 20:16), Haribabu kommi wrote:
> On 18 October 2013 13:35 KONDO Mitsumasa wrote:
>> This patch conflicts pg_stat_statement_min_max_exectime patch which I
>> submitted, and pg_stat_statement_min_max_exectime patch also adds new
>> columns which are min_time and max_time. So I'd like to change it in
>> this opportunity.
>
> This patch adds another column "shared_blks_hit_percent" to pg_stat_statements view
> Which is very beneficial to the user to know how much percentage of blks are hit.
>
> All changes are fine and working as described. Marked as ready for committer.
Thank you for your reviewing!

However, I'd like to add average time in each statement, too. Attached patch is
my latest one. Adding shared_blks_hit_percent and ave_time. This is the adding
main code.
> + total_time / calls::float AS avg_time,

If this patch and min/max and stddev patch will be commited, we can see more
detail and simple information in pg_stat_statements, by light-weight coding.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

Attachment	Content-Type	Size
pg_stat_statements-blkhit_v1.patch	text/x-diff	12.5 KB

From:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To:	KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-18 18:49:30
Message-ID:	CAHGQGwFNa7h62nDUOZenO+kam4ucxF=ygS1jZMDLeozdjqY4DQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Nov 18, 2013 at 8:36 PM, KONDO Mitsumasa
<kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> (2013/11/18 20:16), Haribabu kommi wrote:
>>
>> On 18 October 2013 13:35 KONDO Mitsumasa wrote:
>>>
>>> This patch conflicts pg_stat_statement_min_max_exectime patch which I
>>> submitted, and pg_stat_statement_min_max_exectime patch also adds new
>>> columns which are min_time and max_time. So I'd like to change it in
>>> this opportunity.
>>
>>
>> This patch adds another column "shared_blks_hit_percent" to
>> pg_stat_statements view
>> Which is very beneficial to the user to know how much percentage of blks
>> are hit.

The same idea was proposed before but not committed because
Itagaki thought that pg_stat_statements view should report only raw values.
Please read the following thread. I have the same feeling with him.
Anyway we should listen to more opinions from other people, though.
http://www.postgresql.org/message-id/20091222172719.8B86.52131E4D@oss.ntt.co.jp

Regards,

--
Fujii Masao

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-18 18:56:56
Message-ID:	CAM3SWZS9470xGoF=wbqSGsmT55A9AniWBU9o75QrXoA8589-Fg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Nov 18, 2013 at 10:49 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> The same idea was proposed before but not committed because
> Itagaki thought that pg_stat_statements view should report only raw values.
> Please read the following thread. I have the same feeling with him.
> Anyway we should listen to more opinions from other people, though.
> http://www.postgresql.org/message-id/20091222172719.8B86.52131E4D@oss.ntt.co.jp

+1 from me. I think that a higher level tool needs to be built on
pg_stat_statements to make things easy for those that want a slick,
pre-packaged solution. As I said on the min/max thread, if we're not
doing enough to help people who would like to build such a tool, we
should discuss how we can do better.

--
Peter Geoghegan

From:	KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To:	Peter Geoghegan <pg(at)heroku(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-19 02:12:41
Message-ID:	528AC919.9040301@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

(2013/11/19 3:56), Peter Geoghegan wrote:
> On Mon, Nov 18, 2013 at 10:49 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> The same idea was proposed before but not committed because
>> Itagaki thought that pg_stat_statements view should report only raw values.
>> Please read the following thread. I have the same feeling with him.
>> Anyway we should listen to more opinions from other people, though.
>> http://www.postgresql.org/message-id/20091222172719.8B86.52131E4D@oss.ntt.co.jp
I confirmed that Itagaki-san and Mr Cerdic disscution. He said that raw values
be just simple. However, were his changes just simple? I cannot understand his
aesthetics sense and also you, too:-(

PG8.4 before:
> 012 CREATE FUNCTION pg_stat_statements(
> 013 OUT userid oid,
> 014 OUT dbid oid,
> 015 OUT query text,
> 016 OUT calls int8,
> 017 OUT total_time float8,
> 018 OUT rows int8
> 019 )

PG9.0 after:
> 012 CREATE FUNCTION pg_stat_statements(
> 013 OUT userid oid,
> 014 OUT dbid oid,
> 015 OUT query text,
> 016 OUT calls int8,
> 017 OUT total_time float8,
> 018 OUT rows int8,
> 019 OUT shared_blks_hit int8,
> 020 OUT shared_blks_read int8,
> 021 OUT shared_blks_written int8,
> 022 OUT local_blks_hit int8,
> 023 OUT local_blks_read int8,
> 024 OUT local_blks_written int8,
> 025 OUT temp_blks_read int8,
> 026 OUT temp_blks_written int8
> 027 )
It's too complicated, and do you know how to tuning PG from information of
local_* and temp_*?
At least, I think that most user cannot tuning from these information, and it
might not be useful information only part of them.

> +1 from me. I think that a higher level tool needs to be built on
> pg_stat_statements to make things easy for those that want a slick,
> pre-packaged solution.
No. It's not for geek tools and people having pre-packaged solution in big
company, but also for common DBA tools.

By the way, MySQL and Oracle database which are very popular have these
statistics. I think that your argument might disturb people who wants to
migration from these database and will accelerate popularity of these database more.

> As I said on the min/max thread, if we're not
> doing enough to help people who would like to build such a tool, we
> should discuss how we can do better.
Could you tell me how to get min/max statistics with low cost?
I'm not sure about detail of your patch in CF, but it seems very high cost.
Repeatedly, I think that if we want to get drastic detail statistics, we have to
create another tools of statistics. Your patch will be these statistics tools.
However, pg_stat_statement sholud be just light weight.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

From:	KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To:	Peter Geoghegan <pg(at)heroku(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
Cc:	Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-19 02:39:51
Message-ID:	528ACF77.7080206@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

(2013/11/19 11:12), KONDO Mitsumasa wrote:
> (2013/11/19 3:56), Peter Geoghegan wrote:
>> On Mon, Nov 18, 2013 at 10:49 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> The same idea was proposed before but not committed because
>>> Itagaki thought that pg_stat_statements view should report only raw values.
>>> Please read the following thread. I have the same feeling with him.
>>> Anyway we should listen to more opinions from other people, though.
>>> http://www.postgresql.org/message-id/20091222172719.8B86.52131E4D@oss.ntt.co.jp
> I confirmed that Itagaki-san and Mr Cerdic disscution. He said that raw values be
> just simple. However, were his changes just simple? I cannot understand his
> aesthetics sense and also you, too:-(
>
> PG8.4 before:
>> 012 CREATE FUNCTION pg_stat_statements(
>> 013 OUT userid oid,
>> 014 OUT dbid oid,
>> 015 OUT query text,
>> 016 OUT calls int8,
>> 017 OUT total_time float8,
>> 018 OUT rows int8
>> 019 )
>
> PG9.0 after:
>> 012 CREATE FUNCTION pg_stat_statements(
>> 013 OUT userid oid,
>> 014 OUT dbid oid,
>> 015 OUT query text,
>> 016 OUT calls int8,
>> 017 OUT total_time float8,
>> 018 OUT rows int8,
>> 019 OUT shared_blks_hit int8,
>> 020 OUT shared_blks_read int8,
>> 021 OUT shared_blks_written int8,
>> 022 OUT local_blks_hit int8,
>> 023 OUT local_blks_read int8,
>> 024 OUT local_blks_written int8,
>> 025 OUT temp_blks_read int8,
>> 026 OUT temp_blks_written int8
>> 027 )
> It's too complicated, and do you know how to tuning PG from information of
> local_* and temp_*?
> At least, I think that most user cannot tuning from these information, and it
> might not be useful information only part of them.
>
>> +1 from me. I think that a higher level tool needs to be built on
>> pg_stat_statements to make things easy for those that want a slick,
>> pre-packaged solution.
> No. It's not for geek tools and people having pre-packaged solution in big
> company, but also for common DBA tools.
>
> By the way, MySQL and Oracle database which are very popular have these
> statistics. I think that your argument might disturb people who wants to
> migration from these database and will accelerate popularity of these database more.
>
>> As I said on the min/max thread, if we're not
>> doing enough to help people who would like to build such a tool, we
>> should discuss how we can do better.
> Could you tell me how to get min/max statistics with low cost?
> I'm not sure about detail of your patch in CF, but it seems very high cost.
> Repeatedly, I think that if we want to get drastic detail statistics, we have to
> create another tools of statistics. Your patch will be these statistics tools.
> However, pg_stat_statement sholud be just light weight.
>
> Regards,
Oh, I forgot to say it...
I beleive that essence of information technology is to show more informative
information in little information with low cost. If it wrong, please let me know
the reason.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-19 03:03:00
Message-ID:	CAM3SWZQpgFaCdz2+1kaRfOUP_jDwAMYX4JWFJjaE6B05xp9Kgw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Nov 18, 2013 at 6:12 PM, KONDO Mitsumasa
<kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I confirmed that Itagaki-san and Mr Cerdic disscution. He said that raw
> values be just simple. However, were his changes just simple? I cannot
> understand his aesthetics sense and also you, too:-(

> It's too complicated, and do you know how to tuning PG from information of
> local_* and temp_*?
> At least, I think that most user cannot tuning from these information, and
> it might not be useful information only part of them.

All of those costs are cumulative aggregates. If we didn't aggregate
them, then the user couldn't possibly determine them on their own, to
any approximation. That's the difference. If you think the local_* and
temp_* aren't very useful, I'm inclined to agree, but it's too late to
do anything about that now.

> No. It's not for geek tools and people having pre-packaged solution in big
> company, but also for common DBA tools.

I don't think that the tool needs to be expensive. If selecting from
the pg_stat_statements view every 1-3 seconds is too expensive for
such a tool, we can have a discussion about being smarter, because
there certainly are ways to optimize it.

Regarding your min/max patch: I'm opposed to adding even more to the
spinlock-protected counters struct, so that we can get an exact answer
to a question where an approximate answer would very likely be good
enough. And as Itagaki-san said 4 years ago, who is to say that what
you've done here for buffers (or equally, what you've done in your
min/max patch) is more interesting than the same thing but for another
cost? The point of having what you've removed from the
pg_stat_statements docs about calculating averages is that it is an
example that can be generalized from. I certainly think there should
be better tooling to make displaying costs over time easier, or
characterizing the distribution, but unfortunately this isn't it.

Something like pg_stat_statements is allowed to be approximate. That's
considered an appropriate trade-off. Most obviously, today there can
be hash table collisions, and some of the entries can therefore be
plain wrong. Previously, I put the probability of 1 collision in the
hash table at about 1% when pg_stat_statements.max is set to 10,000.
So if your min/max patch was "implemented in userspace", and an
outlier is lost in the noise with just one second of activity, I'm not
terribly concerned about that. It's a trade-off, and if you don't
think it's the correct one, well then I'm afraid that's just where you
and I differ. As I've said many times, if you want to have a
discussion about making aggressive snapshotting of the
pg_stat_statements view more practical, I think that would be very
useful.

> By the way, MySQL and Oracle database which are very popular have these
> statistics. I think that your argument might disturb people who wants to
> migration from these database and will accelerate popularity of these
> database more.

I think that there should be better tooling built on top of
pg_stat_statements. I don't know what Oracle does, but I'm pretty sure
that MySQL has nothing like pg_stat_statements. Please correct me if
I'm mistaken.

>> As I said on the min/max thread, if we're not
>> doing enough to help people who would like to build such a tool, we
>> should discuss how we can do better.
>
> Could you tell me how to get min/max statistics with low cost?

See my previous comments on the other thread about making
pg_stat_statements only return changed entries, and only sending the
query text once.

> I'm not sure about detail of your patch in CF, but it seems very high cost.

I think you should actually go and read the code and read my
explanation of it, and refrain from making uninformed remarks like
this. Whatever overhead my patch may add, the important point is that
it obviously and self-evidently adds exactly zero overhead to
maintaining statistics for existing entries - we only care about the
query text when first creating an entry, or when viewing an entry when
the view is selected from. With the reduced shared memory consumption,
more entries can be created, making the cost of creating new entries
(and thus whatever might have been added to that cost) matter less.
Having said that, the additional cost is thought to be very low
anyway.

If you read my mail to the list about this, you'd know that I
specifically worried about the implications of what I'd proposed for
tools like your tool. That's part of the reason I keep urging you to
think about making pg_stat_statements cheaper for consumer tools.
There is no reason to send query texts to such tools more than once
per entry, and no reason to send unchanged entries.

> Repeatedly, I think that if we want to get drastic detail statistics, we
> have to create another tools of statistics. Your patch will be these
> statistics tools. However, pg_stat_statement sholud be just light weight.

This is incomprehensible. As with the cumulative aggregate costs, how
is a consumer of pg_stat_statements possibly going to get the full
query text from anywhere else? It *has* to come from
pg_stat_statements directly.

--
Peter Geoghegan

From:	KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
To:	Peter Geoghegan <pg(at)heroku(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-19 06:08:45
Message-ID:	528B006D.5090001@lab.ntt.co.jp
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

(2013/11/19 12:03), Peter Geoghegan wrote:
> On Mon, Nov 18, 2013 at 6:12 PM, KONDO Mitsumasa
> <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>> I confirmed that Itagaki-san and Mr Cerdic disscution. He said that raw
>> values be just simple. However, were his changes just simple? I cannot
>> understand his aesthetics sense and also you, too:-(
>
>> It's too complicated, and do you know how to tuning PG from information of
>> local_* and temp_*?
>> At least, I think that most user cannot tuning from these information, and
>> it might not be useful information only part of them.
>
> All of those costs are cumulative aggregates. If we didn't aggregate
> them, then the user couldn't possibly determine them on their own, to
> any approximation. That's the difference. If you think the local_* and
> temp_* aren't very useful, I'm inclined to agree, but it's too late to
> do anything about that now.
I regret past decision of Itagaki-san's patch, and improvement might not be
possible. However, we can change it, if we get have logical reason to change it.

>> No. It's not for geek tools and people having pre-packaged solution in big
>> company, but also for common DBA tools.
>
> I don't think that the tool needs to be expensive. If selecting from
> the pg_stat_statements view every 1-3 seconds is too expensive for
> such a tool, we can have a discussion about being smarter, because
> there certainly are ways to optimize it.
I can understand why you say my patch is heavy now! Your monitoring methods are
redically heavy. In general, we get pg_stat_statements view every 1 min - 5min.
It is because monitoring SQLs must not heavier than common main SQLs. If we
measure the real performance, we don't measure with monitoring SQL's cost. And, I
cannot still understand you'd like to collect drastic detail performance of
statements. I'd like to only know max, avg and stddev in each statement. They are
enough, because we can improve them by using these information.

> Regarding your min/max patch: I'm opposed to adding even more to the
> spinlock-protected counters struct, so that we can get an exact answer
> to a question where an approximate answer would very likely be good
> enough. And as Itagaki-san said 4 years ago, who is to say that what
> you've done here for buffers (or equally, what you've done in your
> min/max patch) is more interesting than the same thing but for another
> cost? The point of having what you've removed from the
> pg_stat_statements docs about calculating averages is that it is an
> example that can be generalized from. I certainly think there should
> be better tooling to make displaying costs over time easier, or
> characterizing the distribution, but unfortunately this isn't it.
>
> Something like pg_stat_statements is allowed to be approximate. That's
> considered an appropriate trade-off. Most obviously, today there can
> be hash table collisions, and some of the entries can therefore be
> plain wrong. Previously, I put the probability of 1 collision in the
> hash table at about 1% when pg_stat_statements.max is set to 10,000.
> So if your min/max patch was "implemented in userspace", and an
> outlier is lost in the noise with just one second of activity, I'm not
> terribly concerned about that. It's a trade-off, and if you don't
> think it's the correct one, well then I'm afraid that's just where you
> and I differ. As I've said many times, if you want to have a
> discussion about making aggressive snapshotting of the
> pg_stat_statements view more practical, I think that would be very
> useful.
In summary of your comment, your patch is lower cost than I proposed patch.
Because my patch has long lock problem, on the other hands your patch doesn't
these problem. Is it right? If it true, I can understand your theoretically, but
I'm not sure without real practice or benchmark that it is really or not. So we
will need benchmark test in my patch and yours. I try it.

>> By the way, MySQL and Oracle database which are very popular have these
>> statistics. I think that your argument might disturb people who wants to
>> migration from these database and will accelerate popularity of these
>> database more.
>
> I think that there should be better tooling built on top of
> pg_stat_statements. I don't know what Oracle does, but I'm pretty sure
> that MySQL has nothing like pg_stat_statements. Please correct me if
> I'm mistaken.
I joined the db tech show case 2013 which is held in japan last week. Oracle
speaker intoroduced performance schema and like these in MySQL 5.6. This is the
slide of his. It's almost in japanese, but please see it since 31page. It is
wirtten in SQL.
http://www.slideshare.net/yoyamasaki/20131110-tuning-onmysql56

In MySQL 5.6, it has information which are sum_time, min_time, avg_time, max_time
and sum_lock_time. I think it is useful for understanding our executing statements.

>>> As I said on the min/max thread, if we're not
>>> doing enough to help people who would like to build such a tool, we
>>> should discuss how we can do better.
>>
>> Could you tell me how to get min/max statistics with low cost?
>
> See my previous comments on the other thread about making
> pg_stat_statements only return changed entries, and only sending the
> query text once.
OK.

>> Repeatedly, I think that if we want to get drastic detail statistics, we
>> have to create another tools of statistics. Your patch will be these
>> statistics tools. However, pg_stat_statement sholud be just light weight.
>
> This is incomprehensible. As with the cumulative aggregate costs, how
> is a consumer of pg_stat_statements possibly going to get the full
> query text from anywhere else? It *has* to come from
> pg_stat_statements directly.
I think that it is necessary to make it clear with real test.
I will read your patch and execute benchmark with each our patches.
However, I will travel on business tomorrow until this weekend.
I'm very sorry, but please wait it more.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-19 07:16:05
Message-ID:	CAM3SWZQsZ7hH6H38+SEDr207EJ1ua04F93V4bd7Hn6khJpfLsw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Nov 18, 2013 at 10:08 PM, KONDO Mitsumasa
<kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I regret past decision of Itagaki-san's patch, and improvement might not be
> possible. However, we can change it, if we get have logical reason to change
> it.

I suppose so.

>> I don't think that the tool needs to be expensive. If selecting from
>> the pg_stat_statements view every 1-3 seconds is too expensive for
>> such a tool, we can have a discussion about being smarter, because
>> there certainly are ways to optimize it.
>
> I can understand why you say my patch is heavy now! Your monitoring methods
> are redically heavy. In general, we get pg_stat_statements view every 1 min
> - 5min. It is because monitoring SQLs must not heavier than common main
> SQLs.

Certainly, not everyone would want to take snapshots so frequently.
Most users would not. However, if you're worried about the overhead of
parsing the pg_stat_statements call, and then executing
pg_stat_statements, you're worried about the wrong thing. What about
the overhead of fingerprinting every single statement executed against
the server? Now, as I've already acknowledged I might actually have
regressed things when pg_stat_statements is selected from very
frequently. That increases the need for the tool to support external
aggregation tools well.

> If we measure the real performance, we don't measure with monitoring
> SQL's cost. And, I cannot still understand you'd like to collect drastic
> detail performance of statements. I'd like to only know max, avg and stddev
> in each statement. They are enough, because we can improve them by using
> these information.

It's not just the performance cost of this one addition. It's also a
question of "where do we draw the line?", particularly given the
current lack of atomic portable addition. Most users of
pg_stat_statements don't care about the min/max. Furthermore, I'm not
convinced that you've adequately addressed the question of local
minima and maxima, even in your most recent revision.

> In summary of your comment, your patch is lower cost than I proposed patch.
> Because my patch has long lock problem, on the other hands your patch
> doesn't these problem. Is it right? If it true, I can understand your
> theoretically, but I'm not sure without real practice or benchmark that it
> is really or not. So we will need benchmark test in my patch and yours. I
> try it.

I don't know why you decided to compare our two patches at all -
what's the point? How are you going to benchmark my patch? If the
answer is pgbench, let me save you some time: there is no added
overhead there, because any additional cost that may exist is paid
only once per entry (at a time when an exclusive lock already needs to
be acquired on the entire table to create a new entry anyway).
Furthermore, my patch might actually be cost negative (i.e. faster),
because we can create a query text in the external file with only a
shared lock held (just using a spinlock to read-and-advance the offset
into the external file), rather than, with an *exclusive* lock held,
performing memcmp() to the 1KiB text shared memory buffer.

> I joined the db tech show case 2013 which is held in japan last week. Oracle
> speaker intoroduced performance schema and like these in MySQL 5.6. This is
> the slide of his. It's almost in japanese, but please see it since 31page.
> It is wirtten in SQL.
> http://www.slideshare.net/yoyamasaki/20131110-tuning-onmysql56
>
> In MySQL 5.6, it has information which are sum_time, min_time, avg_time,
> max_time and sum_lock_time. I think it is useful for understanding our
> executing statements.

Based on a reading of
https://dev.mysql.com/doc/refman/5.6/en/events-statements-current-table.html,
I don't think that is comparable to pg_stat_statements. It looks like
this doesn't do any normalization - if it did, why is there "one row
per thread"?

>> This is incomprehensible. As with the cumulative aggregate costs, how
>> is a consumer of pg_stat_statements possibly going to get the full
>> query text from anywhere else? It *has* to come from
>> pg_stat_statements directly.
>
> I think that it is necessary to make it clear with real test.

I'd have started with a benchmark of your changes to
pg_stat_statements for an unsympathetic case. Even though there is
surely some demand for what you're asking for, you haven't adequately
demonstrated or even argued that what you've proposed is the right way
to do it, nor have you even tried to quantify the costs, or given any
indication that you've considered them.

--
Peter Geoghegan

From:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To:	Peter Geoghegan <pg(at)heroku(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-19 20:12:46
Message-ID:	CAMkU=1xCuKvYzbT_bX-L+_Xe-Bo6C7bXXcne+=1LfCSc9maBHg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Nov 18, 2013 at 10:56 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Mon, Nov 18, 2013 at 10:49 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
> wrote:
> > The same idea was proposed before but not committed because
> > Itagaki thought that pg_stat_statements view should report only raw
> values.
> > Please read the following thread. I have the same feeling with him.
> > Anyway we should listen to more opinions from other people, though.
> >
> http://www.postgresql.org/message-id/20091222172719.8B86.52131E4D@oss.ntt.co.jp
>
> +1 from me.

That's +1 for *not* including this? If so, I agree as well. It would be
easy enough to create a sql view that computes this if one wanted (unlike
the min and max execution time, which can't be done externally). I also
have a theological opposition to exposing the buffer hit ratio. There are
two ways to improve the buffer hit ratio. One is to have fewer misses,
which is worthwhile but you can more easily do it by looking directly at
the number of misses or misses per execution, rather than a buffer hit
ratio. Or you can dilute out the misses by gratuitously increasing the
"hits" by uselessly reading cached buffers over and over again, which is
counter-productive and evil and perverse.

Take a small to medium look-up table, drop all the indexes so it has to be
full scanned all the time, and maybe rebuild it with a lower fillfactor
(but not so low that it becomes too big to cache), and watch that buffer
hit ratio go through the roof, while true performance goes the other way.

> I think that a higher level tool needs to be built on
> pg_stat_statements to make things easy for those that want a slick,
> pre-packaged solution. As I said on the min/max thread, if we're not
> doing enough to help people who would like to build such a tool, we
> should discuss how we can do better.
>

I'd like to be able to separate queries by the application_name
and/or client_addr which issued them.

Cheers,

Jeff

From:	Peter Geoghegan <pg(at)heroku(dot)com>
To:	Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-11-20 04:30:11
Message-ID:	CAM3SWZTWbHDtF124a4u_Pk1iQYTQonhZGvEZkdLb-7+vxhMxFg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Nov 19, 2013 at 12:12 PM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
>> > The same idea was proposed before but not committed because
>> > Itagaki thought that pg_stat_statements view should report only raw
>> > values.
>> > Please read the following thread. I have the same feeling with him.
>> > Anyway we should listen to more opinions from other people, though.
>> >
>> > http://www.postgresql.org/message-id/20091222172719.8B86.52131E4D@oss.ntt.co.jp
>>
>> +1 from me.
>
>
> That's +1 for *not* including this?

Right. Unfortunately, discussion of the other pg_stat_statements
patches spilled into this thread.

>> I think that a higher level tool needs to be built on
>> pg_stat_statements to make things easy for those that want a slick,
>> pre-packaged solution. As I said on the min/max thread, if we're not
>> doing enough to help people who would like to build such a tool, we
>> should discuss how we can do better.
>
>
> I'd like to be able to separate queries by the application_name and/or
> client_addr which issued them.

I see why that would be desirable (it would also be desirable to
sometimes *not* break costs down by user), but it's hard to see a way
to make that work without hurting cases where it doesn't matter (i.e.
the majority of cases).

So thinking about it some more, you could hack together a patch fairly
easily that adds a GUC that makes the current userid a union. A
setting remembers the current preference as to what that actually
stores. You remember the setting on shutdown, when serializing the
stats, in the file header. You can set the setting to hash a whole
bunch of alternative things along the lines you refer to here, such as
client_addr or application_name.

I can see why some aspects of that would be controversial, but it
could probably be made to work. Maybe the hash is used in a second
shared dynahash, where the original values are looked-up, so a new
text column is added to the view that displays things as needed that
way (when the GUC isn't set to its default, userid is NULL, else this
new column is NULL). It would probably be okay to make this spill to
disk, simply because it wouldn't come up that often, and you could be
clever about the locking. Notably, something like a client_addr is
stable within a backend, so it would be cheap to cache the hash value.

The logical/physical I/O divide is probably the main factor that makes
the track_io_timing stuff compelling.

--
Peter Geoghegan

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	Peter Geoghegan <pg(at)heroku(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc:	Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-12-02 21:42:07
Message-ID:	529CFEAF.90901@gmx.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 11/19/13, 11:30 PM, Peter Geoghegan wrote:
>>> +1 from me.
>> >
>> >
>> > That's +1 for *not* including this?
> Right.

I agree with not including this.

From:	Magnus Hagander <magnus(at)hagander(dot)net>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	Peter Geoghegan <pg(at)heroku(dot)com>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>, Haribabu kommi <haribabu(dot)kommi(at)huawei(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Improvement of pg_stat_statement usage about buffer hit ratio
Date:	2013-12-04 09:54:54
Message-ID:	CABUevEy9c=Wxtn2sHHUA9AADNg9FyenjuKqGYRTst6bvdUuvWg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Dec 2, 2013 at 10:42 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> On 11/19/13, 11:30 PM, Peter Geoghegan wrote:
> >>> +1 from me.
> >> >
> >> >
> >> > That's +1 for *not* including this?
> > Right.
>
> I agree with not including this.
>
>
If you're looking for more of those, here's another +1 for not including
it. (And of course also for Peters comment about that we need to figure out
if something is actually missing for building this higher level tool that
can provide this information and more)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/