pg_stat_statements

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: pg_stat_statements
Date: 2008-06-13 08:33:18
Message-ID: 20080613173157.7FEA.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers

Hello,

Postgres 8.4 has the pg_stat_user_functions view to track the number of calls
to stored functions and the time spent in them. I'm now thinking about an
"SQL statement" version of a similar view -- pg_stat_statements.

Prepared statements and statements using the extended protocol would be
grouped by their SQL strings without parameters, just the same as in
pg_stat_user_functions. We could ignore simple queries with inlined
parameters, because their text differs on every execution.

We can already write SQL statements to the server logs and gather them with
tools (pgFouine and PQA) even now, but statement logging has unignorable
overhead. A lightweight view would be useful for typical users who are only
interested in aggregated results.

One issue is how and where to store the SQL strings. We could use hash values
of the statement strings as short identifiers, but we would need to store the
SQL strings somewhere to map the IDs back to the original statements.

1. Store SQLs in shared memory
We would need to allocate a fixed region at server start. Should we add
another memory setting to postgresql.conf?

2. Store SQLs in stats collector process's memory
We could use dynamically allocated memory, but sending SQL statements to the
stats collector process is probably slow, and the stats file would become large.

I'm not sure which is better. This might also be relevant to the discussion
of shared prepared statements.

Another question is whether we should implement the feature as an add-on
rather than a core feature. We could use the general hooks for this purpose:
store SQL statements and their hash values in planner_hook, and record the
number of executions and the elapsed time in new executor begin/end hooks,
or by adding a "stop-watch" executor node. Should this feature be in core
or not? For example, dynamic shared memory allocation might be needed
before we move the feature into core.

Comments and suggestions welcome.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_statements
Date: 2008-06-13 15:52:19
Message-ID: 11737.1213372339@sss.pgh.pa.us
Lists: pgsql-hackers

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> Postgres 8.4 has pg_stat_user_functions view to track number of calls of
> stored functions and time spent in them. Then, I'm thinking a "sql statement"
> version of similar view -- pg_stat_statements.

We don't have any system-wide names for statements, so this seems
pretty ill-defined and of questionable value. Showing the text of
statements in a view also has security problems.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_statements
Date: 2008-06-13 16:58:22
Message-ID: 4852A72E.3040909@agliodbs.com
Lists: pgsql-hackers

ITAGAKI Takahiro wrote:
> Hello,
>
> Postgres 8.4 has pg_stat_user_functions view to track number of calls of
> stored functions and time spent in them. Then, I'm thinking a "sql statement"
> version of similar view -- pg_stat_statements.

I can see how this would be useful, but I can also see that it could be a
huge performance burden when activated. So it couldn't be part of the
standard statistics collection.

--Josh


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_statements
Date: 2008-06-13 17:24:52
Message-ID: 1213377892.25121.216.camel@ebony.site
Lists: pgsql-hackers


On Fri, 2008-06-13 at 17:33 +0900, ITAGAKI Takahiro wrote:
> We can write sql statements in server logs and gather them using some
> tools (pgfouine and pqa) even now, but statement logging has
> unignorable overhead.

I would prefer to look at ways to reduce the current overhead rather
than change to another system entirely.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: info(at)2ndquadrant(dot)com
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_statements
Date: 2008-06-13 19:47:47
Message-ID: 14698.1213386467@sss.pgh.pa.us
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> On Fri, 2008-06-13 at 17:33 +0900, ITAGAKI Takahiro wrote:
>> We can write sql statements in server logs and gather them using some
>> tools (pgfouine and pqa) even now, but statement logging has
>> unignorable overhead.

> I would prefer to look at ways to reduce the current overhead rather
> than change to another system entirely.

It's also the case that adding to the stats collector is hardly going to
have "ignorable" overhead. (And you're dreaming if you think you can do
it like pg_stat_activity, because this can't be a fixed-size array with
a trivial indexing scheme.)

regards, tom lane


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject: Re: pg_stat_statements
Date: 2008-06-15 02:14:10
Message-ID: 200806142214.10835.xzilla@users.sourceforge.net
Lists: pgsql-hackers

On Friday 13 June 2008 12:58:22 Josh Berkus wrote:
> ITAGAKI Takahiro wrote:
> > Hello,
> >
> > Postgres 8.4 has pg_stat_user_functions view to track number of calls of
> > stored functions and time spent in them. Then, I'm thinking a "sql
> > statement" version of similar view -- pg_stat_statements.
>
> I can see how this would be useful, but I can also see that it could be a
> huge performance burden when activated. So it couldn't be part of the
> standard statistics collection.
>

A lower-overhead way to get at this type of information is to quantize DTrace
results over a specific period of time. Much nicer than doing the whole
logging/analysis piece.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_statements
Date: 2008-06-16 02:11:40
Message-ID: 20080616105922.7514.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> > Postgres 8.4 has pg_stat_user_functions view to track number of calls
> > of stored functions and time spent in them. Then, I'm thinking a
> > "sql statement" version of similar view -- pg_stat_statements.
>
> We don't have any system-wide names for statements, so this seems
> pretty ill-defined and of questionable value. Showing the text of
> statements in a view also has security problems.

Thanks. I see that I have to address the security problems as best I can.

I'm thinking of using hash values as system-wide IDs. Users who don't have
the required permissions would see only those meaningless values; the SQL
strings would be hidden, just as in pg_stat_activity.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: pg_stat_statements
Date: 2008-06-16 02:31:59
Message-ID: 20080616110358.7517.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers


Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> wrote:

> On Friday 13 June 2008 12:58:22 Josh Berkus wrote:
> > I can see how this would be useful, but I can also see that it could be a
> > huge performance burden when activated. So it couldn't be part of the
> > standard statistics collection.
>
> A lower overhead way to get at this type of information is to quantize dtrace
> results over a specific period of time. Much nicer than doing the whole
> logging/analyze piece.

DTrace is disabled by default in most installations and cannot be used on
some platforms (in particular, I want to use this feature on Linux). I think
DTrace is known as a tool for developers, not for DBAs. However, statement
logging is required by DBAs who are used to STATSPACK in Oracle.

I will try to measure the overhead of logging with several implementations:
1. Log statements and dump them into the server logs.
2. Log statements and filter them before they are written.
3. Store statements in shared memory.

I know 1 is slow, but I don't know which part of it is really slow.
If the cause is writing statements to disk, 2 would be a solution;
3 will be needed if sending statements to the logger is itself the
source of the overhead.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, "Josh Berkus" <josh(at)agliodbs(dot)com>
Subject: Re: pg_stat_statements
Date: 2008-06-16 02:37:01
Message-ID: a778a7260806151937nfda12advcb8272fc3ca41c11@mail.gmail.com
Lists: pgsql-hackers

I understand there must be "some" overhead because we're collecting
extra info. I'm curious whether there's a considerable amount of overhead
for users who don't want such an additional trace.

2008/6/16 ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>:
>
> Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> wrote:
>
>> On Friday 13 June 2008 12:58:22 Josh Berkus wrote:
>> > I can see how this would be useful, but I can also see that it could be a
>> > huge performance burden when activated. So it couldn't be part of the
>> > standard statistics collection.
>>
>> A lower overhead way to get at this type of information is to quantize dtrace
>> results over a specific period of time. Much nicer than doing the whole
>> logging/analyze piece.
>
> DTrace is disabled in most installation as default, and cannot be used in
> some platforms (especially I want to use the feature in Linux). I think
> DTrace is known as a tool for developers, but not for DBAs. However,
> statement logging is required by DBAs who used to use STATSPACK in Oracle.
>
>
> I will try to measure overheads of logging in some implementation:
> 1. Log statements and dump them into server logs.
> 2. Log statements and filter them before to be written.
> 3. Store statements in shared memory.
>
> I know 1 is slow, but I don't know what part of it is really slow;
> If the reason is to write statements into disks, 2 would be a solution.
> 3 will be needed if sending statements to the logger itself is the reason
> of the overhead.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>
>

--
------
Koichi Suzuki


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Permission of prepared statements (was: pg_stat_statements)
Date: 2008-06-16 08:04:26
Message-ID: 20080616162234.752E.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> We don't have any system-wide names for statements, so this seems
> pretty ill-defined and of questionable value. Showing the text of
> statements in a view also has security problems.

I found that we can execute prepared statements and view their SQL source
through pg_prepared_statements even after executing SET SESSION AUTHORIZATION.
Is this expected behavior?

It is not a problem in normal use, because the *real* user is the same
before and after changing roles, but we should be careful about
sharing connections between different users in connection pooling.
Most connection poolers don't do that, though.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Josh Berkus <josh(at)agliodbs(dot)com>
Subject: Re: pg_stat_statements
Date: 2008-06-16 14:34:32
Message-ID: 200806161034.33314.xzilla@users.sourceforge.net
Lists: pgsql-hackers

On Sunday 15 June 2008 22:31:59 ITAGAKI Takahiro wrote:
> Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> wrote:
> > On Friday 13 June 2008 12:58:22 Josh Berkus wrote:
> > > I can see how this would be useful, but I can also see that it could be
> > > a huge performance burden when activated. So it couldn't be part of
> > > the standard statistics collection.
> >
> > A lower overhead way to get at this type of information is to quantize
> > dtrace results over a specific period of time. Much nicer than doing the
> > whole logging/analyze piece.
>
> DTrace is disabled in most installation as default, and cannot be used in
> some platforms (especially I want to use the feature in Linux). I think
> DTrace is known as a tool for developers, but not for DBAs. However,
> statement logging is required by DBAs who used to use STATSPACK in Oracle.
>
>

For some reason it has gotten that reputation in this community, but that
really is not the case. As Sun describes it, DTrace "provides a powerful
infrastructure to permit administrators, developers, and service personnel to
concisely answer arbitrary questions about the behavior of the operating
system and user programs". Given that PostgreSQL relies on the operating
system for a majority of its instrumentation (i.e., we have nothing like the
v$ tables in Oracle), we should really be thinking of DTrace as the ultimate
tool for DBAs to figure out what is going on in their systems.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
To: "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "Josh Berkus" <josh(at)agliodbs(dot)com>
Subject: Re: pg_stat_statements
Date: 2008-06-16 14:54:38
Message-ID: 36e682920806160754r72ba68f9te2e7f6b34e3f4d2c@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jun 16, 2008 at 10:34 AM, Robert Treat
<xzilla(at)users(dot)sourceforge(dot)net> wrote:
> Given that PostgreSQL relies on the operating
> system for a majority of its instrumentation (i.e., we have nothing like the
> v$ tables in Oracle), we should really be thinking of DTrace as the ultimate
> tool for DBAs to figure out what is going on in their systems.

Over a year ago, we added an Oracle wait-event-like interface
(counters and timers) to the core of Advanced Server which not only
provides everything DTrace currently does, but is also cross-platform
and is < 300 lines of code in the core.

While DTrace has a little less overhead due to its use of a user-mode
high-resolution timer, neither benchmarks nor customers have found our
implementation to cause any performance degradation when timing is
enabled.

It wouldn't be too hard to write the probes in such a way as they
could be used by DTrace or by a loadable timing/counter implementation
for platforms which don't support DTrace.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation | fax: 732.331.1301
499 Thornall Street, 2nd Floor | jonah(dot)harris(at)enterprisedb(dot)com
Edison, NJ 08837 | http://www.enterprisedb.com/


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: "Robert Treat" <xzilla(at)users(dot)sourceforge(dot)net>, pgsql-hackers(at)postgresql(dot)org, "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Subject: Re: pg_stat_statements
Date: 2008-06-16 19:07:40
Message-ID: 200806161207.42533.josh@agliodbs.com
Lists: pgsql-hackers

Jonah,

> It wouldn't be too hard to write the probes in such a way as they
> could be used by DTrace or by a loadable timing/counter implementation
> for platforms which don't support DTrace.

I was under the impression that's the way our feature, the "Generic
Monitoring Interface", was written. It's a macro: if someone can deliver
another trace framework, you can plug it right in.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_statements
Date: 2008-06-23 06:22:52
Message-ID: 20080623150026.946B.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers

I wrote:
> I will try to measure overheads of logging in some implementation:
> 1. Log statements and dump them into server logs.
> 2. Log statements and filter them before to be written.
> 3. Store statements in shared memory.
> I know 1 is slow, but I don't know what part of it is really slow;

I tested the overhead of SQL logging with pgbench:
$ pgbench -s10 -c10 -t10000 -n -S -M prepared

logging type | tps | %
-----------------------+-------+--------
0. no logging | 10651 | 100.0%
1. log to pg_log/* | 6535 | 61.4%
2. log to /dev/null | 8347 | 78.4%
3. store in memory | 10280 | 96.5%

As expected, 1 is about 40% slower than the no-logging setting. Also,
filtering logs before they are written to files does not seem to be a
perfect solution: redirecting logs to /dev/null is the *fastest* possible
filter, yet there was still about 30% overhead. On the other hand, 3 has
only 3.5% overhead.

I think storing SQL in server memory is worth trying even if there
are some difficulties, memory management for example. We could use either
hooks or DTrace for the purpose, but I'm working on the hook method
because of portability.

I'll send a core patch and an extension module to -patches. I hope that only
the core patch needs to be applied; the extension module would be better
developed separately from core.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_statements
Date: 2008-06-23 13:16:56
Message-ID: 1214227016.9468.664.camel@ebony.site
Lists: pgsql-hackers


On Mon, 2008-06-23 at 15:22 +0900, ITAGAKI Takahiro wrote:
> I wrote:
> > I will try to measure overheads of logging in some implementation:
> > 1. Log statements and dump them into server logs.
> > 2. Log statements and filter them before to be written.
> > 3. Store statements in shared memory.
> > I know 1 is slow, but I don't know what part of it is really slow;
>
> I tested overheads of SQL logging with pgbench.
> $ pgbench -s10 -c10 -t10000 -n -S -M prepared
>
> logging type | tps | %
> -----------------------+-------+--------
> 0. no logging | 10651 | 100.0%
> 1. log to pg_log/* | 6535 | 61.4%
> 2. log to /dev/null | 8347 | 78.4%
> 3. store in memory | 10280 | 96.5%
>
> As expected, 1 is 40% slower than no logging settings. Also, filtering
> logs before written into files seems not to be a perfect solution.
> Redirecting logs to /dev/null is the *fastest* filter, but there was
> 30% of overhead. On the other hand, 3 has only 3.5% of overhead.
>
> I think storing SQLs in server memory is worth trying even if there
> are some troubles, for example, memory management. We can use either
> hooks and dtrace for the purpose, but I'm working hook-method because
> of portability.
>
> I'll send a core patch and an extension module to -patches. I hope only
> the patch is to be applied in the core. The extension module would be
> better to be developed separately from the core.

I think you want to see the distribution of execution times for
particular queries without needing to log *every* execution, including
parameters. I think I now understand what you are asking for and why you
are asking for it.

How about we have another log mode, call it log_statement_summary, where
we keep track of the running average of the re-execution time of each
protocol-3 prepared plan. Once we have 40 executions of a plan, we log
any statement, with parameters, whose execution time is more than
twice the running average. That way we have an automatic filter to
reduce the amount of logging, yet without changing any current tuning
methodologies or tools.

We could also have a function that causes each backend to dump the
current averages of all plans through to the stats collector, so you can
assemble a global view. But that should be on-demand, not a continuous
flow of stats, IMHO.

I'd been thinking about ways to specify "desired execution time" for any
query, so we could log only those queries that aren't performing as
expected.

Yes, doing this as a plugin makes a lot of sense for me.

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_statements
Date: 2008-06-24 03:14:57
Message-ID: 20080624113235.7445.52131E4D@oss.ntt.co.jp
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:

> I think you want to see the distribution of execution times for
> particular queries without needing to log *every* execution, including
> parameters. I think I understand now what you are asking for and why you
> are asking for it.

Yes. In many cases, the major influential queries don't depend on their
parameters. Also, averages of execution time are useful if you want to
log individual queries that take a long time compared to the average.
For example, you could use a configuration like:
"If an execution time is three times longer than the average for the same
plan, write the explain tree to the log".

That reminds me: this might be related to the auto-explain patch.
Has it been given up?
http://archives.postgresql.org/pgsql-patches/2008-01/msg00123.php

> We could also have a function that causes each backend to dump the
> current averages of all plans through to the stats collector, so you can
> assemble a global view. But that should be on-demand, not a continuous
> flow of stats, IMHO.

Hmm, that's an interesting idea. It doesn't require any additional shared
memory, and the maximum length of a stats message would be enough for typical
queries (PGSTAT_MSG_PAYLOAD is 1000 bytes or less). I'm not sure
how to *notify* all backends to send their plans to the collector, but we
could have them send plans periodically instead. A bloated pgstat.stat file
might be another problem, and I have no idea how to handle it.

I think the stats collector approach is no more efficient than the shared
memory approach, but it is more flexible. I'll reconsider and compare them...

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_stat_statements
Date: 2008-06-24 07:07:50
Message-ID: 1214291270.9468.809.camel@ebony.site
Lists: pgsql-hackers


On Tue, 2008-06-24 at 12:14 +0900, ITAGAKI Takahiro wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>
> > I think you want to see the distribution of execution times for
> > particular queries without needing to log *every* execution, including
> > parameters. I think I understand now what you are asking for and why you
> > are asking for it.
>
> Yes. In many cases, the major influential queries don't depend on their
> parameters. Also, averages of execution time are useful if you want to
> log individual queries that take a long time compared to the average.
> For example, you could use a configuration like:
> "If an execution time is three times longer than the average for the same
> plan, write the explain tree to the log".

Yes, that's good.

> That reminds me, it might have association with Auto-explain patch.
> Is it given up?
> http://archives.postgresql.org/pgsql-patches/2008-01/msg00123.php

Not by me. If the author doesn't finish it someone else will.

> > We could also have a function that causes each backend to dump the
> > current averages of all plans through to the stats collector, so you can
> > assemble a global view. But that should be on-demand, not a continuous
> > flow of stats, IMHO.
>
> Hmm, it's interesting idea. It doesn't require any additional shared
> memory and max length of stat message would be enough for typical
> queries (PGSTAT_MSG_PAYLOAD is 1000 bytes or less).

There is a related issue here. Sending the text of every query to the
stats collector is an overhead for the same reason sending the text to
the log is an overhead. The solution should be the same as well: don't send
the text repeatedly.

The reason we don't increase PGSTAT_MSG_PAYLOAD is that we don't want to
incur more overhead. But if we work out a way to reduce the
frequency of messages, we might be able to increase PGSTAT_MSG_PAYLOAD,
so that we see the whole text when we want it, but don't send it when nobody
is looking. Now that's a gorgeous solution.

For protocol 3 we should be sending an ID, not the whole text, each time
we execute. But that won't solve the problem for simple queries.

So pg_stat_get_activity() should poll backends for the SQL of the
currently executing query, rather than having it continually sent.

> I'm not sure
> how to *notify* all backends to send plans to collector, but we could
> send plans periodically instead. Bloated pgstat.stat file might be
> another problem, and I have no idea about it.

The code is already there and, as of a few days ago, unused:
PMSIGNAL_WAKEN_CHILDREN

> I think the stats collector way is not more than the shared memory way in
> efficiency, but it is more flexible. I'll reconsider and compare them...

--
Simon Riggs www.2ndQuadrant.com
PostgreSQL Training, Services and Support