Adding getrusage profiling data to EXPLAIN output

Lists: pgsql-hackers
From: Greg Stark <stark(at)mit(dot)edu>
To: "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject: Adding getrusage profiling data to EXPLAIN output
Date: 2010-10-01 00:16:52
Message-ID: AANLkTimOkx8fOcBsdhZnMHKKfT7=g-WRiuvoQwbYe-4f@mail.gmail.com

Attached is a patch that adds getrusage data to EXPLAIN output.
This is the patch I mentioned previously in
http://archives.postgresql.org/pgsql-hackers/2010-02/msg00684.php and
it raises the same issues we were talking about then: should the
resource usage stats displayed be per-iteration figures, or totals for
the whole query execution for that node? I can't see dividing by
nloops making things clearer, but the discrepancy is becoming more
noticeable. I wonder if there's anything we can do to make things
clearer.

Incidentally, this is a first cut at the patch, written in a single
sitting; I need to go through it again to make sure I didn't do
anything silly. One thing I know is broken is Windows support. I
fleshed out our getrusage stub a bit but I can't test it at all, and
the INSTR_TIME_* macros are defined differently on Windows, so I
can't use them quite so simply on struct timeval. The simplest
option is to always define the struct timeval versions, even when
INSTR_TIME_* doesn't use them, so this code can use them directly.
Another would be to modify the struct rusage definition on Windows to
use the native Windows time datatype -- which would be tempting
since it would avoid the loss of precision of "only" having
microsecond precision.

postgres=# explain (analyze, resource) select * from x;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Seq Scan on x  (cost=0.00..11540.00 rows=800000 width=4) (actual time=69.851..1287.025 rows=800000 loops=1)
   Resources: sys=240.000ms user=940.000ms read=27.7MB
 Total runtime: 2431.237 ms
(3 rows)

[...flush buffers and drop caches between tests...]

postgres=# explain (analyze, verbose, buffers, resource) select * from x;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on public.x  (cost=0.00..11540.00 rows=800000 width=4) (actual time=28.739..1290.786 rows=800000 loops=1)
   Output: generate_series
   Resources: sys=280.000ms user=890.000ms read=27.7MB minflt=65 nvcsw=9 nivcsw=633
   Buffers: shared read=3540
 Total runtime: 2487.629 ms
(5 rows)

--
greg

Attachment Content-Type Size
explain-getrusage.diff text/x-patch 13.1 KB

From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Adding getrusage profiling data to EXPLAIN output
Date: 2010-10-01 01:51:41
Message-ID: AANLkTin_Le9EoBVu-eJjvaMg_vcCJVAKcfnNMCuMB3Fj@mail.gmail.com

On Fri, Oct 1, 2010 at 9:16 AM, Greg Stark <stark(at)mit(dot)edu> wrote:
> Attached is a patch to display getrusage output to EXPLAIN output.
> This is the patch I mentioned previously in
> http://archives.postgresql.org/pgsql-hackers/2010-02/msg00684.php and
> it raises the same issues we were talking about then.

How much overhead does the "resource" option add?
Calling getrusage() for each tuple might have considerable overhead.
How much difference is there between the (analyze) and (analyze, resource) options?

Auto_explain and pg_stat_statements could also be adjusted to the change
once the patch is accepted. I have been asked for "queries ordered by CPU
time" in pg_stat_statements several times; the getrusage infrastructure
would make that a real possibility.

--
Itagaki Takahiro


From: Greg Stark <stark(at)mit(dot)edu>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Adding getrusage profiling data to EXPLAIN output
Date: 2010-10-01 18:14:10
Message-ID: AANLkTikWGw5EZzJYq_NR+UpLpjy7Yp+h-Xof4AUwZvkQ@mail.gmail.com

On Thu, Sep 30, 2010 at 6:51 PM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
> How much overhead do you have with "resource" option?
> getrusage() calls for each tuple might have considerable overheads.
> How much difference between (analyze) and (analyze, resource) options?

Here's strace -c for a select count(*) from x where x is a table with
800,000 narrow rows entirely in shared buffers.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.56    0.025299           0   6400024           getrusage
  0.44    0.000112           0      9872           read
  0.00    0.000000           0         1           open
  0.00    0.000000           0        13           lseek
  0.00    0.000000           0        10           sendto
  0.00    0.000000           0         5           recvfrom
------ ----------- ----------- --------- --------- ----------------
100.00    0.025411                6409925           total

That's under 40ns per getrusage call. My test C program seems to take
about 300ns per getrusage call, however, so I'm not sure I trust the
strace results. Either way that's pretty negligible; it's basically
just copying a struct from kernel space to user space.

Of course it depends on what query you run. In my very simple example
of doing select count(*) from x, where x has 800k rows, the whole query
takes about 100ms. Since there are two nodes (the seq scan and the
aggregate) it has to do 3.2 million getrusage calls, which is about
960ms total. So it imposes nearly a 10x slowdown, but only because
the base measurement was so low. For what it's worth, that's less than
the overhead from gettimeofday, which seems to impose about 1100ms of
overhead.

The observed timings are:
query only                   100.5ms
explain (analyze)           1234.9ms
explain (analyze, resource) 2210.2ms

--
greg


From: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Adding getrusage profiling data to EXPLAIN output
Date: 2010-10-04 10:29:43
Message-ID: AANLkTi=YDS6fptRpVuJYQsmM5w7vgpn3YmxWErsRrNM8@mail.gmail.com

On Sat, Oct 2, 2010 at 3:14 AM, Greg Stark <stark(at)mit(dot)edu> wrote:

> The observed timings are
> query only                 100.5ms
> explain (analyze)         1234.9ms
> explain (analyze,resource) 2210.2ms

That's a very large overhead, but I think it is still useful
as long as there is no overhead when the resource option is disabled.

Some random ideas and comments:
* Top-level rusage alone might be useful as a replacement for the
ancient log_xxx_stats, and should have smaller overhead than
full per-node counters.
* There is some overlap between this feature and the DTrace hooks.
If we need such an extension even though we have DTrace hooks,
it might mean the DTrace hooks are hard to use for average users,
and maybe also for postgres hackers...
* Some kind of statistical sampling could address the high overhead
of the stopwatch-based approach. Sampling is not always accurate,
but it might strike a better balance between overhead and resolution.

--
Itagaki Takahiro


From: Greg Stark <stark(at)mit(dot)edu>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Adding getrusage profiling data to EXPLAIN output
Date: 2010-10-04 16:20:06
Message-ID: AANLkTimxF64rwF7DW2no++yDd9hU_Zk4C3v93DhV65e6@mail.gmail.com

On Mon, Oct 4, 2010 at 3:29 AM, Itagaki Takahiro
<itagaki(dot)takahiro(at)gmail(dot)com> wrote:
>  * There are some overlaps between the feature and DTrace hooks.
>   If we need such extension even though we have DTrace hooks,
>   it might mean DTrace hooks are hard to use for average users
>   and maybe also for postgres' hackers...

I've had it on my radar for a long time to make use of DTrace directly
in EXPLAIN output. The DTrace folk are really not interested in
programmatic access to the counter interfaces because they're
afraid it would nail down internal interfaces too early. But I do
think there's lots of potential there.

>  * Some kinds of statistical sampling could solve too much overhead
>   in stop-watch based algorithm. Sampling is not always accurate,
>   but it might have better balance between overhead and resolution.
>

Tom experimented with doing statistical sampling and found the results
were even worse than the overhead from stopwatch profiling. I think we
should be able to subtract the overhead from the measurements if it's
a problem. But actually, in the case of getrusage stats, I don't think
it's an issue. The main interesting bits in getrusage are inblock and
oublock, which don't have any profiling overhead -- getrusage doesn't
do any I/O. If you're only concerned with the CPU run time of an
in-memory query, you don't really need to look at getrusage at all.

--
greg