Re: Buffer usage in EXPLAIN and pg_stat_statements (review)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Buffer usage in EXPLAIN and pg_stat_statements (review)
Date: 2009-10-13 15:41:40
Message-ID: 8108.1255448500@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Itagaki Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> Here is an update version of buffer usage patch.

I started to look at this patch, and I have a few comments:

1. I was expecting this patch to get rid of ShowBufferUsage() and friends
altogether, instead of adding yet more static counters to support them.
Isn't that stuff pretty well superseded by having EXPLAIN support?

2. I do not understand the stuff with propagating counts into the top
instrumentation node. That seems like it's going to double-count those
counts. In any case it is 100% inconsistent to propagate only buffer
counts that way and not any other resource usage. I think you should
drop the TopInstrument variable and the logic that propagates counts up.

3. I don't believe that you've sufficiently considered the problem of
restoring the previous value of CurrentInstrument after an error. It is
not at all adequate to do it in postgres.c; consider subtransactions
for example. However, so far as I can see that variable is useless
anyway. Couldn't you just drop both that and the "prev" link?
(If you keep TopInstrument then the same objection applies to it.)

4. I don't believe this counting scheme works, except in the special
case where all buffer access happens in leaf plan nodes (which might be
enough if it weren't for Sort, Materialize, Hash, etc). It looks to
me like counts will be transferred into the instrumentation node for
the next plan node to stop execution, which could be a descendant of
the node that really ought to get charged.

You could deal with #4 by having the low-level I/O routines accumulate
counts directly into *CurrentInstrument and not have static I/O counters
at all, but then you'd have to contend with fixing #3 properly instead
of just eliminating that global variable. It might be better to add a
"start" field to struct Instrumentation for each counter, and do
something like this:
* StartNode copies static counter into start field
* StopNode computes delta = static counter - start field,
then adds delta to node's count and resets counter to start
The reason for the reset is so that the I/O isn't double counted by
parent nodes. If you wanted buffer I/O to be charged to the node
causing it *and* to all parent nodes, which would be more consistent
with the way we charge CPU time, then don't do the reset. Offhand
though that seems to me like it'd be more surprising than useful.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2009-10-13 15:45:47 Re: Client application name
Previous Message Andrew Dunstan 2009-10-13 15:34:29 Re: Client application name