Re: That EXPLAIN ANALYZE patch still needs work

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: That EXPLAIN ANALYZE patch still needs work
Date: 2006-06-06 21:05:20
Message-ID: 20060606210519.GC45331@pervasive.com
Lists: pgsql-hackers

On Tue, Jun 06, 2006 at 04:50:28PM -0400, Tom Lane wrote:
> I have a theory about this, and it's not pleasant at all. What I
> think is that we have a Heisenberg problem here: the act of invoking
> gettimeofday() actually changes what is measured. That is, the
> runtime of the "second part" of ExecProcNode is actually longer when
> we sample than when we don't, not merely due to the extra time spent
> in gettimeofday(). It's not very hard to guess at reasons why, either.
> The kernel entry is probably flushing some part of the CPU's state,
> such as virtual/physical address mapping for the userland address
> space. After returning from the kernel call, the time to reload
> that state shows up as more execution time within the "second part".
>
> This theory explains two observations that otherwise are hard to
> explain. One, that the effect is platform-specific: your machine
> may avoid flushing as much state during a kernel call as mine does.
> And two, that upper plan nodes seem much more affected than lower
> ones. That makes sense because the execution cycle of an upper node
> will involve touching more userspace data than a lower node, and
> therefore more of the flushed TLB entries will need to be reloaded.

If that's the case, then maybe a more sophisticated method of measuring
the overhead would work. My thought is that on the second call to pull a
tuple from a node (second because the first probably has some anomalies
due to startup), we measure the overhead for that node. This would
probably mean doing the following:
    get start time    # I'm not referring to this as gettimeofday to avoid
                      # confusion
    gettimeofday()    # this is the gettimeofday call that will happen during
                      # normal operation
    get end time

Hopefully, there's no caching effect that would come into play from not
actually touching any of the data structures after the gettimeofday()
call. If that's not the case, it makes measuring the overhead more
complex, but I think it should still be doable...
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
