Re: Generic Monitoring Framework Proposal

From: Theo Schlossnagle <jesus(at)omniti(dot)com>
To: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-19 23:36:27
Message-ID: 3630FC19-345C-4C75-8114-FC8A48E09D71@omniti.com
Lists: pgsql-hackers


On Jun 19, 2006, at 6:41 PM, Robert Lor wrote:

> Theo Schlossnagle wrote:
>
>>
>> Heh. Syscall probes and FBT probes in Dtrace have zero
>> overhead. User-space probes do have overhead, but it is only a
>> few instructions (two I think). Basically, the probe points are
>> replaced by illegal instructions and the kernel infrastructure
>> for Dtrace will fasttrap the ops and then act. So, it is tiny
>> tiny overhead. Little enough that it isn't unreasonable to
>> instrument things like s_lock which are tiny.
>
> Theo, you're a genius. FBT (function boundary tracing) probes have
> zero overhead (section 4.1) and user-space probes have two
> instructions of overhead (section 4.2). I was incorrect about making
> a general zero overhead statement. But it's so close to zero :-)
>
> http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf
>
>>
>> The reason that Robert proposes user-space probes (I assume) is
>> that tracing C functions can be too granular and not conveniently
>> expose the "right" information to make tracing useful.
>
> Yes, I'm proposing user-space probes (aka User Statically-Defined
> Tracing - USDT). USDT provides a high-level abstraction so the
> application can expose well defined probes without the user having
> to know the detailed implementation. For example, instead of
> having to know the function LWLockAcquire(), a well documented
> probe called lwlock_acquire with the appropriate args is much more
> usable.

I am giving a talk at OSCON this year about PostgreSQL on "big
systems". Big is all relative, but I will be talking about dtrace a
bit and the advantages of running PostgreSQL on Solaris which is what
we ended up doing after some extremely disturbing experiences on
Linux. I was able to track a very acute memory "leak" in pl/perl
(which Neil so kindly fixed) within a few moments -- and this is
without explicit user-space trace points. If there were good user-
space points, I likely wouldn't have had to dig in the source as a
precursor to my dtrace efforts.

The things you might be able to do with user-specific trace points:
  o better understand the block scatter (distance of block-level
    reads) for a specific query.
  o understand lock contention in vastly multiprocessor systems
    using plockstat (my hunch is that heavy-weight locks might be better).
      o our current box is a 4-way Opteron, but we have a 16-way T2000
        as well.
  o report on queries including turn-around time, block-accesses,
    lock acquisitions grouped by query for specific time windows.
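
To make the idea of a user-space trace point concrete, here is a minimal
sketch in C of what a statically defined probe around a lock acquire might
look like. Everything named here is an assumption for illustration, not
PostgreSQL code: the DemoLock type, demo_lock_acquire(), demo_lock_release(),
the TRACE_LWLOCK_ACQUIRE macro, and the ENABLE_DTRACE build flag are
hypothetical. The provider name "postgresql" is also assumed; the probe name
lwlock_acquire is the one suggested in the quoted text. A real USDT build
would also need a provider definition file and a dtrace -G link step, which
are not shown.

```c
/* Hypothetical sketch of a statically defined probe point.
 * None of these names come from the PostgreSQL source tree. */
#include <assert.h>
#include <stddef.h>

#ifdef ENABLE_DTRACE                 /* assumed build flag */
#include <sys/sdt.h>                 /* DTRACE_PROBE2 on Solaris */
#define TRACE_LWLOCK_ACQUIRE(id, mode) \
    DTRACE_PROBE2(postgresql, lwlock_acquire, (id), (mode))
#else
/* Without DTrace support the probe compiles away entirely. */
#define TRACE_LWLOCK_ACQUIRE(id, mode) ((void) 0)
#endif

typedef struct DemoLock
{
    int  id;            /* identifier passed to the probe */
    int  held;          /* 0 = free, 1 = held */
    long acquisitions;  /* number of successful acquires */
} DemoLock;

/* Try to take the lock; returns 1 on success, 0 if it is already held. */
static int
demo_lock_acquire(DemoLock *lock, int mode)
{
    assert(lock != NULL);
    if (lock->held)
        return 0;
    lock->held = 1;
    lock->acquisitions++;
    /* The probe exposes "a lock was acquired" with documented arguments,
     * independent of the name of the function it lives in. */
    TRACE_LWLOCK_ACQUIRE(lock->id, mode);
    return 1;
}

/* Release the lock unconditionally. */
static void
demo_lock_release(DemoLock *lock)
{
    assert(lock != NULL);
    lock->held = 0;
}
```

With a probe like this in place, a script could count or time acquisitions
per lock id by probe name alone, without the person tracing needing to know
which C function the probe sits in.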

The nice thing about dtrace is that it requires no "prep" to look at
a problem. When something is acting odd in production, you don't
want to attempt to repeat it in a test environment first. You want
to observe it. Dtrace allows you to dig in "really deep" in
production with an acceptable performance penalty and ask questions
that couldn't be asked before. It is exceptionally clever stuff. Of
all the new "neat stuff" in Solaris 10, it has my vote for coolest
and most useful. I've nailed several production problems (outside
of Postgres) using dtrace with accuracy and efficiency. When Solaris
10u2 is released, we'll be trying Postgres on ZFS, so my rankings may
change :-)

The idea of having intelligently placed dtrace probes in Postgres
would allow us to deal with postgres as a "first class" app on
Solaris 10 with respect to troubleshooting obtuse production
problems. That, to me, is exciting stuff.

Best regards,

Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
// Ecelerity: Run with it.
