Generic Monitoring Framework Proposal

Lists: pgsql-hackers
From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Generic Monitoring Framework Proposal
Date: 2006-06-19 19:58:48
Message-ID: 449701F8.4020108@sun.com


Motivation:
----------

The main goal for this Generic Monitoring Framework is to provide a common interface for adding instrumentation points or probes to
Postgres so its behavior can be easily observed by developers and administrators even in production systems. This framework will allow Postgres to use the appropriate
monitoring/tracing facility provided by each OS. For example, Solaris and FreeBSD will use DTrace, and other OSes can use their respective tool.

What is DTrace?
--------------

Some of you may have heard about or used DTrace already. In a nutshell, DTrace is a comprehensive dynamic tracing facility that is built into
Solaris and FreeBSD (mostly working) that can be used by administrators and developers on live production systems to examine the behavior
of both user programs and of the operating system.

DTrace can help answer difficult questions about the OS and the application itself. For example, you may want to ask:

- Show all functions that get invoked (userland & kernel) and execution time when my function foo() is called. Seeing the path a function
takes into the kernel may provide clues for performance tuning.
- Show how many times a particular lock is acquired and how long it's held. This can help identify contention in the system.

The best way to appreciate DTrace capabilities is by seeing a demo or through hands-on experience, and I plan to show some interesting
demos at the PG Summit.

There are a number of docs on DTrace; here's a quick start doc and a complete reference guide.
http://www.sun.com/software/solaris/howtoguides/dtracehowto.jsp
http://docs.sun.com/app/docs/doc/817-6223

Here is a recent DTrace for FreeBSD status
http://marc.theaimsgroup.com/?l=freebsd-current&m=114854018213275&w=2

Open source apps that provide user level probes (bottom of page)
http://uadmin.blogspot.com/2006/05/what-is-dtrace.html

Proposed Solution:
----------------

This solution is actually quite simple and non-intrusive.

1. Define macros PG_TRACE, PG_TRACE1, etc, in a new header file
called pg_trace.h with multiple #if defined(xxx) sections for Solaris,
FreeBSD, Linux, etc, and add pg_trace.h to c.h which is included in postgres.h
and included by every C file.

The macros will have the following format:

PG_TRACE[n](module_name, probe_name [, arg1, ..., arg5])

module_name = Name to identify PG module such as pg_backend, pg_psql, pg_plpgsql, etc
probe_name = Probe name such as transaction_start, lwlock_acquire, etc
arg1..arg5 = Any args to pass to the probe such as txn id, lock id, etc

2. Map PG_TRACE, PG_TRACE1, etc, to macros or functions appropriate for each OS.
For OSes that don't have a suitable tracing facility, just map the macros to nothing - doing this will not have any effect on performance or
existing behavior.

Sample of pg_trace.h

#if defined(__sun) || defined(__FreeBSD__)

#include <sys/sdt.h>
#define PG_TRACE DTRACE_PROBE
#define PG_TRACE1 DTRACE_PROBE1
...
#define PG_TRACE5 DTRACE_PROBE5

#elif defined(__linux__) || defined(_AIX) || defined(__sgi) ...

/* Map the macros to no-ops */
#define PG_TRACE(module, name)
#define PG_TRACE1(module, name, arg1)
...
#define PG_TRACE5(module, name, arg1, arg2, arg3, arg4, arg5)

#endif

3. Add any file(s) to support the particular OS tracing facility

4. Update the Makefiles as necessary for each OS
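
As a rough illustration of step 2, the macros can map to functions as well as to other macros. The sketch below assumes a platform with no tracing facility that still wants a crude in-process hit count. The names pg_trace_slot, pg_trace_hit, pg_trace_hits, PG_TRACE_MAX_PROBES and demo_backend_work are hypothetical and are not part of the prototype; only PG_TRACE and PG_TRACE2 come from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fallback backend: counts probe hits in a small static table. */
#define PG_TRACE_MAX_PROBES 16

struct pg_trace_counter
{
    const char *name;           /* "module:probe", NULL while the slot is unused */
    long        hits;
};

static struct pg_trace_counter pg_trace_counters[PG_TRACE_MAX_PROBES];

/* Find the slot for a probe name, claiming the first free slot if needed. */
static struct pg_trace_counter *
pg_trace_slot(const char *name)
{
    for (size_t i = 0; i < PG_TRACE_MAX_PROBES; i++)
    {
        struct pg_trace_counter *c = &pg_trace_counters[i];

        if (c->name == NULL)
            c->name = name;
        if (strcmp(c->name, name) == 0)
            return c;
    }
    return NULL;                /* table full */
}

/* Record one firing of the named probe. */
static void
pg_trace_hit(const char *name)
{
    struct pg_trace_counter *c = pg_trace_slot(name);

    assert(c != NULL);          /* raise PG_TRACE_MAX_PROBES if this fires */
    c->hits++;
}

/* Read back a counter without creating a slot; unknown probes report 0. */
static long
pg_trace_hits(const char *name)
{
    for (size_t i = 0; i < PG_TRACE_MAX_PROBES; i++)
    {
        if (pg_trace_counters[i].name == NULL)
            break;
        if (strcmp(pg_trace_counters[i].name, name) == 0)
            return pg_trace_counters[i].hits;
    }
    return 0;
}

/* The call sites look the same whichever backend the macros map to. */
#define PG_TRACE(module, name)  pg_trace_hit(#module ":" #name)
#define PG_TRACE2(module, name, arg1, arg2) \
    do { (void) (arg1); (void) (arg2); pg_trace_hit(#module ":" #name); } while (0)

static void
demo_backend_work(int lockid, int mode)
{
    PG_TRACE(pg_backend, fsync_start);
    PG_TRACE2(pg_backend, lwlock_acquire, lockid, mode);
}
```

Unlike the no-op mapping, this fallback costs a function call and a table scan on every probe, so it is only meant to show that the call sites do not change when the backend does.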

How to add probes:
-----------------

To add a probe, just add a one line macro in the appropriate location in the source. Here's an example of two probes, one with no argument
and the other with 2 arguments:

PG_TRACE (pg_backend, fsync_start);
PG_TRACE2 (pg_backend, lwlock_acquire, lockid, mode);

If there are enough probes embedded in PG, its behavior can be easily observed.

With the help of Gavin Sherry, we have added about 20 probes, and Gavin has suggested a number of other interesting areas for additional probes.
Pervasive has also added some probes to PG 8.0.4 and posted the patch on http://pgfoundry.org/projects/dtrace/. I hope to combine the probes
using this generic framework for 8.1.4, and make it available for folks to try.

Since my knowledge of the PG source code is limited, I'm looking for assistance from experts to help identify some new interesting probe points.

How to use probes:
----------------

For DTrace, probes can be enabled using a D script. When the probes are not enabled, there is absolutely no performance hit whatsoever.
Here is a simple example that prints the number of LWLock acquisitions for each PG process.

test.d

#!/usr/sbin/dtrace -s
pg_backend*:::lwlock-acquire
{
@foo[pid] = count();
}

dtrace:::END {
printf("\n%10s %15s\n", "PID", "Count");
printa("%10d %@15d\n",@foo);
}

# ./test.d

PID Count
1438 28
1447 7240
1448 9675
1449 11972

I have a prototype working, so if anyone wants to try it, I can provide a patch or give access to my test system.

This is a proposal, so comments, suggestions, and feedback are certainly welcome.

Regards,
Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-19 20:40:26
Message-ID: 21399.1150749626@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Lor <Robert(dot)Lor(at)Sun(dot)COM> writes:
> The main goal for this Generic Monitoring Framework is to provide a
> common interface for adding instrumentation points or probes to
> Postgres so its behavior can be easily observed by developers and
> administrators even in production systems.

What is the overhead of a "probe" when you're not using it? The answer
had better not include the phrase "kernel call", or this is unlikely to
pass muster...

> For DTrace, probes can be enabled using a D script. When the probes are not enabled, there is absolutely no performance hit whatsoever.

If you believe that, I have a bridge in Brooklyn you might be interested
in.

What are the criteria going to be for where to put probe calls? If it
has to be hard-wired into the source code, I foresee a lot of contention
about which probes are worth their overhead, because we'll need
one-size-fits-all answers.

> arg1..arg5 = Any args to pass to the probe such as txn id, lock id, etc
Where is the data type of a probe argument defined?

regards, tom lane


From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-19 21:14:15
Message-ID: 60k67cyhjs.fsf@dba2.int.libertyrms.com
Lists: pgsql-hackers

Robert(dot)Lor(at)Sun(dot)COM (Robert Lor) writes:
> For DTrace, probes can be enabled using a D script. When the probes
> are not enabled, there is absolutely no performance hit whatsoever.

That seems inconceivable.

In order to have a way of deciding whether or not the probes are
enabled, there has *got* to be at least one instruction executed, and
that can't be costless.
--
output = reverse("gro.mca" "@" "enworbbc")
http://www.ntlug.org/~cbbrowne/wp.html
"...while I know many people who emphatically believe in
reincarnation, I have never met or read one who could satisfactorily
explain population growth." -- Spider Robinson


From: Theo Schlossnagle <jesus(at)omniti(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-19 21:20:31
Message-ID: B1691267-F31F-40E4-844B-3CCFE88EA094@omniti.com
Lists: pgsql-hackers


On Jun 19, 2006, at 4:40 PM, Tom Lane wrote:

> Robert Lor <Robert(dot)Lor(at)Sun(dot)COM> writes:
>> The main goal for this Generic Monitoring Framework is to provide a
>> common interface for adding instrumentation points or probes to
>> Postgres so its behavior can be easily observed by developers and
>> administrators even in production systems.
>
> What is the overhead of a "probe" when you're not using it? The
> answer
> had better not include the phrase "kernel call", or this is
> unlikely to
> pass muster...
>
>> For DTrace, probes can be enabled using a D script. When the
>> probes are not enabled, there is absolutely no performance hit
>> whatsoever.
>
> If you believe that, I have a bridge in Brooklyn you might be
> interested
> in.

Heh. Syscall probes and FBT probes in DTrace have zero overhead.
User-space probes do have overhead, but it is only a few instructions
(two I think). Basically, the probe points are replaced by illegal
instructions and the kernel infrastructure for DTrace will fasttrap
the ops and then act. So, it is tiny tiny overhead. Little enough
that it isn't unreasonable to instrument things like s_lock which are
tiny.

> What are the criteria going to be for where to put probe calls? If it
> has to be hard-wired into the source code, I foresee a lot of
> contention
> about which probes are worth their overhead, because we'll need
> one-size-fits-all answers.
>
>> arg1..arg5 = Any args to pass to the probe such as txn id, lock
>> id, etc
> Where is the data type of a probe argument defined?

I assume it would depend on the probe implementation. In Dtrace they
are implemented in .d files that will post-instrument the object
before final linkage. Dtrace's whole purpose is to be low overhead
and it really does it in a fantastic way.

As an example, you can take an uninstrumented binary and add dynamic
instrumentation to the entry, exit and every instruction op-code over
every single routine in the process. And clearly, as the binary is
uninstrumented, the overhead is indeed zero when the probes are not
enabled.

The reason that Robert proposes user-space probes (I assume) is that
tracing C functions can be too granular and not conveniently expose
the "right" information to make tracing useful.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
// Ecelerity: Run with it.


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-19 21:46:40
Message-ID: 44971B40.1040900@sun.com
Lists: pgsql-hackers

Tom Lane wrote:

>Robert Lor <Robert(dot)Lor(at)Sun(dot)COM> writes:
>
>
>>The main goal for this Generic Monitoring Framework is to provide a
>>common interface for adding instrumentation points or probes to
>>Postgres so its behavior can be easily observed by developers and
>>administrators even in production systems.
>>
>>
>
>What is the overhead of a "probe" when you're not using it? The answer
>had better not include the phrase "kernel call", or this is unlikely to
>pass muster...
>
>
Here's what the DTrace developers have to say in their Usenix paper.
"When not explicitly enabled, DTrace has zero probe effect - the system
operates exactly as if DTrace were not present at all."

http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf

The technical details are beyond me, so I can't tell you exactly what
happens internally. I can find out if you're interested!

>
>
>>For DTrace, probes can be enabled using a D script. When the probes are not enabled, there is absolutely no performance hit whatsoever.
>>
>>
>
>If you believe that, I have a bridge in Brooklyn you might be interested
>in.
>
>What are the criteria going to be for where to put probe calls? If it
>has to be hard-wired into the source code, I foresee a lot of contention
>about which probes are worth their overhead, because we'll need
>one-size-fits-all answers.
>
>
>
I think we need to be selective in terms of which probes to add since
we don't want to scatter them all over the source files. For DTrace,
the overhead is very minimal, but you're right, other implementations of
the same probe may have more performance overhead.

>>arg1..arg5 = Any args to pass to the probe such as txn id, lock id, etc
>>
>>
>Where is the data type of a probe argument defined?
>
>
It's in a .d file which looks like below:

provider pg_backend {

probe fsync__start(void);
probe fsync__end(void);
probe lwlock__acquire (int, int);
probe lwlock__release(int);
...

}

Regards,
Robert


From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Theo Schlossnagle <jesus(at)omniti(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-19 21:52:58
Message-ID: 20060619215258.GE93655@pervasive.com
Lists: pgsql-hackers

On Mon, Jun 19, 2006 at 05:20:31PM -0400, Theo Schlossnagle wrote:
> Heh. Syscall probes and FBT probes in Dtrace have zero overhead.
> User-space probes do have overhead, but it is only a few instructions
> (two I think). Basically, the probe points are replaced by illegal
> instructions and the kernel infrastructure for Dtrace will fasttrap
> the ops and then act. So, it is tiny tiny overhead. Little enough
> that it isn't unreasonable to instrument things like s_lock which are
> tiny.

If someone wanted to, they should be able to do benchmarking with the
DTrace patches on pgFoundry to see the overhead of just having the
probes in, and then having the probes in and actually using them. If you
*really* want to see the difference, add a probe in s_lock. :)
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Theo Schlossnagle <jesus(at)omniti(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-19 22:41:00
Message-ID: 449727FC.4000800@sun.com
Lists: pgsql-hackers

Theo Schlossnagle wrote:

>
> Heh. Syscall probes and FBT probes in Dtrace have zero overhead.
> User-space probes do have overhead, but it is only a few instructions
> (two I think). Basically, the probe points are replaced by illegal
> instructions and the kernel infrastructure for Dtrace will fasttrap
> the ops and then act. So, it is tiny tiny overhead. Little enough
> that it isn't unreasonable to instrument things like s_lock which are
> tiny.

Theo, you're a genius. FBT (function boundary tracing) probes have zero
overhead (section 4.1) and user-space probes have two instructions of
overhead (section 4.2). I was incorrect about making a general zero overhead
statement. But it's so close to zero :-)

http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf

>
> The reason that Robert proposes user-space probes (I assume) is that
> tracing C functions can be too granular and not conveniently expose
> the "right" information to make tracing useful.

Yes, I'm proposing user-space probes (aka User Statically-Defined
Tracing - USDT). USDT provides a high-level abstraction so the
application can expose well defined probes without the user having to
know the detailed implementation. For example, instead of having to
know the function LWLockAcquire(), a well documented probe called
lwlock_acquire with the appropriate args is much more usable.

Regards,
Robert


From: Theo Schlossnagle <jesus(at)omniti(dot)com>
To: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-19 23:36:27
Message-ID: 3630FC19-345C-4C75-8114-FC8A48E09D71@omniti.com
Lists: pgsql-hackers


On Jun 19, 2006, at 6:41 PM, Robert Lor wrote:

> Theo Schlossnagle wrote:
>
>>
>> Heh. Syscall probes and FBT probes in Dtrace have zero
>> overhead. User-space probes do have overhead, but it is only a
>> few instructions (two I think). Basically, the probe points are
>> replaced by illegal instructions and the kernel infrastructure
>> for Dtrace will fasttrap the ops and then act. So, it is tiny
>> tiny overhead. Little enough that it isn't unreasonable to
>> instrument things like s_lock which are tiny.
>
> Theo, you're a genius. FBT (function boundary tracing) probes have
> zero overhead (section 4.1) and user-space probes have two
> instructions of overhead (section 4.2). I was incorrect about making
> a general zero overhead statement. But it's so close to zero :-)
>
> http://www.sun.com/bigadmin/content/dtrace/dtrace_usenix.pdf
>
>>
>> The reason that Robert proposes user-space probes (I assume) is
>> that tracing C functions can be too granular and not conveniently
>> expose the "right" information to make tracing useful.
>
> Yes, I'm proposing user-space probes (aka User Statically-Defined
> Tracing - USDT). USDT provides a high-level abstraction so the
> application can expose well defined probes without the user having
> to know the detailed implementation. For example, instead of
> having to know the function LWLockAcquire(), a well documented
> probe called lwlock_acquire with the appropriate args is much more
> usable.

I am giving a talk at OSCON this year about PostgreSQL on "big
systems". Big is all relative, but I will be talking about dtrace a
bit and the advantages of running PostgreSQL on Solaris which is what
we ended up doing after some extremely disturbing experiences on
Linux. I was able to track a very acute memory "leak" in pl/perl
(which Neil so kindly fixed) within a few moments -- and this is
without explicit user-space trace points. If there were good user-
space points, I likely wouldn't have had to dig in the source as a
pre-cursor to my dtrace efforts.

The things you might be able to do with user-specific trace points:
o better understand the block scatter (distance of block-level
reads) for a specific query.
o understand lock contention in vastly multiprocessor systems
using plockstat (my hunch is that heavy-weight locks might be better).
o our current box is a 4-way Opteron, but we have a 16-way T2000
as well.
o report on queries including turn-around time, block-accesses,
lock acquisitions grouped by query for specific time windows.

The nice thing about dtrace is that it requires no "prep" to look at
a problem. When something is acting odd in production, you don't
want to attempt to repeat it in a test environment first. You want
to observe it. Dtrace allows you to dig in "really deep" in
production with an acceptable performance penalty and ask questions
that couldn't be asked before. It is exceptionally clever stuff. Of
all the new "neat stuff" in Solaris 10, it has my vote for coolest
and most useful. I've nailed several production problems (outside
of Postgres) using dtrace with accuracy and efficiency. When Solaris
10u2 is released, we'll be trying Postgres on ZFS, so my rankings may
change :-)

The idea of having intelligently placed dtrace probes in Postgres
would allow us to deal with postgres as a "first class" app on
Solaris 10 with respect to troubleshooting obtuse production
problems. That, to me, is exciting stuff.

Best regards,

Theo

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
// Ecelerity: Run with it.


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-19 23:39:35
Message-ID: 449735B7.5040504@paradise.net.nz
Lists: pgsql-hackers

Jim C. Nasby wrote:
> On Mon, Jun 19, 2006 at 05:20:31PM -0400, Theo Schlossnagle wrote:
>> Heh. Syscall probes and FBT probes in Dtrace have zero overhead.
>> User-space probes do have overhead, but it is only a few instructions
>> (two I think). Basically, the probe points are replaced by illegal
>> instructions and the kernel infrastructure for Dtrace will fasttrap
>> the ops and then act. So, it is tiny tiny overhead. Little enough
>> that it isn't unreasonable to instrument things like s_lock which are
>> tiny.
>
> If someone wanted to, they should be able to do benchmarking with the
> DTrace patches on pgFoundry to see the overhead of just having the
> probes in, and then having the probes in and actually using them. If you
> *really* want to see the difference, add a probe in s_lock. :)

We will need to benchmark on FreeBSD to see if those comments about
overhead stand up to scrutiny there too.

I would think that even if (for instance) we find that there is no
overhead on Solaris, those of us on platforms where DTrace is less
mature would want the option of building without any probes at all in
the code - I guess a configure option "--without-dtrace" on by default
on those platforms would do it.

regards

Mark


From: Theo Schlossnagle <jesus(at)omniti(dot)com>
To: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-19 23:48:55
Message-ID: F8550E6D-8CB6-4F98-AC32-32758151543D@omniti.com
Lists: pgsql-hackers


On Jun 19, 2006, at 7:39 PM, Mark Kirkwood wrote:
> We will need to benchmark on FreeBSD to see if those comments about
> overhead stand up to scrutiny there too.

I've followed the development of DTrace on FreeBSD and the design
approach is mostly identical to the Solaris one. This would mean
that if there is overhead on FreeBSD not present on Solaris it would
be considered a bug and would likely be fixed.

> I would think that even if (for instance) we find that there is no
> overhead on Solaris, those of us on platforms where DTrace is less
> mature would want the option of building without any probes at all
> in the code - I guess a configure option "--without-dtrace" on by
> default on those platforms would do it.

Absolutely. As they are all proposed as preprocessor macros, this
would be trivial to accomplish.
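
A minimal sketch of such a build-time switch, assuming configure defines a symbol when DTrace support is requested. The symbol name ENABLE_DTRACE, the helper macro PG_TRACE_ENABLED, the function commit_transaction and the probe name transaction_commit are all assumptions made for illustration; only the PG_TRACE macros and the DTRACE_PROBE macros from <sys/sdt.h> appear in the proposal.

```c
#include <assert.h>

#ifdef ENABLE_DTRACE
/* Built with DTrace support: forward to the OS-provided probe macros. */
#include <sys/sdt.h>
#define PG_TRACE_ENABLED 1
#define PG_TRACE(module, name)          DTRACE_PROBE(module, name)
#define PG_TRACE1(module, name, arg1)   DTRACE_PROBE1(module, name, arg1)
#else
/* Built without DTrace support: the probes vanish at preprocessing time. */
#define PG_TRACE_ENABLED 0
#define PG_TRACE(module, name)          ((void) 0)
#define PG_TRACE1(module, name, arg1)   ((void) 0)
#endif

/* Hypothetical call site; the source line is identical in both builds. */
static int
commit_transaction(int xid)
{
    assert(xid > 0);            /* sanity check, unrelated to tracing */
    PG_TRACE1(pg_backend, transaction_commit, xid);
    return xid;
}
```

In the build without the symbol, the probe arguments are never evaluated, so the call site compiles down to nothing.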

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
// Ecelerity: Run with it.


From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Theo Schlossnagle <jesus(at)omniti(dot)com>
Cc: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-20 09:06:41
Message-ID: 1150794402.2587.146.camel@localhost.localdomain
Lists: pgsql-hackers

On Mon, 2006-06-19 at 19:36 -0400, Theo Schlossnagle wrote:

> The idea of having intelligently placed dtrace probes in Postgres
> would allow us
...
> to troubleshoot[ing] obtuse production
> problems. That, to me, is exciting stuff.
[paraphrased by SR]

I very much agree with the requirement here.

This needs to work on Linux and Windows, minimum, also.

It's obviously impossible to move a production system to a different OS
just to use a cool tracing tool. So the architecture must intelligently
handle the needs of multiple OS - even if the underlying facilities on
them do not yet provide what we'd like. So I'm OK with Solaris being the
best, just as long as it's not the only one that benefits.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-20 12:36:12
Message-ID: 20060620123612.GA24606@svana.org
Lists: pgsql-hackers

On Mon, Jun 19, 2006 at 05:14:15PM -0400, Chris Browne wrote:
> Robert(dot)Lor(at)Sun(dot)COM (Robert Lor) writes:
> > For DTrace, probes can be enabled using a D script. When the probes
> > are not enabled, there is absolutely no performance hit whatsoever.
>
> That seems inconceivable.
>
> In order to have a way of deciding whether or not the probes are
> enabled, there has *got* to be at least one instruction executed, and
> that can't be costless.

I think the trick is that the probes are enabled by overwriting bits of
code. So by default you might put a no-op instruction and if you want
to trace you replace that with an illegal instruction or the special
one-byte INT3 instruction x86 systems have for this purpose.

With a 17-stage pipelined processor I imagine the cost of a no-op would
indeed be almost unmeasurable (apart from the increase in code size, I suppose).

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-20 14:19:37
Message-ID: 87irmv52py.fsf@stark.xeocode.com
Lists: pgsql-hackers


Robert Lor <Robert(dot)Lor(at)Sun(dot)COM> writes:

> Yes, I'm proposing user-space probes (aka User Statically-Defined Tracing -
> USDT). USDT provides a high-level abstraction so the application can expose
> well defined probes without the user having to know the detailed
> implementation. For example, instead of having to know the function
> LWLockAcquire(), a well documented probe called lwlock_acquire with the
> appropriate args is much more usable.

It seems pointless to me to expose things like lwlock_acquire that map 1-1 to C
function calls like LWLockAcquire. They're useless except to people who
understand what's going on and if people know the low level implementation
details of Postgres they can already trace those calls with dtrace without any
help.

What would be useful is instrumenting high level calls that can't be traced
without application guidance. For example, inserting a dtrace probe for each
SQL and each plan node. That way someone could get the same info as EXPLAIN
ANALYZE from a production server without having to make application
modifications (or suffer the gettimeofday overhead).

It's one thing to know "I seem to be acquiring a lot of locks" or "i'm
spending all my time in sorting". It's another to be able to ask dtrace "what
query am I running when doing all this sorting?" or "what kind of plan node am
I running when I'm acquiring all these locks?"
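
A rough sketch of where such high-level probes might sit, so that a tracing script could tie low-level events back to a query and a plan node. Everything here is hypothetical: the probe names query_start, query_done, plan_node_start and plan_node_done, the simplified PlanNode struct, and the functions exec_node and exec_query are stand-ins, not PostgreSQL's real executor. The macros are defined as no-ops so the sketch compiles without any tracing facility.

```c
#include <assert.h>
#include <stddef.h>

/* No-op stand-ins for the proposed macros. */
#define PG_TRACE1(module, name, arg1)           ((void) 0)
#define PG_TRACE2(module, name, arg1, arg2)     ((void) 0)

/* Hypothetical, simplified plan tree; not PostgreSQL's real node types. */
typedef struct PlanNode
{
    int              node_type;
    struct PlanNode *left;
    struct PlanNode *right;
} PlanNode;

/* Walk the plan tree, bracketing each node with start/done probes. */
static int
exec_node(const PlanNode *node, int query_id)
{
    int visited = 1;

    if (node == NULL)
        return 0;

    PG_TRACE2(pg_backend, plan_node_start, query_id, node->node_type);
    visited += exec_node(node->left, query_id);
    visited += exec_node(node->right, query_id);
    PG_TRACE2(pg_backend, plan_node_done, query_id, node->node_type);

    return visited;
}

/* Bracket the whole query so a script can tell which query is running. */
static int
exec_query(const char *sql, int query_id, const PlanNode *root)
{
    int visited;

    assert(sql != NULL);
    PG_TRACE2(pg_backend, query_start, query_id, sql);
    visited = exec_node(root, query_id);
    PG_TRACE1(pg_backend, query_done, query_id);

    return visited;             /* number of plan nodes executed */
}
```

With probes like these, a script could remember the query id seen at query_start and attach it to any lock or sort events that fire before query_done.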

--
greg


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>, Theo Schlossnagle <jesus(at)omniti(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-20 14:34:13
Message-ID: 27941.1150814053@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> What would be useful is instrumenting high level calls that can't be traced
> without application guidance. For example, inserting a dtrace probe for each
> SQL and each plan node. That way someone could get the same info as EXPLAIN
> ANALYZE from a production server without having to make application
> modifications (or suffer the gettimeofday overhead).

My bogometer just went off again. How is something like dtrace going to
magically get realtime information without reading the clock?

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>, Theo Schlossnagle <jesus(at)omniti(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-20 15:31:19
Message-ID: 87d5d34zeg.fsf@stark.xeocode.com
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
> > What would be useful is instrumenting high level calls that can't be traced
> > without application guidance. For example, inserting a dtrace probe for each
> > SQL and each plan node. That way someone could get the same info as EXPLAIN
> > ANALYZE from a production server without having to make application
> > modifications (or suffer the gettimeofday overhead).
>
> My bogometer just went off again. How is something like dtrace going to
> magically get realtime information without reading the clock?

Sorry, I meant get the same info as EXPLAIN ANALYZE minus the timing.

I'm not familiar with DTrace first-hand but I did have the impression it was
possible to get timing information though. I don't know how much overhead it
has but I wouldn't be surprised if it was lower for a kernel-based profiling
elapsed time counter on Sun hardware than a general purpose gettimeofday call
on commodity PC hardware.

For example it could use a cpu instruction counter and have hooks in the
scheduler for saving and restoring the counter to avoid the familiar gotchas
with being rescheduled across processors.

--
greg


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-20 17:27:56
Message-ID: 4498301C.3000707@sun.com
Lists: pgsql-hackers

Simon Riggs wrote:

>This needs to work on Linux and Windows, minimum, also.
>
>
The proposed solution will work on Linux & Windows if they have a similar
facility that the macros can map to. Otherwise, the macros stay as
no-ops and will not affect those platforms at all.

>It's obviously impossible to move a production system to a different OS
>just to use a cool tracing tool. So the architecture must intelligently
>handle the needs of multiple OS - even if the underlying facilities on
>them do not yet provide what we'd like. So I'm OK with Solaris being the
>best, just as long as it's not the only one that benefits.
>
>
>
The way it's proposed now, any OS can use the same interfaces and map to
their underlying facilities. Does it look reasonable?

Regards,
Robert


From: Robert Lor <Robert(dot)Lor(at)Sun(dot)COM>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Theo Schlossnagle <jesus(at)omniti(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Generic Monitoring Framework Proposal
Date: 2006-06-20 17:44:43
Message-ID: 4498340B.30700@sun.com
Lists: pgsql-hackers

Greg Stark wrote:

>It seems pointless to me to expose things like lwlock_acquire that map 1-1 to C
>function calls like LWLockAcquire. They're useless except to people who
>understand what's going on and if people know the low level implementation
>details of Postgres they can already trace those calls with dtrace without any
>help.
>
>
>
lwlock_acquire is just an example. I think once we decide to go down this
path, we can solicit ideas for interesting probes and put them up for
discussion on this alias to decide whether or not they are needed. I think we need
to have two categories of probes, for admins and for developers. Perhaps the
probes for admins are more important since, as you said, the developers
already know which function does what, but I think the low-level probes
are still useful for new developers as their behavior will be documented.

>What would be useful is instrumenting high level calls that can't be traced
>without application guidance. For example, inserting a dtrace probe for each
>SQL and each plan node. That way someone could get the same info as EXPLAIN
>ANALYZE from a production server without having to make application
>modifications (or suffer the gettimeofday overhead).
>
>
>It's one thing to know "I seem to be acquiring a lot of locks" or "i'm
>spending all my time in sorting". It's another to be able to ask dtrace "what
>query am I running when doing all this sorting?" or "what kind of plan node am
>I running when I'm acquiring all these locks?"
>
>
>
Completely agree.

Regards,
Robert