oprofile results for stats collector test

From: Tom Lane <tgl@sss.pgh.pa.us>
To: Bruce Momjian <pgman@candle.pha.pa.us>
Cc: pgsql-hackers@postgresql.org
Subject: oprofile results for stats collector test
Date: 2006-06-17 18:56:47
Message-ID: 22408.1150570607@sss.pgh.pa.us

I've gotten some insight into the stats collection issues by monitoring
Bruce's test case with oprofile (http://oprofile.sourceforge.net/).

Test conditions: PG CVS HEAD, built with --enable-debug --disable-cassert
(debug symbols are needed for oprofile), on current Fedora Core 5
(Linux kernel 2.6.16-1.2133_FC5 #1 SMP, gcc 4.1.1), running on dual Xeon
EM64T with hyperthreading enabled. All configuration parameters are at
defaults except for stats_command_string. I extended the test script to
be 240000 "SELECT 1;" commands so that it would run long enough to get a
trustworthy profile. In the profile data below, I cut off each listing
where the percentages fell below about 1% of global runtime.
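
For reference, here's a minimal generator for that input. This is a
sketch equivalent to, not identical with, the script actually used,
and the output file name is made up:

#include <stdio.h>

int
main(void)
{
    FILE   *f = fopen("stats_test.sql", "w");   /* made-up name */
    int     i;

    if (f == NULL)
        return 1;
    for (i = 0; i < 240000; i++)    /* 240000 commands, per the above */
        fputs("SELECT 1;\n", f);
    fclose(f);
    return 0;
}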

With stats_command_string off:

21.7 sec to run script (with oprofile off)
22.7 sec to run script (with oprofile active)
These numbers stay pretty consistent over repeated trials.

oprofile global report:
CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 240000
GLOBAL_POWER_E...|
samples| %|
------------------
156544 47.2570 /usr/lib/debug/lib/modules/2.6.16-1.2133_FC5/vmlinux
84114 25.3921 /home/tgl/testversion/bin/postgres
63873 19.2818 /lib64/libc-2.4.so
13621 4.1119 /home/tgl/testversion/bin/psql
10101 3.0493 /home/tgl/testversion/lib/libpq.so.5.0
1324 0.3997 /usr/bin/oprofiled

kernel (note that percentages here are fractions of kernel time, not of total time):
samples % symbol name
19326 12.3454 schedule
11738 7.4982 _raw_spin_lock
11631 7.4299 __switch_to
9897 6.3222 mwait_idle
7153 4.5693 __find_first_bit
6395 4.0851 try_to_wake_up
3972 2.5373 unix_stream_recvmsg
3813 2.4357 system_call
3739 2.3885 thread_return
3612 2.3073 find_busiest_group
3446 2.2013 resched_task

postgres:
samples % symbol name
6415 7.6266 base_yyparse
6115 7.2699 AllocSetAlloc
3525 4.1907 hash_search

libc:
samples % symbol name
9276 14.5226 memcpy
8686 13.5989 _int_malloc
5865 9.1823 strncpy
5538 8.6703 strlen
5286 8.2758 strspn
4351 6.8120 vfprintf

With stats_command_string on:

45.2 sec to run script (with oprofile off)
47.5 sec to run script (with oprofile active)
Also repeatable.

oprofile global report:
CPU: P4 / Xeon with 2 hyper-threads, speed 2793.03 MHz (estimated)
Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped)
with a unit mask of 0x01 (mandatory) count 240000
GLOBAL_POWER_E...|
samples| %|
------------------
785756 71.9386 /usr/lib/debug/lib/modules/2.6.16-1.2133_FC5/vmlinux
135820 12.4348 /home/tgl/testversion/bin/postgres
105828 9.6889 /lib64/libc-2.4.so
19903 1.8222 /home/tgl/testversion/lib/libpq.so.5.0
19293 1.7663 /home/tgl/testversion/bin/psql
7489 0.6856 /ip_conntrack
4555 0.4170 /usr/bin/oprofiled

kernel:
samples % symbol name
62337 7.9334 security_port_sid
61566 7.8353 schedule
37897 4.8230 _raw_spin_lock
36410 4.6338 __switch_to
26580 3.3827 mwait_idle
21797 2.7740 __find_first_bit
19118 2.4331 avc_has_perm_noaudit
18929 2.4090 sys_select
17005 2.1642 system_call
15979 2.0336 do_select
15762 2.0060 fget
14838 1.8884 try_to_wake_up
13488 1.7166 find_busiest_group
11956 1.5216 _raw_read_lock
11685 1.4871 thread_return
10639 1.3540 resched_task
10542 1.3416 copy_user_generic

postgres:
samples % symbol name
7817 5.7554 PgstatBufferMain
7541 5.5522 base_yyparse
7247 5.3357 AllocSetAlloc
6725 4.9514 hash_search

libc:
samples % symbol name
15012 14.1853 strncpy
12265 11.5896 _int_malloc
11949 11.2910 memcpy

This system is running with SELinux enabled (in permissive mode).
I presume that if I turned it off, security_port_sid() and
avc_has_perm_noaudit() would drop off the radar screen, but things
wouldn't change much otherwise.

Also, you'd expect to see PgstatCollectorMain() somewhere, but it's
not in the oprofile report at all. I believe gcc sees that it's a
static, singly-referenced function and inlines it into
PgstatBufferMain, so the time shown above for PgstatBufferMain should
be read as the joint time for those two functions. It also looks like
all of the pgstat_recv_foo functions got inlined into it (a toy
illustration of this effect appears further down). The only pgstat
functions that show up separately in the oprofile output are

1214 0.8938 pgstat_add_backend
860 0.6332 pgstat_report_tabstat
749 0.5515 pgstat_report_activity
434 0.3195 pgstat_write_statsfile
330 0.2430 pgstat_setheader
196 0.1443 pgstat_count_xact_commit
106 0.0780 pgstat_send
91 0.0670 pgstat_get_db_entry

So the total time spent directly in pgstat.c is less than 9% of the
postgres executable's time (5.76% for PgstatBufferMain and its inlined
children, plus about 2.9% for the functions listed above), or only
about 1% of total runtime.
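
To illustrate the inlining effect mentioned above, here's a toy
example (not actual pgstat code). With gcc -O2, a static function
that has a single call site is normally folded into its caller and
never gets its own symbol, so the profiler charges its samples to
the caller:

#include <stdio.h>

/* toy example only -- not pgstat.c */
static int
collector_loop(int n)       /* static, single call site below */
{
    int     sum = 0;
    int     i;

    for (i = 0; i < n; i++)
        sum += i;
    return sum;
}

static int
buffer_main(int n)
{
    /* at -O2, gcc normally inlines collector_loop here, so a profiler
     * attributes all of its samples to buffer_main */
    return collector_loop(n);
}

int
main(void)
{
    printf("%d\n", buffer_main(100000));
    return 0;
}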

Comparing the sample counts between the two cases ought to be legitimate
because in both cases the oprofile run covered the full run of the test
script. However, the second case shows twice as many hits in psql and
libpq.so, which ought to be doing the same amount of work, so I think
there's some measurement artifact involved there. In any case, the
kernel time has gone up way more than the userland time (a result
confirmed by "top" and "vmstat").

Conclusions:

* Kernel time is what's hurting us.

* Most of the kernel time is associated with process scheduling.
In our current system, a simple command with no stats overhead
involves two process dispatches (the backend and psql) per command
round trip. Enabling stats_command_string increases that to six,
because we'll also dispatch the buffer process and collector process
twice per command. (Over 240000 commands that's nearly 1.5 million
dispatches instead of under half a million.)

* If you are using SELinux, UDP datagrams are expensive because the
kernel permission-checks each one separately (even with a bound socket,
apparently). We might want to think about replacing the UDP mechanism
with a Unix socket where available.
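
As a rough sketch of what that last point might look like:
socketpair() with SOCK_DGRAM preserves the message-boundary semantics
of UDP, and since the collector is forked from the postmaster it
could inherit one end. Toy code, not the actual pgstat.c plumbing:

#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

int
main(void)
{
    int         sv[2];
    char        buf[64];
    const char *msg = "stats msg";  /* stand-in for a pgstat message */
    ssize_t     n;

    /* AF_UNIX involves no IP port, so the per-datagram
     * security_port_sid() lookups seen in the profile don't apply */
    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return 1;

    send(sv[0], msg, strlen(msg), 0);

    n = recv(sv[1], buf, sizeof(buf), 0);
    if (n >= 0)
        printf("got %d-byte datagram\n", (int) n);

    close(sv[0]);
    close(sv[1]);
    return 0;
}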

In short, it sure looks like the separate buffer process is a huge loss,
at least on this platform. Getting rid of it should reduce the kernel
overhead by a third. It's hard to tell how much userland time it would
save, but the userland time isn't the problem.

regards, tom lane
