Re: postgres processes spending most of their time in the kernel

Lists: pgsql-general
From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: <pgsql-general(at)postgresql(dot)org>
Subject: postgres processes spending most of their time in the kernel
Date: 2001-12-28 18:06:16
Message-ID: Pine.LNX.4.33.0112281001050.23655-100000@windmill.gghcwest.com

I have a moderately loaded postgres server running 7.2beta4 (I wanted to
try out the live vacuum) that turns out to spend the majority of its CPU
time in kernel land. With only a handful of running processes, postgres
induces tens of thousands of context switches per second. Practically the
only thing postgres does with all this CPU time is semop() in a tight
loop. Here is a snippet of strace:

[pid 11410] 0.000064 <... semop resumed> ) = 0
[pid 11409] 0.000020 <... semop resumed> ) = 0
[pid 11410] 0.000024 semop(1179648, 0xbfffe658, 1 <unfinished ...>
[pid 11409] 0.000027 semop(1179648, 0xbfffe488, 1 <unfinished ...>
[pid 11407] 0.000027 semop(1179648, 0xbfffe8b8, 1 <unfinished ...>
[pid 11409] 0.000022 <... semop resumed> ) = 0
[pid 11406] 0.000018 <... semop resumed> ) = 0
[pid 11409] 0.000023 semop(1179648, 0xbfffe468, 1 <unfinished ...>
[pid 11406] 0.000026 semop(1179648, 0xbfffe958, 1) = 0
[pid 11406] 0.000057 semop(1179648, 0xbfffe9f8, 1 <unfinished ...>
[pid 11408] 0.000037 <... semop resumed> ) = 0
[pid 11408] 0.000029 semop(1179648, 0xbfffe4d8, 1) = 0
[pid 11411] 0.000038 <... semop resumed> ) = 0
[pid 11408] 0.000023 semop(1179648, 0xbfffe4d8, 1 <unfinished ...>
[pid 11411] 0.000026 semop(1179648, 0xbfffe498, 1) = 0
[pid 11407] 0.000040 <... semop resumed> ) = 0
[pid 11411] 0.000024 semop(1179648, 0xbfffe658, 1 <unfinished ...>
[pid 11407] 0.000027 semop(1179648, 0xbfffe8a8, 1) = 0
[pid 11410] 0.000038 <... semop resumed> ) = 0
[pid 11407] 0.000024 semop(1179648, 0xbfffe918, 1 <unfinished ...>
[pid 11410] 0.000026 semop(1179648, 0xbfffe618, 1) = 0
[pid 11410] 0.000058 semop(1179648, 0xbfffe6a8, 1 <unfinished ...>
[pid 11409] 0.000024 <... semop resumed> ) = 0
[pid 11409] 1.214166 semop(1179648, 0xbfffe428, 1) = 0
[pid 11406] 0.000063 <... semop resumed> ) = 0
[pid 11406] 0.000031 semop(1179648, 0xbfffe9f8, 1) = 0
[pid 11406] 0.000051 semop(1179648, 0xbfffe8f8, 1 <unfinished ...>

Performance on this database kind of sucks. Since there is little or no
block I/O, I assume this is because postgres is wasting its CPU
allocations.

Does anyone else see this? Is there a config option to tune the locking
behavior? Any other workarounds?

The machine is a 2-way x86 running Linux 2.4. I brought this up on
linux-kernel and they don't seem to think it is the scheduler's problem.

-jwb


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 18:44:49
Message-ID: 13952.1009565089@sss.pgh.pa.us
Lists: pgsql-general

"Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> I have a moderately loaded postgres server running 7.2beta4 (I wanted to
> try out the live vacuum) that turns out to spend the majority of its CPU
> time in kernel land. With only a handful of running processes, postgres
> induces tens of thousands of context switches per second. Practically the
> only thing postgres does with all this CPU time is semop() in a tight
> loop.

It sounds like you have a build that's using SysV semaphores in place of
test-and-set instructions. That should not happen on x86 hardware,
since we have assembly TAS code for x86. Please look at your port
header file (src/include/pg_config_os.h symlink) and
src/include/storage/s_lock.h to figure out why it's misbuilt.
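For reference, the port-specific TAS operation amounts to an atomic
set-and-return-old-value on a lock byte. s_lock.h implements it with inline
xchgb assembly on i386; the sketch below uses GCC's __sync builtins as a
portable stand-in, purely for illustration, not as the actual PostgreSQL
source:

```c
/* Illustrative stand-in for the i386 TAS in s_lock.h, using GCC's
 * atomic builtins instead of inline xchgb assembly. */
static int
tas(volatile char *lock)
{
    /* Atomically store 1 and return the previous value:
     * 0 means we acquired the lock, 1 means it was already held. */
    return (int) __sync_lock_test_and_set(lock, 1);
}

static void
s_unlock(volatile char *lock)
{
    __sync_lock_release(lock);  /* write 0 back with release semantics */
}
```

When TAS is available, SpinLockAcquire busy-waits on tas() instead of
making a semop() system call, which is why a misbuild shows up as semaphore
traffic.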

regards, tom lane


From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 19:18:03
Message-ID: Pine.LNX.4.33.0112281111140.23655-100000@windmill.gghcwest.com
Lists: pgsql-general

On Fri, 28 Dec 2001, Tom Lane wrote:

> "Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> > I have a moderately loaded postgres server running 7.2beta4 (I wanted to
> > try out the live vacuum) that turns out to spend the majority of its cpu
> > time in kernel land. With only a handful of running processes, postgres
> > induces tens of thousands of context switches per second. Practically the
> > only thing postgres does with all this CPU time is semop() in a tight
> > loop.
>
> It sounds like you have a build that's using SysV semaphores in place of
> test-and-set instructions. That should not happen on x86 hardware,
> since we have assembly TAS code for x86. Please look at your port
> header file (src/include/pg_config_os.h symlink) and
> src/include/storage/s_lock.h to figure out why it's misbuilt.

Well, it seems that one of __i386__ or __GNUC__ isn't set at compile time.
I'm using GCC on i386 so I don't see how that is possible. It should be
safe for me to simply define these two things in pg_config.h, I suspect.

-jwb


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 19:29:49
Message-ID: 14241.1009567789@sss.pgh.pa.us
Lists: pgsql-general

"Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
>> It sounds like you have a build that's using SysV semaphores in place of
>> test-and-set instructions. That should not happen on x86 hardware,
>> since we have assembly TAS code for x86. Please look at your port
>> header file (src/include/pg_config_os.h symlink) and
>> src/include/storage/s_lock.h to figure out why it's misbuilt.

> Well, it seems that one of __i386__ or __GNUC__ isn't set at compile time.
> I'm using GCC on i386 so I don't see how that is possible.

I don't either.

> It should be
> safe for me to simply define these two things in pg_config.h, I suspect.

That is not a solution. If it's broken for you then it's likely to be
broken for other people. We need to figure out what went wrong and
provide a permanent fix.

What gcc version are you running, exactly, and what symbols does it
predefine? (I seem to recall that there's a way to find that out,
though I'm not recalling how at the moment. Anyone?)

regards, tom lane


From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 19:40:16
Message-ID: Pine.LNX.4.33.0112281137540.23655-100000@windmill.gghcwest.com
Lists: pgsql-general

On Fri, 28 Dec 2001, Tom Lane wrote:

> "Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> >> It sounds like you have a build that's using SysV semaphores in place of
> >> test-and-set instructions. That should not happen on x86 hardware,
> >> since we have assembly TAS code for x86. Please look at your port
> >> header file (src/include/pg_config_os.h symlink) and
> >> src/include/storage/s_lock.h to figure out why it's misbuilt.
>
> > Well, it seems that one of __i386__ or __GNUC__ isn't set at compile time.
> > I'm using GCC on i386 so I don't see how that is possible.
>
> I don't either.
>
> > It should be
> > safe for me to simply define these two things in pg_config.h, I suspect.
>
> That is not a solution. If it's broken for you then it's likely to be
> broken for other people. We need to figure out what went wrong and
> provide a permanent fix.
>
> What gcc version are you running, exactly, and what symbols does it
> predefine? (I seem to recall that there's a way to find that out,
> though I'm not recalling how at the moment. Anyone?)

GCC 2.95.3 on Slackware 8 i386. I just tried this program:

#if defined __i386__ && defined __GNUC__
#include <stdlib.h>
int
main(void)
{
exit(0);
}
#endif

which builds and runs, so the problem must lie elsewhere.

-jwb


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 20:20:44
Message-ID: 14539.1009570844@sss.pgh.pa.us
Lists: pgsql-general

"Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> GCC 2.95.3 on Slackware 8 i386.

Seems pretty vanilla-flavored.

> I just tried this program:

> #if defined __i386__ && defined __GNUC__
> #include <stdlib.h>
> int
> main(void)
> {
> exit(0);
> }
> #endif

> which builds and runs, so the problem must lie elsewhere.

Hmm. Perhaps the TAS support does get compiled and the problem is
something else. It'd be worth trying to determine directly what
the code is doing --- viz, run a test case that causes all the
semop thrashing, attach to one of the backends with gdb, set a
breakpoint at semop, get a backtrace to see where it's called from.
(Repeat a few times to see if the results are consistent or not.)

You might need to configure and build with --enable-debug to get
a useful (symbolic) backtrace.
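The procedure above, as a concrete session sketch (the binary path and pid
are illustrative):

```
$ gdb /usr/local/pgsql/bin/postgres 11409   # attach to one busy backend
(gdb) break semop
(gdb) continue
    ... breakpoint hit ...
(gdb) backtrace
(gdb) continue    # repeat a few times to see whether call sites are consistent
```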

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>, pgsql-general(at)postgresql(dot)org
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 20:30:04
Message-ID: 200112282030.fBSKU5409832@candle.pha.pa.us
Lists: pgsql-general

> What gcc version are you running, exactly, and what symbols does it
> predefine? (I seem to recall that there's a way to find that out,
> though I'm not recalling how at the moment. Anyone?)

pgsql/src/tools/ccsym shows compiler symbols.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 20:37:30
Message-ID: Pine.LNX.4.33.0112281236190.23655-100000@windmill.gghcwest.com
Lists: pgsql-general

On Fri, 28 Dec 2001, Tom Lane wrote:

> Hmm. Perhaps the TAS support does get compiled and the problem is
> something else. It'd be worth trying to determine directly what
> the code is doing --- viz, run a test case that causes all the
> semop thrashing, attach to one of the backends with gdb, set a
> breakpoint at semop, get a backtrace to see where it's called from.
> (Repeat a few times to see if the results are consistent or not.)
>
> You might need to configure and build with --enable-debug to get
> a useful (symbolic) backtrace.

#0 semop (semid=1277952, sops=0xbfffe934, nsops=1) at
../sysdeps/unix/sysv/linux/semop.c:35
#1 0x81504d2 in IpcSemaphoreUnlock (semId=1277952, sem=0) at ipc.c:456
#2 0x8157fa6 in LWLockRelease (lockid=LockMgrLock) at lwlock.c:455
#3 0x8154a5f in LockAcquire (lockmethod=1, locktag=0xbfffea50,
xid=6753879, lockmode=1, dontWait=0 '\000')
at lock.c:723
#4 0x81537fb in LockRelation (relation=0x4054a070, lockmode=1) at
lmgr.c:153
#5 0x807a58b in relation_openr (relationName=0x821c38d "pg_statistic",
lockmode=1) at heapam.c:524
#6 0x807a6de in heap_openr (relationName=0x821c38d "pg_statistic",
lockmode=1) at heapam.c:595
#7 0x81b2f49 in SearchCatCache (cache=0x82a0fe0, v1=7522815, v2=65535,
v3=0, v4=0) at catcache.c:943
#8 0x81b8d4d in SearchSysCache (cacheId=24, key1=7522815, key2=65535,
key3=0, key4=0) at syscache.c:461
#9 0x81b9f35 in get_attavgwidth (relid=7522815, attnum=-1) at
lsyscache.c:906
#10 0x81223c2 in set_rel_width (root=0x82a72d0, rel=0x82a99b0) at
costsize.c:1286
#11 0x81221e6 in set_baserel_size_estimates (root=0x82a72d0,
rel=0x82a99b0) at costsize.c:1161
#12 0x8120262 in set_plain_rel_pathlist (root=0x82a72d0, rel=0x82a99b0,
rte=0x82a7358) at allpaths.c:132
#13 0x8120237 in set_base_rel_pathlists (root=0x82a72d0) at allpaths.c:115
#14 0x8120142 in make_one_rel (root=0x82a72d0) at allpaths.c:62
#15 0x812b09d in subplanner (root=0x82a72d0, flat_tlist=0x82a9718,
tuple_fraction=0) at planmain.c:238
#16 0x812af2e in query_planner (root=0x82a72d0, tlist=0x82a9308,
tuple_fraction=0) at planmain.c:126
#17 0x812c4ab in grouping_planner (parse=0x82a72d0, tuple_fraction=0) at
planner.c:1094
#18 0x812b4f5 in subquery_planner (parse=0x82a72d0, tuple_fraction=-1) at
planner.c:228
#19 0x812b2d2 in planner (parse=0x82a72d0) at planner.c:94
#20 0x815c0cc in pg_plan_query (querytree=0x82a72d0) at postgres.c:511
#21 0x815c362 in pg_exec_query_string (
query_string=0x82a6fa0 "UPDATE job SET state = 'retired' WHERE req =
745458", dest=Remote,
parse_context=0x827d510) at postgres.c:758
#22 0x815d581 in PostgresMain (argc=4, argv=0xbffff10c, username=0x8278da9
"pg") at postgres.c:1849

and :

#0 semop (semid=1277952, sops=0xbfffe9b0, nsops=1) at
../sysdeps/unix/sysv/linux/semop.c:35
#1 0x8150424 in IpcSemaphoreLock (semId=1277952, sem=2, interruptOK=0
'\000') at ipc.c:422
#2 0x8157bfb in LWLockAcquire (lockid=LockMgrLock, mode=LW_EXCLUSIVE) at
lwlock.c:271
#3 0x8154fb2 in LockRelease (lockmethod=1, locktag=0xbfffea80,
xid=6753879, lockmode=1) at lock.c:1018
#4 0x81539f9 in UnlockRelation (relation=0x4054b8f8, lockmode=1) at
lmgr.c:217
#5 0x8081823 in index_endscan (scan=0x82aa620) at indexam.c:288
#6 0x81b3146 in SearchCatCache (cache=0x82a0fe0, v1=7522815, v2=65535,
v3=0, v4=0) at catcache.c:994

-jwb


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 20:49:16
Message-ID: 14784.1009572556@sss.pgh.pa.us
Lists: pgsql-general

"Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> #0 semop (semid=1277952, sops=0xbfffe934, nsops=1) at
> ../sysdeps/unix/sysv/linux/semop.c:35
> #1 0x81504d2 in IpcSemaphoreUnlock (semId=1277952, sem=0) at ipc.c:456
> #2 0x8157fa6 in LWLockRelease (lockid=LockMgrLock) at lwlock.c:455
> #3 0x8154a5f in LockAcquire (lockmethod=1, locktag=0xbfffea50,
> xid=6753879, lockmode=1, dontWait=0 '\000')
> at lock.c:723
> #4 0x81537fb in LockRelation (relation=0x4054a070, lockmode=1) at
> lmgr.c:153

> and :

> #0 semop (semid=1277952, sops=0xbfffe9b0, nsops=1) at
> ../sysdeps/unix/sysv/linux/semop.c:35
> #1 0x8150424 in IpcSemaphoreLock (semId=1277952, sem=2, interruptOK=0
> '\000') at ipc.c:422
> #2 0x8157bfb in LWLockAcquire (lockid=LockMgrLock, mode=LW_EXCLUSIVE) at
> lwlock.c:271
> #3 0x8154fb2 in LockRelease (lockmethod=1, locktag=0xbfffea80,
> xid=6753879, lockmode=1) at lock.c:1018
> #4 0x81539f9 in UnlockRelation (relation=0x4054b8f8, lockmode=1) at
> lmgr.c:217

Hmm, those look perfectly normal, except one would like to think it's
the exception not the rule to be blocking on LWLocks. If you're seeing
a lot of semops coming from these code paths then something unhappy is
going on (and I would imagine that 7.1 would've been worse, since it
would busy-wait under similar conditions).

Could we get some details about your application? How many active
backends do you have, and what are they doing? A query trace
(postmaster log with debug_print_query on) might be helpful.

regards, tom lane


From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 21:00:27
Message-ID: Pine.LNX.4.33.0112281251390.23655-100000@windmill.gghcwest.com
Lists: pgsql-general


On Fri, 28 Dec 2001, Tom Lane wrote:

> Hmm, those look perfectly normal, except one would like to think it's
> the exception not the rule to be blocking on LWLocks. If you're seeing
> a lot of semops coming from these code paths then something unhappy is
> going on (and I would imagine that 7.1 would've been worse, since it
> would busy-wait under similar conditions).
>
> Could we get some details about your application? How many active
> backends do you have, and what are they doing? A query trace
> (postmaster log with debug_print_query on) might be helpful.

Well, let's not get too far off track. The problem, I think, is using
SysV semaphores instead of CPU-specific test-and-set locking. Unless I
misunderstand something...

Here's some more stacks:

#0 semop (semid=1277952, sops=0xbfffeae0, nsops=1) at
../sysdeps/unix/sysv/linux/semop.c:35
#1 0x8150424 in IpcSemaphoreLock (semId=1277952, sem=4, interruptOK=0
'\000') at ipc.c:422
#2 0x8157bfb in LWLockAcquire (lockid=WALWriteLock, mode=LW_EXCLUSIVE) at
lwlock.c:271
#3 0x8095dc2 in XLogFlush (record={xlogid = 5, xrecoff = 194128600}) at
xlog.c:1230
#4 0x814af12 in BufferReplace (bufHdr=0x40401ba8) at bufmgr.c:1061
#5 0x814a231 in BufferAlloc (reln=0x4054e240, blockNum=1395,
foundPtr=0xbfffec03 "") at bufmgr.c:424
#6 0x8149dc5 in ReadBufferInternal (reln=0x4054e240, blockNum=1395,
bufferLockHeld=0 '\000')
at bufmgr.c:184
#7 0x814a802 in ReleaseAndReadBuffer (buffer=55, relation=0x4054e240,
blockNum=1395) at bufmgr.c:694
#8 0x807a3cd in heapgettup (relation=0x4054e240, dir=1, tuple=0x82af4a4,
buffer=0x82af4bc,
snapshot=0x82afb58, nkeys=0, key=0x0) at heapam.c:369
#9 0x807aad4 in heap_getnext (scan=0x82af4a0, backw=0) at heapam.c:806
#10 0x80fad1f in SeqNext (node=0x82afa30) at nodeSeqscan.c:96
#11 0x80f27bc in ExecScan (node=0x82afa30, accessMtd=0x80fac30 <SeqNext>)
at execScan.c:95
#12 0x80fad6b in ExecSeqScan (node=0x82afa30) at nodeSeqscan.c:133
#13 0x80ef84c in ExecProcNode (node=0x82afa30, parent=0x0) at
execProcnode.c:285
#14 0x80ee0e6 in ExecutePlan (estate=0x82afad0, plan=0x82afa30,
operation=CMD_SELECT, numberTuples=0,

#0 semop (semid=1277952, sops=0xbfffead4, nsops=1) at
../sysdeps/unix/sysv/linux/semop.c:35
#1 0x81504d2 in IpcSemaphoreUnlock (semId=1277952, sem=2) at ipc.c:456
#2 0x8157fa6 in LWLockRelease (lockid=WALWriteLock) at lwlock.c:455
#3 0x8095e90 in XLogFlush (record={xlogid = 5, xrecoff = 194128600}) at
xlog.c:1242
#4 0x814af12 in BufferReplace (bufHdr=0x40401ba8) at bufmgr.c:1061
#5 0x814a231 in BufferAlloc (reln=0x4054e240, blockNum=1395,
foundPtr=0xbfffec03 "") at bufmgr.c:424
#6 0x8149dc5 in ReadBufferInternal (reln=0x4054e240, blockNum=1395,
bufferLockHeld=0 '\000')
at bufmgr.c:184
#7 0x814a802 in ReleaseAndReadBuffer (buffer=55, relation=0x4054e240,
blockNum=1395) at bufmgr.c:694
#8 0x807a3cd in heapgettup (relation=0x4054e240, dir=1, tuple=0x82af4a4,
buffer=0x82af4bc,
snapshot=0x82afb58, nkeys=0, key=0x0) at heapam.c:369
#9 0x807aad4 in heap_getnext (scan=0x82af4a0, backw=0) at heapam.c:806
#10 0x80fad1f in SeqNext (node=0x82afa30) at nodeSeqscan.c:96
#11 0x80f27bc in ExecScan (node=0x82afa30, accessMtd=0x80fac30 <SeqNext>)
at execScan.c:95
#12 0x80fad6b in ExecSeqScan (node=0x82afa30) at nodeSeqscan.c:133
#13 0x80ef84c in ExecProcNode (node=0x82afa30, parent=0x0) at
execProcnode.c:285
#14 0x80ee0e6 in ExecutePlan (estate=0x82afad0, plan=0x82afa30,
operation=CMD_SELECT, numberTuples=0,

#0 semop (semid=1277952, sops=0xbfffeb24, nsops=1) at
../sysdeps/unix/sysv/linux/semop.c:35
#1 0x81504d2 in IpcSemaphoreUnlock (semId=1277952, sem=1) at ipc.c:456
#2 0x8157fa6 in LWLockRelease (lockid=101) at lwlock.c:455
#3 0x814c05d in TerminateBufferIO (buf=0x40401ba8) at bufmgr.c:2139
#4 0x814a338 in BufferAlloc (reln=0x4054e240, blockNum=1395,
foundPtr=0xbfffec03 "") at bufmgr.c:470
#5 0x8149dc5 in ReadBufferInternal (reln=0x4054e240, blockNum=1395,
bufferLockHeld=0 '\000')
at bufmgr.c:184
#6 0x814a802 in ReleaseAndReadBuffer (buffer=55, relation=0x4054e240,
blockNum=1395) at bufmgr.c:694
#7 0x807a3cd in heapgettup (relation=0x4054e240, dir=1, tuple=0x82af4a4,
buffer=0x82af4bc,
snapshot=0x82afb58, nkeys=0, key=0x0) at heapam.c:369
#8 0x807aad4 in heap_getnext (scan=0x82af4a0, backw=0) at heapam.c:806
#9 0x80fad1f in SeqNext (node=0x82afa30) at nodeSeqscan.c:96
#10 0x80f27bc in ExecScan (node=0x82afa30, accessMtd=0x80fac30 <SeqNext>)
at execScan.c:95
#11 0x80fad6b in ExecSeqScan (node=0x82afa30) at nodeSeqscan.c:133
#12 0x80ef84c in ExecProcNode (node=0x82afa30, parent=0x0) at
execProcnode.c:285
#13 0x80ee0e6 in ExecutePlan (estate=0x82afad0, plan=0x82afa30,
operation=CMD_SELECT, numberTuples=0,

I've got 5-10 backends running, always busy. I think if Pg could keep up,
my application's parallelism would increase to 15-20. All the backends
are connected to the same database, with 10 tables + indexes, and the
cycle looks like such:

begin
select
insert approx 20 rows on four tables
update 1 table
commit
goto begin
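Sketched as SQL, the cycle is roughly the following (only the job table
and its state and req columns appear in the backtraces; every other name
here is hypothetical):

```sql
BEGIN;
SELECT id FROM job WHERE state = 'pending' LIMIT 1;       -- pick work
-- ~20 INSERTs spread across four tables
INSERT INTO results (job_id, payload) VALUES (1, '...');
UPDATE job SET state = 'retired' WHERE req = 745458;
COMMIT;
```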

I think it's pretty basic. It used to run right quickly before I moved to
7.2beta4. My application machine saw loads of 25+, now it only sees loads
of about 3. Postgres is throttling it.

-jwb


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 21:10:08
Message-ID: 14880.1009573808@sss.pgh.pa.us
Lists: pgsql-general

"Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> On Fri, 28 Dec 2001, Tom Lane wrote:
>> Hmm, those look perfectly normal, except one would like to think it's
>> the exception not the rule to be blocking on LWLocks.

> Well, let's don't get too far off track. The problem, I think, is using
> SysV semaphores instead of CPU-specific test-and-set locking.

No, I've abandoned that theory. You've shown us half a dozen backtraces
now, and they all are coming from LWLock acquire/release, not spinlock
acquire/release. If you were using SysV semaphores for spinlocks then
I'd expect to see a lot of traces leading back to spin.c.

> I think it's pretty basic. It used to run right quickly before I moved to
> 7.2beta4. My application machine saw loads of 25+, now it only sees loads
> of about 3. Postgres is throttling it.

Hmm. And what was your actual *throughput*? The backtraces you've
shown us all correspond to places where 7.1 would have busy-waited
rather than blocking on a semaphore. Reduction of nominal CPU load
is exactly what I'd expect, and is not in itself bad. The real question
is how many transactions can you process per second.

regards, tom lane


From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 21:42:00
Message-ID: Pine.LNX.4.33.0112281321340.23655-100000@windmill.gghcwest.com
Lists: pgsql-general


On Fri, 28 Dec 2001, Tom Lane wrote:

> Hmm. And what was your actual *throughput*?

Throughput is down noticeably, but unfortunately I don't have any
controlled instrumentation to demonstrate that. A process which took
approximately 9 minutes to complete with 7.1 now takes approximately 17
minutes to complete, for the same amount of work. Further, I can't really
compare the two versions because the required dump/reload 1) may be
changing something that isn't obvious, and 2) takes too long.

> The backtraces you've shown us all correspond to places where 7.1
> would have busy-waited rather than blocking on a semaphore.

But the behavior seems to me to be more like busy-waiting than blocking on
locks. For example, both of my CPUs are now pegged whenever the database
is busy, and I never used to see tens of thousands of context switches
per second. I also never used to spend half of my CPU time in the kernel,
but now I do.

Also the straces show control bouncing back and forth between processes
stuck in semop(). In a trace of 576000 syscalls, 79% were semop().
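A quick way to get that kind of breakdown from an strace log (sample input
inlined below for illustration; in practice feed the real trace file):

```shell
# Build a tiny sample log; real data would come from `strace -f -o trace.log ...`
cat > /tmp/trace.sample <<'EOF'
[pid 11410] 0.000024 semop(1179648, 0xbfffe658, 1) = 0
[pid 11409] 0.000027 semop(1179648, 0xbfffe488, 1) = 0
[pid 11409] 0.000031 read(5, "Q", 1) = 1
EOF

# Count each syscall entry line by the name before its opening paren
awk '{ for (i = 1; i <= NF; i++)
         if ($i ~ /^[a-z_0-9]+\(/) { sub(/\(.*/, "", $i); n[$i]++; break } }
     END { for (s in n) print n[s], s }' /tmp/trace.sample | sort -rn
```

(`strace -c` gives a similar summary table directly when tracing live.)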

I'd believe that the problem is in the kernel's scheduler or its SysV
semaphore implementation, but they didn't seem to think so on linux-kernel.

> Reduction of nominal CPU load is exactly what I'd expect, and is not
> in itself bad. The real question is how many transactions can you
> process per second.

What I meant to say was that the CPU load on my application machine was
much lower, which is to say that machine is just waiting around all the
time for the database machine to do something. The CPUs on the database
machine are pinned all the time.

-jwb


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 22:02:15
Message-ID: 15163.1009576935@sss.pgh.pa.us
Lists: pgsql-general

"Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
>> The backtraces you've shown us all correspond to places where 7.1
>> would have busy-waited rather than blocking on a semaphore.

> But the behavior seems to me to be more like busy-waiting than blocking on
> locks. For example, both of my CPUs are now pegged whenever the database
> is busy, and I never used to see tens of thousands of context switches
> per second. I also never used to spend half of my CPU time in the kernel,
> but now I do.

[ scratches head ... ] Well, the LWLock stuff is new code, and it's not
out of the question that there's a bug in it. I can't see where though,
and I don't have enough info to proceed further.

We need to understand what the dynamic behavior is in your situation.
Can you poke into it further, or perhaps grant access to your machine
to someone who can?

One thing that would be quite useful is a more-detailed strace that
would let us determine whether each semop is a lock or unlock. Can you
get strace to record the user-space PC of the semop() caller? If no
joy there, maybe beefing up the LWDEBUG log printouts would produce
a useful trace.

regards, tom lane


From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 22:12:44
Message-ID: Pine.LNX.4.33.0112281408300.23655-100000@windmill.gghcwest.com
Lists: pgsql-general

On Fri, 28 Dec 2001, Tom Lane wrote:

> [ scratches head ... ] Well, the LWLock stuff is new code, and it's not
> out of the question that there's a bug in it. I can't see where though,
> and I don't have enough info to proceed further.

Thanks for all your attention so far.

> We need to understand what the dynamic behavior is in your situation.
> Can you poke into it further, or perhaps grant access to your machine
> to someone who can?

I can provide as much dumping, logging, and tracing as you want, with the
single constraint of upstream network bandwidth. I don't think files
larger than 100MB will be practical. I don't know what logging will be
useful, so someone will have to tell me what to do.

I don't think I can let anybody have access to this particular machine but
if I can reduce things to a simple testcase on another machine, I'll grant
access to that.

> One thing that would be quite useful is a more-detailed strace that
> would let us determine whether each semop is a lock or unlock. Can you
> get strace to record the user-space PC of the semop() caller? If no
> joy there, maybe beefing up the LWDEBUG log printouts would produce
> a useful trace.

strace unfortunately doesn't deref the sembuf structure in semop.

-jwb


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 22:26:15
Message-ID: 15320.1009578375@sss.pgh.pa.us
Lists: pgsql-general

"Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> strace unfortunately doesn't deref the sembuf structure in semop.

Right, but the PC would be enough to let us tell whether semop was being
called from IpcSemaphoreLock or IpcSemaphoreUnlock.

However, if the info isn't available then no use crying over it. What
we'll need to do instead is add some debugging printouts to lwlock.c,
and hope that the extra logfile I/O calls don't change the timing enough
to hide the misbehavior. I'll think a little about this and send you
an updated lwlock.c to try out.

regards, tom lane


From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: postgres processes spending most of their time in the kernel
Date: 2001-12-28 22:31:41
Message-ID: Pine.LNX.4.33.0112281431120.25470-100000@windmill.gghcwest.com
Lists: pgsql-general

On Fri, 28 Dec 2001, Tom Lane wrote:

> "Jeffrey W. Baker" <jwbaker(at)acm(dot)org> writes:
> > strace unfortunately doesn't deref the sembuf structure in semop.
>
> Right, but the PC would be enough to let us tell whether semop was being
> called from IpcSemaphoreLock or IpcSemaphoreUnlock.
>
> However, if the info isn't available then no use crying over it. What
> we'll need to do instead is add some debugging printouts to lwlock.c,
> and hope that the extra logfile I/O calls don't change the timing enough
> to hide the misbehavior. I'll think a little about this and send you
> an updated lwlock.c to try out.

Is there any use in frobbing these frobs?

#
# Lock Tracing
#
#trace_notify = false
#ifdef LOCK_DEBUG
#trace_locks = false
#trace_userlocks = false
#trace_lwlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0
#endif
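Worth noting: the #ifdef in that block means those settings are honored
only when the server binary itself was compiled with LOCK_DEBUG defined;
on a stock build they are silently ignored. A sketch of enabling them
(values illustrative):

```
# requires a server built with LOCK_DEBUG defined, e.g.
#   CFLAGS='-O2 -DLOCK_DEBUG' ./configure --enable-debug
trace_lwlocks = true
trace_lock_oidmin = 16384
```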

-jwb