Re: XLogInsert scaling, revisited

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: XLogInsert scaling, revisited
Date: 2013-06-21 18:55:02
Message-ID: CAMkU=1yvfWsQ4YV2k-kDvRAmA39F2RzgPqpZRUVz0vref4Mwrg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jun 18, 2013 at 11:28 PM, Heikki Linnakangas <
hlinnakangas(at)vmware(dot)com> wrote:

> On 18.06.2013 21:17, Jeff Janes wrote:
>
>> Hi Heikki,
>>
>> I am getting conflicts applying version 22 of this patch to 9.4dev. Could
>> you rebase?
>>
>
> Here you go.

I think I'm getting an undetected deadlock between the checkpointer and a
user process running a TRUNCATE command.

This is the checkpointer:

#0 0x0000003a73eeaf37 in semop () from /lib64/libc.so.6
#1 0x00000000005ff847 in PGSemaphoreLock (sema=0x7f8c0a4eb730,
interruptOK=0 '\000') at pg_sema.c:415
#2 0x00000000004b0abf in WaitOnSlot (upto=416178159648) at xlog.c:1775
#3 WaitXLogInsertionsToFinish (upto=416178159648) at xlog.c:2086
#4 0x00000000004b657a in CopyXLogRecordToWAL (write_len=32, isLogSwitch=1
'\001', rdata=0x0, StartPos=<value optimized out>, EndPos=416192397312)
at xlog.c:1389
#5 0x00000000004b6fb2 in XLogInsert (rmid=0 '\000', info=<value optimized
out>, rdata=0x7fff00000020) at xlog.c:1209
#6 0x00000000004b7644 in RequestXLogSwitch () at xlog.c:8748
#7 0x0000000000611be3 in CheckArchiveTimeout () at checkpointer.c:622
#8 0x0000000000611d97 in CheckpointWriteDelay (flags=<value optimized
out>, progress=<value optimized out>) at checkpointer.c:705
#9 0x000000000062ec5a in BufferSync (flags=64) at bufmgr.c:1322
#10 CheckPointBuffers (flags=64) at bufmgr.c:1828
#11 0x00000000004b1393 in CheckPointGuts (checkPointRedo=416178159592,
flags=64) at xlog.c:8365
#12 0x00000000004b8ff3 in CreateCheckPoint (flags=64) at xlog.c:8148
#13 0x00000000006121c3 in CheckpointerMain () at checkpointer.c:502
#14 0x00000000004c4c4a in AuxiliaryProcessMain (argc=2,
argv=0x7fff21c4a5d0) at bootstrap.c:439
#15 0x000000000060a68c in StartChildProcess (type=CheckpointerProcess) at
postmaster.c:4954
#16 0x000000000060d1ea in reaper (postgres_signal_arg=<value optimized
out>) at postmaster.c:2571
#17 <signal handler called>
#18 0x0000003a73ee14d3 in __select_nocancel () from /lib64/libc.so.6
#19 0x000000000060efee in ServerLoop (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:1537
#20 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1246
#21 0x00000000005ad4e0 in main (argc=3, argv=0x179fd00) at main.c:196

And this is the TRUNCATE command.

#0 0x0000003a73eeaf37 in semop () from /lib64/libc.so.6
#1 0x00000000005ff847 in PGSemaphoreLock (sema=0x7f8c0a4ea8d0,
interruptOK=0 '\000') at pg_sema.c:415
#2 0x00000000004b002d in WALInsertSlotAcquireOne (slotno=-1) at xlog.c:1667
#3 0x00000000004b6d5d in XLogInsert (rmid=0 '\000', info=<value optimized
out>, rdata=0x7fff21c4a5e0) at xlog.c:1115
#4 0x00000000004b8abc in XLogPutNextOid (nextOid=67198981) at xlog.c:8704
#5 0x00000000004a3bc2 in GetNewObjectId () at varsup.c:495
#6 0x00000000004c5195 in GetNewRelFileNode (reltablespace=<value optimized
out>, pg_class=0x0, relpersistence=<value optimized out>) at catalog.c:437
#7 0x000000000070f52d in RelationSetNewRelfilenode
(relation=0x7f8c019cb2a0, freezeXid=2144367079, minmulti=1) at
relcache.c:2758
#8 0x000000000055de61 in ExecuteTruncate (stmt=<value optimized out>) at
tablecmds.c:1163
#9 0x0000000000656080 in standard_ProcessUtility (parsetree=0x2058900,
queryString=<value optimized out>, context=<value optimized out>,
params=0x0,
dest=<value optimized out>, completionTag=<value optimized out>) at
utility.c:552
#10 0x0000000000652a87 in PortalRunUtility (portal=0x17bf510,
utilityStmt=0x2058900, isTopLevel=1 '\001', dest=0x2058c40,
completionTag=0x7fff21c4a9a0 "")
at pquery.c:1187
#11 0x00000000006539fd in PortalRunMulti (portal=0x17bf510, isTopLevel=1
'\001', dest=0x2058c40, altdest=0x2058c40, completionTag=0x7fff21c4a9a0 "")
at pquery.c:1318
#12 0x00000000006540b3 in PortalRun (portal=0x17bf510,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x2058c40,
altdest=0x2058c40,
completionTag=0x7fff21c4a9a0 "") at pquery.c:816
#13 0x0000000000650944 in exec_simple_query (query_string=0x2057e90
"truncate foo") at postgres.c:1048
#14 0x0000000000651fe9 in PostgresMain (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x1fc9e98 "jjanes", username=<value
optimized out>)
at postgres.c:3985
#15 0x000000000060f80b in BackendRun (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:3987
#16 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:3676
#17 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at
postmaster.c:1577
#18 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1246
#19 0x00000000005ad4e0 in main (argc=3, argv=0x179fd00) at main.c:196

This is using the same testing harness as in the last round of this patch.

Is there a way for me to dump the list of held/waiting lwlocks from gdb?

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2013-06-21 18:55:47 Re: backend hangs at immediate shutdown (Re: Back-branch update releases coming in a couple weeks)
Previous Message Joshua D. Drake 2013-06-21 18:45:46 Re: Hardware donation