Re: atomic pin/unpin causing errors

From: Andres Freund <andres(at)anarazel(dot)de>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: atomic pin/unpin causing errors
Date: 2016-05-05 18:52:46
Message-ID: 20160505185246.2i7qftadwhzewykj@alap3.anarazel.de
Lists: pgsql-hackers

Hi Jeff,

On 2016-04-29 10:38:55 -0700, Jeff Janes wrote:
> I don't see the problem with an cassert-enabled, probably because it
> is just too slow to ever reach the point where the problem occurs.

Running the test with cassert enabled I actually get assertion failures,
due to the FATAL you added.

#1 0x0000000000958dde in ExceptionalCondition (conditionName=0xb36c2a "!(RefCountErrors == 0)", errorType=0xb361af "FailedAssertion",
    fileName=0xb36170 "/home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c", lineNumber=2506) at /home/admin/src/postgresql/src/backend/utils/error/assert.c:54
#2 0x00000000007c9fc9 in CheckForBufferLeaks () at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2506
#3 0x00000000007c9f09 in AtProcExit_Buffers (code=1, arg=0) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2459
#4 0x00000000007d927f in shmem_exit (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:261
#5 0x00000000007d90dd in proc_exit_prepare (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:185
#6 0x00000000007d904b in proc_exit (code=1) at /home/admin/src/postgresql/src/backend/storage/ipc/ipc.c:102
#7 0x000000000095958d in errfinish (dummy=0) at /home/admin/src/postgresql/src/backend/utils/error/elog.c:543
#8 0x000000000080214b in mdwrite (reln=0x2e8b4a8, forknum=MAIN_FORKNUM, blocknum=154, buffer=0x2e8e5a8 "", skipFsync=0 '\000')
    at /home/admin/src/postgresql/src/backend/storage/smgr/md.c:832
#9 0x0000000000804633 in smgrwrite (reln=0x2e8b4a8, forknum=MAIN_FORKNUM, blocknum=154, buffer=0x2e8e5a8 "", skipFsync=0 '\000')
    at /home/admin/src/postgresql/src/backend/storage/smgr/smgr.c:650
#10 0x00000000007ca548 in FlushBuffer (buf=0x7f0285955330, reln=0x2e8b4a8) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2734
#11 0x00000000007c9d5a in SyncOneBuffer (buf_id=2503, skip_recently_used=0 '\000', wb_context=0x7ffe7305d290) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2377
#12 0x00000000007c964e in BufferSync (flags=64) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:1967
#13 0x00000000007ca185 in CheckPointBuffers (flags=64) at /home/admin/src/postgresql/src/backend/storage/buffer/bufmgr.c:2561
#14 0x000000000052d497 in CheckPointGuts (checkPointRedo=382762776, flags=64) at /home/admin/src/postgresql/src/backend/access/transam/xlog.c:8644
#15 0x000000000052cede in CreateCheckPoint (flags=64) at /home/admin/src/postgresql/src/backend/access/transam/xlog.c:8430
#16 0x00000000007706ac in CheckpointerMain () at /home/admin/src/postgresql/src/backend/postmaster/checkpointer.c:488
#17 0x000000000053e0d5 in AuxiliaryProcessMain (argc=2, argv=0x7ffe7305ea40) at /home/admin/src/postgresql/src/backend/bootstrap/bootstrap.c:429
#18 0x000000000078099f in StartChildProcess (type=CheckpointerProcess) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:5227
#19 0x000000000077dcc3 in reaper (postgres_signal_arg=17) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:2781
#20 <signal handler called>
#21 0x00007f028ebbdac3 in __select_nocancel () at ../sysdeps/unix/syscall-template.S:81
#22 0x000000000077c049 in ServerLoop () at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:1654
#23 0x000000000077b7a9 in PostmasterMain (argc=4, argv=0x2e49f20) at /home/admin/src/postgresql/src/backend/postmaster/postmaster.c:1298
#24 0x00000000006c5849 in main (argc=4, argv=0x2e49f20) at /home/admin/src/postgresql/src/backend/main/main.c:228

You didn't see those?

The trigger here appears to be that the checkpointer doesn't have an
on-exit callback similar to a normal backend's ShutdownPostgres() et al,
and thus doesn't trigger a resource owner release. The normal ERROR
path has:
    /* buffer pins are released here: */
    ResourceOwnerRelease(CurrentResourceOwner,
                         RESOURCE_RELEASE_BEFORE_LOCKS,
                         false, true);
    /* we needn't bother with the other ResourceOwnerRelease phases */

That clearly is a bug. But I'm not immediately seeing how this could
trigger the corruption issue you observed.
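
As a rough illustration of the failure mode, here is a toy model in C. It
is not PostgreSQL code: every name in it (pin_buffer, release_all_pins,
register_exit_callback, simulate_fatal_exit, run_process) is made up for
the example, and a single counter stands in for the real per-process pin
bookkeeping. It only shows that a process which exits via FATAL while
holding a pin reports a leak unless some exit callback releases the pin
first.

```c
/* Toy model only: none of these names are PostgreSQL's real API. */
#include <assert.h>
#include <stdbool.h>

#define MAX_CALLBACKS 4

typedef void (*exit_callback)(void);

static int pinned_buffers = 0;                 /* stands in for pin refcounts */
static exit_callback callbacks[MAX_CALLBACKS]; /* registered exit callbacks */
static int n_callbacks = 0;

static void pin_buffer(void)
{
    pinned_buffers++;
}

/* Stands in for a resource owner release that drops all buffer pins. */
static void release_all_pins(void)
{
    pinned_buffers = 0;
}

static void register_exit_callback(exit_callback cb)
{
    assert(n_callbacks < MAX_CALLBACKS);
    callbacks[n_callbacks++] = cb;
}

/*
 * Simulate process exit after a FATAL error: run the callbacks in reverse
 * registration order, then return the number of pins still held. A
 * nonzero result corresponds to the leak check failing.
 */
static int simulate_fatal_exit(void)
{
    while (n_callbacks > 0)
        callbacks[--n_callbacks]();
    return pinned_buffers;
}

/* One process lifetime: take a pin, hit a fatal error, exit. */
static int run_process(bool has_cleanup_callback)
{
    pinned_buffers = 0;
    n_callbacks = 0;

    if (has_cleanup_callback)
        register_exit_callback(release_all_pins);

    pin_buffer();                  /* e.g. the buffer being written out */
    return simulate_fatal_exit();  /* the write fails and the process exits */
}
```

In this model run_process(false) plays the role of the checkpointer (no
cleanup callback, so the pin is still held at the leak check), while
run_process(true) plays the role of a normal backend.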

Greetings,

Andres Freund
