Re: Allocation in critical section after node exits archive recovery

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allocation in critical section after node exits archive recovery
Date: 2014-11-28 07:43:29
Message-ID: 547827A1.1040501@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/28/2014 08:35 AM, Michael Paquier wrote:
> Hi all,
>
> While performing some archive recovery tests I found the following
> assertion failure once standby gets out of archive recovery (HEAD at
> a5eb85e):
> (lldb)
> * thread #1: tid = 0x0000, 0x00007fff8faaf282
> libsystem_kernel.dylib`__pthread_kill + 10, stop reason = signal
> SIGSTOP
> * frame #0: 0x00007fff8faaf282 libsystem_kernel.dylib`__pthread_kill + 10
> frame #1: 0x00007fff888774c3 libsystem_pthread.dylib`pthread_kill + 90
> frame #2: 0x00007fff898acb73 libsystem_c.dylib`abort + 129
> frame #3: 0x0000000107106029
> postgres`ExceptionalCondition(conditionName=0x000000010725770b,
> errorType=0x0000000107219b83, fileName=0x00000001072c4c84,
> lineNumber=555) + 137 at assert.c:54
> frame #4: 0x0000000107140ec5
> postgres`MemoryContextCreate(tag=T_AllocSetContext, size=192,
> methods=0x00000001072fc7b0, parent=0x00007fca6a405710,
> name=0x0000000107257e1b) + 117 at mcxt.c:555
> frame #5: 0x000000010713e6c7
> postgres`AllocSetContextCreate(parent=0x00007fca6a405710,
> name=0x0000000107257e1b, minContextSize=0, initBlockSize=8192,
> maxBlockSize=8388608) + 71 at aset.c:444
> frame #6: 0x0000000106c51449 postgres`InitXLogInsert + 73 at
> xloginsert.c:869
> frame #7: 0x0000000106c450ee postgres`InitXLOGAccess + 190 at xlog.c:7281
> frame #8: 0x0000000106c43e7c postgres`LocalSetXLogInsertAllowed +
> 76 at xlog.c:7133
> frame #9: 0x0000000106c445dd postgres`CreateCheckPoint(flags=38) +
> 1165 at xlog.c:7693
> frame #10: 0x0000000106ebf935 postgres`CheckpointerMain + 1989 at
> checkpointer.c:502
> frame #11: 0x0000000106c5a0c3
> postgres`AuxiliaryProcessMain(argc=2, argv=0x00007fff5909bb30) + 1907
> at bootstrap.c:426
> frame #12: 0x0000000106ecf696
> postgres`StartChildProcess(type=CheckpointerProcess) + 358 at
> postmaster.c:5109
> frame #13: 0x0000000106ece229
> postgres`sigusr1_handler(postgres_signal_arg=30) + 265 at
> postmaster.c:4795
> frame #14: 0x00007fff91468f1a libsystem_platform.dylib`_sigtramp + 26
> frame #15: 0x00007fff8faaf3f7 libsystem_kernel.dylib`__select + 11
> frame #16: 0x0000000106ecd717 postgres`PostmasterMain(argc=3,
> argv=0x00007fca6a4056a0) + 5479 at postmaster.c:1220
> frame #17: 0x0000000106e11fc5 postgres`main(argc=3,
> argv=0x00007fca6a4056a0) + 773 at main.c:220
>
> Logs:
> 2014-11-28 15:28:20.733 JST: LOG: selected new timeline ID: 2
> cp: /Users/ioltas/archive/5432/00000001.history: No such file or
> directory
> 2014-11-28 15:28:20.776 JST: LOG: archive recovery complete
> TRAP: FailedAssertion("!(CritSectionCount == 0)", File: "mcxt.c", Line: 555)
>
> This area of the code has been changed recently by the WAL API patch.
> Note that I haven't looked yet at implementing a fix.

Fixed, thanks. I had tested PITR, but apparently not with hot_standby=off.

It might be more robust to call InitXLogInsert() directly at process
startup, somewhere in postinit.c. Currently, it's called by
InitXLOGAccess(), which has the advantage that processes that have no
intention of creating WAL records don't need to allocate the XLogInsert
working buffers, notably hot standby backends. But it means that it's
not safe to call RecoveryInProgress() in a critical section, when
InitXLOGAccess() hasn't been called yet. All the critical sections that
cover WAL-logging page modifications are safe, because you wouldn't get
to the critical section without having called InitXLOGAccess() earlier,
but it's a bit fiddly.

I think I'll leave it as it is for now, but if any more issues like this
crop up, I'll move the call to InitXLogInsert to backend startup.

- Heikki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2014-11-28 07:49:55 Re: Using pg_rewind for differential backup
Previous Message Michael Paquier 2014-11-28 07:30:02 Re: Using pg_rewind for differential backup