From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com> |
Cc: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock) |
Date: | 2012-02-20 09:09:14 |
Message-ID: | CAHGQGwGRuNJ=_ctXwteNkFkdvMDNFYxFdn0D1cd-CqL0OgNCLg@mail.gmail.com |
Lists: | pgsql-hackers |
On Sun, Feb 19, 2012 at 3:01 AM, Jeff Janes <jeff(dot)janes(at)gmail(dot)com> wrote:
> I've tested your v9 patch. I no longer see any inconsistencies or
> lost transactions in the recovered database. But occasionally I get
> databases that fail to recover at all.
> It has always been with the exact same failed assertion, at xlog.c line 2154.
>
> I've only seen this 4 times out of 2202 cycles of crash and recovery,
> so it must be some rather obscure situation.
>
> LOG: database system was not properly shut down; automatic recovery in progress
> LOG: redo starts at 0/180001B0
> LOG: unexpected pageaddr 0/15084000 in log file 0, segment 25, offset 540672
> LOG: redo done at 0/19083FD0
> LOG: last completed transaction was at log time 2012-02-17 11:13:50.369488-08
> LOG: checkpoint starting: end-of-recovery immediate
> TRAP: FailedAssertion("!(((((((uint64) (NewPageEndPtr).xlogid *
> (uint64) (((uint32) 0xffffffff) / ((uint32) (16 * 1024 * 1024))) *
> ((uint32) (16 * 1024 * 1024))) + (NewPageEndPtr).xrecoff - 1)) / 8192)
> % (XLogCtl->XLogCacheBlck + 1)) == nextidx)", File: "xlog.c", Line:
> 2154)
> LOG: startup process (PID 5390) was terminated by signal 6: Aborted
> LOG: aborting startup due to startup process failure
I was able to reproduce this by making the server crash just after executing
"select pg_switch_xlog()".
$ initdb -D data
$ pg_ctl -D data start
$ psql -c "select pg_switch_xlog()"
$ pg_ctl -D data stop -m i
$ pg_ctl -D data start
...
LOG: redo done at 0/16E3B0C
TRAP: FailedAssertion("!(((((((uint64) (NewPageEndPtr).xlogid *
(uint64) (((uint32) 0xffffffff) / ((uint32) (16 * 1024 * 1024))) *
((uint32) (16 * 1024 * 1024))) + (NewPageEndPtr).xrecoff - 1)) / 8192)
% (XLogCtl->XLogCacheBlck + 1)) == nextidx)", File: "xlog.c", Line:
2154)
LOG: startup process (PID 16361) was terminated by signal 6: Aborted
LOG: aborting startup due to startup process failure
Though I've not read the new patch yet, I suspect that the xlog switch code
still has a bug.
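For anyone decoding the macro-expanded assertion above, here is a minimal sketch in C of the arithmetic it appears to perform: it maps the end pointer of a WAL page to a slot in the circular WAL buffer cache and expects that slot to equal nextidx. The constants are copied from the expanded assertion text; the names XLogRecPtrSketch and page_end_to_buffer_idx are hypothetical, made up for this illustration, and are not identifiers from xlog.c.

```c
#include <stdint.h>

/* Constants as they appear in the expanded assertion text. */
#define SEG_SIZE   ((uint32_t) (16 * 1024 * 1024))              /* 16 MB segment */
#define FILE_SIZE  ((((uint32_t) 0xffffffff) / SEG_SIZE) * SEG_SIZE)
#define BLCKSZ     8192                                         /* WAL page size */

/* Hypothetical stand-in for the two-part WAL pointer (xlogid, xrecoff). */
typedef struct
{
    uint32_t xlogid;
    uint32_t xrecoff;
} XLogRecPtrSketch;

/*
 * Compute the buffer slot that the page ending at page_end maps to.
 * cache_blck plays the role of XLogCtl->XLogCacheBlck, i.e. the highest
 * buffer index, so the cache holds cache_blck + 1 pages.
 * Assumes page_end is a real page end, so it is never 0/0.
 */
static uint64_t
page_end_to_buffer_idx(XLogRecPtrSketch page_end, uint32_t cache_blck)
{
    /* Linear byte position of the last byte on the page. */
    uint64_t last_byte = (uint64_t) page_end.xlogid * FILE_SIZE
                         + page_end.xrecoff - 1;

    /* Page number, wrapped around the circular buffer cache. */
    return (last_byte / BLCKSZ) % ((uint64_t) cache_blck + 1);
}
```

With 8 buffers (cache_blck = 7), a page ending at 0/16E4000 maps to slot 1 under this sketch. The assertion fires when the slot computed this way from NewPageEndPtr does not equal nextidx.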
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center