SIGSEGV when trying to start in single user mode

Lists: pgsql-general
From: Björn Häuser <bjoernhaeuser(at)googlemail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: SIGSEGV when trying to start in single user mode
Date: 2009-09-19 15:24:59
Message-ID: e8a218660909190824h12c1c088p80b937342077ed38@mail.gmail.com

Hello List,

I have a problem with my PostgreSQL 8.3.4 installation.

We had some problems with our storage subsystem, and it seems
PostgreSQL suffered a bit from it.
Here are some log excerpts:

When trying to start PostgreSQL:
---
# /etc/init.d/postgresql-8.3 start
Starting PostgreSQL 8.3 database server: main* Removed stale pid file.
The PostgreSQL server failed to start. Please check the log output:
2009-09-19 16:51:00 CEST LOG: could not load root certificate file "root.crt": no SSL error reported
2009-09-19 16:51:00 CEST DETAIL: Will not verify client certificates.
2009-09-19 16:51:00 CEST LOG: could not create IPv6 socket: Address family not supported by protocol
2009-09-19 16:51:00 CEST LOG: database system was interrupted while in recovery at 2009-09-19 16:47:52 CEST
2009-09-19 16:51:00 CEST HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
2009-09-19 16:51:00 CEST LOG: database system was not properly shut down; automatic recovery in progress
2009-09-19 16:51:00 CEST LOG: incomplete startup packet
2009-09-19 16:51:00 CEST LOG: redo starts at 44D/CEAFB200
2009-09-19 16:51:00 CEST LOG: unexpected pageaddr 44D/B8B0A000 in log file 1101, segment 206, offset 11575296
2009-09-19 16:51:00 CEST LOG: redo done at 44D/CEB062C0
2009-09-19 16:51:00 CEST PANIC: right sibling's left-link doesn't match: block 49696 links to 49978 instead of expected 3 in index "132010"
2009-09-19 16:51:00 CEST LOG: startup process (PID 3727) was terminated by signal 6: Aborted
2009-09-19 16:51:00 CEST LOG: aborting startup due to startup process failure
failed!
---
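The WAL positions in this log are printed as `xlogid/xrecoff` in hex. The minimal Python sketch below assumes the default 16 MB WAL segment size and shows how "redo starts at 44D/CEAFB200" and the "unexpected pageaddr" line map onto log file 1101, segment 206. The helper names are made up for illustration. The page found at that file position carries a header from segment 184 (0xB8), which suggests a recycled segment file. In 8.3 that message at the end of redo usually just marks where valid WAL ends, so the real problem is most likely the PANIC that follows.

```python
from __future__ import annotations

# Assumption: default WAL segment size of 16 MB (XLogSegSize in a stock build).
WAL_SEG_SIZE = 16 * 1024 * 1024


def parse_location(text: str) -> tuple[int, int]:
    """Split a location such as "44D/CEB0A000" into (xlogid, xrecoff)."""
    high, low = text.split("/")
    return int(high, 16), int(low, 16)


def locate(text: str) -> tuple[int, int, int]:
    """Map a WAL location to (log file, segment, offset) as the log prints them."""
    xlogid, xrecoff = parse_location(text)
    return xlogid, xrecoff // WAL_SEG_SIZE, xrecoff % WAL_SEG_SIZE


def expected_pageaddr(log_file: int, segment: int, offset: int) -> str:
    """Location that a page header stored at this file position should carry."""
    return f"{log_file:X}/{segment * WAL_SEG_SIZE + offset:08X}"


if __name__ == "__main__":
    print("redo starts at   ", locate("44D/CEAFB200"))
    # File position taken from the "unexpected pageaddr" log line.
    print("expected pageaddr", expected_pageaddr(1101, 206, 11575296))
    print("found pageaddr   ", locate("44D/B8B0A000"))
```

Running it prints (1101, 206, 11514368), 44D/CEB0A000 and (1101, 184, 11575296). The log file and offset are the same, but the stale page header was written when that file was still segment 184.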

I think the index is not a system index.
However, when I tried to start PostgreSQL in single-user mode to
repair this index, I got the SIGSEGV mentioned in the subject.
Here is the last part of the strace output:
http://pastie.org/622807

Here is the gdb output:
---
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 47930899067328 (LWP 3881)]
0x00000000005a7fc0 in SHMQueueInsertBefore ()
(gdb) bt
#0 0x00000000005a7fc0 in SHMQueueInsertBefore ()
#1 0x00000000005ac7a0 in LockAcquire ()
#2 0x00000000005aa416 in LockRelationForExtension ()
#3 0x0000000000469b0e in _bt_getbuf ()
#4 0x0000000000467759 in _bt_getstackbuf ()
#5 0x00000000004687d2 in _bt_insert_parent ()
#6 0x000000000046fc34 in btree_xlog_cleanup ()
#7 0x000000000047fbfb in StartupXLOG ()
#8 0x00000000005b899d in PostgresMain ()
#9 0x00000000005448df in main ()
---

It's a Debian system, and I think the package contains no debug symbols
(gdb reports "no debugging symbols found").

Does anyone know what to do?

Thanks in advance,
Björn


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Björn Häuser <bjoernhaeuser(at)googlemail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SIGSEGV when trying to start in single user mode
Date: 2009-09-19 17:04:59
Message-ID: 16669.1253379899@sss.pgh.pa.us

Björn Häuser <bjoernhaeuser(at)googlemail(dot)com> writes:
> I have a problem with my PostgreSQL 8.3.4 installation.

> We had some problems with our storage subsystem and it seems
> postgresql suffered a little bit from it.
> Here are some log excerpts:

> # /etc/init.d/postgresql-8.3 start
> Starting PostgreSQL 8.3 database server: main* Removed stale pid file.

You really need to get rid of that startup script, or at least get rid
of the part of it that thinks it should remove the postmaster's PID
file. That's completely unsafe and poor practice. (I doubt it's
related to your immediate problem, though.)

> 2009-09-19 16:51:00 CEST PANIC: right sibling's left-link doesn't
> match: block 49696 links to 49978 instead of expected 3 in index
> "132010"
> 2009-09-19 16:51:00 CEST LOG: startup process (PID 3727) was
> terminated by signal 6: Aborted

Ugh, so you have a corrupted index that is touched by the unreplayed
WAL sequence. I'm afraid the only easy way out of this is to use
pg_resetxlog, which is a bit risky since you'll lose whatever other
changes haven't been applied to the database. Probably the safest
thing to do is pg_resetxlog, start up, dump everything, initdb,
reload.
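The sequence suggested above can be written down as an ordered list of commands. This is a dry-run sketch in Python that only prints the steps and executes nothing. The binary directory, data directories and dump path are hypothetical placeholders, and it assumes the commands run as the postgres OS user with postgresql.conf inside each data directory (Debian's packaging keeps it under /etc/postgresql and usually manages clusters with pg_ctlcluster instead). Whether pg_resetxlog needs `-f` depends on what it finds in pg_control; after an unclean shutdown like this one it may refuse to run without it.

```python
from __future__ import annotations

# ASSUMPTIONS: all paths are hypothetical placeholders; adjust before use.
BINDIR = "/usr/lib/postgresql/8.3/bin"
OLD_DATA = "/srv/pg/old"      # hypothetical: the damaged cluster
NEW_DATA = "/srv/pg/new"      # hypothetical: an empty directory for initdb
DUMP = "/srv/pg/all.sql"      # hypothetical: where the full dump is written


def recovery_plan(bindir: str = BINDIR, old: str = OLD_DATA,
                  new: str = NEW_DATA, dump: str = DUMP) -> list[str]:
    """Return the steps in order: reset WAL, start, dump, initdb, reload."""
    return [
        # Discard the unreplayable WAL; any changes still in it are lost.
        # -f may be required because the server was not shut down cleanly.
        f"{bindir}/pg_resetxlog -f {old}",
        f"{bindir}/pg_ctl start -w -D {old}",
        # Dump every database plus global objects such as roles.
        f"{bindir}/pg_dumpall > {dump}",
        f"{bindir}/pg_ctl stop -m fast -D {old}",
        # A brand-new cluster, so no damaged index pages carry over.
        f"{bindir}/initdb -D {new}",
        f"{bindir}/pg_ctl start -w -D {new}",
        f"{bindir}/psql -d postgres -f {dump}",
    ]


if __name__ == "__main__":
    # Dry run: print the plan instead of executing it.
    for number, command in enumerate(recovery_plan(), start=1):
        print(f"{number}. {command}")
```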

> But when I tried to start Postgresql in single-user mode to be able to
> repair this index i am getting the mentioned SIGSEGV.

Hmm, that's a bug, but even if it weren't broken it would not help you.
A single-user backend still has to replay any unreplayed WAL, so it
would still hit the PANIC.
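The PANIC comes from a consistency check on the B-tree's sibling links. Pages on each level of the index form a doubly linked list, so when a page is split its old right sibling must point back at it. Read that way, block 3 was being split, its right sibling is block 49696, and 49696's left-link points at 49978 instead of 3. The toy Python model below reproduces only that invariant and the message wording; `Page` and `check_right_sibling` are hypothetical and are not PostgreSQL's nbtree code. The index most likely shows up as "132010" because recovery cannot read the catalogs and names relations by relfilenode, which can be matched against pg_class.relfilenode once the server is up.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical toy page: only the left/right sibling block numbers."""
    prev: Optional[int]  # left sibling, None on the leftmost page of a level
    next: Optional[int]  # right sibling, None on the rightmost page


def check_right_sibling(pages: dict[int, Page], blkno: int, index_name: str) -> None:
    """Raise if the right sibling of `blkno` does not link back to `blkno`."""
    right = pages[blkno].next
    if right is None:  # rightmost page: there is no sibling to verify
        return
    back = pages[right].prev
    if back != blkno:
        raise RuntimeError(
            "right sibling's left-link doesn't match: "
            f"block {right} links to {back} instead of expected {blkno} "
            f'in index "{index_name}"'
        )


if __name__ == "__main__":
    healthy = {3: Page(prev=None, next=49696), 49696: Page(prev=3, next=None)}
    check_right_sibling(healthy, 3, "132010")  # passes silently

    # Hypothetical damage matching the PANIC: 49696 points back at 49978.
    damaged = {3: Page(prev=None, next=49696), 49696: Page(prev=49978, next=None)}
    try:
        check_right_sibling(damaged, 3, "132010")
    except RuntimeError as err:
        print(err)
```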

regards, tom lane


From: Björn Häuser <bjoernhaeuser(at)googlemail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SIGSEGV when trying to start in single user mode
Date: 2009-09-19 18:41:49
Message-ID: e8a218660909191141m4738f6e8u8c1cfb2999d9fb@mail.gmail.com

Hello Tom,

Thank you for your help.

I reset the xlog, and the server started again.

Regards,
Björn
