Mysterious server crashes

Lists: pgsql-hackers
From: Žiga Kranjec <ziga(at)ljudmila(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Mysterious server crashes
Date: 2011-07-15 21:37:54
Message-ID: 4E20B332.5070107@ljudmila.org

Hello!

We recently upgraded our Debian system (sid), and it
has since started crashing mysteriously.
We are still looking into that. It runs on 3ware RAID.
The Postgres package is 8.4.8-2.

The database came back up, apparently OK, except
for the indexes. Running reindex produces this error on
one of the tables:

ERROR:  unexpected chunk number 1 (expected 0) for toast value 17539760 in pg_toast_16992

A select on that table gives the same error.

I tried running reindex on the toast table, but it didn't help. Running:

select * from pg_toast.pg_toast_16992 where chunk_id = 17539760;

crashed the postgres backend (and apparently the whole server).

Is there anything we can/should do to fix the problem, besides
restoring the whole database from backup?

Thanks!

Ziga


From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: Žiga Kranjec <ziga(at)ljudmila(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Mysterious server crashes
Date: 2011-07-16 15:26:25
Message-ID: 20110716152625.GC16411@staff-mud-56-27.rice.edu

On Fri, Jul 15, 2011 at 11:37:54PM +0200, Žiga Kranjec wrote:
> Hello!
>
> We recently upgraded our Debian system (sid), and it
> has since started crashing mysteriously.
> We are still looking into that. It runs on 3ware RAID.
> The Postgres package is 8.4.8-2.
>
> The database came back up, apparently OK, except
> for the indexes. Running reindex produces this error on
> one of the tables:
>
> ERROR: unexpected chunk number 1 (expected 0) for toast value
> 17539760 in pg_toast_16992
>
> A select on that table gives the same error.
>
> I tried running reindex on the toast table, but it didn't help. Running:
>
> select * from pg_toast.pg_toast_16992 where chunk_id = 17539760;
>
> crashed the postgres backend (and apparently the whole server).
>
> Is there anything we can/should do to fix the problem, besides
> restoring the whole database from backup?
>
> Thanks!
>
> Ziga
>

Hi Ziga,

I do not want to be negative, but it sounds like your server is
having serious problems completely outside of PostgreSQL. Reading a
file should not cause your system to crash. That sounds like a
driver or hardware problem and you need to fix that. I would make
sure you have a good backup for your DB before you do anything
else.

Good luck,
Ken


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Žiga Kranjec <ziga(at)ljudmila(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Mysterious server crashes
Date: 2011-07-17 02:01:40
Message-ID: CA+TgmoYEq1nJQ-h0ZTG__xOGox=P+EsWihxFguyvOshr8SgSqA@mail.gmail.com

On Fri, Jul 15, 2011 at 5:37 PM, Žiga Kranjec <ziga(at)ljudmila(dot)org> wrote:
> We recently upgraded our Debian system (sid), and it
> has since started crashing mysteriously.
> We are still looking into that. It runs on 3ware RAID.
> The Postgres package is 8.4.8-2.
>
> The database came back up, apparently OK, except
> for the indexes. Running reindex produces this error on
> one of the tables:
>
> ERROR:  unexpected chunk number 1 (expected 0) for toast value 17539760 in
> pg_toast_16992
>
> A select on that table gives the same error.
>
> I tried running reindex on the toast table, but it didn't help. Running:
>
> select * from pg_toast.pg_toast_16992 where chunk_id = 17539760;
>
> crashed the postgres backend (and apparently the whole server).
>
> Is there anything we can/should do to fix the problem, besides
> restoring the whole database from backup?

Well, in theory, an operating system crash shouldn't corrupt your
database. Maybe you've configured fsync=off, or have some other
problem that keeps fsync from working reliably. There are some useful
resources here:

http://wiki.postgresql.org/wiki/Reliable_Writes
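For reference, these are the two postgresql.conf settings most directly involved in surviving an OS crash, shown with their crash-safe values (both are on by default in 8.4). This is only a sketch of what to verify, not a diagnosis of this particular server; the values actually in effect can be checked from psql with SHOW fsync; and SHOW full_page_writes;.

```ini
# postgresql.conf (sketch): crash-safety settings worth verifying
fsync = on              # off can cause unrecoverable corruption after an OS crash or power loss
full_page_writes = on   # logs full page images to WAL so partially written (torn) pages can be repaired during recovery
```

Note that fsync = on only helps if the storage actually honors flush requests; a RAID controller or disk with a volatile write cache can still lose acknowledged writes, which is one of the cases the wiki page above discusses.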

At this point, it sounds like things are pretty badly messed up. A
restore from backup seems like a good idea, but first you might want
to try to track down what else is wrong with this machine (bad memory?
corrupted OS?), else you might find yourself back in the same
situation all over again pretty quickly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company