Re: How to recover when can't start database

Lists: pgsql-admin
From: <simon(at)2ndquadrant(dot)com>
To: "L(dot)Boldareva" <pg(at)pierro(dot)dds(dot)nl>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: How to recover when can't start database
Date: 2005-04-01 11:50:02
Message-ID: 28292295$1112355374424d322e5958c4.20461539@config20.schlund.de


"L.Boldareva" <pg(at)pierro(dot)dds(dot)nl> wrote on 01.04.2005, 12:02:21:
> Hi!
> (Hope this is the right place to post)
>
> I crashed the postmaster and cannot start it anymore, with the error
>
> LOG: database system was interrupted while in recovery at 2005-04-01 11:04:33 CEST
> HINT: This probably means that some data is corrupted and you will have to use the last backup for recovery.
> LOG: checkpoint record is at 5/6F00C540
> LOG: redo record is at 5/6F000ABC; undo record is at 0/0; shutdown FALSE
> LOG: next transaction ID: 599824; next OID: 147679259
> LOG: database system was not properly shut down; automatic recovery in progress
> LOG: redo starts at 5/6F000ABC
> PANIC: btree_split_redo: lost left sibling
> LOG: startup process (PID 5603) was terminated by signal 6
> LOG: aborting startup due to startup process failure
>
> Is there a way to recover from that?
>
> I don't have a fresh backup, but losing a couple of days won't be a
> problem.
>
> I use PG 8.0 on a Linux box, with the standard postgresql.conf (except
> for some increased memory settings).
>

Well, it *might* be possible to recover using Point-in-Time Recovery
(PITR), with some manipulation. It has never been done, as far as I know,
so don't hold your breath.

PITR wasn't designed for the situation where you haven't actually taken
a backup, but it might still be possible. I think it will cause a
problem that pg_stop_backup() has never been executed, but perhaps we
can think of a way to override that or build a custom recovery server.

First, back up exactly everything you have now and save it.
You might even want to do it twice, so there's no mistake.
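
A minimal sketch of that backup step, assuming the postmaster is stopped
and Python is available. The function names and the paths in the usage
comment are hypothetical. It archives the whole data directory once,
copies the archive to a second location, and checks that both copies
have the same checksum:

```python
import hashlib
import shutil
import tarfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_twice(data_dir: Path, first: Path, second: Path) -> str:
    """Archive data_dir to `first`, copy it to `second`, return the checksum.

    Assumes the server is not running, so no file changes mid-archive.
    """
    with tarfile.open(first, "w:gz") as tar:
        # Store everything under the directory's own name (e.g. "data/...").
        tar.add(data_dir, arcname=data_dir.name)
    shutil.copy2(first, second)
    checksum = sha256_of(first)
    if sha256_of(second) != checksum:
        raise RuntimeError("second copy does not match the first")
    return checksum


# Hypothetical usage; adjust the paths to your installation:
# backup_twice(Path("/var/lib/pgsql/data"),
#              Path("/backup/data-crash.tar.gz"),
#              Path("/mnt/other-disk/data-crash.tar.gz"))
```

Putting the second copy on a different disk is the point of doing it
twice; a copy on the same disk only protects against a mistyped command.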

If you've got the original failure log, that would be great. We need to
establish what time the original failure took place, if there was one,
so we can try to roll forward to a time just before that.
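
One way to pull candidate times out of a saved server log is sketched
below. It assumes the log contains lines shaped like the "database
system was interrupted ... at <timestamp> <zone>" line quoted above; the
function name is hypothetical. Note that the reported time may be the
last recorded state change rather than the exact moment of the crash,
so treat it as an upper bound to work back from:

```python
import re
from datetime import datetime
from typing import List, Tuple

# Matches startup messages of the form shown in the quoted log, e.g.
# "database system was interrupted while in recovery at 2005-04-01 11:04:33 CEST"
STATE_LINE = re.compile(
    r"database system was "
    r"(interrupted while in recovery|interrupted|shut down) at "
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+)"
)


def find_state_changes(log_text: str) -> List[Tuple[str, datetime, str]]:
    """Return (state, timestamp, zone abbreviation) for each matching line.

    The timestamp is parsed as a naive datetime; the zone abbreviation is
    kept as text because abbreviations such as CEST are ambiguous to parse.
    """
    events = []
    for match in STATE_LINE.finditer(log_text):
        state, stamp, zone = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        events.append((state, when, zone))
    return events
```

Run over the whole log file, the earliest "interrupted" entry is the
most likely starting point for choosing a recovery target time.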

Anyway, I'll be free in a few hours to have a look at this, but it could
take a few days to figure it out, so don't promise anybody success and
don't say it will be quick either. You may not wish to wait that long;
I have no idea of your business needs. Please save the database anyway so
we've got a test case.

Best Regards, Simon Riggs


From: "L(dot)Boldareva" <pg(at)pierro(dot)dds(dot)nl>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: How to recover when can't start database
Date: 2005-04-01 15:31:22
Message-ID: Pine.LNX.4.58.0504011418270.11455@yafa.dds.nl


OK, looks like I kind of fixed it.

After tarring data/, I ran pg_resetxlog -f, although it's not meant to
fix this problem.

The database starts up now, but the last couple of tables created are
corrupted, so they cannot be reindexed or vacuumed, and
there is an error in a system table:

PostgreSQL stand-alone backend 8.0.0
backend> reindex database tr
ERROR: could not create unique index
DETAIL: Table contains duplicated values.
backend> drop table t3512;
ERROR: catalog is missing 3 attribute(s) for relid 147630962

Deleting the tuple with this OID from pg_class seems to have helped with
that, too. I ran VACUUM and VACUUM FULL after that and everything seems
to be OK.

Going to read more about PITR for the next time...

Thanks,
L.B.
