crash / data recovery issues

Lists: pgsql-hackers
From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: crash / data recovery issues
Date: 2008-02-06 18:45:28
Message-ID: 200802061345.28418.xzilla@users.sourceforge.net

I'm trying to do some data recovery on an 8.1.9 system. The brief history is
the system crashed, attempted to do xlog replay but that failed. I did a
pg_resetxlog to get something that would start up, and it looks as if the
indexes on pg_class have become corrupt (i.e. reindex claims duplicate rows,
which do not show up when doing count() manipulations on the data). As it
turns out, I can't drop these indexes either (system refuses with message
indexes are needed by the system). This has kind of left the system in an
unworkable state.

I've tried to do a pg_dump, but I get a "schema with OID 96568 does not
exist" error. The database has a number (~100) of temp schemas in it, so I
suspected that the problem was some object referencing a temp schema with
broken dependencies. I looked through pg_depend for any referencing objects
and found none. I also looked through pg_type, pg_proc, pg_class,
pg_constraint, pg_operator, pg_opclass and pg_conversion at their respective
*namespace fields and also found no matches. Any suggestions on what else
might cause this, or how to get past it?
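
The catalog search described above can be sketched roughly as follows. This is only an illustration, not the exact queries that were run: the helper name namespace_queries is hypothetical, and the column names are assumed to be the standard *namespace columns of the catalogs listed. It only builds SQL text, which would then be pasted into psql.

```python
# Catalogs named in the message above, mapped to their namespace columns.
# The column names are assumed from the standard system catalog layout.
NAMESPACE_COLUMNS = {
    "pg_type": "typnamespace",
    "pg_proc": "pronamespace",
    "pg_class": "relnamespace",
    "pg_constraint": "connamespace",
    "pg_operator": "oprnamespace",
    "pg_opclass": "opcnamespace",
    "pg_conversion": "connamespace",
}


def namespace_queries(schema_oid):
    """Return SQL statements that look for references to one schema OID."""
    oid = int(schema_oid)  # reject anything that is not a plain integer
    # One query per catalog: rows whose namespace column points at the OID.
    queries = [
        f"SELECT '{catalog}' AS catalog, oid FROM {catalog} WHERE {column} = {oid};"
        for catalog, column in NAMESPACE_COLUMNS.items()
    ]
    # Dependency records that reference the schema itself.
    queries.append(
        "SELECT classid::regclass, objid, deptype FROM pg_depend "
        f"WHERE refclassid = 'pg_namespace'::regclass AND refobjid = {oid};"
    )
    # Sanity check: is the schema row really missing from pg_namespace?
    queries.append(f"SELECT nspname FROM pg_namespace WHERE oid = {oid};")
    return queries


if __name__ == "__main__":
    for query in namespace_queries(96568):
        print(query)
```

With corrupt system indexes, results from these queries may themselves be unreliable, so an empty result here does not prove that no reference exists.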

I also did some digging to find the original error on xlog replay and it
was "failed to re-find parent key in "763769" for split pages 21032/21033".
I'm wondering if this is actually something you can push past with
pg_resetxlog, or if I need to do a pg_resetxlog and pass in values prior to
that error point (I guess essentially letting pg_resetxlog do a lookup)...
thoughts?

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: crash / data recovery issues
Date: 2008-02-06 18:56:41
Message-ID: 20080206185641.GB19269@alvh.no-ip.org

Robert Treat wrote:
> I'm trying to do some data recovery on an 8.1.9 system. The brief history is
> the system crashed, attempted to do xlog replay but that failed. I did a
> pg_resetxlog to get something that would start up, and it looks as if the
> indexes on pg_class have become corrupt (i.e. reindex claims duplicate rows,
> which do not show up when doing count() manipulations on the data). As it
> turns out, I can't drop these indexes either (system refuses with message
> indexes are needed by the system). This has kind of left the system in an
> unworkable state.

You can work out of it by starting a standalone server with system
indexes disabled (postgres -O -P, I think) and do a REINDEX on it (the
form of it that reindexes all system indexes -- I think it's REINDEX
DATABASE).

> I also did some digging to find the original error on xlog replay and it
> was "failed to re-find parent key in "763769" for split pages 21032/21033".
> I'm wondering if this is actually something you can push past with
> pg_resetxlog, or if I need to do a pg_resetxlog and pass in values prior to
> that error point (I guess essentially letting pg_resetxlog do a lookup)...
> thoughts?

You should be able to get out of that by reindexing that index.
(Actually, after you do a pg_resetxlog I think the best is to pg_dump
the whole thing and reload it. That gives you at least the assurance
that your FKs are not b0rked)

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: crash / data recovery issues
Date: 2008-02-06 19:30:55
Message-ID: 15521.1202326255@sss.pgh.pa.us

Robert Treat <xzilla(at)users(dot)sourceforge(dot)net> writes:
> I'm trying to do some data recovery on an 8.1.9 system.
> ...
> I also did some digging to find the original error on xlog replay and it
> was "failed to re-find parent key in "763769" for split pages 21032/21033".

Hmm, the only known cause of that was fixed in 8.1.6. Don't suppose you made
a copy of everything before destroying the evidence with pg_resetxlog?
If you did, any chance I could get access to it?

regards, tom lane


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: crash / data recovery issues
Date: 2008-02-06 19:33:15
Message-ID: 200802061433.15900.xzilla@users.sourceforge.net

On Wednesday 06 February 2008 13:56, Alvaro Herrera wrote:
> Robert Treat wrote:
> > it looks as if the indexes on pg_class have become corrupt (i.e. reindex
> > claims duplicate rows, which do not show up when doing count()
> > manipulations on the data). As it turns out, I can't drop these indexes
> > either (system refuses with message indexes are needed by the system).
> > This has kind of left the system in an unworkable state.
>
> You can work out of it by starting a standalone server with system
> indexes disabled (postgres -O -P, I think) and do a REINDEX on it (the
> form of it that reindexes all system indexes -- I think it's REINDEX
> DATABASE).
>

Sorry, I should have mentioned that I tried the above under postgres -d
1 -P -O -D /path/to/data, but the reindex complains (whether doing reindex
directly on the pg_class indexes, or doing reindex system).

Personally I was surprised to find out it wouldn't let me drop the indexes
under this mode, but that's a different story. Oh, probably worth noting I
am able to reindex other system tables this way, just not pg_class.
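
For reference, the standalone attempt described above looks roughly like the sketch below. It only assembles the command line and the statements to type at the standalone backend prompt; nothing is executed. The helper name standalone_reindex_plan and the database name mydb are hypothetical, and the two index names are assumed to be the standard pg_class indexes.

```python
def standalone_reindex_plan(data_dir, dbname):
    """Build the standalone-backend command line and the REINDEX statements."""
    # Flags as given in the message: -d 1 sets the debug level, -P ignores
    # system indexes, -O allows changes to system table structure. The
    # trailing database name is an assumption; it is not in the message.
    argv = ["postgres", "-d", "1", "-P", "-O", "-D", data_dir, dbname]
    # The standalone backend reads one command per line by default, so the
    # statements carry no trailing semicolon.
    sql = [
        "REINDEX INDEX pg_class_oid_index",          # assumed index name
        "REINDEX INDEX pg_class_relname_nsp_index",  # assumed index name
        f"REINDEX SYSTEM {dbname}",
    ]
    return argv, sql


if __name__ == "__main__":
    argv, sql = standalone_reindex_plan("/path/to/data", "mydb")
    print(" ".join(argv))
    for statement in sql:
        print(statement)
```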

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL