Corrupted data, best course of repair?

From: Sean Chittenden <sean@gigave.com>
To: pgsql-admin@postgresql.org
Subject: Corrupted data, best course of repair?
Date: 2005-08-22 06:51:49
Message-ID: 20050822065149.GD71159@sean.gigave.com
Lists: pgsql-admin

Today I was charged with fixing a DB, and upon looking into it, I
discovered that there were gigs upon gigs of the errors shown below.
Has anyone seen this or, more interesting to me, recovered from this
kind of failure mode?

Here's a bit of background. The system is a good system with verified
disks/FS, etc., but it's a busy database and (here's what's bugging
me): it's using pg_autovacuum *AND* is a system that gets a power
cycle every few weeks[1]. I'm worried that a non-exclusive vacuum
during recovery is causing some kind of problem. The system was
running fine under 7.4.8 for months, but after an upgrade to 8,
they've been experiencing some "interesting characteristics" that led
them to resolve the system getting wedged by "readjusting a power cord
or toggling a power switch[1]." Any helpful advice? reindex? I
still have the original data files, but they're sizable (~30GB) and I
haven't had a chance to really experiment. A dump/reload has been
done and data is sporadically missing (less than 0.01%) or in some
cases, there are duplicate rows. I'm guessing this is due to data
that was updated after some kind of index corruption occurred
during the previous "reboot." fsync was off.
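
For what it's worth, here is a minimal sketch of how one might spot
the duplicated rows after the reload (the table name "orders", key
column "id", and connection string are all made up for illustration,
and it assumes the affected table has no unique constraint that would
have rejected the duplicates at load time):

    # hypothetical duplicate-row check against the reloaded database
    import psycopg2

    conn = psycopg2.connect("dbname=restored_db")   # assumed DB name
    cur = conn.cursor()
    cur.execute("""
        SELECT id, count(*)
        FROM orders
        GROUP BY id
        HAVING count(*) > 1
    """)
    for row_id, copies in cur.fetchall():
        print("id %s appears %d times" % (row_id, copies))
    cur.close()
    conn.close()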

I'll be the first to admit, I've seen a bunch of PG errors, but this
one is new to me. -sc

[1] I'm not the hosting company and I wasn't involved in the
"power-cycle to fix the database" epidemic. I'm doing the
janitorial work to recover, please spare me the lecture.

Aug 21 15:04:32 db postgres[434]: [10-1] ERROR: xlog flush request 1A1/3F1C34FC is not satisfied --- flushed only to 1A1/3A20E26C
Aug 21 15:04:32 db postgres[434]: [10-2] CONTEXT: writing block 3415 of relation 1663/17236/70731682
Aug 21 15:04:32 db postgres[434]: [11-1] WARNING: could not write block 3415 of 1663/17236/70731682
Aug 21 15:04:32 db postgres[434]: [11-2] DETAIL: Multiple failures --- write error may be permanent.
Aug 21 15:04:32 db postgres[199]: [127-1] ERROR: xlog flush request 1A1/3B114118 is not satisfied --- flushed only to 1A1/3A20E26C
Aug 21 15:04:32 db postgres[199]: [127-2] CONTEXT: writing block 16096 of relation 1663/17236/29223136
Aug 21 15:04:32 db postgres[199]: [128-1] WARNING: could not write block 16096 of 1663/17236/29223136
Aug 21 15:04:32 db postgres[199]: [128-2] DETAIL: Multiple failures --- write error may be permanent.
Aug 21 15:04:32 db postgres[435]: [2-1] ERROR: xlog flush request 1A1/3F1C34FC is not satisfied --- flushed only to 1A1/3A20E26C
Aug 21 15:04:32 db postgres[435]: [2-2] CONTEXT: writing block 3415 of relation 1663/17236/70731682
Aug 21 15:04:32 db postgres[435]: [3-1] WARNING: could not write block 3415 of 1663/17236/70731682
Aug 21 15:04:32 db postgres[435]: [3-2] DETAIL: Multiple failures --- write error may be permanent.
Aug 21 15:04:32 db postgres[435]: [4-1] ERROR: xlog flush request 1A1/3F1C34FC is not satisfied --- flushed only to 1A1/3A20E26C
Aug 21 15:04:32 db postgres[435]: [4-2] CONTEXT: writing block 3415 of relation 1663/17236/70731682
Aug 21 15:04:32 db postgres[435]: [5-1] WARNING: could not write block 3415 of 1663/17236/70731682
Aug 21 15:04:32 db postgres[435]: [5-2] DETAIL: Multiple failures --- write error may be permanent.
Aug 21 15:04:32 db postgres[435]: [6-1] ERROR: xlog flush request 1A1/3F1C34FC is not satisfied --- flushed only to 1A1/3A20E26C
Aug 21 15:04:32 db postgres[435]: [6-2] CONTEXT: writing block 3415 of relation 1663/17236/70731682
Aug 21 15:04:32 db postgres[435]: [7-1] WARNING: could not write block 3415 of 1663/17236/70731682
Aug 21 15:04:32 db postgres[435]: [7-2] DETAIL: Multiple failures --- write error may be permanent.
Aug 21 15:04:33 db postgres[392]: [2-1] ERROR: out of memory
Aug 21 15:04:33 db postgres[392]: [2-2] DETAIL: Failed on request of size 20.
Aug 21 15:04:33 db postgres[199]: [129-1] ERROR: xlog flush request 1A1/3B114118 is not satisfied --- flushed only to 1A1/3A20E26C
Aug 21 15:04:33 db postgres[199]: [129-2] CONTEXT: writing block 16096 of relation 1663/17236/29223136
Aug 21 15:04:33 db postgres[199]: [130-1] WARNING: could not write block 16096 of 1663/17236/29223136
Aug 21 15:04:33 db postgres[199]: [130-2] DETAIL: Multiple failures --- write error may be permanent.
Aug 21 15:04:33 db postgres[392]: [3-1] ERROR: xlog flush request 1A1/3F1C34FC is not satisfied --- flushed only to 1A1/3A20E26C
Aug 21 15:04:33 db postgres[392]: [3-2] CONTEXT: writing block 3415 of relation 1663/17236/70731682
Aug 21 15:04:33 db postgres[392]: [4-1] WARNING: could not write block 3415 of 1663/17236/70731682
Aug 21 15:04:33 db postgres[392]: [4-2] DETAIL: Multiple failures --- write error may be permanent.

--
Sean Chittenden


From: Tom Lane <tgl@sss.pgh.pa.us>
To: Sean Chittenden <sean@gigave.com>
Cc: pgsql-admin@postgresql.org
Subject: Re: Corrupted data, best course of repair?
Date: 2005-08-22 15:00:14
Message-ID: 15945.1124722814@sss.pgh.pa.us
Lists: pgsql-admin

Sean Chittenden <sean@gigave.com> writes:
> Today I was charged with fixing a DB, and upon looking into it, I
> discovered that there were gigs upon gigs of the errors shown below.
> Has anyone seen this or, more interesting to me, recovered from this
> kind of failure mode?

> Here's a bit of background. The system is a good system with verified
> disks/FS, etc., but it's a busy database and (here's what's bugging
> me): it's using pg_autovacuum *AND* is a system that gets a power
> cycle every few weeks[1]. I'm worried that a non-exclusive vacuum
> during recovery is causing some kind of problem. The system was
> running fine under 7.4.8 for months, but after an upgrade to 8,
> they've been experiencing some "interesting characteristics" that led
> them to resolve the system getting wedged by "readjusting a power cord
> or toggling a power switch[1]." Any helpful advice? reindex? I
> still have the original data files, but they're sizable (~30GB) and I
> haven't had a chance to really experiment. A dump/reload has been
> done and data is sporadically missing (less than 0.01%) or in some
> cases, there are duplicate rows. I'm guessing this is due to data
> that was updated after some kind of index corruption occurred
> during the previous "reboot." fsync was off.

They run with fsync off AND they like to toggle the power switch at
random? I'd suggest finding other employment --- they couldn't possibly
be paying you enough to justify cleaning up after stupidity as gross as
that.

Anyway, the errors appear to indicate that there are pages in the
database with LSN (last WAL location) larger than the actual current end
of WAL. The difference is pretty large though --- at least 85MB of WAL
seems to have gone missing. My first thought was a corrupted LSN field.
But seeing that there are at least two such pages, and given the antics
you describe above, what seems more likely is that the LSNs were correct
when written. I think some page of WAL never made it to disk during a
period of heavy updates that was terminated by a power cycle, and during
replay we stopped at the first point where the WAL data was detectably
corrupt, and so a big chunk of WAL never got replayed. Which of course
means there's probably a lot of stuff that needs to be fixed and did not
get fixed, but in particular our idea of the current end-of-WAL address
is a lot less than it should be. If you have the server log from just
after the last postmaster restart, looking at what terminated the replay
might confirm this.
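
As a quick sanity check on that number, here's a back-of-the-envelope
calculation (only a sketch, not anything taken from the server): the
log messages give an xlog position as "logid/offset", both in hex, and
since the page LSN and the end of WAL share logid 1A1, the gap is just
the difference of the offsets.

    # gap between the LSN stamped on the page and the actual end of WAL
    page_lsn   = int("3F1C34FC", 16)   # "xlog flush request" offset
    end_of_wal = int("3A20E26C", 16)   # "flushed only to" offset
    gap = page_lsn - end_of_wal
    print("missing WAL: %d bytes (~%.0f MB)" % (gap, gap / 1e6))
    # about 84 million bytes for this particular pair of positions;
    # other pages in those gigs of logs may be even further ahead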

You could get the DB to stop complaining by doing a pg_resetxlog to push
the WAL start address above the largest "flush request" mentioned in any
of the messages. But my guess is that you'll find a lot of internal
corruption after you do it. Going back to the dump might be a saner way
to proceed.
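
If you do go the pg_resetxlog route, the first step is just finding
that largest flush request. Something along these lines would do it
(only a sketch: the log path is whatever your syslog writes to, and
the regex simply matches the message format shown earlier in the
thread). The resulting address is what the new WAL start, set via
pg_resetxlog's -l option, has to exceed, so double-check the exact -l
syntax against the pg_resetxlog reference page for your server version.

    # scan a server log for the largest "xlog flush request" position
    import re
    import sys

    lsn_re = re.compile(r"xlog flush request ([0-9A-Fa-f]+)/([0-9A-Fa-f]+)")

    def lsn_key(logid, offset):
        # order xlog positions by (logical log file, offset within it)
        return (int(logid, 16), int(offset, 16))

    best = None
    with open(sys.argv[1]) as log:      # e.g. the syslog file shown above
        for line in log:
            m = lsn_re.search(line)
            if m:
                key = lsn_key(*m.groups())
                if best is None or key > best:
                    best = key

    if best:
        print("largest flush request: %X/%X" % best)
    else:
        print("no flush-request errors found")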

regards, tom lane


From: Sean Chittenden <sean@gigave.com>
To: Tom Lane <tgl@sss.pgh.pa.us>
Cc: pgsql-admin@postgresql.org
Subject: Re: Corrupted data, best course of repair?
Date: 2005-08-22 17:46:06
Message-ID: 20050822174606.GA46390@sean.gigave.com
Lists: pgsql-admin

> They run with fsync off AND they like to toggle the power switch at
> random? I'd suggest finding other employment --- they couldn't
> possibly be paying you enough to justify cleaning up after stupidity
> as gross as that.

Colo-by-windows. If there weren't DBAs with Win32 admin tendencies,
I'd be out of work. :)

> Anyway, the errors appear to indicate that there are pages in the
> database with LSN (last WAL location) larger than the actual current
> end of WAL. The difference is pretty large though --- at least 85MB
> of WAL seems to have gone missing. My first thought was a corrupted
> LSN field. But seeing that there are at least two such pages, and
> given the antics you describe above, what seems more likely is that
> the LSNs were correct when written. I think some page of WAL never
> made it to disk during a period of heavy updates that was terminated
> by a power cycle, and during replay we stopped at the first point
> where the WAL data was detectably corrupt, and so a big chunk of WAL
> never got replayed. Which of course means there's probably a lot of
> stuff that needs to be fixed and did not get fixed, but in
> particular our idea of the current end-of-WAL address is a lot less
> than it should be. If you have the server log from just after the
> last postmaster restart, looking at what terminated the replay might
> confirm this.

Peachy.

> You could get the DB to stop complaining by doing a pg_resetxlog to
> push the WAL start address above the largest "flush request"
> mentioned in any of the messages. But my guess is that you'll find
> a lot of internal corruption after you do it. Going back to the
> dump might be a saner way to proceed.

Tons of corruption, and a backup that's a few weeks old. *grin* The
most recent dump seems to have all of the data, but some rows are
duplicated. Thanks for the input. -sc

--
Sean Chittenden