BUG #5077: Corrupted Table

Lists: pgsql-bugs
From: "Bryan McLemore" <kaelten(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5077: Corrupted Table
Date: 2009-09-23 23:29:33
Message-ID: 200909232329.n8NNTXER019344@wwwmaster.postgresql.org


The following bug has been logged online:

Bug reference: 5077
Logged by: Bryan McLemore
Email address: kaelten(at)gmail(dot)com
PostgreSQL version: 8.4.0
Operating system: Ubuntu 64 bit
Description: Corrupted Table
Details:

Today a table became corrupted, and every SELECT against it started failing
with:

"invalid page header in block 900 of relation pg_tblspc/32041/138911/187737"

RhodiumToad & StuckMojo (and a few others) helped me track it down. The
page in question looked like this:

http://pgsql.privatepaste.com/83JfmQGtS5

RhodiumToad gave me this command to repair the table:

printf '\x00\x01\x40\x03\x00\x20' | dd of=pg_tblspc/32041/138911/187737 bs=1 \
    conv=notrunc seek=7372812 count=6
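
For readers following along, the numbers in that command can be checked with a
minimal Python sketch. It assumes the default 8192-byte block size, a
little-endian machine, and that pd_lower sits 12 bytes into the page header
(after pd_lsn, pd_tli and pd_flags); none of these is stated in the report, and
the names patch_offset and decode_patch are made up for illustration.

```python
import struct

BLCKSZ = 8192          # assumption: default PostgreSQL block size
PD_LOWER_OFFSET = 12   # assumption: pd_lsn (8) + pd_tli (2) + pd_flags (2)


def patch_offset(block_number, blcksz=BLCKSZ):
    """Byte offset of pd_lower for a block within the relation file."""
    return block_number * blcksz + PD_LOWER_OFFSET


def decode_patch(patch):
    """Read six bytes as little-endian pd_lower, pd_upper, pd_special."""
    return struct.unpack("<HHH", patch)


seek = patch_offset(900)
pd_lower, pd_upper, pd_special = decode_patch(b"\x00\x01\x40\x03\x00\x20")
print(seek)                            # 7372812
print(pd_lower, pd_upper, pd_special)  # 256 832 8192
```

Under those assumptions the seek value is block 900 times 8192 plus 12, and the
six bytes overwrite pd_lower, pd_upper and pd_special in one write.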

The reason they asked me to report this is that it appears this occurred when
a disk filled up while pg_dump was running.

On this system df -h shows:

/dev/sda1 65G 23G 39G 38% /
varrun 4.0G 48K 4.0G 1% /var/run
varlock 4.0G 0 4.0G 0% /var/lock
udev 4.0G 40K 4.0G 1% /dev
devshm 4.0G 0 4.0G 0% /dev/shm
/dev/sdb1 136G 23G 107G 18% /data

/dev/sda1 is where the pgdata directory is.
/dev/sdb1 is where the tablespace is.

/dev/sda1 is the drive that filled up while pg_dump was running.

If there is any additional info I can provide please let me know.


From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: pgsql-bugs(at)postgresql(dot)org
Cc: kaelten(at)gmail(dot)com ("Bryan McLemore")
Subject: Re: BUG #5077: Corrupted Table
Date: 2009-09-24 02:38:07
Message-ID: 874oqt2hkg.fsf@news-spur.riddles.org.uk

>>>>> "Bryan" == "Bryan McLemore" <kaelten(at)gmail(dot)com> writes:

Bryan> "invalid page header in block 900 of relation pg_tblspc/32041/138911/187737"

Bryan> http://pgsql.privatepaste.com/83JfmQGtS5

Privatepaste urls do expire, so for the record here is the relevant
part of the data in question:

00000000 82 00 00 00 50 01 72 8a 01 00 04 00 00 00 84 03 |....P.r.........|
00000010 02 00 04 20 13 01 d9 00 a8 8e a2 01 d0 8d a2 01 |... ............|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 16 00 01 00 17 00 01 00 |................|
00000040 18 00 01 00 19 00 01 00 1a 00 01 00 00 00 00 00 |................|
00000050 1b 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000060 00 00 00 00 00 00 00 00 00 00 00 00 e8 9e 24 02 |..............$.|
00000070 60 9e 0c 01 d0 9d 1c 01 10 9d 74 01 48 9c 84 01 |`.........t.H...|
00000080 c0 9b 0c 01 e8 9a a2 01 30 9a 6c 01 50 99 bc 01 |........0.l.P...|
00000090 1c 00 01 00 1d 00 01 00 1e 00 01 00 28 00 01 00 |............(...|
000000a0 29 00 01 00 2a 00 01 00 2b 00 01 00 2c 00 01 00 |)...*...+...,...|
000000b0 2d 00 01 00 30 98 32 02 68 97 8c 01 80 96 cc 01 |-...0.2.h.......|
000000c0 d8 95 44 01 98 94 74 02 c0 93 a8 01 a8 92 24 02 |..D...t.......$.|
000000d0 20 92 0c 01 90 91 1c 01 d0 90 74 01 08 90 84 01 | .........t.....|
000000e0 80 8f 0c 01 00 00 00 00 00 00 00 00 00 00 00 00 |................|
000000f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|

The data appears intact other than invalid values for pd_lower and pd_special
(and possibly pd_upper, wasn't sure about that one).
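
As an illustration of why the header is rejected, here is a rough Python sketch
that decodes the first 24 bytes of the dump and applies sanity checks loosely
modelled on the ones PostgreSQL performs on a page header. The field layout,
the constants (8 kB blocks, layout version 4, 8-byte alignment, little-endian
byte order) and the helper names are assumptions for this sketch, not something
taken from the thread.

```python
import struct

# Assumptions: 8 kB blocks, 24-byte header, layout version 4,
# little-endian byte order, 8-byte MAXALIGN.
BLCKSZ = 8192
HEADER_SIZE = 24
LAYOUT_VERSION = 4
VALID_FLAG_BITS = 0x0007
MAXALIGN = 8

# First 24 bytes of the damaged page, copied from the hex dump.
DAMAGED = bytes.fromhex(
    "82 00 00 00 50 01 72 8a 01 00 04 00 00 00 84 03 "
    "02 00 04 20 13 01 d9 00"
)


def parse_header(raw):
    """Unpack the fixed-size page header fields into a dict."""
    names = ("xlogid", "xrecoff", "pd_tli", "pd_flags", "pd_lower",
             "pd_upper", "pd_special", "pd_pagesize_version", "pd_prune_xid")
    return dict(zip(names, struct.unpack("<IIHHHHHHI", raw[:HEADER_SIZE])))


def header_problems(h):
    """Return a list of failed sanity checks (empty if the header looks sane)."""
    problems = []
    if (h["pd_pagesize_version"] & 0xFF00) != BLCKSZ:
        problems.append("unexpected page size")
    if (h["pd_pagesize_version"] & 0x00FF) != LAYOUT_VERSION:
        problems.append("unexpected layout version")
    if h["pd_flags"] & ~VALID_FLAG_BITS:
        problems.append("unknown flag bits")
    if h["pd_lower"] < HEADER_SIZE:
        problems.append("pd_lower below header size")
    if h["pd_lower"] > h["pd_upper"]:
        problems.append("pd_lower above pd_upper")
    if h["pd_upper"] > h["pd_special"]:
        problems.append("pd_upper above pd_special")
    if h["pd_special"] > BLCKSZ:
        problems.append("pd_special beyond end of page")
    if h["pd_special"] % MAXALIGN:
        problems.append("pd_special not aligned")
    return problems


# Apply the same six bytes the repair command writes at header offset 12.
REPAIRED = DAMAGED[:12] + b"\x00\x01\x40\x03\x00\x20" + DAMAGED[18:]

print(header_problems(parse_header(DAMAGED)))
print(header_problems(parse_header(REPAIRED)))
```

With these assumptions the damaged header decodes to pd_lower = 0, pd_upper =
900 and pd_special = 2, which fails the pd_lower and pd_special checks, while
the patched header passes all of them.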

Bryan> The reason they asked me to report this is that it appears
Bryan> this occurred when a disk filled up while pg_dump was running.

I have no idea whether the disk full was the cause of this, but there
was no evidence in the page data of a hardware failure, so it could do
with investigation. (I don't know of any external cause that could damage
pd_lower while leaving the rest of the page intact.)

I did ask Bryan on IRC to make a copy of his data directory before doing
the fix.

--
Andrew (irc:RhodiumToad)


From: Kaelten <kaelten(at)gmail(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #5077: Corrupted Table
Date: 2009-09-24 03:11:33
Message-ID: bc5784710909232011t27409bfcs9b908e2e02ddeb0c@mail.gmail.com

I do have said copies and would be glad to help in whatever ways I am able to.

Bryan McLemore
Kaelten
