Re: PANIC: block 463 unfound during REDO after out of

Lists: pgsql-hackers
From: Warren Guy <warren(dot)guy(at)calorieking(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: PANIC: block 463 unfound during REDO after out of disk space failure during VACUUM
Date: 2007-01-11 06:12:31
Message-ID: 45A5D54F.7010906@calorieking.com

Hi everyone

I was running a VACUUM on a database whose partition was running out of
disk space. During the VACUUM the server process died, and the server
then failed to restart.

This is PostgreSQL 8.1.4.

I basically want to get the system back up and running ASAP with as
little data loss as possible. Any and all help is greatly appreciated.

Here is output from error log:

Jan 11 15:02:32 marshall postgres[71515]: [2-1] WARNING: terminating
connection because of crash of another server process
Jan 11 15:02:32 marshall postgres[71515]: [2-2] DETAIL: The postmaster
has commanded this server process to roll back the current transaction
and exit, because another server
Jan 11 15:02:32 marshall postgres[71515]: [2-3] process exited
abnormally and possibly corrupted shared memory.
Jan 11 15:02:32 marshall postgres[71515]: [2-4] HINT: In a moment you
should be able to reconnect to the database and repeat your command.
Jan 11 15:02:32 marshall postgres[67977]: [4-1] LOG: all server
processes terminated; reinitializing
Jan 11 15:02:32 marshall postgres[73888]: [5-1] LOG: database system
was interrupted at 2007-01-11 15:02:22 WST
Jan 11 15:02:32 marshall postgres[73888]: [6-1] LOG: checkpoint record
is at 4D/AA7B784
Jan 11 15:02:32 marshall postgres[73888]: [7-1] LOG: redo record is at
4D/AA7B784; undo record is at 0/0; shutdown FALSE
Jan 11 15:02:32 marshall postgres[73888]: [8-1] LOG: next transaction
ID: 376382676; next OID: 2891876
Jan 11 15:02:32 marshall postgres[73888]: [9-1] LOG: next MultiXactId:
44140; next MultiXactOffset: 91044
Jan 11 15:02:32 marshall postgres[73888]: [10-1] LOG: database system
was not properly shut down; automatic recovery in progress
Jan 11 15:02:32 marshall postgres[73888]: [11-1] LOG: redo starts at
4D/AA7B7C8
Jan 11 15:02:32 marshall postgres[73889]: [5-1] FATAL: the database
system is starting up
Jan 11 15:02:32 marshall postgres[73892]: [5-1] FATAL: the database
system is starting up
Jan 11 15:02:39 marshall postgres[73909]: [5-1] FATAL: the database
system is starting up
Jan 11 15:02:40 marshall postgres[73888]: [12-1] PANIC: block 463 unfound
Jan 11 15:02:41 marshall postgres[67977]: [5-1] LOG: startup process
(PID 73888) was terminated by signal 6
Jan 11 15:02:41 marshall postgres[67977]: [6-1] LOG: aborting startup
due to startup process failure

Thanks in advance

--
Warren Guy

System Administrator
CalorieKing - Australia
Tel: +618.9389.8777
Fax: +618.9389.8444
warren(dot)guy(at)calorieking(dot)com
www.calorieking.com


From: "Christopher Kings-Lynne" <chris(at)kkl(dot)com(dot)au>
To: "Warren Guy" <warren(dot)guy(at)calorieking(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PANIC: block 463 unfound during REDO after out of disk space failure during VACUUM
Date: 2007-01-11 06:41:57
Message-ID: 1acfe1a40701102241k7d1a97a0v4c36351d068e95ba@mail.gmail.com

I'd just like to point out that Warren is a mate of mine :)

I recall a time when a related issue occurred years ago:

http://groups-beta.google.com/group/comp.databases.postgresql.hackers/browse_thread/thread/c97c853f640b9ac1/d6bc3c75eed6c2a4?q=could+not+access+status+of+transaction#d6bc3c75eed6c2a4

Not sure if it's a similar problem?

Chris

--
Chris Kings-Lynne
Director
KKL Pty. Ltd.

Biz: +61 8 9328 4780
Mob: +61 (0)409 294078
Web: www.kkl.com.au


From: Richard Huxton <dev(at)archonet(dot)com>
To: Warren Guy <warren(dot)guy(at)calorieking(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PANIC: block 463 unfound during REDO after out of
Date: 2007-01-11 07:37:20
Message-ID: 45A5E930.2090607@archonet.com

Warren Guy wrote:
> Hi everyone
>
> Was running a VACUUM on a database on a partition which was running out
> of disk space. During VACUUM the server process died and failed to restart.
>
> Running PostgreSQL 8.1.4

...
> Jan 11 15:02:39 marshall postgres[73909]: [5-1] FATAL: the database
> system is starting up
> Jan 11 15:02:40 marshall postgres[73888]: [12-1] PANIC: block 463 unfound
> Jan 11 15:02:41 marshall postgres[67977]: [5-1] LOG: startup process
> (PID 73888) was terminated by signal 6
> Jan 11 15:02:41 marshall postgres[67977]: [6-1] LOG: aborting startup
> due to startup process failure

You say "was running out of disk space" - does that mean it did run out
of disk space? I don't see the error that caused this, just the results.
That would suggest to me that something unusual caused this (or you
clipped the log fragment too far down :-)
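
One way to check whether the original error was clipped is to scan the
full syslog file for ERROR, FATAL and PANIC entries. The following is a
minimal Python sketch, assuming the syslog layout shown in the fragment
above (timestamp, host, postgres[pid], then a [sequence-part] tag). The
function name severe_messages and the regular expressions are
illustrative assumptions, not part of PostgreSQL.

```python
import re

# Syslog prefix as it appears in the log fragment above, e.g.
# "Jan 11 15:02:40 marshall postgres[73888]: [12-1] PANIC: block 463 unfound"
PREFIX = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+postgres\[(?P<pid>\d+)\]:\s+"
    r"\[(?P<seq>\d+)-(?P<part>\d+)\]\s+(?P<text>.*)$"
)
SEVERITY = re.compile(r"^(DEBUG|INFO|NOTICE|WARNING|ERROR|LOG|FATAL|PANIC):")


def severe_messages(lines, levels=("ERROR", "FATAL", "PANIC")):
    """Return (timestamp, pid, message) tuples for messages at the given levels.

    Lines without a syslog prefix (wrapped text) and later parts of the same
    message ([N-2], [N-3], ...) are joined onto the message they belong to.
    """
    messages = []    # reassembled messages, in order of first appearance
    open_msgs = {}   # (pid, seq) -> message dict, for multi-part entries
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        m = PREFIX.match(line)
        if m:
            key = (m["pid"], m["seq"])
            if key in open_msgs and m["part"] != "1":
                # Continuation part of a message already seen.
                current = open_msgs[key]
                current["text"] += " " + m["text"]
            else:
                current = {"ts": m["ts"], "pid": int(m["pid"]), "text": m["text"]}
                open_msgs[key] = current
                messages.append(current)
        elif current is not None and line.strip():
            # Wrapped line with no prefix: append to the previous message.
            current["text"] += " " + line.strip()

    result = []
    for msg in messages:
        sev = SEVERITY.match(msg["text"])
        if sev and sev.group(1) in levels:
            result.append((msg["ts"], msg["pid"], msg["text"]))
    return result
```

Running it over the whole log file (for example
severe_messages(open(path)), where path is wherever syslog writes on
that machine) and looking at the entries just before 15:02:22 should
show whether an out-of-space ERROR was logged before the crash.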

In any case, the first thing I'd try is to take a file-level backup of
the data directory and then set things up as though you were doing PITR
recovery. That way you can stop the recovery before the WAL record that
references block 463 causes the failure. This assumes you've got the
space you need on your partition, of course.
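
The backup step, including the free-space check, could look roughly
like the Python sketch below. It assumes the server is stopped and that
the copy goes to a location with room to spare (ideally not the
partition that filled up). The function names, the headroom factor and
the paths in the usage note are hypothetical.

```python
import os
import shutil


def directory_size(path):
    """Total size in bytes of the regular files under `path` (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def backup_data_directory(data_dir, backup_dir, headroom=1.1):
    """Copy `data_dir` to `backup_dir` if the destination has enough free space.

    `backup_dir` must not exist yet and its parent directory must exist.
    `headroom` is an assumed safety factor applied to the measured size.
    Returns the number of bytes that was required.
    """
    needed = int(directory_size(data_dir) * headroom)
    parent = os.path.dirname(os.path.abspath(backup_dir))
    free = shutil.disk_usage(parent).free
    if free < needed:
        raise OSError(
            f"need about {needed} bytes, only {free} free under {parent}"
        )
    # symlinks=True copies symlinks as links rather than following them, so
    # anything reached through a link (such as a tablespace) is NOT included
    # and would have to be backed up separately.
    shutil.copytree(data_dir, backup_dir, symlinks=True)
    return needed
```

For example, backup_data_directory("/path/to/pgdata",
"/other/disk/pgdata.bak") with both paths replaced by the real ones.
Only once the copy has succeeded would it be sensible to experiment
with recovery on the original.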

HTH
--
Richard Huxton
Archonet Ltd


From: "Christopher Kings-Lynne" <chris(at)kkl(dot)com(dot)au>
To: "Richard Huxton" <dev(at)archonet(dot)com>
Cc: "Warren Guy" <warren(dot)guy(at)calorieking(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PANIC: block 463 unfound during REDO after out of
Date: 2007-01-12 02:07:38
Message-ID: 1acfe1a40701111807k40441c78neaa8ad49ff5ef430@mail.gmail.com

Btw - "unfound"?? I think the English there might need to be improved :)

Chris

--
Chris Kings-Lynne
Director
KKL Pty. Ltd.

Biz: +61 8 9328 4780
Mob: +61 (0)409 294078
Web: www.kkl.com.au