production server down

From: Joe Conway <mail(at)joeconway(dot)com>
To: "Hackers (PostgreSQL)" <pgsql-hackers(at)postgresql(dot)org>
Subject: production server down
Date: 2004-12-15 03:11:56
Message-ID: 41BFAB7C.5040108@joeconway.com
Lists: pgsql-hackers

I've got a down production server (will not restart) with the following
tail to its log file:

2004-12-13 15:05:52 LOG: recycled transaction log file "000001650000004C"
2004-12-13 15:26:01 LOG: recycled transaction log file "000001650000004D"
2004-12-13 16:39:55 LOG: database system was shut down at 2004-11-02 17:05:33 PST
2004-12-13 16:39:55 LOG: checkpoint record is at 0/9B0B8C
2004-12-13 16:39:55 LOG: redo record is at 0/9B0B8C; undo record is at 0/0; shutdown TRUE
2004-12-13 16:39:55 LOG: next transaction ID: 536; next OID: 17142
2004-12-13 16:39:55 LOG: database system is ready
2004-12-14 15:36:20 FATAL: IDENT authentication failed for user "colprod"
2004-12-14 15:36:58 FATAL: IDENT authentication failed for user "colprod"
2004-12-14 15:39:26 LOG: received smart shutdown request
2004-12-14 15:39:26 LOG: shutting down
2004-12-14 15:39:28 PANIC: could not open file "/replica/pgdata/pg_xlog/0000000000000000" (log file 0, segment 0): No such file or directory
2004-12-14 15:39:28 LOG: shutdown process (PID 23202) was terminated by signal 6
2004-12-14 15:39:39 LOG: database system shutdown was interrupted at 2004-12-14 15:39:26 PST
2004-12-14 15:39:39 LOG: could not open file "/replica/pgdata/pg_xlog/0000000000000000" (log file 0, segment 0): No such file or directory
2004-12-14 15:39:39 LOG: invalid primary checkpoint record
2004-12-14 15:39:39 LOG: could not open file "/replica/pgdata/pg_xlog/0000000000000000" (log file 0, segment 0): No such file or directory
2004-12-14 15:39:39 LOG: invalid secondary checkpoint record
2004-12-14 15:39:39 PANIC: could not locate a valid checkpoint record
2004-12-14 15:39:39 LOG: startup process (PID 23298) was terminated by signal 6
2004-12-14 15:39:39 LOG: aborting startup due to startup process failure
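
As a reading aid for the log above, here is a minimal Python sketch of how a log position relates to a pg_xlog file name. It assumes the 7.4-era naming convention (log id followed by segment number, each as eight hex digits) and the default 16 MB segment size; the function names are hypothetical and are not part of PostgreSQL.

```python
from typing import Tuple

# Assumption: default 16 MB transaction log segments, as in a stock 7.4 build.
XLOG_SEG_SIZE = 16 * 1024 * 1024


def parse_location(location: str) -> Tuple[int, int]:
    """Split a log position such as '0/9B0B8C' into (log id, byte offset)."""
    log_id, offset = location.split("/")
    return int(log_id, 16), int(offset, 16)


def segment_file_name(location: str) -> str:
    """Return the 16-hex-digit segment file name that would hold this position."""
    log_id, offset = parse_location(location)
    segment = offset // XLOG_SEG_SIZE
    return f"{log_id:08X}{segment:08X}"


def parse_file_name(name: str) -> Tuple[int, int]:
    """Split a segment file name into (log id, segment number)."""
    return int(name[:8], 16), int(name[8:], 16)


print(segment_file_name("0/9B0B8C"))        # 0000000000000000
print(parse_file_name("000001650000004D"))  # (357, 77)
```

Under those assumptions, the checkpoint position 0/9B0B8C maps to the file named in the PANIC (log file 0, segment 0), while the segments recycled earlier on 2004-12-13 belong to log file 0x165.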

This is an 8-way Xeon IBM x445 running SuSE 9 and postgresql-7.4.5-36.4,
with an nfs-mounted Network Appliance for database storage.

The server experienced a hang (as yet unexplained) yesterday and was
restarted at 2004-12-13 16:38:49 according to syslog. I'm told by the
network admin that there was a problem with the network card on restart,
so the nfs mount most probably disappeared and then reappeared
underneath a quiescent postgresql at some point between 2004-12-13
16:39:55 and 2004-12-14 15:36:20 (but much closer to the former than the
latter).

Any help would be much appreciated. Is our only option pg_resetxlog?

Thanks,

Joe
