Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or directory

Lists: pgsql-hackers
From: james(at)unifiedmind(dot)com (James Thornton)
To: pgsql-hackers(at)postgresql(dot)org
Subject: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or directory
Date: 2002-06-15 17:08:15
Message-ID: cabf0e7b.0206150908.1edab2f8@posting.google.com

What does this mean, and what could be causing it?

FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or
directory

That's the second time in as many months that I have received this
error when trying to start postmaster after a crash -- both times a
server reboot remedied the issue.

Thanks.


From: James Thornton <thornton(at)cs(dot)baylor(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or
Date: 2002-06-16 20:54:53
Message-ID: 3D0CFB1D.56706A60@cs.baylor.edu

ow wrote:
>
> Just curious ... how often does the server crash? Thanks

Postgres has crashed twice in two months. I am running several OpenACS
websites with it, and I have been for ~1.5 yrs -- Postgres has been
solid; these two crashes are not the norm.


From: "ow" <oneway_111(at)yahoo(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or directory
Date: 2002-06-16 21:06:21
Message-ID: aeiuh2$1p9q$1@news.hub.org

Just curious ... how often does the server crash? Thanks

"James Thornton" <james(at)unifiedmind(dot)com> wrote in message
news:cabf0e7b.0206150908.1edab2f8@posting.google.com...
> What does this mean, and what could be causing it?
>
> FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or
> directory
>
> That's the second time in as many months that I have received this
> error when trying to start postmaster after a crash -- both times a
> server reboot remedied the issue.
>
> Thanks.


From: James Thornton <thornton(at)cs(dot)baylor(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: James Thornton <james(at)unifiedmind(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such
Date: 2002-06-17 09:13:48
Message-ID: 3D0DA84C.7EA3BB71@cs.baylor.edu

Tom Lane wrote:
>
> That really should be impossible --- it says that a rename() failed for
> a file we just created.
>
> I judge from the spelling of the error message that you are running 7.1.

7.1.3

> However, given that you state a system reboot is necessary and
> sufficient to make the problem go away, I am going to stick my neck
> *way* out and suggest that:
>
> 1. You have the $PGDATA directory (or at least its pg_xlog subdirectory)
> mounted via NFS.
>
> 2. This is an NFS problem.

I am not running NFS on this system.


From: James Thornton <thornton(at)cs(dot)baylor(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: James Thornton <thornton(at)cs(dot)ecs(dot)baylor(dot)edu>, James Thornton <james(at)unifiedmind(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such
Date: 2002-06-17 12:28:42
Message-ID: 3D0DD5FA.A78B2096@cs.baylor.edu

Tom Lane wrote:
>
> James Thornton <thornton(at)cs(dot)ecs(dot)baylor(dot)edu> writes:
> > I am not running NFS on this system.
>
> Oh well, scratch that theory. Perhaps you should tell us what you *are*
> running --- what OS, what hardware? I still believe that this must be
> a system-level bug and not directly Postgres' fault.

[nsadmin@roam proc]$ cat version cpuinfo meminfo pci

Linux version 2.4.7-10smp (bhcompile@stripples.devel.redhat.com) (gcc
version 2.96 20000731 (Red Hat Linux 7.1 2.96-98)) #1 SMP Thu Sep 6
17:09:31 EDT 2001

processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 7
model name : Pentium III (Katmai)
stepping : 3
cpu MHz : 548.324
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 mmx fxsr sse
bogomips : 1094.45

total: used: free: shared: buffers: cached:
Mem: 327278592 321400832 5877760 720896 10825728 52867072
Swap: 271392768 13783040 257609728
MemTotal: 319608 kB
MemFree: 5740 kB
MemShared: 704 kB
Buffers: 10572 kB
Cached: 39552 kB
SwapCached: 12076 kB
Active: 21956 kB
Inact_dirty: 40668 kB
Inact_clean: 280 kB
Inact_target: 480 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 319608 kB
LowFree: 5740 kB
SwapTotal: 265032 kB
SwapFree: 251572 kB
NrSwapPages: 62893 pages

PCI devices found:
Bus 0, device 0, function 0:
Host bridge: Intel Corporation 440BX/ZX - 82443BX/ZX Host bridge
(rev 3).
Master Capable. Latency=64.
Prefetchable 32 bit memory at 0xf0000000 [0xf3ffffff].
Bus 0, device 1, function 0:
PCI bridge: Intel Corporation 440BX/ZX - 82443BX/ZX AGP bridge (rev
3).
Master Capable. Latency=64. Min Gnt=136.
Bus 0, device 7, function 0:
ISA bridge: Intel Corporation 82371AB PIIX4 ISA (rev 2).
Bus 0, device 7, function 1:
IDE interface: Intel Corporation 82371AB PIIX4 IDE (rev 1).
Master Capable. Latency=32.
I/O at 0x1000 [0x100f].
Bus 0, device 7, function 2:
USB Controller: Intel Corporation 82371AB PIIX4 USB (rev 1).
IRQ 14.
Master Capable. Latency=64.
I/O at 0xdce0 [0xdcff].
Bus 0, device 7, function 3:
Bridge: Intel Corporation 82371AB PIIX4 ACPI (rev 2).
IRQ 9.
Bus 0, device 14, function 0:
Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev
4).
IRQ 11.
Master Capable. Latency=64. Min Gnt=8.Max Lat=56.
Prefetchable 32 bit memory at 0xf7000000 [0xf7000fff].
I/O at 0xdcc0 [0xdcdf].
Non-prefetchable 32 bit memory at 0xff000000 [0xff0fffff].
Bus 0, device 15, function 0:
PCI bridge: Digital Equipment Corporation DECchip 21152 (rev 3).
Master Capable. Latency=64. Min Gnt=2.
Bus 0, device 17, function 0:
Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
(rev 36).
IRQ 14.
Master Capable. Latency=64. Min Gnt=10.Max Lat=10.
I/O at 0xdc00 [0xdc7f].
Non-prefetchable 32 bit memory at 0xff100000 [0xff10007f].
Bus 1, device 0, function 0:
VGA compatible controller: ATI Technologies Inc 3D Rage Pro AGP
1X/2X (rev 92).
IRQ 9.
Master Capable. Latency=64. Min Gnt=8.
Non-prefetchable 32 bit memory at 0xfd000000 [0xfdffffff].
I/O at 0xfc00 [0xfcff].
Non-prefetchable 32 bit memory at 0xfcfff000 [0xfcffffff].
Bus 2, device 9, function 0:
Unknown mass storage controller: Promise Technology, Inc. 20262 (rev
1).
IRQ 9.
Master Capable. Latency=64.
I/O at 0xecf8 [0xecff].
I/O at 0xecf0 [0xecf3].
I/O at 0xece0 [0xece7].
I/O at 0xecd8 [0xecdb].
I/O at 0xec80 [0xecbf].
Non-prefetchable 32 bit memory at 0xfafe0000 [0xfaffffff].


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: james(at)unifiedmind(dot)com (James Thornton)
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or directory
Date: 2002-06-17 14:16:48
Message-ID: 18747.1024323408@sss.pgh.pa.us

james(at)unifiedmind(dot)com (James Thornton) writes:
> What does this mean, and what could be causing it?
> FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or
> directory
> That's the second time in as many months that I have received this
> error when trying to start postmaster after a crash -- both times a
> server reboot remedied the issue.

That really should be impossible --- it says that a rename() failed for
a file we just created.

I judge from the spelling of the error message that you are running 7.1.
I would recommend an update to 7.2, wherein the error message looks
more like this:

    if (rename(tmppath, path) < 0)
        elog(STOP, "rename from %s to %s (initialization of log file %u, segment %u) failed: %m",
             tmppath, path, log, seg);

(Alternatively, you could just edit the message in your existing sources
to include the actual source and destination pathnames given to rename()
--- it's in src/backend/access/transam/xlog.c, line 1396 in 7.1.3.)

That will allow us to eliminate the faint possibility that the code is
somehow miscomputing the pathnames occasionally.
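
A minimal standalone sketch of that kind of reporting, written as plain C
rather than as PostgreSQL source. The helper name rename_with_report and
its buffer-based interface are assumptions made for illustration; the
actual 7.2 code reports through elog() as quoted above.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): rename tmppath to path.
 * On failure, write a message naming both paths and the errno text
 * into msg, and return the errno value.  On success, set msg to the
 * empty string and return 0.
 */
int
rename_with_report(const char *tmppath, const char *path,
                   char *msg, size_t msglen)
{
    if (rename(tmppath, path) < 0)
    {
        /* Save errno before any later call can overwrite it. */
        int save_errno = errno;

        snprintf(msg, msglen, "rename from %s to %s failed: %s",
                 tmppath, path, strerror(save_errno));
        return save_errno;
    }

    if (msglen > 0)
        msg[0] = '\0';
    return 0;
}
```

With both pathnames in the message, a miscomputed source or destination
would be visible directly in the log instead of having to be inferred
from the log file and segment numbers.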

However, given that you state a system reboot is necessary and
sufficient to make the problem go away, I am going to stick my neck
*way* out and suggest that:

1. You have the $PGDATA directory (or at least its pg_xlog subdirectory)
mounted via NFS.

2. This is an NFS problem.

In my book, no adequately-paranoid DBA will trust his database to NFS.
There are some cautionary tales in our mailing list archives...

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: James Thornton <thornton(at)cs(dot)ecs(dot)baylor(dot)edu>
Cc: James Thornton <james(at)unifiedmind(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or directory
Date: 2002-06-17 14:55:20
Message-ID: 19423.1024325720@sss.pgh.pa.us

James Thornton <thornton(at)cs(dot)ecs(dot)baylor(dot)edu> writes:
> I am not running NFS on this system.

Oh well, scratch that theory. Perhaps you should tell us what you *are*
running --- what OS, what hardware? I still believe that this must be
a system-level bug and not directly Postgres' fault.

regards, tom lane


From: nield(at)usol(dot)com
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, james(at)unifiedmind(dot)com
Subject: Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or directory
Date: 2002-06-17 18:36:36
Message-ID: C04ZRM5ZC0GD95GAFASN04C7OIPZTYW.3d0e2c34@qw_winnt

6/17/02 10:16:48 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>james(at)unifiedmind(dot)com (James Thornton) writes:
>> What does this mean, and what could be causing it?
>> FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or
>> directory
>> That's the second time in as many months that I have received this
>> error when trying to start postmaster after a crash -- both times a
>> server reboot remedied the issue.
>
>That really should be impossible --- it says that a rename() failed for
>a file we just created.
>
>I judge from the spelling of the error message that you are running 7.1.
>I would recommend an update to 7.2, wherein the error message looks
>more like this:
>
>    if (rename(tmppath, path) < 0)
>        elog(STOP, "rename from %s to %s (initialization of log file %u, segment %u) failed: %m",
>             tmppath, path, log, seg);
>
[snip]

From the xlog.c file in 7.3devel in InstallXLogFileSegment(), look at the
code near:

>    while ((fd = BasicOpenFile(path, O_RDWR | PG_BINARY,
>                               S_IRUSR | S_IWUSR)) >= 0)

It would seem like we assume that ANY failure of BasicOpenFile() implies
that 'path' does not exist. So then we don't handle any other cases, and
rename might fail because 'path' actually exists.

What if BasicOpenFile() got some other error?

This would seem to be wrong, but it still doesn't explain why
BasicOpenFile() would be failing when 'path' exists in this
particular case.
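
A rough sketch of the distinction being asked about, using plain open()
instead of BasicOpenFile(). The names PathProbe and probe_path are
hypothetical and are not taken from xlog.c; this only illustrates
separating "file is missing" from "open failed for another reason".

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical result type: what an open() attempt tells us about path. */
typedef enum
{
    PATH_EXISTS,        /* open() succeeded */
    PATH_MISSING,       /* open() failed with ENOENT */
    PATH_OPEN_ERROR     /* open() failed with some other errno */
} PathProbe;

/*
 * Try to open path read/write and classify the outcome.  Only ENOENT
 * is treated as "the file does not exist"; any other failure is
 * reported separately instead of being folded into that case.
 */
PathProbe
probe_path(const char *path)
{
    int fd = open(path, O_RDWR);

    if (fd >= 0)
    {
        close(fd);
        return PATH_EXISTS;
    }
    return (errno == ENOENT) ? PATH_MISSING : PATH_OPEN_ERROR;
}
```

For example, opening a directory read/write fails with EISDIR rather
than ENOENT, so probe_path("/") lands in PATH_OPEN_ERROR even though
the path exists.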

I don't have the 7.1 or 7.2 code around, and I've never looked at it.

J.R. Nield
nield(at)usol(dot)com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: nield(at)usol(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org, james(at)unifiedmind(dot)com
Subject: Re: FATAL 2: InitRelink(logfile 0 seg 173) failed: No such file or directory
Date: 2002-06-18 14:45:12
Message-ID: 26245.1024411512@sss.pgh.pa.us

nield(at)usol(dot)com writes:
> What if BasicOpenFile() got some other error?

Doesn't really matter; anything else would be a problem we can't recover
from anyhow. Besides, given that rename is failing with ENOENT, a
conflict on the destination name does not appear to be the issue.
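
The reasoning about ENOENT can be checked with a small experiment. This
is a sketch assuming POSIX rename() semantics; the helper names
make_empty_file and try_rename are made up for the illustration.

```c
#include <errno.h>
#include <stdio.h>

/* Create (or truncate) an empty file; returns 0 on success, -1 on failure. */
int
make_empty_file(const char *path)
{
    FILE *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    return (fclose(f) == 0) ? 0 : -1;
}

/* Run rename() and return 0 on success, or the errno value on failure. */
int
try_rename(const char *from, const char *to)
{
    return (rename(from, to) < 0) ? errno : 0;
}
```

If files a and b both exist, try_rename(a, b) returns 0 on a POSIX
system because the existing destination is simply replaced. Calling it
a second time returns ENOENT, since the source a is gone. An ENOENT from
rename() therefore points at a missing source or a missing directory in
one of the paths, not at a destination that already exists.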

regards, tom lane