Re: PANIC: rename from /data/pg_xlog/0000002200000009

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ybaykshtis(at)micropat(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PANIC: rename from /data/pg_xlog/0000002200000009
Date: 2003-11-21 04:47:37
Message-ID: 9108.1069390057@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Yurgis Baykshtis" <ybaykshtis(at)micropat(dot)com> writes:
> The most interesting thing is that rename failure is always followed by
> IpcMemoryCreate and vice-versa, IpcMemoryCreate always comes after the
> rename error.

That is not "interesting" ... it's exactly what I'd expect for the panic
recovery sequence (given that SHMMAX is preventing creation of a second
shared-memory segment).

>> What filesystem are you using, and what is the platform exactly?

> DBExperts 7.3.4 on Win2000 (so it's a cygwin-based system)

Perhaps you need to get a real operating system :-(. No such failure
mode has been reported on any Unix variant, AFAIR.

It's hard to be certain what's happening from the after-the-fact
evidence you've offered. I'd like to see what is in pg_xlog immediately
after the crash, *before* Postgres is restarted. I get the feeling that
what we will see is the destination filename already present and the
source not, which would suggest that two backends tried to do the rename
concurrently. AFAICS that must mean that the operating system's lock
support is broken, because we do not try to rename WAL segments except
while holding the CheckpointLock, not to mention the ControlFileLock.

This is not necessarily Windows' fault, it could be a cygwin or cygipc
bug ... are you up to date on those?

regards, tom lane


From: "Yurgis Baykshtis" <ybaykshtis(at)micropat(dot)com>
To: "'Tom Lane'" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PANIC: rename from /data/pg_xlog/0000002200000009
Date: 2003-11-25 23:41:29
Message-ID: 001b01c3b3ad$a700afa0$a5936e3f@aurigin.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I get the feeling that what we will see is the destination
> filename already present and the source not, which would suggest
> that two backends tried to do the rename concurrently.

Tom,

I just noticed that the rename panic errors like this one:

PANIC: rename from /data/pg_xlog/000000030000001F to
/data/pg_xlog/000000030000002C (initialization of log file 3, segment 44)
failed: No such file or directory

come shortly AFTER the following messages

LOG: recycled transaction log file 000000030000001B
LOG: recycled transaction log file 000000030000001C
LOG: recycled transaction log file 000000030000001D
LOG: recycled transaction log file 000000030000001E
LOG: removing transaction log file 000000030000001F
LOG: removing transaction log file 0000000300000020
LOG: removing transaction log file 0000000300000021
LOG: removing transaction log file 0000000300000022

So, you can see that 000000030000001F file was previously deleted by the
logic in MoveOfflineLogs() function.
Now what I can see is that MoveOfflineLogs() does not seem to be
synchronized between backends.
The xlog directory reading loop is not synchronized itself and the caller
code is not synchronized either:

CreateCheckPoint(bool shutdown, bool force)
...
LWLockRelease(ControlFileLock);

/*
* We are now done with critical updates; no need for system panic if
* we have trouble while fooling with offline log segments.
*/
END_CRIT_SECTION();

/*
* Delete offline log files (those no longer needed even for previous
* checkpoint).
*/
if (_logId || _logSeg)
{
PrevLogSeg(_logId, _logSeg);
MoveOfflineLogs(_logId, _logSeg, recptr);
}
...

So is it possible that due to the lack of synchronization, two backends call
MoveOfflineLogs() simultaneously?
For example, first backend has unlinked the log segment file and then the
second one tries to rename the same file because it was returned by
readdir() function before it got deleted by the first beckend.

However, scenario seems to be hard to materialize since it must happen in a
very short timeframe.
The "remove" and "rename" log messages look separated in time.
Also, we have a suspicion that the problem happens even with only one client
connected to postgres.

Thanks


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ybaykshtis(at)micropat(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PANIC: rename from /data/pg_xlog/0000002200000009
Date: 2003-11-26 00:22:38
Message-ID: 9728.1069806158@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Yurgis Baykshtis" <ybaykshtis(at)micropat(dot)com> writes:
> I just noticed that the rename panic errors like this one:

> PANIC: rename from /data/pg_xlog/000000030000001F to
> /data/pg_xlog/000000030000002C (initialization of log file 3, segment 44)
> failed: No such file or directory

> come shortly AFTER the following messages

> LOG: recycled transaction log file 000000030000001B
> LOG: recycled transaction log file 000000030000001C
> LOG: recycled transaction log file 000000030000001D
> LOG: recycled transaction log file 000000030000001E
> LOG: removing transaction log file 000000030000001F
> LOG: removing transaction log file 0000000300000020
> LOG: removing transaction log file 0000000300000021
> LOG: removing transaction log file 0000000300000022

> So, you can see that 000000030000001F file was previously deleted by the
> logic in MoveOfflineLogs() function.

Interesting ...

> Now what I can see is that MoveOfflineLogs() does not seem to be
> synchronized between backends.

It's certainly supposed to be, because the only place it is called from
holds the CheckPointLock while it's doing it. If more than one backend
is able to run MoveOfflineLogs at a time, then the LWLock code is simply
broken. That seems unlikely, as just about nothing would work reliably
if LWLock failed to lock out concurrent operations.

What I suspect at this point is a cygwin bug: somehow, its
implementation of readdir() is able to retrieve a stale view of a
directory. I'd suggest pinging the cygwin developers to see if that
idea strikes a chord or not.

[ thinks for a bit... ] It might be that it isn't even a stale-data
issue, but that readdir() misbehaves if there are concurrent insert,
rename or delete operations carried out in the same directory. (The
renames or deletes would be coming from MoveOfflineLogs itself, the
inserts, if any, from concurrent backends finding that they need more
WAL space.) Again I would call that a cygwin bug, as we've not seen
reports of comparable behavior anywhere else.

> Also, we have a suspicion that the problem happens even with only one client
> connected to postgres.

Unless the clients are issuing explicit CHECKPOINT operations, that
wouldn't matter, because MoveOfflineLogs is called only from
checkpointing, and the postmaster never creates more than one background
checkpoint process at a time. (So there are actually two levels of
protection in place against concurrent execution of this code.)

regards, tom lane