BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.

From: "Luke Koops" <luke(dot)koops(at)entrust(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #5038: WAL file is pending deletion in pg_xlog folder, this interferes with WAL archiving.
Date: 2009-09-05 03:52:29
Message-ID: 200909050352.n853qTEH071667@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 5038
Logged by: Luke Koops
Email address: luke(dot)koops(at)entrust(dot)com
PostgreSQL version: 8.3.7
Operating system: Windows 2003 Server Enterprise Edition
Description: WAL file is pending deletion in pg_xlog folder, this
interferes with WAL archiving.
Details:

On my system, one of the WAL files is pending deletion. The handle is being
held by one of the postgres backend processes, but that is another potential
bug.

At first, the unlink worked, and the .ready and .done files were deleted.
But the WAL file still shows up in the pg_xlog directory listing.

Note: the WAL file did get archived properly. There was no error reported
at the time.

When it comes time to recycle the log files, RemoveOldXLogFiles() calls
ReadDir() to get the list of files, then it calls XLogArchiveCheckDone()
which, if it cannot find a .done or a .ready file, calls
XLogArchiveNotify(). XLogArchiveNotify() creates the .ready file again.
This causes the archiver to run the archive command on the old WAL file
that is pending deletion. The copy command fails, and every subsequent
archive attempt retries the same old WAL file.
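
The sequence above can be sketched as a small toy model. This is only an illustration of the behaviour described in this report, not the PostgreSQL source: the WalEntry struct and the model_* functions are hypothetical names, and the real functions do more than is shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of one pg_xlog entry; every field is illustrative only. */
typedef struct
{
    bool listed;          /* name still appears in the directory listing */
    bool pending_delete;  /* unlink succeeded, but a handle is still open */
    bool has_ready;       /* <name>.ready status file exists */
    bool has_done;        /* <name>.done status file exists */
} WalEntry;

/*
 * Models the described behaviour of XLogArchiveCheckDone(): with neither
 * status file present, the .ready file is created again (the
 * XLogArchiveNotify() step) and the segment is reported as not archived.
 */
static bool
model_archive_check_done(WalEntry *e)
{
    assert(e != NULL);

    if (e->has_done)
        return true;
    if (e->has_ready)
        return false;

    e->has_ready = true;  /* .ready re-created for an already archived file */
    return false;
}

/* Models the archive command: copying a pending-delete file fails. */
static bool
model_archive_command(const WalEntry *e)
{
    return !e->pending_delete;
}

/*
 * One recycle pass followed by one archiver attempt.
 * Returns true if the archiver is left stuck on this entry.
 */
static bool
model_recycle_pass(WalEntry *e)
{
    if (!e->listed)
        return false;     /* nothing to look at */
    if (model_archive_check_done(e))
        return false;     /* already archived, safe to recycle */

    if (model_archive_command(e))
    {
        e->has_ready = false;
        e->has_done = true;
        return false;     /* archived normally */
    }
    return true;          /* .ready remains, so the archiver retries forever */
}
```

With listed and pending_delete set and no status files, every call to model_recycle_pass() returns true and leaves has_ready set, which corresponds to the stuck archiver described above.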

At this point, none of the WAL files will get shipped and the pg_xlog folder
will start filling up.

Before calling XLogArchiveCheckDone(), RemoveOldXLogFiles() makes a number
of tests to make sure the name is for a legitimate XLOG. This would be a
good time to make sure the file is real, not pending deletion. That would
prevent the creation of the .ready file and WAL archiving would continue to
work.

It might be a good idea to log something at the DEBUG level if a directory
entry is encountered that matches the naming conventions but is not a real
file.
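
A minimal sketch of the kind of check suggested above, assuming that stat() fails on a directory entry that is pending deletion (I have not verified that this holds on Windows). The helper name wal_entry_is_real_file() is hypothetical and is not an existing PostgreSQL function; the fprintf() call stands in for a DEBUG-level log message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Some Windows toolchains do not provide S_ISREG. */
#ifndef S_ISREG
#define S_ISREG(m) (((m) & S_IFMT) == S_IFREG)
#endif

/*
 * Hypothetical helper: returns true only if `path` can be stat()ed and
 * refers to a regular file. Otherwise it writes a debug-style note to
 * stderr and returns false, so the caller can skip the entry instead of
 * re-creating its .ready file.
 */
static bool
wal_entry_is_real_file(const char *path)
{
    struct stat st;

    assert(path != NULL);

    if (stat(path, &st) != 0)
    {
        /* Entry is listed, but it cannot be examined. */
        fprintf(stderr, "DEBUG: skipping \"%s\": cannot stat file\n", path);
        return false;
    }

    if (!S_ISREG(st.st_mode))
    {
        fprintf(stderr, "DEBUG: skipping \"%s\": not a regular file\n", path);
        return false;
    }

    return true;
}
```

RemoveOldXLogFiles() could call something like this after the existing name checks and before XLogArchiveCheckDone(), skipping any entry for which it returns false.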

You could probably reproduce this behaviour by changing the permissions on a
WAL file, although you wouldn't be able to test a fix in the same way.

I have not reliably reproduced the WAL file handle "leak" in the postgres
back end. I believe it may be related to statements timing out. My system
currently has statement_timeout=1min, but that will be removed. I will
report the "leak" when I have a better handle (no pun) on the situation.

-Luke
