Re: Cleaning up unreferenced table files

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: Cleaning up unreferenced table files
Date: 2005-05-02 18:30:03
Message-ID: 200505021830.j42IU3l21583@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches


Applied.

---------------------------------------------------------------------------

pgman wrote:
> Tom Lane wrote:
> > Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> > > On Tue, 26 Apr 2005, Tom Lane wrote:
> > >> Heikki Linnakangas <hlinnaka(at)iki(dot)fi> writes:
> > >>> I feel that crashes that leaves behind stale files are rare.
> > >>
> > >> Indeed, and getting more so all the time ...
> >
> > > How so? Have changes been made in those parts of the code?
> >
> > No, just that the overall reliability of Postgres keeps improving.
> >
> > At the time that TODO entry was created, I don't think we even had the
> > ability to roll back table creates/drops properly, so there were
> > scenarios in which unreferenced files could be left behind without even
> > assuming any software error. And the prevalence of backend crashes was
> > way higher than it is now, too. So I think a good argument could be
> > made that the TODO item isn't nearly as important as it was at the time.
>
> Agreed, but I don't think we are ever going to have the file system obey
> the same semantics as the database, so there will always be corner cases
> where we have the possibility for unreferenced files.
>
> > > If nobody ever runs into this issue in production, and this whole exercise
> > > turns out to be completely unnecessary, at least we'll know. That alone
> > > makes me feel better.
> >
> > We will know no such thing, unless the patch is made to announce the
> > problem so intrusively that people are certain to see it *and report it
> > to us*. Which I don't think will be acceptable.
>
> You are right that checking only after a server crash isn't going to
> help us very much, because the odds someone is going to look at the logs
> just at that moment are slim, and if they stop and start the server
> again, the message will not appear so they will think it is fixed.
>
> Though the problem is caused by a server crash, it remains even after a
> clean startup. My new version of the patch checks not just after a
> server crash, but every time the server starts up. Here are some
> timings:
>
> 3dbs without patch 0m0.230s 300 files
> 3dbs with patch 0m0.230s 300 files
> 100dbs with patch 0m0.270s 10000 files
>
> (The timing test started the server and waited for the message "database
> system is ready".)
>
> You can see that for three empty databases (only system tables/indexes),
> the patch has no impact. For 100 databases, or 10,000 files, there is a
> 17% increase in startup time, which seems acceptable.
>
> The message printed in the logs is:
>
> LOG: The table or index file "/u/pgsql/data/base/1/100001" is stale and can be safely removed
>
> I made a few changes to the patch. I changed 'clean' to 'check' because
> I don't think we are ever going to remove the files on startup (If we
> did we could do the cleanup just on crash recovery startup.) I changed
> the message from NOTICE to LOG, I made it run on all startups and not
> just crash recovery, and it was missing FreeDir() calls.
>
> --
> Bruce Momjian | http://candle.pha.pa.us
> pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
> + If your life is a hard drive, | 13 Roberts Road
> + Christ can be your backup. | Newtown Square, Pennsylvania 19073

> Index: doc/src/sgml/maintenance.sgml
> ===================================================================
> RCS file: /cvsroot/pgsql/doc/src/sgml/maintenance.sgml,v
> retrieving revision 1.41
> diff -c -c -r1.41 maintenance.sgml
> *** doc/src/sgml/maintenance.sgml 20 Feb 2005 02:21:26 -0000 1.41
> --- doc/src/sgml/maintenance.sgml 1 May 2005 04:30:48 -0000
> ***************
> *** 474,479 ****
> --- 474,496 ----
> </para>
> </sect1>
>
> + <sect1 id="check-files-after-crash">
> + <title>Check files after crash</title>
> +
> + <indexterm zone="check-files-after-crash">
> + <primary>stale file</primary>
> + </indexterm>
> +
> + <para>
> + <productname>PostgreSQL</productname> recovers automatically after crash
> + using the write-ahead log (see <xref linkend="wal">) and no manual
> + operations are normally needed. However, if there was a transaction running
> + when the crash occured that created or dropped a relation, the
> + transaction might have left a stale file in the data directory. If this
> + happens, you will get a notice in the log file stating which files can be
> + deleted.
> + </para>
> + </sect1>
>
> <sect1 id="logfile-maintenance">
> <title>Log File Maintenance</title>
> Index: src/backend/access/transam/xlog.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/access/transam/xlog.c,v
> retrieving revision 1.189
> diff -c -c -r1.189 xlog.c
> *** src/backend/access/transam/xlog.c 28 Apr 2005 21:47:10 -0000 1.189
> --- src/backend/access/transam/xlog.c 1 May 2005 04:30:52 -0000
> ***************
> *** 43,48 ****
> --- 43,49 ----
> #include "utils/builtins.h"
> #include "utils/guc.h"
> #include "utils/relcache.h"
> + #include "utils/flatfiles.h"
>
>
> /*
> ***************
> *** 4525,4530 ****
> --- 4526,4533 ----
>
> CreateCheckPoint(true, true);
>
> + CheckStaleRelFiles();
> +
> /*
> * Close down recovery environment
> */
> ***************
> *** 4536,4541 ****
> --- 4539,4550 ----
> */
> remove_backup_label();
> }
> + else
> + {
> + XLogInitRelationCache();
> + CheckStaleRelFiles();
> + XLogCloseRelationCache();
> + }
>
> /*
> * Preallocate additional log files, if wanted.
> Index: src/backend/catalog/catalog.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/catalog/catalog.c,v
> retrieving revision 1.59
> diff -c -c -r1.59 catalog.c
> *** src/backend/catalog/catalog.c 14 Apr 2005 20:03:23 -0000 1.59
> --- src/backend/catalog/catalog.c 1 May 2005 04:30:52 -0000
> ***************
> *** 106,111 ****
> --- 106,144 ----
> return path;
> }
>
> + /*
> + * GetTablespacePath - construct path to a tablespace symbolic link
> + *
> + * Result is a palloc'd string.
> + *
> + * XXX this must agree with relpath and GetDatabasePath!
> + */
> + char *
> + GetTablespacePath(Oid spcNode)
> + {
> + int pathlen;
> + char *path;
> +
> + Assert(spcNode != GLOBALTABLESPACE_OID);
> +
> + if (spcNode == DEFAULTTABLESPACE_OID)
> + {
> + /* The default tablespace is {datadir}/base */
> + pathlen = strlen(DataDir) + 5 + 1;
> + path = (char *) palloc(pathlen);
> + snprintf(path, pathlen, "%s/base",
> + DataDir);
> + }
> + else
> + {
> + /* All other tablespaces have symlinks in pg_tblspc */
> + pathlen = strlen(DataDir) + 11 + OIDCHARS + 1;
> + path = (char *) palloc(pathlen);
> + snprintf(path, pathlen, "%s/pg_tblspc/%u",
> + DataDir, spcNode);
> + }
> + return path;
> + }
>
> /*
> * IsSystemRelation
> Index: src/backend/commands/tablespace.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/commands/tablespace.c,v
> retrieving revision 1.17
> diff -c -c -r1.17 tablespace.c
> *** src/backend/commands/tablespace.c 14 Apr 2005 20:03:24 -0000 1.17
> --- src/backend/commands/tablespace.c 1 May 2005 04:30:53 -0000
> ***************
> *** 341,348 ****
> /*
> * All seems well, create the symlink
> */
> ! linkloc = (char *) palloc(strlen(DataDir) + 11 + 10 + 1);
> ! sprintf(linkloc, "%s/pg_tblspc/%u", DataDir, tablespaceoid);
>
> if (symlink(location, linkloc) < 0)
> ereport(ERROR,
> --- 341,347 ----
> /*
> * All seems well, create the symlink
> */
> ! linkloc = GetTablespacePath(tablespaceoid);
>
> if (symlink(location, linkloc) < 0)
> ereport(ERROR,
> ***************
> *** 495,502 ****
> char *subfile;
> struct stat st;
>
> ! location = (char *) palloc(strlen(DataDir) + 11 + 10 + 1);
> ! sprintf(location, "%s/pg_tblspc/%u", DataDir, tablespaceoid);
>
> /*
> * Check if the tablespace still contains any files. We try to rmdir
> --- 494,500 ----
> char *subfile;
> struct stat st;
>
> ! location = GetTablespacePath(tablespaceoid);
>
> /*
> * Check if the tablespace still contains any files. We try to rmdir
> ***************
> *** 1036,1043 ****
> set_short_version(location);
>
> /* Create the symlink if not already present */
> ! linkloc = (char *) palloc(strlen(DataDir) + 11 + 10 + 1);
> ! sprintf(linkloc, "%s/pg_tblspc/%u", DataDir, xlrec->ts_id);
>
> if (symlink(location, linkloc) < 0)
> {
> --- 1034,1040 ----
> set_short_version(location);
>
> /* Create the symlink if not already present */
> ! linkloc = GetTablespacePath(xlrec->ts_id);
>
> if (symlink(location, linkloc) < 0)
> {
> Index: src/backend/utils/adt/misc.c
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/utils/adt/misc.c,v
> retrieving revision 1.40
> diff -c -c -r1.40 misc.c
> *** src/backend/utils/adt/misc.c 31 Dec 2004 22:01:22 -0000 1.40
> --- src/backend/utils/adt/misc.c 1 May 2005 04:30:53 -0000
> ***************
> *** 26,31 ****
> --- 26,32 ----
> #include "funcapi.h"
> #include "catalog/pg_type.h"
> #include "catalog/pg_tablespace.h"
> + #include "catalog/catalog.h"
>
> #define atooid(x) ((Oid) strtoul((x), NULL, 10))
>
> ***************
> *** 144,154 ****
>
> fctx = palloc(sizeof(ts_db_fctx));
>
> - /*
> - * size = path length + tablespace dirname length + 2 dir sep
> - * chars + oid + terminator
> - */
> - fctx->location = (char *) palloc(strlen(DataDir) + 11 + 10 + 1);
> if (tablespaceOid == GLOBALTABLESPACE_OID)
> {
> fctx->dirdesc = NULL;
> --- 145,150 ----
> ***************
> *** 157,168 ****
> }
> else
> {
> ! if (tablespaceOid == DEFAULTTABLESPACE_OID)
> ! sprintf(fctx->location, "%s/base", DataDir);
> ! else
> ! sprintf(fctx->location, "%s/pg_tblspc/%u", DataDir,
> ! tablespaceOid);
> !
> fctx->dirdesc = AllocateDir(fctx->location);
>
> if (!fctx->dirdesc)
> --- 153,159 ----
> }
> else
> {
> ! fctx->location = GetTablespacePath(tablespaceOid);
> fctx->dirdesc = AllocateDir(fctx->location);
>
> if (!fctx->dirdesc)
> Index: src/backend/utils/init/Makefile
> ===================================================================
> RCS file: /cvsroot/pgsql/src/backend/utils/init/Makefile,v
> retrieving revision 1.18
> diff -c -c -r1.18 Makefile
> *** src/backend/utils/init/Makefile 20 Feb 2005 02:22:00 -0000 1.18
> --- src/backend/utils/init/Makefile 1 May 2005 04:30:53 -0000
> ***************
> *** 12,18 ****
> top_builddir = ../../../..
> include $(top_builddir)/src/Makefile.global
>
> ! OBJS = flatfiles.o globals.o miscinit.o postinit.o
>
> all: SUBSYS.o
>
> --- 12,18 ----
> top_builddir = ../../../..
> include $(top_builddir)/src/Makefile.global
>
> ! OBJS = flatfiles.o globals.o miscinit.o postinit.o checkfiles.o
>
> all: SUBSYS.o
>
> Index: src/include/catalog/catalog.h
> ===================================================================
> RCS file: /cvsroot/pgsql/src/include/catalog/catalog.h,v
> retrieving revision 1.30
> diff -c -c -r1.30 catalog.h
> *** src/include/catalog/catalog.h 31 Dec 2004 22:03:24 -0000 1.30
> --- src/include/catalog/catalog.h 1 May 2005 04:30:54 -0000
> ***************
> *** 19,24 ****
> --- 19,25 ----
>
> extern char *relpath(RelFileNode rnode);
> extern char *GetDatabasePath(Oid dbNode, Oid spcNode);
> + extern char *GetTablespacePath(Oid spcNode);
>
> extern bool IsSystemRelation(Relation relation);
> extern bool IsToastRelation(Relation relation);
> Index: src/include/utils/flatfiles.h
> ===================================================================
> RCS file: /cvsroot/pgsql/src/include/utils/flatfiles.h,v
> retrieving revision 1.1
> diff -c -c -r1.1 flatfiles.h
> *** src/include/utils/flatfiles.h 20 Feb 2005 02:22:07 -0000 1.1
> --- src/include/utils/flatfiles.h 1 May 2005 04:30:54 -0000
> ***************
> *** 30,33 ****
> --- 30,36 ----
>
> extern Datum flatfile_update_trigger(PG_FUNCTION_ARGS);
>
> + /* from checkfiles.c */
> + extern void CheckStaleRelFiles(void);
> +
> #endif /* FLATFILES_H */

> /*-------------------------------------------------------------------------
> *
> * checkfiles.c
> * support to clean up stale relation files on crash recovery
> *
> * If a backend crashes while in a transaction that has created or
> * deleted a relfilenode, a stale file can be left over in the data
> * directory. This file contains routines to clean up those stale
> * files on recovery.
> *
> * $PostgreSQL$
> *
> *-------------------------------------------------------------------------
> */
> #include "postgres.h"
>
> #include "storage/fd.h"
>
> #include "utils/flatfiles.h"
> #include "miscadmin.h"
> #include "catalog/pg_tablespace.h"
> #include "catalog/catalog.h"
> #include "access/skey.h"
> #include "utils/fmgroids.h"
> #include "access/relscan.h"
> #include "access/heapam.h"
> #include "utils/resowner.h"
>
> static void CheckStaleRelFilesFrom(Oid tablespaceoid, Oid dboid);
> static void CheckStaleRelFilesFromTablespace(Oid tablespaceoid);
>
> /* Like AllocateDir, but ereports on failure */
> static DIR *
> AllocateDirChecked(char *path)
> {
> DIR *dirdesc = AllocateDir(path);
> if (dirdesc == NULL)
> ereport(ERROR,
> (errcode_for_file_access(),
> errmsg("could not open directory \"%s\": %m",
> path)));
> return dirdesc;
> }
>
> /*
> * Scan through all tablespaces for relations left over
> * by aborted transactions.
> *
> * For example, if a transaction issues
> * BEGIN; CREATE TABLE foobar ();
> * and then the backend crashes, the file is left in the
> * tablespace until CheckStaleRelFiles deletes it.
> */
> void
> CheckStaleRelFiles(void)
> {
> DIR *dirdesc;
> struct dirent *de;
> char *path;
> int pathlen;
>
> pathlen = strlen(DataDir) + 11 + 1;
> path = (char *) palloc(pathlen);
> snprintf(path, pathlen, "%s/pg_tblspc/", DataDir);
> dirdesc = AllocateDirChecked(path);
> while ((de = readdir(dirdesc)) != NULL)
> {
> char *invalid;
> Oid tablespaceoid;
>
> /* Check that the directory name looks like valid tablespace link. */
> tablespaceoid = (Oid) strtol(de->d_name, &invalid, 10);
> if(invalid[0] == '\0')
> CheckStaleRelFilesFromTablespace(tablespaceoid);
> }
> FreeDir(dirdesc);
> pfree(path);
>
> CheckStaleRelFilesFromTablespace(DEFAULTTABLESPACE_OID);
> }
>
> /* Scan a specific tablespace for stale relations */
> static void
> CheckStaleRelFilesFromTablespace(Oid tablespaceoid)
> {
> DIR *dirdesc;
> struct dirent *de;
> char *path;
>
> path = GetTablespacePath(tablespaceoid);
>
> dirdesc = AllocateDirChecked(path);
> while ((de = readdir(dirdesc)) != NULL)
> {
> char *invalid;
> Oid dboid;
>
> dboid = (Oid) strtol(de->d_name, &invalid, 10);
> if(invalid[0] == '\0')
> CheckStaleRelFilesFrom(tablespaceoid, dboid);
> }
> FreeDir(dirdesc);
> pfree(path);
> }
>
> /* Scan a specific database in a specific tablespace for stale relations.
> *
> * First, pg_class for the database is opened, and the relfilenodes of all
> * relations mentioned there are stored in a hash table.
> *
> * Then the directory is scanned. Every file in the directory that's not
> * found in pg_class (the hash table) is logged.
> */
> static void
> CheckStaleRelFilesFrom(Oid tablespaceoid, Oid dboid)
> {
> DIR *dirdesc;
> struct dirent *de;
> HASHCTL hashctl;
> HTAB *relfilenodeHash;
> MemoryContext mcxt;
> RelFileNode rnode;
> char *path;
>
> /* We create a private memory context so that we can easily deallocate
> * the hash table and its contents
> */
> mcxt = AllocSetContextCreate(TopMemoryContext, "CheckStaleRelFiles",
> ALLOCSET_DEFAULT_MINSIZE,
> ALLOCSET_DEFAULT_INITSIZE,
> ALLOCSET_DEFAULT_MAXSIZE);
>
> hashctl.hash = tag_hash;
> /* The entry contents is not used for anything, we just check
> * if an oid is in the hash table or not. */
> hashctl.keysize = sizeof(Oid);
> hashctl.entrysize = 1;
> hashctl.hcxt = mcxt;
> relfilenodeHash = hash_create("relfilenodeHash", 100, &hashctl,
> HASH_FUNCTION
> | HASH_ELEM | HASH_CONTEXT);
>
> /* Read all relfilenodes from pg_class into the hash table */
> {
> ResourceOwner owner, oldowner;
> Relation rel;
> HeapScanDesc scan;
> HeapTuple tuple;
>
> /* Need a resowner to keep the heapam and buffer code happy */
> owner = ResourceOwnerCreate(NULL, "CheckStaleRelFiles");
> oldowner = CurrentResourceOwner;
> CurrentResourceOwner = owner;
>
> rnode.spcNode = tablespaceoid;
> rnode.dbNode = dboid;
> rnode.relNode = RelationRelationId;
> rel = XLogOpenRelation(true, 0 , rnode);
>
> scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
> while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
> {
> Form_pg_class classform = (Form_pg_class) GETSTRUCT(tuple);
>
> hash_search(relfilenodeHash, &classform->relfilenode,
> HASH_ENTER, NULL);
> }
> heap_endscan(scan);
>
> XLogCloseRelation(rnode);
> CurrentResourceOwner = oldowner;
> ResourceOwnerDelete(owner);
> }
>
> /* Scan the directory */
> path = GetDatabasePath(dboid, tablespaceoid);
>
> dirdesc = AllocateDirChecked(path);
> while ((de = readdir(dirdesc)) != NULL)
> {
> char *invalid;
> Oid relfilenode;
>
> relfilenode = strtol(de->d_name, &invalid, 10);
> if(invalid[0] == '\0')
> {
> /* Filename was a valid number, check if pg_class knows
> about it */
> if(hash_search(relfilenodeHash, &relfilenode,
> HASH_FIND, NULL) == NULL)
> {
> char *filepath;
> rnode.spcNode = tablespaceoid;
> rnode.dbNode = dboid;
> rnode.relNode = relfilenode;
>
> filepath = relpath(rnode);
>
> ereport(LOG,
> (errcode_for_file_access(),
> errmsg("The table or index file \"%s\" is stale and can be safely removed",
> filepath)));
> pfree(filepath);
> }
> }
> }
> FreeDir(dirdesc);
> pfree(path);
> hash_destroy(relfilenodeHash);
> MemoryContextDelete(mcxt);
> }

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marc G. Fournier 2005-05-02 18:35:33 Re: [HACKERS] Decision Process WAS: Increased
Previous Message Andrew - Supernews 2005-05-02 18:26:40 Re: Feature freeze date for 8.1

Browse pgsql-patches by date

  From Date Subject
Next Message Hannu Krosing 2005-05-02 21:00:36 Re: Feature freeze date for 8.1
Previous Message Andrew - Supernews 2005-05-02 18:26:40 Re: Feature freeze date for 8.1