Race condition in pg_database_size()

Lists: pgsql-hackers
From: Michael Fuhr <mike(at)fuhr(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Race condition in pg_database_size()
Date: 2007-03-10 16:31:00
Message-ID: 20070310163100.GA91660@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I'm occasionally seeing calls to pg_database_size() fail with

ERROR: could not stat file "/var/lib/pgsql/data/base/16404/1738343": No such file or directory

So far I haven't noticed any other problems that might be related
to this error. This database frequently uses temporary tables so
I'm wondering if the error might be due to a race condition in
db_dir_size(), which does the following:

while ((direntry = ReadDir(dirdesc, path)) != NULL)
{
struct stat fst;

if (strcmp(direntry->d_name, ".") == 0 ||
strcmp(direntry->d_name, "..") == 0)
continue;

snprintf(filename, MAXPGPATH, "%s/%s", path, direntry->d_name);

if (stat(filename, &fst) < 0)
ereport(ERROR,
(errcode_for_file_access(),
errmsg("could not stat file \"%s\": %m", filename)));

dirsize += fst.st_size;
}

I'm wondering if the code should check for ENOENT if stat() fails
and either skip this entry silently under the assumption that the
file had been deleted since the call to ReadDir(), or issue a warning
without failing.

--
Michael Fuhr


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race condition in pg_database_size()
Date: 2007-03-10 17:32:04
Message-ID: 17038.1173547924@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Fuhr <mike(at)fuhr(dot)org> writes:
> I'm wondering if the code should check for ENOENT if stat() fails
> and either skip this entry silently under the assumption that the
> file had been deleted since the call to ReadDir(),

Probably. Want to look through the rest of that module for similar
problems?

regards, tom lane


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race condition in pg_database_size()
Date: 2007-03-10 22:36:43
Message-ID: 20070310223643.GA93317@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 10, 2007 at 12:32:04PM -0500, Tom Lane wrote:
> Michael Fuhr <mike(at)fuhr(dot)org> writes:
> > I'm wondering if the code should check for ENOENT if stat() fails
> > and either skip this entry silently under the assumption that the
> > file had been deleted since the call to ReadDir(),
>
> Probably. Want to look through the rest of that module for similar
> problems?

I think only db_dir_size() and calculate_tablespace_size() are
affected by this particular failure (ReadDir followed by stat).
I'll submit a patch -- any preferences for silent continuation vs.
continuation with a notice or warning?

--
Michael Fuhr


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race condition in pg_database_size()
Date: 2007-03-10 22:39:37
Message-ID: 5194.1173566377@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Fuhr <mike(at)fuhr(dot)org> writes:
> I'll submit a patch -- any preferences for silent continuation vs.
> continuation with a notice or warning?

I think silent is fine for ENOENT cases. We know the file had been
there at ReadDir time, so the only possible conclusion is that it was
just unlinked, and I see no reason to complain about that.

regards, tom lane


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race condition in pg_database_size()
Date: 2007-03-11 04:57:32
Message-ID: 20070311045732.GA95013@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Mar 10, 2007 at 05:39:37PM -0500, Tom Lane wrote:
> Michael Fuhr <mike(at)fuhr(dot)org> writes:
> > I'll submit a patch -- any preferences for silent continuation vs.
> > continuation with a notice or warning?
>
> I think silent is fine for ENOENT cases. We know the file had been
> there at ReadDir time, so the only possible conclusion is that it was
> just unlinked, and I see no reason to complain about that.

Patch submitted.

--
Michael Fuhr


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Michael Fuhr <mike(at)fuhr(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Race condition in pg_database_size()
Date: 2007-03-11 05:24:15
Message-ID: 20070311052415.GA10458@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Michael Fuhr wrote:
> On Sat, Mar 10, 2007 at 05:39:37PM -0500, Tom Lane wrote:
> > Michael Fuhr <mike(at)fuhr(dot)org> writes:
> > > I'll submit a patch -- any preferences for silent continuation vs.
> > > continuation with a notice or warning?
> >
> > I think silent is fine for ENOENT cases. We know the file had been
> > there at ReadDir time, so the only possible conclusion is that it was
> > just unlinked, and I see no reason to complain about that.
>
> Patch submitted.

Applied, thanks.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.