Disc space usage

Lists: pgsql-performance
From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Disc space usage
Date: 2008-10-08 11:55:52
Message-ID: alpine.DEB.1.10.0810081238380.15851@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


One of our build servers recently ran out of disc space while trying to
copy an entire database. This led me to investigate the database cluster,
which is stored on a RAID array with a total size of 1TB. Running a query
to list all databases and their sizes did not add up to the amount of
space being used by Postgres, so I had a look at the pgsql/base directory.
It appears that there are a few large directories that do not correspond
to any database. I wonder if these have been left behind accidentally by
Postgres.

Here are the database directories:

Size (kB) Directory Database

32 pgsql_tmp
4352 11510 template0
4368 1 template1
4464 11511 postgres
5368 30103627 xav-userprofile-test
6096 8088167 outerjoins-userprofile-12.0-copy
8676 30103406 xav-test
10052 31313164 common-tgt-items-kmr-modmine
19956 1108178 modmine-3-preview-18-feb-2008
89452 14578911 common-tgt-items-kmr
118940 9952565 production-xav-13
201192 1257481 common-tgt-items-gtocmine-rns
296552 7040137 common-tgt-items-flyminebuild
1557160 9843085
1699624 18456655 common-src-items-flyminebuild
3376096 278561
3995276 9064702 production-unimine-pride-beta5
8528136 1257482 gtocmine-rns
40815456 29233051
42278196 27473906
47112412 28110832
47913532 32728815 production-flyminebuild:ensembl-anopheles
60519524 32841289 production-flyminebuild:go
67626328 27377902
69513844 32856736 production-flyminebuild:flybase-dmel-gene-fasta
74289908 32938724 production-flyminebuild:pubmed-gene
75786720 32941684 production-flyminebuild:biogrid
77361800 32944072 production-flyminebuild:update-publications
80160256 32947141 production-flyminebuild:create-references
81333908 32574190 flybasemine-production
86356140 12110825
87544200 33049747 production-flyminebuild

So on this server, the wasted space takes up 276GB, which is not
acceptable. I believe that if we re-initialise the cluster and re-create
the databases, these directories would disappear. Taking a look at the
directory 12110825, all the files inside were last accessed several
months ago. So, I have a few questions:

1. Is this space used for anything, or is it just abandoned? Is this a
bug?
2. How do I reclaim this wasted space in a safe manner?
3. How do I prevent this happening again?

Matthew

--
An ant doesn't have a lot of processing power available to it. I'm not trying
to be speciesist - I wouldn't want to detract you from such a wonderful
creature, but, well, there isn't a lot there, is there?
-- Computer Science Lecturer


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Disc space usage
Date: 2008-10-08 13:16:50
Message-ID: 27956.1223471810@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Matthew Wakeling <matthew(at)flymine(dot)org> writes:
> One of our build servers recently ran out of disc space while trying to
> copy an entire database. This led me to investigate the database cluster,
> which is stored on a RAID array with a total size of 1TB. Running a query
> to list all databases and their sizes did not add up to the amount of
> space being used by Postgres, so I had a look at the pgsql/base directory.
> It appears that there are a few large directories that do not correspond
> to any database. I wonder if these have been left behind accidentally by
> Postgres.

Anything under $PGDATA/base that doesn't correspond to a live row in
pg_database is junk. The interesting question is how it got that way,
and in particular how you seem to have managed to have repeated
instances of it.

I gather that you're in the habit of using CREATE DATABASE to copy
large existing databases, so the most likely theory is that these are
leftovers from previous failed copy attempts. Now CREATE DATABASE
does attempt to clean up if its copying fails, but there are various
ways to break that, for instance hitting control-C partway through the
cleanup phase. So I'm wondering if maybe that's been done a few times.

What PG version is this, anyway?

regards, tom lane


From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Disc space usage
Date: 2008-10-08 13:46:49
Message-ID: alpine.DEB.1.10.0810081443260.15851@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, 8 Oct 2008, Tom Lane wrote:
>> It appears that there are a few large directories that do not correspond
>> to any database. I wonder if these have been left behind accidentally by
>> Postgres.
>
> Anything under $PGDATA/base that doesn't correspond to a live row in
> pg_database is junk.

So I can delete it? Might be safer to stop the db server while I do that
though.

> The interesting question is how it got that way, and in particular how
> you seem to have managed to have repeated instances of it.
>
> I gather that you're in the habit of using CREATE DATABASE to copy
> large existing databases, so the most likely theory is that these are
> leftovers from previous failed copy attempts. Now CREATE DATABASE
> does attempt to clean up if its copying fails, but there are various
> ways to break that, for instance hitting control-C partway through the
> cleanup phase. So I'm wondering if maybe that's been done a few times.

Yes, we do copy large databases quite often, and drop them again. The
database cluster was initialised back in March.

> What PG version is this, anyway?

Postgres 8.3.0

Matthew

--
Unfortunately, university regulations probably prohibit me from eating
small children in front of the lecture class.
-- Computer Science Lecturer


From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Disc space usage
Date: 2008-10-08 14:00:24
Message-ID: alpine.DEB.1.10.0810081455070.15851@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, 8 Oct 2008, Tom Lane wrote:
> The interesting question is how it got that way, and in particular how
> you seem to have managed to have repeated instances of it.

Speaking to some of my colleagues, sometimes the createdb process fails
with a very specific error message. If we wait five seconds and try again,
then it succeeds. So, maybe the duff directories are from those failures.

The error message is always something like this:

createdb: database creation failed: ERROR: could not stat file "base/32285287/32687035": No such file or directory

Just before running createdb, we always have some quite heavy write
traffic. Is it possible that the changes that we just wrote haven't been
checkpointed properly yet, resulting in some of those files being missing
from the template database, and therefore the createdb to fail?

Matthew

--
Now, you would have thought these coefficients would be integers, given that
we're working out integer results. Using a fraction would seem really
stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to
use complex numbers. -- Computer Science Lecturer


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Disc space usage
Date: 2008-10-08 14:01:05
Message-ID: 28838.1223474465@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Matthew Wakeling <matthew(at)flymine(dot)org> writes:
> On Wed, 8 Oct 2008, Tom Lane wrote:
>> Anything under $PGDATA/base that doesn't correspond to a live row in
>> pg_database is junk.

> So I can delete it? Might be safer to stop the db server while I do that
> though.

In principle, at least, you shouldn't need to --- there shouldn't be any
buffers representing such files.

>> What PG version is this, anyway?

> Postgres 8.3.0

You should consider an update to 8.3.4. A quick look in the post-8.3.0
CVS logs shows a couple of possibly relevant fixes:

2008-04-18 13:05 tgl

* src/: backend/commands/dbcommands.c, include/port.h,
port/dirmod.c (REL8_3_STABLE): Fix rmtree() so that it keeps going
after failure to remove any individual file; the idea is that we
should clean up as much as we can, even if there's some problem
removing one file. Make the error messages a bit less misleading,
too. In passing, const-ify function arguments.

2008-04-16 19:59 tgl

* src/: backend/access/nbtree/nbtree.c,
backend/access/nbtree/nbtutils.c, backend/access/transam/xlog.c,
backend/commands/dbcommands.c, backend/port/ipc_test.c,
backend/storage/ipc/ipc.c, include/access/nbtree.h,
include/storage/ipc.h, include/utils/elog.h (REL8_3_STABLE): Repair
two places where SIGTERM exit could leave shared memory state
corrupted. (Neither is very important if SIGTERM is used to shut
down the whole database cluster together, but there's a problem if
someone tries to SIGTERM individual backends.) To do this,
introduce new infrastructure macros
PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
of transiently pushing an on_shmem_exit cleanup hook. Also use
this method for createdb cleanup --- that wasn't a
shared-memory-corruption problem, but SIGTERM abort of createdb
could leave orphaned files lying around.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Disc space usage
Date: 2008-10-08 14:38:01
Message-ID: 29539.1223476681@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Matthew Wakeling <matthew(at)flymine(dot)org> writes:
> Speaking to some of my colleagues, sometimes the createdb process fails
> with a very specific error message. If we wait five seconds and try again,
> then it succeeds. So, maybe the duff directories are from those failures.

> The error message is always something like this:

> createdb: database creation failed: ERROR: could not stat file "base/32285287/32687035": No such file or directory

> Just before running createdb, we always have some quite heavy write
> traffic.

Hmm, would that include dropping tables in the database you are about to
copy? If so, this error is fairly readily explainable as a side effect
of the delayed dropping of physical files in recent PG versions.

(As noted in the manual, CREATE DATABASE isn't really intended as a COPY
DATABASE operation --- it is expecting the source database to be pretty
static. I think you could make this more reliable if you do a manual
checkpoint between modifying the source database and copying it.)

However, that still leaves me wondering why the leftover copied
directories stick around. If the copying step failed that way,
CREATE DATABASE *should* try to clean up the target tree before
exiting. And AFAICS it wouldn't even report the error until after
completing that cleanup. So there's still some piece of the puzzle
that's missing.

Do you have some specific examples of this error message at hand?
Can you try to confirm whether the reported path corresponds to
something in the CREATE's source database? If it's actually
complaining about a stat failure in the target tree, then there's
something else going on altogether. I don't see anything in that
path that would give this message, but I might be missing it.

regards, tom lane


From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Matthew Wakeling" <matthew(at)flymine(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Disc space usage
Date: 2008-10-08 15:14:37
Message-ID: dcc563d10810080814q627b53e0jb35d0caa27a1984@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Oct 8, 2008 at 8:00 AM, Matthew Wakeling <matthew(at)flymine(dot)org> wrote:
> The error message is always something like this:
>
> createdb: database creation failed: ERROR: could not stat file
> "base/32285287/32687035": No such file or directory

By any chance are you running on windows with virus protection
software on the server?


From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Disc space usage
Date: 2008-10-08 15:23:24
Message-ID: alpine.DEB.1.10.0810081621360.15851@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, 8 Oct 2008, Scott Marlowe wrote:
>> The error message is always something like this:
>>
>> createdb: database creation failed: ERROR: could not stat file
>> "base/32285287/32687035": No such file or directory
>
> By any chance are you running on windows with virus protection
> software on the server?

You insult me, sir! ;)

No, it's Linux.

Matthew

--
For every complex problem, there is a solution that is simple, neat, and wrong.
-- H. L. Mencken


From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Disc space usage
Date: 2008-10-08 15:38:10
Message-ID: alpine.DEB.1.10.0810081629460.15851@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, 8 Oct 2008, Tom Lane wrote:
> Hmm, would that include dropping tables in the database you are about to
> copy? If so, this error is fairly readily explainable as a side effect
> of the delayed dropping of physical files in recent PG versions.

It could quite possibly include dropping tables. We're running quite a
complex system with lots going on all the time.

> (As noted in the manual, CREATE DATABASE isn't really intended as a COPY
> DATABASE operation --- it is expecting the source database to be pretty
> static. I think you could make this more reliable if you do a manual
> checkpoint between modifying the source database and copying it.)

I gather this. However, I think it would be sensible to make sure it can
never "corrupt the database" as it were. It's fine for it to lock everyone
out of the database while the copying is happening though. The only reason
for it to fail should be if someone is logged into the template database.

> Do you have some specific examples of this error message at hand?
> Can you try to confirm whether the reported path corresponds to
> something in the CREATE's source database? If it's actually
> complaining about a stat failure in the target tree, then there's
> something else going on altogether. I don't see anything in that
> path that would give this message, but I might be missing it.

The oid in the error message is of a database that no longer exists, which
indicates that it is *probably* referring to the template database.
Unfortunately my colleagues just wrote the script so that it retries, so
we don't have a decent log of the failures, which were a while back.
However, I have now altered the script so that it fails with a message
saying "Report this to Matthew", so if it happens again I'll be able to
give you some more detail.

Matthew

--
You will see this is a 3-blackboard lecture. This is the closest you are going
to get from me to high-tech teaching aids. Hey, if they put nooses on this, it
would be fun! -- Computer Science Lecturer


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Matthew Wakeling <matthew(at)flymine(dot)org>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Disc space usage
Date: 2008-10-08 15:51:27
Message-ID: 8827.1223481087@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Matthew Wakeling <matthew(at)flymine(dot)org> writes:
> The oid in the error message is of a database that no longer exists, which
> indicates that it is *probably* referring to the template database.
> Unfortunately my colleagues just wrote the script so that it retries, so
> we don't have a decent log of the failures, which were a while back.
> However, I have now altered the script so that it fails with a message
> saying "Report this to Matthew", so if it happens again I'll be able to
> give you some more detail.

One other bit of possibly useful data would be to eyeball the file mod
times in the orphaned subdirectories. If they were from failed CREATE
DATABASEs then I'd expect every file in a given directory to have the
same mod time (modulo the amount of time it takes to copy the DB, which
is probably not trivial for the DB sizes you're dealing with). If you
could also correlate that to the times you saw CREATE failures then it'd
be pretty convincing that we know failed CREATEs are the issue.

Also, I would definitely urge you to update to 8.3.4. Although I'm not
seeing a mechanism for CREATE to fail to clean up like this, I'm looking
at the 8.3 branch tip code, not 8.3.0 ...

regards, tom lane


From: Matthew Wakeling <matthew(at)flymine(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Disc space usage
Date: 2008-10-09 10:00:31
Message-ID: alpine.DEB.1.10.0810091057520.15851@aragorn.flymine.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, 8 Oct 2008, Tom Lane wrote:
> One other bit of possibly useful data would be to eyeball the file mod
> times in the orphaned subdirectories. If they were from failed CREATE
> DATABASEs then I'd expect every file in a given directory to have the
> same mod time (modulo the amount of time it takes to copy the DB, which
> is probably not trivial for the DB sizes you're dealing with).

Yes, I did that, and the file modification times were in such a pattern.

> If you could also correlate that to the times you saw CREATE failures
> then it'd be pretty convincing that we know failed CREATEs are the
> issue.

Can't do that until next time it happens, because we don't have the logs
from when it did happen any more.

Matthew

--
Jadzia: Don't forget the 34th rule of acquisition: Peace is good for business.
Quark: That's the 35th.
Jadzia: Oh yes, that's right. What's the 34th again?
Quark: War is good for business. It's easy to get them mixed up.