BUG #3483: Dropped temporary tables filled up the disk

From: "Brent Reid" <bfraci(at)aol(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #3483: Dropped temporary tables filled up the disk
Date: 2007-07-24 20:02:42
Message-ID: 200707242002.l6OK2gC0065335@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 3483
Logged by: Brent Reid
Email address: bfraci(at)aol(dot)com
PostgreSQL version: 8.1.9
Operating system: Red Hat Linux 3.2.3-53
Description: Dropped temporary tables filled up the disk
Details:

Our application opens several pooled sessions to the database. These
sessions will typically never be closed. One of the threads, Thread A, of
our application will use one of these sessions to create a temporary table.
One of the columns in this table is of data type text. It will then insert
into this table several rows. Typically the amount of data put into the
text column is several 100K in length. However, on a recent Friday, the
application tried to insert several 100M into this text column. When it did
that, it got an out of memory error. In this situation, the application
closes the connection, gets a new connection from the pool, and tries again.
It will repeat this until it is able to succeed.

Our application has another thread, Thread B, that is continually looping on
a select from a large table looking for work to do. This select will
typically return in a few milliseconds. However, it can take up to 5
minutes to return depending on how much data is in the table and how much
work there is to do. The only relationship between Thread A and Thread B
is they are run from the same application and access the same database.
They don't reference the same tables.

When we came in on Monday morning, we had discovered that the application
had stopped putting data into the database because the disk containing the
database was 100% full. At least df said that it was 100% full, du said
that it was only 10% full. Thread A had stopped because it was not able to
create the temporary table because the disk was full. Thread B was still
looping looking for work to do it was still using its original
connection. At this point, there was no more work for it to do because the
other threads were not able to put anymore data into the database.

We executed the command lsof +aL1 /usr/local. This showed us the
following (this is just a portion of what was listed):

-sh-2.05b# lsof +aL1 /usr/local
COMMAND PID USER FD TYPE DEVICE SIZE NLINK NODE NAME
postmaste 30061 postgres 42u REG 9,0 79773696 0 32555014
/usr/local/pgsql/data8.1/base/159232015/164866273 (deleted)
postmaste 30061 postgres 65u REG 9,0 79773696 0 803521
/usr/local/pgsql/data8.1/base/159232015/164865706 (deleted)
postmaste 30061 postgres 66u REG 9,0 1228800 0 803522
/usr/local/pgsql/data8.1/base/159232015/164865708 (deleted)
postmaste 30061 postgres 73u REG 9,0 79773696 0 803502
/usr/local/pgsql/data8.1/base/159232015/164865754 (deleted)
postmaste 30061 postgres 74u REG 9,0 1228800 0 803517
/usr/local/pgsql/data8.1/base/159232015/164865756 (deleted)
postmaste 30061 postgres 75u REG 9,0 79773696 0 803524
/usr/local/pgsql/data8.1/base/159232015/164865802 (deleted)
postmaste 30061 postgres 76u REG 9,0 1228800 0 803529
/usr/local/pgsql/data8.1/base/159232015/164865804 (deleted)
postmaste 30061 postgres 77u REG 9,0 1228800 0 32555015
/usr/local/pgsql/data8.1/base/159232015/164866275 (deleted)
postmaste 30061 postgres 78u REG 9,0 79773696 0 32555023
/usr/local/pgsql/data8.1/base/159232015/164866609 (deleted)
postmaste 30061 postgres 79u REG 9,0 79773696 0 32555011
/usr/local/pgsql/data8.1/base/159232015/164866321 (deleted)
postmaste 30061 postgres 80u REG 9,0 1228800 0 32555012
/usr/local/pgsql/data8.1/base/159232015/164866323 (deleted)
postmaste 30061 postgres 81u REG 9,0 79773696 0 32555013
/usr/local/pgsql/data8.1/base/159232015/164866369 (deleted)
postmaste 30061 postgres 82u REG 9,0 1228800 0 32555016
/usr/local/pgsql/data8.1/base/159232015/164866371 (deleted)
postmaste 30061 postgres 96u REG 9,0 79773696 0 32555019
/usr/local/pgsql/data8.1/base/159232015/164866465 (deleted)
postmaste 30061 postgres 97u REG 9,0 1228800 0 32555020
/usr/local/pgsql/data8.1/base/159232015/164866467 (deleted)
postmaste 30061 postgres 98u REG 9,0 79773696 0 32555017
/usr/local/pgsql/data8.1/base/159232015/164866513 (deleted)
postmaste 30061 postgres 99u REG 9,0 1228800 0 32555018
/usr/local/pgsql/data8.1/base/159232015/164866515 (deleted)
postmaste 30061 postgres 100u REG 9,0 1228800 0 32555024
/usr/local/pgsql/data8.1/base/159232015/164866611 (deleted)

Process 30061 is the backend process for Thread B.

The steps to reproduce this situation:

1.In terminal session A create a file, let's say a.sql, that contains the
following:

create temporary table tmp_foo( col1 int, col2 text ) on commit delete
rows;

insert into tmp_foo select s.a, repeat('Testing',s.a*1024000) from
generate_series(1,100) as s(a);

2.Execute the following shell commands in terminal session A:

while [ 1 ] ; do date; /usr/local/pgsql/bin/psql -U testuser < a.sql; sleep
1; done

3.While that is running in terminal session A, in terminal session B start a
psql session to the database and execute the following commands:

create table foo( col1 int, col2 text );

insert into foo select s.a, repeat('Testing',2000) from
generate_series(1,10000000) as s(a);

select count(*) from foo; -- repeat this command to continue to experience
the problem

4.In terminal session C run the lsof command to look for deleted files that
are still open. You will see session B start to have several of these files
that never go away, at least until you close session B's psql session.

One of my observations was whatever was run in session B needed to run for
several seconds before you would have the problem. When I ran several short
queries, the longest of which took 0.493 ms, I did not experience the
problem. I did not spend the time to find out exactly how long a query
needed to run.

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2007-07-24 23:29:43 Re: BUG #3483: Dropped temporary tables filled up the disk
Previous Message Tom Lane 2007-07-24 19:35:34 Re: [GENERAL] Should SERIAL column have MAXVAL set on sequence