Cleaning up large objects

From: "Ole Streicher" <ole-usenet-spam(at)gmx(dot)net>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Cleaning up large objects
Date: 2004-08-10 15:13:00
Message-ID: 4890.1092150780@www53.gmx.net
Lists: pgsql-jdbc

Hi,

I have a problem with the cleanup of large objects.

My database consists of (mainly) one table that holds a date and an
associated large object id. The large object can be as big as 2 megabytes.

Every hour, I have a small method that removes all entries that are older
than a certain date:

private void cleanup(Connection dbConn, Date deleteTo) throws SQLException
{
    try {
        dbConn.setAutoCommit(false);
        dbConn.setTransactionIsolation(Connection.TRANSACTION_READ_COMMITTED);
        LargeObjectManager lobj = ((PGConnection) dbConn).getLargeObjectAPI();

        PreparedStatement queryStmt = dbConn.prepareStatement(
            "SELECT Values FROM MyTable WHERE From_Date < ?");
        // JDBC parameter indexes are 1-based
        queryStmt.setTimestamp(1, new Timestamp(deleteTo.getTime()));
        ResultSet rs = queryStmt.executeQuery();
        try {
            while (rs.next()) {
                int oid = rs.getInt(1);
                try {
                    lobj.delete(oid);
                } catch (SQLException e) {
                    e.printStackTrace();
                }
            }
        } finally {
            rs.close();
        }
        PreparedStatement deleteStmt = dbConn.prepareStatement(
            "DELETE FROM MyTable WHERE From_Date < ?");
        deleteStmt.setTimestamp(1, new Timestamp(deleteTo.getTime()));
        deleteStmt.executeUpdate();
        dbConn.commit();
    } catch (SQLException e) {
        dbConn.rollback();
        throw e;
    }
}

This program also seems to work; at least I don't get any exceptions from it.
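
As an aside, the per-row round trips through LargeObjectManager could be avoided by unlinking the large objects server-side with PostgreSQL's built-in lo_unlink() function. A sketch (assuming, as in the code above, that the Values column holds the large-object OIDs, with a literal date standing in for the JDBC parameter):

    SELECT lo_unlink(Values) FROM MyTable WHERE From_Date < '2004-05-01';
    DELETE FROM MyTable WHERE From_Date < '2004-05-01';

Both statements would run inside the same transaction, with the same commit/rollback handling as the method above.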
Also, every night I run the "vacuum" command:

/usr/bin/vacuumdb -a -z

But the database keeps growing. The entries are successfully removed from MyTable, but the disk usage stays high.

I am quite sure that the disk usage comes mainly from the LOBs, since the free disk space shrinks at exactly the rate I would expect from the LOBs I insert.

What is wrong with my approach that it does not free the disk space?

Regards

Ole

--
NEW: WLAN router for EUR 0.-* - also for DSL switchers!
GMX DSL = super cheap & wireless http://www.gmx.net/de/go/dsl


From: "Ole Streicher" <ole-usenet-spam(at)gmx(dot)net>
To: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Cleaning up large objects
Date: 2004-08-10 15:33:33
Message-ID: 12095.1092152013@www25.gmx.net
Lists: pgsql-jdbc

I forgot to mention some things:

I use PostgreSQL 7.3.2, with the JDBC driver 7.4.

When I execute "vacuumdb -a -v -z", I get

INFO: --Relation pg_catalog.pg_largeobject--
INFO: Index pg_largeobject_loid_pn_index: Pages 41027; Tuples 731708: Deleted 20.
CPU 3.95s/0.55u sec elapsed 100.14 sec.
INFO: Removed 320 tuples in 75 pages.
CPU 0.02s/0.01u sec elapsed 1.61 sec.
INFO: Pages 1768741: Changed 14814, Empty 0; Tup 731708: Vac 320, Keep 104534, UnUsed 9314229.
Total CPU 169.45s/9.47u sec elapsed 737.33 sec.
INFO: Analyzing pg_catalog.pg_largeobject

320 tuples looks like twice the number of large objects I just removed, so I guess that /something/ works. However, no space is freed. Why?
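
For scale, the page count in that output can be converted to bytes (a back-of-the-envelope sketch assuming the default 8 kB PostgreSQL page size; the class name is mine):

```java
// Back-of-the-envelope reading of the VACUUM VERBOSE output above.
// Assumes the default 8 kB PostgreSQL page size.
public class VacuumMath {
    static final long PAGE_SIZE = 8192;

    static long tableBytes(long pages) {
        return pages * PAGE_SIZE;
    }

    public static void main(String[] args) {
        long pages = 1768741;          // "Pages 1768741" from the output above
        long bytes = tableBytes(pages);
        double gb = bytes / (1024.0 * 1024.0 * 1024.0);
        // pg_largeobject alone occupies roughly 13.5 GB on disk,
        // even though only 731708 tuples are live.
        System.out.printf("pg_largeobject: %d bytes (%.1f GB)%n", bytes, gb);
    }
}
```

So nearly all of the disk usage sits in pg_largeobject itself, as pages that vacuum has marked reusable but never returned to the operating system.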

Ole



From: Kris Jurka <books(at)ejurka(dot)com>
To: Ole Streicher <ole-usenet-spam(at)gmx(dot)net>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Cleaning up large objects
Date: 2004-08-10 17:38:26
Message-ID: Pine.BSO.4.56.0408101236340.26464@leary.csoft.net
Lists: pgsql-jdbc

On Tue, 10 Aug 2004, Ole Streicher wrote:

> When I execute "vacuumdb -a -v -z", I get
> However, no space is freed. Why?
>

Without -f (FULL), vacuum will not actually free space, but just mark it
for later reuse. If you are seeing steady growth in a database that you
think should reach a steady state, perhaps your free space map settings
aren't large enough.

Kris Jurka


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ole Streicher" <ole-usenet-spam(at)gmx(dot)net>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Cleaning up large objects
Date: 2004-08-10 17:52:06
Message-ID: 29794.1092160326@sss.pgh.pa.us
Lists: pgsql-jdbc

"Ole Streicher" <ole-usenet-spam(at)gmx(dot)net> writes:
> What is wrong with my approach that it does not free the disk space?

Plain vacuum is not designed to "free" disk space, at least not in the
sense of returning it to the operating system. What it's designed to do
is record free disk space within table files in the "free space map"
(FSM) so that it can be re-used for future row creation.

To aggressively compact table files and return space to the OS, you need
vacuum full, which is a great deal slower and requires exclusive table
locks.

If you see space leakage in a database that should have a reasonably
constant total size, the reason is usually that you don't have the FSM
parameters set large enough to remember all the free space. Check your
settings in postgresql.conf and increase if needed. (Note that an
increase requires a postmaster restart to take effect.) You might also
want to think about vacuuming more often than once a day, so that space
can be recycled into the FSM sooner. When you have a periodic bulk
delete process, it's not a bad idea to vacuum the table that had the
deletes immediately after each deletion run.
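
In postgresql.conf, the settings in question look like this (the values here are only illustrative placeholders, not a recommendation for this workload):

    # Free space map (kept in shared memory; changes require a postmaster restart)
    max_fsm_pages = 1000000       # max number of pages with free space tracked
    max_fsm_relations = 1000      # max number of relations tracked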

Depending on how far behind the eight-ball you are, you may need a pass
of vacuum full to get the DB back down to a reasonable size.

BTW, "cluster" can serve as a substitute for vacuum full, since it also
compacts out dead space. It can be significantly faster than vacuum
full when there's a lot of space to be reclaimed.

regards, tom lane


From: "Ole Streicher" <ole-usenet-spam(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Cleaning up large objects
Date: 2004-08-11 07:20:59
Message-ID: 13241.1092208859@www53.gmx.net
Lists: pgsql-jdbc

Hi Tom,

> "Ole Streicher" <ole-usenet-spam(at)gmx(dot)net> writes:
> To aggressively compact table files and return space to the OS, you need
> vacuum full, which is a great deal slower and requires exclusive table
> locks.

OK; this freed the space, although it ran for about four hours.

> If you see space leakage in a database that should have a reasonably
> constant total size, the reason is usually that you don't have the FSM
> parameters set large enough to remember all the free space. Check your
> settings in postgresql.conf and increase if needed. (Note that an
> increase requires a postmaster restart to take effect.)

What is a reasonable value for these settings in my case? I write about 5
MBytes per hour (measured by the decrease of free disk space over time)
to 157 LOBs (mostly by appending to existing LOBs), and the data remain
for about three months. That makes about 11 gigabytes of needed disk space,
but my disk ran out of space at about 15 gigabytes.

I still have no idea which value to put there; the (commented-out) line in
the config file is

#max_fsm_pages = 10000 # min 1000, fsm is free space map, ~6 bytes

> You might also want to think about vacuuming more often than once a day,
> so that space can be recycled into the FSM sooner. When you have a
> periodic bulk delete process, it's not a bad idea to vacuum the table
> that had the deletes immediately after each deletion run.

My problem here is that vacuum always takes a few minutes (except in cases
when the database was not touched at all), and during that time the system
load is quite high. That disturbs the "normal" data storage processes on the
same server.

Is it possible to run the vacuum with a lower priority, while all other
postgres queries keep their normal priority? Simply decreasing the priority
of the vacuum command does not help, since it is the postmaster process
that generates the load, not the command script.

Regards

Ole



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Ole Streicher" <ole-usenet-spam(at)gmx(dot)net>
Cc: pgsql-jdbc(at)postgresql(dot)org
Subject: Re: Cleaning up large objects
Date: 2004-08-11 14:18:51
Message-ID: 4987.1092233931@sss.pgh.pa.us
Lists: pgsql-jdbc

"Ole Streicher" <ole-usenet-spam(at)gmx(dot)net> writes:
> What is a reasonable value for these settings in my case?

I'd try bumping up max_fsm_pages to a million or so. That would
definitely be enough for a 10Gb database. You could probably get
away with less but I'm not sure it's worth worrying about.
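
The arithmetic behind that figure can be sketched as follows (assuming 8 kB pages and the ~6 bytes per FSM entry mentioned in the config-file comment; the class and method names are mine):

```java
// Rough FSM sizing arithmetic, assuming 8 kB pages and ~6 bytes of
// shared memory per tracked page (per the config-file comment quoted above).
public class FsmSizing {
    static long pagesFor(long bytes) {
        return (bytes + 8191) / 8192;   // round up to whole 8 kB pages
    }

    static long fsmMemoryBytes(long trackedPages) {
        return trackedPages * 6;
    }

    public static void main(String[] args) {
        long dataBytes = 11L * 1024 * 1024 * 1024;  // ~11 GB of LOB data
        long pages = pagesFor(dataBytes);           // ~1.44 million pages
        long mem = fsmMemoryBytes(pages);           // ~8 MB of shared memory
        System.out.println(pages + " pages, ~" + mem + " bytes of FSM memory");
    }
}
```

So even tracking every page of an 11 GB database costs only a few megabytes of shared memory, which is why a generous max_fsm_pages is cheap.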

If you are running 7.4 then a "vacuum verbose" will give you some
data (at the very end of its lengthy printout) about FSM consumption,
which you could use if you want to size more carefully.

> Is it possible to run the vacuum with a lower priority (while all other
> postgres queries keeping the normal priority)?

No. In 8.0 there will be some tuning parameters that will help with
that, but not in current releases...
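
For reference, the 8.0 parameters in question are the cost-based vacuum delay settings; an illustrative sketch of what they look like there (not available in 7.3/7.4; values are placeholders):

    # 8.0+: make vacuum sleep periodically so it yields I/O to foreground queries
    vacuum_cost_delay = 10        # milliseconds to sleep at each cost-limit point (0 disables)
    vacuum_cost_limit = 200       # accumulated cost at which vacuum sleeps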

regards, tom lane