Re: vacuum, vacuum full and problems releasing disk space

From: Guy Helmer <guy(dot)helmer(at)palisadesystems(dot)com>
To: Horaci Macias <hmacias(at)avaya(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-10 17:28:00
Message-ID: AA6F9A6A-8BBE-4F2A-B2E6-A5C590DBDF74@palisadesystems.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

BTW, it's not a problem to query data across multiple days as long as you query from the parent table -- Postgresql will use the child table constraints to search all the child tables that could contain data.

Guy

On May 10, 2012, at 11:01 AM, Horaci Macias wrote:

> thanks Guy. I had thought about using per-day tables (although I didn't know about child tables) but my problem is that some entries are related and they can span several minutes, so my worry is that I end up not finding all the right entries when I search for entries that happen close to the end of day / start of day time.
> Anyway, worth a thought for sure so thanks.
>
> H
>
> On 10/05/12 16:42, Guy Helmer wrote:
>> On May 10, 2012, at 4:31 AM, Horaci Macias wrote:
>>
>>> Hi everybody,
>>>
>>> I'm running postgres 9.1 and having disk space problems.
>>> My application captures information 24x7 and stores it into the database. This includes several bytea and can be ~5M entries a day, so the size can be an issue after several days.
>>> My application also cleans up entries older than 10 days; it does this every night and the delete operations are happening successfully. I cannot truncate the tables as they contain both stale and active data.
>>> The database is able to store all the entries for ~15 days without problems, but for some reason the deletion of old entries is not freeing up the space (or the insertion of new entries is not reusing the space used by old entries) because after running the application for ~20days I run out of space on disk.
>>> I've been reading on this forum and the postgres documentation; vacuum full is not recommended and apparently vacuum should be all I need. I'm using autovacuum but this doesn't seem to be solving the problem (perhaps because while vacuum is running the application keeps inserting entries 24x7?)
>>>
>>> Just to clarify, I don't really care if the disk space is returned to the OS; what I need though is to be sure that I can keep a window of 10 days of records (assuming of course my HD is big enough for those 10 days, which seems to be the case).
>>>
>>> Some questions:
>>> * Although not being generally recommended, I've read that vacuum full is sometimes the only choice when large deletions are in place in order to maintain the database. Is this the case here?
>>> * Should I try to have a "maintenance window" and stop all inserts/writes while vacuum is running? If so, is there any way to configure at what time vacuum will be executed by autovacuum or should I rely on cron-type jobs for this? and is there any way to prevent external connections at certain times of day to make sure inserts/writes don't happen while vacuum is going, or again I should use cron-type jobs for this?
>>> * Any other suggestions/ideas to troubleshoot this or any pointers to further documentation?
>> I would expect a plain VACUUM to make unused space available for re-use -- not sure why it would not be helping.
>>
>> Since Postgresql can have tables that are children of tables, a neat trick in this situation is to create per-day child tables and insert the new data directly into the appropriate per-day table; with this approach, deleting old data is accomplished by simply dropping outdated tables and thereby avoiding VACUUM completely. With constraints on the child tables, Postgresql can optimize a query on the parent table by knowing what child table has data from what day and will only check child tables that would have data for a given query.
>>
>> For example, I have a table called data_tbl, and child per-day tables like data_tbl_ts_20120509. The child table data_tbl_ts_20120509 has constraint:
>> "data_tbl_20120509_ts_check" CHECK (ts>= '2012-05-08 19:00:00-05'::timestamp with time zone AND ts< '2012-05-09 19:00:00-05'::timestamp with time zone)"
>> (each child table has data from 00:00:00 GMT to 23:59:59 GMT for that day)
>>
>> Each day on my systems, a cron job creates the child table, constraints on the child table, and index(es) for the child table to hold the next day's data, and another cron job drops any outdated child tables. I believe the command to create the example child table above would have been:
>>
>> CREATE TABLE data_tbl_ts_20120509 (CHECK (ts>= '2012-05-08 19:00:00-05'::timestamp with time zone AND ts< '2012-05-09 19:00:00-05'::timestamp with time zone)) INHERITS (data_tbl)
>>
>> (followed by any necessary GRANT commands to provide access to the new child table)
>>
>> Hope this helps,
>> Guy
>>
>>

--------
This message has been scanned by ComplianceSafe, powered by Palisade's PacketSure.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Lee Hachadoorian 2012-05-10 18:10:00 Multiple COPY statements
Previous Message Evan Carroll 2012-05-10 16:32:52 Custom Domain; migration from 8.4 to 9.1 and COLLATE