From: Horaci Macias <hmacias(at)avaya(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-10 09:31:52
Message-ID: 4FAB8B08.7060407@avaya.com
Lists: pgsql-general

Hi everybody,

I'm running postgres 9.1 and having disk space problems.
My application captures information 24x7 and stores it into the
database. This includes several bytea columns and can be ~5M entries a day, so
the size can be an issue after several days.
My application also cleans up entries older than 10 days; it does this
every night and the delete operations are happening successfully. I
cannot truncate the tables as they contain both stale and active data.
The database is able to store all the entries for ~15 days without
problems, but for some reason the deletion of old entries is not freeing
up the space (or the insertion of new entries is not reusing the space
used by old entries) because after running the application for ~20 days I
run out of space on disk.
I've been reading on this forum and the postgres documentation; vacuum
full is not recommended and apparently vacuum should be all I need. I'm
using autovacuum but this doesn't seem to be solving the problem
(perhaps because while vacuum is running the application keeps inserting
entries 24x7?)

Just to clarify, I don't really care if the disk space is returned to
the OS; what I need though is to be sure that I can keep a window of 10
days of records (assuming of course my HD is big enough for those 10
days, which seems to be the case).

Some questions:
* Although not generally recommended, I've read that vacuum full
is sometimes the only choice when large deletions are in place in order
to maintain the database. Is this the case here?
* Should I try to have a "maintenance window" and stop all
inserts/writes while vacuum is running? If so, is there any way to
configure at what time vacuum will be executed by autovacuum or should I
rely on cron-type jobs for this? And is there any way to prevent
external connections at certain times of day to make sure inserts/writes
don't happen while vacuum is going, or again I should use cron-type jobs
for this?
* Any other suggestions/ideas to troubleshoot this or any pointers to
further documentation?

thank you,

Horaci


From: Guy Helmer <guy(dot)helmer(at)palisadesystems(dot)com>
To: Horaci Macias <hmacias(at)avaya(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-10 14:42:21
Message-ID: 7D318A5C-9730-4E45-8CB5-F76702D68BC3@palisadesystems.com
Lists: pgsql-general


I would expect a plain VACUUM to make unused space available for re-use -- not sure why it would not be helping.

Since PostgreSQL supports table inheritance, a neat trick in this situation is to create per-day child tables and insert new data directly into the appropriate per-day table; with this approach, deleting old data is accomplished by simply dropping outdated child tables, avoiding VACUUM completely. With CHECK constraints on the child tables, PostgreSQL knows which child table holds which day's data and will only scan the child tables that could contain rows matching a given query.

For example, I have a table called data_tbl, and child per-day tables like data_tbl_ts_20120509. The child table data_tbl_ts_20120509 has constraint:
"data_tbl_20120509_ts_check" CHECK (ts >= '2012-05-08 19:00:00-05'::timestamp with time zone AND ts < '2012-05-09 19:00:00-05'::timestamp with time zone)
(each child table has data from 00:00:00 GMT to 23:59:59 GMT for that day)

Each day on my systems, a cron job creates the child table, constraints on the child table, and index(es) for the child table to hold the next day's data, and another cron job drops any outdated child tables. I believe the command to create the example child table above would have been:

CREATE TABLE data_tbl_ts_20120509 (CHECK (ts >= '2012-05-08 19:00:00-05'::timestamp with time zone AND ts < '2012-05-09 19:00:00-05'::timestamp with time zone)) INHERITS (data_tbl);

(followed by any necessary GRANT commands to provide access to the new child table)
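A minimal sketch of what such a daily cron job might run, following the naming scheme above (the specific table, index, and date names here are hypothetical):

```sql
-- Create the child table for the next day, with its CHECK constraint
-- (midnight-to-midnight UTC boundaries, expressed here in -05:00)
CREATE TABLE data_tbl_ts_20120511 (
    CHECK (ts >= '2012-05-10 19:00:00-05'::timestamp with time zone
       AND ts <  '2012-05-11 19:00:00-05'::timestamp with time zone)
) INHERITS (data_tbl);

-- Index the partitioning column so per-day lookups stay cheap
CREATE INDEX data_tbl_ts_20120511_ts_idx ON data_tbl_ts_20120511 (ts);

-- Drop the child table that has aged out of the 10-day window;
-- this returns its disk space immediately, with no VACUUM needed
DROP TABLE IF EXISTS data_tbl_ts_20120501;
```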

Hope this helps,
Guy

--------
This message has been scanned by ComplianceSafe, powered by Palisade's PacketSure.


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Horaci Macias *EXTERN*" <hmacias(at)avaya(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-10 14:50:36
Message-ID: D960CB61B694CF459DCFB4B0128514C207DEC98C@exadv11.host.magwien.gv.at
Lists: pgsql-general

Horaci Macias wrote:
> I'm running postgres 9.1 and having disk space problems.
> My application captures information 24x7 and stores it into the
> database. This includes several bytea and can be ~5M entries a day, so
> the size can be an issue after several days.
> My application also cleans up entries older than 10 days; it does this
> every night and the delete operations are happening successfully. I
> cannot truncate the tables as they contain both stale and active data.
> The database is able to store all the entries for ~15 days without
> problems, but for some reason the deletion of old entries is not freeing
> up the space (or the insertion of new entries is not reusing the space
> used by old entries) because after running the application for ~20 days I
> run out of space on disk.
> I've been reading on this forum and the postgres documentation; vacuum
> full is not recommended and apparently vacuum should be all I need. I'm
> using autovacuum but this doesn't seem to be solving the problem
> (perhaps because while vacuum is running the application keeps inserting
> entries 24x7?)

That is possible.

You can check the last_autovacuum field in pg_stat_all_tables to see
when the table was last vacuumed.
Do you have any long running transactions? Either long statements or
sessions that are "idle in transaction". Those can also block vacuum.
Do you use two phase commit?
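For reference, these checks might look like the following on 9.1 (column names are the 9.1 ones; pg_stat_activity's procpid and current_query were renamed in later releases):

```sql
-- When were tables last vacuumed, and how many dead rows remain?
SELECT relname, last_vacuum, last_autovacuum, n_dead_tup
FROM pg_stat_all_tables
ORDER BY n_dead_tup DESC
LIMIT 10;

-- Long-running statements and idle-in-transaction sessions
SELECT procpid, xact_start, current_query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start;

-- Orphaned prepared transactions (two-phase commit)
SELECT * FROM pg_prepared_xacts;
```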

I would try to make autovacuum more aggressive (see the documentation)
and see if that helps.
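Per-table settings are one way to do that; for example (table name hypothetical, and the thresholds are only a starting point to experiment with):

```sql
-- Vacuum this table once ~1% of it is dead, instead of the default 20%,
-- and let autovacuum work faster by lowering its cost delay
ALTER TABLE data_tbl SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 1000,
    autovacuum_vacuum_cost_delay = 10
);
```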

> Just to clarify, I don't really care if the disk space is returned to
> the OS; what I need though is to be sure that I can keep a window of 10
> days of records (assuming of course my HD is big enough for those 10
> days, which seems to be the case).
>
> Some questions:
> * Although not being generally recommended, I've read that vacuum full
> is sometimes the only choice when large deletions are in place in order
> to maintain the database. Is this the case here?

You need VACUUM FULL once the bloat of the table is unacceptable,
i.e. if you don't want to leave the empty space in the tables but
want to return it to the operating system.
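If it comes to that, the command is simply (table name hypothetical; note that on 9.1 VACUUM FULL rewrites the whole table and holds an ACCESS EXCLUSIVE lock for the duration, blocking all reads and writes):

```sql
VACUUM FULL VERBOSE data_tbl;
```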

> * Should I try to have a "maintenance window" and stop all
> inserts/writes while vacuum is running? If so, is there any way to
> configure at what time vacuum will be executed by autovacuum or should I
> rely on cron-type jobs for this? and is there any way to prevent
> external connections at certain times of day to make sure inserts/writes
> don't happen while vacuum is going, or again I should use cron-type jobs
> for this?

If you cannot keep up using autovacuum, that will be the other option.

If you want to run VACUUM, say, once daily, cron is a good way to do it.
If it turns out to be necessary, you can block new connections with
pg_hba.conf or by revoking connect permissions on the database.
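The permissions route might look like this (database and role names hypothetical; if PUBLIC also has CONNECT on the database, it would need revoking too):

```sql
-- Before the maintenance window: stop new application connections
REVOKE CONNECT ON DATABASE mydb FROM app_user;

-- ... run VACUUM from a superuser session ...

-- Afterwards: allow connections again
GRANT CONNECT ON DATABASE mydb TO app_user;
```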

Yours,
Laurenz Albe


From: Horaci Macias <hmacias(at)avaya(dot)com>
To: Guy Helmer <guy(dot)helmer(at)palisadesystems(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-10 16:01:43
Message-ID: 4FABE667.803@avaya.com
Lists: pgsql-general

thanks Guy. I had thought about using per-day tables (although I didn't
know about child tables), but my problem is that some entries are related
and can span several minutes, so my worry is that I'd end up not finding
all the related entries when searching for entries that happen close to
the end-of-day / start-of-day boundary.
Anyway, definitely worth a thought, so thanks.

H



From: Horaci Macias <hmacias(at)avaya(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-10 16:08:06
Message-ID: 4FABE7E6.8030002@avaya.com
Lists: pgsql-general

thanks Laurenz, I'll keep an eye on pg_stat_all_tables. I checked just
now and apparently autovacuum has been run, but this is after my recent
upgrade to 9.1 from 8.3 (I upgraded hoping this problem would go away,
and so far the application hasn't been running long enough for me to
confirm whether it's gone or not).
I don't see any clients idle in transaction and I don't think I use long
running transactions. I may be using a lot of short-lived transactions 24x7.

If pg_stat_all_tables has a date in last_autovacuum, does this mean all
deleted tuples have been marked for reuse, or is there any scenario where
autovacuum runs but some deleted tuples are not marked, so their space is
never reused? In other words, if last_autovacuum shows a recent date, can
I stop checking for idle clients and long/short running transactions, or
can those still be a problem even though last_autovacuum shows autovacuum
ran?

thanks,

H



From: Guy Helmer <guy(dot)helmer(at)palisadesystems(dot)com>
To: Horaci Macias <hmacias(at)avaya(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-10 17:28:00
Message-ID: AA6F9A6A-8BBE-4F2A-B2E6-A5C590DBDF74@palisadesystems.com
Lists: pgsql-general

BTW, it's not a problem to query data across multiple days as long as you query the parent table -- PostgreSQL will use the child tables' constraints to scan only the child tables that could contain matching data.
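For example, a query spanning a day boundary against the parent table (names follow the earlier data_tbl example; on 9.1 constraint_exclusion defaults to 'partition', so inheritance queries get this pruning automatically):

```sql
-- Scans only the 20120508 and 20120509 child tables,
-- thanks to their CHECK constraints on ts
SELECT *
FROM data_tbl
WHERE ts >= '2012-05-08 12:00:00-05'
  AND ts <  '2012-05-09 12:00:00-05';
```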

Guy



From: Horaci Macias <hmacias(at)avaya(dot)com>
To: Guy Helmer <guy(dot)helmer(at)palisadesystems(dot)com>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-10 21:07:23
Message-ID: 4FAC2E0B.4040003@avaya.com
Lists: pgsql-general

oh I see what you mean; I definitely didn't know that, and it makes this
approach much more interesting now :)

thanks,

H



From: Horaci Macias <hmacias(at)avaya(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-16 08:45:53
Message-ID: 4FB36941.9060502@avaya.com
Lists: pgsql-general

after tuning the autovacuum settings I can now see the tables vacuumed
and the number of dead tuples dropping whenever an autovacuum happens,
which makes sense.
What I don't see, though, is the size of the tables ever decreasing, but
I'm not sure I should expect to.

Can somebody please confirm whether vacuum (not vacuum full) will ever
reduce the size of a table, or will the table always keep whatever
maximum size it ever reached, even if under the hood some inserts don't
increase the size because space is being reused?
For example, starting from an empty table, I insert tuples until the
table is 1G in size. Then I insert another bunch of tuples and the table
reaches 2G. If I delete this second bunch of tuples and vacuum (not
vacuum full) the table, should I expect the table to shrink back to ~1G,
or is it "normal" for the table to stay at 2G (with ~1G consisting of
dead tuples)? If I then insert a similar bunch of tuples again, should I
expect the table to remain at ~2G (since the dead-tuple space is reused)
or would it grow to ~3G?
Is there an easy way to see how much of a table's size is occupied by
dead tuples and how much by live tuples?

H

On 10/05/12 16:50, Albe Laurenz wrote:
> Horaci Macias wrote:
>> I'm running postgres 9.1 and having disk space problems.
>> My application captures information 24x7 and stores it into the
>> database. This includes several bytea and can be ~5M entries a day, so
>> the size can be an issue after several days.
>> My application also cleans up entries older than 10 days; it does this
>> every night and the delete operations are happening successfully. I
>> cannot truncate the tables as they contain both stale and active data.
>> The database is able to store all the entries for ~15 days without
>> problems, but for some reason the deletion of old entries is not
> freeing
>> up the space (or the insertion of new entries is not reusing the space
>> used by old entries) because after running the application for ~20days
> I
>> run out of space on disk.
>> I've been reading on this forum and the postgres documentation; vacuum
>> full is not recommended and apparently vacuum should be all I need.
> I'm
>> using autovacuum but this doesn't seem to be solving the problem
>> (perhaps because while vacuum is running the application keeps
> inserting
>> entries 24x7?)
> That is possible.
>
> You can check the last_autovacuum field in pg_stat_all_tables to see
> when the table was last vacuumed.
> Do you have any long running transactions? Either long statements or
> sessions that are "idle in transaction". Those can also block vacuum.
> Do you use two phase commit?
>
> I would try to make autovacuum more aggressive (see the documentation)
> and see if that helps.
>
>> Just to clarify, I don't really care if the disk space is returned to
>> the OS; what I need though is to be sure that I can keep a window of
>> 10 days of records (assuming of course my HD is big enough for those
>> 10 days, which seems to be the case).
>>
>> Some questions:
>> * Although not being generally recommended, I've read that vacuum full
>> is sometimes the only choice when large deletions are in place in
>> order to maintain the database. Is this the case here?
> You need VACUUM FULL once the bloat of the table is unacceptable,
> i.e. if you don't want to leave the empty space in the tables but
> want to return it to the operating system.
>
>> * Should I try to have a "maintenance window" and stop all
>> inserts/writes while vacuum is running? If so, is there any way to
>> configure at what time vacuum will be executed by autovacuum or
>> should I rely on cron-type jobs for this? and is there any way to
>> prevent external connections at certain times of day to make sure
>> inserts/writes don't happen while vacuum is going, or again I should
>> use cron-type jobs for this?
> If you cannot keep up using autovacuum, that will be the other option.
>
> If you want to run VACUUM, say, once daily, cron is a good way to do it.
> If it turns out to be necessary, you can block new connections with
> pg_hba.conf or by revoking connect permissions on the database.
>
> Yours,
> Laurenz Albe
>


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Horaci Macias *EXTERN*" <hmacias(at)avaya(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-16 12:41:20
Message-ID: D960CB61B694CF459DCFB4B0128514C207E6A58F@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Horaci Macias wrote:
> after tuning the autovacuum settings I can now see the tables vacuumed
> and the number of dead tuples dropping whenever an autovacuum happens,
> which makes sense.

Great.

> What I don't see though is the size of the tables ever decreasing, but
> I'm not sure I should see this.
>
> Can somebody please confirm whether vacuum (not vacuum full) will ever
> reduce the size of a table or will the table always have whatever
> maximum size it ever reached, even if under the hood some inserts
> don't result in size increasing because space is being reused?
> For example, starting from an empty table, I insert tuples until the
> table is 1G in size. Then I insert another bunch of tuples and the
> table reaches 2G. If I delete this second bunch of tuples and vacuum
> (not vacuum full) the table, should I expect the table to be ~1G in
> size again or is it "normal" that the table stays at 2G (although ~1G
> contains dead tuples)? If I add again the bunch of tuples I deleted,
> should I expect the table to remain at ~2G (since the dead tuples
> space was reused) or would the table grow to ~3G?

Yes, that's expected behaviour.
AFAIK VACUUM will only truncate completely empty pages at the end of
the table; any other freed space stays inside the file, marked for reuse.

> Is there any easy way to see how much of the size of a table is
> occupied by dead tuples and how much is occupied by live tuples?

I don't think there is - you could come up with a formula using
pg_statistic (stawidth = average width of column) and pg_class
(reltuples = number of tuples, relpages = number of pages), but
you'd have to do some accounting for headers and other stuff.

Might be an interesting exercise though.
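A very rough sketch of what I mean, just as a starting point (untested;
it rounds the 23-byte tuple header up to 24, ignores TOAST, assumes the
default 8kB block size, and reading pg_statistic normally requires
superuser; 'captures' is a placeholder table name):

```sql
SELECT pg_size_pretty(relpages::bigint * 8192) AS on_disk,
       pg_size_pretty((reltuples * (24 +
           (SELECT sum(stawidth)
            FROM pg_statistic
            WHERE starelid = c.oid)))::bigint)  AS live_data_estimate
FROM pg_class c
WHERE relname = 'captures';
```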

Yours,
Laurenz Albe


From: Horaci Macias <hmacias(at)avaya(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-16 12:46:17
Message-ID: 4FB3A199.5050004@avaya.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

thanks Laurenz, at least this confirms the big size is not an issue.
Regarding the % of dead vs. live tuples, I haven't tried it yet, but
apparently pgstattuple, from contrib, should do that; just in case
anybody reading this has the same question.
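For the archives, usage looks roughly like this (on 9.1 the module is
installed as an extension; note that pgstattuple scans the whole table,
and 'captures' is just a placeholder table name):

```sql
-- once per database, as superuser
CREATE EXTENSION pgstattuple;

SELECT pg_size_pretty(table_len)      AS total,
       pg_size_pretty(tuple_len)      AS live,
       pg_size_pretty(dead_tuple_len) AS dead,
       dead_tuple_percent
FROM pgstattuple('captures');
```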

thanks,

H

On 16/05/12 14:41, Albe Laurenz wrote:
> Horaci Macias wrote:
>> after tuning the autovacuum settings I can now see the tables vacuumed
>> and the number of dead tuples dropping whenever an autovacuum happens,
>> which makes sense.
> Great.
>
>> What I don't see though is the size of the tables ever decreasing, but
>> I'm not sure I should see this.
>>
>> Can somebody please confirm whether vacuum (not vacuum full) will ever
>> reduce the size of a table or will the table always have whatever
>> maximum size it ever reached, even if under the hood some inserts
>> don't result in size increasing because space is being reused?
>> result in size increasing because space is being reused?
>> For example, starting from an empty table, I insert tuples until the
>> table is 1G in size. Then I insert another bunch of tuples and the
>> table
>> reaches 2G. If I delete this second bunch of tuples and vacuum (not
>> vacuum full) the table, should I expect the table to be ~1G in size
>> again or is it "normal" that the table stays at 2G (although ~1G
>> contains dead tuples)? If I add again the bunch of tuples I deleted,
>> should I expect the table to remain at ~2G (since the dead tuples
>> space was reused) or would the table grow to ~3G?
> Yes, that's expected behaviour.
> AFAIK VACUUM will only truncate completely empty pages at the end of
> the table; any other freed space stays inside the file, marked for reuse.
>
>> Is there any easy way to see how much of the size of a table is
>> occupied by dead tuples and how much is occupied by live tuples?
> I don't think there is - you could come up with a formula using
> pg_statistic (stawidth = average width of column) and pg_class
> (reltuples = number of tuples, relpages = number of pages), but
> you'd have to do some accounting for headers and other stuff.
>
> Might be an interesting exercise though.
>
> Yours,
> Laurenz Albe


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Guy Helmer <guy(dot)helmer(at)palisadesystems(dot)com>
Cc: Horaci Macias <hmacias(at)avaya(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: vacuum, vacuum full and problems releasing disk space
Date: 2012-05-16 13:02:11
Message-ID: CAHyXU0w7Y5cvLkU+divk3o=XnRBumPvTMo2D_X2AWf8zHPOb_A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, May 10, 2012 at 9:42 AM, Guy Helmer
<guy(dot)helmer(at)palisadesystems(dot)com> wrote:
> On May 10, 2012, at 4:31 AM, Horaci Macias wrote:
>
>> Hi everybody,
>>
>> I'm running postgres 9.1 and having disk space problems.
>> My application captures information 24x7 and stores it into the database. This includes several bytea and can be ~5M entries a day, so the size can be an issue after several days.
>> My application also cleans up entries older than 10 days; it does this every night and the delete operations are happening successfully. I cannot truncate the tables as they contain both stale and active data.
>> The database is able to store all the entries for ~15 days without problems, but for some reason the deletion of old entries is not freeing up the space (or the insertion of new entries is not reusing the space used by old entries) because after running the application for ~20days I run out of space on disk.
>> I've been reading on this forum and the postgres documentation; vacuum full is not recommended and apparently vacuum should be all I need. I'm using autovacuum but this doesn't seem to be solving the problem (perhaps because while vacuum is running the application keeps inserting entries 24x7?)
>>
>> Just to clarify, I don't really care if the disk space is returned to the OS; what I need though is to be sure that I can keep a window of 10 days of records (assuming of course my HD is big enough for those 10 days, which seems to be the case).
>>
>> Some questions:
>> * Although not being generally recommended, I've read that vacuum full is sometimes the only choice when large deletions are in place in order to maintain the database. Is this the case here?
>> * Should I try to have a "maintenance window" and stop all inserts/writes while vacuum is running? If so, is there any way to configure at what time vacuum will be executed by autovacuum or should I rely on cron-type jobs for this? and is there any way to prevent external connections at certain times of day to make sure inserts/writes don't happen while vacuum is going, or again I should use cron-type jobs for this?
>> * Any other suggestions/ideas to troubleshoot this or any pointers to further documentation?
>
> I would expect a plain VACUUM to make unused space available for re-use -- not sure why it would not be helping.
>
> Since PostgreSQL can have tables that are children of tables, a neat trick in this situation is to create per-day child tables and insert the new data directly into the appropriate per-day table; with this approach, deleting old data is accomplished by simply dropping outdated tables, thereby avoiding VACUUM completely. With constraints on the child tables, PostgreSQL can optimize a query on the parent table by knowing which child table has data for which day, and will only check child tables that could have data for a given query.

+1 on this approach. It's a best practice for rotating logging tables.
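For anyone finding this thread later, on 9.1 the trick looks roughly
like this (a sketch only, using inheritance plus CHECK constraints;
table and column names are invented for the example):

```sql
CREATE TABLE captures (ts timestamptz NOT NULL, payload bytea);

-- one child per day, constrained so the planner can exclude it
CREATE TABLE captures_2012_05_16 (
    CHECK (ts >= '2012-05-16' AND ts < '2012-05-17')
) INHERITS (captures);

-- the application inserts directly into the right child table;
-- with constraint_exclusion = partition (the default), queries on
-- "captures" only scan children whose CHECK constraint can match

-- expiring a day's data is then an instant DROP, no DELETE or VACUUM:
DROP TABLE captures_2012_05_06;
```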

merlin