Re: High inserts, bulk deletes - autovacuum vs scheduled vacuum

From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: Jeremy Haile <jhaile(at)fastmail(dot)fm>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: High inserts, bulk deletes - autovacuum vs scheduled vacuum
Date: 2007-01-10 20:35:56
Message-ID: 20070110203556.GX12217@nasby.net
Lists: pgsql-performance

On Tue, Jan 09, 2007 at 12:26:41PM -0500, Jeremy Haile wrote:
> I am developing an application that has very predictable database
> operations:
> -inserts several thousand rows into 3 tables every 5 minutes. (tables
> contain around 10 million rows each)
> -truncates and rebuilds aggregate tables of this data every 5 minutes.
> (several thousand rows each)
> -regular reads of aggregate table and sometimes large tables by user
> interaction
> -every night, hundreds of thousands of rows are deleted from these 3
> tables (old data)
> -20-30 other tables get inserted/updated slowly throughout the day
>
> In order to optimize performance of the inserts, I disabled
> autovacuum/row-level stats and instead run "vacuum analyze" on the whole
> DB every hour. However this operation takes around 20 minutes of each
> hour. This means that the database is involved in vacuum/analyzing
> tables 33% of the time.
>
> I'd like any performance advice, but my main concern is the amount of
> time vacuum/analyze runs and its possible impact on the overall DB
> performance. Thanks!

If much of the data in the database isn't changing that often, then why
continually re-vacuum the whole thing?

I'd suggest trying autovacuum and seeing how it does (though you might want
to tune it to be more or less aggressive, and you'll probably want to
enable the cost delay).
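
Just as a sketch (the numbers are illustrative starting points, not
something tuned for your workload), that tuning is a handful of lines in
postgresql.conf:

    autovacuum = on
    autovacuum_naptime = 60                 # seconds between checks of each database
    autovacuum_vacuum_scale_factor = 0.2    # vacuum once ~20% of a table is dead rows
    autovacuum_analyze_scale_factor = 0.1
    autovacuum_vacuum_cost_delay = 20       # ms; throttles vacuum's I/O impact

You may need to make the scale factors more aggressive if the nightly
deletes on the big tables should be reclaimed promptly. Note that autovacuum
also needs the stats collector settings you've turned off (more on that
below).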

The only cases where manual vacuum makes sense to me are when you've got a
defined slow period and vacuuming during that slow period is still
frequent enough to keep up with demand, or when you've got tables with a
very high churn rate that need to be kept small. In the latter case, I'll
usually set up a cron job to vacuum those tables once a minute with no cost
delay. There are probably other cases where not using autovac makes sense,
but generally I'd much rather let autovac worry about this so I don't have
to.
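
A sketch of such a cron entry (the table and database names here are made
up, and since you're on Windows you'd use the Task Scheduler rather than
cron):

    * * * * *  vacuumdb -d mydb -t hot_table -z >/dev/null 2>&1

vacuum_cost_delay defaults to 0, so a manual vacuum like that runs
unthrottled, which is what you want on a small, high-churn table.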

> I am running 8.2 (will be 8.2.1 soon). The box is Windows with 2GB RAM
> connected to a SAN over fiber. The data and pg_xlog are on separate
> partitions.
>
> Modified configuration:
> effective_cache_size = 1000MB
> random_page_cost = 3
> default_statistics_target = 50
> maintenance_work_mem = 256MB
> shared_buffers = 400MB
> temp_buffers = 10MB
> work_mem = 10MB
> max_fsm_pages = 1500000

One other useful manual vacuum to consider is running vacuumdb -av
periodically (say, once a month) and looking at the last few lines of
output. That will give you a good idea of how large to set
max_fsm_pages. Running the output of vacuumdb -av through pgFouine will
give you other useful data.
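
Roughly (how you capture the output is up to you; the INFO lines come out
on stderr, hence the redirect):

    vacuumdb -a -v > vacuum.log 2>&1

The free space map summary in the last few lines of vacuum.log is what you
compare against your max_fsm_pages setting (currently 1500000).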

> checkpoint_segments = 30
> stats_row_level = off
> stats_start_collector = off

Unless you're really trying to get the last ounce of performance out,
it's probably not worth turning those stats settings off.
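
If you do go back to autovacuum, keep in mind that in 8.2 it won't run at
all without the stats collector, so you'd need:

    stats_start_collector = on
    stats_row_level = on
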
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
