Re: Database-wide VACUUM ANALYZE

Lists: pgsql-performance
From: "Steven Flatt" <steven(dot)flatt(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Database-wide VACUUM ANALYZE
Date: 2007-06-21 17:09:57
Message-ID: 357fa7590706211009w2ef6071fn83c5759db6be1e80@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

We recently upgraded a very large database (~550 GB) from 8.1.4 to 8.2.4 via
a pg_dump and pg_restore. (Note that the restore took several days.) We
had accepted the default settings:

vacuum_freeze_min_age = 100 million
autovacuum_freeze_max_age = 200 million

Due to our very high transaction rate, it appears that a database-wide
vacuum kicked off approximately 2 weeks after the restore. (Aside: after
reading the docs and considering our system characteristics, I know now that
our autovacuum_freeze_max_age should be more like 2 billion. However on
this machine I haven't changed the config settings yet.) Also, I believe,
that due to the bulk of our data having the same "age" after the restore,
the db-wide vacuum had *a lot* of rows to mark with the FrozenXID.

The good thing is that the db-wide vacuum, which ran for a long time, was
reasonably non-intrusive to other database activity (somewhat, but
reasonable for the short term). The other good thing was that concurrent
autovacuum processes were still vacuuming/analyzing tables as necessary.

The bad thing, which I don't totally understand from reading the docs, is
that another db-wide vacuum kicked off exactly 24 hours after the first
db-wide vacuum kicked off, before the first one had finished. (Note that
these vacuums seem to go through the tables alphabetically.) I managed to
explain this to myself in that there were still rows in tables not yet
touched by the first db-wide vacuum that could have XIDs older than
autovacuum_freeze_max_age. Fine, so two db-wide vacuums were now taking
place, one behind the other.

The first db-wide vacuum finished approximately 36 hours after it started.
At this point I was convinced that the second db-wide vacuum would run to
completion with little or no work to do and all would be good. The thing I
can't explain is why a third db-wide vacuum kicked off exactly 24 hours
(again) after the second db-wide vacuum kicked off (and the second vacuum
still running).

Wouldn't the first db-wide vacuum have marked any rows that needed it with
the FrozenXID? Why would a third db-wide vacuum kick off so soon after the
first db-wide vacuum had completed? Surely there haven't been 100 million
more transactions in the last two days?

Can someone explain what is going on here? I can't quite figure it out
based on the docs.

Thanks,
Steve


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Steven Flatt <steven(dot)flatt(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Database-wide VACUUM ANALYZE
Date: 2007-06-21 18:33:47
Message-ID: 467AC48B.2050803@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Steven Flatt wrote:
> The bad thing, which I don't totally understand from reading the docs, is
> that another db-wide vacuum kicked off exactly 24 hours after the first
> db-wide vacuum kicked off, before the first one had finished. (Note that
> these vacuums seem to go through the tables alphabetically.) I managed to
> explain this to myself in that there were still rows in tables not yet
> touched by the first db-wide vacuum that could have XIDs older than
> autovacuum_freeze_max_age. Fine, so two db-wide vacuums were now taking
> place, one behind the other.

Are you sure there's no cron job starting the vacuums? 24h sounds too
good to be a coincidence, and there's no magic constant of 24h in the
autovacuum code. Besides, autovacuum can only be running one VACUUM at a
time, so there must be something else launching them.

What's your vacuuming strategy in general, before and after upgrade?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Francisco Reyes <lists(at)stringsutils(dot)com>
To: Steven Flatt <steven(dot)flatt(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Database-wide VACUUM ANALYZE
Date: 2007-06-21 18:58:33
Message-ID: cone.1182452313.269532.11564.5001@35st.simplicato.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Steven Flatt writes:

> Can someone explain what is going on here?  I can't quite figure it out
> based on the docs.

Are you on FreeBSD by any chance?

I think the FreeBSD port by default installs a script that does a daily
vacuum. If using another OS, perhaps you want to see if you used some sort
of package system and if that package added a nightly vacuum.


From: "Steven Flatt" <steven(dot)flatt(at)gmail(dot)com>
To: "Francisco Reyes" <lists(at)stringsutils(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Database-wide VACUUM ANALYZE
Date: 2007-06-21 19:36:00
Message-ID: 357fa7590706211236y2856ed96q5394b9302b9a528f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 6/21/07, Francisco Reyes <lists(at)stringsutils(dot)com> wrote:
>
> Are you on FreeBSD by any chance?
>
> I think the FreeBSD port by default installs a script that does a daily
> vacuum.

Yes, FreeBSD. Do you know what script that is? And it does a db-wide
VACUUM ANALYZE every day?! That is certainly not necessary, and in fact,
costly for us.

Hmmm... I wonder why this would just start now, three days ago. Everything
seemed to be normal for the last two weeks.

Steve


From: Larry Rosenman <ler(at)lerctr(dot)org>
To: Steven Flatt <steven(dot)flatt(at)gmail(dot)com>
Cc: Francisco Reyes <lists(at)stringsutils(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Database-wide VACUUM ANALYZE
Date: 2007-06-21 19:59:53
Message-ID: 20070621145901.M42326@thebighonker.lerctr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, 21 Jun 2007, Steven Flatt wrote:

> On 6/21/07, Francisco Reyes <lists(at)stringsutils(dot)com> wrote:
>>
>> Are you on FreeBSD by any chance?
>>
>> I think the FreeBSD port by default installs a script that does a daily
>> vacuum.
>
>
> Yes, FreeBSD. Do you know what script that is? And it does a db-wide
> VACUUM ANALYZE every day?! That is certainly not necessary, and in fact,
> costly for us.
>
> Hmmm... I wonder why this would just start now, three days ago. Everything
> seemed to be normal for the last two weeks.
>
The current FreeBSD port places the script in:

/usr/local/etc/periodic/daily/502.pgsql

And it can be controlled from /etc/periodic.conf

See the top of that script.

LER

> Steve
>

--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler(at)lerctr(dot)org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893


From: Bill Moran <wmoran(at)collaborativefusion(dot)com>
To: "Steven Flatt" <steven(dot)flatt(at)gmail(dot)com>
Cc: "Francisco Reyes" <lists(at)stringsutils(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Database-wide VACUUM ANALYZE
Date: 2007-06-21 20:07:32
Message-ID: 20070621160732.58497e2b.wmoran@collaborativefusion.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

In response to "Steven Flatt" <steven(dot)flatt(at)gmail(dot)com>:

> On 6/21/07, Francisco Reyes <lists(at)stringsutils(dot)com> wrote:
> >
> > Are you on FreeBSD by any chance?
> >
> > I think the FreeBSD port by default installs a script that does a daily
> > vacuum.
>
>
> Yes, FreeBSD. Do you know what script that is?

/usr/local/etc/periodic/daily/502.pgsql

> And it does a db-wide
> VACUUM ANALYZE every day?! That is certainly not necessary, and in fact,
> costly for us.

You can control it with knobs in /etc/periodic.conf (just like other
periodic job):
daily_pgsql_vacuum_enable="YES"
daily_pgsql_backup_enable="NO"

are the defaults.

> Hmmm... I wonder why this would just start now, three days ago. Everything
> seemed to be normal for the last two weeks.

Someone alter /etc/periodic.conf? Perhaps it's been running all along but
you never noticed it before now?

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

wmoran(at)collaborativefusion(dot)com
Phone: 412-422-3463x4023


From: "Steven Flatt" <steven(dot)flatt(at)gmail(dot)com>
To: "Bill Moran" <wmoran(at)collaborativefusion(dot)com>
Cc: "Francisco Reyes" <lists(at)stringsutils(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Database-wide VACUUM ANALYZE
Date: 2007-06-21 20:37:49
Message-ID: 357fa7590706211337i55e9b20nea5a1bd4c5e41268@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Thanks everyone. It appears that we had hacked the 502.pgsql script for our
8.1 build to disable the daily vacuum. I was not aware of this when
building and upgrading to 8.2.

So it looks like for the past two weeks, that 36 hour db-wide vacuum has
been running every 24 hours. Good for it for being reasonably non-intrusive
and going unnoticed until now. :)

Although apparently not related anymore, I still think it was a good move to
change autovacuum_freeze_max_age from 200 million to 2 billion.

Steve


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Steven Flatt <steven(dot)flatt(at)gmail(dot)com>
Cc: Bill Moran <wmoran(at)collaborativefusion(dot)com>, Francisco Reyes <lists(at)stringsutils(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Database-wide VACUUM ANALYZE
Date: 2007-06-22 13:55:04
Message-ID: 20070622135504.GA8949@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Steven Flatt escribió:
> Thanks everyone. It appears that we had hacked the 502.pgsql script for our
> 8.1 build to disable the daily vacuum. I was not aware of this when
> building and upgrading to 8.2.
>
> So it looks like for the past two weeks, that 36 hour db-wide vacuum has
> been running every 24 hours. Good for it for being reasonably non-intrusive
> and going unnoticed until now. :)

Looks like you have plenty of spare I/O ;-)

> Although apparently not related anymore, I still think it was a good move to
> change autovacuum_freeze_max_age from 200 million to 2 billion.

Absolutely not related. Also note that

1. autovacuum is not able (in 8.2 or older) to have more than one task
running

2. autovacuum in 8.2 doesn't ever launch database-wide vacuums. As of
8.2 it only vacuums tables that are in actual danger of Xid
wraparound (according to pg_class.relfrozenxid); tables that had been
vacuumed the day before would not need another vacuum for Xid
purposes (though if you had modified the table to the point that it
needed another vacuum, that would be another matter). Unless you
consumed 200 million (or 2 billion) transactions during the day, that
is.

--
Alvaro Herrera http://www.amazon.com/gp/registry/5ZYLFMCVHXC
"The only difference is that Saddam would kill you on private, where the
Americans will kill you in public" (Mohammad Saleh, 39, a building contractor)


From: Jim Nasby <decibel(at)decibel(dot)org>
To: Steven Flatt <steven(dot)flatt(at)gmail(dot)com>
Cc: "Bill Moran" <wmoran(at)collaborativefusion(dot)com>, "Francisco Reyes" <lists(at)stringsutils(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Database-wide VACUUM ANALYZE
Date: 2007-06-26 00:00:20
Message-ID: FFD6DF28-B30D-47D0-BFB5-6485B32BF8AC@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Jun 21, 2007, at 3:37 PM, Steven Flatt wrote:
> Thanks everyone. It appears that we had hacked the 502.pgsql
> script for our 8.1 build to disable the daily vacuum. I was not
> aware of this when building and upgrading to 8.2.

Much better to change stuff in a config file than to hack installed
scripts, for this very reason. :)

> So it looks like for the past two weeks, that 36 hour db-wide
> vacuum has been running every 24 hours. Good for it for being
> reasonably non-intrusive and going unnoticed until now. :)
>
> Although apparently not related anymore, I still think it was a
> good move to change autovacuum_freeze_max_age from 200 million to 2
> billion.

If you set that to 2B, that means you're 2^31-"2 billion"-1000000
transactions away from a shutdown when autovac finally gets around to
trying to run a wraparound vacuum on a table. If you have any number
of large tables, that could be a big problem, as autovac could get
tied up on a large table for a long enough period that the table
needing to be frozen doesn't get frozen in time.

I suspect 1B is a much better setting. I probably wouldn't go past 1.5B.
--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: "Steven Flatt" <steven(dot)flatt(at)gmail(dot)com>
To: "Jim Nasby" <decibel(at)decibel(dot)org>
Cc: "Bill Moran" <wmoran(at)collaborativefusion(dot)com>, "Francisco Reyes" <lists(at)stringsutils(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Database-wide VACUUM ANALYZE
Date: 2007-06-26 17:25:44
Message-ID: 357fa7590706261025l5b9cf016p7c61505643aabb9c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On 6/25/07, Jim Nasby <decibel(at)decibel(dot)org> wrote:
>
> If you set that to 2B, that means you're 2^31-"2 billion"-1000000
> transactions away from a shutdown when autovac finally gets around to
> trying to run a wraparound vacuum on a table. If you have any number
> of large tables, that could be a big problem, as autovac could get
> tied up on a large table for a long enough period that the table
> needing to be frozen doesn't get frozen in time.
>
> I suspect 1B is a much better setting. I probably wouldn't go past 1.5B.

From my understanding of the docs, for tables that are not otherwise
vacuumed, autovac will be invoked on it once every autovacuum_freeze_max_age
minus vacuum_freeze_min_age transactions. In our case that's 2 billion -
100 million = 1.9 billion transactions. So when an autovac finally kicks
off on an otherwise non-vacuumed table, we are (2^31 - 1.9 billion) - 1
million =~ 250 million transactions away from shutdown. (I guess that's
close to what you were saying.)

Most of our large (partitioned) tables are insert-only (truncated
eventually) so will not be touched by autovacuum until wraparound prevention
kicks in. However the tables are partitioned by timestamp so tables will
cross the 1.9 billion marker at different times (some not at all, as the
data will have been truncated).

Do you still think the 250 million transactions away from shutdown is
cutting it too close? Recall that the unintentional db-wide vacuum analyze
that was going on last week on our system took less than two days to
complete.

Steve


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Steven Flatt <steven(dot)flatt(at)gmail(dot)com>
Cc: Jim Nasby <decibel(at)decibel(dot)org>, Bill Moran <wmoran(at)collaborativefusion(dot)com>, Francisco Reyes <lists(at)stringsutils(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Database-wide VACUUM ANALYZE
Date: 2007-06-26 18:49:06
Message-ID: 20070626184906.GE11609@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Steven Flatt escribió:

> Most of our large (partitioned) tables are insert-only (truncated
> eventually) so will not be touched by autovacuum until wraparound prevention
> kicks in. However the tables are partitioned by timestamp so tables will
> cross the 1.9 billion marker at different times (some not at all, as the
> data will have been truncated).

Note that as of 8.3, tables that are truncated do not need vacuuming for
Xid wraparound purposes, because the counter is updated on TRUNCATE (as
it is on CLUSTER and certain forms of ALTER TABLE).

> Do you still think the 250 million transactions away from shutdown is
> cutting it too close? Recall that the unintentional db-wide vacuum analyze
> that was going on last week on our system took less than two days to
> complete.

Is this 8.1 or 8.2? In the latter you don't ever need db-wide vacuums
at all, because Xid wraparound is tracked per table, so only tables
actually needing vacuum are processed. To answer your question, the
followup question is how many transactions normally take place in two
days. If they are way less than 250 million then you don't need to
worry. Otherwise, the database may shut itself down to protect from Xid
wraparound.

--
Alvaro Herrera http://www.flickr.com/photos/alvherre/
"La soledad es compañía"