Re: autovacuum does not start in HEAD

Lists: pgsql-hackers pgsql-patches
From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: autovacuum does not start in HEAD
Date: 2007-04-25 08:27:15
Message-ID: 20070425164613.70BB.ITAGAKI.TAKAHIRO@oss.ntt.co.jp

I found that autovacuum launcher does not launch any workers in HEAD.

AFAICS, we track when each database is due to be vacuumed in the following way:

1. In rebuild_database_list(), we initialize avl_dbase->adl_next_worker
with (current_time + autovacuum_naptime / nDBs).
2. In do_start_worker(), we skip database entries whose adl_next_worker
is between current_time and current_time + autovacuum_naptime.
3. If there are no jobs in do_start_worker(), we call rebuild_database_list()
to rebuild the database entries.

The point is that we use the same range (current_time to current_time +
autovacuum_naptime) in both 1 and 2. We set adl_next_worker to values in
that range at 1, and then skip all of them at 2 because their values are
in the range. If there is no database to vacuum, we re-initialize the
database list at 3, and the cycle repeats.

Or am I missing something?

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum does not start in HEAD
Date: 2007-04-25 13:05:32
Message-ID: 20070425130532.GA4894@alvh.no-ip.org

ITAGAKI Takahiro wrote:
> I found that autovacuum launcher does not launch any workers in HEAD.
>
> AFAICS, we track when each database is due to be vacuumed in the following way:
>
> 1. In rebuild_database_list(), we initialize avl_dbase->adl_next_worker
> with (current_time + autovacuum_naptime / nDBs).
> 2. In do_start_worker(), we skip database entries whose adl_next_worker
> is between current_time and current_time + autovacuum_naptime.
> 3. If there are no jobs in do_start_worker(), we call rebuild_database_list()
> to rebuild the database entries.
>
> The point is that we use the same range (current_time to current_time +
> autovacuum_naptime) in both 1 and 2. We set adl_next_worker to values in
> that range at 1, and then skip all of them at 2 because their values are
> in the range. If there is no database to vacuum, we re-initialize the
> database list at 3, and the cycle repeats.
>
> Or am I missing something?

Note that rebuild_database_list skips databases that don't have stat
entries. Maybe that's what is confusing your examination. When the list
is empty, workers are launched only every naptime seconds, and even then
only databases with stat entries are picked. All other databases are
skipped until they reach autovacuum_freeze_max_age. Right after an initdb
or a WAL replay, all database stats are deleted.

The point of (1) is to spread the starting of workers in the
autovacuum_naptime interval.
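As a minimal sketch of point (1), assume (hypothetically) that the k-th entry of a rebuilt list of nDBs databases is scheduled k * autovacuum_naptime / nDBs after the rebuild time, generalizing the formula in step 1 of the original report. The names below (timestamp_us, spread_schedule) are illustrative only, not the launcher's actual code, and timestamps are modeled as int64 microseconds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: timestamps are int64 microseconds. */
typedef int64_t timestamp_us;

/*
 * Fill next_worker[0..ndbs-1] so that the ndbs start times are spread
 * evenly across the naptime interval following 'now'.  Entry k gets
 * now + (k + 1) * (naptime / ndbs); because of integer division the
 * last entry lands at, or a few microseconds before, now + naptime.
 */
void
spread_schedule(timestamp_us now, int naptime_secs, size_t ndbs,
                timestamp_us *next_worker)
{
    if (ndbs == 0)
        return;

    timestamp_us increment =
        (timestamp_us) naptime_secs * 1000000 / (timestamp_us) ndbs;

    for (size_t k = 0; k < ndbs; k++)
        next_worker[k] = now + (timestamp_us) (k + 1) * increment;
}
```

With autovacuum_naptime = 10 s and two databases, the entries land 5 s and 10 s after the rebuild, so one worker start falls due roughly every 5 s.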

The point of (2) is that we don't want to process a database that was
processed too recently (less than autovacuum_naptime seconds ago). This
is useful in cases where databases are dropped, so the launcher is
awakened earlier than the schedule would say if the dropped database
were not in the list. It is possible that I got the arithmetic wrong
there (TimestampDifference does not return negative results, so there may
be strange corner cases), but the last time I examined it, it was correct.

The point of (3) is to cover databases that were not previously being
autovacuumed but may now need vacuuming (e.g., just after a database gets
its stat entry).

The fact that some databases may not have stat entries tends to confuse
the logic, both in rebuild_database_list and do_start_worker. If it's
not documented well enough, maybe it needs extra clarification in code
comments.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-patches(at)postgresql(dot)org, alvherre(at)commandprompt(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] autovacuum does not start in HEAD
Date: 2007-04-26 01:26:38
Message-ID: 20070426102316.654D.ITAGAKI.TAKAHIRO@oss.ntt.co.jp

I wrote:
> I found that autovacuum launcher does not launch any workers in HEAD.

The attached autovacuum-fix.patch should fix the problem. I changed the
test that decides the next autovacuum target to use 'greater than or
equal' instead of 'greater than'.

The problem is timer resolution: on some platforms the timer has only
millisecond resolution. We initialize adl_next_worker from current_time in
rebuild_database_list(), but do_start_worker() can then read exactly the
same current_time value, because no measurable time has passed on such
low-resolution platforms.
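The effect can be illustrated with a hypothetical clock that truncates time to a fixed resolution. In this sketch (all names are illustrative, and times are assumed to be non-negative int64 microseconds), a schedule entry recorded at one instant and a clock reading taken a few hundred microseconds later collapse to the same millisecond value, so at that reading a strict 'greater than' test does not report the entry as due, while 'greater than or equal' does.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t timestamp_us;   /* assumed: non-negative microseconds */

/*
 * Hypothetical coarse clock: truncate a true time to the timer's
 * resolution (e.g. 1000 us for a millisecond timer).  Only valid for
 * non-negative inputs, since C's % truncates toward zero.
 */
timestamp_us
quantize(timestamp_us true_time, timestamp_us resolution_us)
{
    return true_time - true_time % resolution_us;
}

/* Strict test: the target time has strictly passed. */
bool
due_strict(timestamp_us now, timestamp_us target)
{
    return now > target;
}

/* Non-strict test: the target time has been reached. */
bool
due_inclusive(timestamp_us now, timestamp_us target)
{
    return now >= target;
}
```

For example, quantize(1000200, 1000) and quantize(1000500, 1000) both yield 1000000, so for that pair due_strict() returns false and due_inclusive() returns true.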

Another attached patch, autovacuum-debug.patch, only adds printf-style
debugging output. Without the fix I got the following logs; autovacuum never runs.

# SELECT oid, datname FROM pg_database ORDER BY oid;
oid | datname
-------+-----------
1 | template1
11494 | template0
11495 | postgres
16384 | bench
(4 rows)

# pgbench bench -s1 -c1 -t100000
[with autovacuum_naptime = 10s and log_min_messages = debug1]

LOG: do_start_worker skip : 230863399.250000, 230863399.250000, 230863409.250000
LOG: rebuild_database_list: db=11495, time=230863404.250000
LOG: rebuild_database_list: db=16384, time=230863409.250000
DEBUG: autovacuum: processing database "bench"
LOG: do_start_worker skip : 230863404.250000, 230863404.250000, 230863414.250000
LOG: do_start_worker skip : 230863404.250000, 230863409.250000, 230863414.250000
LOG: rebuild_database_list: db=11495, time=230863409.250000
LOG: rebuild_database_list: db=16384, time=230863414.250000
LOG: do_start_worker skip : 230863409.250000, 230863409.250000, 230863419.250000
LOG: do_start_worker skip : 230863409.250000, 230863414.250000, 230863419.250000
LOG: rebuild_database_list: db=11495, time=230863414.250000
LOG: rebuild_database_list: db=16384, time=230863419.250000
...
(no autovacuum activities forever)
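The cycle in the log can be reproduced with a deliberately simplified model of the launcher. This sketch is hypothetical and is not the actual launcher code. It assumes that the schedule is spread at naptime/nDBs intervals, that the launcher wakes exactly at the earliest scheduled time (so, with a coarse timer, current_time equals that entry's adl_next_worker), and, for simplicity, that the schedule is rebuilt from the current time after every wakeup. The skip window is either inclusive (next - naptime < now <= next) or strict (next - naptime < now < next).

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t timestamp_us;           /* microseconds */

#define USECS_PER_SEC INT64_C(1000000)
#define MAX_DBS 16

/* Skip window; 'inclusive' selects <= at the upper end instead of <. */
static bool
in_skip_window(timestamp_us now, timestamp_us next, timestamp_us naptime,
               bool inclusive)
{
    bool below_upper = inclusive ? (now <= next) : (now < next);

    return next - naptime < now && below_upper;
}

/* Spread ndbs entries evenly over the naptime following 'now'. */
static void
rebuild(timestamp_us now, int ndbs, timestamp_us naptime, timestamp_us *next)
{
    for (int k = 0; k < ndbs; k++)
        next[k] = now + (k + 1) * (naptime / ndbs);
}

/*
 * Count how many wakeups (out of 'cycles') find at least one database
 * outside the skip window, i.e. how many workers would be started.
 * Returns -1 for an unsupported ndbs.
 */
int
simulate_launches(bool inclusive, int ndbs, int naptime_secs, int cycles)
{
    timestamp_us naptime = naptime_secs * USECS_PER_SEC;
    timestamp_us next[MAX_DBS];
    int launched = 0;

    if (ndbs < 1 || ndbs > MAX_DBS)
        return -1;

    rebuild(0, ndbs, naptime, next);
    for (int c = 0; c < cycles; c++)
    {
        timestamp_us now = next[0];     /* coarse clock: wake exactly on time */

        for (int k = 0; k < ndbs; k++)
        {
            if (!in_skip_window(now, next[k], naptime, inclusive))
            {
                launched++;
                break;
            }
        }
        rebuild(now, ndbs, naptime, next);  /* simplification, see above */
    }
    return launched;
}
```

Under these assumptions the inclusive window launches nothing no matter how many cycles run, while the strict window launches on every wakeup.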

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
autovacuum-debug.patch application/octet-stream 1.4 KB
autovacuum-fix.patch application/octet-stream 685 bytes

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, alvherre(at)commandprompt(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] autovacuum does not start in HEAD
Date: 2007-04-27 02:57:37
Message-ID: 200704270257.l3R2vbh13624@momjian.us


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.


--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com



From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum does not start in HEAD
Date: 2007-05-02 01:57:44
Message-ID: 20070502015744.GI5867@alvh.no-ip.org

ITAGAKI Takahiro wrote:
> I wrote:
> > I found that autovacuum launcher does not launch any workers in HEAD.
>
> The attached autovacuum-fix.patch should fix the problem. I changed the
> test that decides the next autovacuum target to use 'greater than or
> equal' instead of 'greater than'.

I developed a different fix, made possible by the addition of
TimestampDifferenceExceeds to the TimestampTz API (thanks, Tom).

It continues to work for me here, but please confirm that it fixes the
bug you reported -- I don't have a low-resolution platform handy.
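For reference, TimestampDifferenceExceeds(start_time, stop_time, msec) reports whether stop_time - start_time is at least msec milliseconds. The sketch below is an illustrative stand-in (the name difference_exceeds is hypothetical), assuming integer timestamps in microseconds; it is not the PostgreSQL source. The detail that matters for this thread is the >=: with msec = 0 it returns true when the two timestamps are equal.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t timestamp_us;   /* assumed: integer microsecond timestamps */

/*
 * Illustrative stand-in: true when stop_time - start_time is at least
 * msec milliseconds.  With msec == 0 this is true whenever
 * stop_time >= start_time, including when the two are equal.
 */
bool
difference_exceeds(timestamp_us start_time, timestamp_us stop_time, int msec)
{
    timestamp_us diff = stop_time - start_time;

    return diff >= (timestamp_us) msec * 1000;
}
```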

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachment Content-Type Size
autovac-timestamp.patch text/x-diff 3.4 KB

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum does not start in HEAD
Date: 2007-05-02 18:44:25
Message-ID: 20070502184425.GC3766@alvh.no-ip.org

ITAGAKI Takahiro wrote:
> I wrote:
> > I found that autovacuum launcher does not launch any workers in HEAD.
>
> The attached autovacuum-fix.patch should fix the problem. I changed the
> test that decides the next autovacuum target to use 'greater than or
> equal' instead of 'greater than'.

I have committed a patch in autovacuum.c rev 1.44 which might fix this
issue. Please retest.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum does not start in HEAD
Date: 2007-05-07 04:43:05
Message-ID: 20070507125824.8850.ITAGAKI.TAKAHIRO@oss.ntt.co.jp

Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> ITAGAKI Takahiro wrote:
> > > I found that autovacuum launcher does not launch any workers in HEAD.
> >
> > The attached autovacuum-fix.patch should fix the problem. I changed the
> > test that decides the next autovacuum target to use 'greater than or
> > equal' instead of 'greater than'.
>
> I have committed a patch in autovacuum.c rev 1.44 which might fix this
> issue. Please retest.

HEAD (r1.45) is still broken. We skip entries using the test
adl_next_worker - autovacuum_naptime < current_time <= adl_next_worker,
but the second inequality should be strict:
adl_next_worker - autovacuum_naptime < current_time < adl_next_worker,
because adl_next_worker can equal current_time.

@@ -1036,8 +1036,8 @@
* Skip this database if its next_worker value falls between
* the current time and the current time plus naptime.
*/
- if (TimestampDifferenceExceeds(current_time,
- dbp->adl_next_worker, 0) &&
+ if (!TimestampDifferenceExceeds(dbp->adl_next_worker,
+ current_time, 0) &&
!TimestampDifferenceExceeds(current_time,
dbp->adl_next_worker,
autovacuum_naptime * 1000))
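The two versions of the skip test can be checked side by side. In this sketch, difference_exceeds is a hypothetical stand-in with the same >= semantics as TimestampDifferenceExceeds, timestamps are assumed to be int64 microseconds, and the argument order mirrors the diff.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t timestamp_us;   /* assumed: integer microsecond timestamps */

/* Hypothetical stand-in: true when stop - start >= msec milliseconds. */
static bool
difference_exceeds(timestamp_us start, timestamp_us stop, int msec)
{
    return stop - start >= (timestamp_us) msec * 1000;
}

/* r1.45 test: skips when next - naptime < now <= next. */
bool
skip_r145(timestamp_us now, timestamp_us next, int naptime_secs)
{
    return difference_exceeds(now, next, 0) &&
           !difference_exceeds(now, next, naptime_secs * 1000);
}

/* Proposed test: skips when next - naptime < now < next. */
bool
skip_proposed(timestamp_us now, timestamp_us next, int naptime_secs)
{
    return !difference_exceeds(next, now, 0) &&
           !difference_exceeds(now, next, naptime_secs * 1000);
}
```

When current_time equals adl_next_worker, skip_r145() returns true and the database is passed over, while skip_proposed() returns false and lets it through; for every other pair of times the two tests agree.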

By the way, why do we need the upper bound to decide the next target?
Can we simplify it to "current_time < adl_next_worker"?

@@ -1033,16 +1033,11 @@
if (dbp->adl_datid == tmp->adw_datid)
{
/*
- * Skip this database if its next_worker value falls between
- * the current time and the current time plus naptime.
+ * Skip this database if its next_worker value is later than
+ * the current time.
*/
- if (TimestampDifferenceExceeds(current_time,
- dbp->adl_next_worker, 0) &&
- !TimestampDifferenceExceeds(current_time,
- dbp->adl_next_worker,
- autovacuum_naptime * 1000))
- skipit = true;
-
+ skipit = !TimestampDifferenceExceeds(dbp->adl_next_worker,
+ current_time, 0);
break;
}
elem = DLGetPred(elem);

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: autovacuum does not start in HEAD
Date: 2007-05-07 19:26:05
Message-ID: 20070507192605.GP3939@alvh.no-ip.org

ITAGAKI Takahiro wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
>
> > ITAGAKI Takahiro wrote:
> > > > I found that autovacuum launcher does not launch any workers in HEAD.
> > >
> > > The attached autovacuum-fix.patch should fix the problem. I changed the
> > > test that decides the next autovacuum target to use 'greater than or
> > > equal' instead of 'greater than'.
> >
> > I have committed a patch in autovacuum.c rev 1.44 which might fix this
> > issue. Please retest.
>
> HEAD (r1.45) is still broken. We skip entries using the test
> adl_next_worker - autovacuum_naptime < current_time <= adl_next_worker,
> but the second inequality should be strict:
> adl_next_worker - autovacuum_naptime < current_time < adl_next_worker,
> because adl_next_worker can equal current_time.

Ok, I'll change this.

> By the way, why do we need the upper bound to decide the next target?
> Can we simplify it to "current_time < adl_next_worker"?

No, we can't take that check out, because otherwise a database could be
skipped forever if it happens to fall behind for some reason (for
example when a new database is created and autovac decides to work on
that one instead of the one that was scheduled).
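As a sketch of what the upper bound changes, the two candidate tests can be compared directly (hypothetical function names, int64 microsecond timestamps assumed). They differ only when adl_next_worker lies autovacuum_naptime or more in the future: the windowed test then stops skipping the database, whereas the simplified test keeps skipping it until the clock reaches adl_next_worker.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int64_t timestamp_us;   /* assumed: integer microsecond timestamps */

/* Windowed test: skip when next - naptime < now < next. */
bool
skip_windowed(timestamp_us now, timestamp_us next, timestamp_us naptime)
{
    return next - naptime < now && now < next;
}

/* Simplified test proposed in the thread: skip when now < next. */
bool
skip_simplified(timestamp_us now, timestamp_us next)
{
    return now < next;
}
```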

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support