Autovacuum to-do list

Lists: pgsql-hackers
From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Autovacuum to-do list
Date: 2005-07-28 00:22:54
Message-ID: 20050728002254.GS1832@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hackers,

This is a list of things people mentioned as interesting to do for
vacuum/autovacuum, during the last autovacuum discussion thread. Some
of them are wishful thinking, others are doable.

Neil, Gavin: both of you mentioned that you didn't like autovacuum as a
long term solution. May I ask you if you have more suggestions for this
list?

* Enable autovacuum by default.

Get some field experience with it first, so the worst bugs are covered.
(Has anybody tested it?)

* Add per-table vacuum delay parameters.

* Integrate the FSM with autovacuum.

Maybe invent the concept of a "dead space map". This could be a bitmap per
heap segment, where we keep a bit set for each page in the segment that has
dead tuples.

* Make the decision taking about what database to vacuum be smarter.

Right now, we only consider what database was least recently vacuumed.
We could have a per-database counter of dead tuples in pgstats; we have
to figure out a way to use that and not cause starvation for less-used
databases.

* Make XID wraparound issues be determined on a per-table basis.

* Implement some sort of maintenance window where vacuum policy would be
more aggresive. Maybe the reverse: specify some hours at which vacuum
should not run at all.

One alternative for this is to suggest changing autovacuum parameters from
a script to be run by cron or pgAgent.

* Implement partial vacuums.

A partial vacuum would scan only a portion of the table looking for dead
tuples, then stop. Or maybe not partial, but instead vacuum a portion,
then stop, release locks, sleep for a while, then continue with the rest.

* Have autovacuum daemons per-database or per-tablespace.

* Use REINDEX instead of updating the index for each tuple, if the dead
tuple count is too high.

* Stop a running VACUUM if the system load is too high.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"La verdad no siempre es bonita, pero el hambre de ella sí"


From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum to-do list
Date: 2005-07-28 01:42:22
Message-ID: Pine.LNX.4.58.0507281034240.32271@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 27 Jul 2005, Alvaro Herrera wrote:

> Hackers,
>
> This is a list of things people mentioned as interesting to do for
> vacuum/autovacuum, during the last autovacuum discussion thread. Some
> of them are wishful thinking, others are doable.
>
> Neil, Gavin: both of you mentioned that you didn't like autovacuum as a
> long term solution. May I ask you if you have more suggestions for this
> list?

My argument against autovacuum integration was more about the fact that we
shouldn't automate vacuuming in the backend until the cost of vacuum
reduces significantly. The fact is that it has been integrated so I think
the following might be worth considering:

* Change how we track the need to vacuum

I don't really think the use of row level stats are the best
indicator for a need to vacuum. I wonder if we could keep a
counter for each segment (ie, every 1 GB file backing a heap or
index) or file which indicates the number of blocks which have been
dirtied since the last vacuum. The user would specify a threshold
of X blocks and once we reached that we would invoke vacuum.

This means we can do without row level stats, I think, at the
expense of having to maintain state in shared memory for each
table and index (or segment). I'm not sure if that is a good
trade off or not but it reminds me of counting cards in blackjack
-- and good things always come of that ;-).

That's all I can think of at the moment.

Thanks,

Gavin


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum to-do list
Date: 2005-07-28 02:51:12
Message-ID: 20050728025112.GW1832@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 28, 2005 at 11:42:22AM +1000, Gavin Sherry wrote:
> On Wed, 27 Jul 2005, Alvaro Herrera wrote:
>
> * Change how we track the need to vacuum
>
> I don't really think the use of row level stats are the best
> indicator for a need to vacuum. I wonder if we could keep a
> counter for each segment (ie, every 1 GB file backing a heap or
> index) or file which indicates the number of blocks which have been
> dirtied since the last vacuum. The user would specify a threshold
> of X blocks and once we reached that we would invoke vacuum.

Interesting approach. I don't think it's too difficult to implement.
We just need a new stat message indicating that block N was dirtied
because of new or deleted tuples (hint bit changes need not apply)

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"I love the Postgres community. It's all about doing things _properly_. :-)"
(David Garamond)


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Autovacuum to-do list
Date: 2005-08-01 21:15:33
Message-ID: 42EE90F5.7000209@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

All the items you mentioned look like 8.2 issues to me. But here are
some thoughts.

Alvaro Herrera wrote:

>* Enable autovacuum by default.
>
> Get some field experience with it first, so the worst bugs are covered.
> (Has anybody tested it?)
>
>

I have done some testing and it seems to be working ok. I am planning
on doing some more in depth tests this week.

>* Stop a running VACUUM if the system load is too high.
>

What if vacuum used a vacuum delay that was equal to the vacuum delay
GUC settings * the system load. Or something more sophisticated but
this would have the effect of having vacuum automatically throttle down
when the system gets busy and throttle back up when the system gets
quiet. We would probably set some min / max values but it sounds
interesting, thoughts?


From: Andrew - Supernews <andrew+nonews(at)supernews(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum to-do list
Date: 2005-08-01 22:34:21
Message-ID: slrndet8rd.bj3.andrew+nonews@trinity.supernews.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 2005-08-01, "Matthew T. O'Connor" <matthew(at)zeut(dot)net> wrote:
>>* Stop a running VACUUM if the system load is too high.
>
> What if vacuum used a vacuum delay that was equal to the vacuum delay
> GUC settings * the system load. Or something more sophisticated but
> this would have the effect of having vacuum automatically throttle down
> when the system gets busy and throttle back up when the system gets
> quiet. We would probably set some min / max values but it sounds
> interesting, thoughts?

I'd be very concerned about feedback loop stability; insufficient
vacuuming can increase the system load, causing vacuum to get further
behind...

--
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


From: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
To: andrew(at)supernews(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum to-do list
Date: 2005-08-02 02:22:14
Message-ID: 42EED8D6.6050000@zeut.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew - Supernews wrote:

>On 2005-08-01, "Matthew T. O'Connor" <matthew(at)zeut(dot)net> wrote:
>
>
>>>* Stop a running VACUUM if the system load is too high.
>>>
>>>
>>What if vacuum used a vacuum delay that was equal to the vacuum delay
>>GUC settings * the system load. Or something more sophisticated but
>>this would have the effect of having vacuum automatically throttle down
>>when the system gets busy and throttle back up when the system gets
>>quiet. We would probably set some min / max values but it sounds
>>interesting, thoughts?
>>
>>
>
>I'd be very concerned about feedback loop stability; insufficient
>vacuuming can increase the system load, causing vacuum to get further
>behind...
>

Right which is why we would need to enforce some max value so that
vacuuming will never be totally squeezed out.


From: Jeff MacDonald <jam(at)zoidtechnologies(dot)com>
To: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>
Cc: andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum to-do list
Date: 2005-08-02 02:35:00
Message-ID: 20050802023500.GL16160@zoidtechnologies.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 01, 2005 at 10:22:14PM -0400, Matthew T. O'Connor wrote:
[..snipped..]
>
> Right which is why we would need to enforce some max value so that
> vacuuming will never be totally squeezed out.
>

greetings,

I'm a linux guy, so please forgive my ignorance, but is it even possible to
determine load average on win32?

regards,
J


From: mark(at)mark(dot)mielke(dot)cc
To: Jeff MacDonald <jam(at)zoidtechnologies(dot)com>
Cc: "Matthew T(dot) O'Connor" <matthew(at)zeut(dot)net>, andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Autovacuum to-do list
Date: 2005-08-02 04:55:28
Message-ID: 20050802045528.GA10745@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 01, 2005 at 10:35:00PM -0400, Jeff MacDonald wrote:
> On Mon, Aug 01, 2005 at 10:22:14PM -0400, Matthew T. O'Connor wrote:
> [..snipped..]
> > Right which is why we would need to enforce some max value so that
> > vacuuming will never be totally squeezed out.
> I'm a linux guy, so please forgive my ignorance, but is it even possible to
> determine load average on win32?

Oooo... That's a lot of win32 ignorance to ignore... :-)

Push control-alt-delete and look under "Performance". I believe
Windows may even keep *more* information that Linux. It's all a
question of figuring out what the Win32 API calls are to get what you
are looking for. Most concepts found in one system are also found in
the other. I would assume yes before no.

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/