Re: Second attempt, roll your own autovacuum

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Second attempt, roll your own autovacuum
Date: 2006-12-19 14:04:14
Message-ID: 87ac1kkmn5.fsf@wolfe.cbbrowne.com
Lists: pgsql-general pgsql-hackers

In an attempt to throw the authorities off his trail, tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane) transmitted:
> Glen Parker <glenebob(at)nwlink(dot)com> writes:
>> I am still trying to roll my own auto vacuum thingy.
>
> Um, is this purely for hack value? What is it that you find inadequate
> about regular autovacuum? It is configurable through the pg_autovacuum
> catalog --- which I'd be the first to agree is a sucky user interface,
> but we're not going to set the user interface in concrete until we are
> pretty confident it's feature-complete. So: what do you see missing?

I think that about a year ago I proposed a more sophisticated approach
to autovacuum; one part of it was to set up a "request queue," a table
where vacuum requests would get added.
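
For concreteness, a minimal sketch of what that queue table might look
like, driven from Python with psycopg2 (every name here is illustrative;
none of it comes from an actual patch or implementation):

import psycopg2  # assumed driver; any libpq binding would do

# Hypothetical schema for the request queue.
QUEUE_DDL = """
CREATE TABLE vacuum_request_queue (
    request_id   serial PRIMARY KEY,
    table_name   text        NOT NULL,              -- schema-qualified relation
    requested_at timestamptz NOT NULL DEFAULT now(),
    processed_at timestamptz                        -- NULL until a consumer finishes it
)
"""

def create_queue(dsn="dbname=postgres"):
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        cur.execute(QUEUE_DDL)
        conn.commit()
    finally:
        conn.close()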

There's some "producer" side stuff:

- There could be tables you want to vacuum exceedingly frequently;
those could get added periodically via something shaped like cron.

- One could ask for all the tables in a given database to be added to
the queue, so that every table gets vacuumed every so often.

- You might even inject requests 'quasi-manually', asking for the
queue to do work on particular tables.
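
Something like the following, run from cron, could serve as the injector
for the first two cases (the table list and connection string are made up
for illustration):

import psycopg2

# Hypothetical list of tables we want vacuumed exceedingly frequently.
HOT_TABLES = ["public.queue_hot", "public.session_state"]

def inject(dsn="dbname=postgres", everything=False):
    """Enqueue the hot list, or every ordinary user table in the database."""
    conn = psycopg2.connect(dsn)
    try:
        cur = conn.cursor()
        if everything:
            # All user tables, schema-qualified.
            cur.execute("""
                SELECT n.nspname || '.' || c.relname
                  FROM pg_class c
                  JOIN pg_namespace n ON n.oid = c.relnamespace
                 WHERE c.relkind = 'r'
                   AND n.nspname NOT IN ('pg_catalog', 'information_schema')
            """)
            names = [row[0] for row in cur.fetchall()]
        else:
            names = HOT_TABLES
        for name in names:
            cur.execute(
                "INSERT INTO vacuum_request_queue (table_name) VALUES (%s)",
                (name,),
            )
        conn.commit()
    finally:
        conn.close()

The 'quasi-manual' case is then nothing more than an INSERT into the
queue table from psql.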

There's some "policy side" stuff:

- Rules might be put in place to eliminate certain tables from the
queue, providing some intelligence as to what oughtn't get vacuumed.
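
One way to express such rules (again purely illustrative) is an exclusion
table that a policy pass applies against the queue before consumers run:

import psycopg2

# Hypothetical exclusion list: tables this machinery should never touch.
EXCLUSION_DDL = """
CREATE TABLE vacuum_exclusions (
    table_name text PRIMARY KEY
)
"""

def apply_policy(cur):
    # Drop pending requests that match the exclusion list.
    cur.execute("""
        DELETE FROM vacuum_request_queue
         WHERE processed_at IS NULL
           AND table_name IN (SELECT table_name FROM vacuum_exclusions)
    """)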

Then there's the "consumer" (a sketch of the simplest form follows below):

- The obvious "dumb" approach is simply to have one connection that
runs through the queue, pulling the eldest entry, vacuuming, and
marking it done.

- The obvious extension is that if a table is listed multiple times in
the queue, it need only be processed once.

- There might be time-based exclusions to the effect that large tables
oughtn't be processed during certain periods (backup time?)

- One might have *two* consumers, one that will only process small
tables, so that those little, frequently updated tables can get
handled quickly, and another consumer that does larger tables.
Or perhaps one that knows it's fine, between 04:00 and 09:00 UTC,
to run 6 consumers and blow through a lot of larger tables
simultaneously.

After all, changes in 8.2 mean that concurrent vacuums don't block
one another from cleaning out dead content.
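
A sketch of that simplest, "dumb" consumer, which also collapses duplicate
queue entries into a single pass over each table (hypothetical names
throughout; VACUUM has to run on an autocommit connection because it
cannot execute inside a transaction block):

import psycopg2

def run_consumer(dsn="dbname=postgres"):
    """Pull distinct pending tables, oldest request first, vacuum each once."""
    conn = psycopg2.connect(dsn)
    conn.autocommit = True  # VACUUM cannot run inside a transaction block
    cur = conn.cursor()

    # One row per distinct pending table, ordered by its oldest request.
    cur.execute("""
        SELECT table_name, min(requested_at) AS oldest
          FROM vacuum_request_queue
         WHERE processed_at IS NULL
         GROUP BY table_name
         ORDER BY oldest
    """)
    for table_name, _oldest in cur.fetchall():
        # A small-tables-only consumer could consult pg_class.relpages here
        # and skip anything too large, or sleep through a backup window.
        cur.execute("VACUUM ANALYZE %s" % table_name)  # identifiers can't be bound as parameters
        # Mark every queued request for this table done, collapsing duplicates.
        cur.execute(
            "UPDATE vacuum_request_queue SET processed_at = now() "
            "WHERE table_name = %s AND processed_at IS NULL",
            (table_name,),
        )
    conn.close()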

I went as far as scripting up the simplest form of this, with
"injector" and queue and the "dumb consumer." I gave up because it
wasn't that much better than what we already had.
--
output = reverse("moc.liamg" "@" "enworbbc")
http://linuxfinances.info/info/
Minds, like parachutes, only function when they are open.
