Real-Time Vacuum Possibility

Lists: pgsql-hackers
From: Rod Taylor <pg(at)rbt(dot)ca>
To: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Real-Time Vacuum Possibility
Date: 2005-03-16 03:57:49
Message-ID: 1110945469.1132.215.camel@home

This is a much-discussed topic, which always boils down to the cost of
grabbing index pages from disk.

At the moment the bgwriter periodically flushes parts of the dirty
buffer pool to disk. Would it be possible to have the bgwriter inspect
the pages it holds and see whether it can do any vacuum work on the
pages it is about to write out?

That is, quickly check the table pages, prepare a list of tuples to be
cleaned, then look at the other pages it is preparing to write and see
if any of them contain the index tuples pointing at those table pages?

When an update occurs that does not change the key values (the primary
key, say, stays the same), the table page and the index pages all become
dirty at once, so I would expect the bgwriter, even without any special
knowledge in its algorithm, to push them all to disk within the same
batch most of the time.
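A rough sketch of the idea, in Python rather than PostgreSQL's C, with all names and the page representation invented for illustration: before a batch of dirty buffers goes to disk, collect the dead tuple IDs from the heap pages and see which index pages in the same batch reference them.

```python
# Hypothetical sketch (not actual bgwriter code): match dead heap tuples
# against index pages that happen to be in the same dirty batch.

def plan_batch_cleanup(dirty_batch):
    """dirty_batch: list of page dicts with 'kind' ('heap' | 'index'),
    'dead_tids' for heap pages, and 'entries' (tid -> key) for index pages."""
    dead = set()
    for page in dirty_batch:
        if page["kind"] == "heap":
            dead.update(page["dead_tids"])
    cleanable = []
    for page in dirty_batch:
        if page["kind"] == "index":
            # Index entries whose heap tuples are dead *and* whose index
            # page is already dirty in this batch can be removed cheaply,
            # since the page is being written out anyway.
            hits = [tid for tid in page["entries"] if tid in dead]
            if hits:
                cleanable.append((page["page_no"], hits))
    return cleanable

batch = [
    {"kind": "heap", "page_no": 10, "dead_tids": {(10, 3), (10, 7)}},
    {"kind": "index", "page_no": 55, "entries": {(10, 3): "k1", (12, 1): "k2"}},
]
print(plan_batch_cleanup(batch))  # [(55, [(10, 3)])]
```

Dead tuple (10, 7) stays untouched here because no index page in the batch references it, which is exactly the limitation the proposal acknowledges.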

It's a fairly limited case and by no means removes the requirement for
regular vacuums, but for an update-heavy structure perhaps it would be
worthwhile? Even if it could only keep indexes clean it would help.

Just a thought..

--


From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Real-Time Vacuum Possibility
Date: 2005-03-16 13:59:09
Message-ID: m3y8cnr8sy.fsf@knuth.knuth.cbbrowne.com

pg(at)rbt(dot)ca (Rod Taylor) wrote:
> It's a fairly limited case and by no means removes the requirement for
> regular vacuums, but for an update heavy structure perhaps it would be
> worth while? Even if it could only keep indexes clean it would help.

The problem that persists with this is that it throws in extra
processing at the time that the system is the _most_ busy doing
updates, thereby worsening latency at times when the system may
already be reeling under the load.

I think, as a result, that VACUUM will _have_ to be done
asynchronously.

What strikes me as being a useful approach would be to set up an
LRU-ordered (or perhaps unordered) queue of pages that have had tuples
"killed off" by DELETE or UPDATE.

Thus, a DELETE/UPDATE would add the page the tuple is on to the list.

"VACUUM RECENT CHANGES" (or something of the sort) could walk through
just those pages. Cleaning up indexes would require some further
reads, but that's a given.
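The queue described above might be sketched as follows; this is a toy illustration in Python, and the class and method names are invented, not anything in PostgreSQL. DELETE/UPDATE records the page a killed tuple lives on, and a targeted pass drains just those pages.

```python
# Hypothetical sketch of an LRU-ordered queue of pages holding killed
# tuples, drained by a "VACUUM RECENT CHANGES"-style targeted pass.

from collections import OrderedDict

class DeadPageQueue:
    def __init__(self):
        self.pages = OrderedDict()  # page_no -> set of dead line pointers

    def tuple_killed(self, page_no, line_ptr):
        # Called on DELETE, or for the old version after an UPDATE.
        self.pages.setdefault(page_no, set()).add(line_ptr)
        self.pages.move_to_end(page_no)  # keep rough LRU order

    def vacuum_recent(self):
        # Drain the queue: these are the only pages a targeted vacuum
        # pass would visit, instead of scanning the whole table.
        work, self.pages = list(self.pages.items()), OrderedDict()
        return work

q = DeadPageQueue()
q.tuple_killed(3, 1)
q.tuple_killed(9, 4)
q.tuple_killed(3, 2)
print(q.vacuum_recent())  # [(9, {4}), (3, {1, 2})]
```

As the text notes, cleaning the corresponding index entries would still require extra reads; the queue only bounds the heap-page side of the work.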

This "architecture" would serve tables which are large but have small,
heavily updated portions far better than the way vacuum works at
present.
--
(format nil "~S(at)~S" "cbbrowne" "gmail.com")
http://linuxdatabases.info/info/lisp.html
Rules of the Evil Overlord #129. "Despite the delicious irony, I will
not force two heroes to fight each other in the arena."
<http://www.eviloverlord.com/>


From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Christopher Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Real-Time Vacuum Possibility
Date: 2005-03-21 08:04:35
Message-ID: 20050321080435.GK51784@decibel.org

On Thu, Mar 17, 2005 at 05:22:41AM +0000, Christopher Browne wrote:
> What strikes me as being a useful approach would be to set up an
> LRU-ordered (or perhaps unordered) queue of pages that have had tuples
> "killed off" by DELETE or UPDATE.

From http://www.postgresql.org/docs/faqs.TODO.html :

* Maintain a map of recently-expired rows

This allows vacuum to reclaim free space without requiring
a sequential scan.
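The TODO item could take the form of a per-table bitmap over blocks; the following is a speculative sketch (the class and its methods are invented for illustration, not the eventual PostgreSQL design). Marking a bit when a row on a block expires lets vacuum visit only the marked blocks rather than scanning sequentially.

```python
# Hypothetical sketch of a "recently-expired rows" map: one bit per
# table block, set when a row on that block expires.

class ExpiredRowMap:
    def __init__(self, nblocks):
        self.bits = bytearray((nblocks + 7) // 8)

    def mark_expired(self, block):
        # Called when a row on this block is deleted or superseded.
        self.bits[block // 8] |= 1 << (block % 8)

    def blocks_to_vacuum(self, nblocks):
        # Vacuum consults the map and skips every unmarked block.
        return [b for b in range(nblocks)
                if self.bits[b // 8] & (1 << (b % 8))]

m = ExpiredRowMap(1000)
m.mark_expired(17)
m.mark_expired(512)
print(m.blocks_to_vacuum(1000))  # [17, 512]
```

At one bit per block the map stays small (a 1 GB table with 8 kB blocks needs only 16 kB), which is what makes keeping it in memory plausible.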
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"