'TID index'

Lists: pgsql-hackers
From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: 'TID index'
Date: 2004-09-15 20:02:21
Message-ID: 20040915200221.GQ56059@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I just had a thought that could potentially greatly improve vacuum
performance. What about some kind of TID (or does vacuum use CID?)
index? This would allow vacuum to visit only the pages it needs to
visit. Actually, I guess TID/CID wouldn't even be involved; the only
information needed would be if any tuples on a page have been marked
deleted. Something as simple as a bitmap could work. Storing TID info
might provide added vacuum efficiency, but my guess is it's probably not
worth the extra effort.

This might not help much for tables that just see a lot of random update
activity, but I think it would be very useful for large tables where
pages with dead tuples are likely to be a small percentage of the total
number of pages.

Maintaining this information on a per-transaction basis might prove
difficult to do without causing concurrency issues. Luckily, I think
this could probably be done in the background without much difficulty.
One possibility is to check for dead tuples as pages are written to disk
(actually, by definition, there would have to be dead tuples at that
point I would think). If memory serves writing these pages is now a
background process, so this shouldn't cause contention issues.
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 'TID index'
Date: 2004-09-15 21:56:28
Message-ID: NOEFLCFHBPDAFHEIPGBOMEJCCEAA.simon@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> Jim C. Nasby wrote
> I just had a thought that could potentially greatly improve vacuum
> performance. What about some kind of TID (or does vacuum use CID?)
> index? This would allow vacuum to visit only the pages it needs to
> visit. Actually, I guess TID/CID wouldn't even be involved; the only
> information needed would be if any tuples on a page have been marked
> deleted. Something as simple as a bitmap could work. Storing TID info
> might provide added vacuum efficiency, but my guess is it's probably not
> worth the extra effort.
>
> This might not help much for tables that just see a lot of random update
> activity, but I think it would be very useful for large tables where
> pages with dead tuples are likely to be a small percentage of the total
> number of pages.
>

> Maintaining this information on a per-transaction basis might prove
> difficult to do without causing concurrency issues. Luckily, I think
> this could probably be done in the background without much difficulty.
> One possibility is to check for dead tuples as pages are written to disk
> (actually, by definition, there would have to be dead tuples at that
> point I would think). If memory serves writing these pages is now a
> background process, so this shouldn't cause contention issues.

There are many good ideas out there, yet it is almost impossible to find
somebody else to implement yours!

The acid test is to try and write it...

Overall, I agree VACUUM could do with some tuning - and 8.0 has just that.
It needs very careful thought to make sure both concurrency and
recoverability considerations are fully met in any solution you come up
with.

Best regards, Simon Riggs


From: "Jim C(dot) Nasby" <decibel(at)decibel(dot)org>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: 'TID index'
Date: 2004-09-16 01:26:19
Message-ID: 20040916012619.GX56059@decibel.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Sep 15, 2004 at 10:56:28PM +0100, Simon Riggs wrote:
> There are many good ideas out there, yet it is almost impossible to find
> somebody else to implement yours!
>
> The acid test is to try and write it...
>
> Overall, I agree VACUUM could do with some tuning - and 8.0 has just that.
> It needs very careful thought to make sure both concurrency and
> recoverability considerations are fully met in any solution you come up
> with.

Heh, I wasn't even thinking of implentation yet. :) I fully understand
the lack of developers.

Unfortunately, I have very little idea on the internals of PGSQL, and
I'm decidedly not a C coder. I *might* be able to get something hacked
up that stores info in a table (since that would mean all the space
management stuff would be handled for me).

If this is a worthwhile idea can we at least get a TODO? Would it be
useful to come up with a high-level design (something I could probably
do)?
--
Jim C. Nasby, Database Consultant decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"