Re: possible vacuum improvement?

From: Barry Lind <barry(at)xythos(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: shridhar_daithankar(at)persistent(dot)co(dot)in, pgsql-hackers(at)postgresql(dot)org
Subject: Re: possible vacuum improvement?
Date: 2002-09-03 18:15:32
Message-ID: 3D74FC44.9010909@xythos.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Wouldn't it make sense to implement autovacuum information in a struture
like the FSM, a Dirty Space Map (DSM)? As blocks are dirtied by
transactions they can be added to the DSM. Then vacuum can give
priority processing to those blocks only. The reason I suggest this is
that in many usage senerios it will be more efficient to only vacuum
part of a table than the entire table. Given a large table that grows
over time, it tends to be the case that older data in the table becomes
more static as it ages (a lot of financial data is like this, when it is
initially created it may get a lot of updates done early in it's life
and may even be deleted, but once the data gets older (for example a
year old), it is unlikely to change). This would imply that over time
the first blocks in a table will change less and most activity will
occur towards the end of the table. If you have a multigig table, where
most of the activity occurs near the end, a lot of cpu cycles can be
wasted going over the mostly static begining of the table.

thanks,
--Barry

Tom Lane wrote:

>"Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in> writes:
>
>
>>1)Is this sounds like a workable solution?
>>
>>
>
>Adding a trigger to every tuple update won't do at all. Storing the
>counts in a table won't do either, as the updates on that table will
>generate a huge amount of wasted space themselves (not to mention
>enough contention to destroy concurrent performance).
>
>
>
>>4)Is use of threads sounds portable enough?
>>
>>
>
>Threads are completely out of the question, at least if you have any
>hope of seeing this code get accepted into the core distro.
>
>
>For vacuum's purposes all that we really care to know about is the
>number of obsoleted tuples in each table: committed deletes and updates,
>and aborted inserts and updates all count. Furthermore, we do not need
>or want a 100% reliable solution; approximate counts would be plenty
>good enough.
>
>What I had in the back of my mind was: each backend counts attempted
>insertions and deletions in its relcache entries (an update adds to both
>counts). At transaction commit or abort, we know which of these two
>counts represents the number of dead tuples added to each relation, so
>while we scan the relcache for post-xact cleanup (which we will be doing
>anyway) we can transfer the correct count into the shared FSM entry for
>the relation. This gives us a reasonably accurate count in shared
>memory of all the tuple obsoletions since bootup, at least for
>heavily-used tables. (The FSM might choose to forget about lightly-used
>tables.) The auto vacuumer could look at the FSM numbers to decide
>which tables are highest priority to vacuum.
>
>This scheme would lose the count info on a database restart, but that
>doesn't bother me. In typical scenarios the same tables will soon get
>enough new counts to be highly ranked for vacuuming. In any case the
>auto vacuumer must be designed so that it vacuums every table every so
>often anyhow, so the possibility of forgetting that there were some dead
>tuples in a given table isn't catastrophic.
>
>I do not think we need or want a control table for this; certainly I see
>no need for per-table manual control over this process. There should
>probably be a few knobs in the form of GUC parameters so that the admin
>can control how much overall work the auto-vacuumer does. For instance
>you'd probably like to turn it off when under peak interactive load.
>
> regards, tom lane
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo(at)postgresql(dot)org
>
>
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2002-09-03 18:53:03 Re: CREATE CAST requires immutable cast function?
Previous Message Joe Conway 2002-09-03 18:14:02 Re: Memory management question