Re: vacuum, performance, and MVCC

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: vacuum, performance, and MVCC
Date: 2006-06-24 21:57:20
Message-ID: 87ejxeb4jj.fsf@wolfe.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Martha Stewart called it a Good Thing when JanWieck(at)Yahoo(dot)com (Jan Wieck) wrote:
> On 6/22/2006 2:37 PM, Alvaro Herrera wrote:
>
>> Adding back pgsql-hackers.
>> Mark Woodward wrote:
>>> > Mark Woodward wrote:
>>> >
>>> >> Hmm, OK, then the problem is more serious than I suspected.
>>> >> This means that every index on a row has to be updated on every
>>> >> transaction that modifies that row. Is that correct?
>>> >
>>> > Add an index entry, yes.
>>> >
>>> >> I am attaching some code that shows the problem with regard to
>>> >> applications such as web server session management, when run, each
>>> >> second
>>> >> the system can handle fewer and fewer connections. Here is a brief
>>> >> output:
>>> >> [...]
>>> >> There has to be a more linear way of handling this scenario.
>>> >
>>> > So vacuum the table often.
>>> That fixes the symptom, not the problem. The problem is performance
>>> steadily degrades over time.
>> No, you got it backwards. The performance degradation is the
>> symptom.
>> The problem is that there are too many dead tuples in the table. There
>> is one way to solve that problem -- remove them, which is done by
>> running vacuum.
>
> Precisely.
>
>> There are some problems with vacuum itself, that I agree with. For
>> example it would be good if a long-running vacuum wouldn't affect a
>> vacuum running in another table because of the long-running transaction
>> effect it has. It would be good if vacuum could be run partially over a
>> table. It would be good if there was a way to speed up vacuum by using
>> a dead space map or something.
>
> It would be good if vacuum wouldn't waste time on blocks that don't
> have any possible work in them. Vacuum has two main purposes. A)
> remove dead rows and B) freeze xids. Once a block has zero deleted
> rows and all xids are frozen, there is nothing to do with this block
> and vacuum should skip it until a transaction updates that block.
>
> This requires 2 bits per block, which is 32K per 1G segment of a
> heap. Clearing the bits is done when the block is marked dirty. This
> way vacuum would not waste any time and IO on huge slow changing
> tables. That part, sequentially scanning huge tables that didn't
> change much is what keeps us from running vacuum every couple of
> seconds.

This is, in effect, the "VACUUM Space Map."

I see one unfortunate thing about that representation of it, namely
that it would in effect require that non-frozen pages be kept on the
VSM for potentially a long time.

Based on *present* VACUUM strategy, at least.

Would it not be the case, here, that any time a page could be
"frozen," it would have to be? In effect, we are always trying to run
VACUUM FREEZE?
--
output = ("cbbrowne" "@" "gmail.com")
http://cbbrowne.com/info/finances.html
Rules of the Evil Overlord #72. "If all the heroes are standing
together around a strange device and begin to taunt me, I will pull
out a conventional weapon instead of using my unstoppable superweapon
on them. <http://www.eviloverlord.com/>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2006-06-24 22:05:45 Re: Buffer for inner and outer table
Previous Message Daniel Xavier de Sousa 2006-06-24 21:41:11 Re: Buffer for inner and outer table