Dead Space Map for vacuum

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Dead Space Map for vacuum
Date: 2006-12-28 05:50:07
Message-ID: 20061228142342.5FBD.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

NTT staffs are working on TODO item:
| Create a bitmap of pages that need vacuuming

We call the bitmap "Dead Space Map" (DSM), that allows VACUUM to scan
only pages that need vacuuming or freezing. We'd like to discuss the
design on hackers and make agreements with community.

We implemented the basic parts of it and measured the performance.
As expected, VACUUM took shorter time when the fraction of updates are low.
DSM is useful for large but not so heavily-updated databases. The overhead
of managing DSM seemed to be negligible small in our CPU-loaded tests.

There are a lot of choices in implementation. We followed the descriptions
in TODO list and past discussions in some parts, but did not in other parts
for some reasons. I would appreciate your comments and suggestions.
Also, I'll post a patch to PATCHES. Evaluation reports are welcome.

[Overview of Dead Space Map]

Dead Space Map:
DSM consists of two hash table (chunk hash and relation hash) and
one pool of bitmap chunks. DSM stuff is allocated in shared memory,
but not in shared buffers nor in heaps, like FSM (Free Space Map).

DSM Chunk:
Heaps are managed in 8192 of pages. The unit is called DSM Chunk.
Each chunk has a bitmap that tracks where to vacuum using 1 bit per page.
Pages that potentially contain dead or unfrozen tuples are represented '1'
in the bitmap. VACUUM reads only pages that need vacuuming using the bitmap.
It works well when the fraction of updates are not so high; If there are
too many updates, dead tuples are spread all over the heaps. VACUUM has to
scan almost pages with or without DSM in such cases.

DSM Relation:
There are three status in relation; 'fully tracked', 'partially tracked'
and 'untracked'. Unregistered chunks are treated differently between the
status. For full tracked relation, these chunks are regarded as filled
with '0' (i.e, no needs to vacuum) but with '1' for other status. We can
handle some particular cases using those status; newly created tables,
accidental lost of DSM, out of DSM memory, etc.

[Association with past discussions]

| Instead of sequentially scanning the entire table, have the background writer
| or some other process record pages that have expired rows, then VACUUM can
| look at just those pages rather than the entire table.

Recorders of DSM are backends themselves, not the background writer. Each
backend locks DSM and records the pages whenever they modify tuples in them.
We decided to record DSM before updating heap pages because of keeping sight
of all pages that requires vacuum. If a backend crashs after updating tuples
but before recording to DSM, we may miss the page.

| In the event of a system crash, the bitmap would probably be invalidated.

We bought it for simplicity. Currently, all DSM are lost after crash.
All tables will be untracked status. Full-scanned vacuum is needed
after the lost of DSM.

| One complexity is that index entries still have to be vacuumed, and doing
| this without an index scan (by using the heap values to find the index entry)
| might be slow and unreliable, especially for user-defined index functions.

Indexes are still needed to be full-scanned at the present moment. We are
also researching a retail index vacuum method, but it requires more works.

| http://archives.postgresql.org/pgsql-hackers/2004-03/msg00957.php
| Maintain 2 bits per block that tell if the block has been vaccumed of all
| dead tuples since the last time it was dirtied, and if all its tuples are
| completely frozen.

We use 1 bit per block, so we cannot separate pages that need either
vacuum or freeze only. The reason is that we cannot determine where to
record before/after updated tuples. If the transaction is commited,
before-version should be recorded into needs-vacuum bitmap and
after-version into needs-freeze bitmap. But on rollback, it is inverted.
We cannot judge which we should do until the end of transaction.

| [TODO item] Allow data to be pulled directly from indexes
| Another idea is to maintain a bitmap of heap pages where all rows are
| visible to all backends, and allow index lookups to reference that bitmap
| to avoid heap lookups

It is not done yet, but we can use DSM for this purpose. If the corresponding
bit in DSM is '0', all tuples in the page are frozen and visible to all
backends. We don't have to look up frozen pages only for visibiliby checking.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Galy Lee 2006-12-28 07:09:18 Deadline-Based Vacuum Delay
Previous Message Michael Fuhr 2006-12-28 04:40:54 Re: psql display of Unicode combining characters in 8.2