Block B-Tree concept

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Block B-Tree concept
Date: 2006-09-26 10:16:54
Message-ID: 4518FE16.1020507@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I've been experimenting with the idea of a so-called Block B-Tree. The
basic idea is that instead of storing an index tuple for each heap
tuple, we store an index tuple for each heap block. This dramatically
reduces the size of an index, leading to savings on I/O. This idea was
briefly discussed in January:
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00565.php

To make it actually work, the semantics of the B-Tree has been modified
so that every index tuple represents 1 or more heap tuples that fall
within some range of values, and are on the same heap page. The range
that an index tuple represents is from X, inclusive, to Y, exclusive,
where X is the key of the index tuple and Y is the key of the *next*
index tuple in the index. If the heap is in index order (as after
CLUSTER), we get a very compact index this way, effectively eliminating
the leaf level of the B-tree.

To locate the actual matching items on the heap page, we have to scan
the heap page because we don't have the item ids, so this is a tradeoff
between CPU and I/O. However, we could have a hybrid where we initially
store heap tuple pointers like we do now, and when there's more than X
consecutive pointers to the same page, we collapse them to just one
pointer to the whole page. This would potentially give us the best of
both worlds.

This design is more flexible and less invasive than true
index-organized-tables, because it doesn't require perfect ordering of
the heap or moving heap tuples around. You can also define than one
Block B-Tree on a table, though you wouldn't get much benefit from using
Block B-Tree instead of normal B-Tree if there's no correlation between
the index order and the heap order.

It also fits in nicely with the bitmap scans, since there's already
support for "lossy" bitmap pages. Doing normal ordered index scans
requires some coding, but can be done.

Thoughts? I'm thinking of getting this in early in the 8.3 cycle. We'll
have to take a look at the indexam API to support both bitmap indexes
and block B-trees nicely.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2006-09-26 10:31:08 Re: Phantom Command ID
Previous Message Dave Page 2006-09-26 09:51:53 Re: Buildfarm alarms