Re: CLUSTER and MVCC

From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, nagy(at)ecircle-ag(dot)com
Subject: Re: CLUSTER and MVCC
Date: 2007-03-15 15:14:59
Message-ID: 45F962F3.2020804@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> I'm thinking of keeping an in-memory mapping of old and new tids of
>> updated tuples while clustering, instead. That means that cluster
>> requires a little bit of memory for each RECENTLY_DEAD updated tuple. In
>> the worst case that means that you run out of memory if there's too many
>> of those in the table, but I doubt that's going to be a problem in practice.
>
> That is more or less isomorphic to what VACUUM FULL does. While people
> have complained about VACUUM FULL's memory usage on occasion, just at
> the moment I feel that the main problem with it is complexity. If we
> still haven't gotten all the bugs out of VACUUM FULL after more than
> eight years of work on it, what are the odds that we can make CLUSTER
> do it right the first time?

Well, I can't guarantee that there's no bugs.

To copy a chain correctly, we need to correctly detect tuples that have
a t_ctid pointing to a non-dead tuple (non-dead meaning
HeapTupleSatisfiesVacuum(tuple) != DEAD), and tuples that are being
pointed to by a non-dead tuple. If we incorrectly detect that a tuple
belongs to either of those categories, when in fact it doesn't, we don't
corrupt anything, but we waste a little bit of memory memorizing the
tuple unnecessarily.

To detect tuples in the first category, we need to check that xmax of
the tuple isn't invalid, and t_ctid doesn't point to itself.

To detect tuples in the second category, we need to check that xmin
isn't invalid, and is greater than OldestXmin.

With both categories correctly identified, it's just a matter of mapping
old ctids to corresponding tids in the new heap.

Unlike in my first proposal, if something nevertheless goes wrong in
detecting the chains, we only lose the chaining between the tuples, but
we don't otherwise lose any data. The latest version of each row is fine
anyway. I think this approach is pretty robust, and it fails in a good way.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Joshua D. Drake 2007-03-15 15:30:53 Re: [PATCHES] Bitmapscan changes
Previous Message Tom Lane 2007-03-15 14:55:11 Re: CLUSTER and MVCC