Re: Bug in VACUUM FULL ?

From: "Pavan Deolasee" <pavan(dot)deolasee(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Pavan Deolasee" <pavan(dot)deolasee(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bug in VACUUM FULL ?
Date: 2007-03-10 11:37:18
Message-ID: 2e78013d0703100337i62b64580oa890ea27a0306cbd@mail.gmail.com
Lists: pgsql-hackers

On 3/10/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

>
> I've been banging away on this since yesterday, and I think I've
> achieved a full understanding of what's going on. There are three or
> four different-looking pathologies but they all seem to arise from
> the same problem: the update-chain-moving code assumes that
> RECENTLY_DEAD tuples will never have update successors that are entirely
> DEAD (according to HeapTupleSatisfiesVacuum). When faced with an
> update chain in which that does happen, it can move the chain multiple
> times, neglect to remove index entries for tuples that get truncated
> away, crash on an Assert, or other interesting stuff. Here's an example
> from some debug printouts I inserted into repair_frag:
>
> It's not surprising that tuples could have xmax less than xmin, since
> transactions can commit in orders different than they start; when using
> READ COMMITTED updates it's not at all surprising that a transaction
> might update rows after a later-numbered transaction does. However, in
> looking at this code previously I'd assumed that the OldestXmin cutoff
> could never fall between two such transactions, and so the above
> scenario wouldn't happen. I'm not real sure why I thought that.
> For the cases that VACUUM FULL is interested in, both XIDs mentioned
> in a DEAD tuple must have committed before OldestXmin was computed, but
> there doesn't seem to be a compelling reason why OldestXmin might not
> have been determined by an unrelated third transaction with a number
> between those two.

Sorry for replying late; I have been away from mail since yesterday. I
arrived at the same conclusion yesterday while fixing this for HOT. The
problem was much more common with HOT under the heavy concurrent
UPDATE test that I was running. It was very surprising to me
as well. I had always believed that we can *never* have a chain like:

RD - RD - D - D - RD - L

(RD = RECENTLY_DEAD, D = DEAD, L = LIVE)

But that's not true. Thankfully, ISTM that this does not have any
implications for HOT, but I shall think more and confirm.
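
To make the failure concrete, here is a toy standalone sketch of the
classification (my own simplification; the real test lives in
HeapTupleSatisfiesVacuum and uses wrap-aware XID comparison rather
than a plain integer compare):

    /*
     * Toy model: a tuple whose committed xmax precedes OldestXmin is
     * DEAD; one whose xmax does not is only RECENTLY_DEAD.
     */
    #include <stdio.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;

    static const char *
    classify(TransactionId xmax, TransactionId oldest_xmin)
    {
        return (xmax < oldest_xmin) ? "DEAD" : "RECENTLY_DEAD";
    }

    int
    main(void)
    {
        /*
         * Xact 100 starts, xact 102 starts later but updates first;
         * under READ COMMITTED xact 100 then updates the new version,
         * and an unrelated backend computes OldestXmin = 101.
         *
         * Chain: A (xmax = 102) -> B (xmin = 102, xmax = 100) -> C
         */
        TransactionId oldest_xmin = 101;

        printf("A: %s\n", classify(102, oldest_xmin)); /* RECENTLY_DEAD */
        printf("B: %s\n", classify(100, oldest_xmin)); /* DEAD */
        return 0;
    }

So A is RECENTLY_DEAD while its successor B is already DEAD, which is
exactly the chain shape above.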

I had to introduce a hard-pruning mechanism to deal with this in HOT,
where we remove any DEAD_CHAIN tuples from the chain before doing
scan_heap (a rough sketch appears at the end of this mail).

I thought this should not be a problem in CVS HEAD, though, because
we vacuum_page() the block and mark DEAD tuples ~LP_USED.
And later, while moving backward in the chain, we break whenever
we find a tuple whose xmin < OldestXmin, assuming that to be
the start of the chain. My understanding is that these two things
together break the chain into pieces anyway, and we then move each
piece as a unit. But maybe I am missing something, or there
is some other corner case.
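
Roughly, the backward walk looks like the sketch below (the struct and
the explicit prev link are mine for illustration; the real code in
repair_frag locates the older version through t_ctid):

    /*
     * Sketch: walk to older versions until we find a tuple whose xmin
     * precedes OldestXmin, and treat that tuple as the chain start.
     */
    #include <stdint.h>
    #include <stddef.h>

    typedef uint32_t TransactionId;

    typedef struct ChainTuple
    {
        TransactionId      xmin;
        struct ChainTuple *prev;    /* older version, or NULL */
    } ChainTuple;

    static ChainTuple *
    find_chain_start(ChainTuple *tup, TransactionId oldest_xmin)
    {
        while (tup->prev != NULL && tup->xmin >= oldest_xmin)
            tup = tup->prev;
        return tup;
    }

    int
    main(void)
    {
        /* The chain from the earlier example, with OldestXmin = 101. */
        ChainTuple a = { 90, NULL };
        ChainTuple b = { 102, &a };
        ChainTuple c = { 100, &b };

        /*
         * The walk stops at c itself (c.xmin = 100 < 101), so the
         * older piece a -> b is left to be handled on its own: the
         * chain is effectively cut into pieces.
         */
        ChainTuple *start = find_chain_start(&c, 101);
        return (start == &c) ? 0 : 1;
    }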

In general, I believe the most likely cause of the earlier reported
errors is that VACUUM FULL fails to clean up one or more index
entries, which then causes all sorts of errors downstream. I had a
hard time fixing this case for HOT.

> I believe it's the case that any update chain members appearing before
> a DEAD entry must in fact also be dead (ie, not listed as live in any
> active snapshot) but a test based on OldestXmin hasn't got enough
> resolution to prove that they are dead.

I agree.

> Does anyone want to argue that this is an error in the calculation of
> OldestXmin (and if so, how would you propose to fix it)? If not, I'll
> set to work on fixing the chain-moving logic. I think the correct
> behavior is that we shouldn't consider DEAD tuples part of a movable
> chain, and so in the above example there are two separate chains to move
> not one. Alternatively we could try to recognize that the older part of
> the chain is really dead and removable, but that seems complicated and
> likely to introduce new bugs.

Seems like a good idea. But as I said above, my guess is that we
already do that while moving backward in the chain. I might be wrong,
though.

> I wonder whether this has any implications for HOT ...
>
>
Fortunately, that doesn't seem to be the case. Lazy VACUUM is fine
because we don't declare a tuple DEAD if it's in the
middle of a HOT-update chain anyway. For VACUUM FULL, I remove
any intermediate DEAD tuples, fix the chain, and then
move it. We could actually do the same even for the
current implementation. This would require changing the xmin of
the next tuple (when we remove a DEAD tuple from the chain) so
that the xmax/xmin matching along the chain is preserved; but we are
only changing from one committed xmin to another committed xmin, and
hence this should not have any correctness implications.
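
Here is a rough sketch of that hard pruning (the struct and names are
mine for illustration, not the actual patch):

    /*
     * Sketch: splice DEAD members out of an update chain, stamping
     * the successor's xmin with the removed tuple's xmin so that the
     * xmax/xmin matching along the chain is preserved.
     */
    #include <stdint.h>
    #include <stddef.h>

    typedef uint32_t TransactionId;

    typedef struct ChainTuple
    {
        TransactionId      xmin;
        TransactionId      xmax;
        int                is_dead;    /* per HeapTupleSatisfiesVacuum */
        struct ChainTuple *next;       /* newer version, or NULL */
    } ChainTuple;

    static void
    hard_prune(ChainTuple **head)
    {
        ChainTuple **link = head;

        while (*link != NULL)
        {
            ChainTuple *tup = *link;

            if (tup->is_dead)
            {
                /*
                 * Swap one committed xmin for another: both the
                 * removed tuple's xmin and the successor's old xmin
                 * are known committed, so this only exchanges one
                 * committed xmin for another.
                 */
                if (tup->next != NULL)
                    tup->next->xmin = tup->xmin;
                *link = tup->next;  /* caller reclaims tup */
            }
            else
                link = &tup->next;
        }
    }

    int
    main(void)
    {
        ChainTuple c = { 100, 0, 0, NULL }; /* live chain head (newest) */
        ChainTuple b = { 102, 100, 1, &c }; /* DEAD, to be spliced out  */
        ChainTuple a = { 90, 102, 0, &b };  /* RECENTLY_DEAD            */
        ChainTuple *head = &a;

        hard_prune(&head);
        /* Now a -> c, with c.xmin rewritten to 102 = a.xmax. */
        return (a.next == &c && c.xmin == 102) ? 0 : 1;
    }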

Thanks,
Pavan

--
EnterpriseDB http://www.enterprisedb.com
