Re: Proposal: Another attempt at vacuum improvements

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Proposal: Another attempt at vacuum improvements
Date: 2011-06-08 05:19:14
Message-ID: BANLkTi=6kR01m0Oe9vFknB6M3fsDwDO6Zw@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 26, 2011 at 4:10 PM, Pavan Deolasee
<pavan(dot)deolasee(at)gmail(dot)com> wrote:

>
> So are there any other objections/suggestions ? Anyone else cares to
> look at the brief design that we discussed above ? Otherwise, I would
> go ahead and work on this in the coming days. Of course, I will keep
> the list posted about any new issues that I see.
>

I went ahead and created a WIP patch based on our discussion. There are
a couple of issues that I stumbled upon while testing it.

1. The start-of-index-vacuum LSN that we want to track must be noted
even before the heap scan is started. This is because we must be
absolutely sure that the index vacuum removes index pointers to all
dead line pointers generated by any operation with LSN less than the
start-of-index-vacuum LSN. If we don't remember the LSN before the heap
scan starts and instead delay it until the start of the index vacuum,
new dead line pointers may be generated on a page after the heap scan
has already visited it but before the index vacuum starts. Since the
index pointers to these new dead line pointers haven't been vacuumed,
we must not remove them.
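
To make the ordering argument concrete, here is a small standalone
sketch. It is not taken from the WIP patch; the type and function
names (SketchLsn, SketchTimeline, sketch_cutoff_is_safe) are made up
for illustration, and a plain integer stands in for a WAL position.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t SketchLsn;   /* stand-in for a WAL position */

/* Hypothetical timeline of one vacuum cycle, as seen from one heap page. */
typedef struct SketchTimeline
{
    SketchLsn heap_scan_start;     /* noted before the heap scan begins */
    SketchLsn page_visit;          /* when the heap scan read this page */
    SketchLsn index_vacuum_start;  /* when the index vacuum began */
} SketchTimeline;

/*
 * Is it safe to remember "cutoff" as the start-of-index-vacuum LSN, given a
 * dead line pointer that was created at dead_lp_lsn?
 *
 * The rule "dead_lp_lsn < cutoff" claims the index pointers are gone. That
 * claim is only true if the heap scan actually collected the line pointer,
 * i.e. it already existed when the page was visited.
 */
static bool
sketch_cutoff_is_safe(SketchTimeline t, SketchLsn cutoff, SketchLsn dead_lp_lsn)
{
    assert(t.heap_scan_start <= t.page_visit);
    assert(t.page_visit <= t.index_vacuum_start);

    bool claimed_clean = dead_lp_lsn < cutoff;
    bool really_collected = dead_lp_lsn < t.page_visit;

    return !claimed_clean || really_collected;
}
```

With a heap scan starting at 100, a page visited at 110 and an index
vacuum starting at 150, a line pointer that died at 120 is wrongly
treated as clean if the cutoff is 150, but not if the cutoff is 100.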

But as a consequence of using an LSN from the start of the heap scan,
at the end of vacuum, every page pruned by that vacuum will have a
vacuum LSN greater than the index vacuum LSN that we are going to
remember in pg_class. And by our design, we can't remove dead line
pointers on those pages because we don't know whether the index
pointers have been vacuumed or not. We might not be able to reclaim
any dead line pointers if the page is HOT pruned again before the next
vacuum cycle, because that will overwrite the page vacuum LSN with a
newer value.
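
A minimal sketch of that check, assuming a strict less-than comparison
between the page vacuum LSN and the LSN remembered in pg_class. The
names below (SketchPage, sketch_prune_page,
sketch_can_reclaim_dead_lps) are hypothetical and exist only for this
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t SketchLsn;   /* stand-in for a WAL position */

/* Hypothetical per-page state: only the page vacuum LSN matters here. */
typedef struct SketchPage
{
    SketchLsn vacuum_lsn;     /* LSN of the last prune of this page */
} SketchPage;

/* Any prune (by vacuum or by HOT cleanup) overwrites the page vacuum LSN. */
static SketchPage
sketch_prune_page(SketchPage page, SketchLsn current_lsn)
{
    page.vacuum_lsn = current_lsn;
    return page;
}

/*
 * Dead line pointers on a page may be reclaimed only if the page was last
 * pruned before the index vacuum LSN remembered for the relation.
 */
static bool
sketch_can_reclaim_dead_lps(SketchPage page, SketchLsn rel_index_vacuum_lsn)
{
    return page.vacuum_lsn < rel_index_vacuum_lsn;
}
```

If the remembered LSN is 100 (start of the heap scan) and the same
vacuum prunes a page at 110, the check fails for that page, and any
later HOT prune only moves the page vacuum LSN further ahead.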

I think we definitely need to track the dead line pointers that a heap
scan has collected. The index pointers to them will be removed if the
vacuum completes successfully. That gets us back to the original idea
that we had discussed a while back about marking such dead line
pointers as LP_DEAD_RECLAIMED or something like that. When vacuum
runs the heap scan, it would collect all dead line pointers, mark them
dead-reclaimed, and also store an identifier of the vacuum operation
that would remove the associated index pointers. During HOT cleanup or
the next vacuum, we can safely remove the LP_DEAD_RECLAIMED line
pointers, provided we can reliably check whether that vacuum completed
successfully. We don't have any free flags in ItemIdData, but we can
use a special lp_off value to distinguish a dead line pointer from a
dead-reclaimed one. The identifier itself can be an LSN, an XID, or
anything else. Also, since we need just one identifier, I think this
technique would work for unlogged and temp relations too, with minor
adjustments.
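
As a rough sketch of the lp_off idea: the struct below is a simplified
stand-in for ItemIdData with the same 15/2/15 bit layout, and the flag
values match itemid.h. Everything else is an assumption made for this
illustration, in particular the sentinel value
SKETCH_LP_OFF_RECLAIMED, the sketch_* helpers, and passing the
"vacuum completed" answer in as a plain boolean.

```c
#include <stdbool.h>

/* Line pointer flag values, as in PostgreSQL's itemid.h. */
#define LP_UNUSED   0
#define LP_NORMAL   1
#define LP_REDIRECT 2
#define LP_DEAD     3

/* Hypothetical sentinel: a plain dead line pointer keeps lp_off = 0. */
#define SKETCH_LP_OFF_RECLAIMED 1

/* Simplified stand-in for ItemIdData. */
typedef struct SketchItemId
{
    unsigned lp_off:15;    /* offset to tuple, reused here as a marker */
    unsigned lp_flags:2;   /* one of the LP_* values above */
    unsigned lp_len:15;    /* 0: the item has no storage */
} SketchItemId;

/* A dead line pointer whose index pointers may still exist. */
static SketchItemId
sketch_make_dead(void)
{
    SketchItemId lp = { .lp_off = 0, .lp_flags = LP_DEAD, .lp_len = 0 };
    return lp;
}

/* A dead line pointer collected by a vacuum's heap scan. */
static SketchItemId
sketch_make_dead_reclaimed(void)
{
    SketchItemId lp = sketch_make_dead();
    lp.lp_off = SKETCH_LP_OFF_RECLAIMED;
    return lp;
}

static bool
sketch_is_dead_reclaimed(SketchItemId lp)
{
    return lp.lp_flags == LP_DEAD && lp.lp_off == SKETCH_LP_OFF_RECLAIMED;
}

/*
 * HOT cleanup or the next vacuum may free the line pointer only if it was
 * marked dead-reclaimed and the vacuum that marked it is known to have
 * completed (so its index pointers are gone).
 */
static bool
sketch_can_free(SketchItemId lp, bool owning_vacuum_completed)
{
    return sketch_is_dead_reclaimed(lp) && owning_vacuum_completed;
}
```

Since lp_flags stays LP_DEAD and lp_len stays 0, code that only looks
at those two fields would still see an ordinary dead line pointer.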

2. Another issue is with analyze counting dead line pointers as dead
rows. While it's correct in principle, because a vacuum is needed to
remove these dead line pointers, the overhead of a dead line pointer
is much lower than that of a dead tuple. Also, with single-pass
vacuum, there will be many dead line pointers waiting to be cleaned up
by the next vacuum or HOT prune. We should not really count them as
dead rows, because they don't require a vacuum per se and counting them
as dead will force more vacuum cycles than required. If we go by the
idea described above, we can skip the dead-reclaimed line pointers, at
least when we know that the index vacuum completed successfully.
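
A possible counting rule could look like the sketch below. This is not
how analyze is written; SketchItemKind and sketch_count_dead_rows are
hypothetical, and the choice to keep counting plain dead line pointers
is just the conservative assumption for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of the items analyze sees on a page. */
typedef enum SketchItemKind
{
    SKETCH_LIVE_TUPLE,
    SKETCH_DEAD_TUPLE,
    SKETCH_DEAD_LP,            /* dead line pointer, not yet collected */
    SKETCH_DEAD_RECLAIMED_LP   /* collected by a vacuum's heap scan */
} SketchItemKind;

/*
 * Count dead rows for the purpose of triggering vacuum. Dead-reclaimed
 * line pointers are skipped only when the vacuum that collected them is
 * known to have finished its index vacuum.
 */
static size_t
sketch_count_dead_rows(const SketchItemKind *items, size_t nitems,
                       bool index_vacuum_completed)
{
    size_t ndead = 0;

    for (size_t i = 0; i < nitems; i++)
    {
        switch (items[i])
        {
            case SKETCH_DEAD_TUPLE:
            case SKETCH_DEAD_LP:
                ndead++;
                break;
            case SKETCH_DEAD_RECLAIMED_LP:
                if (!index_vacuum_completed)
                    ndead++;
                break;
            case SKETCH_LIVE_TUPLE:
                break;
        }
    }
    return ndead;
}
```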

Thoughts?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
