Re: Page at a time index scan

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Page at a time index scan
Date: 2006-05-03 14:17:43
Message-ID: 16108.1146665863@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> AFAICS we will need to return to the page for a backward scan, so we
> could just keep the pin the whole way. It's not possible to cache the
> left page pointer because block splits to our immediate left can update
> them even after we read the page contents.

Sure we can cache the left pointer: it's not any different from the
normal walk-left problem, because as soon as you drop pin on the page
you are leaving, all the same hazards apply. This algorithm just has a
slightly wider window between dropping one pin and acquiring the next.
Walk-left is painful and (rarely) expensive, but it's a solved problem.

> We need never return to a page that *could* be deleted. While scanning
> in either direction, if the complete page contains nothing but dead
> items we can simply move straight onto the next page, having updated the
> page status to half-dead.

This is unnecessary and probably wrong.

> - we have a minimum threshold of deleted tuples. Otherwise we might
> re-dirty the page for just a single hint bit, so we end up writing the
> page out hundreds of times. (Guess: that should be 2 or 3)

You are optimizing the wrong thing here. If we choose not to mark an
entry dead then we will pay for that omission on every future scan of
the same entry. I don't think that outweighs the (doubtless rare)
situation where we expend an extra page fetch to reload the page.

It's worth noting that all of this stuff is predicated on the assumption
that index items never move across pre-existing page boundaries, in
either direction. We are therefore going to be permanently giving up
any prospect of index space reclamation by merging partly-filled pages
(unless maybe in VACUUM FULL). We didn't know how to do that anyway,
so I don't feel too bad about it, but if indexscans don't make any
attempt to explicitly re-locate their positions then that certainly
goes out the window.

regards, tom lane

In response to

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Simon Riggs 2006-05-03 14:45:15 Re: Page at a time index scan
Previous Message Martijn van Oosterhout 2006-05-03 13:54:57 Re: patch review, please: Autovacuum/Vacuum times via stats.