heap_hot_search_buffer refactoring

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: heap_hot_search_buffer refactoring
Date: 2011-06-06 18:03:29
Message-ID: BANLkTi=CM_wN=2RAZOHofM3zRS2qkBftNA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

The attached patch refactors heap_hot_search_buffer() so that
index_getnext() can use it, and modifies index_getnext() to do so.

The idea is based on one of Heikki's index-only scan patches, from 2009:

http://archives.postgresql.org/pgsql-hackers/2009-07/msg00676.php

I believe, however, that this portion of that patch stands alone,
completely independently of index-only scans. At present, we have two
copies of the logic to traverse HOT chains floating around, which
means they can get out of sync, possibly resulting in bugs. This has
already happened once:

http://archives.postgresql.org/pgsql-hackers/2011-05/msg00733.php

As a nice bonus, the new logic is both simpler and, I believe, more
correct than the current logic. In IndexScanDescData, xs_hot_dead,
xs_next_hot, and xs_prev_xmax get replaced by a single boolean
xs_continue_hot. There is a small behavioral difference: in the
current code, when use a non-MVCC snapshot with index_getnext() and
walk a HOT chain, each call remembers the *next* TID that should be
examined. With this change, we instead remember the TID that we most
recently returned, and compute the next TID when index_getnext() is
called again. It seems to me that this is really what we should have
been doing all along. Imagine, for example, that we have a HOT chain
A -> B, where B has not yet committed. We return A and remember that
we next intend to look at B. Before index_getnext() is called, B
aborts and a new HOT update produces a HOT chain A -> C. The next
call to index_getnext() will nonetheless look at B and conclude that
it's reached the end of the HOT chain. This doesn't actually matter a
bit for current uses of index_getnext(), because - at least according
to Heikki's old notes - the only place we use a non-MVCC snapshot with
this function is during CLUSTER. And at that point, we have the whole
table locked down, so nothing is happening concurrently. I'm not sure
there's any possibility of us ever using this function in a way that
would break under the current implementation, but what I'm proposing
here does seem more robust to me.

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
heap-hot-search-refactoring.patch application/octet-stream 14.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-06-06 18:06:35 Re: patch: Allow \dd to show constraint comments
Previous Message Ross J. Reedstrom 2011-06-06 18:03:06 Re: Range Types and extensions