Re: Stating the significance of Lehman & Yao in the nbtree README

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Stating the significance of Lehman & Yao in the nbtree README
Date: 2014-09-09 07:01:31
Message-ID: 540EA5CB.8070601@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 09/07/2014 05:58 AM, Peter Geoghegan wrote:
> + Lehman and Yao don't require read locks, but assume that in-memory
> + copies of tree pages are unshared. Postgres shares in-memory buffers
> + among backends. As a result, we do page-level read locking on btree
> + pages in order to guarantee that no record is modified while we are
> + examining it. This reduces concurrency but guarantees correct
> + behavior. An advantage is that when trading in a read lock for a
> + write lock, we need not re-read the page after getting the write lock.
> + Since we're also holding a pin on the shared buffer containing the
> + page, we know that buffer still contains the page and is up-to-date.

This is the existing paragraph, just moved to different place in the README.

> + Although it could be argued that Lehman and Yao isn't followed to the
> + letter because single pages are read locked as the tree is descended,
> + this is at least necessary to support deletion, a common requirement
> + which L&Y hardly acknowledge. Read locks also ensure that B-tree
> + pages are self-consistent (L&Y appear to assume atomic page reads and
> + writes).

This is just duplicating the existing paragraph. I don't see the point
of this.

> Even with these read locks, following L&Y obviates the need
> + of earlier schemes to hold multiple locks concurrently when descending
> + the tree as part of servicing index scans (pessimistic lock coupling).
> + The addition of right-links at all levels, as well as the addition of
> + a page "high key" allows detection and dynamic recovery from
> + concurrent page splits (that is, splits between unlocking an internal
> + page, and subsequently locking its child page during a descent). When
> + a page is first locked (at every level of a descent servicing an index
> + scan), we consider the need to "move right": if the scankey value is
> + less than (or sometimes less than or equal to) the page's existing
> + highkey value, a value which serves as an upper bound for values on
> + the page generally, then it must be necessary to move the scan to the
> + right-hand page on the same level. It's even possible that the scan
> + needs to move right more than once. Once the other session's
> + concurrent page split finishes, a downlink will be inserted into the
> + parent, and so assuming there are no further page splits, future index
> + scans using the same scankey value will not need to move right. L&Y
> + Trees are sometimes referred to as "B-Link trees" in the literature.

This explains in a few sentences what a L&Y B-tree looks like. The
current README assumes that you're already familiar with what a L&Y tree
looks like, or that you go read the paper mentioned at the top of the
README. It might be a good idea to expand on that, and add an
introduction like this so that an unfamiliar reader doesn't need to read
the L&Y paper first. Is that the purpose of this patch? Please make it
more explicit. And please make the sentences simpler - the above reads
like a Shakespeare play. Something like:

The basic Lehman & Yao Algorithm
================================

Compared to a classic B-tree, L&Y adds a right-link pointer to each
page, to the page's right sibling. It also adds a "high key" to each
page, which is an upper bound on the keys that are allowed on that page.
These two additions make it possible detect a concurrent page split,
which allows the tree to be searched without holding any read locks
(except to keep a single page from being modified while reading it).

When a search follows a downlink to a child page, it compares the page's
high key with the search key. If the search key is greater than the high
key, the page must've been split concurrently, and you must follow the
right-link to find the new page containing the key range you're looking
for. This might need to be repeated, if the page has been split more
than once.

Differences to the Lehman & Yao algorithm
=========================================

(current "Lehamn and Yao Algorithm and Insertions section)

I think that's pretty much the same information you added, but it's in a
separate section, with the clear purpose that it explains what a L&Y
tree looks like. You can skip over it, if you have read the paper or are
otherwise already familiar with it. It still assumes that you're
familiar with B-trees in general.

Anyway, I see that you had resurrected this in the commitfest app after
three weeks of inactivity. I'm going to mark this back to "Returned with
Feedback". Please don't resurrect it again, this patch has received more
than its fair share of attention. Instead, please help by signing up to
review a patch. The commitfest progress is glacial at the moment, and we
really need experienced reviewers like you to get closure to people's
patches.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Marko Tiikkaja 2014-09-09 07:49:48 Re: proposal: plpgsql - Assert statement
Previous Message Craig Ringer 2014-09-09 05:55:56 Re: proposal: plpgsql - Assert statement