Re: Failure while inserting parent tuple to B-tree is not fun

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Failure while inserting parent tuple to B-tree is not fun
Date: 2014-01-29 23:46:08
Message-ID: CAM3SWZRBwB=7mwwRM=OeDE1s6RGxy6m1KaHjp7YcfGwRetQOOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 27, 2014 at 10:54 AM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
> On Mon, Jan 27, 2014 at 10:27 AM, Heikki Linnakangas
> <hlinnakangas(at)vmware(dot)com> wrote:
>>> I think I see some bugs in _bt_moveright(). If you examine
>>> _bt_finish_split() in detail, you'll see that it doesn't just drop the
>>> write buffer lock that the caller will have provided (per its
>>> comments) - it also drops the buffer pin. It calls _bt_insert_parent()
>>> last, which was previously only called last thing by _bt_insertonpg()
>>> (some of the time), and _bt_insertonpg() is indeed documented to drop
>>> both the lock and the pin. And if you look at _bt_moveright() again,
>>> you'll see that there is a tacit assumption that the buffer pin isn't
>>> dropped, or at least that "opaque" (the BTPageOpaque BT special page
>>> area pointer) continues to point to the same page held in the same
>>> buffer, even though the code doesn't set the "opaque' pointer again
>>> and it may not point to a pinned buffer or even the appropriate
>>> buffer. Ditto "page". So "opaque" may point to the wrong thing on
>>> subsequent iterations - you don't do anything with the return value of
>>> _bt_getbuf(), unlike all of the existing call sites. I believe you
>>> should store its return value, and get the held page and the special
>>> pointer on the page, and assign that to the "opaque" pointer before
>>> the next iteration (an iteration in which you very probably really do
>>> make progress not limited to completing a page split, the occurrence
>>> of which the _bt_moveright() loop gets "stuck on"). You know, do what
>>> is done in the non-split-handling case. It may not always be the same
>>> buffer as before, obviously.
>>
>>
>> Yep, fixed.
>
> Can you explain what the fix was, please?

Ping? I would like to hear some details here.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2014-01-29 23:49:11 Re: updated emacs configuration
Previous Message Tom Lane 2014-01-29 23:29:58 Re: Add min and max execute statement time in pg_stat_statement