Re: Failure while inserting parent tuple to B-tree is not fun

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Failure while inserting parent tuple to B-tree is not fun
Date: 2013-11-14 17:23:39
Message-ID: 5285071B.1040100@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05.11.2013 15:07, Heikki Linnakangas wrote:
> On 25.10.2013 22:13, Heikki Linnakangas wrote:
>> On 22.10.2013 19:55, Heikki Linnakangas wrote:
>>> I fixed the the same problem in GiST a few years back, by making it
>>> tolerate missing downlinks, and inserting them lazily. The B-tree code
>>> tolerates them already on scans, but gets confused on insertion, as seen
>>> above. I propose that we use the same approach I used with GiST, and add
>>> a flag to the page header to indicate "the downlink hasn't been inserted
>>> yet". When insertion (or vacuum) bumps into a flagged page, it can
>>> finish the incomplete action by inserting the downlink.
>>
>> This is what I came up with.
>>
>> One thing I'm not totally happy about is the way page deletions of
>> incompletely split pages are handled. Basically, it just bails out and
>> refuses to delete a page that is part of an incomplete split. That's
>> probably OK in practice, as incomplete splits should be very rare
>> anyway, but it's a bit dissatisfying to not handle the case because at
>> first glance it seems like it should be even simpler than usual to
>> delete a page that has no downlink. Nevertheless, I decided to just skip
>> that for now.
>>
>> After this patch, deleting the only child of a parent and the parent
>> itself is still a multi-WAL-record operation that needs to be tracked
>> during recovery, and completed at the end of recovery. I'd like to
>> eliminate that too, but that's another patch.
>
> Here's a new version of this, which uses a similar technique to handle
> page deletions, eliminating the "incomplete action" tracking code
> altogether (from btree). When an internal page is marked as half-dead,
> its right sibling is atomically marked with a
> "left-sibling-is-half-dead" flag. Whenever an insertion encounters a
> page with that flag set, it will finish the deletion of the left sibling
> before proceeding with the insertion.
>
> This needs a lot more testing, but I wanted to get this out for review,
> in case someone sees a fundamental problem with this.

Ok, here's a new version of the patch to handle incomplete B-tree
splits. I rejected the scheme I outlined above about handling half-dead
pages - instead, see the patch in the "Race condition in b-tree page
deletion" thread:
http://www.postgresql.org/message-id/5283A2FC.6040606@vmware.com. That
approach to handling half-dead pages makes the incomplete-split stuff a
lot easier, as half-dead pages don't need any special treatment wrt.
splits- This patch should be after that patch.

- Heikki

Attachment Content-Type Size
btree-incomplete-splits-2.patch text/x-diff 56.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-11-14 17:26:32 Replication Node Identifiers and crashsafe Apply Progress
Previous Message Zhan Li 2013-11-14 17:13:19 Re: Ideas of "printing out" the alternative paths