Failure while inserting parent tuple to B-tree is not fun

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Failure while inserting parent tuple to B-tree is not fun
Date: 2013-10-22 16:55:09
Message-ID: 5266ADED.9030008@vmware.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Splitting a B-tree page is a two-stage process: First, the page is
split, and then a downlink for the new right page is inserted into the
parent (which might recurse to split the parent page, too). What happens
if inserting the downlink fails for some reason? I tried that out, and
it turns out that it's not nice.

I used this to cause a failure:

> --- a/src/backend/access/nbtree/nbtinsert.c
> +++ b/src/backend/access/nbtree/nbtinsert.c
> @@ -1669,6 +1669,8 @@ _bt_insert_parent(Relation rel,
> _bt_relbuf(rel, pbuf);
> }
>
> + elog(ERROR, "fail!");
> +
> /* get high key from left page == lowest key on new right page */
> ritem = (IndexTuple) PageGetItem(page,
> PageGetItemId(page, P_HIKEY));

postgres=# create table foo (i int4 primary key);
CREATE TABLE
postgres=# insert into foo select generate_series(1, 10000);
ERROR: fail!

That's not surprising. But when I removed that elog again and restarted
the server, I still can't insert. The index is permanently broken:

postgres=# insert into foo select generate_series(1, 10000);
ERROR: failed to re-find parent key in index "foo_pkey" for split pages 4/5

In real life, you would get a failure like this e.g if you run out of
memory or disk space while inserting the downlink to the parent.
Although rare in practice, it's no fun if it happens.

I fixed the the same problem in GiST a few years back, by making it
tolerate missing downlinks, and inserting them lazily. The B-tree code
tolerates them already on scans, but gets confused on insertion, as seen
above. I propose that we use the same approach I used with GiST, and add
a flag to the page header to indicate "the downlink hasn't been inserted
yet". When insertion (or vacuum) bumps into a flagged page, it can
finish the incomplete action by inserting the downlink.

- Heikki

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2013-10-22 17:08:43 Re: logical changeset generation v6.2
Previous Message Andres Freund 2013-10-22 16:43:49 Re: logical changeset generation v6.2