Re: _bt_parent_deletion_safe() isn't safe

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Subject: _bt_parent_deletion_safe() isn't safe
Date: 2010-06-08 17:50:13
Message-ID: 7409.1276019413@sss.pgh.pa.us

The btree page deletion logic has a restriction that it cannot delete
the rightmost child page of any non-leaf btree page (see nbtree/README
for explanations). This is checked by _bt_parent_deletion_safe(),
which claims

    * Note: it's OK to release page locks after checking, because a safe
    * deletion can't become unsafe due to concurrent activity.  A non-rightmost
    * page cannot become rightmost unless there's a concurrent page deletion,
    * but only VACUUM does page deletion and we only allow one VACUUM on an index
    * at a time.  An only child could acquire a sibling (of the same parent) only
    * by being split ... but that would make it a non-rightmost child so the
    * deletion is still safe.

This analysis missed a case, though. What if an insertion into some
nearby leaf page causes a split, and the resulting insertion into the
parent page causes it to split, and we choose a split point just after
the downlink for the page that VACUUM is trying to delete? That will
leave the deletion target as the rightmost child, and we're screwed.
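
[Editorial illustration, not part of the original message.] The scenario
above can be sketched with a toy model in C. This is only a sketch under
stated assumptions: ToyParent, toy_deletion_safe(), toy_split() and
toy_demo_race() are hypothetical names invented here, not PostgreSQL
functions, and the model ignores keys, buffer locks, WAL and the
only-child case that the real check also handles. It shows only that a
"safe" answer computed before a parent split can be stale afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DOWNLINKS 8

/* Hypothetical stand-in for an internal btree page: an ordered list of
 * child block numbers (downlinks). */
typedef struct ToyParent
{
    unsigned    children[TOY_MAX_DOWNLINKS];
    size_t      nchildren;
} ToyParent;

/* Simplified version of the safety check: the target must be a child of
 * this parent and must not be its rightmost child. */
static bool
toy_deletion_safe(const ToyParent *p, unsigned target)
{
    for (size_t i = 0; i + 1 < p->nchildren; i++)
    {
        if (p->children[i] == target)
            return true;
    }
    return false;
}

/* Split a parent: the first 'keep' downlinks stay on the left page, the
 * rest move to a new right sibling. */
static void
toy_split(ToyParent *left, ToyParent *right, size_t keep)
{
    assert(keep <= left->nchildren);
    right->nchildren = 0;
    for (size_t i = keep; i < left->nchildren; i++)
        right->children[right->nchildren++] = left->children[i];
    left->nchildren = keep;
}

/* Replays the race: returns true if a deletion judged safe before the
 * split is no longer safe after it. */
static bool
toy_demo_race(void)
{
    ToyParent   parent = {{10, 11, 12, 13}, 4};
    ToyParent   right;
    unsigned    target = 11;

    /* VACUUM checks the parent, then (in this model) drops its lock. */
    bool        safe_before = toy_deletion_safe(&parent, target);

    /* A concurrent insertion splits the parent just after the target's
     * downlink, leaving {10, 11} on the left and {12, 13} on the right. */
    toy_split(&parent, &right, 2);

    /* The target is now the rightmost child of its parent. */
    bool        safe_after = toy_deletion_safe(&parent, target);

    return safe_before && !safe_after;
}
```

In this model toy_demo_race() returns true: block 11 passes the check
while the parent holds four downlinks, but once the parent is split after
it, the same check fails, with no concurrent page deletion involved.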

I realized this while thinking about Jeff Amiel's report here:
http://archives.postgresql.org/pgsql-general/2010-06/msg00351.php
I can't prove that this is what's causing his crashes, but it could
produce the symptom he's reporting. And it'd also explain the
observation that the crash doesn't recur when autovacuum tries again,
since at that time it'll see the page as a rightmost child and not try
to delete it. Maybe the reason he's seeing it repeatedly is that in his
installation the deletions lag behind insertions at about the right rate
for the problem case to occur.

Right at the moment I'm not seeing a fix other than to have page
deletion hold lock on the parent page till it's done. That's unpleasant
from a concurrency standpoint. Anybody see a better way?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Subject: Re: _bt_parent_deletion_safe() isn't safe
Date: 2010-06-08 19:35:52
Message-ID: 9158.1276025752@sss.pgh.pa.us

I wrote:
> I realized this while thinking about Jeff Amiel's report here:
> http://archives.postgresql.org/pgsql-general/2010-06/msg00351.php
> I can't prove that this is what's causing his crashes, but it could
> produce the symptom he's reporting.

Actually, no it can't: the case I'm envisioning should lead to throwing
this error:

    elog(ERROR, "failed to delete rightmost child %u of block %u in index \"%s\"",
         target, parent, RelationGetRelationName(rel));

a bit further up. That's annoying enough, but it's not a PANIC.

A search of the archives produces no evidence that anyone has ever
reported the "failed to delete rightmost child" error from the field.
So while I still think this is a bug that needs to be fixed, it may
be lower priority than I thought initially.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Subject: Re: _bt_parent_deletion_safe() isn't safe
Date: 2010-07-02 23:20:41
Message-ID: 201007022320.o62NKfZ12063@momjian.us

Tom Lane wrote:
> I wrote:
> > I realized this while thinking about Jeff Amiel's report here:
> > http://archives.postgresql.org/pgsql-general/2010-06/msg00351.php
> > I can't prove that this is what's causing his crashes, but it could
> > produce the symptom he's reporting.
>
> Actually, no it can't: the case I'm envisioning should lead to throwing
> this error:
>
> elog(ERROR, "failed to delete rightmost child %u of block %u in index \"%s\"",
> target, parent, RelationGetRelationName(rel));
>
> a bit further up. That's annoying enough, but it's not a PANIC.
>
> A search of the archives produces no evidence that anyone has ever
> reported the "failed to delete rightmost child" error from the field.
> So while I still think this is a bug that needs to be fixed, it may
> be lower priority than I thought initially.

Is this a TODO?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Subject: Re: _bt_parent_deletion_safe() isn't safe
Date: 2010-07-03 02:02:59
Message-ID: 1045.1278122579@sss.pgh.pa.us

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Tom Lane wrote:
>> A search of the archives produces no evidence that anyone has ever
>> reported the "failed to delete rightmost child" error from the field.
>> So while I still think this is a bug that needs to be fixed, it may
>> be lower priority than I thought initially.

> Is this a TODO?

Possibly. I was planning to go back and study that code a bit more ---
I have a feeling that there might be some kind of rare concurrency bug
involved in btree page deletion. But I've been up to my rear in
other alligators for the past several weeks.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jeff Amiel <becauseimjeff(at)yahoo(dot)com>
Subject: Re: _bt_parent_deletion_safe() isn't safe
Date: 2010-07-04 02:15:23
Message-ID: 1278209602-sup-1299@alvh.no-ip.org

Excerpts from Tom Lane's message of Fri Jul 02 22:02:59 -0400 2010:

> Possibly. I was planning to go back and study that code a bit more ---
> I have a feeling that there might be some kind of rare concurrency bug
> involved in btree page deletion. But I've been up to my rear in
> other alligators for the past several weeks.

Judging from the evidence I've seen, I'm fairly sure that there *is* a
concurrency bug somewhere in that code.