Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "devrim(at)gunduz(dot)org" <devrim(at)gunduz(dot)org>
Subject: Re: [COMMITTERS] pgsql: Properly set relpersistence for fake relcache entries.
Date: 2012-09-20 21:55:05
Message-ID: 201209202355.05332.andres@2ndquadrant.com
Lists: pgsql-committers pgsql-hackers

On Monday, September 17, 2012 03:58:37 PM Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> > Btw, I played with this some more on Saturday and I think, while
> > definitely a bad bug, the actual consequences aren't as bad as at least
> > I initially feared.
> >
> > Fake relcache entries are currently set in 3 scenarios during recovery:
> > 1. removal of ALL_VISIBLE in heapam.c
> > 2. incomplete splits and incomplete deletions in nbtxlog.c
> > 3. incomplete splits in ginxlog.c
> > [ #1 doesn't really hurt in 9.1, and the others are low probability ]
>
> OK, that explains why we've not seen a blizzard of trouble reports.
> Still seems like a good idea to fix it ASAP, though.
Btw, I think RhodiumToad/Andrew Gierth and I helped a user in the IRC channel
some time ago who had symptoms matching this bug.

The situation was that the user started getting very high I/O and xid
wraparound shutdown warnings, caused by autovacuum runs that never finished and
could not be canceled. After some investigation it turned out that btree
indexes were being processed at the time. We found they had cyclic btpo_next
pointers, which led to an endless loop in _bt_pagedel.
We solved the issue by forcing leftsib = P_NONE inside the loop
while (P_ISDELETED(opaque) || opaque->btpo_next != target)
which let a queued DROP INDEX get the necessary locks.
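To illustrate why a cyclic btpo_next chain hangs that loop, here is a minimal sketch. It is not the actual nbtree code: SimplePage and find_left_sibling are hypothetical names, pages live in a plain array instead of going through the buffer manager, and no locking is modeled. It walks right from leftsib looking for the live page whose right link is target, as the loop above does. Unlike the real loop, it gives up after npages steps, because any longer walk must revisit a page. Returning P_NONE then plays the same role as the manual leftsib = P_NONE hack.

```c
#include <assert.h>
#include <stdbool.h>

#define P_NONE 0 /* same value PostgreSQL's nbtree.h uses for "no page" */

typedef unsigned int BlockNum;

/* Hypothetical, heavily simplified stand-in for a btree page's special space. */
typedef struct
{
    bool     deleted;   /* plays the role of P_ISDELETED(opaque) */
    BlockNum btpo_next; /* right-sibling link, P_NONE at the right end */
} SimplePage;

/*
 * Walk right from 'leftsib' until we reach a live page whose right link is
 * 'target'. Give up after npages steps: a longer walk must revisit some page,
 * i.e. the sibling chain is cyclic. Returns P_NONE if no left sibling is found.
 */
static BlockNum
find_left_sibling(const SimplePage *pages, BlockNum npages,
                  BlockNum leftsib, BlockNum target)
{
    BlockNum steps = 0;

    while (leftsib != P_NONE)
    {
        assert(leftsib < npages);     /* links must point inside the array */
        const SimplePage *p = &pages[leftsib];

        if (!p->deleted && p->btpo_next == target)
            return leftsib;           /* found the real left sibling */
        if (++steps > npages)
            return P_NONE;            /* cycle: bail out instead of spinning */
        leftsib = p->btpo_next;       /* step right one page */
    }
    return P_NONE;                    /* ran off the right end of the level */
}
```

With a healthy chain 1 -> 2 -> 3, find_left_sibling(pages, 4, 1, 3) returns 2. With a corrupted chain 1 -> 2 -> 1 and target 3, the unbounded loop would spin forever, while this version returns P_NONE.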

Unfortunately this was on a busy production system close to a wraparound
shutdown, so not much was kept for further diagnosis.

After this bug was discovered I asked the user, and indeed they had previously
shut down the database twice in quick succession with -m immediate during heavy
activity, which could lead to exactly this problem due to incompletely
processed page splits.

Greetings,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
