Re: MultiXactId error after upgrade to 9.3.4

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: MultiXactId error after upgrade to 9.3.4
Date: 2014-03-31 13:02:15
Message-ID: 20140331130215.GV9567@eldon.alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andres Freund wrote:
> On 2014-03-31 09:19:12 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > On 2014-03-31 08:54:53 -0300, Alvaro Herrera wrote:
> > > > My conclusion here is that some part of the code is failing to examine
> > > > XMAX_INVALID before looking at the value stored in xmax itself. There
> > > > ought to be a short-circuit. Fortunately, this bug should be pretty
> > > > harmless.
> > > >
> > > > .. and after looking, I'm fairly sure the bug is in
> > > > heap_tuple_needs_freeze.
> > >
> > > heap_tuple_needs_freeze() isn't *allowed* to look at
> > > XMAX_INVALID. Otherwise it could miss freezing something still visible
> > > on a standby or after an eventual crash.
> >
> > Ah, you're right. It even says so on the comment at the top (no
> > caffeine yet.) But what it's doing is still buggy, per this report, so
> > we need to do *something* ...
>
> Are you sure needs_freeze() is the problem here?
>
> IIRC it already does some checks for allow_old? Why is the check for
> that not sufficient?

GetMultiXactIdMembers has this:

if (MultiXactIdPrecedes(multi, oldestMXact))
{
ereport(allow_old ? DEBUG1 : ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("MultiXactId %u does no longer exist -- apparent wraparound",
multi)));
return -1;
}

if (!MultiXactIdPrecedes(multi, nextMXact))
ereport(ERROR,
(errcode(ERRCODE_INTERNAL_ERROR),
errmsg("MultiXactId %u has not been created yet -- apparent wraparound",
multi)));

I guess I wasn't expecting that too-old values would last longer than a
full wraparound cycle. Maybe the right fix is just to have the second
check also conditional on allow_old.

Anyway, it's not clear to me why this database has a multixact value of
6 million when the next multixact value is barely above one million.
Stephen said a wraparound here is not likely.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2014-03-31 13:09:08 Re: MultiXactId error after upgrade to 9.3.4
Previous Message Michael Paquier 2014-03-31 12:46:07 Re: Re: [HACKERS] New parameter RollbackError to control rollback behavior on error