Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: pg_upgrade < 9.3 -> >=9.3 misses a step around multixacts
Date: 2014-06-13 17:51:51
Message-ID: 20140613175151.GN18688@eldon.alvh.no-ip.org
Lists: pgsql-bugs

Andres Freund wrote:
> Hi,
>
> When upgrading a < 9.3 cluster pg_upgrade doesn't bother to keep the old
> multixacts around because they won't be read after the upgrade (and
> aren't compatible). It just resets the new cluster's nextMulti to the
> old + 1.
> Unfortunately that means that there'll be an offsets/0000 file created by
> initdb around. Sounds harmless enough, but that'll actually cause
> problems if the old cluster had a nextMulti that's bigger than that
> page.
>
> When vac_truncate_clog() calls TruncateMultiXact() that'll scan
> pg_multixact/offsets to find the earliest existing segment. That'll be
> 0000. If the to-be-truncated data is older than the last existing
> segment it returns. Then it'll try to determine the last required data
> in members/ by accessing the oldest data in offsets/.

I'm trying to understand the mechanism of this bug, and I'm not
succeeding. If offsets/0000 was created by initdb, how come we try
to delete a file that's not also members/0000? I mean, surely the file
as created by initdb is empty (zeroed). In your sample error message
downthread,

ERROR: could not access status of transaction 2072053907
DETAIL: Could not open file "pg_multixact/offsets/7B81": No such file or directory.

what prompted the status of that multixid to be sought? I see one
possible path to this error message, which is SlruPhysicalReadPage().
(There are other paths that lead to similar errors, but they use
"transaction 0" instead, so we can rule those out; and we can rule out
anything that uses MultiXactMemberCtl because of the path given in
DETAIL.)
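
As a cross-check on the DETAIL line, here is a minimal sketch of how a
multixid maps to an offsets segment file name. It assumes the default
8 kB BLCKSZ, 4-byte offset entries and 32 pages per SLRU segment; the
function name is illustrative, not a PostgreSQL symbol.

```python
# Sketch only: assumes default build-time constants.
BLCKSZ = 8192                 # default block size in bytes (assumed)
OFFSET_ENTRY_SIZE = 4         # one 32-bit offset per multixid (assumed)
SLRU_PAGES_PER_SEGMENT = 32   # pages per SLRU segment file (assumed)


def offsets_segment_name(multi: int) -> str:
    """Return the pg_multixact/offsets file name that holds `multi`."""
    entries_per_page = BLCKSZ // OFFSET_ENTRY_SIZE    # 2048 entries per page
    page = multi // entries_per_page                  # page within the SLRU
    segno = page // SLRU_PAGES_PER_SEGMENT            # segment holding that page
    return "%04X" % segno                             # four uppercase hex digits


print(offsets_segment_name(2072053907))  # 7B81
```

Under those assumptions, multixid 2072053907 does land in offsets/7B81,
which matches the path in DETAIL, while offsets/0000 as created by
initdb only covers multixids 0 through 65535.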

There are four call sites that lead to it:

RecordNewMultiXact
GetMultiXactIdMembers (2x)
TrimMultiXact

Of those, only GetMultiXactIdMembers is likely to be called from vacuum
(actually RecordNewMultiXact can too, in a few cases, if it happens to
freeze a multi by creating another multi; should be pretty rare).
But you were talking about vacuum truncating pg_multixact -- and I don't
see how that's related to these functions.

Is it possible that you pasted the wrong error message?

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
