Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Serge Negodyuck <petr(at)petrovich(dot)kiev(dot)ua>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8673: Could not open file "pg_multixact/members/xxxx" on slave during hot_standby
Date: 2014-06-04 17:46:59
Message-ID: 20140604174659.GP5146@eldon.alvh.no-ip.org
Lists: pgsql-bugs pgsql-hackers

Serge Negodyuck wrote:
> 2014-06-02 17:10 GMT+03:00 Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>:
>
> > Serge Negodyuck wrote:
> > > Hello,
> > >
> > > I've upgraded postgresql to version 9.3.4 and did fresh initdb and
> > > restored database from sql backup.
> > > According to 9.3.4 changelog issue with multixact wraparound was fixed.
> >
> > Ouch. This is rather strange. First I see the failing multixact has
> > 8684 members, which is totally unusual. My guess is that you have code
> > that creates lots of subtransactions, and perhaps does something to one
> > tuple in a different subtransaction; doing something like that would be,
> > I think, the only way to get subxacts that large. Does that sound
> > right?
> >
> It sounds like you are right. I've found a lot of inserts in logs. Each
> insert causes a trigger to be performed. This trigger updates a counter in
> another table.
> It is very possible this trigger tries to update the same counter for
> different inserts.

I wasn't able to reproduce it that way, but I eventually figured out
that if I altered the plpython function to grab a FOR NO KEY
UPDATE lock first, insertion would grow the multixact beyond reasonable
limits; see the attachment. If you then INSERT many tuples in "product"
in a single transaction, the resulting xmax is a multixact that has as
many members as there are inserts, plus one.

(One variation that causes even more bizarre results is dispensing with
the plpy.subtransaction() in the function and instead setting a
savepoint before each insert. In fact, given the multixact members
shown in your log snippet I think that's more similar to what your code
does.)
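
A minimal sketch of the savepoint variation described above: a small Python
helper that only generates the SQL script (one transaction, a savepoint
before each insert). The column name "name" and the inserted values are
hypothetical, chosen for illustration; the trigger and the altered plpython
function are assumed to be installed already and are not shown here.

```python
def build_savepoint_inserts(n_inserts, table="product", column="name"):
    """Return an SQL script that inserts n_inserts rows in ONE transaction,
    setting a savepoint (i.e. starting a subtransaction) before each insert.

    `column` and the inserted values are hypothetical placeholders.
    """
    lines = ["BEGIN;"]
    for i in range(1, n_inserts + 1):
        # Each SAVEPOINT opens a new subtransaction with its own xid.
        lines.append(f"SAVEPOINT s{i};")
        lines.append(f"INSERT INTO {table} ({column}) VALUES ('item {i}');")
    lines.append("COMMIT;")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a small script; pipe it to psql against a scratch database.
    print(build_savepoint_inserts(3), end="")
```

The generated script can be fed to psql; how many multixact members result
depends on the trigger function in use, so no particular count is claimed
here.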

> > > Then, did pg_basebackup to slave database. It does not help
> > > 2014-06-02 09:58:49 EEST 172.18.10.17 db2 DETAIL: Could not open file
> > > "pg_multixact/members/1112D": No such file or directory.
> > > 2014-06-02 09:58:49 EEST 172.18.10.18 db2 DETAIL: Could not open file
> > > "pg_multixact/members/11130": No such file or directory.
> > > 2014-06-02 09:58:51 EEST 172.18.10.34 db2 DETAIL: Could not open file
> > > "pg_multixact/members/11145": No such file or directory.
> > > 2014-06-02 09:58:51 EEST 172.18.10.38 db2 DETAIL: Could not open file
> > > "pg_multixact/members/13F76": No such file or directory
> >
> > Are these the only files missing? Are intermediate files there?
>
> Only 0000 - 001E files were present on slave server.

I don't understand how files can be missing in the replica.
pg_basebackup simply copies all files it can find in the master to the
replica, so if the 111xx files are present in the master they should
certainly be present in the replica as well. I gave the pg_basebackup
code a look just to be sure there is no 4-char pattern matching or
something like that, and it doesn't look like it attempts to do that at
all. I also asked Magnus just to be sure and he confirms this.
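
To illustrate why the file names in the log have five hex digits, here is a
rough sketch of how a multixact member offset maps to a segment file name.
It assumes the default 8 kB block size and the member layout constants as
they appear in the 9.3 sources (groups of four members, each group holding
four flag bytes plus four 4-byte xids, and 32 pages per SLRU segment); the
function names are made up for this example.

```python
# Assumed constants (default build, 9.3-era multixact member layout).
BLCKSZ = 8192
XID_SIZE = 4
FLAG_BYTES_PER_GROUP = 4
MEMBERS_PER_GROUP = 4
GROUP_SIZE = XID_SIZE * MEMBERS_PER_GROUP + FLAG_BYTES_PER_GROUP
GROUPS_PER_PAGE = BLCKSZ // GROUP_SIZE
MEMBERS_PER_PAGE = GROUPS_PER_PAGE * MEMBERS_PER_GROUP
SLRU_PAGES_PER_SEGMENT = 32
MEMBERS_PER_SEGMENT = MEMBERS_PER_PAGE * SLRU_PAGES_PER_SEGMENT


def member_segment_name(offset):
    """File name under pg_multixact/members holding this member offset.

    The name is the segment number in upper-case hex, zero-padded to at
    least four digits; larger segment numbers simply use more digits.
    """
    return format(offset // MEMBERS_PER_SEGMENT, "04X")


def first_offset_in_segment(name):
    """First member offset stored in the segment file with this name."""
    return int(name, 16) * MEMBERS_PER_SEGMENT


if __name__ == "__main__":
    for name in ("001E", "1112D", "13F76"):
        print(name, first_offset_in_segment(name))
```

Under these assumptions, any tool that matched only four-character names
would skip segments such as 1112D, which is why the pattern-matching
question was worth checking.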

I'm playing a bit more with this test case, I'll let you know where it
leads.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment: init (text/plain, 958 bytes)
