Re: pg_multixact not getting truncated

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: pg_multixact not getting truncated
Date: 2014-11-05 19:15:10
Message-ID: 545A773E.1000004@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11/05/2014 10:40 AM, Jim Nasby wrote:
> On 11/3/14, 7:40 PM, Josh Berkus wrote:
>> On 11/03/2014 05:24 PM, Josh Berkus wrote:
>>> BTW, the reason I started poking into this was a report from a user that
>>> they have a pg_multixact directory which is 21GB in size, and is 2X the
>>> size of the database.
>>>
>>> Here's XID data:
>>>
>>> Latest checkpoint's NextXID: 0/1126461940
>>> Latest checkpoint's NextOID: 371135838
>>> Latest checkpoint's NextMultiXactId: 162092874
>>> Latest checkpoint's NextMultiOffset: 778360290
>>> Latest checkpoint's oldestXID: 945761490
>>> Latest checkpoint's oldestXID's DB: 370038709
>>> Latest checkpoint's oldestActiveXID: 1126461940
>>> Latest checkpoint's oldestMultiXid: 123452201
>>> Latest checkpoint's oldestMulti's DB: 370038709
>>>
>>> Oldest mxid file is 29B2, newest is 3A13
>>>
>>> No tables had a relminmxid of 1 (some of the system tables were 0,
>>> though), and the data from pg_class and pg_database is consistent.
>>
>> More tidbits:
>>
>> I just did a quick check on customer systems (5 of them). This only
>> seems to be happening on customer systems where I specifically know
>> there is a high number of FK lock waits (the system above gets around
>> 1000 per minute that we know of). Other systems with higher transaction
>> rates don't exhibit this issue; I checked a 9.3.5 database which
>> normally needs to do XID wraparound once every 10 days, and it's
>> pg_multixact is only 48K (it has one file, 0000).
>>
>> Note that pg_clog on the bad machine is only 64K in size.
>>
>> How many IDs are there per mxid file?
> #define MULTIXACT_OFFSETS_PER_PAGE (BLCKSZ / sizeof(MultiXactOffset))
>
> So for 8k blocks, there are 2k offsets (really MultiXactIds) per page,
> 32 pages per SLRU segment. Your file names aren't making sense to me. :(
> If I'm doing the math correctly, 29B2 is MXID 699 531 264 and 3A13 is
> 974 323 712. You're only looking in pg_multixact/members/, yes?

These are members, not offsets. What's stored in members?

> Relevant code starts in vacuum.c/vac_update_datfrozenxid()
>
> If there's any rows in pg_class for tables/matviews/toast with either
> relfrozenxid > next XID or relminmxid > next MXID then the code
> *silently* pulls the plug right there. IMO we should at least issue a
> warning.

Wait, how would that situation arise in the first place? Wraparound is
supposed to prevent it.

Mind you, I checked pg_class, and it matches the minmxid shown by
pg_database, so that's not the smoking gun.

> Can you post the contents of pg_multixact/members/?

Well, not as of last week, obviously.

https://gist.github.com/jberkus/d05db3629e8c898664c4

I haven't pasted all the filenames, because, well, look at the count. I
also added the contents of the /offsets directory, for full information.

Note that we've added 400 multixact files since my first email.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kevin Grittner 2014-11-05 20:38:17 Re: tracking commit timestamps
Previous Message Jim Nasby 2014-11-05 18:40:41 Re: pg_multixact not getting truncated