Re: Multixact truncation for FK locks patch

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multixact truncation for FK locks patch
Date: 2011-10-05 18:58:42
Message-ID: 1317840445-sup-7142@alvh.no-ip.org
Lists: pgsql-hackers


Excerpts from Alvaro Herrera's message of Mon Sep 26 13:16:24 -0300 2011:

> So I had to look for something else -- and I think I have it: have
> multixact itself track its truncation position relative to Xid. Each
> pg_multixact/offset segment would store ReadNewTransactionId at the
> time it is created. Whenever vacuum attempts to run pg_clog
> truncation, it would also pass that Xid to multixact truncation; this
> would scan existing segments and delete those that come before the one
> marked with the maximum Xid previous to the pg_clog truncation point.

So here's how it would really work in detail.

The first MultiXactId in each segment, i.e. one pg_multixact/offset
entry every 32 pages, stores RecentGlobalXmin at the time the segment is
created, that is, when ZeroOffsetPage is called for the first page of
the segment. Such positions are skipped when handing out MultiXactIds.
(We abuse the system a bit by storing a TransactionId in a place that
normally holds an offset. It's not like we don't do exactly this in
MultiXactIds themselves.) The semantics of this value are "segments
prior to this one can be removed as soon as we freeze Xids behind this
one". (However, note that the value is actually capped to
RecentGlobalXmin.)
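
To make the "skipped positions" idea concrete, here is a minimal sketch
of how an allocator could step over the first slot of each segment. It
assumes 8 kB pages holding 4-byte offsets and 32 pages per segment; the
function names are made up for illustration and are not taken from the
patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t MultiXactId;

/* Assumed layout: 8 kB pages of 4-byte offsets, 32 pages per segment. */
#define OFFSETS_PER_PAGE    (8192 / sizeof(uint32_t))
#define PAGES_PER_SEGMENT   32
#define OFFSETS_PER_SEGMENT (OFFSETS_PER_PAGE * PAGES_PER_SEGMENT)

/*
 * Hypothetical helper: true if this id maps to the first slot of a
 * segment, which under the proposal holds a freezeXid, not an offset.
 */
static bool
mxid_is_freeze_slot(MultiXactId mxid)
{
    return mxid % OFFSETS_PER_SEGMENT == 0;
}

/*
 * Hypothetical helper: return the first id at or after "candidate"
 * that may be handed out as a real MultiXactId.
 */
static MultiXactId
next_assignable_mxid(MultiXactId candidate)
{
    if (mxid_is_freeze_slot(candidate))
        candidate++;            /* only one reserved slot per segment */
    assert(!mxid_is_freeze_slot(candidate));
    return candidate;
}
```

With these assumed sizes a segment covers 65536 ids, so ids 0, 65536,
131072 and so on are never handed out. Id 0 is already
InvalidMultiXactId, so segment zero loses nothing by this.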

The reason that this is correct is that any MultiXactId that we might be
interested in knowing must be after the freeze point: any tuple before
the freeze point must have already been visited by vacuum; so either the
mxact contained only locks (in which case the interesting lifetime is
RecentGlobalXmin, which we already covered above), or it contained at
least one update; and since we're freezing, that update must be either
committed (so the tuple is invisible to everyone and would have been
removed by vacuum), or aborted (in which case the tuple would have been
relabeled HEAP_XMAX_INVALID). /* FIXME what if the update is gone but
was locked by a later transaction? */

pg_control contains a new field, TransactionId mxactFrozenXid. This
value is bootstrapped to InvalidTransactionId.

On CHECKPOINT, we call TruncateMultiXact(oldestXid). (This is the most
recent value we know from VACUUM).

In TruncateMultiXact, if we see that pg_control.mxactFrozenXid is older
than oldestXid, we know we have segments to remove. We scan the whole
directory to remove them, and then update mxactFrozenXid to the value
from the oldest remaining segment. (Both things can be done in a single
scan).
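
A rough sketch of that truncation step, over an in-memory array rather
than a real directory scan. Everything here is an assumption made for
illustration: the OffsetSegment struct, the function names, the
oldest-to-newest ordering of the array, and my reading of the rule as
"keep from the newest segment whose freezeXid does not follow
oldestXid". It also assumes all Xids involved are normal ones, so the
bootstrap value InvalidTransactionId is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical in-memory view of one pg_multixact/offset segment. */
typedef struct
{
    int           segno;        /* segment number */
    TransactionId freezeXid;    /* value stored in the first slot */
    bool          removed;      /* set once the file would be unlinked */
} OffsetSegment;

/* Modulo-2^32 comparison, valid for normal Xids only. */
static bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/*
 * segs[] is ordered oldest to newest.  Marks removable segments and
 * updates *mxactFrozenXid to the freezeXid of the oldest segment kept.
 * Returns the number of segments newly marked as removed.
 */
static size_t
truncate_offsets(OffsetSegment *segs, size_t nsegs,
                 TransactionId oldestXid, TransactionId *mxactFrozenXid)
{
    size_t keep_from = 0;
    size_t nremoved = 0;
    size_t i;

    assert(nsegs > 0);

    /* Nothing to do unless the recorded value is older than oldestXid. */
    if (!xid_precedes(*mxactFrozenXid, oldestXid))
        return 0;

    /* Find the newest segment whose freezeXid is <= oldestXid. */
    for (i = 0; i < nsegs; i++)
        if (!xid_precedes(oldestXid, segs[i].freezeXid))
            keep_from = i;

    /* Everything before that segment can go. */
    for (i = 0; i < keep_from; i++)
    {
        if (!segs[i].removed)
        {
            segs[i].removed = true;
            nremoved++;
        }
    }

    *mxactFrozenXid = segs[keep_from].freezeXid;
    return nremoved;
}
```

The sketch walks the array twice only to keep the two steps visibly
separate. For example, with freezeXids 100, 200, 300, 400 and
oldestXid = 250, only the first segment is removed and mxactFrozenXid
becomes 200.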

I considered the possibility that Xids might advance faster than we
consume multixact segments, causing a segment's freezeXid to wrap
around. After thinking about this, my conclusion is that there isn't
really a problem here (but I'm very open to being proven wrong).

One thing of note is that the first page of the first segment is zeroed
twice: first when it is created by bootstrap, and second when the first
multixactid is created. This is a bit annoying, so I'm going to have
bootstrap set mxact 2 as the first one to be created, not 1. This means
we no longer zero that page twice; it also means we don't set a
freezeXid value uselessly for that page, which would cause all sorts of
issues. (After wraparound, mxact 1 would be assigned normally and
segment zero would behave just like any other segment).

In a directory scan to remove segments, we need to open the first page
of each segment to fetch its freezeXid. Therefore it would be nice if
we could skip doing this if possible. I think the way to do this is to
have the callback keep track of "earliest segment that we have to
keep" and "oldest segment that we can remove". That way, any segment in
between needn't be opened. In reality, I doubt this is going to save
much, because removing segments is not all that frequent anyway (unless
you're eating tons of MultiXactIds), so maybe we shouldn't implement
this bit.

Note that this is all about truncating pg_multixact/offset only. To
truncate pg_multixact/members, we need to check the earliest kept
offsets segment, to know what's the earliest members segment we need to
keep. This is the same as what we do now, IIRC.
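
For illustration, the members cutoff could be derived along these
lines. This is only a sketch: it assumes 4-byte members, 8 kB pages and
32 pages per segment, and it assumes that slot 0 of the offsets page
holds the freezeXid, so the first real offset is read from slot 1. The
function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t MultiXactOffset;

/* Assumed layout: 8 kB pages of 4-byte members, 32 pages per segment. */
#define MEMBERS_PER_PAGE    (8192 / sizeof(uint32_t))
#define PAGES_PER_SEGMENT   32
#define MEMBERS_PER_SEGMENT (MEMBERS_PER_PAGE * PAGES_PER_SEGMENT)

/*
 * Hypothetical helper: given the first page of the earliest kept
 * offsets segment, return the earliest members segment to keep.
 * Slot 0 is assumed to hold the freezeXid, so slot 1 is the first
 * entry that really is an offset into pg_multixact/members.
 */
static uint32_t
earliest_members_segment(const MultiXactOffset *first_page,
                         size_t nentries)
{
    assert(nentries > 1);
    return (uint32_t) (first_page[1] / MEMBERS_PER_SEGMENT);
}
```

Members segments before the returned one hold only members of
multixacts whose offsets entries are already gone.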

Thoughts? Does anybody see any serious flaw?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
