Multixact truncation for FK locks patch

From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Multixact truncation for FK locks patch
Date: 2011-09-26 16:16:24
Message-ID: 1317053656-sup-7193@alvh.no-ip.org
Lists: pgsql-hackers


I've been continuing work on modifying the system to let foreign keys
coexist concurrently with updates that do not touch the "key" columns.
I've made a lot of progress and things seem to be working rather well.
However, I just hit an obstacle that seems problematic: handling
truncation of the MultiXactId space once old values are no longer needed.

I hadn't stopped to think much about this, assuming it would be a
trivial matter of changing multixact.c to work just like clog.c.
However, when it came to actually doing it, I immediately realized that
this cannot work, because the two don't share a common numeric space --
the mxact counter can be anywhere, unrelated to the Xid counter.
There's no way to figure out the multixact truncation point from Xid
data alone.

In search of solutions to this problem, two things came to mind:

1. Track MultiXact offset generation just as we track Xid generation.
This means that after vacuum we immediately know where to truncate.

2. Make them share a common numeric space.

Both of those solutions come at a very high cost.  #1 means that we
need some sort of "frozenmxact" column in pg_class and pg_database. So
we'd know easily and precisely where to truncate mxact; but we would
bloat those catalogs for something that's not as important. (We eat the
cost of maintaining relfrozenxid and datfrozenxid because it's necessary
for the system to work at all; but in the mxact case, we're talking
about something that's merely a concurrency optimization).

In the case of #2, we avoid having to add those columns, by having
multixact offsets be assigned by GetNewTransactionId; thus, we can
easily know where to truncate just by truncating at the same spot that
we truncate pg_clog. The problem with this idea is that there would be
huge areas of pg_multixact/offset that are completely unused, because
they would correspond to the values assigned to Xids themselves. This
would lead to bloat of multixact.  It would also shrink the useful Xid
space, forcing freeze vacuums to run more often.  This
is, of course, completely unacceptable.

So I had to look for something else -- and I think I have it: have
multixact itself track its truncation position relative to Xid. Each
pg_multixact/offset segment would store ReadNewTransactionId at the time
it is created. Whenever vacuum attempts to run pg_clog truncation, it
would also pass that Xid to multixact truncation; this would scan
existing segments and delete those that come before the one marked with
the maximum Xid previous to the pg_clog truncation point.

Doing this will require hacking the SLRU truncation logic a bit, so that
it's possible to have it scan segments with a callback in some fashion.
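
To make that concrete, here is a rough sketch of what a callback-driven
segment scan might look like in slru.c.  The names and signatures
(SlruScanCallback, SlruScanDirectory) are only illustrative -- nothing
like this exists in slru.c today:

#include "postgres.h"
#include "access/slru.h"
#include "storage/fd.h"

/* Return true from the callback to stop the scan early. */
typedef bool (*SlruScanCallback) (SlruCtl ctl, char *filename,
                                  int segpage, void *data);

/*
 * Walk an SLRU directory, invoking the callback once per segment file.
 * Returns true if some callback invocation stopped the scan.
 */
bool
SlruScanDirectory(SlruCtl ctl, SlruScanCallback callback, void *data)
{
    bool        retval = false;
    DIR        *dir;
    struct dirent *de;

    dir = AllocateDir(ctl->Dir);
    while ((de = ReadDir(dir, ctl->Dir)) != NULL)
    {
        int         segno;
        int         segpage;

        /* segment file names are exactly four hex digits */
        if (strlen(de->d_name) != 4 ||
            strspn(de->d_name, "0123456789ABCDEF") != 4)
            continue;

        segno = (int) strtol(de->d_name, NULL, 16);
        segpage = segno * SLRU_PAGES_PER_SEGMENT;
        if (callback(ctl, de->d_name, segpage, data))
        {
            retval = true;
            break;
        }
    }
    FreeDir(dir);

    return retval;
}

The multixact truncation code would then supply a callback that inspects
each segment and decides whether it can be removed.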

Thoughts?

--
Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Multixact truncation for FK locks patch
Date: 2011-10-05 18:58:42
Message-ID: 1317840445-sup-7142@alvh.no-ip.org
Lists: pgsql-hackers


Excerpts from Alvaro Herrera's message of Mon Sep 26 13:16:24 -0300 2011:

> So I had to look for something else -- and I think I have it: have
> multixact itself track its truncation position relative to Xid. Each
> pg_multixact/offset segment would store ReadNewTransactionId at the
> time it is created. Whenever vacuum attempts to run pg_clog
> truncation, it would also pass that Xid to multixact truncation; this
> would scan existing segments and delete those that come before the one
> marked with the maximum Xid previous to the pg_clog truncation point.

So here's how it would really work in detail.

The first MultiXactId slot in each segment, i.e. one pg_multixact/offset
entry every 32 pages, stores RecentGlobalXmin as of the time the segment
is created, that is, when we ZeroOffsetPage the first page of the
segment.  Such positions are skipped when handing out MultiXactIds.  (We
abuse the system a bit by storing a TransactionId in a slot that
normally holds an offset; but then, we already do much the same with
MultiXactIds themselves.)  The semantics of this value are "segments
prior to this one can be removed as soon as we freeze Xids behind this
one".  (However, note that the value is actually capped to
RecentGlobalXmin.)
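
As a rough illustration of the stamping step, inside multixact.c it
could look something like this.  The function name is invented, and the
locking is assumed to follow the normal page-zeroing path (caller holds
the offsets SLRU control lock):

/*
 * Sketch only: when zeroing the first page of a new pg_multixact/offset
 * segment, stash RecentGlobalXmin into entry 0 of that page.
 */
static void
StampOffsetSegmentFreezeXid(int pageno)
{
    int         slotno;
    TransactionId *freezeXidSlot;

    /* only the first page of each 32-page segment carries the stamp */
    if (pageno % SLRU_PAGES_PER_SEGMENT != 0)
        return;

    slotno = SimpleLruZeroPage(MultiXactOffsetCtl, pageno);

    /* abuse entry 0, which would normally hold a MultiXactOffset */
    freezeXidSlot = (TransactionId *)
        MultiXactOffsetCtl->shared->page_buffer[slotno];
    *freezeXidSlot = RecentGlobalXmin;

    /*
     * SimpleLruZeroPage has already marked the page dirty, so the stored
     * value will reach disk together with the rest of the page.
     */
}

GetNewMultiXactId would correspondingly skip entry 0 of such pages when
handing out new values.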

The reason that this is correct is that any MultiXactId we might still
be interested in must be after the freeze point: any tuple before
the freeze point must have already been visited by vacuum; so either the
mxact contained only locks (in which case the interesting lifetime is
RecentGlobalXmin, which we already covered above), or it contained at
least one update; and since we're freezing, that update must be either
committed (so the tuple is invisible to everyone and would have been
removed by vacuum), or aborted (in which case the tuple would have been
relabeled HEAP_XMAX_INVALID). /* FIXME what if the update is gone but
was locked by a later transaction? */

pg_control contains a new field, TransactionId mxactFrozenXid.  This
value is bootstrapped to InvalidTransactionId.

On CHECKPOINT, we call TruncateMultiXact(oldestXid). (This is the most
recent value we know from VACUUM).

In TruncateMultiXact, if we see that pg_control.mxactFrozenXid is older
than oldestXid, we know we have segments to remove. We scan the whole
directory to remove them, and then update mxactFrozenXid to the value
from the oldest remaining segment. (Both things can be done in a single
scan).
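
Putting it together, TruncateMultiXact could look roughly like the
sketch below, using a directory-scan callback like the one sketched
upthread.  GetOffsetSegmentFreezeXid, GetMultiXactFrozenXid and
SetMultiXactFrozenXid are invented helpers (reading the stamped value
from a segment's first entry, and reading/updating the pg_control
field); for simplicity the sketch uses a locate pass followed by
SimpleLruTruncate rather than folding everything into one scan:

/* State carried through the directory scan; names are illustrative. */
typedef struct
{
    TransactionId   cutoffXid;       /* oldestXid handed down by vacuum */
    int             boundarySegpage; /* newest segment with freezeXid < cutoff */
    TransactionId   newFrozenXid;    /* its stored freezeXid */
} mxtruncinfo;

static bool
FindTruncationPoint(SlruCtl ctl, char *filename, int segpage, void *data)
{
    mxtruncinfo *info = (mxtruncinfo *) data;
    /* hypothetical helper: read the TransactionId stashed in entry 0 */
    TransactionId freezeXid = GetOffsetSegmentFreezeXid(segpage);

    if (TransactionIdIsValid(freezeXid) &&
        TransactionIdPrecedes(freezeXid, info->cutoffXid) &&
        segpage > info->boundarySegpage)    /* ignores segment wraparound */
    {
        info->boundarySegpage = segpage;
        info->newFrozenXid = freezeXid;
    }
    return false;               /* visit every segment */
}

void
TruncateMultiXact(TransactionId oldestXid)
{
    mxtruncinfo info;

    /* nothing to do unless vacuum's freeze point has passed ours */
    if (!TransactionIdPrecedes(GetMultiXactFrozenXid(), oldestXid))
        return;

    info.cutoffXid = oldestXid;
    info.boundarySegpage = -1;
    info.newFrozenXid = InvalidTransactionId;

    (void) SlruScanDirectory(MultiXactOffsetCtl, FindTruncationPoint, &info);

    if (info.boundarySegpage < 0)
        return;                 /* no segment is old enough yet */

    /* drop every offsets segment older than the boundary segment */
    SimpleLruTruncate(MultiXactOffsetCtl, info.boundarySegpage);

    /* remember how far we have truncated (pg_control update elided) */
    SetMultiXactFrozenXid(info.newFrozenXid);
}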

I considered the idea that Xids might advance faster than we consume
multixact segments, making the freezeXid stored in an offsets segment
wrap around.  After thinking about this, my conclusion is that there
isn't really a problem here (but I'm very open to being mistaken).

One thing of note is that the first page of the first segment is zeroed
twice: first when it is created by bootstrap, and second when the first
multixactid is created. This is a bit annoying, so I'm going to have
bootstrap set mxact 2 as the first one to be created, not 1. This means
we no longer zero that page twice; it also means we don't set a
freezeXid value uselessly for that page, which causes all sorts of
issues. (After wraparound, mxact 1 would be assigned normally and
segment zero would behave just like any other segment).

In a directory scan to remove segments, we need to open the first page
of each segment to fetch its freezeXid.  It would be nice to avoid that
where possible.  I think the way to do this is to
have the callback keep track of "earliest segment that we have to
keep" and "oldest segment that we can remove". That way, any segment in
between needn't be opened. In reality, I doubt this is going to save
much, because removing segments is not all that frequent anyway (unless
you're eating tons of MultiXactIds), so maybe we shouldn't implement
this bit.

Note that this is all about truncating pg_multixact/offset only.  To
truncate pg_multixact/members, we need to check the earliest kept
offsets segment in order to know which is the earliest members segment
we need to keep.  This is the same as what we do now, IIRC.
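
For reference, a minimal sketch of that members-side step.  The function
name and find_multixact_start() are placeholders (the latter standing in
for reading the starting member offset of the given multixact out of
pg_multixact/offset); the rest follows what TruncateMultiXact already
does for members today:

static void
TruncateMembers(MultiXactId oldestKeptMXact)
{
    MultiXactOffset oldestOffset;
    int             cutoffPage;

    /* starting member offset of the oldest multixact we keep */
    oldestOffset = find_multixact_start(oldestKeptMXact);

    /* everything before this members page belongs to removed multixacts */
    cutoffPage = MXOffsetToMemberPage(oldestOffset);
    SimpleLruTruncate(MultiXactMemberCtl, cutoffPage);
}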

Thoughts? Does anybody see any serious flaw?

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support