Re: [WIP] shared locks

From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: [WIP] shared locks
Date: 2005-04-18 23:22:53
Message-ID: 20050418232253.GA23483@surnet.cl
Lists: pgsql-hackers pgsql-patches

Hackers,

Here is another attempt at the shared locks patch. This is a radically
different approach, using Xmax in the tuples and SLRU areas instead of
the lock manager.

The idea is that a tuple's Xmax can either be a real TransactionId
(which is used normally like current CVS tip), or, if the infomask has
HEAP_XMAX_SHARED_LOCK, a MultiXactId.

A MultiXactId is an abstraction for a set of TransactionIds. So a
locker can sleep on the set (effectively calling XactLockTableWait on
each member), add itself to a previously existing set, or create a new
set with only itself.
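
In rough pseudo-C, the waiting path looks about like this (a simplified
sketch; MultiXactIdWait() here just stands for the per-member
XactLockTableWait() loop and isn't necessarily what the patch calls it):

    uint16        infomask = tuple->t_data->t_infomask;
    TransactionId xmax = HeapTupleHeaderGetXmax(tuple->t_data);

    if (infomask & HEAP_XMAX_SHARED_LOCK)
    {
        /* Xmax is a MultiXactId: sleep on every member of the set */
        MultiXactIdWait((MultiXactId) xmax);
    }
    else
    {
        /* Xmax is an ordinary TransactionId, as in current CVS tip */
        XactLockTableWait(xmax);
    }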

MultiXactIds are implemented using two SLRU areas and a couple of
variables in ShmemVariableCache. We also XLog groups of them just like
we do for Oids.
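
Conceptually the two areas work like a pair of arrays (the names below
are only for illustration, not the patch's API): one maps each
MultiXactId to the starting position of its member list, the other is a
flat array of member TransactionIds, so the members of multi M are the
entries between offset(M) and offset(M + 1):

    start = GetMultiXactOffset(multi);
    next  = GetMultiXactOffset(multi + 1);
    for (i = start; i < next; i++)
        members[i - start] = GetMultiXactMember(i);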

This patch is a work in progress. I need to test it more, but the basic
infrastructure is written and working. I'd love some comments on the
basic design idea.

The patch applies cleanly to current CVS tip. There are two new files,
src/backend/access/transam/multixact.c and
src/include/access/multixact.h, which are included separately.

Thanks,

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"I dream about dreams about dreams", sang the nightingale
under the pale moon (Sandman)

Attachment Content-Type Size
multixact.c text/x-csrc 27.1 KB
multixact.h text/x-chdr 1.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [WIP] shared locks
Date: 2005-04-19 00:00:57
Message-ID: 1207.1113868857@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> The idea is that a tuple's Xmax can either be a real TransactionId
> (which is used normally like current CVS tip), or, if the infomask has
> HEAP_XMAX_SHARED_LOCK, a MultiXactId.

Interesting idea. Would it be possible to invoke this mechanism only
when actually needed --- that is, the first locker of a given tuple
puts his plain TransactionId into Xmax (and also sets an infomask bit
indicating his intent to have a shared rather than exclusive lock),
and then the second locker to come along replaces the TransactionId
with a MultiTransactionId including himself and the first locker?

This requires 2 infomask bits: 1 for shared vs exclusive lock and one
for whether the Xmax contains a TransactionId or MultiTransactionId.
But we have them available, and I think I like keeping those concepts
separate anyway. (Who's to say we wouldn't want to allow a
MultiTransactionId to hold an exclusive lock, someday?)

The advantage of course would be substantially less overhead in the very
common case where there's no actual contention for the tuple.
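
In sketch form the locking path could go about like this (the bit and
helper names are invented for the example, not taken from the patch):

    if (infomask & HEAP_XMAX_IS_MULTI)
    {
        /* several lockers already: just add ourselves to the set */
        new_xmax = MultiXactIdExpand(xmax, GetCurrentTransactionId());
    }
    else if (TransactionIdIsInProgress(xmax))
    {
        /*
         * Second locker arriving: replace the plain TransactionId with
         * a MultiTransactionId holding the first locker and ourselves.
         */
        new_xmax = MultiXactIdCreate(xmax, GetCurrentTransactionId());
        new_infomask |= HEAP_XMAX_IS_MULTI;
    }
    else
    {
        /* uncontended case: just store our own TransactionId */
        new_xmax = GetCurrentTransactionId();
    }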

> MultiXactIds are implemented using two SLRU areas and a couple of
> variables in ShmemVariableCache. We also XLog groups of them just like
> we do for Oids.

So no need for expansible shmem storage? Might be the way to go.
I haven't read the patch yet but the idea sounds promising.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [WIP] shared locks
Date: 2005-04-19 01:53:38
Message-ID: 200504190153.j3J1rc829706@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> > The idea is that a tuple's Xmax can either be a real TransactionId
> > (which is used normally like current CVS tip), or, if the infomask has
> > HEAP_XMAX_SHARED_LOCK, a MultiXactId.
>
> Interesting idea. Would it be possible to invoke this mechanism only
> when actually needed --- that is, the first locker of a given tuple
> puts his plain TransactionId into Xmax (and also sets an infomask bit
> indicating his intent to have a shared rather than exclusive lock),
> and then the second locker to come along replaces the TransactionId
> with a MultiTransactionId including himself and the first locker?
>
> This requires 2 infomask bits: 1 for shared vs exclusive lock and one
> for whether the Xmax contains a TransactionId or MultiTransactionId.
> But we have them available, and I think I like keeping those concepts
> separate anyway. (Who's to say we wouldn't want to allow a
> MultiTransactionId to hold an exclusive lock, someday?)
>
> The advantage of course would be substantially less overhead in the very
> common case where there's no actual contention for the tuple.

Yes, that is certainly possible. Alvaro felt he wanted something
simpler and that the two-bit case would add complexity, but I agree it
would reduce overhead in the most common case.

> > MultiXactIds are implemented using two SLRU areas and a couple of
> > variables in ShmemVariableCache. We also XLog groups of them just like
> > we do for Oids.
>
> So no need for expansible shmem storage? Might be the way to go.
> I haven't read the patch yet but the idea sounds promising.

Right. What he does is use something like pg_subtrans to keep a
rolling window of current multi-xid sets; the idea is that most
accesses will fall within a small window that fits easily in memory.
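
For instance, the mapping could be as simple as this (illustrative
macros, not what the patch actually uses):

    /* Each SLRU page holds a fixed number of entries, and only a few
     * recent pages need to be resident in shared memory at a time. */
    #define MULTIXACTS_PER_PAGE   (BLCKSZ / sizeof(TransactionId))
    #define MultiXactIdToPage(x)  ((x) / MULTIXACTS_PER_PAGE)
    #define MultiXactIdToEntry(x) ((x) % MULTIXACTS_PER_PAGE)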

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [WIP] shared locks
Date: 2005-04-19 04:00:40
Message-ID: 20050419040040.GA6050@dcc.uchile.cl
Lists: pgsql-hackers pgsql-patches

On Mon, Apr 18, 2005 at 09:53:38PM -0400, Bruce Momjian wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> > > The idea is that a tuple's Xmax can either be a real TransactionId
> > > (which is used normally like current CVS tip), or, if the infomask has
> > > HEAP_XMAX_SHARED_LOCK, a MultiXactId.
> >
> > Interesting idea. Would it be possible to invoke this mechanism only
> > when actually needed --- that is, the first locker of a given tuple
> > puts his plain TransactionId into Xmax (and also sets an infomask bit
> > indicating his intent to have a shared rather than exclusive lock),
> > and then the second locker to come along replaces the TransactionId
> > with a MultiTransactionId including himself and the first locker?
> >
> > This requires 2 infomask bits: 1 for shared vs exclusive lock and one
> > for whether the Xmax contains a TransactionId or MultiTransactionId.
> > But we have them available, and I think I like keeping those concepts
> > separate anyway. (Who's to say we wouldn't want to allow a
> > MultiTransactionId to hold an exclusive lock, someday?)
> >
> > The advantage of course would be substantially less overhead in the very
> > common case where there's no actual contention for the tuple.
>
> Yes, that is certainly possible. Alvaro felt he wanted something
> simpler and that the two-bit case would add complexity, but I agree it
> would reduce overhead in the most common case.

I had thought it would make things more complicated. Now that I know
how the whole thing works, I can handle the extra complexity, which
really isn't much. Also I wasn't sure if we wanted to waste two
infomask bits on this :-)

> > > MultiXactIds are implemented using two SLRU areas and a couple of
> > > variables in ShmemVariableCache. We also XLog groups of them just like
> > > we do for Oids.
> >
> > So no need for expansible shmem storage? Might be the way to go.

Right. I have stashed some info (like next MultiXactId to assign, the
first MultiXactId this transaction was assigned, etc) in
ShmemVariableCache and PGPROC, but I'm now thinking of storing it
in a [fixed size] shmem area private to multixact.c; this way I don't
have to lock SInvalLock.

BTW, I had to use three additional LWLocks: two for SLRU and one for
MultiXactId generation, which also covers the ShmemVariableCache
variables. I hope that's OK.
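
Just to illustrate the idea, the private area could look roughly like
this (all names tentative):

    /* Fixed-size shared state private to multixact.c (sketch only) */
    typedef struct MultiXactStateData
    {
        MultiXactId nextMXact;        /* next MultiXactId to assign */
        uint32      nextOffset;       /* next free slot in the member area */
        /* first MultiXactId assigned in each backend's current xact */
        MultiXactId perBackendXactIds[1];   /* VARIABLE LENGTH ARRAY */
    } MultiXactStateData;

with the generation LWLock protecting those fields and the two SLRU
areas keeping their own control locks.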

--
Alvaro Herrera (<alvherre[(at)]dcc(dot)uchile(dot)cl>)
"No es bueno caminar con un hombre muerto"


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [WIP] shared locks
Date: 2005-04-24 19:15:37
Message-ID: 20050424191537.GB30545@dcc.uchile.cl
Lists: pgsql-hackers pgsql-patches

On Mon, Apr 18, 2005 at 08:00:57PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> > The idea is that a tuple's Xmax can either be a real TransactionId
> > (which is used normally like current CVS tip), or, if the infomask has
> > HEAP_XMAX_SHARED_LOCK, a MultiXactId.
>
> Interesting idea. Would it be possible to invoke this mechanism only
> when actually needed --- that is, the first locker of a given tuple
> puts his plain TransactionId into Xmax (and also sets an infomask bit
> indicating his intent to have a shared rather than exclusive lock),
> and then the second locker to come along replaces the TransactionId
> with a MultiTransactionId including himself and the first locker?

Ok, here is the patch again. I did this, so there are now two related
bits in the infomask: HEAP_XMAX_IS_MULTI and
HEAP_XMAX_{SHARED,EXCLUSIVE}_LOCK. (I ripped out HEAP_XMAX_FOR_UPDATE).
Locking and using a MultiXactId are orthogonal.
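
To spell out the orthogonality, a reader of the tuple header now
conceptually answers two independent questions (illustrative sketch,
not code lifted from the patch):

    /* How should Xmax be read: one xid, or a set of xids? */
    bool    xmax_is_multi = (infomask & HEAP_XMAX_IS_MULTI) != 0;

    /* Independently: how strong is the lock? */
    bool    lock_is_shared = (infomask & HEAP_XMAX_SHARED_LOCK) != 0;

which is what leaves the door open for an exclusive-locking
MultiXactId someday, as Tom suggested.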

The rest is more or less the same as in the original patch. I feel
this is in an OK state for review for possible inclusion. Some testing
is still needed regarding MultiXactId wraparound and SLRU truncation,
and I haven't looked at whether the documentation needs updating.

--
Alvaro Herrera (<alvherre[(at)]dcc(dot)uchile(dot)cl>)
Jude: I wish humans laid eggs
Ringlord: Why would you want humans to lay eggs?
Jude: So I can eat them

Attachment Content-Type Size
shared-locks-2.patch text/plain 105.6 KB
multixact.c text/plain 36.4 KB
multixact.h text/plain 1.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [PATCHES] [WIP] shared locks
Date: 2005-04-27 23:05:40
Message-ID: 3122.1114643140@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Found another interesting thing while testing this. I got a core dump
from the Assert in GetMultiXactIdMembers, complaining that it was being
asked about a MultiXactId >= nextMXact. Sure enough, there was a
multixact on disk, left over from a previous core-dumped test, that was
larger than the nextMXact the current postmaster had started with.

My interpretation of this is that the MultiXact code is violating the
fundamental WAL rule, namely it is allowing data (multixact IDs in data
pages) to reach disk before the relevant WAL record (here the NEXTMULTI
record that should have advanced nextMXact) got to disk. It is very
easy for this to happen in the current system if the buffer page LSNs
aren't updated properly, because the bgwriter will be industriously
dumping dirty pages in the background.

AFAICS there isn't any very convenient way of propagating the true
location of the NEXTMULTI record into the page LSNs of the buffers that
heap_lock_tuple might stick relevant multi IDs into. What's probably
the easiest solution is for XLogPutNextMultiXactId to XLogFlush the
NEXTMULTI record before it returns. This is a mite annoying for
concurrency (because we'll have to hold MultiXactGenLock while flushing
xlog) but it should occur rarely enough to not be a huge deal.
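
In other words, roughly (hand-waving the XLogRecData setup, and using
XLOG_NEXTMULTI loosely for whatever the record type is actually called;
I'm assuming it lives beside NEXTOID in the xlog rmgr):

    recptr = XLogInsert(RM_XLOG_ID, XLOG_NEXTMULTI, &rdata);

    /*
     * Don't return (or release MultiXactGenLock) until the record is
     * safely on disk, so no heap page carrying a multi ID from the
     * newly-logged range can beat it there.
     */
    XLogFlush(recptr);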

At this point you're probably wondering why OID generation hasn't got
exactly the same problem, seeing that you borrowed all this logic from
the OID generator. The answer is that it would have the same problem,
except that an OID can only get onto disk as part of a tuple insert or
update, and all such events generate xlog records that must follow any
relevant NEXTOID record. Those records *will* get into the page LSNs,
and so the WAL rule is enforced.

So the problem would go away if heap_lock_tuple were generating any xlog
record of its own, which it might be doing by the time the 2PC dust
settles.

Plan B would be to decide that a multi ID that's >= nextMXact isn't
worthy of an Assert failure, but ought to be treated as just a dead
multixact. I'm kind of inclined to do that anyway, because I am not
convinced that this code guarantees no wraparound of multi IDs.
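
That is, in GetMultiXactIdMembers do something like this instead of the
Assert (assuming it reports the member count; the comparison helper is
a made-up name):

    if (!MultiXactIdPrecedes(multi, nextMXact))
        return 0;       /* apparently-future multi: treat as dead,
                         * no live members to wait for */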

Thoughts?

regards, tom lane