Re: Serializable Snapshot Isolation

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>
Subject: Serializable Snapshot Isolation
Date: 2010-09-14 16:34:10
Message-ID: 4C8F5DB202000025000356A0@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Attached is the latest Serializable Snapshot Isolation (SSI) patch.

With Joe's testing and review, and with stress tests adapted from
those used by Florian for his patch, we were able to identify and
fix several bugs. Stability seems good now. We have many tests for
correct behavior which are all looking good. The only solid
benchmarks we have so far show no impact on isolation levels other
than SERIALIZABLE, and a 1.8% increase in run time for a saturation
run of small, read only SERIALIZABLE transactions against a fully
cached database. Dan has been working on setting up some benchmarks
using DBT-2, but doesn't yet have results to publish. If we can get
more eyes on the code during this CF, I'm hoping we can get this
patch committed this round.

This patch is basically an implementation of the techniques
described in the 2008 paper by Cahill et al, and which was further
developed in Cahill's 2009 PhD thesis. Techniques needed to be
adapted somewhat because of differences between PostgreSQL and the
two databases used for prototype implementations for those papers
(Oracle Berkeley DB and InnoDB), and there are a few original ideas
from Dan and myself used to optimize the implementation. One reason
for hoping that this patch gets committed in this CF is that it will
leave time to try out some other, more speculative optimizations
before release.

Documentation is not included in this patch; I plan on submitting
that to a later CF as a separate patch. Changes should be almost
entirely within the Concurrency Control chapter. The current patch
has one new GUC which (if kept) will need to be documented, and one
of the potential optimizations could involve adding a new
transaction property which would then need documentation.

The premise of the patch is simple: that snapshot isolation comes so
close to supporting fully serializable transactions that S2PL is not
necessary -- the database engine can watch for rw-dependencies among
transactions, without introducing any blocking, and roll back
transactions as required to prevent serialization anomalies. This
eliminates the need for using the SELECT FOR SHARE or SELECT FOR
UPDATE clauses, the need for explicit locking, and the need for
additional updates to introduce conflict points.

While block-level locking is included in this patch for btree and
GiST indexes, an index relation lock is still used for predicate
locks when a search is made through a GIN or hash index. These
additional index types can be implemented separately. Dan is
looking at bringing btree indexes to finer granularity, but wants to
have good benchmarks first, to confirm that the net impact is a gain
in performance.

Most of the work is in the new predicate.h and predicate.c files,
which total 2,599 lines, over 39% of which are comment lines. There
are 1626 lines in the new pg_dtester.py.in files, which uses Markus
Wanner's dtester software to implement a large number of correctness
tests. We added 79 lines to lockfuncs.c to include the new
SIReadLock entries in the pg_locks view. The rest of the patch
affects 286 lines (counting an updated line twice) across 25
existing PostgreSQL source files to implement the actual feature.

The code organization and naming issues mentioned here remain:

http://archives.postgresql.org/pgsql-hackers/2010-07/msg00383.php

-Kevin

Attachment Content-Type Size
serializable-5.patch text/plain 185.2 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org, Dan Ports <drkp(at)csail(dot)mit(dot)edu>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-14 19:48:50
Message-ID: 4C8FD1A2.9080704@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 14/09/10 19:34, Kevin Grittner wrote:
> Attached is the latest Serializable Snapshot Isolation (SSI) patch.

Great work! A year ago I thought it would be impossible to have a true
serializable mode in PostgreSQL because of the way we do MVCC, and now
we have a patch.

At a quick read-through, the code looks very tidy and clear now. Some
comments:

Should add a citation to Cahill's work this is based on. Preferably with
a hyperlink. A short description of how the predicate locks help to
implement serializable mode would be nice too. I haven't read Cahill's
papers, and I'm left wondering what the RW conflicts and dependencies
are, when you're supposed to grab predicate locks etc.

If a page- or relation level SILOCK is taken, is it possible to get
false conflicts? Ie. a transaction is rolled back because it modified a
tuple on a page where some other transaction modified another tuple,
even though there's no dependency between the two.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-14 20:42:37
Message-ID: 4C8F97ED02000025000356DF@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Great work! A year ago I thought it would be impossible to have a
> true serializable mode in PostgreSQL because of the way we do
> MVCC, and now we have a patch.
>
> At a quick read-through, the code looks very tidy and clear now.
> Some comments:
>
> Should add a citation to Cahill's work this is based on.
> Preferably with a hyperlink.

I'm planning on drawing from the current Wiki page:

http://wiki.postgresql.org/wiki/Serializable

to put together a README file; do you think the references should go
in the README file, the source code, or both?

> A short description of how the predicate locks help to implement
> serializable mode would be nice too. I haven't read Cahill's
> papers, and I'm left wondering what the RW conflicts and
> dependencies are, when you're supposed to grab predicate locks
> etc.

Again, I summarize that in the Wiki page, and was planning on
putting it into the README. If you've read the Wiki page and it's
not clear, then I definitely have some work to do there.

> If a page- or relation level SILOCK is taken, is it possible to
> get false conflicts?

Yes. This technique will generate some false positive rollbacks.
Software will need to be prepared to retry any database transaction
which fails with a serialization failure SQLSTATE. I expect that
proper connection pooling will be particularly important when using
SSI, and flagging transactions which don't write to permanent tables
as READ ONLY transactions will help reduce the rollback rate, too.

Some of the optimizations we have sketched out will definitely
reduce the rate of false positives; however, we don't want to
implement them without a better performance baseline because the
cost of tracking the required information and the contention for LW
locks to maintain the information may hurt performance more than the
restart of transactions which experience serialization failure.

I don't want to steal Dan's thunder after all the hard work he's
done to get good numbers from the DBT-2 benchmark, but suffice it to
say that I've been quite pleased with the performance on that
benchmark. He's pulling together the data for a post on the topic.

> Ie. a transaction is rolled back because it modified a tuple on a
> page where some other transaction modified another tuple, even
> though there's no dependency between the two.

Well, no, because this patch doesn't do anything new with write
conflicts. It's all about the apparent order of execution, based on
one transaction not being able to read what was written by a
concurrent transaction. The reading transaction must be considered
to have run first in that case (Hey, now you know what a rw-conflict
is!) -- but such references can create a cycle -- which is the
source of all serialization anomalies in snapshot isolation.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-14 21:49:48
Message-ID: 4C8FA7AC02000025000356ED@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've been thinking about these points, and reconsidered somewhat.

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Should add a citation to Cahill's work this is based on.
> Preferably with a hyperlink.

I've been thinking that this should be mentioned in both the README
and the source code.

> A short description of how the predicate locks help to implement
> serializable mode would be nice too. I haven't read Cahill's
> papers, and I'm left wondering what the RW conflicts and
> dependencies are, when you're supposed to grab predicate locks
> etc.

Again -- why be stingy? Given a more complete README file, how
about something like?:

/*
* A rw-conflict occurs when a read by one serializable transaction
* does not see the write of a concurrent serializable transaction
* when that write would have been visible had the writing
* transaction committed before the start of the reading
* transaction. When the write occurs first, the read can detect
* this conflict by examining the MVCC information. When the read
* occurs first, it must record this somewhere so that writes can
* check for a conflict. Predicate locks are used for this.
* Detection of such a conflict does not cause blocking, and does
* not, in itself, cause a transaction rollback.
*
* Transaction rollback is required when one transaction (called a
* "pivot") has a rw-conflict *in* (a concurrent transaction
* couldn't see its write) as well as *out* (it couldn't see the
* write of another transaction). In addition, the transaction on
* the "out" side of the pivot must commit first, and if the
* transaction on the "in" side of the pivot is read-only, it must
* acquire its snapshot after the successful commit of the
* transaction on the "out" side of the pivot.
*/

Would something like that have helped?

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-15 07:49:19
Message-ID: 4C907A7F.4000607@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 15/09/10 00:49, Kevin Grittner wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> A short description of how the predicate locks help to implement
>> serializable mode would be nice too. I haven't read Cahill's
>> papers, and I'm left wondering what the RW conflicts and
>> dependencies are, when you're supposed to grab predicate locks
>> etc.
>
> Again -- why be stingy? Given a more complete README file, how
> about something like?:

Well, if it's explained in the readme, that's probably enough.

> /*
> * A rw-conflict occurs when a read by one serializable transaction
> * does not see the write of a concurrent serializable transaction
> * when that write would have been visible had the writing
> * transaction committed before the start of the reading
> * transaction. When the write occurs first, the read can detect
> * this conflict by examining the MVCC information. When the read
> * occurs first, it must record this somewhere so that writes can
> * check for a conflict. Predicate locks are used for this.
> * Detection of such a conflict does not cause blocking, and does
> * not, in itself, cause a transaction rollback.
> *
> * Transaction rollback is required when one transaction (called a
> * "pivot") has a rw-conflict *in* (a concurrent transaction
> * couldn't see its write) as well as *out* (it couldn't see the
> * write of another transaction). In addition, the transaction on
> * the "out" side of the pivot must commit first, and if the
> * transaction on the "in" side of the pivot is read-only, it must
> * acquire its snapshot after the successful commit of the
> * transaction on the "out" side of the pivot.
> */
>
> Would something like that have helped?

Yes. An examples would be very nice too, that description alone is
pretty hard to grasp. Having read the Wiki page, and the slides from
your presentation at pg east 2010, I think understand it now.

Now that I understand what the predicate locks are for, I'm now trying
to get my head around all the data structures in predicate.c. The
functions are well commented, but an overview at the top of the file of
all the hash tables and other data structures would be nice. What is
stored in each, when are they updated, etc.

I've been meaning to look at this patch for some time, but now I'm
actually glad I haven't because I'm now getting a virgin point of view
on the code, seeing the problems that anyone who's not familiar with the
approach will run into. :-)

BTW, does the patch handle prepared transactions yet? It introduces a
call to PreCommit_CheckForSerializationFailure() in CommitTransaction, I
think you'll need that in PrepareTransaction as well.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-15 13:15:53
Message-ID: 4C9080B9020000250003573D@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Now that I understand what the predicate locks are for, I'm now
> trying to get my head around all the data structures in
> predicate.c. The functions are well commented, but an overview at
> the top of the file of all the hash tables and other data
> structures would be nice. What is stored in each, when are they
> updated, etc.

It probably doesn't help that they're split between predicate.c and
predicate.h. (They were originally all in predicate.c because
nobody else needed to see them, but we moved some to the .h file to
expose them to lockfuncs.c to support listing the locks.)

I'm inclined to move everything except the function prototypes out
of predicate.h to a new predicate_interal.h, and move the structures
defined in predicate.c there, too. And, of course, add the overview
comments in the new file. If that sounds good, I can probably
post a new patch with those changes today -- would that be a good
idea, or should I wait for more feedback before doing that? (It
will be in the git repo either way.)

> BTW, does the patch handle prepared transactions yet? It
> introduces a call to PreCommit_CheckForSerializationFailure() in
> CommitTransaction, I think you'll need that in PrepareTransaction
> as well.

Good point. In spite of the NB comment, I did not notice that.
Will fix.

Thanks for the feedback!

-Kevin


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-15 18:29:11
Message-ID: 1284575232-sup-3245@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Kevin Grittner's message of mié sep 15 09:15:53 -0400 2010:

> I'm inclined to move everything except the function prototypes out
> of predicate.h to a new predicate_interal.h, and move the structures
> defined in predicate.c there, too.

I think that would also solve a concern that I had, which is that we
were starting to include relcache.h (and perhaps other headers as well,
but that's the one that triggered it for me) a bit too liberally, so +1
from me.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-15 18:52:36
Message-ID: 4C90CFA40200002500035798@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> I think that would also solve a concern that I had, which is that
> we were starting to include relcache.h (and perhaps other headers
> as well, but that's the one that triggered it for me) a bit too
> liberally, so +1 from me.

Unfortunately, what I proposed doesn't solve that for relcache.h,
although it does eliminate lock.h from almost everywhere and htup.h
from everywhere. (The latter seemed to be left over from an
abandoned approach, and was no longer needed in predicate.h in any
event.)

Most of the functions in predicate.c take a Relation as a parameter.
I could split out the function prototypes for those which *don't*
use it to a separate .h file if you think it is worthwhile. The
functions would be:

void InitPredicateLocks(void);
Size PredicateLockShmemSize(void);
void RegisterSerializableTransaction(const Snapshot snapshot);
void ReleasePredicateLocks(const bool isCommit);
void PreCommit_CheckForSerializationFailure(void);

The files where these are used are:

src/backend/storage/ipc/ipci.c
src/backend/utils/time/snapmgr.c
src/backend/utils/resowner/resowner.c
src/backend/access/transam/xact.c

So any of these files which don't already include relcache.h could
remain without it if we make this split. Is there an easy way to
check which might already include it? Is it worth adding one more
.h file to avoid including relcache.h and snapshot.h in these four
files?

Let me know -- I'm happy to arrange this any way people feel is most
appropriate. I have a profound appreciation for the organization of
this code, and want to maintain it, even if I don't possess the
perspective to know how to best do so. The respect comes from
developing this patch -- every time I gave my manager an estimate of
how long it would take to do something, I found it actually took
about one-third of that time -- and it was entirely due to the
organization and documentation of the code.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-16 22:35:10
Message-ID: 4C92554E02000025000358F0@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> The functions are well commented, but an overview at the top of
> the file of all the hash tables and other data structures would be
> nice. What is stored in each, when are they updated, etc.

I moved all the structures from predicate.h and predicate.c to a new
predicate_internal.h file and added comments. You can view its
current contents here:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=blob;f=src/include/storage/predicate_internal.h;h=7cdb5af6eebdc148dd5ed5030847ca50d7df4fe8;hb=7f05b21bc4d846ad22ae8c160b1bf8888495e254

Does this work for you?

That leaves the predicate.h file with just this:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=blob;f=src/include/storage/predicate.h;h=7dcc2af7628b860f9cec9ded6b78f55163b58934;hb=7f05b21bc4d846ad22ae8c160b1bf8888495e254

-Kevin


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-16 22:58:06
Message-ID: 1284677615-sup-2005@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Kevin Grittner's message of mié sep 15 14:52:36 -0400 2010:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
>
> > I think that would also solve a concern that I had, which is that
> > we were starting to include relcache.h (and perhaps other headers
> > as well, but that's the one that triggered it for me) a bit too
> > liberally, so +1 from me.
>
> Unfortunately, what I proposed doesn't solve that for relcache.h,
> although it does eliminate lock.h from almost everywhere and htup.h
> from everywhere.

Now that I look at your new patch, I noticed that I was actually
confusing relcache.h with rel.h. The latter includes a big chunk of our
headers, but relcache.h is pretty thin. Including relcache.h in another
header is not much of a problem.

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-16 23:09:45
Message-ID: 4C925D6902000025000358F6@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> Now that I look at your new patch, I noticed that I was actually
> confusing relcache.h with rel.h. The latter includes a big chunk
> of our headers, but relcache.h is pretty thin. Including
> relcache.h in another header is not much of a problem.

OK, thanks for the clarification.

With the structures all brought back together in a logical order,
and the new comments in front of the structure declarations, do you
think a summary at the top of the file is still needed in that
header file?

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-17 06:11:35
Message-ID: 4C930697.9090004@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 17/09/10 01:35, Kevin Grittner wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
>> The functions are well commented, but an overview at the top of
>> the file of all the hash tables and other data structures would be
>> nice. What is stored in each, when are they updated, etc.
>
> I moved all the structures from predicate.h and predicate.c to a new
> predicate_internal.h file and added comments. You can view its
> current contents here:
>
> http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=blob;f=src/include/storage/predicate_internal.h;h=7cdb5af6eebdc148dd5ed5030847ca50d7df4fe8;hb=7f05b21bc4d846ad22ae8c160b1bf8888495e254
>
> Does this work for you?

Yes, thank you, that helps a lot.

So, the purpose of SerializableXidHash is to provide quick access to the
SERIALIZABLEXACT struct of a top-level transaction, when you know its
transaction id or any of its subtransaction ids. To implement the "or
any of its subtransaction ids" part, you need to have a SERIALIZABLEXID
struct for each subtransaction in shared memory. That sounds like it can
eat through your shared memory very quickly if you have a lot of
subtransactions.

Why not use SubTransGetTopmostTransaction() ?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com