Re: SSI predicate locking on heap -- tuple or row?

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: drkp(at)csail(dot)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI predicate locking on heap -- tuple or row?
Date: 2011-05-30 17:47:44
Message-ID: 4DE3D840.7090604@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 30.05.2011 17:10, Kevin Grittner wrote:
>> Heikki Linnakangas wrote:
>> On 26.05.2011 06:19, Kevin Grittner wrote:
>> I did some of that in the comments, and I think I understand it
>> now. See attached patch. Does that look right to you?
>
> ! * A serialization failure can only occur if there is a dangerous
> ! * structure in the dependency graph:
> ! *
> ! * Tin ------> Tpivot ------> Tout
> ! * rw rw
> ! *
> ! * Furthermore, Tout must commit first.
>
> I agree that this is easier to read and understand than just straight
> text; but you dropped one other condition, which we use rather
> heavily. We should probably add back something like:
>
> * If Tin is declared READ ONLY (or commits without writing), we can
> * only have a problem if Tout committed before Tin acquired its
> * snapshot.
>
>> * XXX: Where does that last condition come from?
>
> This optimization is an original one, not yet appearing in any
> academic papers; Dan and I are both convinced it is safe, and in
> off-list correspondence with Michael Cahill after he left Oracle, he
> said that he discussed this with Alan Fekete and they both concur
> that it is a safe and good optimization.
>
> This optimization is mentioned in the README-SSI file in the
> "Innovations" section. Do you think that file needs to have more on
> the topic?

Oh, I see this:

> o We can avoid ever rolling back a transaction when the
> transaction on the conflict *in* side of the pivot is explicitly or
> implicitly READ ONLY unless the transaction on the conflict *out*
> side of the pivot committed before the READ ONLY transaction acquired
> its snapshot. (An implicit READ ONLY transaction is one which
> committed without writing, even though it was not explicitly declared
> to be READ ONLY.)

Since this isn't coming from the papers, it would be nice to explain why
that is safe.

>> * XXX: for an anomaly to occur, T2 must've committed before W.
>> * Couldn't we easily check that here, or does the fact that W
>> * committed already somehow imply it?
>
> The flags and macros should probably be renamed to make it more
> obvious that this is covered. The flag which SxactHasConflictOut is
> based on is only set when the conflict out is to a transaction which
> committed ahead of the flagged transaction. Regarding the other
> test -- since we have two transactions (Tin and Tout) which have not
> been summarized, and transactions are summarized in order of commit,
> SxactHasSummaryConflictOut implies that the transaction with the flag
> set has not committed before the transaction to which it has a
> rw-conflict out.

Ah, got it.

> The problem is coming up with a more accurate name which isn't really
> long. The best I can think of is SxactHasConflictOutToEarlierCommit,
> which is a little long. If nobody has a better idea, I guess it's
> better to have a long name than a misleading one. Do you think we
> need to rename SxactHasSummaryConflictOut or just add a comment on
> that use that a summarized transaction will always be committed ahead
> of any transactions which are not summarized?

Dunno. I guess a comment would do. Can you write a separate patch for
that, please?

>> (note that I swapped the second and third check in the function, I
>> think it's more straightforward that way).
>
> It doesn't matter to me either way, so if it seems clearer to you,
> that's a win.

FWIW, the reason I prefer it that way is that the function is checking
three scenarios:

R ----> W ----> T2, where W committed after T2
R ----> W ----> T2, "else"
T0 ----> R ----> W

It seems more straightforward to keep both "R ----> W ----> T2" cases
together.

>> Should we say "Cancelled on identification as pivot, during ...",
>> or "Cancelled on identification as a pivot, during ..." ? We seem
>> to use both in the error messages.
>
> Consistency is good. I think it sounds better with the indefinite
> article in there.

Ok, will do.

> Do you want me to post another patch based on this, or are these
> changes from what you posted small enough that you would prefer to
> just do it as part of the commit?

I've committed with the discussed changes.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <drkp(at)csail(dot)mit(dot)edu>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI predicate locking on heap -- tuple or row?
Date: 2011-06-01 22:09:09
Message-ID: 4DE67235020000250003DFBC@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 30.05.2011 17:10, Kevin Grittner wrote:

>> This optimization is an original one, not yet appearing in any
>> academic papers; Dan and I are both convinced it is safe, and in
>> off-list correspondence with Michael Cahill after he left Oracle,
>> he said that he discussed this with Alan Fekete and they both
>> concur that it is a safe and good optimization.
>>
>> This optimization is mentioned in the README-SSI file in the
>> "Innovations" section. Do you think that file needs to have more
on
>> the topic?
>
> Oh, I see this:
>
>> o We can avoid ever rolling back a transaction when the
>> transaction on the conflict *in* side of the pivot is explicitly
or
>> implicitly READ ONLY unless the transaction on the conflict *out*
>> side of the pivot committed before the READ ONLY transaction
>> acquired its snapshot. (An implicit READ ONLY transaction is one
>> which committed without writing, even though it was not
>> explicitly declared to be READ ONLY.)
>
> Since this isn't coming from the papers, it would be nice to
> explain why that is safe.

I see that this issue first surfaced on the Wiki page 2 April, 2010,
and was never really described in detail on the -hackers list. To
ensure that it has some documentation here (in case of any possible
IP controversy), I will describe a proof now. For earlier
references one could dig around in Wiki history, posted patches
during CFs, and the git repository history. I have kept a copy of
the repo before the official conversion from CVS.

>From many academic papers, there is well-established proof that
serialization anomalies can only occur under snapshot isolation when
there is a cycle in the graph of apparent order of execution of the
transactions, and that in such a cycle the following pattern of
rw-dependencies always occurs:

Tin - - -> Tpivot - - -> Tout

A rw-dependency (also called a rw-conflict) exists when a read by
one transaction doesn't see the write of another transaction because
the two transactions overlap, regardless of whether the read or the
write actually happens first. Since the reader doesn't see the work
of the writer, the reader appears to have executed first, regardless
of the actual order of snapshot acquisition or commits. The arrows
show the apparent order of execution of the transactions -- Tin
first, Tout last. Published papers have further proven that the
transaction which appears to have executed last of these three must
actually commit before either of the others for an anomaly to occur.
Tin and Tout may be the same transaction (as in the case of simple
write skew), or two distinct transactions.

SSI relies on recognition of this "dangerous structure" to decide
when to cancel a transaction to preserve data integrity. No attempt
is made to complete the cycle from Tout back to Tin. Partly this is
because of the expense of doing so -- there could be a long chain of
rw-dependencies, wr-dependencies (where the reader *can* see the
work of another transaction because it *did* commit first), and
ww-dependencies (where the writer is updating a row version left by
another transaction, which must have committed first). There is
also the uncomfortable possibility that a client application
*outside* the database ran a transaction which made some change and,
after observing the successful commit of this transaction, is
proceeding with the knowledge that it ran before the next transaction
is submitted.

The novel optimization we've used in the PostgreSQL implementation
of SSI is based on the fact that of these four ways of determining
which transaction appears to have executed first, all but the
rw-dependency are based on the transaction which appears to have
executed first committing before the apparent later transaction
acquires its snapshot. When you combine this with the fact that the
transaction which appears to execute later in a rw-dependency is the
one which performed the write, you have the basis of an interesting
optimization for READ ONLY transactions.

In the above diagram, if Tin is READ ONLY, it cannot have a
rw-dependency *in* from some other transaction. The only way Tout
can directly appear to have executed before Tin is if Tout committed
before Tin acquired its snapshot, so that Tin can read something
Tout wrote, or an external application can know that it successfully
committed Tout before beginning Tin. The published conditions must
all still hold -- Tin and Tout must both overlap Tpivot, the same
rw-dependencies must exist, and Tout must still commit first of the
three; but when Tin doesn't write, Tout must actually commit *even
earlier* -- before Tin gets started -- to have an anomaly.

For a proof the question becomes: If Tin does not write and Tin and
Tout overlap, can a dependency or chain of dependencies develop
which makes it appear that Tout executed before Tin? If this can't
happen without creating a dangerous structure around a different
pivot, then no cycle can develop and the optimization is safe.

Since Tin doesn't write, no transaction can appear to come before it
because of failure to read its writes -- no rw-dependency in to this
transaction is possible. The only transactions Tin can appear to
follow are transactions which committed in time to make their work
visible to Tin. Let's assume such a transaction exists, called T0:

T0 -----> Tin - - -> Tpivot - - -> Tout

It would be possible for Tout to overlap T0 and develop a
rw-dependency out to it, but T0 must commit before Tin starts, so
for Tout to overlap Tin, T0 must commit before Tout, so a
rw-dependency from Tout to T0 would make Tout a pivot and cause a
rollback which would break the cycle. Any other transaction to
which Tout developed a rw-dependency out would have the same
problem.

Any *other* type of dependency out from Tout would require the
transaction to acquire its snapshot after the commit of Tout. Since
T0 commits before Tin begins and Tout overlaps Tin, the commit of
Tout must come after the commit of T0, so no transaction which
begins after Tout commits can overlap T0. This leaves no possible
way to form a cycle when Tin is READ ONLY and Tout overlaps Tin.

I won't be shocked if Dan can come up with a shorter proof, but I'm
confident this one is solid.

Patch to come soon, but I thought it best to post the proof here
first to allow a chance to refine it based on discussion -- or
shorter proofs.

-Kevin


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI predicate locking on heap -- tuple or row?
Date: 2011-06-02 07:04:20
Message-ID: 20110602070419.GA10064@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 01, 2011 at 05:09:09PM -0500, Kevin Grittner wrote:
> I won't be shocked if Dan can come up with a shorter proof, but I'm
> confident this one is solid.

Well, so happens I wrote a proof on the airplane today, before I saw
your mail. It's actually quite straightforward... (well, at least more
so than I was expecting)

> From many academic papers, there is well-established proof that
> serialization anomalies can only occur under snapshot isolation when
> there is a cycle in the graph of apparent order of execution of the
> transactions, and that in such a cycle the following pattern of
> rw-dependencies always occurs:
>
> Tin - - -> Tpivot - - -> Tout
>
> A rw-dependency (also called a rw-conflict) exists when a read by
> one transaction doesn't see the write of another transaction because
> the two transactions overlap, regardless of whether the read or the
> write actually happens first. Since the reader doesn't see the work
> of the writer, the reader appears to have executed first, regardless
> of the actual order of snapshot acquisition or commits. The arrows
> show the apparent order of execution of the transactions -- Tin
> first, Tout last. Published papers have further proven that the
> transaction which appears to have executed last of these three must
> actually commit before either of the others for an anomaly to occur.

We can actually say something slightly stronger than that last
sentence: Tout has to commit before *any* other transaction in the
cycle. That doesn't help us implement SSI, because we never try to look
at an entire cycle, but it's still true and useful for proofs like this.

Now, supposing Tin is read-only...

Since there's a cycle, there must also be a transaction that precedes
Tin in the serial order. Call it T0. (T0 might be the same transaction
as Tout, but that doesn't matter.) There's an edge in the graph from
T0 to Tin. It can't be a rw-conflict, because Tin was read-only, so it
must be a ww- or wr-dependency. Either means T0 committed before Tin
started.

Because Tout committed before any other transaction in the cycle, Tout
has to commit before T0 commits -- and thus before Tin starts.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI predicate locking on heap -- tuple or row?
Date: 2011-06-02 18:01:05
Message-ID: 4DE78991020000250003E03A@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:
> On Wed, Jun 01, 2011 at 05:09:09PM -0500, Kevin Grittner wrote:

>> Published papers have further proven that the transaction which
>> appears to have executed last of these three must actually commit
>> before either of the others for an anomaly to occur.
>
> We can actually say something slightly stronger than that last
> sentence: Tout has to commit before *any* other transaction in the
> cycle. That doesn't help us implement SSI, because we never try to
> look at an entire cycle, but it's still true and useful for proofs
> like this.

I didn't know that, although it doesn't seem too surprising. With
that as a given, the proof can be quite short and straightforward.

> Now, supposing Tin is read-only...
>
> Since there's a cycle, there must also be a transaction that
> precedes Tin in the serial order. Call it T0. (T0 might be the
> same transaction as Tout, but that doesn't matter.) There's an
> edge in the graph from T0 to Tin. It can't be a rw-conflict,
> because Tin was read-only, so it must be a ww- or wr-dependency.
> Either means T0 committed before Tin started.
>
> Because Tout committed before any other transaction in the cycle,
> Tout has to commit before T0 commits -- and thus before Tin
> starts.

If we're going to put this into the README-SSI as the proof of the
validity of this optimization, I'd like to have a footnote pointing
to a paper describing the "first commit in the cycle" aspect of a
dangerous structure. Got any favorites, or should I fall back on a
google search?

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <drkp(at)csail(dot)mit(dot)edu>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI predicate locking on heap -- tuple or row?
Date: 2011-06-02 21:14:41
Message-ID: 4DE7B6F1020000250003E052@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 30.05.2011 17:10, Kevin Grittner wrote:
>> Heikki Linnakangas wrote:

>>> * XXX: for an anomaly to occur, T2 must've committed
>>> * before W. Couldn't we easily check that here, or does
>>> * the fact that W committed already somehow imply it?
>>
>> The flags and macros should probably be renamed to make it more
>> obvious that this is covered. The flag which SxactHasConflictOut
>> is based on is only set when the conflict out is to a transaction
>> which committed ahead of the flagged transaction. Regarding the
>> other test -- since we have two transactions (Tin and Tout) which
>> have not been summarized, and transactions are summarized in
>> order of commit, SxactHasSummaryConflictOut implies that the
>> transaction with the flag set has not committed before the
>> transaction to which it has a rw-conflict out.
>
> Ah, got it.
>
>> The problem is coming up with a more accurate name which isn't
>> really long. The best I can think of is
>> SxactHasConflictOutToEarlierCommit, which is a little long. If
>> nobody has a better idea, I guess it's better to have a long name
>> than a misleading one. Do you think we need to rename
>> SxactHasSummaryConflictOut or just add a comment on that use that
>> a summarized transaction will always be committed ahead of any
>> transactions which are not summarized?
>
> Dunno. I guess a comment would do. Can you write a separate patch
> for that, please?

Attached is a comments-only patch for this, along with one
correction to the comments you added and a couple other typos.

I'll submit a patch for the README-SSI file once I find a reference
I like to a paper describing what Dan's proof uses as a premise --
that the transaction on the rw-conflict *out* side of the pivot must
not only be the first of the three transactions in the dangerous
structure to commit, but the first in the entire cycle of which the
dangerous structure is a part. With that premise as a given, the
proof is short, clear, and unassailable; but I think we need a cite
to use that argument convincingly.

-Kevin

Attachment Content-Type Size
ssi-comment-fixups-1.patch text/plain 4.0 KB

From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI predicate locking on heap -- tuple or row?
Date: 2011-06-02 21:37:50
Message-ID: 20110602213749.GB10064@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jun 02, 2011 at 01:01:05PM -0500, Kevin Grittner wrote:
> If we're going to put this into the README-SSI as the proof of the
> validity of this optimization, I'd like to have a footnote pointing
> to a paper describing the "first commit in the cycle" aspect of a
> dangerous structure. Got any favorites, or should I fall back on a
> google search?

Hmm. I don't see any that state that in so many words, but it's an
obvious consequence of the proof of Theorem 2.1 in "Making Snapshot
Isolation Serializable" -- note that T3 is chosen to be the transaction
in the cycle with the earliest commit time.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: drkp(at)csail(dot)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI predicate locking on heap -- tuple or row?
Date: 2011-06-03 10:04:24
Message-ID: 4DE8B1A8.2020505@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03.06.2011 00:14, Kevin Grittner wrote:
> Attached is a comments-only patch for this, along with one
> correction to the comments you added and a couple other typos.

Ok, committed.

> I'll submit a patch for the README-SSI file once I find a reference
> I like to a paper describing what Dan's proof uses as a premise --
> that the transaction on the rw-conflict *out* side of the pivot must
> not only be the first of the three transactions in the dangerous
> structure to commit, but the first in the entire cycle of which the
> dangerous structure is a part. With that premise as a given, the
> proof is short, clear, and unassailable; but I think we need a cite
> to use that argument convincingly.

Agreed.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com