A third lock method

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: A third lock method
Date: 2009-12-30 23:08:04
Message-ID: 4B3B88F4020000250002DAE1@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've been reviewing code to get a better handle on the scope of
changes to support serializable transactions, in preparation for
next month's meetings with our CIO. My posts should start getting
progressively less hand-wavy. :-)

I've come to a few conclusions:

(1) The notions of having multiple serializable implementations
(SSI, S2PL, OCC) which can be mapped as a configuration option is
really not worth it. The cases where S2PL or OCC beat SSI are too
narrow to be worth the effort, and the pluggable approach seems like
it would be much more invasive and destabilizing than just picking
one and doing it more directly.

(2) If we're going with SSI, it appears that it would be a very
good idea to define a third lock method (SIREAD_LOCKMETHOD perhaps)
for the SIREAD locks. For one thing, that could keep them out of
the way of normal conflict detection (they don't conflict with
anything, per se) and out of the way of deadlock detection,
including rearrangement of waiting transactions. For another, they
have a different life-cycle -- they must stick around (along with
some minimal transaction information) until all transactions with a
snapshot prior to their commit have completed. That seems cleaner
and easier with a separate lock method.

Thoughts?

-Kevin


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: A third lock method
Date: 2009-12-30 23:25:08
Message-ID: 200912302325.nBUNP8O27486@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin Grittner wrote:
> I've been reviewing code to get a better handle on the scope of
> changes to support serializable transactions, in preparation for
> next month's meetings with our CIO. My posts should start getting
> progressively less hand-wavy. :-)
>
> I've come to a few conclusions:
>
> (1) The notions of having multiple serializable implementations
> (SSI, S2PL, OCC) which can be mapped as a configuration option is
> really not worth it. The cases where S2PL or OCC beat SSI are too
> narrow to be worth the effort, and the pluggable approach seems like
> it would be much more invasive and destabilizing than just picking
> one and doing it more directly.

Agreed.

> (2) If we're going with SSI, it appears that it would be a very
> good idea to define a third lock method (SIREAD_LOCKMETHOD perhaps)
> for the SIREAD locks. For one thing, that could keep them out of
> the way of normal conflict detection (they don't conflict with
> anything, per se) and out of the way of deadlock detection,
> including rearrangement of waiting transactions. For another, they
> have a different life-cycle -- they must stick around (along with
> some minimal transaction information) until all transactions with a
> snapshot prior to their commit have completed. That seems cleaner
> and easier with a separate lock method.

I must be missing something but I thought the only problem with our
existing snapshot system was that you could see a row updated after your
snapshot was created, and that the solution to that was to abort the
transaction that would see the new row. Can you tell me what I am
missing?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Bruce Momjian *EXTERN*" <bruce(at)momjian(dot)us>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: A third lock method
Date: 2009-12-31 08:22:15
Message-ID: D960CB61B694CF459DCFB4B0128514C2039380FF@exadv11.host.magwien.gv.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian wrote:
> I must be missing something but I thought the only problem with our
> existing snapshot system was that you could see a row updated after your
> snapshot was created, and that the solution to that was to abort the
> transaction that would see the new row. Can you tell me what I am
> missing?

But with "snapshot isolation" (what our "serializable" corresponds to)
you cannot see rows updated after snapshot creation, right?

So phantom reads cannot occur, but we still are not truly
serializable.

See the example I concocted in
http://archives.postgresql.org/pgsql-hackers/2009-05/msg00316.php
for illustration.

Yours,
Laurenz Albe

PS: Different from what Kevin claimed, Oracle also cannot grant
you strictly serializable transactions, because they also use
snapshot isolation. Seems that they get away with it.
My feeling is that the cases where this would be a problem are pretty
rare; my example referenced above feels artificial for a good reason.

If we can do it better than Oracle, I'm not against it :^)


From: Nicolas Barbier <nicolas(dot)barbier(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: A third lock method
Date: 2009-12-31 12:20:32
Message-ID: b0f3f5a10912310420h5bf988e1g941fdefa1e8a586a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2009/12/31 Bruce Momjian <bruce(at)momjian(dot)us>:

> I must be missing something but I thought the only problem with our
> existing snapshot system was that you could see a row updated after your
> snapshot was created, and that the solution to that was to abort the
> transaction that would see the new row.  Can you tell me what I am
> missing?

The problem is rather the opposite. A minimal example of a situation
that the current implementation allows, and which the new proposal
tries to fix is:

1. The database contains rows X and Y having one column, and having
different values for that column (i.e., X != Y).
2. "Serializable" (in the current PG sense) transactions A and B run
concurrently (i.e., both take their snapshot before the other commits,
so they don't see each other's changes).
3. Y := X; A reads X and updates Y to become the same as X.
4. X := Y; B reads Y and updates X to become the same as Y.

Result: Sequentially executing A and B in either order leads to a
result where X = Y. Still, after the above steps 1-4, the values of X
and Y are switched around (and thus X != Y). Therefore, the execution
was (by definition) not serializable. This is caused by the fact that
in a serializable execution either A would have seen the update
performed by B, or B would have seen the update performed by A. This
problem is called "write skew" in the paper (their example is less
theoretical, but also more complex because of the use of COUNT(..).)

So instead of aborting transactions "because otherwise they would see
too many changes", the goal is rather to abort transactions "because
otherwise they wouldn't have seen enough changes".

The SIREAD locks are used to mark "the versions that have been read by
whom" (for all transactions that were concurrent with any of the
active transactions), so that potentially problematic writes that
occur after reads can be detected: "I wrote a new version of something
that was already read by a concurrent transaction, so in any
serialization, I must come after that other transaction". The other
direction ("I read something that has a newer version than what I just
read, so in any serialization, I must come before that other
transaction") can be detected straightforwardly.

Nicolas