Quick Links

Re: User-facing aspects of serializable transactions

Lists:	pgsql-hackers

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<pgsql-hackers(at)postgresql(dot)org>
Cc:	<Michael Cahill <mjc(at)it(dot)usyd(dot)edu(dot)au>
Subject:	User-facing aspects of serializable transactions
Date:	2009-05-27 20:34:36
Message-ID:	4A1D5D8C.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

I want to try to get agreement that it would be a good idea to
implement serializable transactions, and what that would look like
from the user side. At this point, we should avoid discussions of
whether it's possible or how it would be implemented, but focus on
what that would look like and whether it would be desirable.

Let's start with reasons:

(1) The standard has always required that the serializable
transaction isolation mode be supported. We don't comply with recent
versions of the standard, which have changed the definition of this
mode to go beyond the four specific anomalies mentioned, and now
requires that any execution of concurrent serializable transactions
must yield results consistent with some serial execution of those
transactions. Being able to show compliance with a significant point
in the standard has value, all by itself.

(2) The standard requires this because it is the only cost-effective
way to ensure data integrity in some environments, particularly those
with a large number of programmers, tables, and queries; and which
have complex data integrity rules. Basically, any serializable
transaction which can be shown to do the right thing when run by
itself will automatically, with no additional development effort, do
the right thing when run in any arbitrary mix of concurrent
transactions. This feature would be likely to make PostgreSQL a
viable option in some shops where it currently isn't.

(3) Many other database products provide serializable transactions,
including DB2, Microsoft SQL Server, and Sybase ASE. Some MVCC
databases, like recent Microsoft SQL Server releases, allow the user
to choose snapshot isolation or full serializable isolation.

(4) It may simplify the code to implement PostgreSQL foreign key
constraints and/or improve concurrency in the face of such
constraints.

(5) It may simplify application code written for PostgreSQL and
improve concurrency of transactions with possible conflicts, since
explicit locks will not need to be taken, and blocking currently
resulting from explicit locks can be eliminated.

Proposed user visible aspects are:

(A) Well known anomalies possible under snapshot isolation will not
be possible among transactions running at the serializable transaction
isolation level, with no need to explicitly take locks to prevent
them.

(B) While no blocking will occur between reads and writes, certain
combinations of reads and writes will cause a rollback with a SQLSTATE
which indicates a serialization failure. Any transaction running at
this isolation level must be prepared to deal with these.

(C) One or more GUCs will be added to control whether the new
behavior is used when serializable transaction isolation is requested
or whether, for compatibility with older PostgreSQL releases, the
transaction actually runs with snapshot isolation. In any event, a
request for repeatable read mode will provide the existing snapshot
isolation mode.

(D) It may be desirable to use these techniques, rather than current
techniques, to enforce the referential integrity specified by foreign
keys. If this is done, enforcement would produce less blocking, but
might increase rollbacks due to serialization failures. Perhaps this
should be controlled by a separate GUC.

(E) Since there will be a trade-off between the overhead of finer
granularity in tracking locks and the reduced number of rollbacks at a
finer granularity, it might be desirable to have a GUC to control
default granularity and a table property which can override the
default for individual tables. (In practice, with a different
database product which supported something like this, we found our
best performance with page level locks on all but a few small,
frequently updated tables -- which we set to row level locking.)

(F) Databases clusters making heavy use of serializable transactions
would need to boost the number of locks per transaction.

Thoughts?

-Kevin

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-27 23:53:10
Message-ID:	1243468390.24838.153.camel@monkey-cat.sm.truviso.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-05-27 at 15:34 -0500, Kevin Grittner wrote:
> (2) The standard requires this because it is the only cost-effective
> way to ensure data integrity in some environments, particularly those
> with a large number of programmers, tables, and queries; and which
> have complex data integrity rules. Basically, any serializable
> transaction which can be shown to do the right thing when run by
> itself will automatically, with no additional development effort, do
> the right thing when run in any arbitrary mix of concurrent
> transactions. This feature would be likely to make PostgreSQL a
> viable option in some shops where it currently isn't.

+1. It would be great if this could be accomplished with reasonable
performance, or at least predictable performance.

> (C) One or more GUCs will be added to control whether the new
> behavior is used when serializable transaction isolation is requested
> or whether, for compatibility with older PostgreSQL releases, the
> transaction actually runs with snapshot isolation. In any event, a
> request for repeatable read mode will provide the existing snapshot
> isolation mode.
>

I'm not sure a GUC is the best way here, are you talking about as a
migration path, or something that would exist forever?

Regards,
Jeff Davis

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-27 23:54:31
Message-ID:	4A1D8C66.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Wed, 2009-05-27 at 15:34 -0500, Kevin Grittner wrote:

>> (C) One or more GUCs will be added to control whether the new
>> behavior is used when serializable transaction isolation is
>> requested or whether, for compatibility with older PostgreSQL
>> releases, the transaction actually runs with snapshot isolation.
>> In any event, a request for repeatable read mode will provide the
>> existing snapshot isolation mode.
>
> I'm not sure a GUC is the best way here, are you talking about as a
> migration path, or something that would exist forever?

I've gotten the distinct impression that some would prefer to continue
to use their existing techniques under snapshot isolation. I was sort
of assuming that they would want a GUC to default to legacy behavior
with a new setting for standard compliant behavior.

Another alternative here would be to just change a request for a
serializable transation to give you a serializable transaction, and
document that the existing snapshot isolation is now available only by
requesting repeatable read mode. Right now you get snapshot isolation
mode on a request for either repeatable read mode or serializable
mode.

I think that many people only use read committed; they would not be
impacted at all.

What do you think would be best here?

-Kevin

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 00:20:36
Message-ID:	1243470036.24838.168.camel@monkey-cat.sm.truviso.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-05-27 at 18:54 -0500, Kevin Grittner wrote:
> I've gotten the distinct impression that some would prefer to continue
> to use their existing techniques under snapshot isolation. I was sort
> of assuming that they would want a GUC to default to legacy behavior
> with a new setting for standard compliant behavior.

That sounds like the "migration path" sort of GUC, which sounds
reasonable to me.

But what about all the other possible behaviors that were brought up
(mentioned in more detail in [1]), such as:

1. implementation of the paper's technique sans predicate locking, that
would avoid more serialization anomalies but not all?
2. various granularities of predicate locking?

Should these be things the user controls per-transaction? If so, how?

Regards,
Jeff Davis

[1] http://archives.postgresql.org/pgsql-hackers/2009-05/msg01128.php

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 00:38:49
Message-ID:	21808.1243471129@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Wed, 2009-05-27 at 18:54 -0500, Kevin Grittner wrote:
>> I've gotten the distinct impression that some would prefer to continue
>> to use their existing techniques under snapshot isolation. I was sort
>> of assuming that they would want a GUC to default to legacy behavior
>> with a new setting for standard compliant behavior.

> That sounds like the "migration path" sort of GUC, which sounds
> reasonable to me.

> But what about all the other possible behaviors that were brought up
> (mentioned in more detail in [1]), such as:

> 1. implementation of the paper's technique sans predicate locking, that
> would avoid more serialization anomalies but not all?
> 2. various granularities of predicate locking?

> Should these be things the user controls per-transaction? If so, how?

I think it's important to draw a distinction between performance issues
and correctness issues. True serializability vs snapshot
serializability is a fundamental behavioral issue, whereas fooling
around with lock granularity might improve performance but it doesn't
make the difference between a correct application and an incorrect one.

A lesson that I think we've learned the hard way over the past few years
is that GUCs are fine for controlling performance issues, but you expose
yourself to all sorts of risks if you make fundamental semantics vary
depending on a GUC.

Putting those two thoughts together, I would say that the right thing
is

* SET TRANSACTION ISOLATION LEVEL SERIALIZABLE should mean what the spec
says.

* SET TRANSACTION ISOLATION LEVEL something-else should provide our
current snapshot-driven behavior. I don't have a strong feeling about
whether "something-else" should be spelled REPEATABLE READ or SNAPSHOT,
but lean slightly to the latter.

* Anything else you want to control should be a GUC, as long as it
doesn't affect any correctness properties.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 00:43:03
Message-ID:	603c8f070905271743k2407bcd0yd7461cf0729063f8@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 27, 2009 at 7:54 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>> On Wed, 2009-05-27 at 15:34 -0500, Kevin Grittner wrote:
>
>>> (C) One or more GUCs will be added to control whether the new
>>> behavior is used when serializable transaction isolation is
>>> requested or whether, for compatibility with older PostgreSQL
>>> releases, the transaction actually runs with snapshot isolation.
>>> In any event, a request for repeatable read mode will provide the
>>> existing snapshot isolation mode.
>>
>> I'm not sure a GUC is the best way here, are you talking about as a
>> migration path, or something that would exist forever?
>
> I've gotten the distinct impression that some would prefer to continue
> to use their existing techniques under snapshot isolation. I was sort
> of assuming that they would want a GUC to default to legacy behavior
> with a new setting for standard compliant behavior.
>
> Another alternative here would be to just change a request for a
> serializable transation to give you a serializable transaction, and
> document that the existing snapshot isolation is now available only by
> requesting repeatable read mode. Right now you get snapshot isolation
> mode on a request for either repeatable read mode or serializable
> mode.
>
> I think that many people only use read committed; they would not be
> impacted at all.
>
> What do you think would be best here?

I think we should introduce a new value for SET TRANSACTION ISOLATION
LEVEL, maybe SNAPSHOT, intermediate between READ COMMITTED and
SERIALIZABLE.

IOW, SET TRANSACTION ISOLATION LEVEL READ COMMITTED should do what it
does now. SET TRANSACTION ISOLATION LEVEL SNAPSHOT should do what
SERIALIZABLE currently does, which is take and keep the same snapshot
for the whole transaction. And SERIALIZABLE should do that, plus
whatever new and better stuff we add.

...Robert

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 00:51:16
Message-ID:	4A1D99B4.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> 1. implementation of the paper's technique sans predicate locking,
> that would avoid more serialization anomalies but not all?

I saw that as a step along the way to support for fully serializable
transactions. If covered by a "migration path" GUC which defaulted to
current behavior, it would allow testing of all of the code except the
predicate lock tracking (before the predicate locking code was
created), in order to give proof of concept, check performance impact
of that part of the code, etc. I wasn't thinking that it would be a
useful long-term option without the addition of the predicate locks.

Arguably, it would actually be a very weak partial implemenation of
predicate locking, in that it would get a non-blocking lock on tuples
viewed, up to some limit. At the point where we added an escalation
to table locking for the limit, started with the table lock when we
knew it was a table scan, and locked the index range for an index
scan, we would actually have achieved fully serializable transactions.

> 2. various granularities of predicate locking?

I haven't seen that configurable by transaction, and I'm not entirely
sure that would make sense. I have seen products where a default
granularity was set with the equivalent of a global GUC, and it could
be overridden for particular tables. I see such a setting as the
default for access to rows accessed through indexes. If there is a
table scan, the lock would have to start at the table level,
regardless of settings. If too many locks accrue for one transaction
against a table at one granularity, those locks would need to be
consolidated to a coarser granularity to avoid exhausting lock
tracking space in RAM.

We're slipping into implementation details here, but I'm not sure how
we can discuss the GUCs needed without at least touching on that....

-Kevin

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 00:52:05
Message-ID:	1243471925.24838.196.camel@monkey-cat.sm.truviso.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-05-27 at 20:38 -0400, Tom Lane wrote:
> A lesson that I think we've learned the hard way over the past few years
> is that GUCs are fine for controlling performance issues, but you expose
> yourself to all sorts of risks if you make fundamental semantics vary
> depending on a GUC.

I agree with the philosophy here.

> Putting those two thoughts together, I would say that the right thing
> is
>
> * SET TRANSACTION ISOLATION LEVEL SERIALIZABLE should mean what the spec
> says.
>
> * SET TRANSACTION ISOLATION LEVEL something-else should provide our
> current snapshot-driven behavior. I don't have a strong feeling about
> whether "something-else" should be spelled REPEATABLE READ or SNAPSHOT,
> but lean slightly to the latter.
>
> * Anything else you want to control should be a GUC, as long as it
> doesn't affect any correctness properties.

But that still leaves out another behavior which avoids some of the
serialization anomalies currently possible, but still does not guarantee
true serializability (that is: implementation of the paper's technique
sans predicate locking). Is that behavior useful enough to include?

Just trying to come up with a name for that might be challenging.

Regards,
Jeff Davis

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 00:55:18
Message-ID:	22144.1243472118@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> On Wed, 2009-05-27 at 20:38 -0400, Tom Lane wrote:
>> * Anything else you want to control should be a GUC, as long as it
>> doesn't affect any correctness properties.

> But that still leaves out another behavior which avoids some of the
> serialization anomalies currently possible, but still does not guarantee
> true serializability (that is: implementation of the paper's technique
> sans predicate locking). Is that behavior useful enough to include?

Hmm, what I gathered was that that's not changing any basic semantic
guarantees (and therefore is okay to control as a GUC). But I haven't
read the paper so maybe I'm missing something.

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Jeff Davis" <pgsql(at)j-davis(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 01:00:03
Message-ID:	4A1D9BC3.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I think we should introduce a new value for SET TRANSACTION
ISOLATION
> LEVEL, maybe SNAPSHOT, intermediate between READ COMMITTED and
> SERIALIZABLE.

The standard defines such a level, and calls it REPEATABLE READ.
Snapshot semantics are more strict than required for that level, which
is something you are allowed to get when you request a given level, so
it seems clear to me that when you request REPEATABLE READ mode, you
should get our current snapshot behavior. I'm not clear on what the
benefit would be of aliasing that with SNAPSHOT. If there is a
benefit, fine; if not, why add it?

-Kevin

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 01:00:53
Message-ID:	1243472453.11796.6.camel@monkey-cat.sm.truviso.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-05-27 at 19:51 -0500, Kevin Grittner wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> > 1. implementation of the paper's technique sans predicate locking,
> > that would avoid more serialization anomalies but not all?
>
> I saw that as a step along the way to support for fully serializable
> transactions. If covered by a "migration path" GUC which defaulted to
> current behavior, it would allow testing of all of the code except the
> predicate lock tracking (before the predicate locking code was
> created), in order to give proof of concept, check performance impact
> of that part of the code, etc. I wasn't thinking that it would be a
> useful long-term option without the addition of the predicate locks.
>

OK, if that behavior is not ultimately useful, then I retract my
question.

We still need to know whether to use a GUC at all -- it won't actually
break applications to offer true serializability, it will only impact
performance.

Regards,
Jeff Davis

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Jeff Davis" <pgsql(at)j-davis(dot)com>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 01:07:15
Message-ID:	4A1D9D73.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Hmm, what I gathered was that that's not changing any basic semantic
> guarantees (and therefore is okay to control as a GUC). But I
> haven't read the paper so maybe I'm missing something.

The paper never suggests attempting these techniques without a
predicate locking implementation. It was just something Robert Haas
noticed during our discussion at the bar (and he wasn't even consuming
any alcohol that night!) that it would be a possible development path.
I don't think either of us sees it as a useful end point.

Basically, if you just took out locks on the rows you happened to read
(rather than doing proper predicate locking) you would still prevent
some anomalies, in a more-or-less predictable and controllable way. I
think we both felt that the predicate locking might be the hardest
part to implement in PostgreSQL, so having such a proof of concept
partial implemenation without first implementing predicate locking
might fit with the "series of smaller patches" approach generally
preferred by the PostgreSQL developers.

-Kevin

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 01:08:00
Message-ID:	1243472880.11796.11.camel@monkey-cat.sm.truviso.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, 2009-05-27 at 20:55 -0400, Tom Lane wrote:
> Hmm, what I gathered was that that's not changing any basic semantic
> guarantees (and therefore is okay to control as a GUC). But I haven't
> read the paper so maybe I'm missing something.

On second read of this comment:
http://archives.postgresql.org/pgsql-hackers/2009-05/msg01128.php

it says "reduce the frequency of serialization anomalies", which doesn't
necessarily mean that it makes new guarantees, I suppose. I should have
gone to the original source.

Anyway, it's a moot point, because apparently that's just a possible
step along the way toward true serializability, and doesn't need to be
separately distinguished.

Regards,
Jeff Davis

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 01:26:28
Message-ID:	603c8f070905271826g105dd843s7fc09a55755a6beb@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 27, 2009 at 9:00 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
>> I think we should introduce a new value for SET TRANSACTION
> ISOLATION
>> LEVEL, maybe SNAPSHOT, intermediate between READ COMMITTED and
>> SERIALIZABLE.
>
> The standard defines such a level, and calls it REPEATABLE READ.
> Snapshot semantics are more strict than required for that level, which
> is something you are allowed to get when you request a given level, so
> it seems clear to me that when you request REPEATABLE READ mode, you
> should get our current snapshot behavior. I'm not clear on what the
> benefit would be of aliasing that with SNAPSHOT. If there is a
> benefit, fine; if not, why add it?

I guess my point is that we want to keep the two transaction isolation
levels we have now and add a third one that is "above" what we
currently call SERIALIZABLE. I don't much care what we call them.

...Robert

From:	Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 01:29:35
Message-ID:	9DD3ABFD-1C5F-4AE2-B43F-D30026C81DF3@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 28 May 2009, at 01:51, "Kevin Grittner"
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> At the point where we added an escalation
> to table locking for the limit, started with the table lock when we
> knew it was a table scan, and locked the index range for an index
> scan,

I still think you're stuck in the mssql/sybase mode of thought here.
Postgres supports a whole lot more scan types than just these two and
many of them use multiple indexes or indexes that don't correspond to
ranges of key values at all.

I think you have to forget about any connection between predicates and
either indexes or scan types. You need a way to represent predicates
which can be stored and looked up independently of any indexes.

Without any real way to represent predicates this is all pie in the
sky. The reason we don't have predicate locking is because of this
problem which it sounds like we're no closer to solving.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 01:49:19
Message-ID:	23417.1243475359@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <greg(dot)stark(at)enterprisedb(dot)com> writes:
> Without any real way to represent predicates this is all pie in the
> sky. The reason we don't have predicate locking is because of this
> problem which it sounds like we're no closer to solving.

Yeah. The fundamental problem with all the "practical" approaches I've
heard of is that they only work for a subset of possible predicates
(possible WHERE clauses). The idea that you get true serializability
only if your queries are phrased just so is ... icky. So icky that
it doesn't sound like an improvement over what we have.

regards, tom lane

From:	Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 01:57:20
Message-ID:	A46E65F6-FAE0-474E-B5C6-185689312F09@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

--
Greg

On 28 May 2009, at 02:49, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Greg Stark <greg(dot)stark(at)enterprisedb(dot)com> writes:
>> Without any real way to represent predicates this is all pie in the
>> sky. The reason we don't have predicate locking is because of this
>> problem which it sounds like we're no closer to solving.
>
> Yeah. The fundamental problem with all the "practical" approaches
> I've
> heard of is that they only work for a subset of possible predicates
> (possible WHERE clauses). The idea that you get true serializability
> only if your queries are phrased just so is ... icky. So icky that
> it doesn't sound like an improvement over what we have.
>

I think you get "true serializability" in the sense that you take out
a full table lock on every read. I.e. Your transactions end up
actually serialized... Well it would be a bit weaker than that due to
the weak read-locks but basically you would get random spurious
serialization failures which can't be explained by inspecting the
transactions without understanding the implementation.

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <greg(dot)stark(at)enterprisedb(dot)com>
Cc:	"Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 01:58:33
Message-ID:	4A1DA979.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <greg(dot)stark(at)enterprisedb(dot)com> wrote:

> Postgres supports a whole lot more scan types than just these two
> and many of them use multiple indexes or indexes that don't
> correspond to ranges of key values at all.

Well, certainly all of the plans I've looked at which use btree
indexes could be handled this way. The fact that the index is scanned
as part of a bitmap process doesn't affect the logic at all, as far as
I can see. Do you see something I'm missing in regard to the btree
indexes?

In regard to other index types, if there is no way to note a GIN scan
such as

Index Cond: (text_tsv @@ '''amicus'' & ''brief'''::tsquery)

in a way that can be compared to DML operations, well, that just
means that some other type of lock would have to be used which is
broad enough to cover it. A table lock certainly would. In this
case, a column lock would be more precise and less likely to generate
false positives, so perhaps that will be found to be needed in the
tuning phase.

Although, looking at it, I would think that a predicate lock on a
condition such as this could be tested by seeing if there is a
difference in the truth of the test between "before" and "after"
images. That may well be naive, but I doubt that we've exhausted the
possibilities yet.

> I think you have to forget about any connection between predicates
> and either indexes or scan types. You need a way to represent
> predicates which can be stored and looked up independently of any
> indexes.

Sure. Heap or index pages, tables, columns, table segments -- there
are many options. We can clearly get correct behavior; the question
is about how best to tune it. That's why I was leaning toward an
initial "correct but crude" implementation, building up a large set of
tests for correctness, and then trying different approaches to
balancing the overhead of more accurate tracking against the cost of
dealing with transaction restarts.

> Without any real way to represent predicates this is all pie in the
> sky

And this is 180% opposite from what I just heard at PGCon should be
the focus of discussion at this point. Let's get agreement on what
would be nice user-facing behavior first. You can always critique
implementation suggestions later. Although, looking back I guess I
provoked this by lapsing into thoughts about an implementation path,
so I guess this one's on me. Apologies.

I tend to believe that if Microsoft can handle this, the PostgreSQL
developer community can get there, too -- even if we do have fancier
indexes.

-Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 02:41:06
Message-ID:	603c8f070905271941v22f5e0bch23d446b3ea15dad8@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Wed, May 27, 2009 at 9:49 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Greg Stark <greg(dot)stark(at)enterprisedb(dot)com> writes:
>> Without any real way to represent predicates this is all pie in the
>> sky. The reason we don't have predicate locking is because of this
>> problem which it sounds like we're no closer to solving.
>
> Yeah. The fundamental problem with all the "practical" approaches I've
> heard of is that they only work for a subset of possible predicates
> (possible WHERE clauses). The idea that you get true serializability
> only if your queries are phrased just so is ... icky. So icky that
> it doesn't sound like an improvement over what we have.

I think we're veering off on a tangent, here.

As I understand it, the serialization anomalies that we have today are
caused by the fact that readers don't block concurrent writers. So if
I read some data from table A and write it to table B and meanwhile
someone reads from table B and writes to table A, we may pass each
other like ships in the night unless we remember to use SELECT ... FOR
SHARE to guard against concurrent UPDATEs and DELETEs and LOCK ... IN
SHARE MODE to guard against concurrent INSERTs.

It would be nice to be have the option to dispense with this explicit
locking and still get serializable behavior and AIUI that's what these
SIREAD locks are designed to do (they also don't lead to additional
blocking as explicit locks potentially do). The limitation is that
the granularity of the SIREAD locks isn't going to be magically better
than the granularity of your underlying lock subsystem. Fortunately,
our underlying locking system for protecting against UPDATE and DELETE
operations is already row-level and therefore as good as it gets. Our
underlying system for protecting against INSERT is pretty primitive by
comparison, so we'd have to decide whether to ignore inserts or take a
table-level SIREAD lock, and the latter would probably result in such
poor concurrency as to make the whole thing pointless.

But that doesn't mean that the entire project is pointless. It just
means that we'll be able to protect against concurrent UPDATEs and
DELETEs without explicit locking, if the transaction isolation level
is set to serializable, but we'll still fall short when it comes to
concurrent INSERTs. That would be a massive improvement versus where
we are now. I do a fair amount of explicit locking in my code and
it's nearly all row-level locks to protect against concurrent
updates/deletes, so I can't see that only handling those cases would
be a bad place to start. Fortunately, for my applications,
concurrency is low enough that explicit locking isn't a problem for me
anyway (also, I'm good at figuring out what to lock), but that's
clearly not true for everyone.

...Robert

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 05:33:36
Message-ID:	4A1E2230.90905@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kevin Grittner wrote:
> Greg Stark <greg(dot)stark(at)enterprisedb(dot)com> wrote:
>> Without any real way to represent predicates this is all pie in the
>> sky
>
> And this is 180% opposite from what I just heard at PGCon should be
> the focus of discussion at this point. Let's get agreement on what
> would be nice user-facing behavior first.

Ok, here goes:

1. Needs to be fully spec-compliant serializable behavior. No anomalities.

2. No locking that's not absolutely necessary, regardless of the
WHERE-clause used. No table locks, no page locks. Block only on
queries/updates that would truly conflict with concurrent updates.

3. No "serialization errors" that are not strictly necessary.

4. Reasonable performance. Performance in single-backend case should be
indistinguishable from what we have now and what we have with the more
lenient isolation levels.

5. Reasonable scalability. Shouldn't slow down noticeably when
concurrent updaters are added as long as they don't conflict.

6. No tuning knobs. It should just work.

Now let's discuss implementation. It may well be that there is no
solution that totally satisfies all those requirements, so there's
plenty of room for various tradeoffs to discuss. I think fully
spec-compliant behavior is a hard requirement, or we'll find ourselves
adding yet another isolation level in the next release to achieve it.
The others are negotiable.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To:	"Kevin Grittner EXTERN" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 09:29:08
Message-ID:	D960CB61B694CF459DCFB4B0128514C202FF660A@exadv11.host.magwien.gv.at
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kevin Grittner wrote:
>> 1. implementation of the paper's technique sans predicate locking,
>> that would avoid more serialization anomalies but not all?
>
> I saw that as a step along the way to support for fully serializable
> transactions. If covered by a "migration path" GUC which defaulted to
> current behavior, it would allow testing of all of the code except the
> predicate lock tracking (before the predicate locking code was
> created), in order to give proof of concept, check performance impact
> of that part of the code, etc. I wasn't thinking that it would be a
> useful long-term option without the addition of the predicate locks.

I cannot prove it, but I have a feeling that the impact on
performance and concurrency will be considerably higher for an
implementation with predicate locks. Every WHERE-clause in a SELECT
will add one or more checks for each concurrent writer.

So while I think it is a good idea to approach full serializability
in a step-by-step approach, it would be wise to consider the possibility
that we will not reach the goal (because implementing predicate locks
might be too difficult or the result perform too badly).

So any intermediate step should be useful in itself, unless we are
ready to rip out the whole thing again.

What would be the useful intermediate steps in this case?

From the user perspective, will an implementation of the paper's
approach as an intermediate step provide a useful and understandable
isolation level?

Yours,
Laurenz Albe

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 12:20:13
Message-ID:	200905281520.13620.peter_e@gmx.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thursday 28 May 2009 04:49:19 Tom Lane wrote:
> Yeah. The fundamental problem with all the "practical" approaches I've
> heard of is that they only work for a subset of possible predicates
> (possible WHERE clauses). The idea that you get true serializability
> only if your queries are phrased just so is ... icky. So icky that
> it doesn't sound like an improvement over what we have.

Is it even possible to have a predicate locking implementation that can verify
whether an arbitrary predicate implies another arbitrary predicate? And this
isn't constraint exclusion, where it is acceptable to have false negatives.

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 12:24:59
Message-ID:	4A1E829B.3020406@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Peter Eisentraut wrote:
> On Thursday 28 May 2009 04:49:19 Tom Lane wrote:
>> Yeah. The fundamental problem with all the "practical" approaches I've
>> heard of is that they only work for a subset of possible predicates
>> (possible WHERE clauses). The idea that you get true serializability
>> only if your queries are phrased just so is ... icky. So icky that
>> it doesn't sound like an improvement over what we have.
>
> Is it even possible to have a predicate locking implementation that can verify
> whether an arbitrary predicate implies another arbitrary predicate?

I don't think you need that for predicate locking. To determine if e.g
an INSERT and a SELECT conflict, you need to determine if the INSERTed
tuple matches the predicate in the SELECT. No need to deduce anything
between two predicates, but between a tuple and a predicate.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 12:29:47
Message-ID:	200905281529.47952.peter_e@gmx.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thursday 28 May 2009 03:38:49 Tom Lane wrote:
> * SET TRANSACTION ISOLATION LEVEL something-else should provide our
> current snapshot-driven behavior. I don't have a strong feeling about
> whether "something-else" should be spelled REPEATABLE READ or SNAPSHOT,
> but lean slightly to the latter.

Could someone describe concisely what behavior "snapshot" isolation provides
that repeatable read does?

From:	Peter Eisentraut <peter_e(at)gmx(dot)net>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 12:43:25
Message-ID:	200905281543.26223.peter_e@gmx.net
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thursday 28 May 2009 15:24:59 Heikki Linnakangas wrote:
> I don't think you need that for predicate locking. To determine if e.g
> an INSERT and a SELECT conflict, you need to determine if the INSERTed
> tuple matches the predicate in the SELECT. No need to deduce anything
> between two predicates, but between a tuple and a predicate.

That might the easy part. The hard part is determining whether a SELECT and
an UPDATE conflict.

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"Greg Stark" <greg(dot)stark(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 14:40:01
Message-ID:	4A1E5BF0.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> 1. Needs to be fully spec-compliant serializable behavior. No
> anomalities.

That is what the paper describes, and where I want to end up.

> 2. No locking that's not absolutely necessary, regardless of the
> WHERE-clause used. No table locks, no page locks. Block only on
> queries/updates that would truly conflict with concurrent updates

If you do a table scan, how do you not use a table lock?

Also, the proposal is to *not* block in *any* cases beyond where
snapshot isolation currently blocks. None. Period. This is the big
difference from traditional techniques to achieve serializable
transactions.

> 3. No "serialization errors" that are not strictly necessary.

That would require either the blocking approach which is has
traditionally been used, or a rigorous graphing of all read-write
dependencies (or anti-dependencies, depending on whose terminology you
prefer). I expect either approach would perform much worse than what
the techniques in the paper. Published benchmarks, some confirmed by
an ACM Repeatability Committee, have so far validated that intuition.

> 4. Reasonable performance. Performance in single-backend case should
> be indistinguishable from what we have now and what we have with the
> more lenient isolation levels.

This should have no impact on performance for those not choosing
serializable transactions. Benchmarks of the proposed technique have
so far shown performance ranging from marginally better than snapshot
to 15% below snapshot, whith traditional serializable techniques
benchmarking as much as 70% below snapshot.

> 5. Reasonable scalability. Shouldn't slow down noticeably when
> concurrent updaters are added as long as they don't conflict.

That should be no problem for this technique.

> 6. No tuning knobs. It should just work.

Well, I think some tuning knobs might be useful, but we can certainly
offer working defaults. Whether they should be exposed as knobs to
the users or kept away from their control depends, in my view, on how
much benefit there is to tweaking them for different environments and
how big a foot-gun they represent. "No tuning knobs" seems an odd
requirement to put on this one feature versus all other new features.

> Now let's discuss implementation. It may well be that there is no
> solution that totally satisfies all those requirements, so there's
> plenty of room for various tradeoffs to discuss.

Then they seem more like "desirable characteristics" than
requirements, but OK.

> I think fully spec-compliant behavior is a hard requirement, or
> we'll find ourselves adding yet another isolation level in the next
> release to achieve it. The others are negotiable.

There's an odd dichotomy to direction given in this area. On the one
hand, I often see the advice to submit small patches which advance
toward a goal without breaking anything, but then I see statements
like this, which seem at odds with that notion.

My personal inclination is to have a GUC (perhaps eliminated after the
implementation is complete, performant, and well-tested) to enable the
new techniques, initially defaulted to "off". There is a pretty clear
path to a mature implementation through a series of iterations. That
seems at least one order of magnitude more likely to succeed than
trying to come up with a single, final patch.

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc:	<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 14:47:47
Message-ID:	4A1E5DC3.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at> wrote:

> Every WHERE-clause in a SELECT will add one or more checks for each
> concurrent writer.

That has not been the case in any implementation of predicate locks
I've used so far. It seems that any technique with those performance
characteristics would be one to avoid.

> From the user perspective, will an implementation of the paper's
> approach as an intermediate step provide a useful and understandable
> isolation level?

Well, to be clear, the paper states that predicate locking is a
requirement, but we've had some ideas about how we might make progress
without a full implemenation of that; so I guess your question should
be taken to mean "in the absence of full predicate locking support".

Possibly. It would reduce the frequency of anomalies for those not
doing explicit locking, and Robert Haas has said that it might allow
him to drop some existing explicit locking.

-Kevin

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 14:57:11
Message-ID:	4136ffa0905280757w354d432cg8cf25e084cd27f20@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 28, 2009 at 3:40 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> 2. No locking that's not absolutely necessary, regardless of the
>> WHERE-clause used. No table locks, no page locks. Block only on
>> queries/updates that would truly conflict with concurrent updates
>
> If you do a table scan, how do you not use a table lock?

Once again, the type of scan is not relevant. it's quite possible to
have a table scan and only read some of the records, or to have an
index scan and read all the records.

You need to store some representation of the qualifiers on the scan,
regardless of whether they're index conditions or filters applied
afterwards. Then check that condition on any inserted tuple to see if
it conflicts.

I think there's some room for some flexibility on the "not absolutely
necessary" but I would want any serialization failure to be
justifiable by simple inspection of the two transactions. That is, I
would want only queries where a user could see why the database could
not prove the two transactions were serializable even if she knows
they don't. Any case where the conditions are obviously mutually
exclusive should not generate spurious conflicts.

Offhand the problem cases seem to be conditions like "WHERE
func(column)" where func() is not immutable (I don't think STABLE is
enough here). I would be ok with discarding conditions like this -- if
they're the only conditions on the query that would effectively make
it a table lock like you're describing. But one we could justify to
the user -- any potential insert might cause a serialization failure
depending on the unknown semantics of func().

--
greg

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Peter Eisentraut" <peter_e(at)gmx(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Cc:	"Jeff Davis" <pgsql(at)j-davis(dot)com>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 14:57:14
Message-ID:	4A1E5FF9.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:

> Could someone describe concisely what behavior "snapshot" isolation
> provides that repeatable read does?

Phantom reads are not possible in snapshot isolation. They are
allowed to occur (though not required to occur) in repeatable read.

Note that in early versions of the SQL standard, this difference was
sufficient to qualify as serializable; but recent versions raised
the bar for serializable transactions.

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 15:33:30
Message-ID:	4A1E687A.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:

> Once again, the type of scan is not relevant. it's quite possible to
> have a table scan and only read some of the records, or to have an
> index scan and read all the records.
>
> You need to store some representation of the qualifiers on the scan,
> regardless of whether they're index conditions or filters applied
> afterwards. Then check that condition on any inserted tuple to see
> if it conflicts.
>
> I think there's some room for some flexibility on the "not
> absolutely necessary" but I would want any serialization failure to
> be justifiable by simple inspection of the two transactions. That
> is, I would want only queries where a user could see why the
> database could not prove the two transactions were serializable even
> if she knows they don't. Any case where the conditions are obviously
> mutually exclusive should not generate spurious conflicts.
>
> Offhand the problem cases seem to be conditions like "WHERE
> func(column)" where func() is not immutable (I don't think STABLE is
> enough here). I would be ok with discarding conditions like this --
> if they're the only conditions on the query that would effectively
> make it a table lock like you're describing. But one we could
> justify to the user -- any potential insert might cause a
> serialization failure depending on the unknown semantics of func().

Can you cite anywhere that such techniques have been successfully used
in a production environment, or are you suggesting that we break new
ground here? (The techniques I've been assuming are pretty well-worn
and widely used.) I've got nothing against a novel implementation,
but I do think that it might be better to do that as an enhancement,
after we have the thing working using simpler techniques.

One other note -- I've never used Oracle, but years back I was told by
a fairly credible programmer who had, that when running a serializable
SELECT statement you could get a serialization failure even if it was
the only user query running on the system. Apparently (at least at
that time) background maintenance operations could deadlock with a
SELECT. Basically, I feel that the reason for using serializable
transactions is that you don't know what concurrent uses may happen in
advance or how they may conflict, and you should always be prepared to
handle serialization failures.

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 15:37:30
Message-ID:	4A1E6969.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:

> I would want any serialization failure to be
> justifiable by simple inspection of the two transactions.

BTW, there are often three (or more) transaction involved in creating
a serialization failure, where any two of them alone would not fail.
You probably knew that, but just making sure....

-Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc:	pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 15:41:53
Message-ID:	603c8f070905280841t289ba506n4a6369e3407dc2b@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 28, 2009 at 8:43 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> On Thursday 28 May 2009 15:24:59 Heikki Linnakangas wrote:
>> I don't think you need that for predicate locking. To determine if e.g
>> an INSERT and a SELECT conflict, you need to determine if the INSERTed
>> tuple matches the predicate in the SELECT. No need to deduce anything
>> between two predicates, but between a tuple and a predicate.
>
> That might the easy part. The hard part is determining whether a SELECT and
> an UPDATE conflict.

What's hard about that? INSERTs are the hard case, because the rows
you care about don't exist yet. SELECT, UPDATE, and DELETE are easy
by comparison; you can lock the actual rows at issue. Unless I'm
confused?

...Robert

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 15:57:16
Message-ID:	4136ffa0905280857j79013e72g9cb15f829b703ed@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 28, 2009 at 4:33 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
> Can you cite anywhere that such techniques have been successfully used
> in a production environment

Well there's a reason our docs say: "Such a locking system is complex
to implement and extremely expensive in execution"

> or are you suggesting that we break new
> ground here? (The techniques I've been assuming are pretty well-worn
> and widely used.)

Well they're well-worn in very different databases which have much
less flexibility in how they access data. In part that inflexibility
comes *from* their decision to implement transaction isolation using
locks and to tie those locks to the indexing infrastructure.

--
greg

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 16:21:06
Message-ID:	11392.1243527666@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> What's hard about that? INSERTs are the hard case, because the rows
> you care about don't exist yet. SELECT, UPDATE, and DELETE are easy
> by comparison; you can lock the actual rows at issue. Unless I'm
> confused?

UPDATE isn't really any easier than INSERT: the update might cause
the row to satisfy someone else's search condition that it didn't
previously satisfy.

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 16:29:30
Message-ID:	4A1E759A.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:
> On Thu, May 28, 2009 at 4:33 PM, Kevin Grittner wrote:
>>
>> Can you cite anywhere that such techniques have been successfully
>> used in a production environment
>
> Well there's a reason our docs say: "Such a locking system is
> complex to implement and extremely expensive in execution"

I'm not clear on the reason for insisting that we use techniques that
*nobody* expects will work well.

>> or are you suggesting that we break new
>> ground here? (The techniques I've been assuming are pretty
>> well-worn and widely used.)
>
> Well they're well-worn in very different databases which have much
> less flexibility in how they access data. In part that inflexibility
> comes *from* their decision to implement transaction isolation using
> locks and to tie those locks to the indexing infrastructure.

I really don't see that. The btree usage seems pretty clear. The
other indexes seem solvable, with some work. And there's an
incremental path this way, where we can get basic functionality
correct and tune one thing at a time until performance is acceptable.
At the high end, we could even break this new ground and see if it
works better, although I personally doubt it will.

-Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 17:07:31
Message-ID:	603c8f070905281007g16a6a4cfub999f63eb441a7fd@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 28, 2009 at 12:21 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> What's hard about that? INSERTs are the hard case, because the rows
>> you care about don't exist yet. SELECT, UPDATE, and DELETE are easy
>> by comparison; you can lock the actual rows at issue. Unless I'm
>> confused?
>
> UPDATE isn't really any easier than INSERT: the update might cause
> the row to satisfy someone else's search condition that it didn't
> previously satisfy.

Good point.

...Robert

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	<Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 22:49:19
Message-ID:	4A1ECE9F.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> The fundamental problem with all the "practical" approaches I've
> heard of is that they only work for a subset of possible predicates
> (possible WHERE clauses). The idea that you get true
> serializability only if your queries are phrased just so is ...
> icky. So icky that it doesn't sound like an improvement over what
> we have.

I've never seen or heard of a production system which only gives you
serializable guarantees for some WHERE clauses. What I have always
seen is multi-granularity locking, where locks are based on indexes or
rows accessed -- essentially letting the DBMS figure out what rows the
predicate covers by seeing what it examines. If too many locks accrue
at a fine granularity, they are replaced with a lock at a coarser
granularity.

There have been papers published on the technique for decades, and it
has been used in popular databases for almost as long. The only
objection, outside of aesthetic ones, raised so far is that we don't
know of anyone using this approach with some of the innovative index
techniques available in PostgreSQL. I don't believe that means that
it can't be done.

Well, OK -- there is another objection -- that using this technique
creates locking on a less-than-surgically-precise set of data, leading
to blocking and/or serialization failures which would not happen with
a theoretically ideal implementation of predicate locks. The problem
is that the cost of a "perfect" predicate locking system is much
higher than the cost of letting some transaction block or roll back
for retry.

If someone has an approach to predicate locking which retains
precision in lock scope without excessive cost, I'm more than willing
to use it. Frankly, the fact that someone came up with a way to *use*
predicate locks to implement serializable transactions on top of MVCC,
without blocking beyond what's already there to support snapshot
isolation, has me believing that there could be more surprises around
the corner.

I do think that it might be best to get an initial implementation
using "conventional" locking, and *then* consider the fancier stuff.
That would allow an approach which has surgical precision for a subset
of WHERE clauses to be used where it can be, with the fall-back being
broader (but not ridiculous) conventional locks where the technique
can't be used, rather than falling back to failure of serializable
behavior.

-Kevin

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 23:12:40
Message-ID:	4136ffa0905281612w40895bcdw644a7831e80ac454@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 28, 2009 at 11:49 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> The problem
> is that the cost of a "perfect" predicate locking system is much
> higher than the cost of letting some transaction block or roll back
> for retry.

Surely that depends on how expensive it is to retry the transaction?
Like, how much would it suck to find your big data load abort after 10
hours loading data? And how much if it didn't wasn't even selecting
data which your data load conflicted with.

--
greg

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-28 23:32:04
Message-ID:	4A1ED8A4.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:

> how much would it suck to find your big data load abort after 10
> hours loading data? And how much if it didn't wasn't even selecting
> data which your data load conflicted with.

That's certainly a fair question. The prototype implementation of the
technique gave preference to aborting the "pivot" transaction, which
by definition has both read data modified by another transaction and
written data read by another transaction; so as you haven't read other
data, you would be safe in the particular case you cite. They did
mention that it might be desirable to use some other bias, such as the
transaction with the earlier start time or which has a higher value
for some "work accomplished" metric.

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>, "Kevin Grittner" <Kgrittn(dot)CCAP(dot)Courts(at)wicourts(dot)gov>
Cc:	"Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-29 00:12:46
Message-ID:	4A1EE22E.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> so as you haven't read other
> data, you would be safe in the particular case you cite.

Sorry, that's not true. If you run your bulk data load at
serializable isolation level, you could still get rolled back in this
scenario, even if you're just writing:

(1) A concurrent serializable transaction reads data from a table
into which your bulk load will subsequently insert, without there yet
being a conflict between that read and your bulk inserts. It also
modifies data somewhere. It commits.

(2) A serializable transaction concurrent with that mentioned in (1)
reads data which conflicts with the modification of (1). It may also
commit.

(3) Your bulk data load eventually (after the commit of (1)) inserts
data which conflicts with the read from (1). (1) is no longer
available for rollback. Unless (2) is still active, and preferred for
termination based on the bias chosen, your bulk load will be rolled
back.

That would suck.

There are several protections available, like taking an explicit lock
to protect the process, running at a less rigorous transaction
isolation level, etc., but it could happen.

-Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-05-29 01:08:23
Message-ID:	603c8f070905281808yf50b13ds1d9fcb45bd8bf366@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, May 28, 2009 at 1:33 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> Now let's discuss implementation. It may well be that there is no solution
> that totally satisfies all those requirements, so there's plenty of room for
> various tradeoffs to discuss. I think fully spec-compliant behavior is a
> hard requirement, or we'll find ourselves adding yet another isolation level
> in the next release to achieve it. The others are negotiable.

I'm sort of running out of enthusiasm for this thread, because it
seems like we're going around in circles again. But at least it
doesn't seem like anyone is seriously arguing that true
serializability wouldn't be a nice feature, if hypothetically we had
an agreed-upon implementation and a high-level developer with a lot of
time on their hands.

With respect to implementation, it seems fairly clear to me that there
are two major things that we lack for true serializability: protection
against old rows disappearing out from under us (DELETE/UPDATE case),
and protection from new rows that appear under us (INSERT/UPDATE
case). Protection against the former requires locking of existing
rows; protection against the latter requires predicate locking.
[Thanks to Tom for setting me straight on this point, a few emails
upthread.] This locking could be done with either traditional
blocking locks, or with the SIREAD locks discussed in the paper Kevin
cited.

I think the things we need to get clear on are:

- Is it feasible to think about implementing this with traditionally
blocking locks? Kevin seems to think it isn't, because it will suck
too much.

- Is it feasible to think about implementing this with SIREAD locks?
That has some complex requirements which someone more familiar with
the code than me will need to read and think about. I'm not sure
whether anyone is willing to do that analysis without money changing
hands, but we're not going to make any progress here otherwise.

- Why is this an all-or-nothing proposition? Given the undeniable
difficulty of getting large patches committed, tying the
locking-of-existing-rows part of the solution to the predicate-locking
part of the solution seems like a recipe for failure.

...Robert

From:	Markus Wanner <markus(at)bluegap(dot)ch>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Greg Stark <stark(at)enterprisedb(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 17:27:47
Message-ID:	4A240F93.6050308@bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Kevin Grittner wrote:
> Greg Stark <stark(at)enterprisedb(dot)com> wrote:
>> I would want any serialization failure to be
>> justifiable by simple inspection of the two transactions.
>
> BTW, there are often three (or more) transaction involved in creating
> a serialization failure, where any two of them alone would not fail.
> You probably knew that, but just making sure....

I'm not that eager on the "justifiable by simple inspection" requirement
above. I don't think a DBA is commonly doing these inspections at all.

I think a tool to measure abort rates per transaction (type) would serve
the DBA better. Of course there may be false positives, but high abort
rates should point out the problematic transactions pretty quickly. The
DBA shouldn't need to care about rare serialization failures or their
justifiability.

But maybe that reveals another requirement: false positives should be
rare enough for the DBA to still be able to figure out which
transactions are problematic and actually lead to conflicts.

In general, getting good performance by allowing a certain
false-positive rate seems like a good approach to me.

Regards

Markus Wanner

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"Greg Stark" <greg(dot)stark(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 17:45:19
Message-ID:	4A23CD5F.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> But at least it doesn't seem like anyone is seriously arguing that
> true serializability wouldn't be a nice feature, if hypothetically
> we had an agreed-upon implementation and a high-level developer with
> a lot of time on their hands.

If that's true, I think it represents a major shift in perspective on
this list. Does everyone *really* agree with the above?

> - Is it feasible to think about implementing this with traditionally
> blocking locks? Kevin seems to think it isn't, because it will suck
> too much.

I'm not sure it's without value to the project; I just don't know that
it would be worth using for us. It seems to be accepted in some other
DBMS products. Since some (like MS SQL Server) allow users to choose
snapshot isolation or blocking-based serializable transactions in
their MVCC implementation, it would be interesting to know how many
users have chosen the latter. Has anyone seen numbers (or even have
anecdotal evidence) on this point?

> - Is it feasible to think about implementing this with SIREAD locks?

I'd be willing to bet that if we solved the predicate locking issue,
the rest of it would be minor by comparison. But I am still trying
get comfortable with the train of thought I got onto when responding
to Greg Stark's last email on the topic.

With blocking techniques you always have at least two transactions
involved, and you can pick between at least two, when you need to roll
something back. With this new method, it is possible to discover the
dangerous structure which requires rollback when there is only one
participating transaction left active -- which might have done a lot
of work by that point. It seems like a pretty significant weakness.
Do others see that as a fatal flaw?

> - Why is this an all-or-nothing proposition? Given the undeniable
> difficulty of getting large patches committed, tying the locking-of-
> existing-rows part of the solution to the predicate-locking part of
> the solution seems like a recipe for failure.

Agreed. If we can get agreement on an approach, with a road map which
allows incremental progress, we might be able to contribute
programming for some parts, and might be able to draw in others for
reasonable chunks. Requiring all-or-nothing seems to me to be the
same as a straight thumbs-down, for all practical purposes.

Of course, there's no point starting on coding for even an incremental
change without concensus on the type of issues we've been covering in
this thread.

-Kevin

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Markus Wanner <markus(at)bluegap(dot)ch>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 17:55:19
Message-ID:	4136ffa0906011055t29fee97am7bc5423ab7454f3d@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 1, 2009 at 6:27 PM, Markus Wanner <markus(at)bluegap(dot)ch> wrote:
> I'm not that eager on the "justifiable by simple inspection" requirement
> above. I don't think a DBA is commonly doing these inspections at all.
>
> I think a tool to measure abort rates per transaction (type) would serve
> the DBA better. Of course there may be false positives, but high abort
> rates should point out the problematic transactions pretty quickly. The
> DBA shouldn't need to care about rare serialization failures or their
> justifiability.

I don't think that's true. It might be true for OLTP transactions
where having to repeat the occasional transaction once or twice for no
reason just means a slower response time. Even there I fear it means
the DBA would never be able to guarantee his response time since there
will always be a chance the transaction will have to be repeated too
many times to fall within the guarantee.

But it's certainly insufficient in an OLAP or DSS environment where
transactions can take hours. If you can never know for sure that
you've written your transaction safely and it might randomly fail and
need to be retried any given day due to internal implementation issues
you can't predict then I would call the system just broken.

--
greg

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Greg Stark" <greg(dot)stark(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 18:12:26
Message-ID:	20758.1243879946@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> But at least it doesn't seem like anyone is seriously arguing that
>> true serializability wouldn't be a nice feature, if hypothetically
>> we had an agreed-upon implementation and a high-level developer with
>> a lot of time on their hands.

> If that's true, I think it represents a major shift in perspective on
> this list. Does everyone *really* agree with the above?

I think we'd all love to have it, if we could get it with reasonable
performance and without an undue amount of complexity. What you're
up against is a lot of skepticism that that's going to be possible.
Which then translates into wondering whether partial solutions are
worthwhile, if they won't ever get extended to full solutions.

(So, in that sense, discussing possible implementations now is not
premature --- we need to calibrate what we believe is possible.)

regards, tom lane

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Markus Wanner" <markus(at)bluegap(dot)ch>, "Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 18:14:54
Message-ID:	4A23D44E.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:

> But it's certainly insufficient in an OLAP or DSS environment where
> transactions can take hours. If you can never know for sure that
> you've written your transaction safely and it might randomly fail
> and need to be retried any given day due to internal implementation
> issues you can't predict then I would call the system just broken.

I absolutely guarantee that it means that a transaction like that
should not be run at the SERIALIZABLE transaction isolation level
without some other protection. I don't know that I would say the
system is broken when that's true; it seems to me more a matter of
having a tool in you tookbox which isn't the right one for every job.

The question is, is it an unacceptably risky foot-gun?

-Kevin

From:	Josh Berkus <josh(at)agliodbs(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <greg(dot)stark(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 18:24:14
Message-ID:	4A241CCE.3050000@agliodbs.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kevin,

> I'm not sure it's without value to the project; I just don't know that
> it would be worth using for us. It seems to be accepted in some other
> DBMS products. Since some (like MS SQL Server) allow users to choose
> snapshot isolation or blocking-based serializable transactions in
> their MVCC implementation, it would be interesting to know how many
> users have chosen the latter. Has anyone seen numbers (or even have
> anecdotal evidence) on this point?

This approach allowed MSSQL to "clean up" on TPCE; to date their
performance on that benchmark is so much better than anyone else nobody
else wants to publish.

So, at least theoretically, anyone who had a traffic mix similar to TPCE
would benefit. Particularly, some long-running serializable
transactions thrown into a mix of Read Committed and Repeatable Read
transactions, for a stored procedure driven application.

In the field, we're not going so see a lot of requests for this because
most applications that complex run in Java middleware with pessimistic
locking. To the exent, though, that we want to promote PostgreSQL as
'better development platform' for transactional applications, it might
be beneficial to support more sophisticated serializablity.

Besides, I'd love to beat Microsoft on TPCE. ;-)

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Josh Berkus <josh(at)agliodbs(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 18:57:12
Message-ID:	4136ffa0906011157td9dc46bvf8cdb7518a386d42@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 1, 2009 at 7:24 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Since some (like MS SQL Server) allow users to choose
>> snapshot isolation or blocking-based serializable transactions in
>> their MVCC implementation
>
> This approach allowed MSSQL to "clean up" on TPCE; to date their performance
> on that benchmark is so much better than anyone else nobody else wants to
> publish.

Are you sure you aren't thinking of some other feature? An
implementation of Serializable transactions isn't going to suddenly
make MSSQL faster than Oracle which uses snapshots anyways.

From what I remember TPC-E actually spends most of its energy testing
things like check constraints, referential integrity checks, and
complex queries. What you describe is possible but it's seems more
likely to be due to some kind of optimization like materialized views
or cached query results or something like that.

--
greg

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	<Markus Wanner <markus(at)bluegap(dot)ch>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 19:55:32
Message-ID:	4A23EBE3.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:

> If you can never know for sure that you've written your transaction
> safely

Whoa! I just noticed this phrase on a re-read. I think there might
be some misunderstanding here.

You can be sure you've written your transaction safely just as soon as
your COMMIT returns without error. Perhaps you're getting confused
because under the non-blocking approach, each transaction's read locks
(if any) continue to be tracked until all concurrent transactions
terminate in order to determine if some *other* transaction might need
to be rolled back.

-Kevin

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 20:08:08
Message-ID:	4136ffa0906011308u1a285672y2605b655e5666ad6@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 1, 2009 at 8:55 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
> Whoa! I just noticed this phrase on a re-read. I think there might
> be some misunderstanding here.
>
> You can be sure you've written your transaction safely just as soon as
> your COMMIT returns without error.

I think we have different definitions of "safely". You only know that
you got away with it *this time* when the commit returns without
error.

I'm concerned with whether you can be sure that the 999th time you run
it the database won't randomly decide to declare a serialization
failure for reasons you couldn't predict were possible.

--
greg

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Greg Stark <stark(at)enterprisedb(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "<Markus Wanner" <markus(at)bluegap(dot)ch>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 20:17:23
Message-ID:	603c8f070906011317q337be4e4v8ee843fc68acf623@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 1, 2009 at 4:08 PM, Greg Stark <stark(at)enterprisedb(dot)com> wrote:
> On Mon, Jun 1, 2009 at 8:55 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>
>> Whoa! I just noticed this phrase on a re-read. I think there might
>> be some misunderstanding here.
>>
>> You can be sure you've written your transaction safely just as soon as
>> your COMMIT returns without error.
>
> I think we have different definitions of "safely". You only know that
> you got away with it *this time* when the commit returns without
> error.
>
> I'm concerned with whether you can be sure that the 999th time you run
> it the database won't randomly decide to declare a serialization
> failure for reasons you couldn't predict were possible.

Aren't serialization failures of any sort unpredictable, or any database?

...Robert

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 20:24:24
Message-ID:	4A23F2A7.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:
> On Mon, Jun 1, 2009 at 8:55 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

>> You can be sure you've written your transaction safely just as soon
>> as your COMMIT returns without error.
>
> I think we have different definitions of "safely". You only know
> that you got away with it *this time* when the commit returns
> without error.
>
> I'm concerned with whether you can be sure that the 999th time you
> run it the database won't randomly decide to declare a serialization
> failure for reasons you couldn't predict were possible.

Now you're questioning whether SERIALIZABLE transaction isolation
level is useful. Probably not for everyone, but definitely for some.

As stated before, the trade-off is that you don't need to know what
all the transactions look like or which ones might be run concurrently
in order to guarantee that you avoid anomalies; but you need to be
able to handle the rollback of any serializable transaction. Nothing
in the proposed techniques would create problems like you describe in
transactions running at other isolation levels, or preclude taking out
explicit locks to prevent this where you need additional guarantees --
like needing to be sure that a transaction won't be rolled back with a
serialization failure after 10 hours.

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Josh Berkus" <josh(at)agliodbs(dot)com>
Cc:	"Greg Stark" <greg(dot)stark(at)enterprisedb(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 20:43:42
Message-ID:	4A23F72E.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> This approach allowed MSSQL to "clean up" on TPCE; to date their
> performance on that benchmark is so much better than anyone else
> nobody else wants to publish.

Since they use a "compatibility level" setting to control whether a
request for a serializable transaction gives you snapshot isolation or
a true serializable transaction, you have to be careful interpreting
results like that. Are you sure which one they used for this
benchmark?

-Kevin

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 21:12:48
Message-ID:	4136ffa0906011412i12cbd828sbf2c81eae4616760@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 1, 2009 at 9:24 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> I'm concerned with whether you can be sure that the 999th time you
>> run it the database won't randomly decide to declare a serialization
>> failure for reasons you couldn't predict were possible.
>
> Now you're questioning whether SERIALIZABLE transaction isolation
> level is useful. Probably not for everyone, but definitely for some.

No, I'm not. I'm questioning whether a serializable transaction
isolation level that makes no guarantee that it won't fire spuriously
is useful.

Postgres doesn't take block level locks or table level locks to do
row-level operations. You can write code and know that it's safe from
deadlocks.

Heikki proposed a list of requirements which included a requirement
that you not get spurious serialization failures and you rejected that
on the basis that that's not how MSSQL and Sybase implement it.

I'm unhappy with the idea that if I access too many rows or my query
conditions aren't written just so then the database will forget which
rows I'm actually concerned with and "lock" other random unrelated
records and possibly roll my transaction back even though my I had no
way of knowing my code was at risk.

--
greg

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Greg Stark <stark(at)enterprisedb(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, "<Markus Wanner" <markus(at)bluegap(dot)ch>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 21:26:43
Message-ID:	1243891603.12209.45.camel@monkey-cat.sm.truviso.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, 2009-06-01 at 22:12 +0100, Greg Stark wrote:
> No, I'm not. I'm questioning whether a serializable transaction
> isolation level that makes no guarantee that it won't fire spuriously
> is useful.

I am also concerned (depending on implementation, of course) that
certain situations can make it almost certain that you will get
serialization failures every time. For instance, a change in the heap
order, or data distribution, could mean that your application is unable
to make progress at all.

Is this a valid concern, or are there ways of avoiding this situation?

I would think that we'd need some way to detect that this is happening,
give it a few tries, and then resort to full serialization for a few
transactions so that the application can make progress.

Regards,
Jeff Davis

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 22:07:16
Message-ID:	4A240AC4.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:
> On Mon, Jun 1, 2009 at 9:24 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>> I'm concerned with whether you can be sure that the 999th time you
>>> run it the database won't randomly decide to declare a
>>> serialization failure for reasons you couldn't predict were
>>> possible.
>>
>> Now you're questioning whether SERIALIZABLE transaction isolation
>> level is useful. Probably not for everyone, but definitely for
>> some.
>
> No, I'm not. I'm questioning whether a serializable transaction
> isolation level that makes no guarantee that it won't fire
> spuriously is useful.

Well, the technique I'm advocating virtually guarantees that there
will be false positives, since it looks only for the "dangerous
structure" of two adjacent read-write dependences rather than building
a rigorous read-write dependency graph for every serializable
transaction. Even if you user very fine-grained locks (i.e., what
*columns* were modified in what rows) and had totally accurate
predicate locking, you would still get spurious rollbacks with this
technique.

In spite of that, I believe that it will run faster than traditional
serializable transactions, and in one benchmark it ran faster than
snapshot isolation -- apparently because it rolled back conflicting
transactions before they did updates and hit the update conflict
detection phase.

> Postgres doesn't take block level locks or table level locks to do
> row-level operations. You can write code and know that it's safe
> from deadlocks.

Who's talking about deadlocks? If you're speaking more broadly of all
serialization failures, you can certainly get them in PostgreSQL. So
one of us is not understanding the other here. To clarify what I'm
talking about -- this technique introduces no blocking and cannot
cause a deadlock.

> Heikki proposed a list of requirements which included a requirement
> that you not get spurious serialization failures and you rejected
> that on the basis that that's not how MSSQL and Sybase implement it.

No, I rejected that on the basis that it precludes the use of the
technique published in the paper I cited, and I believe that technique
is the best currently available. I'm perfectly happy to get to a
point where we have something which works correctly and have people
try to make it work better by tweaking the locking, but I think that
we'll find a point of diminishing returns -- where the cost of
tracking finer locks costs more than the cost of rerunning some
transactions. For obvious high-risk situations, where you are
expending extreme resources on one database transaction, I believe it
will be most cost-effective to count on developers to recognize the
risk and use existing techniques.

> I'm unhappy with the idea that if I access too many rows or my query
> conditions aren't written just so then the database will forget
> which rows I'm actually concerned with and "lock" other random
> unrelated records and possibly roll my transaction back even though
> my I had no way of knowing my code was at risk.

Then you would apparently not be a good candidate for serializable
transactions, since I don't know of any implementation which performs
well which doesn't have those characteristics. When Sybase introduced
row level locking, we benchmarked that against the page level locking,
and found that it was significantly slower for our mix. We did
identify a small number of small tables with high update rates where
switching them to row level locking provided a small performance gain.

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 22:17:30
Message-ID:	4A240D29.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> On Mon, 2009-06-01 at 22:12 +0100, Greg Stark wrote:
>> No, I'm not. I'm questioning whether a serializable transaction
>> isolation level that makes no guarantee that it won't fire
>> spuriously is useful.
>
> I am also concerned (depending on implementation, of course) that
> certain situations can make it almost certain that you will get
> serialization failures every time. For instance, a change in the
> heap order, or data distribution, could mean that your application
> is unable to make progress at all.
>
> Is this a valid concern, or are there ways of avoiding this
> situation?

I've been concerned about that possibility -- in the traditional
blocking implementations it is OK to attempt the retry almost
immediately, since a conflicting transaction should then block you
until one of the original transactions in the conflict completes. It
appears to me that with the proposed technique you could jump back in
and hit exactly the same combination of read-write dependencies,
leading to repeated rollbacks. I'm not happy with the thought of
trying to handle that with simple delays (or even escalating delays)
before retry.

I'm not sure how big a problem this is likely to be in practice, so
I've been trying to avoid the trap of premature optimization on this
point. But a valid concern? Certainly.

> I would think that we'd need some way to detect that this is
> happening, give it a few tries, and then resort to full
> serialization for a few transactions so that the application can
> make progress.

I'd hate to go to actual serial execution of all serializable
transactions. Perhaps we could fall back to traditional blocking
techniques based on some heuristic? That would create blocking, and
would lead to occassional deadlocks; however, it might be the optimal
fix, if this is found to actually be a problem.

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Josh Berkus" <josh(at)agliodbs(dot)com>
Cc:	"Greg Stark" <greg(dot)stark(at)enterprisedb(dot)com>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 22:32:00
Message-ID:	4A241090.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> wrote:

> So, at least theoretically, anyone who had a traffic mix similar to
> TPCE would benefit. Particularly, some long-running serializable
> transactions thrown into a mix of Read Committed and Repeatable
> Read transactions, for a stored procedure driven application.

A belated thought. The proposed technique does yield different
behavior from traditional techniques for Read Committed and Repeatable
Read transactions which are run concurrently with Serializable
transactions. In traditional blocking techniques, even a Read
Committed transaction only sees the database in a state consistent
with some serial execution of the serializable transactions. As far
as I can see, this is not required by the SQL standard, but it might
possibly be an implementation artifact upon which some software might
rely. Any idea whether this is the case with the TPC-E benchmark?

-Kevin

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-01 23:46:08
Message-ID:	4136ffa0906011646n2ab749bdk7a9a316b2692725a@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jun 1, 2009 at 11:07 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Greg Stark <stark(at)enterprisedb(dot)com> wrote:
>
>> No, I'm not. I'm questioning whether a serializable transaction
>> isolation level that makes no guarantee that it won't fire
>> spuriously is useful.
>
> Well, the technique I'm advocating virtually guarantees that there
> will be false positives, since it looks only for the "dangerous
> structure" of two adjacent read-write dependences rather than building
> a rigorous read-write dependency graph for every serializable
> transaction. Even if you user very fine-grained locks (i.e., what
> *columns* were modified in what rows) and had totally accurate
> predicate locking, you would still get spurious rollbacks with this
> technique.

Yeah, I'm ok compromising on things like having updates on other
columns or even no-op updates trigger serialization failures. For one
thing they do currently, but more importantly from my point of view
they can be explained in documentation and make sense from a user's
point of view.

More generally any time you have a set of transactions that are
touching and selecting from the same set of records, I think it's
obvious to a user that a serialization failure might be possible.

I'm not happy having things like "where x = 5 and y = 5" randomly
choose either to lock all records in one or the other index range (or
the whole table) when only the intersection are really interesting to
the plan. That leaves a careful programmer no way to tell which of his
transactions might conflict.

And I'm *really* unhappy with having the decision on which range to
lock depend on the planner decision. That means sometime (inevitably
in the middle of a night) the database will suddenly start getting
serialization failures on transactions that never did before
(inevitably critical batch jobs) because the planner switched plans.

> In spite of that, I believe that it will run faster than traditional
> serializable transactions, and in one benchmark it ran faster than
> snapshot isolation -- apparently because it rolled back conflicting
> transactions before they did updates and hit the update conflict
> detection phase.

"I can get the answer infinitely fast if it doesn't have to be right"

I know a serialization failure isn't a fatal error and the application
has to be prepared to retry. And I agree that some compromises are
reasonable, "serialization failure" doesn't have to mean "the database
ran a theorem prover and proved that it was impossible to serialize
these transactions". But I think a programmer has to be able to look
at the set of transactions and say "yeah I can see these transactions
all depend on the same records".

>> Postgres doesn't take block level locks or table level locks to do
>> row-level operations. You can write code and know that it's safe
>> from deadlocks.
>
> Who's talking about deadlocks? If you're speaking more broadly of all
> serialization failures, you can certainly get them in PostgreSQL. So
> one of us is not understanding the other here. To clarify what I'm
> talking about -- this technique introduces no blocking and cannot
> cause a deadlock.

Sorry, I meant to type a second paragraph there to draw the analogy.
Just as carefully written SQL code can be written to avoid deadlocks I
would expect to be able to look at SQL code and know it's safe from
serialization failures, or at least know where they might occur.

--
greg

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-02 00:13:05
Message-ID:	4A242841.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:

> Just as carefully written SQL code can be written to avoid deadlocks
> I would expect to be able to look at SQL code and know it's safe
> from serialization failures, or at least know where they might
> occur.

This is the crux of our disagreement, I guess. I consider existing
techniques fine for situations where that's possible. But, could you
give me an estimate of how much time it would take you, up front and
ongoing, to do that review in our environment? About 8,700 queries
undergoing frequent modification, by 21 programmers, for enhancements
in our three-month release cycle. Plus various ad hoc queries. We
have one full-time person to run ad hoc data fixes and reports
requested by the legislature and various outside agencies, like
universities doing research.

The whole point of the serializable transaction isolation level is
that it is the solution where you *can't* look at all your SQL code
and know where it's safe or where conflicts might occur. If you can
do that, it's very likely that you don't need this feature. The
proposed implementation, unlike traditional blocking techniques, won't
affect you if you don't choose to use serializable transactions.

Some people might be picturing the kind of blocking inherent in
traditional techniques where, for example, we had to run our read-only
web application at Read Uncommitted to avoid deadlocks with the
updates from replication. I like this technique because I don't think
I'd have to do that.

-Kevin

From:	"Markus Wanner" <markus(at)bluegap(dot)ch>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-02 07:17:00
Message-ID:	20090602091700.541050vnsk1wx33g@mail.bluegap.ch
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

Quoting "Greg Stark" <stark(at)enterprisedb(dot)com>:
> No, I'm not. I'm questioning whether a serializable transaction
> isolation level that makes no guarantee that it won't fire spuriously
> is useful.

It would certainly be an improvement compared to our status quo, where
truly serializable transactions aren't supported at all. And it seems
more promising than heading for a perfect *and* scalable implementation.

> Heikki proposed a list of requirements which included a requirement
> that you not get spurious serialization failures

That requirement is questionable. If we get truly serializable
transactions (i.e. no false negatives) with reasonably good
performance, that's more than enough and a good step ahead.

Why care about a few false positives (which don't seem to matter
performance wise)? We can probably reduce or eliminate them later on.
But eliminating false negatives is certainly more important to start
with.

What I'm more concerned is the requirement of the proposed algorithm
to keep track of the set of tuples read by any transaction and keep
that set until sometime well after the transaction committed (as
questioned by Neil [1]). That doesn't sound like a negligible overhead.

Maybe the proposed algorithm has to be applied to pages instead of
tuples, as they did it in the paper for Berkeley DB. Just to keep that
overhead reasonably low.

Regards

Markus Wanner

[1]: Neil Conway's blog, Serializable Snapshot Isolation:
http://everythingisdata.wordpress.com/2009/02/25/february-25-2009/

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-02 11:24:09
Message-ID:	4136ffa0906020424n191ea75en828cde17ad1d0dd8@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 2, 2009 at 1:13 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Greg Stark <stark(at)enterprisedb(dot)com> wrote:
>
>> Just as carefully written SQL code can be written to avoid deadlocks
>> I would expect to be able to look at SQL code and know it's safe
>> from serialization failures, or at least know where they might
>> occur.
>
> This is the crux of our disagreement, I guess. I consider existing
> techniques fine for situations where that's possible.

a) When is that possible? Afaict it's always possible, you can never
know and when it might happen could change any time.

b) What existing techniques, explicit locking?

> But, could you
> give me an estimate of how much time it would take you, up front and
> ongoing, to do that review in our environment? About 8,700 queries
> undergoing frequent modification, by 21 programmers, for enhancements
> in our three-month release cycle. Plus various ad hoc queries. We
> have one full-time person to run ad hoc data fixes and reports
> requested by the legislature and various outside agencies, like
> universities doing research.

Even in your environment I could easily imagine, say, a monthly job to
delete all records older than 3 months. That job could take hours or
even days. It would be pretty awful for it to end up needing to be
retried. All I'm saying is that if you establish a policy -- perhaps
enforced using views -- that no queries are allowed to access records
older than 3 months you shouldn't have to worry that you'll get a
spurious serialization failure working with those records.

--
greg

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Markus Wanner" <markus(at)bluegap(dot)ch>, "Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-02 13:31:41
Message-ID:	4A24E36D.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Markus Wanner" <markus(at)bluegap(dot)ch> wrote:

> What I'm more concerned is the requirement of the proposed algorithm
> to keep track of the set of tuples read by any transaction and keep
> that set until sometime well after the transaction committed (as
> questioned by Neil). That doesn't sound like a negligible overhead.

Quick summary for those who haven't read the paper: with this
non-blocking technique, every serializable transaction which
successfully commits must have its read locks tracked until all
serializable transactions which are active at the commit also
complete.

In the prototype implementation, I think they periodically scanned to
drop old transactions, and also did a final check right before
deciding there is a conflict which requires rollback, cleaning up the
transaction if it had terminated after the last scan but in time to
prevent a problem.

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-02 13:44:20
Message-ID:	4A24E664.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:

> On Tue, Jun 2, 2009 at 1:13 AM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Greg Stark <stark(at)enterprisedb(dot)com> wrote:
>>
>>> Just as carefully written SQL code can be written to avoid
deadlocks
>>> I would expect to be able to look at SQL code and know it's safe
>>> from serialization failures, or at least know where they might
>>> occur.
>>
>> This is the crux of our disagreement, I guess. I consider existing
>> techniques fine for situations where that's possible.
>
> a) When is that possible? Afaict it's always possible, you can never
> know and when it might happen could change any time.

Sorry that I wasn't more clear -- I meant "I consider existing
techniques fine where it's possible to look at all the SQL code and
know what's safe from serialization failures or at least know where
they might occur". I don't believe that's possible in an environment
with 8,700 queries in the application software, under constant
modification, with ad hoc queries run every day.

> b) What existing techniques, explicit locking?

Whichever techniques you would use right now, today, in PostgreSQL
which you feel are adequate to your needs. You pick.

>> But, could you
>> give me an estimate of how much time it would take you, up front
and
>> ongoing, to do that review in our environment? About 8,700 queries
>> undergoing frequent modification, by 21 programmers, for
enhancements
>> in our three-month release cycle. Plus various ad hoc queries. We
>> have one full-time person to run ad hoc data fixes and reports
>> requested by the legislature and various outside agencies, like
>> universities doing research.
>
> Even in your environment I could easily imagine, say, a monthly job
to
> delete all records older than 3 months. That job could take hours or
> even days. It would be pretty awful for it to end up needing to be
> retried. All I'm saying is that if you establish a policy -- perhaps
> enforced using views -- that no queries are allowed to access
records
> older than 3 months you shouldn't have to worry that you'll get a
> spurious serialization failure working with those records.

You have totally lost me. We have next to nothing which can be
deleted after three months. We have next to nothing which we get to
decide is deletable. The elected Clerk of Court in each county is the
custodian of the records for that county, we facilitate their
record-keeping. Some counties back-loaded data for some case types
(for example, probate) back to the beginning, in the mid-1800s, and
that information is not likely to go away any time soon. Since
they've been using the software for about 20 years now, enough cases
are purgeable under Supreme Court records retention rules that we're
just now getting around to writing purge functions, but you don't even
*want* to know how complex the rules around that are....

The three month cycle I mentioned was how often we issue a major
release of the application software. Such a release generally
involves a lot of schema changes, and changes to hundreds of queries,
but no deletion of data.

-Kevin

From:	Greg Stark <stark(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-02 15:02:10
Message-ID:	4136ffa0906020802t766fc88ak223064a172eade5@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jun 2, 2009 at 2:44 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
>> Even in your environment I could easily imagine, say, a monthly job
> to
>> delete all records older than 3 months. That job could take hours or
>> even days. It would be pretty awful for it to end up needing to be
>> retried. All I'm saying is that if you establish a policy -- perhaps
>> enforced using views -- that no queries are allowed to access
> records
>> older than 3 months you shouldn't have to worry that you'll get a
>> spurious serialization failure working with those records.
>
> You have totally lost me. We have next to nothing which can be
> deleted after three months. We have next to nothing which we get to
> decide is deletable.

That's reassuring for a courts system.

But i said "I could easily imagine". The point was that even in a big
complex system with thousands of queries being constantly modified by
hundreds of people, it's possible there might be some baseline rules.
Those rules can even be enforced using tools like views. So it's not
true that no programmer could ever expect that they've written their
code to ensure there's no risk of serialization failures.

--
greg

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Greg Stark" <stark(at)enterprisedb(dot)com>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-02 15:56:26
Message-ID:	4A250559.EE98.0025.1@wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Greg Stark <stark(at)enterprisedb(dot)com> wrote:
> On Tue, Jun 2, 2009 at 2:44 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

>> We have next to nothing which can be deleted after three months.

> That's reassuring for a courts system.

:-)

> But i said "I could easily imagine". The point was that even in a
> big complex system with thousands of queries being constantly
> modified by hundreds of people, it's possible there might be some
> baseline rules. Those rules can even be enforced using tools like
> views. So it's not true that no programmer could ever expect that
> they've written their code to ensure there's no risk of
> serialization failures.

Now I see what you're getting at.

I think we've beat this horse to death and then some.

Recap:

(1) There is abstract, conceptual agreement that support for
serializable transactions would be A Good Thing.

(2) There is doubt that an acceptably performant implementation is
possible in PostgreSQL.

(3) Some, but not all, don't want to see an implementation which
produces false positive serialization faults with some causes, but
will accept them for other causes.

(4) Nobody believes that an implementation with acceptable
performance is possible without the disputed false positives mentioned
in (3).

(5) There is particular concern about how to handle repeated
rollbacks gracefully if we use the non-blocking technique.

(6) There is particular concern about how to protect long-running
transactions from rollback. (I'm not sure those concerns are confined
to the new technique.)

(7) Some, but not all, feel that it would be beneficial to have a
correct implementation (no false negatives) even if it had significant
false positives, as it would allow iterative refinement of the locking
techniques.

(8) One or two people feel that there would be benefit to an
implementation which reduces the false negatives, even if it doesn't
eliminate them entirely. (Especially if this could be a step toward a
full implementation.)

Are any of those observations in dispute?

What did I miss?

Where do we go from here?

-Kevin

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Greg Stark <stark(at)enterprisedb(dot)com>, "<Markus Wanner" <markus(at)bluegap(dot)ch>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-03 23:35:23
Message-ID:	200906032335.n53NZNv19259@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Added to TODO:

Consider improving serialized transaction behavior to avoid anomalies

* http://archives.postgresql.org/pgsql-hackers/2009-05/msg01136.php
* http://archives.postgresql.org/pgsql-hackers/2009-06/msg00035.php

---------------------------------------------------------------------------

Kevin Grittner wrote:
> Greg Stark <stark(at)enterprisedb(dot)com> wrote:
> > On Tue, Jun 2, 2009 at 2:44 PM, Kevin Grittner
> > <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
> >> We have next to nothing which can be deleted after three months.
>
> > That's reassuring for a courts system.
>
> :-)
>
> > But i said "I could easily imagine". The point was that even in a
> > big complex system with thousands of queries being constantly
> > modified by hundreds of people, it's possible there might be some
> > baseline rules. Those rules can even be enforced using tools like
> > views. So it's not true that no programmer could ever expect that
> > they've written their code to ensure there's no risk of
> > serialization failures.
>
> Now I see what you're getting at.
>
> I think we've beat this horse to death and then some.
>
> Recap:
>
> (1) There is abstract, conceptual agreement that support for
> serializable transactions would be A Good Thing.
>
> (2) There is doubt that an acceptably performant implementation is
> possible in PostgreSQL.
>
> (3) Some, but not all, don't want to see an implementation which
> produces false positive serialization faults with some causes, but
> will accept them for other causes.
>
> (4) Nobody believes that an implementation with acceptable
> performance is possible without the disputed false positives mentioned
> in (3).
>
> (5) There is particular concern about how to handle repeated
> rollbacks gracefully if we use the non-blocking technique.
>
> (6) There is particular concern about how to protect long-running
> transactions from rollback. (I'm not sure those concerns are confined
> to the new technique.)
>
> (7) Some, but not all, feel that it would be beneficial to have a
> correct implementation (no false negatives) even if it had significant
> false positives, as it would allow iterative refinement of the locking
> techniques.
>
> (8) One or two people feel that there would be benefit to an
> implementation which reduces the false negatives, even if it doesn't
> eliminate them entirely. (Especially if this could be a step toward a
> full implementation.)
>
> Are any of those observations in dispute?
>
> What did I miss?
>
> Where do we go from here?
>
> -Kevin
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Bruce Momjian" <bruce(at)momjian(dot)us>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Greg Stark" <stark(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-04 15:32:22
Message-ID:	4A27A2B60200002500027503@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> Added to TODO:
>
> Consider improving serialized transaction behavior to avoid
> anomalies
>
> *
http://archives.postgresql.org/pgsql-hackers/2009-05/msg01136.php
> *
http://archives.postgresql.org/pgsql-hackers/2009-06/msg00035.php

It might be worth adding this reference, too, since it gets down to
some possible implementation techniques:

http://archives.postgresql.org/pgsql-hackers/2009-05/msg00217.php

I was going to try to scare up some resources to advance this if we
could get to some consensus. I don't get the feeling we're there yet.
Suggestions welcome.

-Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Bruce Momjian <bruce(at)momjian(dot)us>, "<Markus Wanner" <markus(at)bluegap(dot)ch>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Stark <stark(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-04 16:16:50
Message-ID:	603c8f070906040916o6c3582c9x1e14be14f3b22cec@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Thu, Jun 4, 2009 at 11:32 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> I was going to try to scare up some resources to advance this if we
> could get to some consensus. I don't get the feeling we're there yet.
> Suggestions welcome.

I think I might've said this before, but I think you need to do (or
get someone with knowledge of the code to do) more looking at the lock
bookkeeping that's required to make the SIREAD stuff work and try to
figure out if it's even feasible for PostgreSQL and what the
performance costs would be (an idea of how much code complexity this
would introduce would be good too). A lot of the "lack of consensus"
at this point looks to me more like "lack of being sure whether this
can actually work". I don't know that we're going to get any closer
to consensus without some less-handwavy answer to that question.

...Robert

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Greg Stark" <stark(at)enterprisedb(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Bruce Momjian" <bruce(at)momjian(dot)us>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-04 16:55:45
Message-ID:	4A27B641020000250002750B@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Thu, Jun 4, 2009 at 11:32 AM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> I was going to try to scare up some resources to advance this if we
>> could get to some consensus. I don't get the feeling we're there
>> yet. Suggestions welcome.
>
> I think I might've said this before, but I think you need to do (or
> get someone with knowledge of the code to do) more looking at the
> lock bookkeeping that's required to make the SIREAD stuff work and
> try to figure out if it's even feasible for PostgreSQL and what the
> performance costs would be (an idea of how much code complexity this
> would introduce would be good too). A lot of the "lack of
> consensus" at this point looks to me more like "lack of being sure
> whether this can actually work". I don't know that we're going to
> get any closer to consensus without some less-handwavy answer to
> that question.

I'd feel a lot more comfortable about trying to go that route if there
weren't heavy hitters insisting that "no serialization failures
without a reason that can be easily explained to users" is a
requirement. I don't believe it will ever work that way, so I see no
point moving farther. Either that requirement needs to be removed, or
someone who thinks it can be made to work that way will have to take
up the cause if this is to go anywhere.

I do agree, wholeheartedly, that if we get consensus on functional
requirements, the next step would be to identify the specific changes
required. In other words, I haven't forgotten your previous
suggestion, and it seems like the *next* step if we can wrap *this*
one.

-Kevin

From:	Bruce Momjian <bruce(at)momjian(dot)us>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	"<Markus Wanner" <markus(at)bluegap(dot)ch>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Greg Stark <stark(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, "<pgsql-hackers(at)postgresql(dot)org>" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: User-facing aspects of serializable transactions
Date:	2009-06-04 23:10:32
Message-ID:	200906042310.n54NAWc03770@momjian.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Kevin Grittner wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> > Added to TODO:
> >
> > Consider improving serialized transaction behavior to avoid
> > anomalies
> >
> > *
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg01136.php
> > *
> http://archives.postgresql.org/pgsql-hackers/2009-06/msg00035.php
>
>
> It might be worth adding this reference, too, since it gets down to
> some possible implementation techniques:
>
> http://archives.postgresql.org/pgsql-hackers/2009-05/msg00217.php

Done.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +