Re: Serializable Snapshot Isolation

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <gsstark(at)mit(dot)edu>
Cc: <drkp(at)csail(dot)mit(dot)edu>,<heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-25 22:28:05
Message-ID: 4C9E31250200002500035DC6@gw.wicourts.gov

Greg Stark wrote:

> So T1 must have happened before TN because it wrote something based
> on data as it was before TN modified it. But T0 can see TN but not
> T1 so there's no complete ordering between the three transactions
> that makes them all make sense.

Correct.

> The thing is that the database state is reasonable: the database
> state is as it would be if the ordering were T1, TN, with T0
> happening at any time. And the backup state is reasonable: it's as
> if it occurred after TN and before T1. They just don't agree.

I agree that the database state eventually "settles" into a valid
long-term condition in this particular example. The point you are
conceding seems to be that the image captured by pg_dump is not
consistent with that. If so, I agree. You don't see that as a
problem; I do. I'm not sure where we go from there. Certainly that
is better than making pg_dump vulnerable to serialization failure --
if we don't implement the SERIALIZABLE READ ONLY DEFERRABLE
transactions I was describing, we can change pg_dump to use
REPEATABLE READ and we will be no worse off than we are now.

The new feature I was proposing was that we create a SERIALIZABLE
READ ONLY DEFERRABLE transaction style which would, rather than
acquiring predicate locks and watching for conflicts, potentially
wait until it could acquire a snapshot which was guaranteed to be
conflict-free. In the example discussed on this thread, if we
changed pg_dump to use such a mode, when it went to acquire a
snapshot it would see that it overlapped T1, which was not READ ONLY,
which in turn overlapped TN, which had written to a table and
committed. It would then block until completion of the T1
transaction and adjust its snapshot to make that transaction visible.
You would now have a backup entirely consistent with the long-term
state of the database, with no risk of serialization failure and no
bloating of the predicate lock structures.
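
To make the snapshot-acquisition check above concrete, here is a minimal toy sketch in Python. It is not PostgreSQL code and not a proposed implementation; the `Txn` class, the logical timestamps, and the `must_wait_for` function are hypothetical names made up for illustration. It only models the rule as described in this message: a deferrable read-only snapshot would wait for any in-flight read-write transaction that overlapped a writer which has already committed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Txn:
    """Hypothetical toy model of a transaction (not a PostgreSQL structure)."""
    name: str
    start: int                     # logical start time
    commit: Optional[int] = None   # logical commit time; None while running
    read_only: bool = False
    wrote: bool = False            # True if it modified any table


def must_wait_for(now: int, txns: List[Txn]) -> List[str]:
    """Names of in-flight read-write transactions that a deferrable
    snapshot taken at logical time `now` would have to wait for."""
    blockers = []
    for t in txns:
        in_flight = t.start <= now and (t.commit is None or t.commit > now)
        if not in_flight or t.read_only:
            continue
        # t blocks the snapshot if some other writer committed while t ran.
        for w in txns:
            if (w is not t and w.wrote and w.commit is not None
                    and t.start < w.commit <= now):
                blockers.append(t.name)
                break
    return blockers


# The example from this thread: T1 is still running, TN wrote and committed.
t1 = Txn("T1", start=1)
tn = Txn("TN", start=2, commit=3, wrote=True)

print(must_wait_for(4, [t1, tn]))   # pg_dump (T0) asks for a snapshot at time 4

t1.commit = 5                       # T1 completes
print(must_wait_for(6, [t1, tn]))   # retry after T1 has finished
```

In this model the first call reports T1 as the transaction to wait for, and the second call reports nothing, which corresponds to the point at which the snapshot could be taken with T1 visible.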

The only downside is that there could be blocking when such a
transaction acquires its snapshot. That seems a reasonable price to
pay for backup integrity. Obviously, if we had such a mode, it would
be trivial to add a switch to the pg_dump command line which would
let the user choose between guaranteed dump integrity and guaranteed
lack of blocking at the start of the dump.

-Kevin


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: heikki(dot)linnakangas(at)enterprisedb(dot)com, drkp(at)csail(dot)mit(dot)edu, gsstark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: Serializable Snapshot Isolation
Date: 2010-09-25 22:38:14
Message-ID: AANLkTinqTbnEGwMdJbxV5wd4efiRmvunbnFH0M=fU9-p@mail.gmail.com

Just to be clear, I wasn't saying it was or wasn't a problem; I was just
trying to see whether I understand the problem and, if I do, maybe help
bring others up to speed.