SSI atomic commit

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: SSI atomic commit
Date: 2011-07-05 17:03:52
Message-ID: 4E12FDA8020000250003EFC1@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

In reviewing the 2PC changes mentioned in a separate post, both Dan
and I realized that these were dependent on the assumption that
SSI's commitSeqNo is assigned in the order in which the transactions
became visible. There is a race condition such that this is not
necessarily true. It is a very narrow race condition, which would
come up very rarely in practice, but Murphy's Law being what it is,
I think we need to consider it a bug and fix it.

We considered a fix which would be contained within predicate.c code
and operate by making pessimistic assumptions, so that no false
negatives occurred. The reason we didn't go that way is that the
code would be very convoluted and fragile. The attached patch just
makes it atomic in a very direct way, and adjusts the predicate.c
code to use the right tests in the right places. We were careful
not to add any LW locking to the path that a normal transaction
without an XID is terminating, since there had obviously been
significant work put into keeping locks out of that code path.

Dan and I collaborated on this code over the holiday weekend, and
are in agreement that this is the right route. I know it's only six
days before beta3, but I'm hopeful that someone can review this for
commit in time to make it into that beta.

This patch applies over the top of the fix for the 2PC permutation
problems.

I apologize for having found this bug so late in the release cycle.

-Kevin

Attachment Content-Type Size
ssi-atomic-commit.patch text/plain 19.3 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-05 18:01:19
Message-ID: 4E13516F.50301@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05.07.2011 20:03, Kevin Grittner wrote:
> In reviewing the 2PC changes mentioned in a separate post, both Dan
> and I realized that these were dependent on the assumption that
> SSI's commitSeqNo is assigned in the order in which the transactions
> became visible. There is a race condition such that this is not
> necessarily true. It is a very narrow race condition, which would
> come up very rarely in practice, but Murphy's Law being what it is,
> I think we need to consider it a bug and fix it.
>
> We considered a fix which would be contained within predicate.c code
> and operate by making pessimistic assumptions, so that no false
> negatives occurred. The reason we didn't go that way is that the
> code would be very convoluted and fragile. The attached patch just
> makes it atomic in a very direct way, and adjusts the predicate.c
> code to use the right tests in the right places. We were careful
> not to add any LW locking to the path that a normal transaction
> without an XID is terminating, since there had obviously been
> significant work put into keeping locks out of that code path.

Hmm, I think it would be simpler to decide that instead of
SerializableXactHashLock, you must hold ProcArrayLock to access
LastSxactCommitSeqNo, and move the assignment of commitSeqNo to
ProcArrayTransaction(). It's probably easiest to move
LastSxactCommitSeqno to ShmemVariableCache too. There's a few places
that would then need to acquire ProcArrayLock to read
LastSxactCommitSeqno, but I feel it might still be much simpler that way.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI atomic commit
Date: 2011-07-05 18:15:13
Message-ID: 4E130E61020000250003EFD8@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Hmm, I think it would be simpler to decide that instead of
> SerializableXactHashLock, you must hold ProcArrayLock to access
> LastSxactCommitSeqNo, and move the assignment of commitSeqNo to
> ProcArrayTransaction(). It's probably easiest to move
> LastSxactCommitSeqno to ShmemVariableCache too. There's a few
> places that would then need to acquire ProcArrayLock to read
> LastSxactCommitSeqno, but I feel it might still be much simpler
> that way.

We considered that. I think the biggest problem was that when there
is no XID it wouldn't be covered by the lock on assignment. We
couldn't see a good way to increment and assign the value without LW
lock coverage, and we didn't want to add LW locking to that code
path. If you can see a way around that issue, I agree it would be
simpler.

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-05 18:16:20
Message-ID: 21785.1309889780@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Hmm, I think it would be simpler to decide that instead of
> SerializableXactHashLock, you must hold ProcArrayLock to access
> LastSxactCommitSeqNo, and move the assignment of commitSeqNo to
> ProcArrayTransaction(). It's probably easiest to move
> LastSxactCommitSeqno to ShmemVariableCache too. There's a few places
> that would then need to acquire ProcArrayLock to read
> LastSxactCommitSeqno, but I feel it might still be much simpler that way.

Yeah ... this patch creats the need to hold both
SerializableXactHashLock and ProcArrayLock during transaction commit,
which is a bit scary from a deadlock-risk perspective, and not pleasant
from the concurrency standpoint either. It'd be better to push some
functionality into the procarray code.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI atomic commit
Date: 2011-07-05 18:35:37
Message-ID: 4E131329020000250003EFE3@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Hmm, I think it would be simpler to decide that instead of
>> SerializableXactHashLock, you must hold ProcArrayLock to access
>> LastSxactCommitSeqNo, and move the assignment of commitSeqNo to
>> ProcArrayTransaction(). It's probably easiest to move
>> LastSxactCommitSeqno to ShmemVariableCache too. There's a few
>> places that would then need to acquire ProcArrayLock to read
>> LastSxactCommitSeqno, but I feel it might still be much simpler
>> that way.
>
> Yeah ... this patch creats the need to hold both
> SerializableXactHashLock and ProcArrayLock during transaction
> commit, which is a bit scary from a deadlock-risk perspective,

If I remember correctly, we already do some calls to functions which
acquire ProcArrayLock while holding SerializableXactHashLock, in
order to acquire a snapshot with the right timing. Regardless of
what we do on transaction completion, we either need to document
that in the code comments, or do some major surgery at the last
minute, which is always scary.

> and not pleasant from the concurrency standpoint either.

On the bright side, both locks are not held at the same time unless
the transaction isolation level is serializable.

> It'd be better to push some functionality into the procarray code.

That's easily done if we don't mind taking out a ProcArrayLock
during completion of a transaction which has no XID, if only long
enough to increment a uint64 in shared memory, and then stash the
value -- somewhere -- so that SSI code can find and use it. We
thought it was best to avoid that so there wasn't any impact on
non-serializable transactions.

-Kevin


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-05 19:17:48
Message-ID: CA+Tgmoba7_6Hq4q0+uDQ=TBteic6s44aTfMpcEndzQhqSjQYuQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 5, 2011 at 2:35 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> It'd be better to push some functionality into the procarray code.
>
> That's easily done if we don't mind taking out a ProcArrayLock
> during completion of a transaction which has no XID, if only long
> enough to increment a uint64 in shared memory, and then stash the
> value -- somewhere -- so that SSI code can find and use it.

That sure sounds scary from a scalability perspective. If we can
piggyback on an existing ProcArrayLock acquisition, fine, but
additional ProcArrayLock acquisitions when SSI isn't even being used
sound like a real bad idea to me. I doubt you'll notice much of a
performance regression in the current code, but if/when we fix the
lock manager bottlenecks that my fastlock and lazy vxid lock patches
are intended to correct, then I suspect it's going to matter quite a
bit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-05 19:30:43
Message-ID: 23263.1309894243@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Jul 5, 2011 at 2:35 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> That's easily done if we don't mind taking out a ProcArrayLock
>> during completion of a transaction which has no XID, if only long
>> enough to increment a uint64 in shared memory, and then stash the
>> value -- somewhere -- so that SSI code can find and use it.

> That sure sounds scary from a scalability perspective. If we can
> piggyback on an existing ProcArrayLock acquisition, fine, but
> additional ProcArrayLock acquisitions when SSI isn't even being used
> sound like a real bad idea to me.

Isn't SSI *already* forcing a new acquisition of an LWLock during
commits of read-only transactions that aren't using SSI? Perhaps
there's a bit less contention on SerializableXactHashLock than on
ProcArrayLock, but it's not obvious that the current situation is
a lot better than this would be.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: SSI atomic commit
Date: 2011-07-05 19:34:45
Message-ID: 4E132105020000250003EFE9@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Jul 5, 2011 at 2:35 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>> It'd be better to push some functionality into the procarray
>>> code.
>>
>> That's easily done if we don't mind taking out a ProcArrayLock
>> during completion of a transaction which has no XID, if only long
>> enough to increment a uint64 in shared memory, and then stash the
>> value -- somewhere -- so that SSI code can find and use it.
>
> That sure sounds scary from a scalability perspective. If we can
> piggyback on an existing ProcArrayLock acquisition, fine, but
> additional ProcArrayLock acquisitions when SSI isn't even being
> used sound like a real bad idea to me. I doubt you'll notice much
> of a performance regression in the current code, but if/when we
> fix the lock manager bottlenecks that my fastlock and lazy vxid
> lock patches are intended to correct, then I suspect it's going to
> matter quite a bit.

Just to be clear, the patch as submitted does *not* acquire a
ProcArrayLock lock for completion of a transaction which has no XID.
What you quoted from me was explaining what would seem to be
required (unless Dan and I missed something) to simplify the patch
as Tom and Heikki proposed. We saw that approach and its
simplicity, but shied away from it because of the locking issue and
its potential performance impact.

We took the route we did in this fix patch to avoid such issues.
I'm not so sure that the impact on a load of many short read-only
transactions would be so benign as things stand. It seemed to me
that one of the reasons to avoid *getting* an XID was to avoid
acquiring ProcArrayLock on transaction completion. The way we did
this patch may indeed slow serializable transactions more than the
alternative; but from the beginning I thought the ground rules were
that SSI had to have no significant impact on those who didn't
choose to use it.

I suspect that most of the 9.2 work on SSI will be for improved
performance (including in that, as I do, a reduction in the
percentage of false positive serialization failures). Tuning this
should go on that list. It may well benefit from using some of the
techniques you've been working on.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI atomic commit
Date: 2011-07-05 19:40:12
Message-ID: 4E13224C020000250003EFF0@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Isn't SSI *already* forcing a new acquisition of an LWLock during
> commits of read-only transactions that aren't using SSI?

During COMMIT PREPARED there is one. We could avoid that by storing
the transaction isolation level in the persistent data for a
prepared statement, but that seems inappropriate for 9.1 at this
point, and it's hard to be sure that would be a net win. Otherwise
I don't *think* there's an extra LW lock for a non-serializable
transaction (whether or not read-only). Do you see one I'm not
remembering?

-Kevin


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-05 19:54:36
Message-ID: 20110705195436.GB94047@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Jul 05, 2011 at 01:15:13PM -0500, Kevin Grittner wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
> > Hmm, I think it would be simpler to decide that instead of
> > SerializableXactHashLock, you must hold ProcArrayLock to access
> > LastSxactCommitSeqNo, and move the assignment of commitSeqNo to
> > ProcArrayTransaction(). It's probably easiest to move
> > LastSxactCommitSeqno to ShmemVariableCache too. There's a few
> > places that would then need to acquire ProcArrayLock to read
> > LastSxactCommitSeqno, but I feel it might still be much simpler
> > that way.
>
> We considered that. I think the biggest problem was that when there
> is no XID it wouldn't be covered by the lock on assignment.

One other issue is that after the sequence number is assigned, it still
needs to be stored in MySerializableXact->commitSeqNo. Modifying that
does require taking SerializableXactHashLock.

With the proposed patch, assigning the next commitSeqNo and storing it
in MySerializableXact happen atomically. That makes it possible to say
that a transaction that has a commitSeqNo must have committed before
one that doesn't. If the two steps are separated, that isn't true: two
transactions might get their commitSeqNos in one order and make them
visible in the other. We should be able to deal with that, but it will
make some of the commit ordering checks more complicated.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-07 15:45:14
Message-ID: 4E15D48A.7080203@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 05.07.2011 20:03, Kevin Grittner wrote:
> In reviewing the 2PC changes mentioned in a separate post, both Dan
> and I realized that these were dependent on the assumption that
> SSI's commitSeqNo is assigned in the order in which the transactions
> became visible.

This comment in the patch actually suggests a stronger requirement:

> + * For correct SERIALIZABLE semantics, the commitSeqNo must appear to be set
> + * atomically with the work of the transaction becoming visible to other
> + * transactions.

So, is it enough for the commitSeqNos to be set in the order the
transactions become visible to others? I'm assuming 'yes' for now, as
the approach being discussed to assign commitSeqNo in
ProcArrayEndTransaction() without also holding SerializableXactHashLock
is not going to work otherwise, and if I understood correctly you didn't
see any correctness issue with that. Please shout if I'm missing something.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI atomic commit
Date: 2011-07-07 16:41:16
Message-ID: 4E159B5C020000250003F063@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 05.07.2011 20:03, Kevin Grittner wrote:
>> In reviewing the 2PC changes mentioned in a separate post, both
>> Dan and I realized that these were dependent on the assumption
>> that SSI's commitSeqNo is assigned in the order in which the
>> transactions became visible.
>
> This comment in the patch actually suggests a stronger
> requirement:
>
>> + * For correct SERIALIZABLE semantics, the commitSeqNo must
>> appear to be set
>> + * atomically with the work of the transaction becoming visible
>> to other
>> + * transactions.
>
> So, is it enough for the commitSeqNos to be set in the order the
> transactions become visible to others? I'm assuming 'yes' for now,
> as the approach being discussed to assign commitSeqNo in
> ProcArrayEndTransaction() without also holding
> SerializableXactHashLock is not going to work otherwise, and if I
> understood correctly you didn't see any correctness issue with
> that. Please shout if I'm missing something.

Hmm. The more strict semantics are much easier to reason about and
ensure correctness. I think that the looser semantics can be made
to work, but there needs to be more fussy logic in the SSI code to
deal with the fact that we don't know whether the visibility change
has occurred. Really what pushed us to the patch using the stricter
semantics was how much the discussion of how to cover the edge
conditions with the looser semantics made our heads spin. Anything
that confusing seems more prone to subtle bugs.

If people really think that route is better I can put a patch
together so that the two approaches can be compared side-by-side.
If we go that way, it seemed that maybe we should move
LastSxactCommitSeqNo to VariableCacheData, right below
latestCompletedXid. We need some way to communicate that the
commitSeqNo has been set -- probably a volatile boolean in local
memory, and we need to acquire ProcArrayLock in some places we
currently don't within the SSI code to get at the value. Off-hand,
I don't see how that can be done without holding
SerializableXactHashLock while grabbing ProcArrayLock; and if we
need to do that to grab the assigned value, I'm not sure how much
we're buying. Do you see some way to avoid that?

The other thing we need to do if we go with the looser semantics is
be pessimistic -- in all tests for dangerous structures we need to
assume that any transaction which is prepared *may* also be
committed, and may have committed before any other transaction which
is prepared or committed. It would be good to put some bounds on
this. I guess any monotonically increasing number in the system
which can be grabbed just before the commit would do.

Unless you've come up with some clever approach we missed, I fear
that a patch based on the looser semantics will be bigger and more
fragile.

Do you agree that the patch Dan and I posted *does not add any LW
locking* to a normal (not prepared) transaction if it either has no
XID or is not SERIALIZABLE? And that ProcArrayLock is not held
during COMMIT PREPARED or ROLLBACK PREPARED unless the transaction
is SERIALIZABLE? I'm a bit concerned that we're headed down this
other path because people aren't clear about these facts.

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-07 18:50:45
Message-ID: 4E160005.1020300@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07.07.2011 19:41, Kevin Grittner wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 05.07.2011 20:03, Kevin Grittner wrote:
>>> In reviewing the 2PC changes mentioned in a separate post, both
>>> Dan and I realized that these were dependent on the assumption
>>> that SSI's commitSeqNo is assigned in the order in which the
>>> transactions became visible.
>>
>> This comment in the patch actually suggests a stronger
>> requirement:
>>
>>> + * For correct SERIALIZABLE semantics, the commitSeqNo must
>>> appear to be set
>>> + * atomically with the work of the transaction becoming visible
>>> to other
>>> + * transactions.
>>
>> So, is it enough for the commitSeqNos to be set in the order the
>> transactions become visible to others? I'm assuming 'yes' for now,
>> as the approach being discussed to assign commitSeqNo in
>> ProcArrayEndTransaction() without also holding
>> SerializableXactHashLock is not going to work otherwise, and if I
>> understood correctly you didn't see any correctness issue with
>> that. Please shout if I'm missing something.
>
> Hmm. The more strict semantics are much easier to reason about and
> ensure correctness.

True.

> I think that the looser semantics can be made
> to work, but there needs to be more fussy logic in the SSI code to
> deal with the fact that we don't know whether the visibility change
> has occurred. Really what pushed us to the patch using the stricter
> semantics was how much the discussion of how to cover the edge
> conditions with the looser semantics made our heads spin. Anything
> that confusing seems more prone to subtle bugs.

Let's step back and analyse a bit closer what goes wrong with the
current semantics. The problem is that commitSeqNo is assigned too late,
sometime after the transaction has become visible to others.

Looking at the comparisons that we do with commitSeqNos, they all have
conservative direction that we can err to, when in doubt. So here's an idea:

Let's have two sequence numbers for each transaction: prepareSeqNo and
commitSeqNo. prepareSeqNo is assigned when a transaction is prepared (in
PreCommit_CheckForSerializableConflicts), and commitSeqNo is assigned
when it's committed (in ReleasePredicateLocks). They are both assigned
from one counter, LastSxactCommitSeqNo, so that is advanced twice per
transaction, and prepareSeqNo is always smaller than commitSeqNo for a
transaction. Modify operations that currently use commitSeqNo to use
either prepareSeqNo or commitSeqNo, so that we err on the safe side.

That yields a much smaller patch (attached). How does this look to you,
am I missing anything?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
track-prepareseqno-1.patch text/x-diff 3.6 KB

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI atomic commit
Date: 2011-07-07 18:59:17
Message-ID: 4E15BBB5020000250003F06C@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Let's have two sequence numbers for each transaction: prepareSeqNo
> and commitSeqNo. prepareSeqNo is assigned when a transaction is
> prepared (in PreCommit_CheckForSerializableConflicts), and
> commitSeqNo is assigned when it's committed (in
> ReleasePredicateLocks). They are both assigned from one counter,
> LastSxactCommitSeqNo, so that is advanced twice per transaction,
> and prepareSeqNo is always smaller than commitSeqNo for a
> transaction. Modify operations that currently use commitSeqNo to
> use either prepareSeqNo or commitSeqNo, so that we err on the safe
> side.
>
> That yields a much smaller patch (attached). How does this look to
> you, am I missing anything?

Very clever. I'll need to study this and think about it. I'll try
to post a response before I go to bed tonight. Hopefully Dan can
weigh in, too. (I know he was traveling with limited Internet
access -- not sure about his return date.)

-Kevin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-07 20:24:55
Message-ID: 28477.1310070295@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> That yields a much smaller patch (attached). How does this look to
>> you, am I missing anything?

> Very clever. I'll need to study this and think about it. I'll try
> to post a response before I go to bed tonight. Hopefully Dan can
> weigh in, too. (I know he was traveling with limited Internet
> access -- not sure about his return date.)

Just so everybody's clear --- this isn't making beta3, if it's not going
to get committed today.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI atomic commit
Date: 2011-07-07 20:33:36
Message-ID: 4E15D1D0020000250003F092@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Just so everybody's clear --- this isn't making beta3, if it's not
> going to get committed today.

Yeah, Heikki let me know that off-list. I thought the last mention
on the -hackers list of a cutoff date for that was the 11th. Did I
misunderstand or was there some later change which didn't get
mentioned on the list?

In any event, priorities have been adjusted....

Thanks,

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-07 20:35:13
Message-ID: 4E161881.6090800@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 07.07.2011 21:50, Heikki Linnakangas wrote:
> On 07.07.2011 19:41, Kevin Grittner wrote:
>> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>> On 05.07.2011 20:03, Kevin Grittner wrote:
>>>> In reviewing the 2PC changes mentioned in a separate post, both
>>>> Dan and I realized that these were dependent on the assumption
>>>> that SSI's commitSeqNo is assigned in the order in which the
>>>> transactions became visible.
>>>
>>> This comment in the patch actually suggests a stronger
>>> requirement:
>>>
>>>> + * For correct SERIALIZABLE semantics, the commitSeqNo must
>>>> appear to be set
>>>> + * atomically with the work of the transaction becoming visible
>>>> to other
>>>> + * transactions.
>>>
>>> So, is it enough for the commitSeqNos to be set in the order the
>>> transactions become visible to others? I'm assuming 'yes' for now,
>>> as the approach being discussed to assign commitSeqNo in
>>> ProcArrayEndTransaction() without also holding
>>> SerializableXactHashLock is not going to work otherwise, and if I
>>> understood correctly you didn't see any correctness issue with
>>> that. Please shout if I'm missing something.
>>
>> Hmm. The more strict semantics are much easier to reason about and
>> ensure correctness.
>
> True.
>
>> I think that the looser semantics can be made
>> to work, but there needs to be more fussy logic in the SSI code to
>> deal with the fact that we don't know whether the visibility change
>> has occurred. Really what pushed us to the patch using the stricter
>> semantics was how much the discussion of how to cover the edge
>> conditions with the looser semantics made our heads spin. Anything
>> that confusing seems more prone to subtle bugs.
>
> Let's step back and analyse a bit closer what goes wrong with the
> current semantics. The problem is that commitSeqNo is assigned too late,
> sometime after the transaction has become visible to others.
>
> Looking at the comparisons that we do with commitSeqNos, they all have
> conservative direction that we can err to, when in doubt. So here's an
> idea:
>
> Let's have two sequence numbers for each transaction: prepareSeqNo and
> commitSeqNo. prepareSeqNo is assigned when a transaction is prepared (in
> PreCommit_CheckForSerializableConflicts), and commitSeqNo is assigned
> when it's committed (in ReleasePredicateLocks). They are both assigned
> from one counter, LastSxactCommitSeqNo, so that is advanced twice per
> transaction, and prepareSeqNo is always smaller than commitSeqNo for a
> transaction. Modify operations that currently use commitSeqNo to use
> either prepareSeqNo or commitSeqNo, so that we err on the safe side.
>
> That yields a much smaller patch (attached). How does this look to you,
> am I missing anything?

Committed, so that we get this into beta3. We had some quick offlist
discussion with Keving and Dan about this, Kevin pointed out a few more
places that needed to use prepareSeqNo instead of commitSeqNo, out of
which one seemed legitimate to me, and was included in the committed patch.

Kevin and Dan also pointed out that a 2PC transaction can stay in
prepared state for a long time, and we could optimize that by setting
prepareSeqNo only at the final COMMIT PREPARED. I objected to that, for
the reason that it's currently very convenient for testing purposes that
a transaction prepared with PREPARE TRANSACTION is in the exact same
state as regular transaction that's going through regular commit processing.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-07 20:37:15
Message-ID: 2329.1310071035@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> writes:
> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Just so everybody's clear --- this isn't making beta3, if it's not
>> going to get committed today.

> Yeah, Heikki let me know that off-list. I thought the last mention
> on the -hackers list of a cutoff date for that was the 11th. Did I
> misunderstand or was there some later change which didn't get
> mentioned on the list?

The 11th is the announce date. We normally wrap the Thursday before,
so that packagers (particularly the Windows-installer crew) have some
time to do their thing.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-07 20:48:55
Message-ID: 2648.1310071735@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
> Kevin and Dan also pointed out that a 2PC transaction can stay in
> prepared state for a long time, and we could optimize that by setting
> prepareSeqNo only at the final COMMIT PREPARED. I objected to that, for
> the reason that it's currently very convenient for testing purposes that
> a transaction prepared with PREPARE TRANSACTION is in the exact same
> state as regular transaction that's going through regular commit processing.

Seems to me there's a more fundamental reason not to do that, which is
that once you've done PREPARE it is no longer legitimate to decide to
roll back the transaction to get out of a "dangerous" structure --- ie,
you have to target one of the other xacts involved instead. Shouldn't
the assignment of a prepareSeqNo correspond to the point where the xact
is no longer a target for SSI rollback?

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI atomic commit
Date: 2011-07-07 21:04:41
Message-ID: 4E15D919020000250003F09E@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> writes:
>> Kevin and Dan also pointed out that a 2PC transaction can stay
>> in prepared state for a long time, and we could optimize that by
>> setting prepareSeqNo only at the final COMMIT PREPARED. I
>> objected to that, for the reason that it's currently very
>> convenient for testing purposes that a transaction prepared with
>> PREPARE TRANSACTION is in the exact same state as regular
>> transaction that's going through regular commit processing.
>
> Seems to me there's a more fundamental reason not to do that,
> which is that once you've done PREPARE it is no longer legitimate
> to decide to roll back the transaction to get out of a "dangerous"
> structure --- ie, you have to target one of the other xacts
> involved instead. Shouldn't the assignment of a prepareSeqNo
> correspond to the point where the xact is no longer a target for
> SSI rollback?

No, it wouldn't be useful at all if we had an accurate commitSeqNo.
It's being used to bracket the moment of visibility, which with
Heikki's patch now falls between the prepareSeqNo and commitSeqNo.
The name is perhaps misleading. It's useful only for determining
what *other* transactions require rollback.

In any event, SSI will never cause a transaction to fail after it
has prepared. This patch doesn't change that, nor would the change
Dan and I suggested.

-Kevin


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-07 21:08:07
Message-ID: 20110707210807.GE76634@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 07, 2011 at 04:48:55PM -0400, Tom Lane wrote:
> Seems to me there's a more fundamental reason not to do that, which is
> that once you've done PREPARE it is no longer legitimate to decide to
> roll back the transaction to get out of a "dangerous" structure --- ie,
> you have to target one of the other xacts involved instead. Shouldn't
> the assignment of a prepareSeqNo correspond to the point where the xact
> is no longer a target for SSI rollback?

That part is already accomplished by setting SXACT_FLAG_PREPARED (and
choosing a new victim if we think we want to abort a transaction with
that flag set).

prepareSeqNo is being used as a lower bound on the transaction's commit
sequence number. It's currently set at the same time as the PREPARED
flag, but it doesn't have to be.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-07 21:21:59
Message-ID: 20110707212159.GF76634@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

We should also apply the attached patch, which corrects a minor issue
with the conditions for flagging transactions that could potentially
make a snapshot unsafe.

There's a small window wherein a transaction is committed but not yet
on the finished list, and we shouldn't flag it as a potential conflict
if so. We can also skip adding a doomed transaction to the list of
possible conflicts because we know it won't commit.

This is not really a related issue, but Kevin and I found it while
looking into this issue, and it was included in the patch we sent out.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

Attachment Content-Type Size
ssi-possibleconflict-test.patch text/x-diff 677 bytes

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI atomic commit
Date: 2011-07-07 21:24:21
Message-ID: 4E15DDB5020000250003F0A5@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:
> We should also apply the attached patch, which corrects a minor
> issue with the conditions for flagging transactions that could
> potentially make a snapshot unsafe.
>
> There's a small window wherein a transaction is committed but not
> yet on the finished list, and we shouldn't flag it as a potential
> conflict if so. We can also skip adding a doomed transaction to
> the list of possible conflicts because we know it won't commit.
>
> This is not really a related issue, but Kevin and I found it while
> looking into this issue, and it was included in the patch we sent
> out.

Agreed -- that was in the SSI atomic commit patch, but probably
should have been submitted separately. The issue is really
orthogonal -- it's just that we found it while looking at the other
race condition.

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-07 21:33:28
Message-ID: 4E162628.6080100@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08.07.2011 00:21, Dan Ports wrote:
> We should also apply the attached patch, which corrects a minor issue
> with the conditions for flagging transactions that could potentially
> make a snapshot unsafe.
>
> There's a small window wherein a transaction is committed but not yet
> on the finished list, and we shouldn't flag it as a potential conflict
> if so. We can also skip adding a doomed transaction to the list of
> possible conflicts because we know it won't commit.

Hmm, it's now also possible for the transaction to be prepared, and
already visible to others, but not yet flagged as committed. Shouldn't
those be included too?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI atomic commit
Date: 2011-07-07 21:35:24
Message-ID: 4E16269C.1000503@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 08.07.2011 00:33, Heikki Linnakangas wrote:
> On 08.07.2011 00:21, Dan Ports wrote:
>> We should also apply the attached patch, which corrects a minor issue
>> with the conditions for flagging transactions that could potentially
>> make a snapshot unsafe.
>>
>> There's a small window wherein a transaction is committed but not yet
>> on the finished list, and we shouldn't flag it as a potential conflict
>> if so. We can also skip adding a doomed transaction to the list of
>> possible conflicts because we know it won't commit.
>
> Hmm, it's now also possible for the transaction to be prepared, and
> already visible to others, but not yet flagged as committed. Shouldn't
> those be included too?

Oh, never mind, I read the test backwards. They are included, as the
patch stands. I'll commit this too..

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI atomic commit
Date: 2011-07-07 21:40:13
Message-ID: 4E15E16D020000250003F0AB@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> I'll commit this too..

Thanks much for staying up past midnight to get these into beta3!

-Kevin