SSI patch version 15

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: SSI patch version 14
Date: 2011-01-25 03:30:58
Message-ID: 4D3DEF920200002500039BCC@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Attached.

Dan and I have spent many, many hours desk-check and testing,
including pounding for many hours in DBT-2 at high concurrency
factors on a 16 processor box. In doing so, we found and fixed a few
issues. Neither of us is aware of any bugs or deficiencies
remaining, even after a solid day of pounding in DBT-2, other than
the failure to extend any new functionality to hot standby.

At Heikki's suggestion I have included a test to throw an error on an
attempt to switch to serializable mode during recovery. Anything
more to address that issue can be a small follow-up patch -- probably
mainly a few notes in the docs.

-Kevin

Attachment Content-Type Size
ssi-14.patch.gz application/octet-stream 84.7 KB

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-25 09:07:39
Message-ID: 1295946459.11513.338.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2011-01-24 at 21:30 -0600, Kevin Grittner wrote:
> Attached.
>
> Dan and I have spent many, many hours desk-check and testing,
> including pounding for many hours in DBT-2 at high concurrency
> factors on a 16 processor box. In doing so, we found and fixed a few
> issues. Neither of us is aware of any bugs or deficiencies
> remaining, even after a solid day of pounding in DBT-2, other than
> the failure to extend any new functionality to hot standby.
>
> At Heikki's suggestion I have included a test to throw an error on an
> attempt to switch to serializable mode during recovery. Anything
> more to address that issue can be a small follow-up patch -- probably
> mainly a few notes in the docs.

Here is my first crack at a review:

First of all, I am very impressed with the patch. I was pleased to see
that both READ ONLY DEFERRABLE and 2PC work! Also, I found the patch
very usable and did not encounter any bugs or big surprises.

Second, I do not understand this patch entirely, so the following
statements can be considered questions as much as answers.

At a high level, there is a nice conceptual simplicity. Let me try to
summarize it as I understand it:
* RW dependencies are detected using predicate locking.
* RW dependencies are tracked from the reading transaction (as an
"out") conflict; and from the writing transaction (as an "in"
conflict).
* Before committing a transaction, then by looking only at the RW
dependencies (and predicate locks) for current and past
transactions, you can determine if committing this transaction will
result in a cycle (and therefore a serialization anomaly); and if
so, abort it.

That's where the simplicity ends, however ;)

For starters, the above structures require unlimited memory, while we
have fixed amounts of memory. The predicate locks are referenced by a
global shared hash table as well as per-process SHMQueues. It can adapt
memory usage downward in three ways:
* increase lock granularity -- e.g. change N page locks into a
table lock
* "summarization" -- fold multiple locks on the same object across
many old committed transactions into a single lock
* use the SLRU

Tracking of RW conflicts of current and past transactions is more
complex. Obviously, it doesn't keep _all_ past transactions, but only
ones that overlap with a currently-running transaction. It does all of
this management using SHMQueue. There isn't much of an attempt to
gracefully handle OOM here as far as I can tell, it just throws an error
if there's not enough room to track a new transaction (which is
reasonable, considering that it should be quite rare and can be
mitigated by increasing max_connections).

The heavy use of SHMQueue is quite reasonable, but for some reason I
find the API very difficult to read. I think it would help (at least me)
quite a lot to annotate (i.e. with a comment in the struct) the various
lists and links with the types of data being held.

The actual checking of conflicts isn't quite so simple, either, because
we want to detect problems before the victim transaction has done too
much work. So, checking is done along the way in several places, and
it's a little difficult to follow exactly how those smaller checks add
up to the overall serialization-anomaly check (the third point in my
simple model).

There are also optimizations around transactions declared READ ONLY.
Some of these are a little difficult to follow as well, and I haven't
followed them all.

There is READ ONLY DEFERRABLE, which is a nice feature that waits for a
"safe" snapshot, so that the transaction will never be aborted.

Now, on to my code comments (non-exhaustive):

* I see that you use a union as well as using "outLinks" to also be a
free list. I suppose you did this to conserve shared memory space,
right?

* In RegisterSerializableTransactionInt(), for a RO xact, it considers
any concurrent RW xact a possible conflict. It seems like it would be
possible to know whether a RW transaction may have overlapped with any
committed RW transaction (in finishedLink queue), and only add those as
potential conflicts. Would that work? If so, that would make more
snapshots safe.

* When a transaction finishes, then PID should probably be set to zero.
You only use it for waking up a deferrable RO xact waiting for a
snapshot, right?

* Still some compiler warnings:
twophase.c: In function ‘FinishPreparedTransaction’:
twophase.c:1360: warning: implicit declaration of function
‘PredicateLockTwoPhaseFinish’

* You have a function called CreatePredTran. We are already using
"Transaction" and "Xact" -- we probably don't need "Tran" as well.

* HS error has a typo:
"ERROR: cannot set transaction isolationn level to serializable during
recovery"

* I'm a little unclear on summarization and writing to the SLRU. I don't
see where it's determining that the predicate locks can be safely
released. Couldn't the oldest transaction still have relevant predicate
locks?

* In RegisterSerializableTransactionInt, if it doesn't get an sxact, it
goes into summarization. But summarization assumes that it has at least
one finished xact. Is that always true? If you have enough memory to
hold a transaction for each connection, plus max_prepared_xacts, plus
one, I think that's true. But maybe that could be made more clear?

I'll keep working on this patch. I hope I can be of some help getting
this committed, because I'm looking forward to this feature. And I
certainly think that you and Dan have applied the amount of planning,
documentation, and careful implementation necessary for a feature like
this.

Regards,
Jeff Davis


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-25 10:57:42
Message-ID: 20110125105741.GA11855@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks for working your way through this patch. I'm certainly well
aware that that's not a trivial task!

I'm suffering through a bout of insomnia, so I'll respond to some of
your high-level comments in hopes that serializability will help put me
to sleep (as it often does). I'll leave the more detailed code comments
for later when I'm actually looking at the code, or better yet Kevin
will take care of them and I won't have to. ;-)

On Tue, Jan 25, 2011 at 01:07:39AM -0800, Jeff Davis wrote:
> At a high level, there is a nice conceptual simplicity. Let me try to
> summarize it as I understand it:
> * RW dependencies are detected using predicate locking.
> * RW dependencies are tracked from the reading transaction (as an
> "out") conflict; and from the writing transaction (as an "in"
> conflict).
> * Before committing a transaction, then by looking only at the RW
> dependencies (and predicate locks) for current and past
> transactions, you can determine if committing this transaction will
> result in a cycle (and therefore a serialization anomaly); and if
> so, abort it.

This summary is right on. I would add one additional detail or
clarification to the last point, which is that rather than checking for
a cycle, we're checking for a transaction with both "in" and "out"
conflicts, which every cycle must contain.

> That's where the simplicity ends, however ;)

Indeed!

> Tracking of RW conflicts of current and past transactions is more
> complex. Obviously, it doesn't keep _all_ past transactions, but only
> ones that overlap with a currently-running transaction. It does all of
> this management using SHMQueue. There isn't much of an attempt to
> gracefully handle OOM here as far as I can tell, it just throws an error
> if there's not enough room to track a new transaction (which is
> reasonable, considering that it should be quite rare and can be
> mitigated by increasing max_connections).

If the OOM condition you're referring to is the same one from the
following comment, then it can't happen: (Apologies if I've
misunderstood what you're referring to.)

> * In RegisterSerializableTransactionInt, if it doesn't get an sxact, it
> goes into summarization. But summarization assumes that it has at least
> one finished xact. Is that always true? If you have enough memory to
> hold a transaction for each connection, plus max_prepared_xacts, plus
> one, I think that's true. But maybe that could be made more clear?

Yes -- the SerializableXact pool is allocated up front and it
definitely has to be bigger than the number of possible active
transactions. In fact, it's much larger: 10 * (MaxBackends +
max_prepared_xacts) to allow some room for the committed transactions
we still have to track.

> * In RegisterSerializableTransactionInt(), for a RO xact, it considers
> any concurrent RW xact a possible conflict. It seems like it would be
> possible to know whether a RW transaction may have overlapped with any
> committed RW transaction (in finishedLink queue), and only add those as
> potential conflicts. Would that work? If so, that would make more
> snapshots safe.

Interesting idea. That's worth some careful thought. I think it's
related to the condition that the RW xact needs to commit with a
conflict out to a transaction earlier than the RO xact. My first guess
is that this wouldn't make more transactions safe, but could detect
safe snapshots faster.

> * When a transaction finishes, then PID should probably be set to zero.
> You only use it for waking up a deferrable RO xact waiting for a
> snapshot, right?

Correct. It probably wouldn't hurt to clear that field when releasing
the transaction, but we don't use it after.

> * I'm a little unclear on summarization and writing to the SLRU. I don't
> see where it's determining that the predicate locks can be safely
> released. Couldn't the oldest transaction still have relevant predicate
> locks?

When a SerializableXact gets summarized to the SLRU, its predicate
locks aren't released; they're transferred to the dummy
OldCommittedSxact.

> I'll keep working on this patch. I hope I can be of some help getting
> this committed, because I'm looking forward to this feature. And I
> certainly think that you and Dan have applied the amount of planning,
> documentation, and careful implementation necessary for a feature like
> this.

Hopefully my comments here will help clarify the patch. It's not lost
on me that there's no shortage of complexity in the patch, so if you
found anything particularly confusing we should probably add some
documentation to README-SSI.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-25 17:17:06
Message-ID: 4D3EB1320200002500039C53@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> For starters, the above structures require unlimited memory, while
> we have fixed amounts of memory. The predicate locks are
> referenced by a global shared hash table as well as per-process
> SHMQueues. It can adapt memory usage downward in three ways:
> * increase lock granularity -- e.g. change N page locks into a
> table lock
> * "summarization" -- fold multiple locks on the same object
> across many old committed transactions into a single lock
> * use the SLRU

Those last two are related -- the summarization process takes the
oldest committed-but-still-significant transaction and does two
things with it:

(1) We move predicate locks to the "dummy" transaction, kept just
for this purpose. We combine locks against the same lock target,
using the more recent commit point to determine when the resulting
lock can be removed.

(2) We use SLRU to keep track of the top level xid of the old
committed transaction, and the earliest commit point of any
transactions to which it had an out-conflict.

The above reduces the information available to avoid serialization
failure in certain corner cases, and is more expensive to access
than the other structures, but it keeps us running in pessimal
circumstances, even if at a degraded level of performance.

> The heavy use of SHMQueue is quite reasonable, but for some reason
> I find the API very difficult to read. I think it would help (at
> least me) quite a lot to annotate (i.e. with a comment in the
> struct) the various lists and links with the types of data being
> held.

We've tried to comment enough, but when you have your head buried in
code you don't always recognize how mysterious something can look.
Can you suggest some particular places where more comments would be
helpful?

> The actual checking of conflicts isn't quite so simple, either,
> because we want to detect problems before the victim transaction
> has done too much work. So, checking is done along the way in
> several places, and it's a little difficult to follow exactly how
> those smaller checks add up to the overall serialization-anomaly
> check (the third point in my simple model).
>
> There are also optimizations around transactions declared READ
> ONLY. Some of these are a little difficult to follow as well, and
> I haven't followed them all.

Yeah. After things were basically working correctly, in terms of
not allowing any anomalies, we still had a lot of false positives.
Checks around the order and timing of commits, as well as read-only
status, helped bring these down. The infamous receipting example
was my main guide here. There are 210 permutations in the way the
statements can be interleaved, of which only six can result in
anomalies. At first we only managed to commit a couple dozen
permutations. As we added code to cover optimizations for various
special cases the false positive rate dropped a little at a time,
until that test hit 204 commits, six rollbacks. Although, all the
tests in the dcheck target are useful -- if I made a mistake in
implementing an optimization there would sometimes be just one or
two of the other tests which would fail. Looking at which
permutations got it right and which didn't was a really good way to
figure out what I did wrong.

> There is READ ONLY DEFERRABLE, which is a nice feature that waits
> for a "safe" snapshot, so that the transaction will never be
> aborted.

*And* will not contribute to causing any other transaction to be
rolled back, *and* dodges the overhead of predicate locking and
conflict checking. Glad you like it! ;-)

> Now, on to my code comments (non-exhaustive):
>
> * I see that you use a union as well as using "outLinks" to also
> be a free list. I suppose you did this to conserve shared memory
> space, right?

Yeah, we tried to conserve shared memory space where we could do so
without hurting performance. Some of it gets to be a little bit-
twiddly, but it seemed like a good idea at the time. Does any of it
seem like it creates a confusion factor which isn't worth it
compared to the shared memory savings?

> * Still some compiler warnings:
> twophase.c: In function *FinishPreparedTransaction*:
> twophase.c:1360: warning: implicit declaration of function
> *PredicateLockTwoPhaseFinish*

Ouch! That could cause bugs, since the implicit declaration didn't
actually *match* the actual definition. Don't know how we missed
that. I strongly recommend that anyone who wants to test 2PC with
the patch add this commit to it:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=8e020d97bc7b8c72731688515b6d895f7e243e27

> * You have a function called CreatePredTran. We are already using
> "Transaction" and "Xact" -- we probably don't need "Tran" as well.

OK. Will rename if you like. Since that creates the PredTran
structure, I assume you want that renamed, too?

> * HS error has a typo:
> "ERROR: cannot set transaction isolationn level to serializable
> during recovery"

Will fix.

> I'll keep working on this patch. I hope I can be of some help
> getting this committed, because I'm looking forward to this
> feature. And I certainly think that you and Dan have applied the
> amount of planning, documentation, and careful implementation
> necessary for a feature like this.

Thanks much! This effort was driven, for my part, by the needs of
my employer; but I have to admit it was kinda fun to do some serious
coding on innovative ideas again. It's been a while. I'm ready to
kick back and party a bit when this gets done, though. ;-)

-Kevin


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-25 18:37:32
Message-ID: 1295980652.11513.342.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-01-25 at 11:17 -0600, Kevin Grittner wrote:
> > The heavy use of SHMQueue is quite reasonable, but for some reason
> > I find the API very difficult to read. I think it would help (at
> > least me) quite a lot to annotate (i.e. with a comment in the
> > struct) the various lists and links with the types of data being
> > held.
>
> We've tried to comment enough, but when you have your head buried in
> code you don't always recognize how mysterious something can look.
> Can you suggest some particular places where more comments would be
> helpful?

I think just annotating RWConflict.in/outLink and
PredTranList.available/activeList with the types of things they hold
would be a help.

Also, you say something about RWConflict and "when the structure is not
in use". Can you elaborate on that a bit?

I'll address the rest of your comments in a later email.

Regards,
Jeff Davis


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-25 20:41:18
Message-ID: 4D3F356E.7050202@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 25.01.2011 05:30, Kevin Grittner wrote:
> Attached.

The readme says this:
> + 4. PostgreSQL supports subtransactions -- an issue not mentioned
> +in the papers.

But I don't see any mention anywhere else on how subtransactions are
handled. If a subtransaction aborts, are its predicate locks immediately
released?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-25 20:53:02
Message-ID: 4D3EE3CE0200002500039C74@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 25.01.2011 05:30, Kevin Grittner wrote:

> The readme says this:
>> 4. PostgreSQL supports subtransactions -- an issue not mentioned
>> in the papers.
>
> But I don't see any mention anywhere else on how subtransactions
> are handled. If a subtransaction aborts, are its predicate locks
> immediately released?

No. Here's the reasoning. Within a top level transaction, you
might start a subtransaction, read some data, and then decide based
on what you read that the subtransaction should be rolled back. If
the decision as to what is part of the top level transaction can
depend on what is read in the subtransaction, predicate locks taken
by the subtransaction must survive rollback of the subtransaction.

Does that make sense to you? Is there somewhere you would like to
see that argument documented?

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-25 21:22:59
Message-ID: 4D3EEAD30200002500039C7A@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> I think just annotating RWConflict.in/outLink and
> PredTranList.available/activeList with the types of things they
> hold would be a help.
>
> Also, you say something about RWConflict and "when the structure
> is not in use". Can you elaborate on that a bit?

Please let me know whether this works for you:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=325ec55e8c9e5179e2e16ff303af6afc1d6e732b

-Kevin


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-26 07:20:39
Message-ID: 4D3FCB47.6060805@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 25.01.2011 22:53, Kevin Grittner wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 25.01.2011 05:30, Kevin Grittner wrote:
>
>> The readme says this:
>>> 4. PostgreSQL supports subtransactions -- an issue not mentioned
>>> in the papers.
>>
>> But I don't see any mention anywhere else on how subtransactions
>> are handled. If a subtransaction aborts, are its predicate locks
>> immediately released?
>
> No. Here's the reasoning. Within a top level transaction, you
> might start a subtransaction, read some data, and then decide based
> on what you read that the subtransaction should be rolled back. If
> the decision as to what is part of the top level transaction can
> depend on what is read in the subtransaction, predicate locks taken
> by the subtransaction must survive rollback of the subtransaction.
>
> Does that make sense to you?

Yes, that's what I suspected. And I gather that all the data structures
in predicate.c work with top-level xids, not subxids. When looking at an
xid that comes from a tuple's xmin or xmax, for example, you always call
SubTransGetTopmostTransaction() before doing much else with it.

> Is there somewhere you would like to
> see that argument documented?

README-SSI .

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-26 11:36:15
Message-ID: 1296041775.24767.986.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2011-01-24 at 21:30 -0600, Kevin Grittner wrote:

> Dan and I have spent many, many hours desk-check and testing,
> including pounding for many hours in DBT-2 at high concurrency
> factors on a 16 processor box. In doing so, we found and fixed a few
> issues. Neither of us is aware of any bugs or deficiencies
> remaining, even after a solid day of pounding in DBT-2, other than
> the failure to extend any new functionality to hot standby.

A couple of comments on this.

I looked at the patch to begin a review and immediately saw "dtest".
There's no docs to explain what it is, but a few comments fill me in a
little more. Can we document that please? And/or explain why its an
essential part of this commit? Could we keep it out of core, or if not,
just commit that part separately? I notice the code is currently
copyright someone else than PGDG.

Pounding for hours on 16 CPU box sounds good. What diagnostics or
instrumentation are included with the patch? How will we know whether
pounding for hours is actually touching all relevant parts of code? I've
done such things myself only to later realise I wasn't actually testing
the right piece of code.

When this runs in production, how will we know whether SSI is stuck or
is consuming too much memory? e.g. Is there a dynamic view e.g.
pg_prepared_xacts, or is there a log mode log_ssi_impact, etc??

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-26 13:01:22
Message-ID: 1296046882.24767.1166.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-01-26 at 11:36 +0000, Simon Riggs wrote:

> Pounding for hours on 16 CPU box sounds good. What diagnostics or
> instrumentation are included with the patch? How will we know whether
> pounding for hours is actually touching all relevant parts of code? I've
> done such things myself only to later realise I wasn't actually testing
> the right piece of code.

An example of this is the XIDCACHE_DEBUG code used in procarray.c to
validate TransactionIdIsInProgress().

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-26 15:22:12
Message-ID: 4D3FE7C40200002500039D03@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
> On 25.01.2011 22:53, Kevin Grittner wrote:
>> Is there somewhere you would like to
>> see that argument documented?
>
> README-SSI .

Done (borrowing some of your language).

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=470abc51c5626cf3c7cbd734b1944342973d0d47

Let me know if you think more is needed.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-26 16:01:28
Message-ID: 4D3FF0F80200002500039D18@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
> On Wed, 2011-01-26 at 11:36 +0000, Simon Riggs wrote:
>
>> Pounding for hours on 16 CPU box sounds good. What diagnostics or
>> instrumentation are included with the patch? How will we know
>> whether pounding for hours is actually touching all relevant
>> parts of code? I've done such things myself only to later realise
>> I wasn't actually testing the right piece of code.
>
> An example of this is the XIDCACHE_DEBUG code used in procarray.c
> to validate TransactionIdIsInProgress().

It isn't exactly equivalent, but on a conceptually similar note some
of the hours of DBT-2 pounding were done with #ifdef statements to
force code into code paths which are normally rarely used. We left
one of them in the codebase with the #define commented out, although
I know that's not strictly necessary. (It does provide a convenient
place to put a comment about what it's for, though.)

In looking at it just now, I noticed that after trying it in a
couple different places what was left in the repository was not the
optimal version for code coverage. I've put this back to the
version which did a better job, for reasons described in the commit
comment:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=8af1bc84318923ba0ec3d4413f374a3beb10bc70

Dan, did you have some others which should maybe be included?

I'm not sure I see any counts we could get from SSI which would be
useful beyond what we might get from a code coverage tool or
profiling, but I'm open to suggestions.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, "Markus Wanner" <markus(at)bluegap(dot)ch>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-26 17:07:18
Message-ID: 4D4000660200002500039D32@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> I looked at the patch to begin a review and immediately saw
> "dtest". There's no docs to explain what it is, but a few comments
> fill me in a little more. Can we document that please? And/or
> explain why its an essential part of this commit? Could we keep it
> out of core, or if not, just commit that part separately? I notice
> the code is currently copyright someone else than PGDG.

I'm including Markus on this reply, because he's the only one who
can address the copyright issue.

I will say that while the dcheck make target is not required for a
proper build, and the tests run for too long to consider including
this in the check target, I would not recommend that anybody hack on
SSI without regularly running these tests or some equivalent..

When I suggested breaking this out of the patch, everybody who spoke
up said not to do so. How the eventual committer commits it is of
course up to that person.

> Pounding for hours on 16 CPU box sounds good. What diagnostics or
> instrumentation are included with the patch? How will we know
> whether pounding for hours is actually touching all relevant parts
> of code? I've done such things myself only to later realise I
> wasn't actually testing the right piece of code.

We've looked at distributions of failed transactions by ereport
statement. This confirms that we are indeed exercising the vast
majority of the code. See separate post for how we pushed execution
into the summarization path to test code related to that.

I do have some concern that the 2PC testing hasn't yet covered all
code paths. I don't see how the problem found by Jeff during review
could have survived thorough testing there.

> When this runs in production, how will we know whether SSI is
> stuck

Stuck? I'm not sure what you mean by that. Other than LW locking
(which I believe is always appropriately brief, with rules for order
of acquisition which should prevent deadlocks), the only blocking
introduced by SSI is when there is an explicit request for
DEFERRABLE READ ONLY transactions. Such a transaction may take a
while to start. Is that what you're talking about?

> or is consuming too much memory?

Relevant shared memory is allocated at startup, with strategies for
living within that as suggested by Heikki and summarized in a recent
post by Jeff. It's theoretically possible to run out of certain
objects, in which case there is an ereport, but we haven't seen
anything like that since the mitigation and graceful degradation
changes were implemented.

> e.g. Is there a dynamic view e.g. pg_prepared_xacts,

We show predicate locks in the pg_locks view with mode SIReadLock.

> is there a log mode log_ssi_impact, etc??

Don't have that. What would you expect that to show?

-Kevin


From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-26 18:52:27
Message-ID: 4D406D6B.10309@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Kevin,

thanks for your heads-up.

On 01/26/2011 06:07 PM, Kevin Grittner wrote:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:
>
>> I looked at the patch to begin a review and immediately saw
>> "dtest". There's no docs to explain what it is, but a few comments
>> fill me in a little more. Can we document that please? And/or
>> explain why its an essential part of this commit? Could we keep it
>> out of core, or if not, just commit that part separately? I notice
>> the code is currently copyright someone else than PGDG.
>
> I'm including Markus on this reply, because he's the only one who
> can address the copyright issue.

I certainly agree to change the copyright notice for my parts of the
code in Kevin's SSI to say "Copyright ... Postgres Global Development
Group".

However, it's also worth mentioning that 'make dcheck' requires my
dtester python package to be installed. See [1]. I put that under the
Boost license, which seems very similar to the Postgres license.

> When I suggested breaking this out of the patch, everybody who spoke
> up said not to do so. How the eventual committer commits it is of
> course up to that person.

If the committer decides not to commit the dtests, I'm happy to add
these dtests to the "official" postgres-dtests repository [2]. There I
could let it follow the development of dtester.

If Kevin's dtests gets committed, either dtester needs to be backwards
compatible or the Postgres sources need to follow development of
dtester, which I'm not satisfied with, yet. (However, note that I
didn't have any time to work on dtester, recently, so that is somewhat
hypothetical anyway).

If you have any more questions, I'm happy to help.

Regards

Markus Wanner

[1]: dtester project site:
http://www.bluegap.ch/projects/dtester/

[2]: postgres dtests:
http://git.postgres-r.org/?p=postgres-dtests;a=summary


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Markus Wanner <markus(at)bluegap(dot)ch>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-26 19:08:35
Message-ID: 1296068915.24767.1500.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, 2011-01-26 at 11:07 -0600, Kevin Grittner wrote:

> > When this runs in production, how will we know whether SSI is
> > stuck
>
> Stuck? I'm not sure what you mean by that. Other than LW locking
> (which I believe is always appropriately brief, with rules for order
> of acquisition which should prevent deadlocks), the only blocking
> introduced by SSI is when there is an explicit request for
> DEFERRABLE READ ONLY transactions. Such a transaction may take a
> while to start. Is that what you're talking about?

I'm thinking of a general requirement for diagnostics.

What has been done so far looks great to me, so talking about this
subject is in no way meant to be negative. Everything has bugs and we
find them quicker if there are some ways of getting out more information
about what is happening in the guts of the system.

For HS, I put in a special debug mode and various information functions.
For HOT, I produced a set of page inspection functions.

I'm keen to have some ways where we can find out various metrics about
what is happening, so we can report back to you to check if there are
bugs. I foresee that some applications will be more likely to generate
serialization errors than others. People will ask questions and they may
claim there are bugs. I would like to be able to say "there is no bug -
look at XYZ and see that the errors you are getting are because of ABC".

> > or is consuming too much memory?
>
> Relevant shared memory is allocated at startup, with strategies for
> living within that as suggested by Heikki and summarized in a recent
> post by Jeff. It's theoretically possible to run out of certain
> objects, in which case there is an ereport, but we haven't seen
> anything like that since the mitigation and graceful degradation
> changes were implemented.
>
> > e.g. Is there a dynamic view e.g. pg_prepared_xacts,
>
> We show predicate locks in the pg_locks view with mode SIReadLock.

OK, that's good.

> > is there a log mode log_ssi_impact, etc??
>
> Don't have that. What would you expect that to show?
>
> -Kevin

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Kevin Grittner <kevin(dot)grittner(at)wicourts(dot)gov>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-26 19:25:27
Message-ID: 1296061935-sup-3470@alvh.no-ip.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Excerpts from Kevin Grittner's message of mié ene 26 14:07:18 -0300 2011:
> Simon Riggs <simon(at)2ndQuadrant(dot)com> wrote:

> > Pounding for hours on 16 CPU box sounds good. What diagnostics or
> > instrumentation are included with the patch? How will we know
> > whether pounding for hours is actually touching all relevant parts
> > of code? I've done such things myself only to later realise I
> > wasn't actually testing the right piece of code.
>
> We've looked at distributions of failed transactions by ereport
> statement. This confirms that we are indeed exercising the vast
> majority of the code. See separate post for how we pushed execution
> into the summarization path to test code related to that.

BTW did you try "make coverage" and friends? See
http://www.postgresql.org/docs/9.0/static/regress-coverage.html
and
http://developer.postgresql.org/~petere/coverage/

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Markus Wanner" <markus(at)bluegap(dot)ch>, "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-26 19:42:23
Message-ID: 4D4024BF0200002500039D66@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> BTW did you try "make coverage" and friends? See
> http://www.postgresql.org/docs/9.0/static/regress-coverage.html
> and
> http://developer.postgresql.org/~petere/coverage/

I had missed that; thanks for pointing it out!

I'm doing a coverage build now, to see what coverage we get from
`make check` (probably not much) and `make dcheck`.

Dan, do you still have access to that machine you were using for the
DBT-2 runs? Could we get a coverage run with and without
TEST_OLDSERXID defined?

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Markus Wanner" <markus(at)bluegap(dot)ch>, "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-26 20:10:23
Message-ID: 4D402B4F0200002500039D76@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
>
>> BTW did you try "make coverage" and friends? See
>> http://www.postgresql.org/docs/9.0/static/regress-coverage.html
>> and
>> http://developer.postgresql.org/~petere/coverage/
>
> I had missed that; thanks for pointing it out!
>
> I'm doing a coverage build now, to see what coverage we get from
> `make check` (probably not much) and `make dcheck`.

Well, that was a bit better than I expected. While the overall code
coverage for PostgreSQL source code is:

Overall coverage rate:
lines......: 64.8% (130296 of 201140 lines)
functions..: 72.0% (7997 of 11109 functions)

The coverage for predicate.c, after running both check and dcheck,
was (formatted to match above):

lines......: 69.8% (925 of 1325 lines)
functions..: 85.7% (48 of 56 functions)

Most of what was missed was in the SLRU or 2PC code, which is
expected. I'll bet that the DBT-2 runs, between the "normal"
and TEST_OLDSERXID flavors, would get us about 2/3 of the way from
those numbers toward 100%, with almost all the residual being in
2PC.

Does anyone have suggestions for automated 2PC tests?

-Kevin


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-26 20:16:23
Message-ID: 20110126201623.GA57071@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 10:01:28AM -0600, Kevin Grittner wrote:
> In looking at it just now, I noticed that after trying it in a
> couple different places what was left in the repository was not the
> optimal version for code coverage. I've put this back to the
> version which did a better job, for reasons described in the commit
> comment:

Isn't this placement the same as the version we had before that didn't
work?

Specifically, aren't we going to have problems running with
TEST_OLDSERXID enabled because CreatePredTran succeeds and links a new
SerializableXact into the active list, but we don't initialize it
before we drop SerializableXactHashLock to call
SummarizeOldestCommittedSxact?

I seem to recall SummarizeOldestCommittedSxact failing before because
of the uninitialized entry, but more generally since we drop the lock
something else might scan the list.

(This isn't a problem in the non-test case because we'd only do that if
CreatePredTran fails.)

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Markus Wanner <markus(at)bluegap(dot)ch>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-26 20:20:09
Message-ID: 20110126202009.GB57071@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 01:42:23PM -0600, Kevin Grittner wrote:
> Dan, do you still have access to that machine you were using for the
> DBT-2 runs? Could we get a coverage run with and without
> TEST_OLDSERXID defined?

Sure, I'll give it a shot (once I figure out how to enable coverage...)

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>
Cc: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-26 20:36:25
Message-ID: 4D4031690200002500039D82@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:

> Isn't this placement the same as the version we had before that
> didn't work?

You're right. How about this?:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=86b839291e2588e59875fb87d05432f8049575f6

Same benefit in terms of exercising more lines of code, but
*without* exposing the uninitialized structure to other threads.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>
Cc: "Simon Riggs" <simon(at)2ndQuadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-26 20:43:26
Message-ID: 4D40330E0200002500039D89@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:

> You're right. How about this?:

That's even worse. I'm putting back to where you had it and taking
a break before I do anything else that dumb.

-Kevin


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-26 20:46:22
Message-ID: 20110126204622.GD57071@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jan 26, 2011 at 02:36:25PM -0600, Kevin Grittner wrote:
> Same benefit in terms of exercising more lines of code, but
> *without* exposing the uninitialized structure to other threads.

Won't this cause a deadlock because locks are being acquired out of
order?

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Markus Wanner" <markus(at)bluegap(dot)ch>, "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-27 15:44:32
Message-ID: 4D413E800200002500039E8B@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:

> While the overall code coverage for PostgreSQL source code is:
>
> Overall coverage rate:
> lines......: 64.8% (130296 of 201140 lines)
> functions..: 72.0% (7997 of 11109 functions)

By the way, I discovered that the above is lower if you just run the
check target. The dcheck target helps with overall PostgreSQL code
coverage, even though it was targeted at the SSI code.

> The coverage for predicate.c, after running both check and dcheck,
> was (formatted to match above):
>
> lines......: 69.8% (925 of 1325 lines)
> functions..: 85.7% (48 of 56 functions)

Some minor tweaks to the regression tests boosts that to:

lines......: 73.1% (968 of 1325 lines)
functions..: 89.3% (50 of 56 functions)

Most of the code not covered in the regular build (above) is in the
OldSerXidXxxxx functions, which are covered well in a build with
TEST_OLDSERXID defined. The 2PC code is very well covered now,
except for the recovery-time function. We don't have a way to test
that in the `make check` process, do we?

There is some code which is not covered just because it is so hard
to hit -- for example, code which is only executed if vacuum cleans
up an index page when we are right at the edge of running out of the
memory used to track predicate locks. It would be hard to include a
test for that in the normal regression tests.

The regression test changes are here:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=d4c1005d731c81049cc2622e50b7a2ebb99bbcac

-Kevin


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-27 17:18:23
Message-ID: 1296148703.11513.451.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-01-25 at 05:57 -0500, Dan Ports wrote:
> This summary is right on. I would add one additional detail or
> clarification to the last point, which is that rather than checking for
> a cycle, we're checking for a transaction with both "in" and "out"
> conflicts, which every cycle must contain.

To clarify, this means that it will get some false positives, right?

For instance:

T1:
get snapshot

T2:
get snapshot
insert R1
commit

T1:
read R1
write R2

T3:
get snapshot
read R2

T3:
commit

T1:
commit -- throws error

T1 has a conflict out to T2, and T1 has a conflict in from T3.
T2 has a conflict in from T1.
T3 has a conflict out to T1.

T1 is canceled because it has both a conflict in and a conflict out. But
the results are the same as a serial order of execution: T3, T1, T2.

Is there a reason we can't check for a real cycle, which would let T1
succeed?

Regards,
Jeff Davis


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,"Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-27 18:22:45
Message-ID: 4D4163950200002500039EBE@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> To clarify, this means that it will get some false positives,
> right?

Yes. But the example you're about to get into isn't one of them.

> For instance:
>
> T1:
> get snapshot
>
> T2:
> get snapshot
> insert R1
> commit
>
> T1:
> read R1
> write R2
>
> T3:
> get snapshot
> read R2
>
> T3:
> commit
>
> T1:
> commit -- throws error
>
>
> T1 has a conflict out to T2, and T1 has a conflict in from T3.
> T2 has a conflict in from T1.
> T3 has a conflict out to T1.
>
> T1 is canceled because it has both a conflict in and a conflict
> out. But the results are the same as a serial order of execution:
> T3, T1, T2.
>
> Is there a reason we can't check for a real cycle, which would let
> T1 succeed?

Yes. Because T2 committed before T3 started, it's entirely possible
that there is knowledge outside the database server that the work of
T2 was done and committed before the start of T3, which makes the
order of execution: T2, T3, T1, T2. So you can have anomalies. Let
me give you a practical example.

Pretend there are receipting terminals in public places for the
organization. In most financial systems, those receipts are
assigned to batches of some type. Let's say that's done by an
insert for the new batch ID, which closes the old batch. Receipts
are always added with the maximum batch ID, reflecting the open
batch.

Your above example could be:

-- setup
test=# create table ctl (batch_id int not null primary key);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index
"ctl_pkey" for table "ctl"
CREATE TABLE
test=# create table receipt (batch_id int not null, amt
numeric(13,2) not null);
CREATE TABLE
test=# insert into ctl values (1),(2),(3);
INSERT 0 3
test=# insert into receipt values ((select max(batch_id) from ctl),
50),((select max(batch_id) from ctl), 100);
INSERT 0 2

-- receipting workstation
-- T1 starts working on receipt insert transaction
test=# begin transaction isolation level repeatable read;
BEGIN
test=# select 1; -- to grab snapshot, per above
?column?
----------
1
(1 row)

-- accounting workstation
-- T2 closes old receipt batch; opens new
test=# begin transaction isolation level repeatable read;
BEGIN
test=# insert into ctl values (4);
INSERT 0 1
test=# commit;
COMMIT

-- receipting workstation
-- T1 continues work on receipt
test=# select max(batch_id) from ctl;
max
-----
3
(1 row)

test=# insert into receipt values (3, 1000);
INSERT 0 1

-- accounting workstation
-- T3 lists receipts from the closed batch
-- (Hey, we inserted a new batch_id and successfully
-- committed, right? The old batch is closed.)
test=# begin transaction isolation level repeatable read;
BEGIN
test=# select * from receipt where batch_id = 3;
batch_id | amt
----------+--------
3 | 50.00
3 | 100.00
(2 rows)

test=# commit;
COMMIT

-- receipting workstation
-- T1 receipt insert transaction commits
test=# commit;
COMMIT

Now, with serializable transactions, as you saw, T1 will be rolled
back. With a decent software framework, it will be automatically
retried, without any user interaction. It will select max(batch_id)
of 4 this time, and the insert will succeed and be committed.
Accounting's list is accurate.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,"Jeff Davis" <pgsql(at)j-davis(dot)com>, "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI patch version 14
Date: 2011-01-27 18:39:35
Message-ID: 4D4167870200002500039EC4@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> Now, with serializable transactions, as you saw, T1 will be rolled
> back.

I should probably have mentioned, that if all the transactions were
SERIALIZABLE and the report of transactions from the batch was run
as SERIALIZABLE READ ONLY DEFERRABLE, the start of the report would
block until it was certain that it had a snapshot which could not
lead to an anomaly, so the BEGIN for T3 would wait until the COMMIT
of T1, get a new snapshot which it would determine to be safe, and
proceed. This would allow that last receipt to land in batch 3 and
show up on accounting's receipt list with no rollbacks *or*
anomalies.

-Kevin


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-28 16:54:34
Message-ID: 1296233674.11513.475.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2011-01-25 at 15:22 -0600, Kevin Grittner wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
>
> > I think just annotating RWConflict.in/outLink and
> > PredTranList.available/activeList with the types of things they
> > hold would be a help.
> >
> > Also, you say something about RWConflict and "when the structure
> > is not in use". Can you elaborate on that a bit?
>
> Please let me know whether this works for you:
>
> http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=325ec55e8c9e5179e2e16ff303af6afc1d6e732b

Looks good.

Jeff


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-29 04:36:40
Message-ID: 20110129043640.GN57071@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 27, 2011 at 09:18:23AM -0800, Jeff Davis wrote:
> On Tue, 2011-01-25 at 05:57 -0500, Dan Ports wrote:
> > This summary is right on. I would add one additional detail or
> > clarification to the last point, which is that rather than checking for
> > a cycle, we're checking for a transaction with both "in" and "out"
> > conflicts, which every cycle must contain.
>
> To clarify, this means that it will get some false positives, right?

Yes, this is correct.

> Is there a reason we can't check for a real cycle, which would let T1
> succeed?

I talked today with someone who experimented with doing exactly that in
MySQL as part of a research project (Perfectly Serializable Snapshot
Isolation, paper forthcoming in ICDE)

It confirmed my intuition that this is possible but not as
straightforward as it sounds. Some complications I thought of in
adapting that to what we're doing:

1) it requires tracking all edges in the serialization graph; besides
the rw-conflicts we track there are also the more mundane
rw-dependencies (if T2 sees an update made by T1, then T2 has to
come after T1) and ww-dependencies (if T2's write modifies a tuple
created by T1, then T2 has to come after T1).

We are taking advantage of the fact that any cycle must have two
adjacent rw-conflict edges, but the rest of the cycle can be
composed of other types. It would certainly be possible to track the
others, but it would add a bit more complexity and use more memory.

2) it requires doing a depth-first search to check for a cycle, which
is more expensive than just detecting two edges. That seems OK if
you only want to check it on commit, but maybe not if you want to
detect serialization failures at the time of the conflicting
read/write (as we do).

3) it doesn't seem to interact very well with our memory mitigation
efforts, wherein we discard lots of data about committed
transactions that we know we won't need. If we were checking for
cycles, we'd need to keep more of it. I suspect summarization (when
we combine two transactions' predicate locks to save memory) would
also cause problems.

None of these seem particularly insurmountable, but they suggest it
isn't a clear win to try to find an entire cycle.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-01-31 03:33:37
Message-ID: 1296444817.11513.717.camel@jdavis
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


1. In CheckForSerializableConflictIn(), I think the comment above may be
out of date. It says:

"A tuple insert is in conflict only if there is a predicate lock against
the entire relation."

That doesn't appear to be true if, for example, there's a predicate lock
on the index page that the tuple goes into. I examined it with gdb, and
it calls the function, and the function does identify the conflict.

2. Also in the comment above CheckForSerializableConflictIn(), I see:

"The call to this function also indicates that we need an entry in the
serializable transaction hash table, so that this write's conflicts can
be detected for the proper lifetime, which is until this transaction and
all overlapping serializable transactions have completed."

which doesn't make sense to me. The transaction should already have an
entry in the hash table at this point, right?

3. The comment above CheckForSerializableConflictOut() seems to trail
off, as though you may have meant to write more. It also seems to be out
of date.

And why are you reading the infomask directly? Do the existing
visibility functions not suffice?

I have made it through predicate.c, and I have a much better
understanding of what it's actually doing. I can't claim that I have a
clear understanding of everything involved, but the code looks like it's
in good shape (given the complexity involved) and well-commented.

I am marking the patch Ready For Committer, because any committer will
need time to go through the patch; and the authors have clearly applied
the thought, care, and testing required for something of this
complexity. I will continue to work on it, though.

The biggest issue on my mind is what to do about Hot Standby. The
authors have a plan, but there is also some resistance to it:

http://archives.postgresql.org/message-id/23698.1295566621@sss.pgh.pa.us

We don't need a perfect solution for 9.1, but it would be nice if we had
a viable plan for 9.2.

Regards,
Jeff Davis


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Jeff Davis" <pgsql(at)j-davis(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,<pgsql-hackers(at)postgresql(dot)org>
Subject: SSI patch version 15
Date: 2011-01-31 17:31:54
Message-ID: 4D469DAA020000250003A02A@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> wrote:

> 1. [& 2.] In CheckForSerializableConflictIn()

All of that was obsolete and could just be deleted. I did.

> 3. The comment above CheckForSerializableConflictOut()

I reworked the comment there. Hopefully it is now more clear.

> I am marking the patch Ready For Committer

Patch v15 is attached with fixes for all issues identified in v14.
There was one (two-line) bug fix. No other logic changes. We had
an addition to the README-SSI file, comment changes, doc changes,
changes to the text of a few messages, and some structure/field
renames to avoid using Tran as an abbreviation for transaction in
addition to the use of Xact as an abbreviation.

Pretty minimal differences from V14, but I figured it would save the
committer some work if I rolled them all up here.

-Kevin

Attachment Content-Type Size
ssi-15.patch.gz application/octet-stream 86.6 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 15
Date: 2011-01-31 18:05:28
Message-ID: AANLkTinYF62Z4fvTUZO86oenALqUpHOajyg23JeaRwtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 31, 2011 at 12:31 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Pretty minimal differences from V14, but I figured it would save the
> committer some work if I rolled them all up here.

Sounds good. I believe Heikki is planning to work on this one.
Hopefully that will happen soon, since we are running short on time.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 15
Date: 2011-01-31 21:35:53
Message-ID: 4D472B39.50104@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 31.01.2011 20:05, Robert Haas wrote:
> On Mon, Jan 31, 2011 at 12:31 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>> Pretty minimal differences from V14, but I figured it would save the
>> committer some work if I rolled them all up here.
>
> Sounds good. I believe Heikki is planning to work on this one.
> Hopefully that will happen soon, since we are running short on time.

Yeah, I can commit this. Jeff, are you satisfied with this patch now?
I'm glad you're reviewing this, more eyeballs helps a lot with a big
patch like this.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 15
Date: 2011-01-31 22:46:48
Message-ID: 1296514008.12141.59.camel@jdavis-ux.asterdata.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2011-01-31 at 23:35 +0200, Heikki Linnakangas wrote:
> Yeah, I can commit this. Jeff, are you satisfied with this patch now?
> I'm glad you're reviewing this, more eyeballs helps a lot with a big
> patch like this.

I think the patch is very close. I am doing my best in my free time to
complete a thorough review. If you have other patches to review/commit
then I will still be making progress reviewing SSI.

However, I would recommend leaving yourself some time to think on this
one if you don't already understand the design well.

Regards,
Jeff Davis


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org, Markus Schiltknecht <markus(at)bluegap(dot)ch>
Subject: Re: SSI patch version 14
Date: 2011-02-02 10:20:53
Message-ID: 4D493005.2090706@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 26.01.2011 13:36, Simon Riggs wrote:
> I looked at the patch to begin a review and immediately saw "dtest".
> There's no docs to explain what it is, but a few comments fill me in a
> little more. Can we document that please? And/or explain why its an
> essential part of this commit? Could we keep it out of core, or if not,
> just commit that part separately? I notice the code is currently
> copyright someone else than PGDG.

We still haven't really resolved this..

Looking at the dtest suite, I'd really like to have those tests in the
PostgreSQL repository. However, I'd really like not to require python to
run them, and even less dtester (no offense Markus) and the dependencies
it has. I couldn't get it to run on my basic Debian system, clearly I'm
doing something wrong, but it will be even harder for people on more
exotic platforms.

The tests don't seem very complicated, and they don't seem to depend
much on the dtester features. How hard would it be to rewrite the test
engine in C or perl?

I'm thinking of defining each test in a simple text file, and write a
small test engine to parse that.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-02 12:48:54
Message-ID: 4D4952B6.2070505@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki,

On 02/02/2011 11:20 AM, Heikki Linnakangas wrote:
> I couldn't get it to run on my basic Debian system, clearly I'm
> doing something wrong, but it will be even harder for people on more
> exotic platforms.

On Debian you only need 'python-twisted' and the dtester sources [1],
IIRC. What issue did you run into?

> The tests don't seem very complicated, and they don't seem to depend
> much on the dtester features. How hard would it be to rewrite the test
> engine in C or perl?

I've taken a quick look at a perl framework similar to twisted (POE).
Doesn't seem too complicated, but I don't see the benefit of requiring
one over the other. Another async, event-based framework that I've been
playing with is asio (C++).

I don't think it's feasible to write anything similar without basing on
such a framework. (And twisted seems more mature and widespread (in
terms of supported platforms) than POE, but that's probably just my
subjective impression).

However, the question here probably is whether or not you need all of
the bells and whistles of dtester (as if there were that many) [2].

> I'm thinking of defining each test in a simple text file, and write a
> small test engine to parse that.

AFAIUI there are lots of possible permutations, so you don't want to
have to edit files for each manually (several hundreds, IIRC). So using
a scripting language to generate the permutations makes sense, IMO.

Pretty much everything else that Kevin currently uses dtester for (i.e.
initdb, running the postmaster, connecting with psql) is covered by the
existing regression testing infrastructure already, I think. So it
might be a matter of integrating the permutation generation and test
running code into the existing infrastructure. Kevin?

Regards

Markus Wanner

[1]: dtester project site:
http://www.bluegap.ch/projects/dtester/

[2]: side note:
Dtester mainly serves me to test Postgres-R, where the current
regression testing infrastructure simply doesn't suffice. With dtester
I tried to better structure the processes and services, their
dependencies and the test environment.


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org, Markus Schiltknecht <markus(at)bluegap(dot)ch>
Subject: Re: SSI patch version 14
Date: 2011-02-02 12:56:55
Message-ID: 4D495497.8050704@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02.02.2011 12:20, Heikki Linnakangas wrote:
> On 26.01.2011 13:36, Simon Riggs wrote:
>> I looked at the patch to begin a review and immediately saw "dtest".
>> There's no docs to explain what it is, but a few comments fill me in a
>> little more. Can we document that please? And/or explain why its an
>> essential part of this commit? Could we keep it out of core, or if not,
>> just commit that part separately? I notice the code is currently
>> copyright someone else than PGDG.
>
> We still haven't really resolved this..
>
> Looking at the dtest suite, I'd really like to have those tests in the
> PostgreSQL repository. However, I'd really like not to require python to
> run them, and even less dtester (no offense Markus) and the dependencies
> it has. I couldn't get it to run on my basic Debian system, clearly I'm
> doing something wrong, but it will be even harder for people on more
> exotic platforms.
>
> The tests don't seem very complicated, and they don't seem to depend
> much on the dtester features. How hard would it be to rewrite the test
> engine in C or perl?
>
> I'm thinking of defining each test in a simple text file, and write a
> small test engine to parse that.

I took a stab at doing that. Attached, untar in the top source tree, and
do "cd src/test/isolation; make check". It's like "installcheck", so it
needs a postgres server to be running. Also available in my git
repository at git://git.postgresql.org/git/users/heikki/postgres.git,
branch "serializable".

I think this will work. This is a rough first version, but it already
works. We can extend the test framework later if we need more features,
but I believe this is enough for all the existing tests.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
isolationtester.tar.gz application/x-gzip 5.7 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Markus Wanner <markus(at)bluegap(dot)ch>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-02 13:04:28
Message-ID: 4D49565C.4080408@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 02.02.2011 14:48, Markus Wanner wrote:
> Heikki,
>
> On 02/02/2011 11:20 AM, Heikki Linnakangas wrote:
>> I couldn't get it to run on my basic Debian system, clearly I'm
>> doing something wrong, but it will be even harder for people on more
>> exotic platforms.
>
> On Debian you only need 'python-twisted' and the dtester sources [1],
> IIRC. What issue did you run into?

I installed dtester with:

PYTHONPATH=~/pythonpath/lib/python2.6/site-packages/ python ./setup.py
install --prefix=/home/heikki/pythonpath/

And that worked. But when I try to run Kevin's test suite, I get this:

~/git-sandbox/postgresql/src/test/regress (serializable)$
PYTHONPATH=~/pythonpath/lib/python2.6/site-packages/ make dcheck
./pg_dtester.py --temp-install --top-builddir=../../.. \
--multibyte=
Postgres dtester suite Copyright (c) 2004-2010, by Markus
Wanner

Traceback (most recent call last):
File "./pg_dtester.py", line 1376, in <module>
main(sys.argv[1:])
File "./pg_dtester.py", line 1370, in main
runner = Runner(reporter=TapReporter(sys.stdout, sys.stderr,
showTimingInfo=True),
TypeError: __init__() got an unexpected keyword argument 'showTimingInfo'
make: *** [dcheck] Virhe 1

At that point I had no idea what's wrong.

PS. I tried "python ./setup.py dtest", mentioned in the README, and it
just said "invalid command 'dtest'". But I presume the installation
worked anyway.

> Pretty much everything else that Kevin currently uses dtester for (i.e.
> initdb, running the postmaster, connecting with psql) is covered by the
> existing regression testing infrastructure already, I think. So it
> might be a matter of integrating the permutation generation and test
> running code into the existing infrastructure. Kevin?

Yeah, that was my impression too. (see my other post with WIP
implementation of that)

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Markus Wanner <markus(at)bluegap(dot)ch>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Simon Riggs <simon(at)2ndQuadrant(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI patch version 14
Date: 2011-02-02 13:22:24
Message-ID: 4D495A90.6090605@bluegap.ch
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki,

On 02/02/2011 02:04 PM, Heikki Linnakangas wrote:
> TypeError: __init__() got an unexpected keyword argument 'showTimingInfo'
> make: *** [dcheck] Virhe 1
>
> At that point I had no idea what's wrong.

Hm.. looks like Kevin already uses the latest dtester code from git.
You can either do that as well, or try to drop the 'showTimingInfo'
argument from the call to Runner() in pg_dtester.py:1370.

> PS. I tried "python ./setup.py dtest", mentioned in the README, and it
> just said "invalid command 'dtest'". But I presume the installation
> worked anyway.

Yeah, that's a gotcha in dtester. setup.py dtest requires dtester to be
installed - a classic chicken and egg problem. I certainly need to look
into this.

> Yeah, that was my impression too. (see my other post with WIP
> implementation of that)

I'd like to get Kevin's opinion on that, but it certainly sounds like a
pragmatic approach for now.

Regards

Markus Wanner