Quick Links

SSI patch version 12

Lists:	pgsql-hackers

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Jeff Davis" <pgsql(at)j-davis(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject:	SSI patch version 12
Date:	2011-01-14 23:54:32
Message-ID:	4D308DD80200002500039639@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Dan found a few low-frequency assertions when he let the patch cook
in DBT-2 for extended periods. For the most part they were
over-zealous assertions, although he made a couple minor adjustments
related to their causes in tracking the process local predicate lock
information used to make decisions about lock granularity promotion.
There is still the possibility of some drift in that data from the
"official" copy in shared memory due to index page splits and
combines from other processes, but that doesn't affect correctness,
just granularity promotion; so it doesn't seem like it's worth
creating LW lock contention to deal with it.

So, as of this moment the code has been through some significant
stress testing which should have shaken out most or all race
conditions. All known defects are fixed, except for a couple areas
on which we're doing some last-minute work. The portions which I
think should be considered mandatory which aren't complete are
prepared transactions, which Dan is currently working on, and
documentation work for the Concurrency Control chapter and a README
file, which I guess I'm going to be working on this weekend (the guy
who had volunteered apparently having fallen off the face of the
earth). Lack of support for prepared transactions will not affect
anyone who isn't trying to use them.

The index types other than btree don't have fine-grained support,
which I don't think is a fatal defect, but it would be nice to
implement. I may be able to get GiST working again this weekend in
addition to the documentation work. The others might not get done
for 9.1 unless someone who knows their way around the guts of those
AMs can give us some advice soon.

-Kevin

Attachment	Content-Type	Size
ssi-12.patch.gz	application/octet-stream	62.5 KB

From:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 12
Date:	2011-01-17 07:58:35
Message-ID:	4D33F6AB.9090107@thl.fi
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

While I haven't tried this patch, I tried to break the version 11 of the
patch (some of the work was against earlier versions). In total I have
used a full work day just trying to break things, but haven't been able
to find anything after version 8. I can verify that the partial index
issue is fixed, and the count(*) performance is a lot better now.

One thing I have been thinking about is how does predicate locking
indexes work when using functional indexes and functions marked as
immutable but which really aren't. I don't know how predicate locking
indexes works, so it might be that this is a non-issue. I haven't been
able to break anything in this way, and even if I could, this is
probably something that doesn't need anything else than a warning that
if you label your index functions immutable when they aren't,
serializable transactions might not work.

On 01/15/2011 01:54 AM, Kevin Grittner wrote:
> The index types other than btree don't have fine-grained support,
> which I don't think is a fatal defect, but it would be nice to
> implement. I may be able to get GiST working again this weekend in
> addition to the documentation work. The others might not get done
> for 9.1 unless someone who knows their way around the guts of those
> AMs can give us some advice soon
I wonder if there are people using GiST and GIN indexes and serializable
transactions. When upgrading to 9.1 and if there is no backwards
compatibility GUC this could be a problem... The amount of users in this
category is probably very low anyways, so maybe this is not an issue
worth worrying about.

- Anssi

From:	David Fetter <david(at)fetter(dot)org>
To:	Anssi Kääriäinen <anssi(dot)kaariainen(at)thl(dot)fi>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 12
Date:	2011-01-17 15:05:05
Message-ID:	20110117150505.GC19146@fetter.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 17, 2011 at 09:58:35AM +0200, Anssi Kääriäinen wrote:
> One thing I have been thinking about is how does predicate locking
> indexes work when using functional indexes and functions marked as
> immutable but which really aren't. I don't know how predicate
> locking indexes works, so it might be that this is a non-issue. I
> haven't been able to break anything in this way, and even if I
> could, this is probably something that doesn't need anything else
> than a warning that if you label your index functions immutable when
> they aren't, serializable transactions might not work.

When you tell the system an untruth about the state of the world, such
as alleging immutability when it's not actually there, it will get its
revenge in ways more drastic than this.

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI patch version 12
Date:	2011-01-17 19:58:44
Message-ID:	4D349F74.7050207@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 15.01.2011 01:54, Kevin Grittner wrote:
> /*
> * for r/o transactions: list of concurrent r/w transactions that we could
> * potentially have conflicts with, and vice versa for r/w transactions
> */
> TransactionId topXid; /* top level xid for the transaction, if one
> * exists; else invalid */
> TransactionId finishedBefore; /* invalid means still running; else
> * the struct expires when no
> * serializable xids are before this. */
> TransactionId xmin; /* the transaction's snapshot xmin */
> uint32 flags; /* OR'd combination of values defined below */
> int pid; /* pid of associated process */
> } SERIALIZABLEXACT;

What does that comment about list of concurrent r/w transactions refer
to? I don't see any list there. Does it refer to
possibleUnsafeConflicts, which is above that comment?

Above SERIALIZABLEXID struct:
> * A hash table of these objects is maintained in shared memory to provide a
> * quick way to find the top level transaction information for a serializable
> * transaction, Because a serializable transaction can acquire a snapshot
> * and read information which requires a predicate lock before it has a
> * TransactionId, it must be keyed by VirtualTransactionId; this hashmap
> * allows a fast link from MVCC transaction IDs to the related serializable
> * transaction hash table entry.

I believe the comment is trying to say that there's some *other* hash
that is keyed by VirtualTransactionId, so we need this other one keyed
by TransactionId. It took me a while to understand that, it should be
rephrased.

Setting the high bit in OldSetXidAdd() seems a bit weird. How about just
using UINT64_MAX instead of 0 to mean no conflicts? Or 1, and start the
sequence counter from 2.

ReleasePredicateLocks() reads ShmemVariableCache->nextXid without
XidGenLock. Maybe it's safe, we assume that TransactionIds are atomic
elsewhere, but at least there needs to be a comment explaining it. But
it probably should use ReadNewTransactionId() instead.

Attached is a patch for some trivial changes, mostly typos.

Overall, this is very neat and well-commented code. All the data
structures make my head spin, but I don't see anything unnecessary or
have any suggestions for simplification. There's a few remaining TODO
comments in the code, which obviously need to be resolved one way or
another, but as soon as you're finished with any outstanding issues that
you know of, I think this is ready for commit.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment	Content-Type	Size
ssi-v12-typo-fixes.patch	text/x-diff	7.7 KB

From:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI patch version 12
Date:	2011-01-17 21:26:06
Message-ID:	20110117212606.GG87714@csail.mit.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Jan 17, 2011 at 09:58:44PM +0200, Heikki Linnakangas wrote:
> What does that comment about list of concurrent r/w transactions refer
> to? I don't see any list there. Does it refer to
> possibleUnsafeConflicts, which is above that comment?

Yes, that comment was supposed to be attached to
possibleUnsafeConflicts. It appears to have wandered down a couple
lines, maybe during some combination of git merges and pgindent runs.

> Above SERIALIZABLEXID struct:
> > * A hash table of these objects is maintained in shared memory to provide a
> > * quick way to find the top level transaction information for a serializable
> > * transaction, Because a serializable transaction can acquire a snapshot
> > * and read information which requires a predicate lock before it has a
> > * TransactionId, it must be keyed by VirtualTransactionId; this hashmap
> > * allows a fast link from MVCC transaction IDs to the related serializable
> > * transaction hash table entry.
>
> I believe the comment is trying to say that there's some *other* hash
> that is keyed by VirtualTransactionId, so we need this other one keyed
> by TransactionId. It took me a while to understand that, it should be
> rephrased.

Actually, I think that "other" hash no longer exists, it got replaced
with a list because we weren't actually using vxid -> sxact lookup. So
the comment appears to be both confusing and inaccurate and should be
removed entirely, other than to note somewhere that not every
SERIALIZABLEXACT will appear in SerializableXidHash because it might
not have a TransactionId.

The comment above SERIALIZABLEXACT also needs to be updated since it
refers to said hash table. And if I'm not mistaken (Kevin?), we can
eliminate SERIALIZABLEXACTTAG altogether and not bother putting the
vxid in the sxact.

While we're at it, it probably wouldn't hurt to rename
SerializableXactHashLock to PredTranLock or something, since there's no
SerializableXactHash anymore (although the lock is still being used
correctly)

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"Jeff Davis" <pgsql(at)j-davis(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 12
Date:	2011-01-18 20:18:56
Message-ID:	4D35A150020000250003975A@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> There's a few remaining TODO comments in the code, which obviously
> need to be resolved one way or another

Not all of these are "must haves" for 9.1. Here's how they break
down:

The two in predicate_internals.h mark places which would need to be
touched if we further modify the btree AM to do index-tuple level
predicate locking. I know that Dan was eager to tackle that because
his group at MIT has predicate locking working at that granularity
for their transaction-aware caching which works with PostgreSQL. At
this point, though, I'm very skeptical about starting that effort
for 9.1; it seems like something for 9.2.

There's one TODO related to whether we can drop the volatile
modifier from MySerializableXact. I've started reviewing code
around this, but am not yet done. It wouldn't be terrible to leave
it there and worry about this for 9.2, but it will be even better if
we can clear it up one way or the other now.

Three are marking the spots we want to touch if a patch becomes
available which lets us cancel the transaction on one connection
from another. I'd love to take care of these, but we can't without
a separate patch that I know *I'm* not in a position to write for
9.1. We may need to leave these for 9.2, as well.

One is about the desirability of "taking some action" (probably
reporting a warning) in the SLRU summarization code if we've burned
through over a billion transaction IDs while one read write
transaction has remained active. That puts us close to where we'd
need to start canceling attempts to start a new transaction due to
resource issues. Seems like we should issue that warning for 9.1.
Not big or dangerous. I'll do it.

One is related to the heuristics for promoting the granularity of
predicate locks. We picked numbers out of the air which seemed
reasonable at the time. I'm sure we can make this more
sophisticated, but what we have seems to be working OK. I'm tempted
to suggest that we leave this alone until 9.2 unless someone has a
suggestion for a particular improvement.

One is related to the number of structures to allocate for detailed
conflict checking. We've never seen a problem with this limit, at
least that has reached me. On the other hand, it's certainly at
least theoretically possible to hit the limit and cause confusing
errors. Perhaps we can put this one on a GUC and make sure the
error message is clear with a hint to think about adjusting the GUC?
The GUC would be named something like
max_conflicts_per_serializable_transaction (or something to that
general effect). We should probably do that or bump up the limit.
If my back-of-the-envelope math is right, a carefully constructed
pessimal load could need up to (max_connections / 2)^2 -- so 100
connections could conceivably require 2500 structures, although such
a scenario would be hard to create. Current "picked from thin air"
numbers would have space for 500. The structure is six times the
pointer size, so 24 bytes each at 32 bit and 48 bytes each at 64
bit.

Three TODOs have popped up since Heikki's post because I pushed
Dan's 2PC WIP to the repo -- it's at a point where behavior should
be right for cases where the server keeps running. He's working on
the persistence aspect now, and there are TODOs in the new stubs
he's filling out. Definitely 9.1.

Then there's one still lurking outside the new predicate* files, in
heapam.c. It's about 475 lines into the heap_update function (25
lines from the bottom). In reviewing where we needed to acquire
predicate locks, this struck me as a place where we might need to
duplicate the predicate lock from one tuple to another, but I wasn't
entirely sure. I tried for a few hours one day to construct a
scenario which would show a serialization anomaly if I didn't do
this, and failed create one. If someone who understands both the
patch and heapam.c wants to look at this and offer an opinion, I'd
be most grateful. I'll take another look and see if I can get my
head around it better, but failing that, I'm disinclined to either
add locking I'm not sure we need or to remove the comment that says
we *might* need it there.

That's all of them.

-Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI patch version 12
Date:	2011-01-18 20:35:01
Message-ID:	AANLkTi=WFth2Z988MX9BZXOr140Zx1wnRxC_KbvawbOW@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Jan 18, 2011 at 3:18 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> That's all of them.

Our existing code has plenty of TODOs in it already, so I see no
problem with continuing to comment places where future enhancements
are possible, as long as they don't reflect deficiencies that are
crippling in the present.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"Jeff Davis" <pgsql(at)j-davis(dot)com>,<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI patch version 12
Date:	2011-01-18 20:38:44
Message-ID:	4D35A5F40200002500039762@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> If my back-of-the-envelope math is right, a carefully constructed
> pessimal load could need up to (max_connections / 2)^2 -- so 100
> connections could conceivably require 2500 structures, although
> such a scenario would be hard to create. Current "picked from
> thin air" numbers would have space for 500.

Er, actually, we would have space for 5000, because it's five times
the number of SERIALIZABLEXACT structures which is ten times
max_connections. I guess that would explain why I've never seen a
report of a problem.

Still, someone who creates a very large number of connections and
pounds them heavily with SERIALIZABLE transactions might conceivably
run into it. Since that's something the docs explicitly warn you
*not* to do with serializable transactions, I'm not sure we need to
do more than make sure the error message and hint are good.
Thoughts?

-Kevin

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,"Jeff Davis" <pgsql(at)j-davis(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	SSI patch version 13
Date:	2011-01-20 20:36:19
Message-ID:	4D38486302000025000398FD@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> [questions about confusing (obsolete) comments]

> Setting the high bit in OldSetXidAdd() seems a bit weird. How
> about just using UINT64_MAX instead of 0 to mean no conflicts? Or
> 1, and start the sequence counter from 2.

> ReleasePredicateLocks() reads ShmemVariableCache->nextXid without
> XidGenLock. Maybe it's safe, we assume that TransactionIds are
> atomic elsewhere, but at least there needs to be a comment
> explaining it. But it probably should use ReadNewTransactionId()
> instead.

> Attached is a patch for some trivial changes, mostly typos.

> There's a few remaining TODO comments in the code, which obviously
> need to be resolved one way or another, but as soon as you're
> finished with any outstanding issues that you know of, I think
> this is ready for commit.

As of this moment, everything you explicitly mentioned is covered.
In addition, I've looked all over for lurking forgotten issues, and
found and dealt with a few small ones. Much to my chagrin, I found
one big one (interaction between this feature and hot standby) which
is being discussed on a separate thread; it's probably best not to
split the discussion between threads, so please don't reply
regarding that issue on this thread.

Some of the new code hasn't had very much testing beyond the make
targets: check, installcheck-world, and dcheck (new in this patch).
Dan did a few "crash the server between prepare and commit" tests,
but we need another round of stress-testing designed to beat up on
that part. I hand-tested some query sets to ensure that the "safe
retry" code is working as well as we can do without actively
interrupting the victim. (We simply flag the victim and it will
check the flag and kill itself the next time it hits certain
functions in SSI.)

The remaining "TODO SSI" comments are related to possible work for
future releases.

Other than needing a bit more testing, especially around 2PC and
"safe retry" the only outstanding issue is SSI+HS.

I'm attaching version 13 of the patch. I will be following up
shortly about the SSI+HS issue on the other thread. Clearly, that
will require at least one more version.

-Kevin

Attachment	Content-Type	Size
ssi-13.patch.gz	application/octet-stream	83.6 KB