Quick Links

Re: SSI bug?

Lists:	pgsql-hackers

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	<heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	<drkp(at)csail(dot)mit(dot)edu>,<yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI bug?
Date:	2011-02-21 01:31:32
Message-ID:	4D616C14020000250003AD9A@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Kevin Grittner" wrote:

> I'm proceeding on this basis.

Result attached. I found myself passing around the tuple xmin value
just about everywhere that the predicate lock target tag was being
passed, so it finally dawned on me that this logically belonged as
part of the target tag. That simplified the changes, and the net
result of following Heikki's suggestion here is the reduction of
total lines of code by 178 while adding coverage for missed corner
cases and fixing bugs.

Thanks again, Heikki!

I will test this some more tomorrow. So far I haven't done more than
ensure it passes the standard regression tests and the isolation
tests added for SSI. The latter took awhile because the hash_any
function was including uninitialized bytes past the length of the tag
in its calculations. We should probably either fix that or document
it. I had to add another couple bytes to the tag to get it to a four
byte boundary to fix it. Easy once you know that's how it works...

The attached patch can also be viewed here:

http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=46fd5ea6728b566c521ec83048bc00a207289dd9

If this stands up to further testing, the only issue I know of for
the 9.1 release is to update the documentation of shared memory usage
to include the new structures.

-Kevin

Attachment	Content-Type	Size
ssi-multi-update-1.patch	application/octet-stream	23.0 KB

From:	yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To:	Kevin(dot)Grittner(at)wicourts(dot)gov
Cc:	heikki(dot)linnakangas(at)enterprisedb(dot)com, drkp(at)csail(dot)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-02-21 23:42:36
Message-ID:	20110221234236.4635E19D574@mail.netbsd.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hi,

> "Kevin Grittner" wrote:
>
>> I'm proceeding on this basis.
>
> Result attached. I found myself passing around the tuple xmin value
> just about everywhere that the predicate lock target tag was being
> passed, so it finally dawned on me that this logically belonged as
> part of the target tag. That simplified the changes, and the net
> result of following Heikki's suggestion here is the reduction of
> total lines of code by 178 while adding coverage for missed corner
> cases and fixing bugs.
>
> Thanks again, Heikki!
>
> I will test this some more tomorrow. So far I haven't done more than
> ensure it passes the standard regression tests and the isolation
> tests added for SSI. The latter took awhile because the hash_any
> function was including uninitialized bytes past the length of the tag
> in its calculations. We should probably either fix that or document
> it. I had to add another couple bytes to the tag to get it to a four
> byte boundary to fix it. Easy once you know that's how it works...
>
> The attached patch can also be viewed here:
>
> http://git.postgresql.org/gitweb?p=users/kgrittn/postgres.git;a=commitdiff;h=46fd5ea6728b566c521ec83048bc00a207289dd9
>
> If this stands up to further testing, the only issue I know of for
> the 9.1 release is to update the documentation of shared memory usage
> to include the new structures.
>
> -Kevin

i tested ede45e90dd1992bfd3e1e61ce87bad494b81f54d + ssi-multi-update-1.patch
with my application and got the following assertion failure.

YAMAMOTO Takashi

Core was generated by `postgres'.
Program terminated with signal 6, Aborted.
#0 0xbbba4cc7 in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0 0xbbba4cc7 in _lwp_kill () from /usr/lib/libc.so.12
#1 0xbbba4c85 in raise (s=6) at /siro/nbsd/src/lib/libc/gen/raise.c:48
#2 0xbbba445a in abort () at /siro/nbsd/src/lib/libc/stdlib/abort.c:74
#3 0x0833d9f2 in ExceptionalCondition (
conditionName=0x8493f6d "!(locallock != ((void *)0))",
errorType=0x8370451 "FailedAssertion", fileName=0x8493ec3 "predicate.c",
lineNumber=3657) at assert.c:57
#4 0x0827977e in CheckTargetForConflictsIn (targettag=0xbfbfce78)
at predicate.c:3657
#5 0x0827bd74 in CheckForSerializableConflictIn (relation=0x99b5bad4,
tuple=0xbfbfcf78, buffer=234) at predicate.c:3772
#6 0x080a5be8 in heap_update (relation=0x99b5bad4, otid=0xbfbfd038,
newtup=0x99a0aa08, ctid=0xbfbfd03e, update_xmax=0xbfbfd044, cid=1,
crosscheck=0x0, wait=1 '\001') at heapam.c:2593
#7 0x081c81ca in ExecModifyTable (node=0x99a0566c) at nodeModifyTable.c:624
#8 0x081b2153 in ExecProcNode (node=0x99a0566c) at execProcnode.c:371
#9 0x081b0f82 in standard_ExecutorRun (queryDesc=0x99b378f0,
direction=ForwardScanDirection, count=0) at execMain.c:1247
#10 0xbbaaf352 in pgss_ExecutorRun (queryDesc=0x99b378f0,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:541
#11 0xbbab5ee5 in explain_ExecutorRun (queryDesc=0x99b378f0,
direction=ForwardScanDirection, count=0) at auto_explain.c:201
#12 0x08288ae3 in ProcessQuery (plan=0x99bb404c,
sourceText=0x99b3781c "UPDATE file SET ctime = $1 WHERE fileid = $2",
params=<value optimized out>, dest=0x84d1f38, completionTag=0xbfbfd2f0 "")
at pquery.c:197
#13 0x08288d0a in PortalRunMulti (portal=0x99b3f01c,
isTopLevel=<value optimized out>, dest=dwarf2_read_address: Corrupted DWARF expression.
) at pquery.c:1268
#14 0x0828984a in PortalRun (portal=0x99b3f01c, count=2147483647,
isTopLevel=1 '\001', dest=0x99b07428, altdest=0x99b07428,
completionTag=0xbfbfd2f0 "") at pquery.c:822
#15 0x08286c22 in PostgresMain (argc=2, argv=0xbb9196a4,
username=0xbb9195f8 "takashi") at postgres.c:2004
#16 0x082413f6 in ServerLoop () at postmaster.c:3590
#17 0x082421a8 in PostmasterMain (argc=3, argv=0xbfbfe594) at postmaster.c:1110
#18 0x081e0d09 in main (argc=3, argv=0xbfbfe594) at main.c:199
(gdb) fr 4
#4 0x0827977e in CheckTargetForConflictsIn (targettag=0xbfbfce78)
at predicate.c:3657
3657 Assert(locallock != NULL);
(gdb) list
3652
3653 locallock = (LOCALPREDICATELOCK *)
3654 hash_search_with_hash_value(LocalPredicateLockHash,
3655 targettag, targettaghash,
3656 HASH_FIND, NULL);
3657 Assert(locallock != NULL);
3658 Assert(locallock->held);
3659 locallock->held = false;
3660
3661 if (locallock->childLocks == 0)
(gdb)

From:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To:	YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc:	Kevin(dot)Grittner(at)wicourts(dot)gov, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-02-22 02:41:29
Message-ID:	20110222024129.GC74913@csail.mit.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Feb 21, 2011 at 11:42:36PM +0000, YAMAMOTO Takashi wrote:
> i tested ede45e90dd1992bfd3e1e61ce87bad494b81f54d + ssi-multi-update-1.patch
> with my application and got the following assertion failure.
>
> #4 0x0827977e in CheckTargetForConflictsIn (targettag=0xbfbfce78)
> at predicate.c:3657
> 3657 Assert(locallock != NULL);

It looks like CheckTargetForConflictsIn is making the assumption that
the backend-local lock table is accurate, which was probably even true
at the time it was written. Unfortunately, it hasn't been for a while,
and the new changes for tuple versions make it more likely that this
will actually come up.

The solution is only slightly more complicated than just removing the
assertion. Unless Kevin beats me to it, I'll put together a patch later
tonight or tomorrow. (I'm at the airport right now.)

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>
Cc:	<heikki(dot)linnakangas(at)enterprisedb(dot)com>, "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI bug?
Date:	2011-02-22 16:51:05
Message-ID:	4D639519020000250003AE0D@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:

> It looks like CheckTargetForConflictsIn is making the assumption
> that the backend-local lock table is accurate, which was probably
> even true at the time it was written.

I remember we decided that it could only be false in certain ways
which allowed us to use it as a "lossy" first-cut test in a couple
places. I doubt that we can count on any of that any longer, and
should drop those heuristics.

> the new changes for tuple versions make it more likely that this
> will actually come up.

Yeah, as far as a I can recall the only divergence was in *page*
level entries for *indexes* until this latest patch. We now have
*tuple* level entries for *heap* relations, too.

> The solution is only slightly more complicated than just removing
> the assertion.

That's certainly true for that one spot, but we need an overall
review of where we might be trying to use LocalPredicateLockHash for
"first cut" tests as an optimization.

> Unless Kevin beats me to it, I'll put together a patch later
> tonight or tomorrow. (I'm at the airport right now.)

It would be great if you could get this one. Thanks.

-Kevin

From:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	heikki(dot)linnakangas(at)enterprisedb(dot)com, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-02-22 23:20:54
Message-ID:	20110222232054.GC61128@csail.mit.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 22, 2011 at 10:51:05AM -0600, Kevin Grittner wrote:
> Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:
>
> > It looks like CheckTargetForConflictsIn is making the assumption
> > that the backend-local lock table is accurate, which was probably
> > even true at the time it was written.
>
> I remember we decided that it could only be false in certain ways
> which allowed us to use it as a "lossy" first-cut test in a couple
> places. I doubt that we can count on any of that any longer, and
> should drop those heuristics.

Hmm. The theory was before that the local lock table would only have
false negatives, i.e. if it says we hold a lock then we really do. That
makes it a useful heuristic because we can bail out quickly if we're
trying to re-acquire a lock we already hold. It seems worthwhile to try
to preserve that.

This property holds as long as the only time one backend edits
another's lock list is to *add* a new lock, not remove an existing one.
Fortunately, this is true most of the time (at a glance, it seems to be
the case for the new tuple update code).

There are two exceptions; I think they're both OK but we need to be
careful here.
- if we're forced to promote an index page lock's granularity to avoid
running out of shared memory, we remove the fine-grained one. This
is fine as long as we relax our expectations to "if the local
lock table says we hold a lock, then we hold that lock or one that
covers it", which is acceptable for current users of the table.
- if we combine two index pages, we remove the lock entry for the page
being deleted. I think that's OK because the page is being removed
so we will not make any efforts to lock it.

> Yeah, as far as a I can recall the only divergence was in *page*
> level entries for *indexes* until this latest patch. We now have
> *tuple* level entries for *heap* relations, too.

*nod*. I'm slightly concerned about the impact of that on granularity
promotion -- the new locks created by heap updates won't get counted
toward the lock promotion thresholds. That's not a fatal problem of
anything, but it could make granularity promotion less effective at
conserving lock entries.

> > The solution is only slightly more complicated than just removing
> > the assertion.
>
> That's certainly true for that one spot, but we need an overall
> review of where we might be trying to use LocalPredicateLockHash for
> "first cut" tests as an optimization.

Yes, I'd planned to go through the references to LocalPredicateLockHash
to make sure none of them were making any unwarranted assumptions about
the results. Fortunately, there are not too many of them...

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>
Cc:	<heikki(dot)linnakangas(at)enterprisedb(dot)com>, "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI bug?
Date:	2011-02-22 23:54:49
Message-ID:	4D63F869020000250003AE82@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:
> On Tue, Feb 22, 2011 at 10:51:05AM -0600, Kevin Grittner wrote:

> The theory was before that the local lock table would only have
> false negatives, i.e. if it says we hold a lock then we really do.
> That makes it a useful heuristic because we can bail out quickly
> if we're trying to re-acquire a lock we already hold. It seems
> worthwhile to try to preserve that.
>
> This property holds as long as the only time one backend edits
> another's lock list is to *add* a new lock, not remove an existing
> one. Fortunately, this is true most of the time (at a glance, it
> seems to be the case for the new tuple update code).
>
> There are two exceptions; I think they're both OK but we need to
> be careful here.
> - if we're forced to promote an index page lock's granularity to
> avoid running out of shared memory, we remove the fine-grained
> one. This is fine as long as we relax our expectations to "if
> the local lock table says we hold a lock, then we hold that
> lock or one that covers it", which is acceptable for current
> users of the table.

That makes sense. This one sounds OK.

> - if we combine two index pages, we remove the lock entry for the
> page being deleted. I think that's OK because the page is being
> removed so we will not make any efforts to lock it.

I'm not sure it's safe to assume that the index page won't get
reused before the local lock information is cleared. In the absence
of a clear proof that it is safe, or some enforcement mechanism to
ensure that it is, I don't think we should make this assumption.
Off-hand I can't think of a clever way to make this safe which would
cost less than taking out the LW lock and checking the definitive
shared memory HTAB, but that might be for lack of creative thinking
at the moment..

>> Yeah, as far as a I can recall the only divergence was in *page*
>> level entries for *indexes* until this latest patch. We now have
>> *tuple* level entries for *heap* relations, too.
>
> *nod*. I'm slightly concerned about the impact of that on
> granularity promotion -- the new locks created by heap updates
> won't get counted toward the lock promotion thresholds. That's not
> a fatal problem of anything, but it could make granularity
> promotion less effective at conserving lock entries.

This pattern doesn't seem to come up very often in most workloads.
Since it's feeding into a heuristic which already triggers pretty
quickly I think we should be OK with this. It makes me less tempted
to tinker with the threshold for promoting tuple locks to page
locks, though.

The only alternative I see would be to use some form of asynchronous
notification of the new locks so that the local table can be
maintained. That seems overkill without some clear evidence that it
is needed. I *really* wouldn't want to go back to needing LW locks
to maintain this info; as you know (and stated only for the benefit
of the list), it was a pretty serious contention point in early
profiling and adding the local table was a big part of getting an
early benchmark down from a 14+% performance hit for SSI to a 1.8%
performance hit.

-Kevin

From:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	heikki(dot)linnakangas(at)enterprisedb(dot)com, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-02-23 01:49:51
Message-ID:	20110223014951.GF61128@csail.mit.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Feb 22, 2011 at 05:54:49PM -0600, Kevin Grittner wrote:
> I'm not sure it's safe to assume that the index page won't get
> reused before the local lock information is cleared. In the absence
> of a clear proof that it is safe, or some enforcement mechanism to
> ensure that it is, I don't think we should make this assumption.
> Off-hand I can't think of a clever way to make this safe which would
> cost less than taking out the LW lock and checking the definitive
> shared memory HTAB, but that might be for lack of creative thinking
> at the moment..

Hmm. Yeah, I wasn't sure about that one, and having now thought about
it some more I think it isn't safe -- consider adding a lock on an
index page concurrently with another backend merging that page into
another one.

The obvious solution to me is to just keep the lock on both the old and
new page. The downside is that because this requires allocating a new
lock and is in a context where we're not allowed to fail, we'll need to
fall back on acquiring the relation lock just as we do for page splits.
I was going to bemoan the extra complexity this would add -- but
actually, couldn't we just replace PredicateLockPageCombine with a call
to PredicateLockPageSplit since they'd now do the same thing?

> The only alternative I see would be to use some form of asynchronous
> notification of the new locks so that the local table can be
> maintained. That seems overkill without some clear evidence that it
> is needed.

I agree. It is certainly weird and undesirable that the backend-local
lock table is not always accurate, but I don't see a good way to keep
it up to date without the cure being worse than the disease.

> I *really* wouldn't want to go back to needing LW locks
> to maintain this info; as you know (and stated only for the benefit
> of the list), it was a pretty serious contention point in early
> profiling and adding the local table was a big part of getting an
> early benchmark down from a 14+% performance hit for SSI to a 1.8%
> performance hit.

Yes, it's definitely important for a backend to be able to check
whether it's already holding a lock (even if that's just a hint)
without having to take locks.

Let me add one more piece of info for the benefit of the list: a
backend's local lock table contains not just locks held by the backend,
but also an entry and refcount for every parent of a lock it holds.
This is used to determine when to promote to one of the coarser-grained
parent locks. It's both unnecessary and undesirable for that info to be
in shared memory.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

From:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-03-01 00:03:06
Message-ID:	20110301000306.GL10115@csail.mit.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

An updated patch to address this issue is attached. It fixes a couple
issues related to use of the backend-local lock table hint:

- CheckSingleTargetForConflictsIn now correctly handles the case
where a lock that's being held is not reflected in the local lock
table. This fixes the assertion failure reported in this thread.

- PredicateLockPageCombine now retains locks for the page that is
being removed, rather than removing them. This prevents a
potentially dangerous false-positive inconsistency where the local
lock table believes that a lock is held, but it is actually not.

- add some more comments documenting the times when the local lock
table can be inconsistent with reality, as reflected in the shared
memory table.

This patch also incorporates Kevin's changes to copy locks when
creating a new version of a tuple rather than trying to maintain a
linkage between different versions. So this is a patch that should
apply against HEAD and addresses all outstanding SSI bugs known to
Kevin or myself.

Besides the usual regression and isolation tests, I have tested this
by running DBT-2 on a 16-core machine to verify that there are no
assertion failures that only show up under concurrent access.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

Attachment	Content-Type	Size
ssi-fixes.patch	text/x-diff	25.5 KB

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI bug?
Date:	2011-03-01 17:07:42
Message-ID:	4D6D27DE.8080304@enterprisedb.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 01.03.2011 02:03, Dan Ports wrote:
> An updated patch to address this issue is attached. It fixes a couple
> issues related to use of the backend-local lock table hint:
>
> - CheckSingleTargetForConflictsIn now correctly handles the case
> where a lock that's being held is not reflected in the local lock
> table. This fixes the assertion failure reported in this thread.
>
> - PredicateLockPageCombine now retains locks for the page that is
> being removed, rather than removing them. This prevents a
> potentially dangerous false-positive inconsistency where the local
> lock table believes that a lock is held, but it is actually not.
>
> - add some more comments documenting the times when the local lock
> table can be inconsistent with reality, as reflected in the shared
> memory table.
>
> This patch also incorporates Kevin's changes to copy locks when
> creating a new version of a tuple rather than trying to maintain a
> linkage between different versions. So this is a patch that should
> apply against HEAD and addresses all outstanding SSI bugs known to
> Kevin or myself.

Thanks, committed with minor changes.

The ordering of the fields in PREDICATELOCKTAG was bizarre, so I just
expanded the offsetnumber fields to an uint32, instead of having the
padding field. I think that's a lot more readable.

I also added an optimization in PredicateLockTupleRowVersionLink() to
not try to transfer the page locks, if the new tuple is on the same page
as the old one. That's very cheap to check, and it's very common for an
update to stay within a page.

Was there test cases for any of the issues fixed by this patch that we
should add to the suite?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	"PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI bug?
Date:	2011-03-01 17:43:58
Message-ID:	4D6CDBFE020000250003B203@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> committed with minor changes.

Thanks!

> The ordering of the fields in PREDICATELOCKTAG was bizarre, so I
> just expanded the offsetnumber fields to an uint32, instead of
> having the padding field. I think that's a lot more readable.

I can understand that, but I wonder if we shouldn't have a comment
mentioning that the offsetnumber field is larger than needed, in
case someone later needs to add another 16 bit field for some reason,
or we
go to a hash function without the same quirks.

> I also added an optimization in PredicateLockTupleRowVersionLink()
> to not try to transfer the page locks, if the new tuple is on the
> same page as the old one. That's very cheap to check, and it's
> very common for an update to stay within a page.

Thanks. I had it in mind to do that, but lost track of it.
Definitely worth doing.

> Was there test cases for any of the issues fixed by this patch
> that we should add to the suite?

No, but I'm still intending to look at that some more. It makes me
nervous to have an area which would be pretty easy for someone to
break without any tests to catch such breakage.

-Kevin

From:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI bug?
Date:	2011-03-01 20:09:58
Message-ID:	20110301200957.GA91864@csail.mit.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Mar 01, 2011 at 07:07:42PM +0200, Heikki Linnakangas wrote:
> Was there test cases for any of the issues fixed by this patch that we
> should add to the suite?

Some of these issues are tricky to test, e.g. some of the code about
transferring predicate locks to a new target doesn't get exercised
unless an index page gets split while there are concurrent
transactions holding locks on that page.

I have not been able to find a good way to test these other than
system-level testing using a concurrent workload (usually the DBT-2
benchmark). I'd certainly be open to suggestions!

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

From:	yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To:	Kevin(dot)Grittner(at)wicourts(dot)gov
Cc:	heikki(dot)linnakangas(at)enterprisedb(dot)com, drkp(at)csail(dot)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-03-18 02:26:21
Message-ID:	20110318022622.196E519CFCA@mail.netbsd.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hi,

thanks for quickly fixing problems.
i tested the later version (a2eb9e0c08ee73208b5419f5a53a6eba55809b92)
and only errors i got was "out of shared memory". i'm not sure if
it was caused by SSI activities or not.

YAMAMOTO Takashi

the following is a snippet from my application log:

PG_DIAG_SEVERITY: WARNING
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: shmem.c
PG_DIAG_SOURCE_LINE: 190
PG_DIAG_SOURCE_FUNCTION: ShmemAlloc

PG_DIAG_SEVERITY: ERROR
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: dynahash.c
PG_DIAG_SOURCE_LINE: 925
PG_DIAG_SOURCE_FUNCTION: hash_search_with_hash_value

PG_DIAG_SEVERITY: WARNING
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: shmem.c
PG_DIAG_SOURCE_LINE: 190
PG_DIAG_SOURCE_FUNCTION: ShmemAlloc

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc:	<drkp(at)csail(dot)mit(dot)edu>,<heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI bug?
Date:	2011-03-18 17:50:16
Message-ID:	4D835508020000250003BAD2@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:

> thanks for quickly fixing problems.

Thanks for the rigorous testing. :-)

> i tested the later version
> (a2eb9e0c08ee73208b5419f5a53a6eba55809b92) and only errors i got
> was "out of shared memory". i'm not sure if it was caused by SSI
> activities or not.

> PG_DIAG_SEVERITY: WARNING
> PG_DIAG_SQLSTATE: 53200
> PG_DIAG_MESSAGE_PRIMARY: out of shared memory
> PG_DIAG_SOURCE_FILE: shmem.c
> PG_DIAG_SOURCE_LINE: 190
> PG_DIAG_SOURCE_FUNCTION: ShmemAlloc
>
> PG_DIAG_SEVERITY: ERROR
> PG_DIAG_SQLSTATE: 53200
> PG_DIAG_MESSAGE_PRIMARY: out of shared memory
> PG_DIAG_SOURCE_FILE: dynahash.c
> PG_DIAG_SOURCE_LINE: 925
> PG_DIAG_SOURCE_FUNCTION: hash_search_with_hash_value

Nor am I. Some additional information would help.

(1) Could you post the non-default configuration settings?

(2) How many connections are in use in your testing?

(3) Can you give a rough categorization of how many of what types
of transactions are in the mix?

(4) Are there any long-running transactions?

(5) How many of these errors do you get in what amount of time?

(6) Does the application continue to run relatively sanely, or does
it fall over at this point?

(7) The message hint would help pin it down, or a stack trace at
the point of the error would help more. Is it possible to get
either? Looking over the code, it appears that the only places that
SSI could generate that error, it would cancel that transaction with
the hint "You might need to increase
max_pred_locks_per_transaction." and otherwise allow normal
processing.

Even with the above information it may be far from clear where
allocations are going past their maximum, since one HTAB could grab
more than its share and starve another which is staying below its
"maximum". I'll take a look at the possibility of adding a warning
or some such when an HTAB expands past its maximum size.

-Kevin

From:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-03-18 18:18:32
Message-ID:	20110318181832.GB72629@csail.mit.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

It would probably also be worth monitoring the size of pg_locks to see
how many predicate locks are being held.

On Fri, Mar 18, 2011 at 12:50:16PM -0500, Kevin Grittner wrote:
> Even with the above information it may be far from clear where
> allocations are going past their maximum, since one HTAB could grab
> more than its share and starve another which is staying below its
> "maximum". I'll take a look at the possibility of adding a warning
> or some such when an HTAB expands past its maximum size.

Yes -- considering how few shared memory HTABs have sizes that are
really dynamic, I'd be inclined to take a close look at SSI and
max_predicate_locks_per_transaction regardless of where the failed
allocation took place. But I am surprised to see that error message
without SSI's hint about increasing max_predicate_locks_per_xact.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>,<pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI bug?
Date:	2011-03-18 20:51:11
Message-ID:	4D837F6F020000250003BAF6@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:

> I am surprised to see that error message without SSI's hint about
> increasing max_predicate_locks_per_xact.

After reviewing this, I think something along the following lines
might be needed, for a start. I'm not sure the Asserts are actually
needed; they basically are checking that the current behavior of
hash_search doesn't change.

I'm still looking at whether it's sane to try to issue a warning
when an HTAB exceeds the number of entries declared as its max_size
when it was created.

-Kevin

Attachment	Content-Type	Size
ssi-oosm-1.patch	text/plain	2.0 KB

From:	"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To:	"Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: SSI bug?
Date:	2011-03-18 21:57:32
Message-ID:	4D838EFC020000250003BB12@gw.wicourts.gov
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

"Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:

> I'm still looking at whether it's sane to try to issue a warning
> when an HTAB exceeds the number of entries declared as its
> max_size when it was created.

I think this does it.

If nothing else, it might be instructive to use it while testing the
SSI patch. Would it make any sense to slip this into 9.1, or should
I add it to the first 9.2 CF?

-Kevin

Attachment	Content-Type	Size
htab-excess-warning.patch	text/plain	2.6 KB

From:	yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To:	Kevin(dot)Grittner(at)wicourts(dot)gov
Cc:	drkp(at)csail(dot)mit(dot)edu, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-03-25 02:48:07
Message-ID:	20110325024807.9AF9F19CE21@mail.netbsd.org
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

hi,

> YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:
>
>> thanks for quickly fixing problems.
>
> Thanks for the rigorous testing. :-)
>
>> i tested the later version
>> (a2eb9e0c08ee73208b5419f5a53a6eba55809b92) and only errors i got
>> was "out of shared memory". i'm not sure if it was caused by SSI
>> activities or not.
>
>> PG_DIAG_SEVERITY: WARNING
>> PG_DIAG_SQLSTATE: 53200
>> PG_DIAG_MESSAGE_PRIMARY: out of shared memory
>> PG_DIAG_SOURCE_FILE: shmem.c
>> PG_DIAG_SOURCE_LINE: 190
>> PG_DIAG_SOURCE_FUNCTION: ShmemAlloc
>>
>> PG_DIAG_SEVERITY: ERROR
>> PG_DIAG_SQLSTATE: 53200
>> PG_DIAG_MESSAGE_PRIMARY: out of shared memory
>> PG_DIAG_SOURCE_FILE: dynahash.c
>> PG_DIAG_SOURCE_LINE: 925
>> PG_DIAG_SOURCE_FUNCTION: hash_search_with_hash_value
>
> Nor am I. Some additional information would help.
>
> (1) Could you post the non-default configuration settings?

none. it can happen with just initdb+createdb'ed database.

> (2) How many connections are in use in your testing?

> (3) Can you give a rough categorization of how many of what types
> of transactions are in the mix?

all transactions are SERIALIZABLE.

no transactions are with READ ONLY.
(but some of them are actually select-only.)

> (4) Are there any long-running transactions?

no.

> (5) How many of these errors do you get in what amount of time?

once it start happening, i see them somehow frequently.

> (6) Does the application continue to run relatively sanely, or does
> it fall over at this point?

my application just exits on the error.

if i re-run the application without rebooting postgres, it seems that
i will get the error sooner than the first run. (but it might be just
a matter of luck)

> (7) The message hint would help pin it down, or a stack trace at
> the point of the error would help more. Is it possible to get
> either? Looking over the code, it appears that the only places that
> SSI could generate that error, it would cancel that transaction with
> the hint "You might need to increase
> max_pred_locks_per_transaction." and otherwise allow normal
> processing.

no message hints. these errors are not generated by SSI code,
at least directly.
(please look at PG_DIAG_SOURCE_xxx in the above log.)

YAMAMOTO Takashi

> Even with the above information it may be far from clear where
> allocations are going past their maximum, since one HTAB could grab
> more than its share and starve another which is staying below its
> "maximum". I'll take a look at the possibility of adding a warning
> or some such when an HTAB expands past its maximum size.
>
> -Kevin

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-03-25 19:18:41
Message-ID:	AANLkTimgy7xgS0zdC0LiU=5zgsSGwKGTaPppeza7WAQb@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>
>> I'm still looking at whether it's sane to try to issue a warning
>> when an HTAB exceeds the number of entries declared as its
>> max_size when it was created.
>
> I think this does it.
>
> If nothing else, it might be instructive to use it while testing the
> SSI patch. Would it make any sense to slip this into 9.1, or should
> I add it to the first 9.2 CF?

I don't think it's too late to commit something like this, but I'm not
clear on whether we want it. Is this checking for what should be a
can't-happen case, or are these soft limits that we expect to be
exceeded from time to time?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-03-25 19:23:14
Message-ID:	AANLkTim-k8A0N82pZ+FL9cRFkUH-APES6ppV28Q5-0ES@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Mar 18, 2011 at 4:51 PM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:
>
>> I am surprised to see that error message without SSI's hint about
>> increasing max_predicate_locks_per_xact.
>
> After reviewing this, I think something along the following lines
> might be needed, for a start. I'm not sure the Asserts are actually
> needed; they basically are checking that the current behavior of
> hash_search doesn't change.
>
> I'm still looking at whether it's sane to try to issue a warning
> when an HTAB exceeds the number of entries declared as its max_size
> when it was created.

I don't see much advantage in changing these to asserts - in a debug
build, that will promote ERROR to PANIC; whereas in a production
build, they'll cause a random failure somewhere downstream.

The HASH_ENTER to HASH_ENTER_NULL changes look like they might be
needed, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-03-25 20:06:30
Message-ID:	15645.1301083590@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner
> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>> I'm still looking at whether it's sane to try to issue a warning
>>> when an HTAB exceeds the number of entries declared as its
>>> max_size when it was created.

> I don't think it's too late to commit something like this, but I'm not
> clear on whether we want it.

We do *not* want that.

Up to now, I believe the lockmgr's lock table is the only shared hash
table that is expected to grow past the declared size; that can happen
anytime a session exceeds max_locks_per_transaction, which we consider
to be only a soft limit. I don't know whether SSI has introduced any
other hash tables that behave similarly. (If it has, we might want to
rethink the amount of "slop" space we leave in shmem...)

There might perhaps be some value in adding a warning like this if it
were enabled per-table (and not enabled by default). But I'd want to
see a specific reason for it, not just "let's see if we can scare users
with a WARNING appearing out of nowhere".

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Dan Ports <drkp(at)csail(dot)mit(dot)edu>, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-03-25 20:22:48
Message-ID:	AANLkTik-m_3Q4T=D31DkVd-w8X8PzRSug9bvmWetNF3g@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Mar 25, 2011 at 4:06 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Mar 18, 2011 at 5:57 PM, Kevin Grittner
>> <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
>>>> I'm still looking at whether it's sane to try to issue a warning
>>>> when an HTAB exceeds the number of entries declared as its
>>>> max_size when it was created.
>
>> I don't think it's too late to commit something like this, but I'm not
>> clear on whether we want it.
>
> We do *not* want that.
>
> Up to now, I believe the lockmgr's lock table is the only shared hash
> table that is expected to grow past the declared size; that can happen
> anytime a session exceeds max_locks_per_transaction, which we consider
> to be only a soft limit. I don't know whether SSI has introduced any
> other hash tables that behave similarly. (If it has, we might want to
> rethink the amount of "slop" space we leave in shmem...)
>
> There might perhaps be some value in adding a warning like this if it
> were enabled per-table (and not enabled by default). But I'd want to
> see a specific reason for it, not just "let's see if we can scare users
> with a WARNING appearing out of nowhere".

What about a system view that shows declared & actual sizes of all
these hash tables?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: SSI bug?
Date:	2011-03-26 21:54:52
Message-ID:	20110326215452.GA72629@csail.mit.edu
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Fri, Mar 25, 2011 at 04:06:30PM -0400, Tom Lane wrote:
> Up to now, I believe the lockmgr's lock table is the only shared hash
> table that is expected to grow past the declared size; that can happen
> anytime a session exceeds max_locks_per_transaction, which we consider
> to be only a soft limit. I don't know whether SSI has introduced any
> other hash tables that behave similarly. (If it has, we might want to
> rethink the amount of "slop" space we leave in shmem...)

I looked into this recently and the two lockmgr tables were indeed the
only ones that could vary in size. IIRC, the only other shared hash
tables were the shared buffer index and the shmem index.

SSI adds two more analogous tables (per-lock-target and per-xact-lock),
both of which are sized according to max_pred_locks_per_transaction,
which is also a soft limit. It also adds a per-transaction shared hash
table, but that has a clear maximum size.

I find the soft limit on htab size a strange model, and I suspect it
might be a source of problems now that we've got more than one table
that can actually exceed it its limit. (Particularly so given that once
shmem gets allocated to one htab, there's no getting it back.)

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/