Re: SSI bug?

Lists: pgsql-hackers
From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: <drkp(at)csail(dot)mit(dot)edu>,<heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-03-27 20:16:54
Message-ID: 4D8F54E6020000250003BD16@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

YAMAMOTO Takashi wrote:
> Kevin Grittner wrote:

>> (1) Could you post the non-default configuration settings?
>
> none. it can happen with just initdb+createdb'ed database.
>
>> (2) How many connections are in use in your testing?
>
> 4.
>
>> (3) Can you give a rough categorization of how many of what types
>> of transactions are in the mix?
>
> all transactions are SERIALIZABLE.
>
> no transactions are with READ ONLY.
> (but some of them are actually select-only.)
>
>> (4) Are there any long-running transactions?
>
> no.
>
>> (5) How many of these errors do you get in what amount of time?
>
> once it start happening, i see them somehow frequently.
>
>> (6) Does the application continue to run relatively sanely, or
>> does it fall over at this point?
>
> my application just exits on the error.
>
> if i re-run the application without rebooting postgres, it seems
> that i will get the error sooner than the first run. (but it might
> be just a matter of luck)

If your application hits this again, could you check pg_stat_activity
and pg_locks and see if any SIReadLock entries are lingering after
the owning transaction and all overlapping transactions are
completed? If anything is lingering between runs of your
application, it *should* show up in one or the other of these.

>> (7) The message hint would help pin it down, or a stack trace at
>> the point of the error would help more. Is it possible to get
>> either? Looking over the code, it appears that the only places
>> that SSI could generate that error, it would cancel that
>> transaction with the hint "You might need to increase
>> max_pred_locks_per_transaction." and otherwise allow normal
>> processing.
>
> no message hints. these errors are not generated by SSI code,
> at least directly.

Right, that's because we were using HASH_ENTER instead of
HASH_ENTER_NULL. I've posted a patch which should correct that.

>> Even with the above information it may be far from clear where
>> allocations are going past their maximum, since one HTAB could
>> grab more than its share and starve another which is staying below
>> its "maximum". I'll take a look at the possibility of adding a
>> warning or some such when an HTAB expands past its maximum size.

I see from your later post that you are running with this patch. Has
that reported anything yet?

Thanks,

-Kevin


From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: Kevin(dot)Grittner(at)wicourts(dot)gov
Cc: drkp(at)csail(dot)mit(dot)edu, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-03-28 03:16:16
Message-ID: 20110328031616.CDAA719CEDF@mail.netbsd.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

hi,

>>> (6) Does the application continue to run relatively sanely, or
>>> does it fall over at this point?
>>
>> my application just exits on the error.
>>
>> if i re-run the application without rebooting postgres, it seems
>> that i will get the error sooner than the first run. (but it might
>> be just a matter of luck)
>
> If your application hits this again, could you check pg_stat_activity
> and pg_locks and see if any SIReadLock entries are lingering after
> the owning transaction and all overlapping transactions are
> completed? If anything is lingering between runs of your
> application, it *should* show up in one or the other of these.

this is 71ac48fd9cebd3d2a873635a04df64096c981f73 with your two patches.
this psql session was the only activity to the server at this point.

hoge=# select * from pg_stat_activity;
-[ RECORD 1 ]----+--------------------------------
datid | 16384
datname | hoge
procpid | 7336
usesysid | 10
usename | takashi
application_name | psql
client_addr |
client_hostname |
client_port | -1
backend_start | 2011-03-26 12:28:21.882226+09
xact_start | 2011-03-28 11:55:19.300027+09
query_start | 2011-03-28 11:55:19.300027+09
waiting | f
current_query | select * from pg_stat_activity;

hoge=# select count(*) from pg_locks where mode='SIReadLock';
-[ RECORD 1 ]
count | 7057

hoge=# select locktype,count(*) from pg_locks group by locktype; -[ RECORD 1 ]--------
locktype | virtualxid
count | 1
-[ RECORD 2 ]--------
locktype | relation
count | 1
-[ RECORD 3 ]--------
locktype | tuple
count | 7061

hoge=#

>
>>> (7) The message hint would help pin it down, or a stack trace at
>>> the point of the error would help more. Is it possible to get
>>> either? Looking over the code, it appears that the only places
>>> that SSI could generate that error, it would cancel that
>>> transaction with the hint "You might need to increase
>>> max_pred_locks_per_transaction." and otherwise allow normal
>>> processing.
>>
>> no message hints. these errors are not generated by SSI code,
>> at least directly.
>
> Right, that's because we were using HASH_ENTER instead of
> HASH_ENTER_NULL. I've posted a patch which should correct that.

sure, with your patch it seems that they turned into different ones.

PG_DIAG_SEVERITY: WARNING
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_SOURCE_FILE: shmem.c
PG_DIAG_SOURCE_LINE: 190
PG_DIAG_SOURCE_FUNCTION: ShmemAlloc

PG_DIAG_SEVERITY: ERROR
PG_DIAG_SQLSTATE: 53200
PG_DIAG_MESSAGE_PRIMARY: out of shared memory
PG_DIAG_MESSAGE_HINT: You might need to increase max_pred_locks_per_transaction.
PG_DIAG_SOURCE_FILE: predicate.c
PG_DIAG_SOURCE_LINE: 2049
PG_DIAG_SOURCE_FUNCTION: CreatePredicateLock

>>> Even with the above information it may be far from clear where
>>> allocations are going past their maximum, since one HTAB could
>>> grab more than its share and starve another which is staying below
>>> its "maximum". I'll take a look at the possibility of adding a
>>> warning or some such when an HTAB expands past its maximum size.
>
> I see from your later post that you are running with this patch. Has
> that reported anything yet?

i got nothing except the following one. (in the server log)

WARNING: hash table "ShmemIndex" has more entries than expected
DETAIL: The maximum was set to 32 on creation.

YAMAMOTO Takashi

>
> Thanks,
>
> -Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: <drkp(at)csail(dot)mit(dot)edu>,<heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-03-31 13:31:40
Message-ID: 4D943BEC020000250003BFDA@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:

> hoge=# select locktype,count(*) from pg_locks group by locktype;
> -[ RECORD 1 ]--------
> locktype | virtualxid
> count | 1
> -[ RECORD 2 ]--------
> locktype | relation
> count | 1
> -[ RECORD 3 ]--------
> locktype | tuple
> count | 7061

I've stared at the code for hours and have only come up with one
race condition which can cause this, although the window is so small
it's hard to believe that you would get this volume of orphaned
locks. I'll keep looking, but if you could try this to see if it
has a material impact, that would be great.

I am very sure this patch is needed and that it is safe. It moves a
LWLockAcquire statement up to cover the setup for the loop that it
already covers. It also includes a fix to a comment that got missed
when we switched from the pointer between lock targets to
duplicating the locks.

-Kevin

Attachment Content-Type Size
ssi-old-tuple-locks.patch text/plain 1.6 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, drkp(at)csail(dot)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-03-31 15:50:19
Message-ID: 4D94A2BB.3090605@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 31.03.2011 16:31, Kevin Grittner wrote:
> I've stared at the code for hours and have only come up with one
> race condition which can cause this, although the window is so small
> it's hard to believe that you would get this volume of orphaned
> locks. I'll keep looking, but if you could try this to see if it
> has a material impact, that would be great.
>
> I am very sure this patch is needed and that it is safe. It moves a
> LWLockAcquire statement up to cover the setup for the loop that it
> already covers. It also includes a fix to a comment that got missed
> when we switched from the pointer between lock targets to
> duplicating the locks.

Ok, committed.

Did we get anywhere with the sizing of the various shared memory
structures? Did we find the cause of the "out of shared memory" warnings?

Would it help if we just pre-allocated all the shared memory hash tables
and didn't allow them to grow? It's bizarre that the hash table that
requests the slack shared memory space first gets it, and it can't be
reused for anything else without a server restart. I feel it would make
better to not allow the tables to grow, so that you get consistent
behavior across restarts.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: <drkp(at)csail(dot)mit(dot)edu>,"YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-03-31 16:06:30
Message-ID: 4D946036020000250003BFFC@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Did we get anywhere with the sizing of the various shared memory
> structures? Did we find the cause of the "out of shared memory"
> warnings?

The patch you just committed is related to that. Some tuple locks
for summarized transactions were not getting cleaned up. I found
one access to the list not protected by the appropriate LW lock,
which is what this patch fixed. I'm not satisfied that was the only
issue, though; I'm still looking.

> Would it help if we just pre-allocated all the shared memory hash
> tables and didn't allow them to grow?

I've been thinking that it might be wise.

> It's bizarre that the hash table that requests the slack shared
> memory space first gets it, and it can't be reused for anything
> else without a server restart. I feel it would make better to not
> allow the tables to grow, so that you get consistent behavior
> across restarts.

Agreed. I think it was OK in prior releases because there was just
one HTAB in shared memory doing this. With multiple such tables, it
doesn't seem sane to allow unbounded lazy grabbing of the space this
way. The only thing I've been on the fence about is whether it
makes more sense to allocate it all up front or to continue to allow
incremental allocation but set a hard limit on the number of entries
allocated for each shared memory HTAB. Is there a performance-
related reason to choose one path or the other?

-Kevin


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-03-31 18:12:16
Message-ID: 20110331181216.GF81592@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Mar 31, 2011 at 11:06:30AM -0500, Kevin Grittner wrote:
> The only thing I've been on the fence about is whether it
> makes more sense to allocate it all up front or to continue to allow
> incremental allocation but set a hard limit on the number of entries
> allocated for each shared memory HTAB. Is there a performance-
> related reason to choose one path or the other?

Seems like it would be marginally better to allocate it up front -- then
you don't have the cost of having to split buckets later as it grows.

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-03-31 18:23:50
Message-ID: 4D948066020000250003C00B@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dan Ports <drkp(at)csail(dot)mit(dot)edu> wrote:
> On Thu, Mar 31, 2011 at 11:06:30AM -0500, Kevin Grittner wrote:
>> The only thing I've been on the fence about is whether it
>> makes more sense to allocate it all up front or to continue to
allow
>> incremental allocation but set a hard limit on the number of
entries
>> allocated for each shared memory HTAB. Is there a performance-
>> related reason to choose one path or the other?
>
> Seems like it would be marginally better to allocate it up front --
then
> you don't have the cost of having to split buckets later as it
grows.

The attached patch should cover that.

-Kevin

Attachment Content-Type Size
htab-alloc.patch text/plain 4.3 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-03-31 18:31:37
Message-ID: 4D94C889.3050607@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 31.03.2011 21:23, Kevin Grittner wrote:
> Dan Ports<drkp(at)csail(dot)mit(dot)edu> wrote:
>> On Thu, Mar 31, 2011 at 11:06:30AM -0500, Kevin Grittner wrote:
>>> The only thing I've been on the fence about is whether it
>>> makes more sense to allocate it all up front or to continue to
> allow
>>> incremental allocation but set a hard limit on the number of
> entries
>>> allocated for each shared memory HTAB. Is there a performance-
>>> related reason to choose one path or the other?
>>
>> Seems like it would be marginally better to allocate it up front --
> then
>> you don't have the cost of having to split buckets later as it
> grows.
>
> The attached patch should cover that.

That's not enough. The hash tables can grow beyond the maximum size you
specify in ShmemInitHash. It's just a hint to size the directory within
the hash table.

We'll need to teach dynahash not to allocate any more entries after the
preallocation. A new HASH_NO_GROW flag to hash_create() seems like a
suitable interface.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-03-31 19:06:53
Message-ID: 4D948A7D020000250003C012@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> That's not enough. The hash tables can grow beyond the maximum
> size you specify in ShmemInitHash. It's just a hint to size the
> directory within the hash table.
>
> We'll need to teach dynahash not to allocate any more entries
> after the preallocation. A new HASH_NO_GROW flag to hash_create()
> seems like a suitable interface.

OK. If we're doing that, is it worth taking a look at the "safety
margin" added to the size calculations, and try to make the
calculations more accurate?

Would you like me to code a patch for this?

There are a couple other patches which I think should be applied, if
you have time to deal with them.

There was a fix for an assertion failure here:

http://archives.postgresql.org/pgsql-bugs/2011-03/msg00352.php

It just rechecks some conditions after dropping a shared LW lock and
acquiring an exclusive LW lock. Without this recheck there is a
window for the other transaction involved in the conflict to also
detect a conflict and flag first, leading to the assertion.

There's another area I need to review near there, but that is
orthogonal.

There is a patch to improve out-of-shared-memory error handling and
reporting here:

http://archives.postgresql.org/pgsql-hackers/2011-03/msg01170.php

This one is due to my earlier failure to spot the difference between
HASH_ENTER and HASH_ENTER_NULL. For a shared memory HTAB the
HASH_ENTER_NULL will return a null if unable to allocate the entry,
while HASH_ENTER will ereport ERROR with a generic message. This
patch leaves HASH_ENTER on the "can't happen" cases, but replaces
the ereport ERROR after it with an Assert because it's something
which should never happen. The other cases are changed to
HASH_ENTER_NULL so that the error message with the hint will be used
instead of the more generic message.

These patches are both in direct response to problems found during
testing by YAMAMOTO Takashi.

-Kevin


From: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Cc: YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: SSI bug?
Date: 2011-04-03 06:16:44
Message-ID: 20110403061644.GS81592@csail.mit.edu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I think I see what is going on now. We are sometimes failing to set the
commitSeqNo correctly on the lock. In particular, if a lock assigned to
OldCommittedSxact is marked with InvalidSerCommitNo, it will never be
cleared.

The attached patch corrects this:
TransferPredicateLocksToNewTarget should initialize a new lock
entry's commitSeqNo to that of the old one being transferred, or take
the minimum commitSeqNo if it is merging two lock entries.

Also, CreatePredicateLock should initialize commitSeqNo for to
InvalidSerCommitSeqNo instead of to 0. (I don't think using 0 would
actually affect anything, but we should be consistent.)

I also added a couple of assertions I used to track this down: a
lock's commitSeqNo should never be zero, and it should be
InvalidSerCommitSeqNo if and only if the lock is not held by
OldCommittedSxact.

Takashi, does this patch fix your problem with leaked SIReadLocks?

Dan

--
Dan R. K. Ports MIT CSAIL http://drkp.net/

Attachment Content-Type Size
patch-commitseqno.diff text/x-diff 2.3 KB

From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: drkp(at)csail(dot)mit(dot)edu
Cc: pgsql-hackers(at)postgresql(dot)org, Kevin(dot)Grittner(at)wicourts(dot)gov, heikki(dot)linnakangas(at)enterprisedb(dot)com
Subject: Re: SSI bug?
Date: 2011-04-07 02:02:10
Message-ID: 20110407020210.D851019CF0A@mail.netbsd.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

hi,

> I think I see what is going on now. We are sometimes failing to set the
> commitSeqNo correctly on the lock. In particular, if a lock assigned to
> OldCommittedSxact is marked with InvalidSerCommitNo, it will never be
> cleared.
>
> The attached patch corrects this:
> TransferPredicateLocksToNewTarget should initialize a new lock
> entry's commitSeqNo to that of the old one being transferred, or take
> the minimum commitSeqNo if it is merging two lock entries.
>
> Also, CreatePredicateLock should initialize commitSeqNo for to
> InvalidSerCommitSeqNo instead of to 0. (I don't think using 0 would
> actually affect anything, but we should be consistent.)
>
> I also added a couple of assertions I used to track this down: a
> lock's commitSeqNo should never be zero, and it should be
> InvalidSerCommitSeqNo if and only if the lock is not held by
> OldCommittedSxact.
>
> Takashi, does this patch fix your problem with leaked SIReadLocks?

i'm currently running bf6848bc8c82e82f857d48185554bc3e6dcf1013 with this
patch applied. i haven't seen the symptom yet. i'll keep it running for
a while.

btw, i've noticed the following message in the server log. is it normal?

LOG: could not truncate directory "pg_serial": apparent wraparound

YAMAMOTO Takashi

>
> Dan
>
>
> --
> Dan R. K. Ports MIT CSAIL http://drkp.net/


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <drkp(at)csail(dot)mit(dot)edu>,"YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-04-07 15:21:21
Message-ID: 4D9D9021020000250003C4E2@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:

> LOG: could not truncate directory "pg_serial": apparent
> wraparound

Did you get a warning with this text?:

memory for serializable conflict tracking is nearly exhausted

If not, there's some sort of cleanup bug to fix in the predicate
locking's use of SLRU. It may be benign, but we won't really know
until we find it. I'm investigating.

-Kevin


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <drkp(at)csail(dot)mit(dot)edu>,"YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>
Cc: <heikki(dot)linnakangas(at)enterprisedb(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-04-07 20:34:19
Message-ID: 4D9DD97B020000250003C53C@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:
>
>> LOG: could not truncate directory "pg_serial": apparent
>> wraparound

> there's some sort of cleanup bug to fix in the predicate
> locking's use of SLRU. It may be benign, but we won't really know
> until we find it. I'm investigating.

I'm pretty sure I found it. When the number serializable
transactions which need to be tracked gets high enough to push
things to the SLRU summarization, and then drops back down, we
haven't been truncating the head page of the active SLRU region
because if we go back into SLRU summarization that saves us from
zeroing the page again. The problem is that if we don't go back
into SLRU summarization for a long time, we might wrap around to
where SLRU is upset that our head is chasing our tail. This seems
like a bigger problem than we were trying to solve by not truncating
the page.

The issue is complicated a little bit by the fact that the SLRU API
has you specify the *page* for the truncation point, but silently
ignores the request unless the page is in a segment which is past a
segment in use. So adding the number of pages per SLRU segment to
the head page position should do the right thing. But it's all
weird enough that I felt it need a bit of commenting.

While I was there I noticed that we're doing the unnecessary
flushing (so people can glean information about the SLRU activity
from watching the disk files) right before truncating. I switched
the truncation to come before the flushing, since flushing pages to
a file and then deleting that file didn't seem productive.

Attached find a patch which modifies one line of code, switches the
order of two lines of code, and adds comments.

I will add this to the open items for 9.1. Thanks again to YAMAMOTO
Takashi for his rigorous testing.

-Kevin

Attachment Content-Type Size
ssi-slru-truncate-fix.patch text/plain 2.2 KB

From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: Kevin(dot)Grittner(at)wicourts(dot)gov
Cc: drkp(at)csail(dot)mit(dot)edu, heikki(dot)linnakangas(at)enterprisedb(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-04-08 03:36:29
Message-ID: 20110408033629.F14CD19CF07@mail.netbsd.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

hi,

> YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp> wrote:
>
>> LOG: could not truncate directory "pg_serial": apparent
>> wraparound
>
> Did you get a warning with this text?:
>
> memory for serializable conflict tracking is nearly exhausted

there is not such a warning near the above "aparent wraparound" record.
not sure if it was far before the record as i've lost the older log files.

YAMAMOTO Takashi

>
> If not, there's some sort of cleanup bug to fix in the predicate
> locking's use of SLRU. It may be benign, but we won't really know
> until we find it. I'm investigating.
>
> -Kevin


From: yamt(at)mwd(dot)biglobe(dot)ne(dot)jp (YAMAMOTO Takashi)
To: drkp(at)csail(dot)mit(dot)edu
Cc: pgsql-hackers(at)postgresql(dot)org, Kevin(dot)Grittner(at)wicourts(dot)gov, heikki(dot)linnakangas(at)enterprisedb(dot)com
Subject: Re: SSI bug?
Date: 2011-04-11 03:57:19
Message-ID: 20110411035719.4839E19CF5D@mail.netbsd.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

hi,

> hi,
>
>> I think I see what is going on now. We are sometimes failing to set the
>> commitSeqNo correctly on the lock. In particular, if a lock assigned to
>> OldCommittedSxact is marked with InvalidSerCommitNo, it will never be
>> cleared.
>>
>> The attached patch corrects this:
>> TransferPredicateLocksToNewTarget should initialize a new lock
>> entry's commitSeqNo to that of the old one being transferred, or take
>> the minimum commitSeqNo if it is merging two lock entries.
>>
>> Also, CreatePredicateLock should initialize commitSeqNo for to
>> InvalidSerCommitSeqNo instead of to 0. (I don't think using 0 would
>> actually affect anything, but we should be consistent.)
>>
>> I also added a couple of assertions I used to track this down: a
>> lock's commitSeqNo should never be zero, and it should be
>> InvalidSerCommitSeqNo if and only if the lock is not held by
>> OldCommittedSxact.
>>
>> Takashi, does this patch fix your problem with leaked SIReadLocks?
>
> i'm currently running bf6848bc8c82e82f857d48185554bc3e6dcf1013 with this
> patch applied. i haven't seen the symptom yet. i'll keep it running for
> a while.

i haven't seen the symptom since them. so i guess it was fixed by
your patch. thanks!

YAMAMOTO Takashi

>
> btw, i've noticed the following message in the server log. is it normal?
>
> LOG: could not truncate directory "pg_serial": apparent wraparound
>
> YAMAMOTO Takashi
>
>>
>> Dan
>>
>>
>> --
>> Dan R. K. Ports MIT CSAIL http://drkp.net/
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-04-11 08:33:06
Message-ID: 4DA2BCC2.7080903@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 31.03.2011 22:06, Kevin Grittner wrote:
> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>
>> That's not enough. The hash tables can grow beyond the maximum
>> size you specify in ShmemInitHash. It's just a hint to size the
>> directory within the hash table.
>>
>> We'll need to teach dynahash not to allocate any more entries
>> after the preallocation. A new HASH_NO_GROW flag to hash_create()
>> seems like a suitable interface.
>
> OK. If we're doing that, is it worth taking a look at the "safety
> margin" added to the size calculations, and try to make the
> calculations more accurate?
>
> Would you like me to code a patch for this?

I finally got around to look at this. Attached patch adds a
HASH_FIXED_SIZE flag, which disables the allocation of new entries after
the initial allocation. I believe we have consensus to make the
predicate lock hash tables fixed-size, so that there's no competition of
the slack shmem space between predicate lock structures and the regular
lock maanager.

I also noticed that there's a few hash_search(HASH_ENTER) calls in
predicate.c followed by check for a NULL result. But with HASH_ENTER,
hash_search never returns NULL, it throws an "out of shared memory"
error internally. I changed those calls to use HASH_ENTER_NULL, so you
now get the intended error message with the hint to raise
max_pred_locks_per_transaction.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
fixed-size-shmem-3.patch text/x-diff 5.3 KB

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-04-11 08:38:30
Message-ID: 4DA2BE06.1060806@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11.04.2011 11:33, Heikki Linnakangas wrote:
> I also noticed that there's a few hash_search(HASH_ENTER) calls in
> predicate.c followed by check for a NULL result. But with HASH_ENTER,
> hash_search never returns NULL, it throws an "out of shared memory"
> error internally. I changed those calls to use HASH_ENTER_NULL, so you
> now get the intended error message with the hint to raise
> max_pred_locks_per_transaction.

Oops, those were already fixed. Never mind.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Dan Ports <drkp(at)csail(dot)mit(dot)edu>, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: SSI bug?
Date: 2011-04-11 10:58:40
Message-ID: 4DA2DEE0.50801@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11.04.2011 11:33, Heikki Linnakangas wrote:
> On 31.03.2011 22:06, Kevin Grittner wrote:
>> Heikki Linnakangas<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>
>>> That's not enough. The hash tables can grow beyond the maximum
>>> size you specify in ShmemInitHash. It's just a hint to size the
>>> directory within the hash table.
>>>
>>> We'll need to teach dynahash not to allocate any more entries
>>> after the preallocation. A new HASH_NO_GROW flag to hash_create()
>>> seems like a suitable interface.
>>
>> OK. If we're doing that, is it worth taking a look at the "safety
>> margin" added to the size calculations, and try to make the
>> calculations more accurate?
>>
>> Would you like me to code a patch for this?
>
> I finally got around to look at this. Attached patch adds a
> HASH_FIXED_SIZE flag, which disables the allocation of new entries after
> the initial allocation. I believe we have consensus to make the
> predicate lock hash tables fixed-size, so that there's no competition of
> the slack shmem space between predicate lock structures and the regular
> lock maanager.

Ok, committed that.

I left the safety margins in the size calculations alone for now.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Dan Ports <drkp(at)csail(dot)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org, YAMAMOTO Takashi <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: SSI bug?
Date: 2011-04-11 11:01:22
Message-ID: 4DA2DF82.1090407@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 03.04.2011 09:16, Dan Ports wrote:
> I think I see what is going on now. We are sometimes failing to set the
> commitSeqNo correctly on the lock. In particular, if a lock assigned to
> OldCommittedSxact is marked with InvalidSerCommitNo, it will never be
> cleared.
>
> The attached patch corrects this:
> TransferPredicateLocksToNewTarget should initialize a new lock
> entry's commitSeqNo to that of the old one being transferred, or take
> the minimum commitSeqNo if it is merging two lock entries.
>
> Also, CreatePredicateLock should initialize commitSeqNo for to
> InvalidSerCommitSeqNo instead of to 0. (I don't think using 0 would
> actually affect anything, but we should be consistent.)
>
> I also added a couple of assertions I used to track this down: a
> lock's commitSeqNo should never be zero, and it should be
> InvalidSerCommitSeqNo if and only if the lock is not held by
> OldCommittedSxact.
>

Thanks, committed this.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: "Dan Ports" <drkp(at)csail(dot)mit(dot)edu>, "YAMAMOTO Takashi" <yamt(at)mwd(dot)biglobe(dot)ne(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: SSI bug?
Date: 2011-04-11 15:16:15
Message-ID: 4DA2D4EF020000250003C634@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> I finally got around to look at this. Attached patch adds a
> HASH_FIXED_SIZE flag, which disables the allocation of new entries
> after the initial allocation. I believe we have consensus to make
> the predicate lock hash tables fixed-size, so that there's no
> competition of the slack shmem space between predicate lock
> structures and the regular lock maanager.

OK, I can see why you preferred this -- the existing exchange of
slack space with the HW lock tables remains unchanged this way, and
only the new tables for predicate locking have the stricter limits.
This makes it very unlikely to break current apps which might be
unknowingly relying on existing allocation behavior in the HW
locking area. Smart.

I hadn't picked up on your intent that the new flag would only be
used for the new tables, which is why it wasn't quite making sense
to me before.

Thanks!

-Kevin