From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Some bogus results from prairiedog |
Date: | 2014-07-24 12:18:50 |
Message-ID: | CA+TgmoaGjfWA+Zz-D_mFRbLgiSXgcL3b8dUGkp1LqWUCXnORsQ@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Jul 22, 2014 at 8:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jul 22, 2014 at 12:24 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Anyway, to cut to the chase, the crash seems to be from this:
>>> TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > 0)", File: "lock.c", Line: 2957)
>>> So there is still something rotten in the fastpath lock logic.
>
>> Gosh, that sucks.
>
>> The inconstancy of this problem would seem to suggest some kind of
>> locking bug rather than a flat-out concurrency issue, but it looks to
>> me like everything relevant is marked volatile.
>
> I don't think that you need any big assumptions about machine-specific
> coding issues to spot the problem.
I don't think that I'm making what could be described as big
assumptions; I think we should fix and back-patch the PPC64 spinlock
change.
But...
> The assert in question is here:
>
>     /*
>      * Decrement strong lock count.  This logic is needed only for 2PC.
>      */
>     if (decrement_strong_lock_count
>         && ConflictsWithRelationFastPath(&lock->tag, lockmode))
>     {
>         uint32 fasthashcode = FastPathStrongLockHashPartition(hashcode);
>
>         SpinLockAcquire(&FastPathStrongRelationLocks->mutex);
>         Assert(FastPathStrongRelationLocks->count[fasthashcode] > 0);
>         FastPathStrongRelationLocks->count[fasthashcode]--;
>         SpinLockRelease(&FastPathStrongRelationLocks->mutex);
>     }
>
> and it sure looks to me like that
> "ConflictsWithRelationFastPath(&lock->tag" is looking at the tag of the
> shared-memory lock object you just released. If someone else had managed
> to recycle that locktable entry for some other purpose, the
> ConflictsWithRelationFastPath call might incorrectly return true.
>
> I think s/&lock->tag/locktag/ would fix it, but maybe I'm missing
> something.
...this is probably the real cause of the failures we've actually been
seeing. I'll go back-patch that change.
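
For anyone following along, here is a minimal standalone sketch of the hazard
as I understand it. It is not the lock.c code: DemoTag, DemoSharedLock,
DEMO_STRONG_MODE, demo_conflicts() and demo_release_and_check() are all
hypothetical stand-ins, and the "recycling" is simulated by a plain
assignment rather than by another backend. It only shows why reading the tag
out of the shared entry after release can give a different answer than
reading the caller's own copy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the real lock.c types. */
typedef struct
{
    uint32_t relid;
    bool     is_relation;   /* stands in for the tag's lock type */
} DemoTag;

typedef struct
{
    DemoTag tag;            /* lives in "shared memory", may be reused */
} DemoSharedLock;

/* Assumed threshold: modes above this count as "strong" in this sketch. */
#define DEMO_STRONG_MODE 4

/* Simplified stand-in for ConflictsWithRelationFastPath(). */
static bool
demo_conflicts(const DemoTag *tag, int lockmode)
{
    return tag->is_relation && lockmode > DEMO_STRONG_MODE;
}

/*
 * Simulate releasing a lock entry and then deciding whether the strong
 * lock count should be decremented.
 *
 * read_from_slot = true  mimics checking &lock->tag after the release.
 * read_from_slot = false mimics checking the caller's own locktag.
 */
static bool
demo_release_and_check(DemoSharedLock *slot, const DemoTag *locktag,
                       const DemoTag *recycled_as, int lockmode,
                       bool read_from_slot)
{
    const DemoTag *tag;

    /* Step 1: the entry is released and immediately reused by someone else. */
    slot->tag = *recycled_as;

    /* Step 2: pick which tag the decrement decision is based on. */
    tag = read_from_slot ? &slot->tag : locktag;

    return demo_conflicts(tag, lockmode);
}
```

If the caller's lock was on a non-relation object and the slot is reused for
a relation lock, the read_from_slot variant reports a conflict that never
existed, which in the real code would mean decrementing a count that was
never incremented. The variant that uses the caller's copy is unaffected by
whatever the slot holds afterwards.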
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company