Re: Some bogus results from prairiedog

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Some bogus results from prairiedog
Date: 2014-07-24 12:18:50
Message-ID: CA+TgmoaGjfWA+Zz-D_mFRbLgiSXgcL3b8dUGkp1LqWUCXnORsQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jul 22, 2014 at 8:14 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Tue, Jul 22, 2014 at 12:24 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Anyway, to cut to the chase, the crash seems to be from this:
>>> TRAP: FailedAssertion("!(FastPathStrongRelationLocks->count[fasthashcode] > 0)", File: "lock.c", Line: 2957)
>>> So there is still something rotten in the fastpath lock logic.
>
>> Gosh, that sucks.
>
>> The inconstancy of this problem would seem to suggest some kind of
>> locking bug rather than a flat-out concurrency issue, but it looks to
>> me like everything relevant is marked volatile.
>
> I don't think that you need any big assumptions about machine-specific
> coding issues to spot the problem.

I don't think that I'm making what could be described as big
assumptions; I think we should fix and back-patch the PPC64 spinlock
change.

But...

> The assert in question is here:
>
>     /*
>      * Decrement strong lock count.  This logic is needed only for 2PC.
>      */
>     if (decrement_strong_lock_count
>         && ConflictsWithRelationFastPath(&lock->tag, lockmode))
>     {
>         uint32 fasthashcode = FastPathStrongLockHashPartition(hashcode);
>
>         SpinLockAcquire(&FastPathStrongRelationLocks->mutex);
>         Assert(FastPathStrongRelationLocks->count[fasthashcode] > 0);
>         FastPathStrongRelationLocks->count[fasthashcode]--;
>         SpinLockRelease(&FastPathStrongRelationLocks->mutex);
>     }
>
> and it sure looks to me like that
> "ConflictsWithRelationFastPath(&lock->tag" is looking at the tag of the
> shared-memory lock object you just released. If someone else had managed
> to recycle that locktable entry for some other purpose, the
> ConflictsWithRelationFastPath call might incorrectly return true.
>
> I think s/&lock->tag/locktag/ would fix it, but maybe I'm missing
> something.

...this is probably the real cause of the failures we've actually been
seeing. I'll go back-patch that change.
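
For anyone following along, the failure mode described above can be
sketched in isolation. This is a toy, not lock.c: every Demo*/demo_* name
below is hypothetical, the "recycling" by another backend is simulated
in-line rather than happening concurrently, and the conflict test is
reduced to a single flag. It only illustrates why consulting the tag of a
shared entry after releasing it can give the wrong answer, while
consulting the caller's own copy of the tag cannot.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the real PostgreSQL structs. */
typedef struct DemoLockTag
{
    uint32_t relid;
    bool     is_relation;   /* stands in for the fast-path conflict test */
} DemoLockTag;

typedef struct DemoLockEntry
{
    DemoLockTag tag;        /* lives in "shared memory" */
    int         nholders;
} DemoLockEntry;

/* Toy version of the conflict check: only relation locks qualify. */
static bool
demo_conflicts_with_fast_path(const DemoLockTag *tag)
{
    return tag->is_relation;
}

/* Drop our hold; after this the entry no longer belongs to us. */
static void
demo_release(DemoLockEntry *entry)
{
    entry->nholders--;
}

/* Simulates another backend reusing the freed slot for a different lock. */
static void
demo_recycle(DemoLockEntry *entry, DemoLockTag newtag)
{
    entry->tag = newtag;
    entry->nholders = 1;
}

/*
 * Returns whether the caller would go on to decrement the strong lock
 * count.  With use_fix == false the decision reads the shared entry's tag
 * after release (the pattern in the quoted code); with use_fix == true it
 * reads the caller's private copy (the s/&lock->tag/locktag/ pattern).
 */
bool
demo_would_decrement(bool use_fix)
{
    DemoLockTag   mine = {.relid = 1, .is_relation = false};
    DemoLockTag   other = {.relid = 2, .is_relation = true};
    DemoLockEntry entry = {.tag = mine, .nholders = 1};

    demo_release(&entry);
    demo_recycle(&entry, other);    /* slot now describes someone else's lock */

    if (use_fix)
        return demo_conflicts_with_fast_path(&mine);
    return demo_conflicts_with_fast_path(&entry.tag);
}
```

In this toy, the caller's lock is not a relation lock, so nothing should be
decremented. demo_would_decrement(false) nevertheless says yes because it
is looking at the recycled entry, which is the sort of spurious decrement
that could trip the Assert; demo_would_decrement(true) says no.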

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
