Re: Any MIPS assembly experts in the house?

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Martin Pitt <martin(at)piware(dot)de>
Subject: Any MIPS assembly experts in the house?
Date: 2005-08-26 18:59:52
Message-ID: 16583.1125082792@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I see the latest buildfarm result from a mipsel machine is failing:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2005-08-26%2005:30:07

and the failure is this:

TRAP: FailedAssertion("!(lock->shared > 0)", File: "lwlock.c", Line: 456)
LOG: server process (PID 10112) was terminated by signal 6

which makes it seem highly probable that this recently committed patch
to convert the MIPS out-of-line spinlock code into inline assembler
isn't right:
http://archives.postgresql.org/pgsql-committers/2005-08/msg00319.php

Can anyone spot the problem? If not I fear we'll have to revert this.

Also, upon looking more closely at the new inline code, it looks like
it's designed to *loop* if it fails to get the lock, which is *wrong*.
The assembly fragment should test once and fall out. This doesn't
explain the regression test failure, it would just cause stuck-lock
detection to not work.

regards, tom lane


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Any MIPS assembly experts in the house?
Date: 2005-08-26 20:09:21
Message-ID: 430F76F1.9010404@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> I see the latest buildfarm result from a mipsel machine is failing:
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2005-08-26%2005:30:07
>
> and the failure is this:
>
> TRAP: FailedAssertion("!(lock->shared > 0)", File: "lwlock.c", Line: 456)
> LOG: server process (PID 10112) was terminated by signal 6
>
> which makes it seem highly probable that this recently committed patch
> to convert the MIPS out-of-line spinlock code into inline assembler
> isn't right:
> http://archives.postgresql.org/pgsql-committers/2005-08/msg00319.php
>
> Can anyone spot the problem? If not I fear we'll have to revert this.

As the owner of said machine I was about to report the problem - but on
a subsequent run of the buildfarm-script(to get access to the compiled
source for further debugging and testing) it completed without an error.

Stefan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Any MIPS assembly experts in the house?
Date: 2005-08-26 20:20:49
Message-ID: 22443.1125087649@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> Tom Lane wrote:
>> which makes it seem highly probable that this recently committed patch
>> to convert the MIPS out-of-line spinlock code into inline assembler
>> isn't right:
>> http://archives.postgresql.org/pgsql-committers/2005-08/msg00319.php

> As the owner of said machine I was about to report the problem - but on
> a subsequent run of the buildfarm-script(to get access to the compiled
> source for further debugging and testing) it completed without an error.

Unfortunately, that just says that the failure is intermittent, which is
pretty much exactly what you'd expect from not-quite-right locking code.
Try running the parallel regression tests several times in a row. (The
serial test mode is unlikely to exhibit any problem.) I tend to do this
by installing the test postmaster and then doing "make
installcheck-parallel" over and over, rather than repeating "make
check"; repeatedly building a test installation is just wasting time
for this sort of thing.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Martin Pitt <martin(at)piware(dot)de>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Subject: Re: Any MIPS assembly experts in the house?
Date: 2005-08-26 21:20:50
Message-ID: 23189.1125091250@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Can anyone spot the problem? If not I fear we'll have to revert this.

After a bit of reading MIPS documentation, I found out that the proposed
patch is exactly backward: it returns 1 if it gets the lock and 0 if the
lock is already held :-(

Because callers will loop on a nonzero return, the second iteration
falls through, which is why the thing isn't an infinite loop. Only
problem is when we hit the lock at an instant when somebody else
already has it.

Given the short duration of our spinlock holds, it was probably quite
a coincidence that Stefan's machine got a failure almost immediately.
We might have had the problem lurking for awhile.

I'll try to commit something that really works in a little bit.

regards, tom lane


From: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgreSQL(dot)org, Martin Pitt <martin(at)piware(dot)de>
Subject: Re: Any MIPS assembly experts in the house?
Date: 2005-08-27 09:07:21
Message-ID: 43102D49.1020305@kaltenbrunner.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> I wrote:
>
>>Can anyone spot the problem? If not I fear we'll have to revert this.
>
>
> After a bit of reading MIPS documentation, I found out that the proposed
> patch is exactly backward: it returns 1 if it gets the lock and 0 if the
> lock is already held :-(
>
> Because callers will loop on a nonzero return, the second iteration
> falls through, which is why the thing isn't an infinite loop. Only
> problem is when we hit the lock at an instant when somebody else
> already has it.
>
> Given the short duration of our spinlock holds, it was probably quite
> a coincidence that Stefan's machine got a failure almost immediately.
> We might have had the problem lurking for awhile.
>
> I'll try to commit something that really works in a little bit.

well not sure if that counts as "really works" :-)

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2005-08-27%2006:33:05

Stefan


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>
Cc: pgsql-hackers(at)postgresql(dot)org, Martin Pitt <martin(at)piware(dot)de>
Subject: Re: Any MIPS assembly experts in the house?
Date: 2005-08-27 16:25:03
Message-ID: 1974.1125159903@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc> writes:
> well not sure if that counts as "really works" :-)
> http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=lionfish&dt=2005-08-27%2006:33:05

Nope, apparently the example I looked at was wrong about how to write
constants in MIPS assembler. Sigh. Newer version committed.

regards, tom lane