Re: Stuck Spinlock Error Message

Lists: pgsql-admin
From: Ludwig Isaac Lim <ludz_lim(at)yahoo(dot)com>
To: PostgreSQL Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: Stuck Spinlock Error Message
Date: 2003-07-26 08:17:23
Message-ID: 20030726081723.8152.qmail@web21601.mail.yahoo.com

Hi:

I noticed the following error message in my
PostgreSQL log file:

FATAL : s_lock (0x401db020) at lwlock.c Stuck
spinlock. Aborting

Version of my postgresql :

PostgreSQL 7.2.3 on i686-pc-linux-gnu compiled by
GCC 2.96

Operating System : RedHat 7.1

What can cause a stuck spinlock?

Thanks in advance,
ludwig lim



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ludwig Isaac Lim <ludz_lim(at)yahoo(dot)com>
Cc: PostgreSQL Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Stuck Spinlock Error Message
Date: 2003-08-01 00:38:02
Message-ID: 2213.1059698282@sss.pgh.pa.us

Ludwig Isaac Lim <ludz_lim(at)yahoo(dot)com> writes:
> What can cause a stuck spinlock?

In theory, that shouldn't ever happen. Can you reproduce it?

regards, tom lane


From: Ludwig Isaac Lim <ludz_lim(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Stuck Spinlock Error Message
Date: 2003-08-05 03:55:12
Message-ID: 20030805035512.67568.qmail@web21603.mail.yahoo.com

Hi:

--- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > What can cause a stuck spinlock?
> In theory, that shouldn't ever happen. Can you
> reproduce it?
>
> regards, tom lane

I could not reproduce it, but I'll describe how the
error happened. I have a program that reads a large
file of 20,000 records and spawns processes that each
execute a PL/pgSQL stored function based on the content
of the file.

The following is a list of the SQL statements
generated:
process 1 SELECT f1(120, 123.3);
process 2 SELECT f1(120, 53.3);
process 3 SELECT f1(120, 31.3);
..
..
process n SELECT f1(120, 2.3);

The function f1 is basically defined as:
CREATE OR REPLACE FUNCTION f1(integer, float8)
RETURNS INTEGER
AS '
DECLARE
-- some variable declarations
BEGIN
-- Lock the row selected by the first parameter
-- of the stored function (row-level lock)
SELECT *
FROM t1
WHERE field1 = $1
FOR UPDATE;
-- a batch of SQL statements here --
END;'
LANGUAGE 'plpgsql';

As you may have noticed, the first parameter of the called
function is always the same (due to a bug in our program).
Since the function takes a row-level lock on that record,
the processes queue up (i.e. each one runs only when
another process releases its lock). I'm guessing that
there were just too many postmaster processes
concurrently trying to access the same record while it
was held under a row lock. When I ran the "top"
command in Linux, there were a lot of postmaster processes
in the process list.

Is the spinlock error possible given that scenario?
Is this error related to the following error messages:
fatal 2: cannot write block 3 of 16556/148333 blind
: too many open files in system.

Note : I was able to correct the above error
messages by increasing the file-max parameter in the
"sysctl.conf".

I'm guessing that the spinlock error occurs after
there are around hundreds (or thousands) of queued
postmaster processes.

best regards,
ludwig



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Ludwig Isaac Lim <ludz_lim(at)yahoo(dot)com>
Cc: PostgreSQL Mailing List <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Stuck Spinlock Error Message
Date: 2003-08-05 19:56:22
Message-ID: 12041.1060113382@sss.pgh.pa.us

Ludwig Isaac Lim <ludz_lim(at)yahoo(dot)com> writes:
> I'm guessing that the spinlock error occurs after
> there are around hundreds (or thousands) of queued
> postmaster processes.

Thousands? How large is your max_connections parameter, anyway
(and do you really have big enough iron to support it)?

The stuck spinlock error implies that some work that should have
taken a fraction of a microsecond (namely the time to check and update
the internal state of an LWLock structure) took upwards of a minute.

Since the process holding the spinlock could lose the CPU, it's
certainly physically possible for the actual duration of holding the
spinlock to be much more than a microsecond. But the odds of losing
the CPU while holding the spinlock are not large, since it's held for
just a small number of instructions. And to get an actual "stuck
spinlock" failure would imply that the holding process didn't get
scheduled again for more than a minute (while some other process that
wanted the spinlock *did* get scheduled again --- repeatedly). I
suppose this is possible if your machine is sufficiently badly
overloaded.

regards, tom lane
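
The mechanism described in the last message can be illustrated with a
small sketch. This is not PostgreSQL's s_lock code; it is a minimal
stand-in, assuming C11 atomics and POSIX nanosleep. The names
demo_spinlock, demo_spin_init, demo_spin_acquire and demo_spin_release
are hypothetical, and the retry limit and delay are parameters chosen
by the caller, not PostgreSQL's real values. A waiter retries an atomic
test-and-set with a short sleep between attempts and gives up once the
retry budget is used up, which is the point at which a real
implementation would report a stuck spinlock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical spinlock type for illustration only. */
typedef struct {
    atomic_bool held;   /* true while some process/thread owns the lock */
} demo_spinlock;

void demo_spin_init(demo_spinlock *lock)
{
    assert(lock != NULL);
    atomic_init(&lock->held, false);
}

/*
 * Try to take the lock, retrying at most max_spins times and sleeping
 * delay_ns nanoseconds (must be below one second) between attempts.
 * Returns true on success. Returns false if the lock never became
 * free within the budget; a real system would treat that as a
 * "stuck spinlock" and abort.
 */
bool demo_spin_acquire(demo_spinlock *lock, long max_spins, long delay_ns)
{
    assert(lock != NULL);
    struct timespec delay = { .tv_sec = 0, .tv_nsec = delay_ns };

    for (long spins = 0; spins < max_spins; spins++) {
        /* Atomic test-and-set: the previous value tells us who won. */
        if (!atomic_exchange_explicit(&lock->held, true,
                                      memory_order_acquire))
            return true;
        /* Lock is busy: back off briefly, then try again. */
        nanosleep(&delay, NULL);
    }
    return false;
}

void demo_spin_release(demo_spinlock *lock)
{
    assert(lock != NULL);
    atomic_store_explicit(&lock->held, false, memory_order_release);
}
```

The holder is expected to keep the lock for only a few instructions, so
the retry budget normally never runs out. If the holder loses the CPU
and is not scheduled again while waiters keep running, the waiters burn
through the whole budget and demo_spin_acquire returns false, which
matches the overloaded-machine scenario described above.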