FATAL: semctl(1672698088, 12, SETVAL, 0) failed

Lists: pgsql-bugs
From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: FATAL: semctl(1672698088, 12, SETVAL, 0) failed
Date: 2006-02-22 03:17:51
Message-ID: dtglar$15tv$1@news.hub.org

I encountered an error during a fast shutdown of 8.1.1 on Win2k:

FATAL: semctl(1672698088, 12, SETVAL, 0) failed: A blocking operation
was interrupted by a call to WSACancelBlockingCall.

A similar error on 8.1/win2003 was reported on pgsql-general (sorry, I can't
dig out the original post from our web archives):

From: Niederland
Date: Tues, Dec 13 2005 9:49 am

2005-12-12 20:30:00 FATAL: semctl(50884184, 15, SETVAL, 0) failed: A
non-blocking socket operation could not be completed immediately.

---

There are two problems here:

(1) Why a socket error?
In port/win32.h, we have

#undef EAGAIN
#undef EINTR
#define EINTR WSAEINTR
#define EAGAIN WSAEWOULDBLOCK

What's the rationale for doing so?

(2) What happened here?
It may come from PGSemaphoreReset(), and win32 semop() looks like this:

ret = WaitForMultipleObjectsEx(2, wh, FALSE,
        (sops[0].sem_flg & IPC_NOWAIT) ? 0 : INFINITE, TRUE);
...
else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
{
    pgwin32_dispatch_queued_signals();
    errno = EINTR;
}
else if (ret == WAIT_TIMEOUT)
    errno = EAGAIN;

So it seems the EINTR is caused by an incoming signal and the EAGAIN by a
timeout ... any ideas?
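
A minimal sketch of how the two log messages could arise, assuming the
mapping in the semop() excerpt above. Everything prefixed SKETCH_ or sketch_
is hypothetical and exists only so the example compiles anywhere; the numeric
values mirror the Windows headers (WAIT_OBJECT_0 = 0, WAIT_IO_COMPLETION =
0xC0, WAIT_TIMEOUT = 0x102, WSAEINTR = 10004, WSAEWOULDBLOCK = 10035). It also
assumes that wait handle 0 is the semaphore and handle 1 is the signal event,
which the excerpt suggests but does not show. This is not PostgreSQL code.

```c
/* Hypothetical stand-ins; the values mirror the Windows headers. */
#define SKETCH_WAIT_OBJECT_0      0x00UL
#define SKETCH_WAIT_IO_COMPLETION 0xC0UL
#define SKETCH_WAIT_TIMEOUT       0x102UL
#define SKETCH_EINTR              10004   /* value of WSAEINTR */
#define SKETCH_EAGAIN             10035   /* value of WSAEWOULDBLOCK */

/*
 * Map a wait result to an errno value the way the quoted semop() does.
 * Assumption: handle 0 is the semaphore, handle 1 is the signal event.
 * Returns 0 when the semaphore was acquired, -1 for any other result.
 */
int sketch_errno_for_wait(unsigned long ret)
{
    if (ret == SKETCH_WAIT_OBJECT_0)
        return 0;
    if (ret == SKETCH_WAIT_OBJECT_0 + 1 || ret == SKETCH_WAIT_IO_COMPLETION)
        return SKETCH_EINTR;    /* a signal arrived while waiting */
    if (ret == SKETCH_WAIT_TIMEOUT)
        return SKETCH_EAGAIN;   /* zero timeout expired (IPC_NOWAIT) */
    return -1;
}

/* Message text as it appeared in the two reported FATAL lines. */
const char *sketch_message(int err)
{
    switch (err)
    {
        case SKETCH_EINTR:
            return "A blocking operation was interrupted by a call to "
                   "WSACancelBlockingCall.";
        case SKETCH_EAGAIN:
            return "A non-blocking socket operation could not be "
                   "completed immediately.";
        default:
            return "unknown error";
    }
}
```

Under these assumptions, a signal during the wait yields the
WSACancelBlockingCall text and a timeout yields the non-blocking socket text,
even though no socket is involved.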

Regards,
Qingqing


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed
Date: 2006-02-28 19:05:02
Message-ID: 200602281905.k1SJ52b19021@candle.pha.pa.us

Qingqing Zhou wrote:
> I encountered an error when I fast shutdown 8.1.1 on Win2k:
>
> FATAL: semctl(1672698088, 12, SETVAL, 0) failed: A blocking operation
> was interrupted by a call to WSACancelBlockingCall.
>
> A similar error on 8.1/win2003 was reported on pgsql-general (sorry, I can't
> dig out the
> original post from our web archives):
>
> From: Niederland
> Date: Tues, Dec 13 2005 9:49 am
>
> 2005-12-12 20:30:00 FATAL: semctl(50884184, 15, SETVAL, 0) failed: A
> non-blocking socket operation could not be completed immediately.
>
> ---
>
> There are two problems here:
>
> (1) Why a socket error?
> In port/win32.h, we have
>
> #undef EAGAIN
> #undef EINTR
> #define EINTR WSAEINTR
> #define EAGAIN WSAEWOULDBLOCK
>
> What's the rationale of doing so?

We did this so that our code could refer to EINTR/EAGAIN without
port-specific tests.
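
As a rough sketch of that idea (this is not the actual port/win32.h; the
SKETCH_WIN32 switch and the helper name are hypothetical), remapping the
macros once lets shared code compare against EINTR/EAGAIN with no #ifdef at
the call site:

```c
#include <errno.h>

/* SKETCH_WIN32 is a hypothetical build switch used only for illustration. */
#ifdef SKETCH_WIN32
#undef EINTR
#undef EAGAIN
#define EINTR  10004    /* value of WSAEINTR */
#define EAGAIN 10035    /* value of WSAEWOULDBLOCK */
#endif

/* Shared code: the same test compiles unchanged on every port. */
int sketch_is_transient(int err)
{
    return err == EINTR || err == EAGAIN;
}
```

The trade-off is that any code setting errno to EINTR or EAGAIN, socket
related or not, ends up carrying the Winsock values when the switch is on.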

> (2) What's happened here?
> It may come from PGSemaphoreReset(), and win32 semop() looks like this:
>
> ret = WaitForMultipleObjectsEx(2, wh, FALSE, (sops[0].sem_flg &
> IPC_NOWAIT) ? 0 : INFINITE, TRUE);
> ...
> else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
> {
> pgwin32_dispatch_queued_signals();
> errno = EINTR;
> }
> else if (ret == WAIT_TIMEOUT)
> errno = EAGAIN;
>
> So it seems the EINTR is caused by an incoming signal, the EAGAIN is caused
> by a TIMEOUT ... any ideas?

I looked at the documentation for the function:

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp

and it isn't clear what failure values it returns. We certainly
could loop on WSAEINTR. Can you test it?

--
Bruce Momjian http://candle.pha.pa.us
SRA OSS, Inc. http://www.sraoss.com

+ If your life is a hard drive, Christ can be your backup. +


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed
Date: 2006-03-01 02:08:58
Message-ID: du2vtq$eu3$1@news.hub.org


"Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote
> > In port/win32.h, we have
> >
> > #undef EAGAIN
> > #undef EINTR
> > #define EINTR WSAEINTR
> > #define EAGAIN WSAEWOULDBLOCK
> >
> > What's the rationale of doing so?
>
> We did this so that our code could refer to EINTR/EAGAIN without
> port-specific tests.
>

AFAICS, by doing so, EINTR/EAGAIN will be translated into
WSAEINTR/WSAEWOULDBLOCK throughout *all* the backend code. That seems
inappropriate for code that does not involve any socket stuff ... I think we
need a fix here.

> > (2) What's happened here?
> > It may come from PGSemaphoreReset(), and win32 semop() looks like this:
> >
> > ret = WaitForMultipleObjectsEx(2, wh, FALSE, (sops[0].sem_flg &
> > IPC_NOWAIT) ? 0 : INFINITE, TRUE);
> > ...
> > else if (ret == WAIT_OBJECT_0 + 1 || ret == WAIT_IO_COMPLETION)
> > {
> > pgwin32_dispatch_queued_signals();
> > errno = EINTR;
> > }
> > else if (ret == WAIT_TIMEOUT)
> > errno = EAGAIN;
> >
> > So it seems the EINTR is caused by an incoming signal, the EAGAIN is caused
> > by a TIMEOUT ... any ideas?
>
> I looked at the documentation for the function:
>
> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp
>
> and it isn't clear what return failure values it has. We certainly
> could loop on WSAEINTR. Can you test it?
>

Yeah, looking at other code that uses semop(), we could plug a loop into the
win32 semctl():

  /* Quickly lock/unlock the semaphore (if we can) */
+ do
+ {
+     errStatus = semop(semId, &sops, 1);
+ } while (errStatus < 0 && errno == EINTR);

- if (semop(semId, &sops, 1) < 0)
+ if (errStatus < 0)
      return -1;

But:
(1) The EINTR problem happens rather rarely, so testing it is difficult;
(2) I would rather not make the above change before we understand what
happened here, especially since an EAGAIN has also been reported.
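
To make the retry idea concrete, here is a self-contained sketch. The
sketch_semop() stand-in and its counter are hypothetical: it simply fails
with EINTR a fixed number of times and then succeeds, which is an assumption
made so the loop can be exercised without waiting for the rare real event.
It is not the win32 semop().

```c
#include <errno.h>

/* Hypothetical knob: how many times the fake semop() reports EINTR. */
static int sketch_interrupts_left = 0;

/* Hypothetical stand-in for semop(): interrupted N times, then succeeds. */
int sketch_semop(void)
{
    if (sketch_interrupts_left > 0)
    {
        sketch_interrupts_left--;
        errno = EINTR;
        return -1;
    }
    return 0;
}

/* Retry only on EINTR; any other failure is handed back to the caller. */
int sketch_semop_retry(void)
{
    int errStatus;

    do
    {
        errStatus = sketch_semop();
    } while (errStatus < 0 && errno == EINTR);

    return errStatus;
}
```

Note that a loop of this shape only absorbs EINTR. An EAGAIN result would
still propagate to the caller, so it would not explain or hide the second
report.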

Regards,
Qingqing


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed
Date: 2006-03-01 03:08:37
Message-ID: 200603010308.k2138bO09022@candle.pha.pa.us

Qingqing Zhou wrote:
>
> "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote
> > > In port/win32.h, we have
> > >
> > > #undef EAGAIN
> > > #undef EINTR
> > > #define EINTR WSAEINTR
> > > #define EAGAIN WSAEWOULDBLOCK
> > >
> > > What's the rationale of doing so?
> >
> > We did this so that our code could refer to EINTR/EAGAIN without
> > port-specific tests.
> >
>
> AFAICS, by doing so, the EINTR/EAGAIN will be translated into
> WSAINTR/WSAEWOULDBLOCK through *all* the backend code. That's seems not
> appropriate for the code not involving any socket stuff ... I think we need
> a fix here.

Uh, how do we handle it now? I thought we did just that.

> http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/waitformultipleobjectsex.asp
> >
> > and it isn't clear what return failure values it has. We certainly
> > could loop on WSAEINTR. Can you test it?
> >
>
> Yeah, looking at other code of using semop(), we could plug in a loop in the
> win32 semctl():
>
> /* Quickly lock/unlock the semaphore (if we can) */
> + do
> + {
> + errStatus = semop(semId, &sops, 1);
> + } while (errStatus < 0 && errno == EINTR);
>
> if (semop(semId, &sops, 1) < 0)
> return -1;
>
> But:
> (1) The EINTR problem happens rather rare, so testing it is difficult;
> (2) I would rather not doing the above changes before we understand what's
> happened here, especially when we have seen a EAGAIN reported here.

OK, so how do we find the answer?

--
Bruce Momjian http://candle.pha.pa.us
SRA OSS, Inc. http://www.sraoss.com

+ If your life is a hard drive, Christ can be your backup. +


From: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed
Date: 2006-03-01 03:20:52
Message-ID: Pine.LNX.4.58.0602282216010.22190@eon.cs

On Tue, 28 Feb 2006, Bruce Momjian wrote:

>
> Uh, how do we handle it now? I thought we did just that.
>
> OK, so how do we find the answer?
>

For both problems, I am uncertain (or I would have sent a patch already :-().
Calling for more artillery support here ...

Regards,
Qingqing


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: FATAL: semctl(1672698088, 12, SETVAL, 0) failed
Date: 2006-03-06 15:20:20
Message-ID: 200603061520.k26FKKo07143@candle.pha.pa.us


Thread added to TODO.detail for Win32:

o Check WSACancelBlockingCall() for interrupts (win32intr)

---------------------------------------------------------------------------

Qingqing Zhou wrote:
>
>
> On Tue, 28 Feb 2006, Bruce Momjian wrote:
>
> >
> > Uh, how do we handle it now? I thought we did just that.
> >
> > OK, so how do we find the answer?
> >
>
> For both problems, I am uncertain (or I've sent a patch already :-(). Call
> more artillery support here ...
>
> Regards,
> Qingqing

--
Bruce Momjian http://candle.pha.pa.us
SRA OSS, Inc. http://www.sraoss.com

+ If your life is a hard drive, Christ can be your backup. +