Re: BUG #2371: database crashes with semctl failed

Lists: pgsql-bugs
From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #2371: database crashes with semctl failed error
Date: 2006-04-10 20:00:36
Message-ID: 443A7314020000BE00002BCF@gwmta.wicourts.gov

Hi all,

We were bitten by this same bug over the weekend (PG 8.1.3 / Windows
Server 2003). The exact error was:

FATAL: semctl(170688872, 6, SETVAL, 0) failed: A non-blocking socket
operation could not be completed immediately.

The start of the errors corresponded to a nightly "vacuum analyze" run
(both Saturday and Sunday). Things appeared to clear up after the
"vacuum analyze" completed.

One thing of note is that the semctl parameters were identical across
both nights (and a smaller incident Monday morning). The Monday morning
occurrence was also somewhat odd in that not much should have been going
on then. Also, three other servers which faced identical update/insert
transaction streams did not have any trouble. The select load might
have been higher on the server that failed though.

One question I had:

In src/backend/port/win32/sema.c, semctl() is implemented in terms of a
call to semop(). However, the man page for semctl() doesn't list EAGAIN
and EINTR as possible error returns, whereas for semop() it does. Is
that just a mistake in the man page or a problem with the Win32
emulation call?
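
To make the question concrete: semop() is documented as able to fail
with EINTR and EAGAIN, so a semctl() built on top of it can hand those
errors to callers that treat any semctl() failure as fatal. The following
is only a minimal sketch of retrying such transient errors. It is not the
code in sema.c and not the pending patch; retry_transient and flaky_op are
hypothetical names invented for this illustration, and flaky_op merely
stands in for the emulated call.

```c
#include <errno.h>

/* Hypothetical shape of a semop()-like operation: returns 0 on success,
 * or -1 with errno set on failure. */
typedef int (*sem_op_fn)(void *arg);

/*
 * Call op(arg), retrying while it fails with EINTR or EAGAIN, for at most
 * max_tries attempts. Returns 0 on success; returns -1 on a non-retryable
 * error or when the attempts run out (errno is left as set by the last
 * failed call).
 */
static int
retry_transient(sem_op_fn op, void *arg, int max_tries)
{
    int i;

    for (i = 0; i < max_tries; i++)
    {
        if (op(arg) == 0)
            return 0;
        if (errno != EINTR && errno != EAGAIN)
            return -1;      /* hard failure: report it to the caller */
    }
    return -1;              /* still failing after max_tries attempts */
}

/* Demonstration stand-in (an assumption, not real semaphore code): fails
 * with EAGAIN a fixed number of times, then succeeds. */
static int
flaky_op(void *arg)
{
    int *failures_left = (int *) arg;

    if (*failures_left > 0)
    {
        (*failures_left)--;
        errno = EAGAIN;
        return -1;
    }
    return 0;
}
```

With a counter of 2 and max_tries of 5, retry_transient(flaky_op, &counter, 5)
succeeds on the third attempt. Whether retrying is the right fix for the
Win32 emulation is exactly the open question; the sketch only shows what
treating these two errors as transient would look like.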

(See also
http://archives.postgresql.org/pgsql-bugs/2006-02/msg00233.php )

I'm afraid we're in the same category as everyone else with no good way
to reproduce the bug, but is there anything else we could do if this
happens again?

Pete


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #2371: database crashes with semctl failed error
Date: 2006-04-25 05:01:33
Message-ID: e2kam4$1ppc$1@news.hub.org


"Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov> wrote
>
> I'm afraid we're in the same category as everyone else with no good way
> to reproduce the bug, but is there anything else we could do if this
> happens again?
>

There is a "Win32 semaphore patch" in the patch list, but we lack
evidence that it helps. If you could apply it to your *test* server
(any 8.0.* or 8.1.* release is fine), it would be very helpful to see
the result.

Regards,
Qingqing


From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #2371: database crashes with semctl failed
Date: 2006-04-25 14:46:39
Message-ID: 444DEFFF020000BE0000325F@gwmta.wicourts.gov

Sure.

I should note that we're moving to Linux for our production servers so
our interest in the Windows port is waning (at least for the time
being). In particular, the stuck WAL segment rename problem has
occasionally been rather a pain in the neck.

As long as we still have Windows test servers around though, it's easy
enough to apply a patch and load up the database to see if anything
interesting happens.

Pete

>>> "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> 04/25/06 7:01 am >>>
There is a "Win32 semaphore patch" in the patch list, but we lack
evidence that it helps. If you could apply it to your *test* server
(any 8.0.* or 8.1.* release is fine), it would be very helpful to see
the result.


From: "Peter Brant" <Peter(dot)Brant(at)wicourts(dot)gov>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #2371: database crashes with semctl failed
Date: 2006-05-01 16:50:50
Message-ID: 4455F61A020000BE000034CB@gwmta.wicourts.gov

With the patch applied, I let an in-house stress test run for several
hours and it completed without incident. I also did two pgbench runs
with 50 connections x 1000 transactions and one run with 50 connections
x 5000 transactions. All completed successfully. (The test server is a
dual Xeon with HyperThreading enabled, Windows Server 2003, PG 8.1.3.)

Pete

>>> "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> 04/25/06 7:01 am >>>
There is a "Win32 semaphore patch" in the patch list, but we lack
evidence that it helps. If you could apply it to your *test* server
(any 8.0.* or 8.1.* release is fine), it would be very helpful to see
the result.