Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

Lists: pgsql-bugspgsql-hackers
From: "Marcin Waldowski" <M(dot)Waldowski(at)sulechow(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-19 09:43:03
Message-ID: 200704190943.l3J9h38o003664@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


The following bug has been logged online:

Bug reference: 3242
Logged by: Marcin Waldowski
Email address: M(dot)Waldowski(at)sulechow(dot)net
PostgreSQL version: 8.2.3 and 8.2.1
Operating system: Windows XP SP2
Description: FATAL: could not unlock semaphore: error code 298
Details:

Hello.

After some time of performace test of our aplication (50 concurrent database
connections making lots of quick transactions with prepared statements) we
found problem in PostgreSQL log - "could not unlock semaphore: error code
298". After that connections were hanged blocked on update operations.

We are investigating problem now. What another information should we
provide?

Log from 8.2.1
2007-04-19 08:52:11 FATAL: could not unlock semaphore: error code 298
2007-04-19 08:52:11 STATEMENT: COMMIT
2007-04-19 08:52:11 WARNING: AbortTransaction while in COMMIT state

Log from 8.2.3
2007-04-19 10:56:13 FATAL: could not unlock semaphore: error code 298
2007-04-19 10:56:13 STATEMENT: update sometable set a = a + $1, b = b + $2,
c = c + $3 where id = $4

Regards, Marcin


From: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
To: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-19 12:50:19
Message-ID: 4627658B.4020409@sulechow.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

We've made some tests on Linux and it seems like it never happen on this
platform, but we use 8.1, not 8.2.

select version()
PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC
i686-pc-linux-gnu-gcc (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)

Regards, Marcin

Marcin Waldowski wrote:
> The following bug has been logged online:
>
> Bug reference: 3242
> Logged by: Marcin Waldowski
> Email address: M(dot)Waldowski(at)sulechow(dot)net
> PostgreSQL version: 8.2.3 and 8.2.1
> Operating system: Windows XP SP2
> Description: FATAL: could not unlock semaphore: error code 298
> Details:
>
> Hello.
>
> After some time of performace test of our aplication (50 concurrent database
> connections making lots of quick transactions with prepared statements) we
> found problem in PostgreSQL log - "could not unlock semaphore: error code
> 298". After that connections were hanged blocked on update operations.
>
> We are investigating problem now. What another information should we
> provide?
>
> Log from 8.2.1
> 2007-04-19 08:52:11 FATAL: could not unlock semaphore: error code 298
> 2007-04-19 08:52:11 STATEMENT: COMMIT
> 2007-04-19 08:52:11 WARNING: AbortTransaction while in COMMIT state
>
> Log from 8.2.3
> 2007-04-19 10:56:13 FATAL: could not unlock semaphore: error code 298
> 2007-04-19 10:56:13 STATEMENT: update sometable set a = a + $1, b = b + $2,
> c = c + $3 where id = $4
>
> Regards, Marcin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>
>
>


From: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 07:20:05
Message-ID: 462869A5.6080203@sulechow.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Hello.

I've made some analysis of PostgreSQL code. It looks like void
PGSemaphoreUnlock(PGSemaphore sema) from backend\port\win32_sema.c was
executed one time more than needed.

Error code 298 means "Too many posts were made to a semaphore":
http://msdn2.microsoft.com/en-us/library/ms681382.aspx (sorry for
posting microsoft links ;))

Below is an example when it happens:
http://www.tech-archive.net/Archive/Development/microsoft.public.win32.programmer.kernel/2004-02/0406.html

If I understand it correctly it means that function ReleaseSemaphore
(http://msdn2.microsoft.com/en-us/library/ms685071.aspx) which is
executed from PGSemaphoreUnlock, was executed one time more than needed.

I'm afraid than problem may lie above win32_sema.c :(

Regards, Marcin

Marcin Waldowski wrote:
> The following bug has been logged online:
>
> Bug reference: 3242
> Logged by: Marcin Waldowski
> Email address: M(dot)Waldowski(at)sulechow(dot)net
> PostgreSQL version: 8.2.3 and 8.2.1
> Operating system: Windows XP SP2
> Description: FATAL: could not unlock semaphore: error code 298
> Details:
>
> Hello.
>
> After some time of performace test of our aplication (50 concurrent database
> connections making lots of quick transactions with prepared statements) we
> found problem in PostgreSQL log - "could not unlock semaphore: error code
> 298". After that connections were hanged blocked on update operations.
>
> We are investigating problem now. What another information should we
> provide?
>
> Log from 8.2.1
> 2007-04-19 08:52:11 FATAL: could not unlock semaphore: error code 298
> 2007-04-19 08:52:11 STATEMENT: COMMIT
> 2007-04-19 08:52:11 WARNING: AbortTransaction while in COMMIT state
>
> Log from 8.2.3
> 2007-04-19 10:56:13 FATAL: could not unlock semaphore: error code 298
> 2007-04-19 10:56:13 STATEMENT: update sometable set a = a + $1, b = b + $2,
> c = c + $3 where id = $4
>
> Regards, Marcin
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>
>
>


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 07:51:10
Message-ID: 20070420075110.GA9033@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Fri, Apr 20, 2007 at 09:20:05AM +0200, Marcin Waldowski wrote:
> Hello.
>
> I've made some analysis of PostgreSQL code. It looks like void
> PGSemaphoreUnlock(PGSemaphore sema) from backend\port\win32_sema.c was
> executed one time more than needed.

Certainly looks that way.

I've looked at the code there, and can't find a clear problem. One way it
could happen is if the actual PGSemaphoreUnlock() is called once more than
needed.

CC:ing to hackers for this question:

Any chance that's happening? If this happens with SysV semaphores, will
they error out, or just say it was done and do nothing? (meaning should we
actuallyi be ignoring this error on windows?)

//Magnus


From: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 08:09:39
Message-ID: 46287543.6070501@sulechow.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander wrote:
> I've looked at the code there, and can't find a clear problem. One way it
> could happen is if the actual PGSemaphoreUnlock() is called once more than
> needed.
>
> CC:ing to hackers for this question:
>
> Any chance that's happening? If this happens with SysV semaphores, will
> they error out, or just say it was done and do nothing? (meaning should we
> actuallyi be ignoring this error on windows?)
>

Hmm, PGSemaphoreUnlock() actually ignore this error, only log that it
happens. As I mentioned previously after it happens others connections
were hung on update operations. What is strange we cannot reproduce this
problem on Linux. But we can do this on Windows. What another
information should we provide?

Regards, Marcin


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 08:12:14
Message-ID: 20070420081214.GB9033@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Fri, Apr 20, 2007 at 10:09:39AM +0200, Marcin Waldowski wrote:
> Magnus Hagander wrote:
> >I've looked at the code there, and can't find a clear problem. One way it
> >could happen is if the actual PGSemaphoreUnlock() is called once more than
> >needed.
> >
> >CC:ing to hackers for this question:
> >
> >Any chance that's happening? If this happens with SysV semaphores, will
> >they error out, or just say it was done and do nothing? (meaning should we
> >actuallyi be ignoring this error on windows?)
> >
>
> Hmm, PGSemaphoreUnlock() actually ignore this error, only log that it
> happens.

No. It does ereport(FATAL) which terminates the backend.

> As I mentioned previously after it happens others connections
> were hung on update operations. What is strange we cannot reproduce this
> problem on Linux. But we can do this on Windows. What another
> information should we provide?

Doesn't the postmaster restart all other backends due to the FATAL error?
Are you saying that you can no longer make new connections to the server,
or is the problem coming from that the aplpication doesn't like that the
server kicked out all connections?

If you can produce a self-contained test-case, that would certainly make
debugging a lot easier. So if it's possible - but I realise that might not
be easy for a problem like this :-)

//Magnus


From: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 08:41:59
Message-ID: 46287CD7.40706@sulechow.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander wrote:
>> Hmm, PGSemaphoreUnlock() actually ignore this error, only log that it
>> happens.
>>
>
> No. It does ereport(FATAL) which terminates the backend.
>
>

Oh, now I see, sorry :) Indeed on this one connection we receive
exception "FATAL: could not unlock semaphore", after that rollback
failed because of IO error during write to connection and that was
caused by "Connection reset by peer: socket write error".

>> As I mentioned previously after it happens others connections
>> were hung on update operations. What is strange we cannot reproduce this
>> problem on Linux. But we can do this on Windows. What another
>> information should we provide?
>>
>
> Doesn't the postmaster restart all other backends due to the FATAL error?
> Are you saying that you can no longer make new connections to the server,
> or is the problem coming from that the aplpication doesn't like that the
> server kicked out all connections?
>

No, we are sure that he didn't do that. As I mentioned above one
connection was terminated, but other ones were hung on update
operations. In this state it was possible to create new connection from
PGAdmin and do some select and update operations. In addition I can say
that we use only read-commited transactions and all operations are based
on prepared statemens which are reused.

> If you can produce a self-contained test-case, that would certainly make
> debugging a lot easier. So if it's possible - but I realise that might not
> be easy for a problem like this :-)
>

Our test case is our application, but unfortunately I cannot send it to
you. I will think about test case, but I need to find a time for writing
it :( I can reproduce error and provide all information you need from
PostgreSQL. Please instruct me what to do :)

Regards, Marcin


From: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
To: pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 10:59:51
Message-ID: 46289D27.80008@sulechow.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Marcin Waldowski wrote:
>>
>> Doesn't the postmaster restart all other backends due to the FATAL
>> error?
>> Are you saying that you can no longer make new connections to the
>> server,
>> or is the problem coming from that the aplpication doesn't like that the
>> server kicked out all connections?
>>
>
> No, we are sure that he didn't do that. As I mentioned above one
> connection was terminated, but other ones were hung on update
> operations. In this state it was possible to create new connection
> from PGAdmin and do some select and update operations. In addition I
> can say that we use only read-commited transactions and all operations
> are based on prepared statemens which are reused.

It may mean that PGSemaphoreUnlock(PGSemaphore sema) was executed for
unintended sema "object". That's why PGSemaphoreUnlock() for unintended
sema "object" failed and PGSemaphoreUnlock() for intended sema "object"
*never* happens. That would explain why other connections were hung on
update operations.

I think it sounds quite reasonable to be one of possibilities ;)

Regards, Marcin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 14:58:35
Message-ID: 29400.1177081115@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> On Fri, Apr 20, 2007 at 09:20:05AM +0200, Marcin Waldowski wrote:
>>> I've looked at the code there, and can't find a clear problem. One way it
>>> could happen is if the actual PGSemaphoreUnlock() is called once more than
>>> needed.

> CC:ing to hackers for this question:

> Any chance that's happening? If this happens with SysV semaphores, will
> they error out, or just say it was done and do nothing? (meaning should we
> actuallyi be ignoring this error on windows?)

How is it possible for a semaphore to be unlocked "too many times"?
It's supposed to be a running counter of the net V's minus P's, and
yes it had better be able to count higher than one. Have we chosen
the wrong Windows primitive to implement this?

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 16:46:09
Message-ID: 4628EE51.70205@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> On Fri, Apr 20, 2007 at 09:20:05AM +0200, Marcin Waldowski wrote:
>>>> I've looked at the code there, and can't find a clear problem. One way it
>>>> could happen is if the actual PGSemaphoreUnlock() is called once more than
>>>> needed.
>
>> CC:ing to hackers for this question:
>
>> Any chance that's happening? If this happens with SysV semaphores, will
>> they error out, or just say it was done and do nothing? (meaning should we
>> actuallyi be ignoring this error on windows?)
>
> How is it possible for a semaphore to be unlocked "too many times"?
> It's supposed to be a running counter of the net V's minus P's, and
> yes it had better be able to count higher than one. Have we chosen
> the wrong Windows primitive to implement this?

No, it's definitly the right primitive. But we're creating it with a max
count of 1. Not sure if that's right or not, too tired to think straight
about that right now, but here's a summary:

* Object is "signalled" when count > 0.

* We create with an initial count of 1.

* Calling WaitFor...() decreases the count. We call waitFor() in
PGsemaphoreLock(). If count reaches zero, waitfor() will block.

* Calling ReleaseSemaphore() increases the count. If count leaves zero
for 1, a blocking waitfor() is released. If count ends up >1 (or
whatever the limit is set to), we get said error. We call
ReleaseSemaphore() in PGSemaphoreUnlock().

So basically this says we've called PGSemaphoreUnlock() more times than
we've called PGSemaphoreLock().

Should we be creating it with a higher maximum value, and that's it? (it
sounds like it, but I'm not entirely sure)

//Magnus


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 16:55:00
Message-ID: 7149.1177088100@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Tom Lane wrote:
>> How is it possible for a semaphore to be unlocked "too many times"?
>> It's supposed to be a running counter of the net V's minus P's, and
>> yes it had better be able to count higher than one. Have we chosen
>> the wrong Windows primitive to implement this?

> No, it's definitly the right primitive. But we're creating it with a max
> count of 1.

That's definitely wrong. There are at least three reasons for a PG
process's semaphore to be signaled (heavyweight lock release, LWLock
release, pin count waiter), and at least two of them can occur
concurrently (eg, if deadlock checker fires, it will need to take
LWLocks, but there's nothing saying that the original lock won't be
released while it waits for an LWLock).

The effective max count on Unixen is typically in the thousands,
and I'd suggest the same on Windows unless there's some efficiency
reason to keep it small (in which case, maybe ten would do).

I'm astonished that we've not seen this reported before. Has the
Windows sema code always been like that?

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 17:01:06
Message-ID: 4628F1D2.6050302@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> Tom Lane wrote:
>>> How is it possible for a semaphore to be unlocked "too many times"?
>>> It's supposed to be a running counter of the net V's minus P's, and
>>> yes it had better be able to count higher than one. Have we chosen
>>> the wrong Windows primitive to implement this?
>
>> No, it's definitly the right primitive. But we're creating it with a max
>> count of 1.
>
> That's definitely wrong. There are at least three reasons for a PG
> process's semaphore to be signaled (heavyweight lock release, LWLock
> release, pin count waiter), and at least two of them can occur
> concurrently (eg, if deadlock checker fires, it will need to take
> LWLocks, but there's nothing saying that the original lock won't be
> released while it waits for an LWLock).
>
> The effective max count on Unixen is typically in the thousands,
> and I'd suggest the same on Windows unless there's some efficiency
> reason to keep it small (in which case, maybe ten would do).

AFAIK there's no problem with huge numbers (it takes an int32, and the
documentation says nothing about a limit - I'm sure it's just a 32-bit
counter in the kernel). I'll give that a shot.

Marcin - can you test a source patch? Or should I try to build you a
binary for testing? It'd be good if you can confirm that it works before
we commit anything, I think.

> I'm astonished that we've not seen this reported before. Has the
> Windows sema code always been like that?

It could be an 8.2 problem, actually, since we had new semaphore code
there. Looking at
http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/port/win32/Attic/sema.c?rev=1.13;content-type=text%2Fx-cvsweb-markup,
it looks like we may have used a *semaphore* with just one as top, but
then kept a counter in userspace as well... (Haven't looked through the
details of the code, but it looks that way from a casual view)

//Magnus


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-20 17:34:31
Message-ID: 4628F9A7.70200@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander wrote:
>>
>> The effective max count on Unixen is typically in the thousands,
>> and I'd suggest the same on Windows unless there's some efficiency
>> reason to keep it small (in which case, maybe ten would do).
>>
>
> AFAIK there's no problem with huge numbers (it takes an int32, and the
> documentation says nothing about a limit - I'm sure it's just a 32-bit
> counter in the kernel). I'll give that a shot.
>
>

Linux manpage suggests local max is 32767, so that's probably a good
value to try.

cheers

andrew


From: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-21 21:07:08
Message-ID: 462A7CFC.5070706@sulechow.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Magnus Hagander wrote:
> Tom Lane wrote:
>
>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>
>>> No, it's definitly the right primitive. But we're creating it with a max
>>> count of 1.
>>>
>> That's definitely wrong. There are at least three reasons for a PG
>> process's semaphore to be signaled (heavyweight lock release, LWLock
>> release, pin count waiter), and at least two of them can occur
>> concurrently (eg, if deadlock checker fires, it will need to take
>> LWLocks, but there's nothing saying that the original lock won't be
>> released while it waits for an LWLock).
>>
>> The effective max count on Unixen is typically in the thousands,
>> and I'd suggest the same on Windows unless there's some efficiency
>> reason to keep it small (in which case, maybe ten would do).
>>
>
> AFAIK there's no problem with huge numbers (it takes an int32, and the
> documentation says nothing about a limit - I'm sure it's just a 32-bit
> counter in the kernel). I'll give that a shot.
>

Magnus, Tom, thank you for finding what causes the problem :) I hope
that was also a reason why other transactions were hung (because that is
a prior, I think).

> Marcin - can you test a source patch? Or should I try to build you a
> binary for testing? It'd be good if you can confirm that it works before
> we commit anything, I think.
>

Of course I will check fix :) I will be able to do tests on monday. I
think source path should be enought, despite I've newer build PostgreSQL
on Windows (I definitely should try). If i have problems then I will ask
you for binary.

Regards, Marcin


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-bugs(at)postgresql(dot)org, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-22 18:52:18
Message-ID: 462BAEE2.2040904@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Marcin Waldowski wrote:
> Magnus Hagander wrote:
>> Tom Lane wrote:
>>
>>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>>
>>>> No, it's definitly the right primitive. But we're creating it with a
>>>> max
>>>> count of 1.
>>>>
>>> That's definitely wrong. There are at least three reasons for a PG
>>> process's semaphore to be signaled (heavyweight lock release, LWLock
>>> release, pin count waiter), and at least two of them can occur
>>> concurrently (eg, if deadlock checker fires, it will need to take
>>> LWLocks, but there's nothing saying that the original lock won't be
>>> released while it waits for an LWLock).
>>>
>>> The effective max count on Unixen is typically in the thousands,
>>> and I'd suggest the same on Windows unless there's some efficiency
>>> reason to keep it small (in which case, maybe ten would do).
>>>
>>
>> AFAIK there's no problem with huge numbers (it takes an int32, and the
>> documentation says nothing about a limit - I'm sure it's just a 32-bit
>> counter in the kernel). I'll give that a shot.
>>
>
> Magnus, Tom, thank you for finding what causes the problem :) I hope
> that was also a reason why other transactions were hung (because that is
> a prior, I think).
>
>> Marcin - can you test a source patch? Or should I try to build you a
>> binary for testing? It'd be good if you can confirm that it works before
>> we commit anything, I think.
>>
>
> Of course I will check fix :) I will be able to do tests on monday. I
> think source path should be enought, despite I've newer build PostgreSQL
> on Windows (I definitely should try). If i have problems then I will ask
> you for binary.

Great, please try the attached trivial patch.

//Magnus

Attachment Content-Type Size
win32_semaphore.patch text/plain 775 bytes

From: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
To: magnus(at)hagander(dot)net
Cc: pgsql-bugs(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-23 09:42:20
Message-ID: 462C7F7C.2020900@sulechow.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


> The following bug has been logged online:
>
> Bug reference: 3242
> Logged by: Marcin Waldowski
> Email address: M(dot)Waldowski(at)sulechow(dot)net
> PostgreSQL version: 8.2.3 and 8.2.1
> Operating system: Windows XP SP2
> Description: FATAL: could not unlock semaphore: error code 298
> Details:
>

Magnus,

I have applied your patch on 8.2.3 source and built it in mingw
environment (with default settings). After 3 hours of performance test
nothing happened, log is clear :) In previous test error occur always
within first hour (most often within first 20 minutes), so I can
initialy confirm that this fix works. We will also leave test running
for the night to be absolutely sure that it's ok. I will inform you
about the result.

Thanks again, Marcin


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-23 16:44:54
Message-ID: 462CE286.2000101@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Marcin Waldowski wrote:
>
>> The following bug has been logged online:
>>
>> Bug reference: 3242
>> Logged by: Marcin Waldowski
>> Email address: M(dot)Waldowski(at)sulechow(dot)net
>> PostgreSQL version: 8.2.3 and 8.2.1
>> Operating system: Windows XP SP2
>> Description: FATAL: could not unlock semaphore: error code 298
>> Details:
>
> Magnus,
>
> I have applied your patch on 8.2.3 source and built it in mingw
> environment (with default settings). After 3 hours of performance test
> nothing happened, log is clear :) In previous test error occur always
> within first hour (most often within first 20 minutes), so I can
> initialy confirm that this fix works. We will also leave test running
> for the night to be absolutely sure that it's ok. I will inform you
> about the result.

Great!

I'll await your confirmation tomorrow before applying the patch.

//Magnus


From: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
To: magnus(at)hagander(dot)net
Cc: pgsql-bugs(at)postgresql(dot)org, tgl(at)sss(dot)pgh(dot)pa(dot)us
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-24 06:11:36
Message-ID: 462D9F98.2030700@sulechow.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers


> Magnus,
>
> I have applied your patch on 8.2.3 source and built it in mingw
> environment (with default settings). After 3 hours of performance test
> nothing happened, log is clear :) In previous test error occur always
> within first hour (most often within first 20 minutes), so I can
> initialy confirm that this fix works. We will also leave test running
> for the night to be absolutely sure that it's ok. I will inform you
> about the result.
>
> Thanks again, Marcin
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
>

Ok, that's the end of the story :) My workmate has confirmed that during
night test PostgreSQL worked perfectly :)

Regards, Marcin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
Cc: magnus(at)hagander(dot)net, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-24 06:25:41
Message-ID: 24451.1177395941@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net> writes:
> Ok, that's the end of the story :) My workmate has confirmed that during
> night test PostgreSQL worked perfectly :)

That fix was Obviously Necessary even without testing -- you're being
too conservative about not applying it, Magnus.

regards, tom lane


From: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, magnus(at)hagander(dot)net
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-24 07:10:12
Message-ID: 462DAD54.2040008@sulechow.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Tom Lane wrote:
> Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net> writes:
>
>> Ok, that's the end of the story :) My workmate has confirmed that during
>> night test PostgreSQL worked perfectly :)
>>
>
> That fix was Obviously Necessary even without testing -- you're being
> too conservative about not applying it, Magnus.
>
> regards, tom lane
>

I was thinking that this fix will be included in 8.2.4 :) Now I know
that it's too late (8.2.4 is released). Now I trying to find what
conditions causes to realease new version of PostgreSQL (with fixes).
Does any document exist about that?

I just need to know when can I expect 8.2.5 with this fix
(approximately) :/ Thanks in advance.

Regards, Marcin


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>
Cc: magnus(at)hagander(dot)net, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-24 07:38:47
Message-ID: 24944.1177400327@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net> writes:
> I was thinking that this fix will be included in 8.2.4 :) Now I know
> that it's too late (8.2.4 is released). Now I trying to find what
> conditions causes to realease new version of PostgreSQL (with fixes).
> Does any document exist about that?

No, there's no fixed policy about that, because every time it's
different.

We don't normally like to release updates oftener than once every couple
of months --- the overhead of an update, for committers, packagers, and
users, is just too high to do it oftener than we must.

As far as this specific issue goes, I don't feel a need for a prompt
update, because (a) it's platform-specific, and (b) it's extremely
infrequent, as demonstrated by the lack of prior complaints.

But it's not unlikely that a new release might happen PDQ, once
we have figured out all the implications of this problem:
http://archives.postgresql.org/pgsql-bugs/2007-04/msg00129.php
That one's looking pretty dang ugly from here.

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net>, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #3242: FATAL: could not unlock semaphore: error code 298
Date: 2007-04-24 12:26:02
Message-ID: 20070424122602.GA15856@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs pgsql-hackers

On Tue, Apr 24, 2007 at 02:25:41AM -0400, Tom Lane wrote:
> Marcin Waldowski <M(dot)Waldowski(at)sulechow(dot)net> writes:
> > Ok, that's the end of the story :) My workmate has confirmed that during
> > night test PostgreSQL worked perfectly :)
>
> That fix was Obviously Necessary even without testing -- you're being
> too conservative about not applying it, Magnus.

Well, better safe than sorry :-)

Committed now. Thanks for testing, even though Tom claims it's unnecessary
:-P

//Magnus