Win XP SP2 SMP locking (8.1.4)

Lists: pgsql-hackers
From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Win XP SP2 SMP locking (8.1.4)
Date: 2006-10-05 17:40:36
Message-ID: Pine.GSO.4.63.0610052111570.18168@ra.sai.msu.su

Hi there,

I'm looking into some strange locking that happens on a WinXP SP2 SMP
machine running 8.1.4 with stats_row_level=on. This is the only
combination (number of CPUs and stats_row_level) that has the problem:
SMP + stats_row_level.

The same test runs fine with one CPU (machine restarted with /numproc=1)
regardless of the stats_row_level setting.

The customer's application loads data into the database and sometimes the
process stops: no CPU, no I/O activity. pgAdmin shows the current query is
'COMMIT'. I tried to attach gdb to the postgres and client processes, but
the backtraces look useless (see below). Running VACUUM ANALYZE on this
database in a separate process causes the loading process to continue!
Weird.

Interestingly, there is no problem with 8.2beta1 in any
combination! Any idea which changes from 8.1.4 to 8.2beta1 could
affect the problem?
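
The reported outcomes can be written down as a small test matrix, which makes the single failing combination explicit. This is only a sketch of what the message states: the function names are hypothetical, and a CPU count of 2 stands in for "SMP" because the actual number of CPUs is not given.

```python
from itertools import product


def outcome(version: str, cpus: int, stats_row_level: bool) -> str:
    """Return the outcome reported in the message for one configuration.

    "hang" means the loading process stalled with COMMIT as the current query.
    """
    if version == "8.1.4" and cpus > 1 and stats_row_level:
        return "hang"
    return "ok"


def failing_combinations():
    """List every (version, cpus, stats_row_level) combination that hangs."""
    versions = ["8.1.4", "8.2beta1"]
    cpu_counts = [1, 2]  # assumption: 2 represents the SMP case
    return [
        (version, cpus, stats)
        for version, cpus, stats in product(versions, cpu_counts, [False, True])
        if outcome(version, cpus, stats) == "hang"
    ]


if __name__ == "__main__":
    for combo in failing_combinations():
        print(combo)
```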

postgres.exe:

(gdb) bt
#0 0x7c901231 in ntdll!DbgUiConnectToDbg () from
C:\WINDOWS\system32\ntdll.dll
#1 0x7c9507a8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#2 0x00000005 in ?? ()
#3 0x00000004 in ?? ()
#4 0x00000001 in ?? ()
#5 0x019effd0 in ?? ()
#6 0xf784e548 in ?? ()
#7 0xffffffff in ?? ()
#8 0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll
#9 0x7c9507c8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#10 0x00000000 in ?? () from
#11 0x00000000 in ?? () from
#12 0x00000000 in ?? () from
#13 0x00000000 in ?? () from
(gdb) Cannot access memory at address 0x19f0000

application:
(gdb) bt
#0 0x7c901231 in ntdll!DbgUiConnectToDbg () from
C:\WINDOWS\system32\ntdll.dll
#1 0x7c9507a8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#2 0x00000005 in ?? ()
#3 0x00000004 in ?? ()
#4 0x00000001 in ?? ()
#5 0x0196ffd0 in ?? ()
#6 0x7c97c0d8 in ntdll!NtAccessCheckByTypeResultListAndAuditAlarm ()
#7 0xffffffff in ?? ()
#8 0x7c90ee18 in strchr () from C:\WINDOWS\system32\ntdll.dll
#9 0x7c9507c8 in ntdll!KiIntSystemCall () from
C:\WINDOWS\system32\ntdll.dll
#10 0x00000000 in ?? () from
#11 0x00000000 in ?? () from
#12 0x00000000 in ?? () from
#13 0x00000000 in ?? () from
(gdb) Cannot access memory at address 0x1970000

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Win XP SP2 SMP locking (8.1.4)
Date: 2006-10-05 17:44:41
Message-ID: 45254489.9030304@commandprompt.com

>
> Interestingly, there is no problem with 8.2beta1 in any
> combination! Any idea which changes from 8.1.4 to 8.2beta1 could
> affect the problem?

What do you mean by locking? Do you mean the PostgreSQL process locks up?
E.g., can you still connect to PostgreSQL from another connection? If not,
is there an error?
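
A minimal sketch of that connectivity check, assuming the server listens on localhost at PostgreSQL's default port 5432 (the host, timeout, and function name are assumptions, not details from this thread). It only shows whether a TCP connection is accepted; it does not log in or run a query, so a successful result says the server is still listening, not that a full session works.

```python
import socket


def can_connect(host: str = "127.0.0.1", port: int = 5432,
                timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port is accepted in time."""
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print("server accepts connections:", can_connect())
```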

Joshua D. Drake

--

=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Win XP SP2 SMP locking (8.1.4)
Date: 2006-10-05 17:53:45
Message-ID: Pine.GSO.4.63.0610052149280.18168@ra.sai.msu.su

On Thu, 5 Oct 2006, Joshua D. Drake wrote:

>>
>> Interestingly, there is no problem with 8.2beta1 in any
>> combination! Any idea which changes from 8.1.4 to 8.2beta1 could
>> affect the problem?
>
> What do you mean by locking? Do you mean the PostgreSQL process locks up?
> E.g., can you still connect to PostgreSQL from another connection? If not,
> is there an error?

It looks like the application is waiting for something from PostgreSQL,
but PostgreSQL thinks it has done its job. VACUUM ANALYZE gets things
moving. I can connect to PostgreSQL from another connection; for example,
pgAdmin still works with this database.

Regards,
Oleg


From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>, "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Win XP SP2 SMP locking (8.1.4)
Date: 2006-10-05 17:59:06
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA0FC23@algol.sollentuna.se

> Hi there,
>
> I'm looking into some strange locking that happens on a WinXP SP2
> SMP machine running 8.1.4 with stats_row_level=on. This is
> the only combination (number of CPUs and stats_row_level) that
> has the problem: SMP + stats_row_level.
>
> The same test runs fine with one CPU (machine restarted with
> /numproc=1) regardless of the stats_row_level setting.
>
> The customer's application loads data into the database and
> sometimes the process stops: no CPU, no I/O activity. pgAdmin
> shows the current query is 'COMMIT'.
> I tried to attach gdb to the postgres and client processes, but
> the backtraces look useless (see below). Running VACUUM ANALYZE
> on this database in a separate process causes the loading
> process to continue! Weird.
>
> Interestingly, there is no problem with 8.2beta1 in any
> combination! Any idea which changes from 8.1.4 to 8.2beta1
> could affect the problem?

There is a new implementation of semaphores in 8.2. That could possibly
be it.
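
As background on why a semaphore implementation is a plausible suspect for this kind of stall: a waiter sleeping on a semaphore uses no CPU and no I/O until something releases it, so a wakeup that never arrives looks like an idle, stuck process. The sketch below illustrates that behavior with Python's threading.Semaphore as a stand-in. It is not PostgreSQL's Win32 semaphore code, the function names are hypothetical, and nothing here establishes that a missed wakeup is the actual cause.

```python
import threading


def wait_for_wakeup(sem: threading.Semaphore, timeout: float) -> bool:
    """Sleep on sem until it is released; False means no wakeup arrived."""
    return sem.acquire(timeout=timeout)


def demo(send_wakeup: bool, timeout: float = 0.5) -> bool:
    """Run one waiter, optionally with a second thread that wakes it up."""
    sem = threading.Semaphore(0)  # count starts at 0, so acquire must wait
    if send_wakeup:
        # Another thread signals the waiter shortly after it goes to sleep.
        threading.Timer(0.05, sem.release).start()
    return wait_for_wakeup(sem, timeout)


if __name__ == "__main__":
    print("wakeup delivered:", demo(send_wakeup=True))
    print("wakeup missing:  ", demo(send_wakeup=False))
```

The timeout exists only so the demonstration terminates; a waiter without one would sleep indefinitely in the second case.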

//Magnus


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Win XP SP2 SMP locking (8.1.4)
Date: 2006-10-06 09:15:11
Message-ID: Pine.GSO.4.63.0610061313280.18168@ra.sai.msu.su

On Thu, 5 Oct 2006, Magnus Hagander wrote:

>> Hi there,
>>
>> I'm looking into some strange locking that happens on a WinXP SP2
>> SMP machine running 8.1.4 with stats_row_level=on. This is
>> the only combination (number of CPUs and stats_row_level) that
>> has the problem: SMP + stats_row_level.
>>
>> The same test runs fine with one CPU (machine restarted with
>> /numproc=1) regardless of the stats_row_level setting.
>>
>> The customer's application loads data into the database and
>> sometimes the process stops: no CPU, no I/O activity. pgAdmin
>> shows the current query is 'COMMIT'.
>> I tried to attach gdb to the postgres and client processes, but
>> the backtraces look useless (see below). Running VACUUM ANALYZE
>> on this database in a separate process causes the loading
>> process to continue! Weird.
>>
>> Interestingly, there is no problem with 8.2beta1 in any
>> combination! Any idea which changes from 8.1.4 to 8.2beta1
>> could affect the problem?
>
> There is a new implementation of semaphores in 8.2. That could possibly
> be it.

I backported it to REL8_1_STABLE, but it didn't help. Any other ideas on
what to do, or how to debug the situation?

Regards,
Oleg


From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
Cc: "Pgsql Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Win XP SP2 SMP locking (8.1.4)
Date: 2006-10-06 09:59:52
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA357A2@algol.sollentuna.se

> >> I'm looking into some strange locking that happens on a WinXP SP2
> >> SMP machine running 8.1.4 with stats_row_level=on. This is the
> >> only combination (number of CPUs and stats_row_level) that has
> >> the problem: SMP + stats_row_level.
> >>
> >> The same test runs fine with one CPU (machine restarted with
> >> /numproc=1) regardless of the stats_row_level setting.
> >>
> >> The customer's application loads data into the database and
> >> sometimes the process stops: no CPU, no I/O activity. pgAdmin
> >> shows the current query is 'COMMIT'.
> >> I tried to attach gdb to the postgres and client processes, but
> >> the backtraces look useless (see below). Running VACUUM ANALYZE
> >> on this database in a separate process causes the loading
> >> process to continue! Weird.
> >>
> >> Interestingly, there is no problem with 8.2beta1 in any
> >> combination! Any idea which changes from 8.1.4 to 8.2beta1
> >> could affect the problem?
> >
> > There is a new implementation of semaphores in 8.2. That could
> > possibly be it.
>
> I backported it to REL8_1_STABLE, but it didn't help. Any other
> ideas on what to do, or how to debug the situation?

Unfortunately, debugger support for MinGW is absolutely horrible. But
you can try Process Explorer from www.sysinternals.com and see if it
gives you a decent backtrace. Sometimes it works when others don't.
Either that, or try the Visual Studio or Windows debuggers; they can
usually at least show you whether it's stuck waiting on something in the
kernel.

//Magnus