Re: [PERFORM] Hanging queries on dual CPU windows

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Hanging queries on dual CPU windows
Date: 2006-03-13 08:46:13
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA3510E@algol.sollentuna.se
Lists: pgsql-hackers

> > Ok, I've coded up a patch that changes the code to use a
> > mutex instead.
>
> Are we asserting the problem is caused by the spinlock random
> wake-up order?

Not asserting, more making a wild guess. As I said, I no longer really
believe in it, but since the patch was already coded up, it's worth a
try.

> I am not sure why this would fix the problem. If my memory
> serves, a critical section might be a problem if one process
> aborts unexpectedly while it is inside. Other waiting processes
> never get a chance to enter it (nor a chance to handle
> SIGQUIT), so this patch may solve this.

A critical section only exists within a single process, so that really
doesn't apply. And if a thread crashes, the whole process exits.
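To make the process-local point concrete, here is a POSIX analogue (an assumption: it uses a default pthread mutex, not the Win32 code under discussion). A default pthread mutex is, like a Win32 critical section, private to one process: after fork() each process has its own copy. A child that dies while holding its copy cannot block the parent. The function name is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* A default pthread mutex is process-private, comparable to a Win32
 * CRITICAL_SECTION; a process-shared lock would need explicit setup. */
static pthread_mutex_t private_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical demo: a child process locks its copy of the mutex and
 * exits without unlocking it. Returns the parent's trylock result
 * (0 means the parent was not blocked), or -1 on fork/wait failure. */
int parent_trylock_after_child_dies(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: take the lock and "crash" while still holding it. */
        pthread_mutex_lock(&private_lock);
        _exit(1);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    assert(WIFEXITED(status));

    /* Parent: its copy of the mutex was never touched by the child. */
    int rc = pthread_mutex_trylock(&private_lock);
    if (rc == 0)
        pthread_mutex_unlock(&private_lock);
    return rc;
}
```

Calling parent_trylock_after_child_dies() returns 0: the lock the child died holding only ever existed inside the child.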

> There is another suspect in
> http://www.devisser-siderius.com/stack1.jpg,
> i.e., process 3 does shmctl. I once filed a bug about a server
> core dump on win32 reporting WSAEWOULDBLOCK
> (http://archives.postgresql.org/pgsql-bugs/2006-02/msg00185.php).
> AFAICS, it is actually a mistranslated EINTR. There seems to be
> some relation between these issues, but I haven't come up with a
> complete theory of it.

There could well be, except that the link you sent points to a thread
stuck in pgwin32_waitforsinglesocket() inside pgwin32_send(). That is
where I believe the problem is now.

After examining it some more, I'm less than trusting of the function
names in the stack trace. I suspect Process Explorer can only see
non-static functions, and that "pg_queue_signal+0x120" actually points
into a different function. (Really, pg_queue_signal cannot possibly be
0x120 bytes of machine code.) I bet it's just in pg_signal_thread(),
which is a perfectly normal place to block. It also matches what I see
on a completely fresh backend, which shows the same
pg_queue_signal+0x120.
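The reasoning about "pg_queue_signal+0x120" can be illustrated with a toy resolver that, like a typical symbolizer, attributes an address to the nearest preceding known symbol plus an offset. If a static function is missing from the table (the guess above about Process Explorer, not confirmed), its code is reported as a large offset into whatever visible function precedes it. The symbol table, addresses and helper names below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One entry in a hypothetical symbol table: name and start address. */
struct symbol {
    const char *name;
    unsigned long addr;
};

/* Attribute addr to the last symbol starting at or before it and
 * format "name+0xOFF" into out. The table must be sorted by address.
 * Returns 0 on success, -1 if no symbol precedes addr. */
int resolve(const struct symbol *tab, size_t n, unsigned long addr,
            char *out, size_t outlen)
{
    const struct symbol *best = NULL;
    for (size_t i = 0; i < n; i++) {
        assert(i == 0 || tab[i - 1].addr <= tab[i].addr);
        if (tab[i].addr <= addr)
            best = &tab[i];
        else
            break;
    }
    if (best == NULL)
        return -1;
    snprintf(out, outlen, "%s+0x%lx", best->name, addr - best->addr);
    return 0;
}

/* Convenience check: does addr resolve to exactly `expected`? */
int resolves_to(const struct symbol *tab, size_t n, unsigned long addr,
                const char *expected)
{
    char buf[128];
    return resolve(tab, n, addr, buf, sizeof buf) == 0 &&
           strcmp(buf, expected) == 0;
}
```

With a table holding only pg_queue_signal at 0x1000 and pgwin32_send at 0x2000, address 0x1120 resolves to pg_queue_signal+0x120. Adding a static pg_signal_thread at a hypothetical 0x1040 makes the same address resolve to pg_signal_thread+0xe0.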

A good thing to test would be to rebuild signal.c and socket.c without
any functions declared as static and see if the picture changes. (If
nothing else, it would confirm this behaviour in Process Explorer.)
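One way to run that experiment without hand-editing every declaration is a build-time switch on function definitions only. The macro names STATIC_FN and DEBUG_EXPORT_STATICS are hypothetical (PostgreSQL does not define them), and whether the exported names then appear in Process Explorer is exactly what the test would check. A blanket `#define static` is riskier: it would also strip `static` from file-scope and function-local variables and change their semantics.

```c
#include <assert.h>

/* Hypothetical switch: compile with -DDEBUG_EXPORT_STATICS to give
 * these functions external linkage so they appear as named symbols;
 * otherwise they stay static. Variables keep their own `static`. */
#ifdef DEBUG_EXPORT_STATICS
#define STATIC_FN
#else
#define STATIC_FN static
#endif

/* Stand-in for a formerly static function such as a signal-thread
 * helper; the body is a placeholder that consumes one pending item. */
STATIC_FN int signal_thread_step(int pending)
{
    assert(pending >= 0);
    return pending > 0 ? pending - 1 : 0;
}
```

Dropping `static` can also cause link-time name clashes between files, so this is a debugging build, not something to ship.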

Regards,
Magnus
