From: | "Magnus Hagander" <mha(at)sollentuna(dot)net> |
---|---|
To: | "Jan de Visser" <jdevisser(at)digitalfairway(dot)com>, <pgsql-performance(at)postgresql(dot)org> |
Subject: | Re: Hanging queries on dual CPU windows |
Date: | 2006-03-10 15:11:00 |
Message-ID: | 6BCB9D8A16AC4241919521715F4D8BCEA35109@algol.sollentuna.se |
Lists: | pgsql-performance |
> > > > I dunno
> > > >
> > > > > if you've got anything gdb-equivalent under Windows, but that's
> > > > > the first thing I'd be interested in ...
> > > >
> > > > Here ya go:
> > > >
> > > > http://www.devisser-siderius.com/stack1.jpg
> > > > http://www.devisser-siderius.com/stack2.jpg
> > > > http://www.devisser-siderius.com/stack3.jpg
> > > >
> > > > There are three threads in the process. I guess thread 1
> > > > (stack1.jpg) is the most interesting.
> > > >
> > > > I also noted that cranking up concurrency in my app
> > > > reproduces the problem in about 4 minutes ;-)
> >
> > Just reproduced again.
> >
> > > Actually, stack2 looks very interesting. Does it "stay stuck" in
> > > pg_queue_signal? That's really not supposed to happen.
> >
> > Yes it does.
>
> An update on that: There are actually *two* processes in this
> state, both hanging in pg_queue_signal. I've looked at the
> source of that, and the obvious candidate for hanging is
> EnterCriticalSection. I also found this:
>
> http://blogs.msdn.com/larryosterman/archive/2005/03/02/383685.aspx
>
> where they say:
>
> "
> In addition, for Windows 2003, SP1, the EnterCriticalSection
> API has a subtle change that's intended to resolve many of
> the lock convoy issues. Before
> Win2003 SP1, if 10 threads were blocked on
> EnterCriticalSection and all 10 threads had the same
> priority, then EnterCriticalSection would service those
> threads in a FIFO (first-in, first-out) basis. Starting in
> Windows 2003 SP1, the EnterCriticalSection will wake up a
> random thread from the waiting threads. If all the threads
> are doing the same thing (like a thread pool) this won't make
> much of a difference, but if the different threads are doing
> different work (like the critical section protecting a widely
> accessed object), this will go a long way towards removing
> lock convoy semantics.
> "
>
> Could it be they broke it when they did that????
In theory, yes, but it still seems a bit far-fetched :-(
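To make the FIFO-versus-random distinction in the quoted post concrete, here is a toy C model of a lock with a fixed set of blocked waiters. It is a sketch under stated assumptions, not how Windows implements critical sections: each release wakes exactly one waiter, the holder always releases, and a fixed-seed LCG stands in for whatever randomness Windows uses. `run_waiters`, `next_rand` and `MAX_WAITERS` are hypothetical names. Within this model, switching the wake-up policy only changes the order in which waiters acquire the lock; every waiter still gets through exactly once.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_WAITERS 16   /* hypothetical bound for the toy model */

/* Tiny fixed-seed LCG so the "random" policy is deterministic (illustration only). */
static unsigned next_rand(unsigned *seed)
{
    *seed = *seed * 1103515245u + 12345u;
    return (*seed >> 16) & 0x7fffu;
}

/*
 * Toy lock with n blocked waiters (ids 0..n-1, queued in arrival order).
 * Each release wakes one waiter, which takes the lock, works, and releases.
 * fifo == true models the pre-SP1 FIFO wake-up; otherwise a waiter is picked
 * at random. Acquisition order is written to order[] (needs room for n ids);
 * the return value is the number of acquisitions.
 */
static int run_waiters(int n, bool fifo, unsigned seed, int order[])
{
    int waiting[MAX_WAITERS];
    int count = n;
    int acquired = 0;

    assert(n >= 0 && n <= MAX_WAITERS);
    for (int i = 0; i < n; i++)
        waiting[i] = i;

    while (count > 0) {
        /* choose which blocked waiter this release wakes */
        int pick = fifo ? 0 : (int)(next_rand(&seed) % (unsigned)count);
        order[acquired++] = waiting[pick];

        /* drop the woken waiter, keeping the rest in arrival order */
        for (int j = pick; j < count - 1; j++)
            waiting[j] = waiting[j + 1];
        count--;
    }
    return acquired;
}
```

With `fifo = true`, `order[]` comes out as 0, 1, 2, ...; with `fifo = false` it is a seed-dependent permutation of the same ids. The model deliberately leaves out anything that could make a waiter block forever, so it cannot show whether the SP1 change is involved in the hang.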
If you have the environment to rebuild, can you try swapping the order of these two lines in backend/port/win32/signal.c:

    ResetEvent(pgwin32_signal_event);
    LeaveCriticalSection(&pg_signal_crit_sec);

If not, could you also try disabling the stats collector and see whether that makes a difference? (It could serve as a workaround.)
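Why the relative order of `ResetEvent` and `LeaveCriticalSection` matters can be seen in a single-threaded C model of one interleaving. This is a sketch under stated assumptions, not the signal.c code: the critical section is modeled as a bool, the event as a manual-reset flag, and `SignalState`, `queue_signal` and `lost_wakeup` are hypothetical names. If the event is reset only after the lock is released, a sender can queue a signal and set the event in between, and the late reset then leaves a signal pending with the event clear. The model does not reproduce the reported hang inside pg_queue_signal; it only shows what doing the reset under the lock protects against.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; not the real signal.c globals. */
typedef struct {
    unsigned queue;   /* bitmask of pending signals */
    bool event_set;   /* models a manual-reset event such as pgwin32_signal_event */
    bool lock_held;   /* models the critical section pg_signal_crit_sec */
} SignalState;

static void enter_lock(SignalState *s) { assert(!s->lock_held); s->lock_held = true; }
static void leave_lock(SignalState *s) { assert(s->lock_held); s->lock_held = false; }

/* Sender: mark the signal pending and set the event, both while holding the lock. */
static void queue_signal(SignalState *s, int signo)
{
    enter_lock(s);
    s->queue |= 1u << signo;
    s->event_set = true;
    leave_lock(s);
}

/*
 * Receiver drains the queue, then a sender runs at the first point it can.
 * reset_inside_lock == true:  reset the event, then leave the lock.
 * reset_inside_lock == false: leave the lock, then reset the event.
 * Returns true if a signal ends up pending while the event is clear,
 * i.e. a waiter blocked on the event would miss it.
 */
static bool lost_wakeup(bool reset_inside_lock)
{
    SignalState s = {0u, false, false};
    queue_signal(&s, 2);            /* arbitrary signal number */

    enter_lock(&s);
    s.queue = 0u;                   /* dispatch all pending signals */
    if (reset_inside_lock) {
        s.event_set = false;        /* reset while still holding the lock */
        leave_lock(&s);
        queue_signal(&s, 14);       /* sender can only get in after the release */
    } else {
        leave_lock(&s);
        queue_signal(&s, 14);       /* sender slips in before the reset */
        s.event_set = false;        /* reset outside the lock clobbers its event */
    }
    return s.queue != 0u && !s.event_set;
}
```

`lost_wakeup(true)` returns false and `lost_wakeup(false)` returns true. Swapping the lines is therefore best treated as a diagnostic experiment rather than a fix.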
//Magnus