Re: Hanging queries on dual CPU windows

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Jan de Visser" <jdevisser(at)digitalfairway(dot)com>, <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Hanging queries on dual CPU windows
Date: 2006-03-10 15:11:00
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA35109@algol.sollentuna.se
Lists: pgsql-performance

> > > > > I dunno if you've got anything gdb-equivalent under Windows, but
> > > > > that's the first thing I'd be interested in ...
> > > >
> > > > Here ya go:
> > > >
> > > > http://www.devisser-siderius.com/stack1.jpg
> > > > http://www.devisser-siderius.com/stack2.jpg
> > > > http://www.devisser-siderius.com/stack3.jpg
> > > >
> > > > There are three threads in the process. I guess thread 1
> > > > (stack1.jpg) is the most interesting.
> > > >
> > > > I also noted that cranking up concurrency in my app reproduces the
> > > > problem in about 4 minutes ;-)
> >
> > Just reproduced again.
> >
> > > Actually, stack2 looks very interesting. Does it "stay stuck" in
> > > pg_queue_signal? That's really not supposed to happen.
> >
> > Yes it does.
>
> An update on that: There are actually *two* processes in this
> state, both hanging in pg_queue_signal. I've looked at the
> source of that, and the obvious candidate for hanging is
> EnterCriticalSection. I also found this:
>
> http://blogs.msdn.com/larryosterman/archive/2005/03/02/383685.aspx
>
> where they say:
>
> "
> In addition, for Windows 2003, SP1, the EnterCriticalSection
> API has a subtle change that's intended to resolve many of
> the lock convoy issues. Before Win2003 SP1, if 10 threads
> were blocked on EnterCriticalSection and all 10 threads had
> the same priority, then EnterCriticalSection would service
> those threads in a FIFO (first-in, first-out) basis. Starting in
> Windows 2003 SP1, the EnterCriticalSection will wake up a
> random thread from the waiting threads. If all the threads
> are doing the same thing (like a thread pool) this won't make
> much of a difference, but if the different threads are doing
> different work (like the critical section protecting a widely
> accessed object), this will go a long way towards removing
> lock convoy semantics.
> "
>
> Could it be they broke it when they did that????

In theory, yes, but it still seems a bit far-fetched :-(

If you have the environment to rebuild, can you try swapping the order of these two lines:
ResetEvent(pgwin32_signal_event);
LeaveCriticalSection(&pg_signal_crit_sec);

in backend/port/win32/signal.c?
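
For reference, a rough sketch of the two orderings. This is not the actual contents of signal.c: every sketch_* name and the swap_order flag are made up for illustration, and I'm assuming the two lines close the section that drains the queued signals. The non-Windows stand-ins only record state in a single thread so the example builds elsewhere; they are not real locks or events.

```c
#include <stdbool.h>

#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins so the sketch compiles off Windows. They only record state in
 * a single thread; they are NOT real locks or events. */
typedef struct { int held; } CRITICAL_SECTION;
typedef struct { bool signaled; } sketch_event;
typedef sketch_event *HANDLE;
static void InitializeCriticalSection(CRITICAL_SECTION *cs) { cs->held = 0; }
static void EnterCriticalSection(CRITICAL_SECTION *cs) { cs->held++; }
static void LeaveCriticalSection(CRITICAL_SECTION *cs) { cs->held--; }
static void SetEvent(HANDLE ev) { ev->signaled = true; }
static void ResetEvent(HANDLE ev) { ev->signaled = false; }
#endif

static CRITICAL_SECTION sketch_crit_sec;  /* plays the role of pg_signal_crit_sec */
static HANDLE sketch_signal_event;        /* plays the role of pgwin32_signal_event */
static unsigned sketch_queue;             /* hypothetical bitmask of pending signals */

static void sketch_init(void)
{
    InitializeCriticalSection(&sketch_crit_sec);
#ifdef _WIN32
    /* Manual-reset event, initially not signaled. */
    sketch_signal_event = CreateEvent(NULL, TRUE, FALSE, NULL);
#else
    static sketch_event ev;
    ev.signaled = false;
    sketch_signal_event = &ev;
#endif
    sketch_queue = 0;
}

static bool sketch_event_is_set(void)
{
#ifdef _WIN32
    /* A zero timeout polls the event without blocking. */
    return WaitForSingleObject(sketch_signal_event, 0) == WAIT_OBJECT_0;
#else
    return sketch_signal_event->signaled;
#endif
}

/* Producer side: record a pending signal under the lock, then set the event. */
static void sketch_queue_signal(int signum)
{
    EnterCriticalSection(&sketch_crit_sec);
    sketch_queue |= 1u << signum;
    LeaveCriticalSection(&sketch_crit_sec);
    SetEvent(sketch_signal_event);
}

/* Consumer side: drain the pending bitmask and return it.
 * swap_order == false: ResetEvent, then LeaveCriticalSection (order as quoted).
 * swap_order == true:  LeaveCriticalSection, then ResetEvent (the experiment). */
static unsigned sketch_dispatch(bool swap_order)
{
    unsigned drained;

    EnterCriticalSection(&sketch_crit_sec);
    drained = sketch_queue;
    sketch_queue = 0;
    if (swap_order)
    {
        LeaveCriticalSection(&sketch_crit_sec);
        ResetEvent(sketch_signal_event);
    }
    else
    {
        ResetEvent(sketch_signal_event);
        LeaveCriticalSection(&sketch_crit_sec);
    }
    return drained;
}
```

Calling sketch_init(), then sketch_queue_signal(3), then sketch_dispatch(false) or sketch_dispatch(true) walks through either order. Single-threaded, both behave the same; the difference only shows with real threads. In the swapped order another thread can queue a signal and set the event after the lock is released but before the reset, so that wakeup could be lost. That makes the swap a diagnostic experiment rather than a fix.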

And if not, can you also try disabling the stats collector and see if that makes a difference? (Could be a workaround.)

//Magnus
