Re: [PERFORM] Hanging queries on dual CPU windows

Lists: pgsql-hackers
From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Jan de Visser" <jdevisser(at)digitalfairway(dot)com>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Hanging queries on dual CPU windows
Date: 2006-03-12 14:40:19
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA0F856@algol.sollentuna.se

> > If so,
> > we could perhaps recode that part using a Mutex instead of a critical
> > section - since it's not a performance critical path, the difference
> > shouldn't be large. If I code up a patch for that, can you re-apply
> > SP1 and test it? Or is this a production system you can't really touch?
>
> I can do whatever the hell I want with it, so if you could
> cook up a patch that would be great.
>
> As a BTW: I reinstalled SP1 and turned stats collection off.
> That also seems to work, but is not really a solution since
> we want to use autovacuuming.

Ok, I've coded up a patch that changes the code to use a mutex instead.
Patch attached. You can get a precompiled postgres.exe at
http://www.hagander.net/download/postgres.exe_mutex.zip. You need to
copy this file to postmaster.exe as well - they are supposed to be
identical. It's based off a snapshot of 8.1-stable.

Looking at my system while testing this, it still looked like it was
hanging at that place in the code, even though I saw no problems. So I'm
not convinced we can actually trust the stacktrace from the non-default
threads. So I don't think this patch will actually work :-( But it's
worth a try.

(Oh, and I moved the thread over to -hackers, seems more correct at this
time)

//Magnus

Attachment    Content-Type                Size
mutex.patch   application/octet-stream    4.4 KB

From: Jan de Visser <jdevisser(at)digitalfairway(dot)com>
To: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Hanging queries on dual CPU windows
Date: 2006-03-12 18:10:59
Message-ID: 200603121310.59861.jdevisser@digitalfairway.com

On Sunday 12 March 2006 09:40, Magnus Hagander wrote:
> > > If so,
> > > we could perhaps recode that part using a Mutex instead of a critical
> > > section - since it's not a performance critical path, the difference
> > > shouldn't be large. If I code up a patch for that, can you re-apply
> > > SP1 and test it? Or is this a production system you can't really touch?
> >
> > I can do whatever the hell I want with it, so if you could
> > cook up a patch that would be great.
> >
> > As a BTW: I reinstalled SP1 and turned stats collection off.
> > That also seems to work, but is not really a solution since
> > we want to use autovacuuming.
>
> Ok, I've coded up a patch that changes the code to use a mutex instead.
> Patch attached. You can get a precompiled postgres.exe at
> http://www.hagander.net/download/postgres.exe_mutex.zip. You need to
> copy this file to postmaster.exe as well - they are supposed to be
> identical. It's based off a snapshot of 8.1-stable.
>
> Looking at my system while testing this, it still looked like it was
> hanging at that place in the code, even though I saw no problems. So I'm
> not convinced we can actually trust the stacktrace from the non-default
> threads. So I don't think this patch will actually work :-( But it's
> worth a try.
>
> (Oh, and I moved the thread over to -hackers, seems more correct at this
> time)

Thanks Magnus,

I'll try tomorrow. Will let you know ASAP (8:30 EST I guess :).

If this doesn't work, how do we progress?

>
> //Magnus

jan

--
--------------------------------------------------------------
Jan de Visser                     jdevisser(at)digitalfairway(dot)com

                Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------


From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hanging queries on dual CPU windows
Date: 2006-03-13 05:38:58
Message-ID: dv30nr$1aaf$1@news.hub.org

"Magnus Hagander" <mha(at)sollentuna(dot)net> wrote:
> Ok, I've coded up a patch that changes the code to use a mutex instead.

Are we asserting that the problem is caused by the random wake-up order of
the spinlock? I am not sure why this would fix the problem. If my memory
serves, a critical section can be a problem if a process aborts unexpectedly
while it is inside: other waiting processes never get a chance to enter it
(and also have no chance to handle SIGQUIT) -- so this patch may solve that.
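To make that failure mode concrete: a lock that only ever blocks its waiters strands them for good if the owner dies while holding it, whereas an owner-aware lock reports the death to the next waiter. A Windows mutex does the latter (a wait on it returns WAIT_ABANDONED). The sketch below uses the closest portable analog, a POSIX robust mutex, where the next `pthread_mutex_lock` returns `EOWNERDEAD`. It is an illustration under that substitution only, not the code in mutex.patch; a thread that exits while holding the lock stands in for the aborted process, and `demo_abandoned_lock` and `die_holding_lock` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t shared_lock;

/* Stand-in for a process that aborts inside the protected region:
 * it takes the lock and exits without ever releasing it. */
static void *die_holding_lock(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&shared_lock);
    return NULL;                        /* exits while still owning the lock */
}

/* Hypothetical helper: returns 1 if the waiter was told the owner died and
 * recovered the lock, 0 if it got the lock normally, -1 on any other error. */
static int demo_abandoned_lock(void)
{
    pthread_mutexattr_t attr;
    pthread_t owner;
    int rc, result;

    pthread_mutexattr_init(&attr);
    /* Robust: the next locker is told that the previous owner died. */
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(&shared_lock, &attr);
    pthread_mutexattr_destroy(&attr);

    if (pthread_create(&owner, NULL, die_holding_lock, NULL) != 0)
        return -1;
    pthread_join(owner, NULL);          /* owner is gone, lock still held */

    rc = pthread_mutex_lock(&shared_lock);
    if (rc == EOWNERDEAD) {
        /* Shared state would be repaired here; then mark the mutex usable. */
        rc = pthread_mutex_consistent(&shared_lock);
        assert(rc == 0);
        result = 1;
    } else if (rc == 0) {
        result = 0;
    } else {
        return -1;
    }

    pthread_mutex_unlock(&shared_lock);
    pthread_mutex_destroy(&shared_lock);
    return result;
}
```

With the default `PTHREAD_MUTEX_STALLED` setting instead of `PTHREAD_MUTEX_ROBUST`, no special action is taken when the owner dies, so the second `pthread_mutex_lock` can simply deadlock, which is the "waiters never get in" situation described above.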

There is another suspect in http://www.devisser-siderius.com/stack1.jpg:
process 3 is calling shmctl. I once filed a bug about a win32 server core
dump that reported WSAEWOULDBLOCK
(http://archives.postgresql.org/pgsql-bugs/2006-02/msg00185.php). AFAICS, it
is actually a mistranslated EINTR. There seems to be some relation between
these issues, but I haven't come up with a complete theory of it.

Regards,
Qingqing


From: Jan de Visser <jdevisser(at)digitalfairway(dot)com>
To: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PERFORM] Hanging queries on dual CPU windows
Date: 2006-03-13 14:26:29
Message-ID: 200603130926.30080.jdevisser@digitalfairway.com

On Sunday 12 March 2006 09:40, Magnus Hagander wrote:
> Looking at my system while testing this, it still looked like it was
> hanging at that place in the code, even though I saw no problems. So I'm
> not convinced we can actually trust the stacktrace from the non-default
> threads. So I don't think this patch will actually work :-( But it's
> worth a try.

I'm afraid you're right. Hangs again :(

jan



From: Jan de Visser <jdevisser(at)digitalfairway(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PERFORM] Hanging queries on dual CPU windows
Date: 2006-03-13 15:32:03
Message-ID: 200603131032.03990.jdevisser@digitalfairway.com

On Monday 13 March 2006 09:26, Jan de Visser wrote:
> On Sunday 12 March 2006 09:40, Magnus Hagander wrote:
> > Looking at my system while testing this, it still looked like it was
> > hanging at that place in the code, even though I saw no problems. So I'm
> > not convinced we can actually trust the stacktrace from the non-default
> > threads. So I don't think this patch will actually work :-( But it's
> > worth a try.
>
> I'm afraid you're right. Hangs again :(

I now have the toolchain set up, so if you want me to try stuff, please let me
know. Resolving this is important to us.

On a whim, I replaced InitializeCriticalSection with
InitializeCriticalSectionAndSpinCount, since MSDN told me that would be
better for SMP. No joy.
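For context on the spin count: `InitializeCriticalSectionAndSpinCount` takes, in addition to the critical section, a count of how many times a waiting thread spins on a multiprocessor before it blocks. The following is a minimal sketch of that spin-then-block idea using POSIX threads as a portable stand-in; it is not the Windows implementation, and `spin_then_lock` and `check_uncontended` are hypothetical names. Spinning only changes how long a waiter burns CPU before sleeping, so it would not be expected to help if the lock is never released at all.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: attempt a non-blocking acquire up to spin_count
 * times (cheap if the holder is about to release it on another CPU), then
 * fall back to a blocking acquire. *spins_used records how many failed
 * attempts were made before success or before blocking. */
static int spin_then_lock(pthread_mutex_t *m, unsigned spin_count,
                          unsigned *spins_used)
{
    unsigned i;

    for (i = 0; i < spin_count; i++) {
        if (pthread_mutex_trylock(m) == 0) {
            *spins_used = i;
            return 0;                   /* acquired while spinning */
        }
    }
    *spins_used = spin_count;
    return pthread_mutex_lock(m);       /* stop spinning and sleep */
}

/* Usage: an uncontended lock is acquired on the first attempt. */
static void check_uncontended(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    unsigned used = 99;
    int rc = spin_then_lock(&m, 4000, &used);

    assert(rc == 0 && used == 0);
    pthread_mutex_unlock(&m);
}
```

The spin count is a trade-off: spinning avoids a costly kernel wait when the holder releases quickly, but wastes CPU when it does not.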

jan
