Lists: | pgsql-hackers |
---|
From: | "Magnus Hagander" <mha(at)sollentuna(dot)net> |
---|---|
To: | "Jan de Visser" <jdevisser(at)digitalfairway(dot)com> |
Cc: | "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PERFORM] Hanging queries on dual CPU windows |
Date: | 2006-03-12 14:40:19 |
Message-ID: | 6BCB9D8A16AC4241919521715F4D8BCEA0F856@algol.sollentuna.se |
> > If so,
> > we could perhaps recode that part using a Mutex instead of a critical
> > section - since it's not a performance critical path, the difference
> > shouldn't be large. If I code up a patch for that, can you re-apply
> > SP1 and test it? Or is this a production system you can't really touch?
>
> I can do whatever the hell I want with it, so if you could
> cook up a patch that would be great.
>
> As a BTW: I reinstalled SP1 and turned stats collection off.
> That also seems to work, but is not really a solution since
> we want to use autovacuuming.
Ok, I've coded up a patch that changes the code to use a mutex instead.
Patch attached. You can get a precompiled postgres.exe at
http://www.hagander.net/download/postgres.exe_mutex.zip. You need to
copy this file to postmaster.exe as well - they are supposed to be
identical. It's based off a snapshot of 8.1-stable.
Looking at my system while testing this, it still looked like it was
hanging at that place in the code, even though I saw no problems. So I'm
not convinced we can actually trust the stack trace from the non-default
threads. So I don't think this patch will actually work :-( But it's
worth a try.
(Oh, and I moved the thread over to -hackers, seems more correct at this
time)
//Magnus
Attachment | Content-Type | Size |
---|---|---|
mutex.patch | application/octet-stream | 4.4 KB |
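
For readers following along, here is a minimal sketch of the kind of change described above: taking a lock through a Win32 mutex object rather than a critical section. It is not the contents of mutex.patch, which are not shown in this thread. Every `sketch_` name is hypothetical, and the non-Windows branch exists only so the sketch compiles on other platforms.

```c
#include <assert.h>
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical lock type: a handle to a kernel mutex object. */
typedef HANDLE sketch_lock;

/*
 * The critical-section form this replaces would look like:
 *   InitializeCriticalSection(&cs);
 *   EnterCriticalSection(&cs); ... LeaveCriticalSection(&cs);
 */
int sketch_lock_init(sketch_lock *lock)
{
    *lock = CreateMutex(NULL, FALSE, NULL);   /* unnamed, initially unowned */
    return *lock != NULL;
}

int sketch_lock_acquire(sketch_lock *lock)
{
    DWORD rc = WaitForSingleObject(*lock, INFINITE);

    /* WAIT_ABANDONED: the previous owner ended without releasing the
     * mutex; ownership still passes to the caller. */
    return rc == WAIT_OBJECT_0 || rc == WAIT_ABANDONED;
}

void sketch_lock_release(sketch_lock *lock) { ReleaseMutex(*lock); }
void sketch_lock_destroy(sketch_lock *lock) { CloseHandle(*lock); }

#else
#include <pthread.h>

/* Fallback for non-Windows builds, only so the sketch compiles there. */
typedef pthread_mutex_t sketch_lock;

int sketch_lock_init(sketch_lock *lock) { return pthread_mutex_init(lock, NULL) == 0; }
int sketch_lock_acquire(sketch_lock *lock) { return pthread_mutex_lock(lock) == 0; }
void sketch_lock_release(sketch_lock *lock) { pthread_mutex_unlock(lock); }
void sketch_lock_destroy(sketch_lock *lock) { pthread_mutex_destroy(lock); }
#endif

/* Hypothetical shared state guarded by the lock. */
int sketch_pending = 0;

/* Increment the counter under the lock; returns the new value, or -1
 * if the lock could not be taken. */
int sketch_queue_item(sketch_lock *lock)
{
    int result;

    assert(lock != NULL);
    if (!sketch_lock_acquire(lock))
        return -1;
    result = ++sketch_pending;
    sketch_lock_release(lock);
    return result;
}
```

A caller would run `sketch_lock_init` once, call `sketch_queue_item` from each thread that needs the guarded state, and finish with `sketch_lock_destroy`. A mutex acquire goes through a kernel wait even when uncontended, which is why the trade-off is only attractive off the performance-critical path, as the message notes.
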
From: | Jan de Visser <jdevisser(at)digitalfairway(dot)com> |
---|---|
To: | "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PERFORM] Hanging queries on dual CPU windows |
Date: | 2006-03-12 18:10:59 |
Message-ID: | 200603121310.59861.jdevisser@digitalfairway.com |
On Sunday 12 March 2006 09:40, Magnus Hagander wrote:
> > > If so,
> > > we could perhaps recode that part using a Mutex instead of a critical
> > > section - since it's not a performance critical path, the difference
> > > shouldn't be large. If I code up a patch for that, can you re-apply
> > > SP1 and test it? Or is this a production system you can't really touch?
> >
> > I can do whatever the hell I want with it, so if you could
> > cook up a patch that would be great.
> >
> > As a BTW: I reinstalled SP1 and turned stats collection off.
> > That also seems to work, but is not really a solution since
> > we want to use autovacuuming.
>
> Ok, I've coded up a patch that changes the code to use a mutex instead.
> Patch attached. You can get a precompiled postgres.exe at
> http://www.hagander.net/download/postgres.exe_mutex.zip. You need to
> copy this file to postmaster.exe as well - they are supposed to be
> identical. It's based off a snapshot of 8.1-stable.
>
> Looking at my system while testing this, it still looked like it was
> hanging at that place in the code, even though I saw no problems. So I'm
> not convinced we can actually trust the stack trace from the non-default
> threads. So I don't think this patch will actually work :-( But it's
> worth a try.
>
> (Oh, and I moved the thread over to -hackers, seems more correct at this
> time)
Thanks Magnus,
I'll try tomorrow. Will let you know ASAP (8:30 EST I guess :).
If this doesn't work, how do we progress?
>
> //Magnus
jan
--
--------------------------------------------------------------
Jan de Visser jdevisser(at)digitalfairway(dot)com
Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------
From: | "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [PERFORM] Hanging queries on dual CPU windows |
Date: | 2006-03-13 05:38:58 |
Message-ID: | dv30nr$1aaf$1@news.hub.org |
"Magnus Hagander" <mha(at)sollentuna(dot)net> wrote:
> Ok, I've coded up a patch that changes the code to use a mutex instead.
Are we asserting that the problem is caused by the spinlock's random wake-up
order? I am not sure why this would fix the problem. If my memory serves, a
critical section might be a problem if one process aborts unexpectedly while
it is inside: the other waiting processes never get a chance to enter it
(and also have no chance to handle SIGQUIT) -- so this patch may solve that.
There is another suspect in http://www.devisser-siderius.com/stack1.jpg,
i.e., process 3 does shmctl. I once filed a bug about a server core dump on
win32 that reported WSAEWOULDBLOCK
(http://archives.postgresql.org/pgsql-bugs/2006-02/msg00185.php). AFAICS, it
is actually a mistranslated EINTR. There seems to be some relation between
these issues, but I haven't come up with a complete theory of it.
Regards,
Qingqing
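
To illustrate why a mistranslated error code matters, here is a rough sketch of a Winsock-to-errno translation layer and the retry decision that depends on it. This is not PostgreSQL's actual translation code. The function and enum names are hypothetical, and the two numeric constants are the standard Winsock values, defined here only so the sketch compiles off Windows.

```c
#include <assert.h>
#include <errno.h>

#ifdef _WIN32
#include <winsock2.h>
#else
/* Standard Winsock error values, defined only for non-Windows builds. */
#define WSAEINTR       10004
#define WSAEWOULDBLOCK 10035
#endif

/* Hypothetical translation from a Winsock error to a POSIX errno value. */
int sketch_wsa_to_errno(int wsa_error)
{
    assert(wsa_error >= 0);
    switch (wsa_error)
    {
        case WSAEINTR:
            return EINTR;
        case WSAEWOULDBLOCK:
            return EWOULDBLOCK;
        default:
            return EINVAL;      /* placeholder for "everything else" */
    }
}

typedef enum
{
    SKETCH_RETRY_NOW,           /* interrupted: just call again */
    SKETCH_WAIT_READY,          /* would block: wait until ready, then retry */
    SKETCH_FAIL                 /* anything else: report the error */
} sketch_action;

/* What a caller would typically do with the translated value. */
sketch_action sketch_next_action(int err)
{
    if (err == EINTR)
        return SKETCH_RETRY_NOW;
    if (err == EWOULDBLOCK || err == EAGAIN)
        return SKETCH_WAIT_READY;
    return SKETCH_FAIL;
}
```

If an interrupted call is reported as would-block (or the reverse), the caller takes the wrong branch: it may wait for readiness that never arrives, or treat a harmless interruption as fatal. Whether that is what happens in the reported bug is not established in this thread.
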
From: | Jan de Visser <jdevisser(at)digitalfairway(dot)com> |
---|---|
To: | "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: [PERFORM] Hanging queries on dual CPU windows |
Date: | 2006-03-13 14:26:29 |
Message-ID: | 200603130926.30080.jdevisser@digitalfairway.com |
On Sunday 12 March 2006 09:40, Magnus Hagander wrote:
> Looking at my system while testing this, it still looked like it was
> hanging at that place in the code, even though I saw no problems. So I'm
> not convinced we can actually trust the stack trace from the non-default
> threads. So I don't think this patch will actually work :-( But it's
> worth a try.
I'm afraid you're right. Hangs again :(
jan
--
--------------------------------------------------------------
Jan de Visser jdevisser(at)digitalfairway(dot)com
Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------
From: | Jan de Visser <jdevisser(at)digitalfairway(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: [PERFORM] Hanging queries on dual CPU windows |
Date: | 2006-03-13 15:32:03 |
Message-ID: | 200603131032.03990.jdevisser@digitalfairway.com |
On Monday 13 March 2006 09:26, Jan de Visser wrote:
> On Sunday 12 March 2006 09:40, Magnus Hagander wrote:
> > Looking at my system while testing this, it still looked like it was
> > hanging at that place in the code, even though I saw no problems. So I'm
> > not convinced we can actually trust the stack trace from the non-default
> > threads. So I don't think this patch will actually work :-( But it's
> > worth a try.
>
> I'm afraid you're right. Hangs again :(
I now have the toolchain set up, so if you want me to try stuff, please let me
know. Resolving this is important to us.
On a whim, I replaced InitializeCriticalSection with
InitializeCriticalSectionAndSpinCount, since MSDN told me that would be
better for SMP. No joy.
jan
--
--------------------------------------------------------------
Jan de Visser jdevisser(at)digitalfairway(dot)com
Baruk Khazad! Khazad ai-menu!
--------------------------------------------------------------
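
The substitution Jan describes in the last message can be sketched roughly as follows. This is an illustration only, not the code he changed: the `spin_sketch_` names are hypothetical, and the spin count of 4000 is an assumed value (the thread does not say which count was used). The non-Windows branch exists only so the sketch compiles elsewhere, and the spin count has no effect there.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed, illustrative spin count; not taken from the thread. */
#define SPIN_SKETCH_COUNT 4000

#ifdef _WIN32
/* Older SDKs may need _WIN32_WINNT >= 0x0403 defined before this include. */
#include <windows.h>

typedef CRITICAL_SECTION spin_sketch_lock;

int spin_sketch_init(spin_sketch_lock *lock)
{
    /*
     * The plain form would be: InitializeCriticalSection(lock);
     * With a spin count, a contending thread on a multiprocessor machine
     * spins up to that many iterations before falling back to a kernel wait.
     */
    return InitializeCriticalSectionAndSpinCount(lock, SPIN_SKETCH_COUNT) != 0;
}

void spin_sketch_enter(spin_sketch_lock *lock) { EnterCriticalSection(lock); }
void spin_sketch_leave(spin_sketch_lock *lock) { LeaveCriticalSection(lock); }
void spin_sketch_destroy(spin_sketch_lock *lock) { DeleteCriticalSection(lock); }

#else
#include <pthread.h>

/* Fallback so the sketch compiles off Windows; no spinning is configured. */
typedef pthread_mutex_t spin_sketch_lock;

int spin_sketch_init(spin_sketch_lock *lock) { return pthread_mutex_init(lock, NULL) == 0; }
void spin_sketch_enter(spin_sketch_lock *lock) { pthread_mutex_lock(lock); }
void spin_sketch_leave(spin_sketch_lock *lock) { pthread_mutex_unlock(lock); }
void spin_sketch_destroy(spin_sketch_lock *lock) { pthread_mutex_destroy(lock); }
#endif

/* Hypothetical guarded update: add delta to *total under the lock and
 * return the new total. */
int spin_sketch_add(spin_sketch_lock *lock, int *total, int delta)
{
    int result;

    assert(lock != NULL && total != NULL);
    spin_sketch_enter(lock);
    *total += delta;
    result = *total;
    spin_sketch_leave(lock);
    return result;
}
```

The spin count only changes how a contended acquire waits. It does not change who can enter the section, which fits the report above that the change made no difference to the hang.
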