urgent: upgraded to 8.2, getting kernel panics

From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "postgres general" <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: urgent: upgraded to 8.2, getting kernel panics
Date: 2007-02-23 22:14:47
Message-ID: b42b73150702231414m7c528b3ey521f290e5ba7d5a4@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Ok,

This may be the wrong place to look for answers to this, but I figured it
couldn't hurt...so here goes:

On Friday we upgraded a critical backend server to PostgreSQL 8.2
running on Fedora Core 4. Since then we have had three kernel
panics during periods of moderate to high load (two of them during the
pg_dump backup run).

The platform is an IBM x360-series server with SCSI disks and software RAID on the backplane.

After the first crash we updated the system with yum, which obviously did
not fix the problem. I was leaning toward a hardware problem until this last
crash, when I was able to catch the following off the terminal:

BUG: spinlock recursion CPU0 postmaster...not tainted.
bunch of other stuff ending in:
Kernel Panic: not syncing: Bad locking

One of the other developers snapped a picture of the kernel panic with
his digital camera and is going to send over the pictures when he gets
home this evening.

Has anybody seen a problem like this, or does anyone have suggestions
for a possible resolution? Should I be posting to the LKML? Any
suggestions are welcome and appreciated.

At this juncture we are going to downgrade the postmaster back to 8.1
and see if that fixes the panics. If it doesn't, this discussion is
over, but if it does, we are extremely curious about finding a fix
for this issue: we have about 8 weeks of development on hold
until we can put an 8.2 server in production. Management has already
authorized a new server, but they want a 100% guarantee that it is going
to fix the problem.

thanks in advance,
merlin
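For context on the console message above: as far as we understand it, "BUG: spinlock recursion" is reported by the Linux kernel's spinlock debugging checks when kernel code tries to acquire a spinlock that the current task already holds. A plain spinlock would spin forever waiting on itself, so the debug check reports the bug instead; "postmaster" is simply the process that was running when the kernel code path hit the problem. The toy Python model below is a hypothetical illustration of that check only. It is not the kernel's code, and the class and method names are made up.

```python
import threading


class DebugSpinlock:
    """Toy model of a debugging spinlock that remembers its holder.

    Hypothetical illustration; not the Linux kernel's implementation.
    """

    def __init__(self):
        self._flag = threading.Lock()  # stands in for the lock word
        self.owner = None              # thread id of the current holder

    def acquire(self):
        me = threading.get_ident()
        if self.owner == me:
            # A plain spinlock would busy-wait on itself forever here,
            # so the debug check reports the recursion instead.
            raise RuntimeError(f"BUG: spinlock recursion (owner {me})")
        while not self._flag.acquire(blocking=False):
            pass  # busy-wait, as a spinlock does
        self.owner = me

    def release(self):
        self.owner = None
        self._flag.release()


lock = DebugSpinlock()
lock.acquire()
try:
    lock.acquire()  # the same thread tries to take the lock again
except RuntimeError as exc:
    print(exc)      # the recursion is reported instead of hanging
finally:
    lock.release()
```

The owner check runs before any spinning, which is why recursion can be detected at all; in the real kernel the follow-up action (a stack dump or, as here, a panic) depends on how the kernel was built.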


From: Devrim GUNDUZ <devrim(at)CommandPrompt(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: postgres general <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: urgent: upgraded to 8.2, getting kernel panics
Date: 2007-02-23 22:33:48
Message-ID: 1172270028.11054.5.camel@laptop.gunduz.org
Lists: pgsql-general pgsql-hackers


On Fri, 2007-02-23 at 17:14 -0500, Merlin Moncure wrote:

> BUG: spinlock recursion CPU0 postmaster...not tainted.

<snip>

> Has anybody seen any problem like this or have any suggestions about
> possible resolution...should I be posting to the LKML?

AFAIR (plus some quick Googling), this is related to a problem in the kernel.
You may need to upgrade to a newer Fedora release, since FC4 is no longer
supported :(.

Even if you report it to LKML, they will probably suggest using a newer
kernel. However, I think the system will not even let you compile a new
kernel; it will panic again under the high load... So...

If you have free disk space, install a newer Fedora release on this system,
mount the existing $PGDATA, and see whether that fixes the problem...
--
Devrim GÜNDÜZ
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, ODBCng - http://www.commandprompt.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Cc: "postgres general" <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics
Date: 2007-02-23 23:14:25
Message-ID: 8372.1172272465@sss.pgh.pa.us
Lists: pgsql-general pgsql-hackers

"Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> On friday we upgraded a critical backend server to postgresql 8.2
> running on fedora core 4.

Umm ... why that particular choice of OS? Red Hat dropped update
support for FC4 some time ago, and AFAIK the Fedora Legacy project
is not getting things done. How old is the kernel you're using?

> At this juncture we are going to downgrade the postmaster back to 8.1
> and see if that fixes the panics.

Even assuming that Postgres is related to the panics, I don't think you
will find anyone maintaining that a kernel panic is not the kernel's
problem. If an application *is* able to provoke a kernel panic, the
standard description of the problem would be "critical kernel security
flaw".

regards, tom lane


From: "CAJ CAJ" <pguser(at)gmail(dot)com>
To: "postgres general" <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics
Date: 2007-02-24 01:41:00
Message-ID: 467669b30702231741h7bc84301p15109072ae163790@mail.gmail.com
Lists: pgsql-general pgsql-hackers

I vaguely remember running into spinlock problems with FC4, and they weren't
due to PostgreSQL: we didn't have a database running on FC4.

If you are running a critical server, you should switch to at least CentOS.


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "postgres general" <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, devrim(at)commandprompt(dot)com
Subject: Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics
Date: 2007-02-26 13:24:00
Message-ID: b42b73150702260524t7587ad84i9a347a374928b7ab@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On 2/23/07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> > On friday we upgraded a critical backend server to postgresql 8.2
> > running on fedora core 4.
>
> Umm ... why that particular choice of OS? Red Hat dropped update
> support for FC4 some time ago, and AFAIK the Fedora Legacy project
> is not getting things done. How old is the kernel you're using?

Linux mojo 2.6.17-1.2142_FC4smp #1 SMP Tue Jul 11 22:57:02 EDT 2006
i686 i686 i386 GNU/Linux

Unfortunately, the decision about which kernel to run is more or less
out of my hands. Personally, I really dislike Fedora and would
much prefer to be running CentOS/Red Hat. That said, your comments
and those of others are very helpful in getting that changed.

We tried updating to the latest packages via yum update, with no help.

As promised, here is the best photo of the panic we could get:
http://img144.imageshack.us/my.php?image=dumpic6.jpg

We did an emergency downgrade to 8.1 and will monitor the
situation... The decision to get a new server has already been made, and
hopefully it will be on a more stable platform.

Big thanks to all who took a few minutes out of their day to lend a hand.

merlin


From: Devrim GUNDUZ <devrim(at)CommandPrompt(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres general <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics
Date: 2007-02-26 13:57:02
Message-ID: 1172498222.3074.24.camel@laptop.gunduz.org
Lists: pgsql-general pgsql-hackers

Hi,

On Mon, 2007-02-26 at 08:24 -0500, Merlin Moncure wrote:
> we tried update to the latest via yum update with no help.

As Tom stated, FC4 is no longer supported, so you won't be able to
get a newer kernel via yum.

> as promised, here is the best photo of the panic we could get:
> http://img144.imageshack.us/my.php?image=dumpic6.jpg

...bad locking...

The picture reminded me of a SCSI driver bug in older kernels. I googled
again and found a post saying that "native drivers are being used in
FC5+ kernels". If that is really the case, you may hit the problem
again later.

Upgrading the OS will probably solve your problem, since there is no way to
upgrade the FC4 kernel unless you want to compile the kernel source on your
system.

Regards,

--
Devrim GÜNDÜZ


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, postgres general <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics
Date: 2007-03-01 03:48:00
Message-ID: 20070301034800.GA16793@wolff.to
Lists: pgsql-general pgsql-hackers

On Fri, Feb 23, 2007 at 18:14:25 -0500,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Merlin Moncure" <mmoncure(at)gmail(dot)com> writes:
> > On friday we upgraded a critical backend server to postgresql 8.2
> > running on fedora core 4.
>
> Umm ... why that particular choice of OS? Red Hat dropped update
> support for FC4 some time ago, and AFAIK the Fedora Legacy project
> is not getting things done. How old is the kernel you're using?

The Fedora Legacy project is officially gone now.


From: Bruno Wolff III <bruno(at)wolff(dot)to>
To: Devrim GUNDUZ <devrim(at)CommandPrompt(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres general <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics
Date: 2007-03-01 03:53:21
Message-ID: 20070301035321.GB16793@wolff.to
Lists: pgsql-general pgsql-hackers

On Mon, Feb 26, 2007 at 15:57:02 +0200,
Devrim GUNDUZ <devrim(at)CommandPrompt(dot)com> wrote:
>
> Upgrading OS will probably solve your problem; since there is no way to
> upgrade FC4 kernel unless you want to compile kernel source on your
> system.

And good luck with that. Fedora back-patches changes from later kernels
than the one you would expect based on the version name. Building a Linus
kernel and getting the right mix of versions to work on a particular
Fedora release might be hard to do. If you can find the patch that fixes
the problem, your best bet (assuming you have to use FC4) would be to
apply that fix to the latest Fedora kernel for FC4.


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "postgres general" <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, devrim(at)commandprompt(dot)com
Subject: Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics
Date: 2007-03-01 13:32:57
Message-ID: b42b73150703010532q5c911c49gefc211b250b2a160@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Following an emergency downgrade back to 8.1, the kernel panics went
away. Note that I don't believe for a second that the database was
the root cause here... Research suggests that the problem is due to some
type of bug in the SCSI driver. Exactly why 8.2 brings this out is a
mystery... We are working on getting an enterprise kernel on the server.

merlin


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres general <pgsql-general(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, devrim(at)commandprompt(dot)com
Subject: Re: [HACKERS] urgent: upgraded to 8.2, getting kernel panics
Date: 2007-03-01 13:50:10
Message-ID: 20070301135010.GB4202@svr2.hagander.net
Lists: pgsql-general pgsql-hackers

On Thu, Mar 01, 2007 at 08:32:57AM -0500, Merlin Moncure wrote:
>
> Following an emergency downgrade back to 8.1, the kernel panics went
> away. Note that I don't believe for a second that the database was
> the root cause here...research suggest that the problem is due to some
> type of bug in the scsi driver. Exactly why 8.2 brings this out is a
> mystery...working on getting an enterprise kernel on the server.

Probably 8.2 is pushing some part of the I/O system harder than 8.1 did,
thus exposing the bug sooner.

//Magnus