Re: Background writer process terminating

Lists: pgsql-bugs
From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Background writer process terminating
Date: 2005-03-28 23:45:06
Message-ID: 9f9152698d73bed109115f1254e3985c@biglumber.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

I have a server running 8.0.1 which keeps terminating. I first noticed this
today: it has happened about 7 times already. Periodically, the background
writer process is killed and drags the rest of the server down with it.
Most times PG restarts on its own, but twice I've found a dead server on
the floor and had to resusciate it manually with pg_ctl start. Here's a log
snippet:

<2005-03-28 12:28:27 EST >LOG: background writer process (PID 17409) was terminated by signal 9
<2005-03-28 12:28:27 EST >LOG: terminating any other active server processes
<2005-03-28 12:28:27 EST >LOG: all server processes terminated; reinitializing
<2005-03-28 12:28:27 EST >LOG: database system was interrupted at 2005-03-28 12:27:49 EST
<2005-03-28 12:28:41 EST >LOG: startup process (PID 17452) was terminated by signal 9
<2005-03-28 12:28:41 EST >LOG: aborting startup due to startup process failure

I bumped the logging way up and caught it happening again. The logs are too big
to reproduce here, but can be found at:

http://www.gtsm.com/pgboom3.txt

Memory seems to be ok. No entries in /var/log/messages (this is a Linux
2.4 series kernel, gcc 3.4.1, dual 686 CPU, 1 SCSI drive system). There
are two other 8.01 servers running on this box: they are also experiencing
this problem, but only once each so far. There was a recent problem with
running out of filehandles, but this was boosted and is probably not an
issue now. Plenty of disk space on the partitions. Suggestions and hints
very welcome.

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200503281839
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFCSJazvJuQZxSWSsgRAk1MAJ0R8Rc4V5ud5XWkdNNZ9/logSdiIgCg05+I
ufwoOhbUaHQIZQ14SerUYbY=
=2XvT
-----END PGP SIGNATURE-----


From: Michael Fuhr <mike(at)fuhr(dot)org>
To: Greg Sabino Mullane <greg(at)turnstep(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Background writer process terminating
Date: 2005-03-29 00:17:34
Message-ID: 20050329001734.GA17409@winnie.fuhr.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Mon, Mar 28, 2005 at 11:45:06PM -0000, Greg Sabino Mullane wrote:
>
> I have a server running 8.0.1 which keeps terminating. I first noticed this
> today: it has happened about 7 times already.

Could the situation have been going on longer than just today? How
far back do your logs go? What, if anything, has changed on the
system since the last time you're certain this problem wasn't
happening?

> Periodically, the background writer process is killed and drags
> the rest of the server down with it.

Is there any pattern to "periodically"?

> Most times PG restarts on its own, but twice I've found a dead server on
> the floor and had to resusciate it manually with pg_ctl start. Here's a log
> snippet:
>
> <2005-03-28 12:28:27 EST >LOG: background writer process (PID 17409) was terminated by signal 9

Any chance you have a "terminate with extreme prejudice" daemon or
cron job that kills processes meeting certain conditions? Do you
see any pattern in the timestamps when this happens?

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/


From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Background writer process terminating
Date: 2005-03-29 00:42:49
Message-ID: a4dfc61f713999d5ec28379c76877ff4@biglumber.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Could the situation have been going on longer than just today? How
> far back do your logs go? What, if anything, has changed on the
> system since the last time you're certain this problem wasn't
> happening?

Logs go back to February 22, when 8.0.1 was put in place. No major changes
lately that I can think of. Started today at 3:38 am. There is a nightly
cron job that fires at 3:37 to vacuum and is undoubtably related, but the
others do not seem to correspond to anything in particular (there are always
users hitting the db via web scripts of course).

> Is there any pattern to "periodically"?

Not many data points yet, but here's all the occurances:

$ grep "signal 9" 5810.log
<2005-03-28 03:38:14 EST >LOG: server process (PID 29216) was terminated by signal 9
<2005-03-28 10:15:45 EST >LOG: background writer process (PID 29271) was terminated by signal 9
<2005-03-28 11:05:00 EST >LOG: background writer process (PID 2202) was terminated by signal 9
<2005-03-28 12:22:26 EST >LOG: background writer process (PID 7748) was terminated by signal 9
<2005-03-28 12:27:37 EST >LOG: background writer process (PID 17119) was terminated by signal 9
<2005-03-28 12:27:48 EST >LOG: background writer process (PID 17404) was terminated by signal 9
<2005-03-28 12:28:27 EST >LOG: background writer process (PID 17409) was terminated by signal 9
<2005-03-28 12:28:41 EST >LOG: startup process (PID 17452) was terminated by signal 9
<2005-03-28 13:52:00 EST >LOG: 00000: background writer process (PID 21456) was terminated by signal 9
<2005-03-28 15:26:25 EST 21453>LOG: 00000: background writer process (PID 24526) was terminated by signal 9
<2005-03-28 15:39:16 EST 21453>LOG: 00000: background writer process (PID 655) was terminated by signal 9
<2005-03-28 16:05:02 EST 21453>LOG: 00000: background writer process (PID 1579) was terminated by signal 9
<2005-03-28 16:05:09 EST 21453>LOG: 00000: background writer process (PID 4141) was terminated by signal 9
<2005-03-28 16:15:06 EST 21453>LOG: 00000: background writer process (PID 4159) was terminated by signal 9
<2005-03-28 16:15:20 EST 21453>LOG: 00000: background writer process (PID 5822) was terminated by signal 9

> Any chance you have a "terminate with extreme prejudice" daemon or
> cron job that kills processes meeting certain conditions?

Nope. I've been happily running pg on this server for many years. I'd frankly suspect
the computer before Postgres, however, but a better error message (e.g. pointing
to a corrupt disk or something) would be helpful. The first one was a server process
and not a background writer process, FWIW.

- --
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200503281939
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFCSKR2vJuQZxSWSsgRAiR+AKCPRgwvtXYpS9svHPEwxYhn+OXjqwCeIlZr
+YywueTkKDXTdSFdIHasVhY=
=MvWn
-----END PGP SIGNATURE-----


From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Greg Sabino Mullane <greg(at)turnstep(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Background writer process terminating
Date: 2005-03-29 01:08:17
Message-ID: 20050328170441.R18713@megazone.bigpanda.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Tue, 29 Mar 2005, Greg Sabino Mullane wrote:

> Not many data points yet, but here's all the occurances:
>
> $ grep "signal 9" 5810.log
> <2005-03-28 03:38:14 EST >LOG: server process (PID 29216) was terminated by signal 9
> <2005-03-28 10:15:45 EST >LOG: background writer process (PID 29271) was terminated by signal 9
> <2005-03-28 11:05:00 EST >LOG: background writer process (PID 2202) was terminated by signal 9
> <2005-03-28 12:22:26 EST >LOG: background writer process (PID 7748) was terminated by signal 9
> <2005-03-28 12:27:37 EST >LOG: background writer process (PID 17119) was terminated by signal 9
> <2005-03-28 12:27:48 EST >LOG: background writer process (PID 17404) was terminated by signal 9
> <2005-03-28 12:28:27 EST >LOG: background writer process (PID 17409) was terminated by signal 9
> <2005-03-28 12:28:41 EST >LOG: startup process (PID 17452) was terminated by signal 9
> <2005-03-28 13:52:00 EST >LOG: 00000: background writer process (PID 21456) was terminated by signal 9
> <2005-03-28 15:26:25 EST 21453>LOG: 00000: background writer process (PID 24526) was terminated by signal 9
> <2005-03-28 15:39:16 EST 21453>LOG: 00000: background writer process (PID 655) was terminated by signal 9
> <2005-03-28 16:05:02 EST 21453>LOG: 00000: background writer process (PID 1579) was terminated by signal 9
> <2005-03-28 16:05:09 EST 21453>LOG: 00000: background writer process (PID 4141) was terminated by signal 9
> <2005-03-28 16:15:06 EST 21453>LOG: 00000: background writer process (PID 4159) was terminated by signal 9
> <2005-03-28 16:15:20 EST 21453>LOG: 00000: background writer process (PID 5822) was terminated by signal 9
>
> > Any chance you have a "terminate with extreme prejudice" daemon or
> > cron job that kills processes meeting certain conditions?
>
> Nope. I've been happily running pg on this server for many years. I'd frankly suspect
> the computer before Postgres, however, but a better error message (e.g. pointing
> to a corrupt disk or something) would be helpful. The first one was a server process
> and not a background writer process, FWIW.

Well, it seems to be saying that it was terminated by SIGKILL which I
can't see a reason to be internally generated. Is there anything else
running on the system that might (for example) be taxing memory to cause
an OOM killing spree or some such?


From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Background writer process terminating
Date: 2005-03-29 03:01:48
Message-ID: 6a4ca3379b9a18c92523b5cff9254a27@biglumber.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Well, it seems to be saying that it was terminated by SIGKILL which I
> can't see a reason to be internally generated. Is there anything else
> running on the system that might (for example) be taxing memory to cause
> an OOM killing spree or some such?

Yes, that is certainly possible. I've got some mod_perl kids that could
potentially flare up when the site gets busy. I don't think that explains
the one at 3:30 in the morning though, but I've reduced the number of kids
and I'll see if that helps. So far, no more occurances since 16:15 EST.

Thanks,
- --
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200503282200
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

-----BEGIN PGP SIGNATURE-----

iD8DBQFCSMUPvJuQZxSWSsgRAqwnAJ4lGO4ZBboE9xCbva6Wtk2b+MPOaACgxrA9
AoKz1mGOsf/L8pAy1+ybM4E=
=zYYP
-----END PGP SIGNATURE-----


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Background writer process terminating
Date: 2005-03-29 03:18:41
Message-ID: 17730.1112066321@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Greg Sabino Mullane" <greg(at)turnstep(dot)com> writes:
> <2005-03-28 12:28:27 EST >LOG: background writer process (PID 17409) was terminated by signal 9

> Memory seems to be ok. No entries in /var/log/messages (this is a Linux
> 2.4 series kernel, gcc 3.4.1, dual 686 CPU, 1 SCSI drive system).

Signal 9 almost has to be the OOM killer. You sure there's nothing
about "Out of Memory" in /var/log/messages?

regards, tom lane


From: "Greg Sabino Mullane" <greg(at)turnstep(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Background writer process terminating
Date: 2005-03-29 12:25:23
Message-ID: 2e33d3e7be13117aaa1b48a80910216e@biglumber.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tom Lane wrote:
>Signal 9 almost has to be the OOM killer. You sure there's nothing
>about "Out of Memory" in /var/log/messages?

There is nothing, but I certainly can believe in the OOM scenario.
The problem has not happened again since I tweaked the httpd settings,
so I'll assume that was it until I see it again.

Thanks,
- --
Greg Sabino Mullane greg(at)turnstep(dot)com
PGP Key: 0x14964AC8 200503290723
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iD8DBQFCSUkwvJuQZxSWSsgRAjoFAJ9gk3WSh6K4asgk3JBgAaQNMrhvdgCgwALz
lrZlKbATcg//odNHOUz3TkA=
=BSgg
-----END PGP SIGNATURE-----