Setting oom_adj on linux?

Lists: pgsql-hackers
From: Magnus Hagander <magnus(at)hagander(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Setting oom_adj on linux?
Date: 2010-01-04 15:39:14
Message-ID: 9837222c1001040739l79fa0978ib1816a112798008e@mail.gmail.com

I realize this is a very platform-specific thing, but should we
consider setting the value of /proc/<pid>/oom_adj when running on
linux? See:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/proc.txt;h=220cc6376ef80e0c9bcfec162d45552e729cdf5a;hb=45d28b097280a78893ce25a5d0db41e6a2717853

section 3.1.

To get the best benefit, I think it needs to be done in cooperation
between the startup scripts and PostgreSQL. We'd want -17 ("never oom
kill") on the postmaster, but some different value on regular backends
(since if it has to kill someone, it's better to pick a regular
backend so we can do a controlled restart). Only root can drop the
value, so the startup script needs to be part of it. But if we then
want to increase it for sub-processes, that'd require something in the
backend itself...

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 15:45:41
Message-ID: 20100104154540.GH3778@alvh.no-ip.org

Magnus Hagander wrote:
> I realize this is a very platform-specific thing, but should we
> consider setting the value of /proc/<pid>/oom_adj when running on
> linux? See:

http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 15:57:28
Message-ID: 9837222c1001040757r3a3edc85hc192f52a1f8e0fba@mail.gmail.com

On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> Magnus Hagander wrote:
>> I realize this is a very platform-specific thing, but should we
>> consider setting the value of /proc/<pid>/oom_adj when running on
>> linux? See:
>
> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org

Grr. I had zero recollection of that :S

Can't find a useful consensus though?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 16:03:47
Message-ID: 85a5f8fd12ab571b21e7863f64424761@commandprompt.com

On Mon, 4 Jan 2010 16:57:28 +0100, Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
>> Magnus Hagander wrote:
>>> I realize this is a very platform-specific thing, but should we
>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>> linux? See:
>>
>> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>
> Grr. I had zero recollection of that :S
>
> Can't find a useful consensus though?

I don't think we should set a setting like that automatically. Perhaps a
warning on startup?

Joshua D. Drake

--
PostgreSQL - XMPP: jdrake(at)jabber(dot)postgresql(dot)org
Consulting, Development, Support, Training
503-667-4564 - http://www.commandprompt.com/
The PostgreSQL Company, serving since 1997


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 16:07:31
Message-ID: 4B421243.2080106@dunslane.net

Magnus Hagander wrote:
> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
>
>> Magnus Hagander wrote:
>>
>>> I realize this is a very platform-specific thing, but should we
>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>> linux? See:
>>>
>> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>>
>
> Grr. I had zero recollection of that :S
>
> Can't find a useful consensus though?
>
>

It is probably worth trying to protect the postmaster in the init
script. Beyond that things probably start to get fairly difficult.

cheers

andrew


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 16:10:54
Message-ID: 9837222c1001040810t5e10fcf0scc432965e0609598@mail.gmail.com

On Mon, Jan 4, 2010 at 17:07, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> Magnus Hagander wrote:
>>
>> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
>> wrote:
>>
>>>
>>> Magnus Hagander wrote:
>>>
>>>>
>>>> I realize this is a very platform-specific thing, but should we
>>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>>> linux? See:
>>>>
>>>
>>>
>>> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>>>
>>
>> Grr. I had zero recollection of that :S
>>
>> Can't find a useful consensus though?
>>
>>
>
> It is probably worth trying to protect the postmaster in the init script.
> Beyond that things probably start to get fairly difficult.

Right. But AFAICS (though I haven't tested with -17), it will be
inherited by children, which is something we'd want to *undo*, no?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 16:17:19
Message-ID: 4B42148F.4020406@2ndquadrant.com

Magnus Hagander wrote:
> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
>
>> Magnus Hagander wrote:
>>
>>> I realize this is a very platform-specific thing, but should we
>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>> linux? See:
>>>
>> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>>
>
> Can't find a useful consensus though?
>

In http://archives.postgresql.org/pgsql-hackers/2008-02/msg00049.php Tom
points out that while you could make this adjustment in the init scripts
for PostgreSQL, actually doing so is quite questionable as a packaging
decision. That's where that thread ended as far as I was concerned.
The best I think anyone could do here is to add such a capability into
some of the init scripts, but it would probably need to be disabled by
default. Since that sort of defeats the purpose of the change, I'm not
sure what the benefit there is--if you have to turn it on, you might as
well do something at a higher level instead.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.com


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 16:40:02
Message-ID: 4B4219E2.4020004@dunslane.net

Magnus Hagander wrote:
> On Mon, Jan 4, 2010 at 17:07, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>> Magnus Hagander wrote:
>>
>>> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
>>> wrote:
>>>
>>>
>>>> Magnus Hagander wrote:
>>>>
>>>>
>>>>> I realize this is a very platform-specific thing, but should we
>>>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>>>> linux? See:
>>>>>
>>>>>
>>>> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>>>>
>>>>
>>> Grr. I had zero recollection of that :S
>>>
>>> Can't find a useful consensus though?
>>>
>>>
>>>
>> It is probably worth trying to protect the postmaster in the init script.
>> Beyond that things probably start to get fairly difficult.
>>
>
> Right. But AFAICS (though I haven't tested with -17), it will be
> inherited by children, which is something we'd want to *undo*, no?
>

[experiments]

Yes, darnit, you're right. But it looks like the oom_adj file can be set
to the default by the process owner:

[andrew(at)sophia ~]$ ls -l /proc/6520/oom_adj
-rw-r--r-- 1 andrew andrew 0 2010-01-04 12:37 /proc/6520/oom_adj
[andrew(at)sophia ~]$ cat /proc/6520/oom_adj
0
[andrew(at)sophia ~]$ id
uid=500(andrew) gid=500(andrew) groups=10(wheel),500(andrew)
[andrew(at)sophia ~]$ echo -17 > /proc/6520/oom_adj
-bash: echo: write error: Permission denied
[andrew(at)sophia ~]$ echo 0 > /proc/6520/oom_adj
[andrew(at)sophia ~]$ echo -17 > /proc/6520/oom_adj
-bash: echo: write error: Permission denied
[andrew(at)sophia ~]$

But that would be a pain to have to do.
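
The same experiment can be repeated from C. What follows is a minimal sketch, not code from this thread: set_oom_adj() is a hypothetical helper, and the path is a parameter so it can be pointed at /proc/self/oom_adj or /proc/<pid>/oom_adj. Because stdio buffers the write, a rejection by the kernel would typically surface at fclose() rather than at fprintf(), so both are checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an integer to an oom_adj-style file.
 * Returns 0 on success, or the errno value describing the failure.
 */
static int
set_oom_adj(const char *path, int value)
{
    FILE *f;
    int   err = 0;

    assert(path != NULL);

    f = fopen(path, "w");
    if (f == NULL)
        return errno;

    /* The data is buffered here; a refusal may only show up on flush. */
    if (fprintf(f, "%d\n", value) < 0)
        err = errno;

    /* fclose() flushes the buffer, so its result has to be checked too. */
    if (fclose(f) != 0 && err == 0)
        err = errno;

    return err;
}
```

Going by the session above, one would expect set_oom_adj("/proc/self/oom_adj", 0) to succeed for the process owner and a call with -17 to return the "Permission denied" error unless run as root.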

OTOH, disabling the OOM killer is not always an option. I recently tried
it on one system and had to revert it rapidly because the system stopped
working in minutes. Some software just doesn't live well in such
environments, sadly.

cheers

andrew


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 16:41:42
Message-ID: 9837222c1001040841t5c80b616tc56114e7f98a0970@mail.gmail.com

On Mon, Jan 4, 2010 at 17:40, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>
>
> Magnus Hagander wrote:
>>
>> On Mon, Jan 4, 2010 at 17:07, Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>
>>>
>>> Magnus Hagander wrote:
>>>
>>>>
>>>> On Mon, Jan 4, 2010 at 16:45, Alvaro Herrera
>>>> <alvherre(at)commandprompt(dot)com>
>>>> wrote:
>>>>
>>>>
>>>>>
>>>>> Magnus Hagander wrote:
>>>>>
>>>>>
>>>>>>
>>>>>> I realize this is a very platform-specific thing, but should we
>>>>>> consider setting the value of /proc/<pid>/oom_adj when running on
>>>>>> linux? See:
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> http://archives.postgresql.org/message-id/20080201223336.GC24780%40alvh.no-ip.org
>>>>>
>>>>>
>>>>
>>>> Grr. I had zero recollection of that :S
>>>>
>>>> Can't find a useful consensus though?
>>>>
>>>>
>>>>
>>>
>>> It is probably worth trying to protect the postmaster in the init script.
>>> Beyond that things probably start to get fairly difficult.
>>>
>>
>> Right. But AFAICS (though I haven't tested with -17), it will be
>> inherited by children, which is something we'd want to *undo*, no?
>>
>
>
> [experiments]
>
> Yes, darnit, you're right. But it looks like the oom_adj file can be set to
> the default by the process owner:
>
>   [andrew(at)sophia ~]$ ls -l /proc/6520/oom_adj
>   -rw-r--r-- 1 andrew andrew 0 2010-01-04 12:37 /proc/6520/oom_adj
>   [andrew(at)sophia ~]$ cat /proc/6520/oom_adj
>   0
>   [andrew(at)sophia ~]$ id
>   uid=500(andrew) gid=500(andrew) groups=10(wheel),500(andrew)
>   [andrew(at)sophia ~]$ echo -17 > /proc/6520/oom_adj
>   -bash: echo: write error: Permission denied
>   [andrew(at)sophia ~]$ echo 0 > /proc/6520/oom_adj
>   [andrew(at)sophia ~]$ echo -17 > /proc/6520/oom_adj
>   -bash: echo: write error: Permission denied
>   [andrew(at)sophia ~]$
>
> But that would be a pain to have to do.
>
> OTOH, disabling the OOM killer is not always an option. I recently tried it
> on one system and had to revert it rapidly because the system stopped
> working in minutes. Some software just doesn't live well in such
> environments, sadly.

Right. Which is why I like the idea of disabling the OOM killer for
the *postmaster*, but not the regular backends. Gives it a chance to
recover. It's not nice, but it's better than nothing.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 16:49:26
Message-ID: 14818.1262623766@sss.pgh.pa.us

Greg Smith <greg(at)2ndquadrant(dot)com> writes:
> In http://archives.postgresql.org/pgsql-hackers/2008-02/msg00049.php Tom
> points out that while you could make this adjustment in the init scripts
> for PostgreSQL, actually doing so is quite questionable as a packaging
> decision.

I just wondered if it would be questioned, I didn't say there was a
problem.

However, the long and the short of this is that we can't do anything
without the close cooperation of an init script. I think that moves
it out of the realm of what Postgres as a project should be doing.
It seems more like a patch that the Linux-based packagers should be
carrying.

Memo to self: get off duff and prepare such a patch for the Red Hat/Fedora
packages.

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 16:55:15
Message-ID: 20100104165515.GI3778@alvh.no-ip.org

Magnus Hagander wrote:

> Right. Which is why I like the idea of disabling the OOM killer for
> the *postmaster*, but not the regular backends. Gives it a chance to
> recover. It's not nice, but it's better than nothing.

It doesn't sound like the init script can reenable the killer for the
child processes though. So, if there's anything that the core code
ought to do, it is to re-enable the OOM killer for postmaster children
after it has been disabled by the initscript.

BTW, is it possible for pg_ctl to disable OOM-killer? I guess not,
since it's not run by root ...

Is there a way to disable memory overcommit for particular processes?
That would be very useful -- just disable overcommit for all Postgres
processes, and there shouldn't be much need to enable the killer for
backends.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 16:59:00
Message-ID: 9837222c1001040859r5f0d8f5cv5f22ac85790a1c52@mail.gmail.com

On Mon, Jan 4, 2010 at 17:55, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> Magnus Hagander wrote:
>
>> Right. Which is why I like the idea of disabling the OOM killer for
>> the *postmaster*, but not the regular backends. Gives it a chance to
>> recover. It's not nice, but it's better than nothing.
>
> It doesn't sound like the init script can reenable the killer for the
> child processes though.  So, if there's anything that the core code
> ought to do, it is to re-enable the OOM killer for postmaster children
> after it has been disabled by the initscript.

Yeah, that's why the backend code would need to be involved.

> BTW, is it possible for pg_ctl to disable OOM-killer?  I guess not,
> since it's not run by root ...

No, it has to run as root.

> Is there a way to disable memory overcommit for particular processes?
> That would be very useful -- just disable overcommit for all Postgres
> processes, and there shouldn't be much need to enable the killer for
> backends.

Not that I've been able to find.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 16:59:46
Message-ID: 15030.1262624386@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Is there a way to disable memory overcommit for particular processes?

I would think not --- the very essence of overcommit is that you're
promising more total memory than the system has got, and that's
inherently a global proposition.
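
Since the setting is system-wide, the most a program can do is inspect it. A minimal sketch, assuming the usual Linux location /proc/sys/vm/overcommit_memory and its documented values 0 (heuristic), 1 (always overcommit) and 2 (strict accounting); read_overcommit_mode() is a hypothetical helper and takes the path as a parameter so it can be tried against an ordinary file:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read the system-wide overcommit mode.
 * Returns 0, 1 or 2 as found in the file, or -1 if the file
 * cannot be opened or does not contain one of those values.
 */
static int
read_overcommit_mode(const char *path)
{
    FILE *f;
    int   mode = -1;

    assert(path != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    /* Anything that is not a known mode is reported as -1. */
    if (fscanf(f, "%d", &mode) != 1 || mode < 0 || mode > 2)
        mode = -1;

    fclose(f);
    return mode;
}
```

A result of 2 from read_overcommit_mode("/proc/sys/vm/overcommit_memory") would indicate that overcommit has been disabled for the whole machine, not for any one process.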

regards, tom lane


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 17:00:28
Message-ID: 20100104170028.GK3778@alvh.no-ip.org

Magnus Hagander wrote:
> On Mon, Jan 4, 2010 at 17:55, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:

> > BTW, is it possible for pg_ctl to disable OOM-killer?  I guess not,
> > since it's not run by root ...
>
> No, it has to run as root.

We could at least make it work on Windows, since it is often run as
Administrator and drops privileges afterwards ...

... oh, wait ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 18:09:43
Message-ID: 16157.1262628583@sss.pgh.pa.us

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> I realize this is a very platform-specific thing, but should we
> consider setting the value of /proc/<pid>/oom_adj when running on
> linux? See:
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/proc.txt;h=220cc6376ef80e0c9bcfec162d45552e729cdf5a;hb=45d28b097280a78893ce25a5d0db41e6a2717853

One interesting thing I read there is:

    Swapped out tasks are killed first. Half of each child's memory size is
    added to the parent's score if they do not share the same memory.
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This suggests that PG's shared memory ought not be counted in the
postmaster's OOM score, which would mean that the problem shouldn't be
quite as bad as we've believed. I wonder if that is a recent change?
Or maybe it's supposed to be that way and is not implemented correctly?

BTW, the given link shows only chapter 1, see

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob_plain;f=Documentation/filesystems/proc.txt;hb=45d28b097280a78893ce25a5d0db41e6a2717853

for the whole file.

regards, tom lane


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Magnus Hagander" <magnus(at)hagander(dot)net>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 18:32:36
Message-ID: 4B41DFE4020000250002DCB2@gw.wicourts.gov

Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> This suggests that PG's shared memory ought not be counted in the
> postmaster's OOM score, which would mean that the problem
> shouldn't be quite as bad as we've believed. I wonder if that is
> a recent change? Or maybe it's supposed to be that way and is not
> implemented correctly?

I've wondered about that based on my experience. When I found that
memory leak back in 8.2devel, running on a SLES 9 SP 3 system, the
OOM killer killed the offending backend rather than the postmaster,
although it took out a couple Java middle tier processes before
starting in on PostgreSQL.

-Kevin


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-04 21:34:45
Message-ID: 4B425EF5.80409@cheapcomplexdevices.com

Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>> ...oom_adj...
>
> One interesting thing I read there is:
>     Swapped out tasks are killed first. Half of each child's memory size is
>     added to the parent's score if they do not share the same memory.
>                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This suggests that PG's shared memory ought not be counted in the
> postmaster's OOM score, which would mean that the problem shouldn't be
> quite as bad as we've believed. I wonder if that is a recent change?
> Or maybe it's supposed to be that way and is not implemented correctly?

The code for oom_kill.c looks fairly readable (link below [1]):

    points = mm->total_vm;
    ....
    list_for_each_entry(child, &p->children, sibling) {
        task_lock(child);
        if (child->mm != mm && child->mm)
            points += child->mm->total_vm/2 + 1;
        task_unlock(child);
    }

That seems to add points for each child that doesn't share the
same mm structure as the parent, which I think is a considerably
stricter interpretation of "if they do not share the same memory".
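
To make the consequence concrete, here is a user-space sketch of just that accumulation step. The struct names and base_points() are simplified stand-ins invented for illustration, not the kernel's types, and the rest of the real scoring is left out. A forked child normally gets its own mm even when it attaches the same shared memory segment as its parent, so under this reading every such child contributes half its total_vm plus one.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct mm_sketch   { unsigned long total_vm; };
struct task_sketch { struct mm_sketch *mm; };

/*
 * Start from the parent's own total_vm, then add half of each child's
 * total_vm (plus one) for children whose mm differs from the parent's.
 * Children without an mm, or sharing the parent's mm, add nothing.
 */
static unsigned long
base_points(const struct task_sketch *parent,
            const struct task_sketch *children, size_t nchildren)
{
    unsigned long points;
    size_t        i;

    assert(parent != NULL && parent->mm != NULL);

    points = parent->mm->total_vm;
    for (i = 0; i < nchildren; i++)
    {
        const struct mm_sketch *cmm = children[i].mm;

        if (cmm != NULL && cmm != parent->mm)
            points += cmm->total_vm / 2 + 1;
    }
    return points;
}
```

With a parent of 1000 pages and one separately mapped child of 400 pages, the sketch yields 1000 + 400/2 + 1 = 1201, regardless of how much of the child's 400 pages is shared memory.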

[1] http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=mm/oom_kill.c;h=f52481b1c1e5442c9a5b16b06b22221b75b9bb7c;hb=HEAD


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-07 23:58:16
Message-ID: 34d269d41001071558o38b25919k3deb3c5b4d75c655@mail.gmail.com

On Mon, Jan 4, 2010 at 09:55, Alvaro Herrera <alvherre(at)commandprompt(dot)com> wrote:
> Magnus Hagander wrote:
>
>> Right. Which is why I like the idea of disabling the OOM killer for
>> the *postmaster*, but not the regular backends. Gives it a chance to
>> recover. It's not nice, but it's better than nothing.
>
> It doesn't sound like the init script can reenable the killer for the
> child processes though.  So, if there's anything that the core code
> ought to do, it is to re-enable the OOM killer for postmaster children
> after it has been disabled by the initscript.

Exactly.

FWIW here is the patch I run. Stupid as the patch may be, count it as
a +1 for people in the field doing this. Hence a reason to think
about doing something in core, maybe.

This has some oddities: for example, it does not reset oom_adj to 0 for
the (WAL) writer process. Plus, assuming you do hit OOM, the stats
collector has a good chance of being hit, although normal backends will
probably have a higher score.

[ oom_adj gets set to -17 in the startup script. I run this on top of
disabling overcommit, color me paranoid ]

*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 362,367 **** StartAutoVacLauncher(void)
--- 362,370 ----
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+
+ oom_adjust();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);

*** a/src/backend/postmaster/fork_process.c
--- b/src/backend/postmaster/fork_process.c
***************
*** 65,68 **** fork_process(void)
--- 65,84 ----
return result;
}

+ void
+ oom_adjust(void)
+ {
+     /* adjust oom */
+     FILE *oom = fopen("/proc/self/oom_adj", "w");
+
+     /*
+      * ignore errors; we don't really care
+      */
+     if (oom)
+     {
+         fprintf(oom, "0\n");
+         fclose(oom);
+     }
+ }
+
#endif /* ! WIN32 */
*** a/src/backend/postmaster/pgarch.c
--- b/src/backend/postmaster/pgarch.c
***************
*** 161,166 **** pgarch_start(void)
--- 161,169 ----
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+
+ oom_adjust();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);

*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***************
*** 622,627 **** pgstat_start(void)
--- 622,630 ----
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+
+ oom_adjust();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(false);

*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 3056,3061 **** BackendStartup(Port *port)
--- 3056,3063 ----
{
free(bn);

+ oom_adjust();
+
/*
* Let's clean up ourselves as the postmaster child, and close the
* postmaster's listen sockets. (In EXEC_BACKEND case this is all
*** a/src/backend/postmaster/syslogger.c
--- b/src/backend/postmaster/syslogger.c
***************
*** 530,535 **** SysLogger_Start(void)
--- 530,538 ----
#ifndef EXEC_BACKEND
case 0:
/* in postmaster child ... */
+
+ oom_adjust();
+
/* Close the postmaster's sockets */
ClosePostmasterPorts(true);

*** a/src/include/postmaster/fork_process.h
--- b/src/include/postmaster/fork_process.h
***************
*** 13,17 ****
--- 13,18 ----
#define FORK_PROCESS_H

extern pid_t fork_process(void);
+ extern void oom_adjust(void);

#endif /* FORK_PROCESS_H */

Attachment Content-Type Size
oom_adj.patch text/x-patch 2.3 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 03:26:14
Message-ID: 25346.1262921174@sss.pgh.pa.us

Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> FWIW here is the patch I run. Stupid as the patch may be, count it as
> a +1 for people in the field doing this. Hence a reason to think
> about doing something in core. maybe.

Thanks for the patch --- it's certainly a fine starting point.

We can either drop this in core (with a lot of #ifdef LINUX added)
or expect Linux packagers to carry it as a patch. Given that the
packagers would also have to modify their init scripts to go with,
the patch route is not unreasonable. Comments?

> This has some oddities like it does not reset oom to 0 for the (wal)
> writer process.

FWIW, I think that's probably a feature --- I'd vote for only resetting
in regular backends and possibly autovac workers.
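
One way to express that policy is a single predicate consulted by each child after fork. This is only a sketch of the idea: the ChildKind enum and should_reset_oom_adj() are hypothetical names made up for illustration and do not correspond to anything in the PostgreSQL source.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical process kinds; not taken from the PostgreSQL source. */
typedef enum
{
    CHILD_BACKEND,
    CHILD_AUTOVAC_WORKER,
    CHILD_AUTOVAC_LAUNCHER,
    CHILD_WAL_WRITER,
    CHILD_STATS_COLLECTOR,
    CHILD_ARCHIVER,
    CHILD_SYSLOGGER
} ChildKind;

/*
 * Only regular backends and autovacuum workers go back to the default
 * OOM adjustment; every other child keeps the protection it inherited
 * from the postmaster.
 */
static bool
should_reset_oom_adj(ChildKind kind)
{
    assert(kind >= CHILD_BACKEND && kind <= CHILD_SYSLOGGER);

    switch (kind)
    {
        case CHILD_BACKEND:
        case CHILD_AUTOVAC_WORKER:
            return true;
        default:
            return false;
    }
}
```

Keeping the decision in one place would leave the individual fork sites free of policy; whether autovacuum workers belong in the "reset" group is exactly the open question above.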

regards, tom lane


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 03:46:23
Message-ID: 34d269d41001071946u72dee551kccb1182aebfcee61@mail.gmail.com

On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alex Hunsaker <badalex(at)gmail(dot)com> writes:

> We can either drop this in core (with a lot of #ifdef LINUX added)

Any thoughts on doing something like (in fork_process.c)

#ifdef LINUX
void oom_adjust()
{
...
}
#else
void oom_adjust() {}
#endif

So there is only one #ifdef? It still leaves the ugly calls to the function...

> or expect Linux packagers to carry it as a patch.  Given that the
> packagers would also have to modify their init scripts to go with,
> the patch route is not unreasonable.  Comments?

I'd +1 for core. The problem certainly does not look to be going
away soon (if ever).

>> This has some oddities like it does not reset oom to 0 for the (wal)
>> writer process.
>
> FWIW, I think that's probably a feature --- I'd vote for only resetting
> in regular backends and possibly autovac workers.

I think that makes sense, +1. In fact that's why the patch has it as a
separate function instead of hacked into fork_process(). However, it's
mainly odd because IIRC I grepped for all instances of fork_process()
and added the oom_adjust() call to the callers. Given that, it seems the
WAL writer procs should also be set to 0. My guess is it's a race
with my startup script launching postgres and then setting oom_adj. Or
maybe I missed a caller? Maybe they don't use fork_process()? I'll
check it out.


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 12:07:21
Message-ID: 9837222c1001080407j2769d098m84885a6863889698@mail.gmail.com

On Fri, Jan 8, 2010 at 04:46, Alex Hunsaker <badalex(at)gmail(dot)com> wrote:
> On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Alex Hunsaker <badalex(at)gmail(dot)com> writes:
>
>> We can either drop this in core (with a lot of #ifdef LINUX added)
>
> Any thoughts on doing something like (in fork_process.c)
>
> #ifdef LINUX
> void oom_adjust()
> {
> ...
> }
> #else
> void oom_adjust() {}
> #endif
>
> So there is only one #ifdef?  It still leaves the ugly calls to the function...

Seems like a much better way, yes. Especially if we in the future want
to do this for more than one platform (if it becomes necessary).

>> or expect Linux packagers to carry it as a patch.  Given that the
>> packagers would also have to modify their init scripts to go with,
>> the patch route is not unreasonable.  Comments?
>
> I'd +1 for core.  The problem certainly does not look to be going
> away soon (if ever).

Yeah, I think core is better. It's not like it's enough code to cause
a huge maintenance problem, I think.

Do we need to make the value configurable? I'd certainly find it
interesting to set backends to say 5 or something like that, that
makes them less likely to be killed than any old "oops opened too big
file in an editor"-process, but still possible to kill if the system
is *really* running out of memory.

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 14:27:53
Message-ID: 5140.1262960873@sss.pgh.pa.us
Lists: pgsql-hackers

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Do we need to make the value configurable? I'd certainly find it
> interesting to set backends to say 5 or something like that, that
> makes them less likely to be killed than any old "oops opened too big
> file in an editor"-process, but still possible to kill if the system
> is *really* running out of memory.

I don't want to go to the trouble of creating (and documenting) a
configure option for this. Much less a GUC ;-)

What I suggest is that we do something like

#ifdef LINUX_OOM_ADJ
...
fprintf(oom, "%d\n", LINUX_OOM_ADJ);
...
#endif

Then, somebody who wants the feature would build with, say,
-DLINUX_OOM_ADJ=0
or another value if they want that.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 14:53:59
Message-ID: 201001081453.o08Erxt01807@momjian.us
Lists: pgsql-hackers

Alex Hunsaker wrote:
> On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Alex Hunsaker <badalex(at)gmail(dot)com> writes:
>
> > We can either drop this in core (with a lot of #ifdef LINUX added)
>
> Any thoughts on doing something like (in fork_process.c)
>
> #ifdef LINUX
> void oom_adjust()
> {
> ...
> }
> #else
> void oom_adjust() {}
> #endif
>
> So there is only one #ifdef? It still leaves the ugly calls to the function...

The usual solution for this kind of thing is:

#ifdef LINUX
#define OOM_ADJUST oom_adjust()
#else
#define OOM_ADJUST do {} while (0)
#endif

so there is no call or dummy function and you reference it in the code
as:

OOM_ADJUST;

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
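
A compilable sketch of the macro pattern from the message above. It shows that the macro expands to one complete statement in both configurations, so it can be used unbraced in an if/else. DEMO_LINUX, start_child() and the call counter are hypothetical names used only for this illustration; they are not PostgreSQL symbols.

```c
#include <stdio.h>

/* DEMO_LINUX stands in for the platform test; it is an assumption here. */
#define DEMO_LINUX

/* Counter so that the effect of the call is observable in this sketch. */
static int oom_adjust_calls = 0;

static void
oom_adjust(void)
{
    oom_adjust_calls++;
}

#ifdef DEMO_LINUX
#define OOM_ADJUST oom_adjust()
#else
#define OOM_ADJUST do {} while (0)
#endif

/*
 * The macro is used as an unbraced if-body followed by else; either
 * expansion forms exactly one statement once the semicolon is added.
 */
static int
start_child(int is_backend)
{
    if (is_backend)
        OOM_ADJUST;
    else
        puts("not a backend");
    return oom_adjust_calls;
}
```

Removing the `#define DEMO_LINUX` line selects the empty expansion; the call sites stay unchanged, which is the point of the pattern.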


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 15:24:05
Message-ID: 34d269d41001080724w4e70deb4n701f549bfb727c31@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 8, 2010 at 07:53, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Alex Hunsaker wrote:
>> On Thu, Jan 7, 2010 at 20:26, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> > Alex Hunsaker <badalex(at)gmail(dot)com> writes:

> The usual solution for this kind of thing is:
>
>        #ifdef LINUX
>        #define OOM_ADJUST oom_adjust()
>        #else
>        #define OOM_ADJUST do {} while (0)
>        #endif
>
> so there is no call or dummy function and you reference it in the code
> as:

Surely any compiler worth its salt would turn a call to an empty void
function into a noop? Then again maybe I just hate macros :)


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 16:01:31
Message-ID: 34d269d41001080801w5cb66147h8457699f88c19b03@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Then, somebody who wants the feature would build with, say,
>        -DLINUX_OOM_ADJ=0
> or another value if they want that.

Here is a stab at that.

It sets oom_adj for:
autovacuum workers
archivers (pgarch.c)
regular backends

Also it updates the contrib linux startup script to start under an oom_adj of -17

Comments?

*** a/contrib/start-scripts/linux
--- b/contrib/start-scripts/linux
***************
*** 53,58 **** DAEMON="$prefix/bin/postmaster"
--- 53,63 ----
# What to use to shut down the postmaster
PGCTL="$prefix/bin/pg_ctl"

+ # Adjust oom_adj on linux to keep the postmaster from being killed
+ # note you probably want to compile postgres with -DLINUX_OOM_ADJ=0
+ # so that regular backends will be killed on oom
+ OOM_ADJ=-17
+
set -e

# Only start if we can find the postmaster.
***************
*** 62,67 **** test -x $DAEMON || exit 0
--- 67,73 ----
case $1 in
start)
echo -n "Starting PostgreSQL: "
+ echo $OOM_ADJ > /proc/self/oom_adj
su - $PGUSER -c "$DAEMON -D '$PGDATA' &" >>$PGLOG 2>&1
echo "ok"
;;
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
***************
*** 1403,1408 **** StartAutoVacWorker(void)
--- 1403,1411 ----
/* Lose the postmaster's on-exit routines */
on_exit_reset();

+ /* allow us to be killed on oom */
+ oom_adjust();
+
AutoVacWorkerMain(0, NULL);
break;
#endif
*** a/src/backend/postmaster/fork_process.c
--- b/src/backend/postmaster/fork_process.c
***************
*** 66,68 **** fork_process(void)
--- 66,97 ----
}

#endif /* ! WIN32 */
+
+ #if defined(__linux__) && defined(LINUX_OOM_ADJ)
+ /*
+ * By default linux really likes to kill the postmaster on oom.
+ * Because linux does not take SYSV shared mem into account, it
+ * will (almost) always SIGKILL the postmaster on an oom event.
+ *
+ * If the postmaster was started under a low (negative) oom_adj,
+ * this adjusts regular backends, autovac and archivers to
+ * LINUX_OOM_ADJ on fork(), allowing them to be killed in an oom
+ * event.
+ *
+ * Later we might add other OS oom type stuff here.
+ */
+ void
+ oom_adjust(void)
+ {
+ FILE *oom = fopen("/proc/self/oom_adj", "w");
+
+ /* ignore errors; we don't really care */
+ if (oom)
+ {
+ fprintf(oom, "%d\n", LINUX_OOM_ADJ);
+ fclose(oom);
+ }
+ }
+ #else
+ void oom_adjust(void) { }
+ #endif
*** a/src/backend/postmaster/pgarch.c
--- b/src/backend/postmaster/pgarch.c
***************
*** 170,175 **** pgarch_start(void)
--- 170,178 ----
/* Drop our connection to postmaster's shared memory, as well */
PGSharedMemoryDetach();

+ /* allow us to be killed on oom */
+ oom_adjust();
+
PgArchiverMain(0, NULL);
break;
#endif
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
***************
*** 3076,3081 **** BackendStartup(Port *port)
--- 3076,3084 ----
/* Perform additional initialization and collect startup packet */
BackendInitialize(port);

+ /* allow us to be killed on oom */
+ oom_adjust();
+
/* And run the backend */
proc_exit(BackendRun(port));
}
*** a/src/include/postmaster/fork_process.h
--- b/src/include/postmaster/fork_process.h
***************
*** 13,17 ****
--- 13,18 ----
#define FORK_PROCESS_H

extern pid_t fork_process(void);
+ extern void oom_adjust(void);

#endif /* FORK_PROCESS_H */

Attachment Content-Type Size
oom_adjust_v2.patch text/x-patch 3.0 KB
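
For readers who want to try the core of the patch outside the PostgreSQL tree, here is a minimal sketch of the same write-and-ignore-errors logic. The function name oom_adjust_to() and its path and value parameters are assumptions added so that it can be pointed at an ordinary file; the patch itself hard-codes /proc/self/oom_adj and LINUX_OOM_ADJ.

```c
#include <stdio.h>

/*
 * Write "value\n" to the given file and silently ignore any failure,
 * mirroring the behaviour of oom_adjust() in the patch.
 */
static void
oom_adjust_to(const char *path, int value)
{
    FILE *oom = fopen(path, "w");

    /* a missing or unwritable file is not treated as an error */
    if (oom)
    {
        fprintf(oom, "%d\n", value);
        fclose(oom);
    }
}
```

Calling oom_adjust_to("/proc/self/oom_adj", 0) in a freshly forked child would roughly correspond to building the patch with -DLINUX_OOM_ADJ=0.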

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 16:37:10
Message-ID: 7381.1262968630@sss.pgh.pa.us
Lists: pgsql-hackers

Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Then, somebody who wants the feature would build with, say,
>> -DLINUX_OOM_ADJ=0
>> or another value if they want that.

> Here is a stab at that.

Anybody have an objection to this basic approach? I'm in a bit of a
hurry to get something like this into the Fedora RPMs, so barring
objections I'm going to review this, commit it into HEAD, and then
make a back-ported patch I can use with 8.4 in Fedora.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 16:45:50
Message-ID: 201001081645.o08GjoT15764@momjian.us
Lists: pgsql-hackers

Tom Lane wrote:
> Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> > On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Then, somebody who wants the feature would build with, say,
> >> -DLINUX_OOM_ADJ=0
> >> or another value if they want that.
>
> > Here is a stab at that.
>
> Anybody have an objection to this basic approach? I'm in a bit of a
> hurry to get something like this into the Fedora RPMs, so barring
> objections I'm going to review this, commit it into HEAD, and then
> make a back-ported patch I can use with 8.4 in Fedora.

Go for it, but FYI, we need to update our OOM documentation mention
to reflect this change.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 17:03:36
Message-ID: 20100108170336.GR17756@tamriel.snowman.net
Lists: pgsql-hackers

* Magnus Hagander (magnus(at)hagander(dot)net) wrote:
> Do we need to make the value configurable? I'd certainly find it
> interesting to set backends to say 5 or something like that, that
> makes them less likely to be killed than any old "oops opened too big
> file in an editor"-process, but still possible to kill if the system
> is *really* running out of memory.

We do need to make it configurable, in at least the sense of being able
to control if it's done or not. There are some environments where you
won't be able to set it. Perhaps just handling failure gracefully would
work, but I'd be happier if you could just disable it in the config
file.

Thanks,

Stephen


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 17:07:14
Message-ID: 20100108170714.GS17756@tamriel.snowman.net
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> I don't want to go to the trouble of creating (and documenting) a
> configure option for this. Much less a GUC ;-)

Requiring a custom build to disable it would be horrible, in my view.
Or, at best, just means that the packagers won't enable it, which
obviously would be less than ideal.

Sorry if it's a pain, but I think it needs to either be configurable or
not done. As I said before, it definitely needs to handle failure
gracefully, but I worry that even that won't be sufficient in some
cases. Just thinking about how we run PG under VServers and Linux
Containers and whatnot, we ran into some issues with OpenSSH trying to
monkey with proc values and I'd really hate to run into the same issues
with PG.

Thanks,

Stephen


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 17:11:00
Message-ID: 34d269d41001080911u2d6fca4dofc02bf9cd143f1b2@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 8, 2010 at 10:07, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> I don't want to go to the trouble of creating (and documenting) a
>> configure option for this.  Much less a GUC ;-)
>
> Requiring a custom build to disable it would be horrible, in my view.
> Or, at best, just means that the packagers won't enable it, which
> obviously would be less than ideal.

FWIW I agree.

> Sorry if it's a pain, but I think it needs to either be configurable or
> not done.  As I said before, it definitely needs to handle failure
> gracefully, but I worry that even that won't be sufficient in some
> cases.  Just thinking about how we run PG under VServers and Linux
> Containers and whatnot, we ran into some issues with OpenSSH trying to
> monkey with proc values and I'd really hate to run into the same issues
> with PG.

As long as the VM/container you are running under won't kill postmaster
for trying to access proc-- the patch I posted should work fine. It
just ignores any error (I assumed you might be running in a chroot
without proc or some such).


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 17:12:13
Message-ID: 20100108171213.GT17756@tamriel.snowman.net
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> > On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Then, somebody who wants the feature would build with, say,
> >>        -DLINUX_OOM_ADJ=0
> >> or another value if they want that.
>
> > Here is a stab at that.
>
> Anybody have an objection to this basic approach? I'm in a bit of a
> hurry to get something like this into the Fedora RPMs, so barring
> objections I'm going to review this, commit it into HEAD, and then
> make a back-ported patch I can use with 8.4 in Fedora.

Whoa, I would caution against doing that without being very confident
it won't break when installed under things like VServer, Linux
Containers, SELinux configurations, etc, when back-porting it. I don't
expect people would be too pleased to discover their "nice, simple,
no-expected-issues" upgrade of a minor point release to all of a sudden
mean their database doesn't start anymore..

Sorry I haven't got time right now to run down the issue I had with
OpenSSH doing something similar, and it might have just been poor coding
in OpenSSH, but I wanted to voice my concern.

Thanks,

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 17:24:18
Message-ID: 8716.1262971458@sss.pgh.pa.us
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> I don't want to go to the trouble of creating (and documenting) a
>> configure option for this. Much less a GUC ;-)

> Requiring a custom build to disable it would be horrible, in my view.
> Or, at best, just means that the packagers won't enable it, which
> obviously would be less than ideal.

I'm a packager, and I think that this approach is perfectly fine.
The place where the rubber meets the road is in the init script,
which is the packager's responsibility. If the packager is going
to provide an init script that sets oom_adj in the first place,
he can turn on the compensation code inside the binary. If not,
the compensation code has no purpose anyhow. There are no moving
parts in this as far as the end user is concerned.

> Sorry if it's a pain, but I think it needs to either be configurable or
> not done. As I said before, it definitely needs to handle failure
> gracefully,

We just ignore any error from the attempt to write to /proc.

> but I worry that even that won't be sufficient in some
> cases. Just thinking about how we run PG under VServers and Linux
> Containers and whatnot,

I think you are missing the point that the code won't even be compiled
except on platforms where the packager has determined that it's sensible
to have it.

regards, tom lane


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 17:24:39
Message-ID: 20100108172439.GV17756@tamriel.snowman.net
Lists: pgsql-hackers

Alex,

* Alex Hunsaker (badalex(at)gmail(dot)com) wrote:
> As long as the VM/container you are running under won't kill postmaster
> for trying to access proc-- the patch I posted should work fine. It
> just ignores any error (I assumed you might be running in a chroot
> without proc or some such).

As I recall, oom_adj wasn't visible in the container because you
explicitly set what proc entries processes can have access to when using
VServers, and OpenSSH didn't handle that cleanly. I guess what I'm
saying is "don't just assume everything is as it would be on a 'normal'
system when dealing with /proc and friends", and, of course, test, test,
test when you're talking about back-porting things.

Thanks,

Stephen
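
One way to act on the advice in the message above is to treat a missing or forbidden proc entry as an expected outcome rather than a failure. The sketch below is an illustration only: try_oom_adj(), classify_errno(), the OomAdjResult enum and the path parameter are hypothetical and are not part of the patch under discussion, which simply ignores errors.

```c
#include <errno.h>
#include <stdio.h>

typedef enum
{
    OOM_ADJ_OK,             /* value written */
    OOM_ADJ_UNAVAILABLE,    /* entry not present, e.g. hidden in a container */
    OOM_ADJ_DENIED,         /* entry present, but writing is not permitted */
    OOM_ADJ_OTHER           /* any other error */
} OomAdjResult;

static OomAdjResult
classify_errno(int err)
{
    if (err == ENOENT || err == ENOTDIR)
        return OOM_ADJ_UNAVAILABLE;
    if (err == EACCES || err == EPERM)
        return OOM_ADJ_DENIED;
    return OOM_ADJ_OTHER;
}

static OomAdjResult
try_oom_adj(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int   failed;
    int   saved_errno;

    if (f == NULL)
        return classify_errno(errno);

    /* stdio buffers the write, so a rejected write may only show up at fclose */
    errno = 0;
    failed = (fprintf(f, "%d\n", value) < 0);
    saved_errno = errno;
    if (fclose(f) != 0)
    {
        if (!failed)
            saved_errno = errno;
        failed = 1;
    }
    return failed ? classify_errno(saved_errno) : OOM_ADJ_OK;
}
```

A caller could log the result at a low level and carry on in every case, so a container that hides the entry never prevents startup.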


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 17:34:29
Message-ID: 4B476CA5.5080007@dunslane.net
Lists: pgsql-hackers

Tom Lane wrote:
> I don't want to go to the trouble of creating (and documenting) a
> configure option for this. Much less a GUC ;-)
>
> What I suggest is that we do something like
>
> #ifdef LINUX_OOM_ADJ
> ...
> fprintf(oom, "%d\n", LINUX_OOM_ADJ);
> ...
> #endif
>
> Then, somebody who wants the feature would build with, say,
> -DLINUX_OOM_ADJ=0
> or another value if they want that.
>
>

+1 for this. Looks like a sound approach.

cheers

andrew


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 17:37:11
Message-ID: 603c8f071001080937re102634wbea89c8424896d2@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 8, 2010 at 12:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Stephen Frost <sfrost(at)snowman(dot)net> writes:
>> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>>> I don't want to go to the trouble of creating (and documenting) a
>>> configure option for this.  Much less a GUC ;-)
>
>> Requiring a custom build to disable it would be horrible, in my view.
>> Or, at best, just means that the packagers won't enable it, which
>> obviously would be less than ideal.
>
> I'm a packager, and I think that this approach is perfectly fine.
> The place where the rubber meets the road is in the init script,
> which is the packager's responsibility.  If the packager is going
> to provide an init script that sets oom_adj in the first place,
> he can turn on the compensation code inside the binary.  If not,
> the compensation code has no purpose anyhow.  There are no moving
> parts in this as far as the end user is concerned.

There could well be moving parts if the user wants to adjust the value
being written to oom_adj, and can't because it's compiled in. I don't
see why we can't just add a GUC for this and be done with it.

...Robert


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 17:45:46
Message-ID: 9191.1262972746@sss.pgh.pa.us
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> * Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
>> I don't want to go to the trouble of creating (and documenting) a
>> configure option for this. Much less a GUC ;-)

> Requiring a custom build to disable it would be horrible, in my view.

BTW, maybe you're confused about the intention here? You'd need a
custom build to *enable* it, not to disable it.

regards, tom lane
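
To make the enable-versus-disable point concrete, here is a sketch of the compile-time gate, using the same #if test as the posted patch. oom_adjust_compiled_in() is a hypothetical helper added only so the result can be inspected. In a default build, with no -DLINUX_OOM_ADJ=... on the compiler command line, the function body is empty and nothing touches /proc.

```c
#include <stdio.h>

#if defined(__linux__) && defined(LINUX_OOM_ADJ)
/* Only built when the packager passes e.g. -DLINUX_OOM_ADJ=0 */
static void
oom_adjust(void)
{
    FILE *oom = fopen("/proc/self/oom_adj", "w");

    /* errors are ignored, as in the patch */
    if (oom)
    {
        fprintf(oom, "%d\n", LINUX_OOM_ADJ);
        fclose(oom);
    }
}
#define OOM_ADJUST_COMPILED_IN 1
#else
/* Default build: the call is a no-op */
static void
oom_adjust(void)
{
}
#define OOM_ADJUST_COMPILED_IN 0
#endif

/* Hypothetical helper: reports which branch was compiled */
static int
oom_adjust_compiled_in(void)
{
    return OOM_ADJUST_COMPILED_IN;
}
```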


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Alex Hunsaker <badalex(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 17:56:28
Message-ID: 9376.1262973388@sss.pgh.pa.us
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> There could well be moving parts if the user wants to adjust the value
> being written to oom_adj, and can't because it's compiled in. I don't
> see why we can't just add a GUC for this and be done with it.

The number of users who will want to do that might be different from
epsilon, but not by enough to justify a GUC. Furthermore, we don't have
any reasonable infrastructure for supporting platform-specific GUCs,
which means that the amount of effort you're proposing is extremely
large.

regards, tom lane


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 19:30:44
Message-ID: 34d269d41001081130q4b8f4ac0v9139eb0f29e05a7c@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 8, 2010 at 10:24, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> As I recall, oom_adj wasn't visible in the container because you
> explicitly set what proc entries processes can have access to when using
> VServers, and OpenSSH didn't handle that cleanly.  I guess what I'm
> saying is "don't just assume everything is as it would be on a 'normal'
> system when dealing with /proc and friends", and, of course, test, test,
> test when you're talking about back-porting things.

Sure this was openssh? I just looked through the entire cvs history
for openssh and found 0 references to 'oom' let alone 'oom_adj'.
Maybe something distro specific?

[ for the curious here is what I tried ]
$ git clone git://git.infradead.org/openssh.git
$ cd openssh
$ git grep oom_adj
$ git grep 'oom'
$ git grep oom | grep -vi loomis | grep -v room
$ git log -p | grep oom | grep -vi loomis | grep -v room | grep -v tsoome


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 19:45:05
Message-ID: 11220.1262979905@sss.pgh.pa.us
Lists: pgsql-hackers

Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> Sure this was openssh? I just looked through the entire cvs history
> for openssh and found 0 references to 'oom' let alone 'oom_adj'.
> Maybe something distro specific?

FWIW, I see no evidence that sshd on Fedora does anything to change its
oom score --- the oom_adj file reads as zero for both the parent daemon
and its children. Kinda scary to realize the OOM killer could easily
lock me out of boxes I run headless.

regards, tom lane


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 20:22:38
Message-ID: 34d269d41001081222k11200dbfjf3176af2b4ffac18@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 8, 2010 at 12:45, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alex Hunsaker <badalex(at)gmail(dot)com> writes:
>> Sure this was openssh? I just looked through the entire cvs history
>> for openssh and found 0 references to 'oom' let alone 'oom_adj'.
>> Maybe something distro specific?
>
> FWIW, I see no evidence that sshd on Fedora does anything to change its
> oom score --- the oom_adj file reads as zero for both the parent daemon
> and its children.  Kinda scary to realize the OOM killer could easily
> lock me out of boxes I run headless.

[ OT, CC trimmed]

Yeah, for me sshd has a score of 24 and has the 13th lowest oom_score
on my box. While postgres with 0 connections and shared_buffers =
28MB has a score of 26558 and has the 5th highest oom_score. Only
chromium and xmonad are above it. With 5 connections it just about
doubles its score (to 47238). Had I been headless postgres would
certainly die on oom. But even something like alsamixer or bash is
higher than sshd *shrug*.

For the curious, below is the raw data and how I generated it [
yes, it's far from perfect... ]:
(for file in /proc/*/; do echo `cat $file/oom_score` `cat $file/cmdline` $file; done) | sort -n

6 /sbin/agetty-838400tty1linux /proc/2204/
6 /sbin/agetty-838400tty2linux /proc/1977/
6 /sbin/agetty-838400tty3linux /proc/1978/
6 /usr/sbin/crond /proc/1954/
6 /usr/sbin/uptimed /proc/1971/
10 /usr/sbin/irqbalance /proc/1951/
12 /usr/sbin/ntpd-s /proc/1967/
12 /usr/sbin/smartd /proc/2128/
13 hald-addon-input: Listening on /dev/input/event2 /dev/input/event1 /proc/2044/
13 hald-addon-storage: polling /dev/sr0 (every 2 sec) /proc/2088/
13 /usr/lib/hal/hald-addon-cpufreq /proc/2092/
19 /usr/sbin/syslog-ng /proc/1639/
24 /usr/sbin/sshd /proc/1943/
26 /usr/lib/sa/sadc-F-L-SDISK6006- /proc/2716/
27 supervising syslog-ng /proc/1638/
38 hald-runner /proc/1992/
53 cat/proc/self//cmdline /proc/self/
64 /usr/lib/postfix/master /proc/2089/
90 /usr/bin/X-nolistentcp /proc/2173/
102 -bash /proc/10997/
108 /home/alex/.cabal/bin/xmobar-x1 /proc/2190/
140 /usr/bin/dbus-daemon--fork--print-pid5--print-address7--session /proc/2256/
140 /usr/bin/dbus-daemon--system /proc/1983/
191 /usr/lib/hal/hald-addon-acpi /proc/2093/
194 dbus-launch--autolaunch004d7f457c373938f22d796a4ae05b60--binary-syntax--close-stderr /proc/2255/
199 /usr/sbin/ntpd-s /proc/1960/
206 ssh-agentxmonad /proc/2188/
287 tail-f/var/log/httpd/error.log /proc/8688/
310 firefox /proc/18571/
339 xbindkeys /proc/2192/
341 -bash /proc/10589/
354 /usr/local/bin/cmus /proc/2205/
400 /bin/sh/usr/bin/startx /proc/2155/
456 sort-n /proc/10998/
487 /usr/sbin/hald /proc/1991/
525 qmgr-l-tfifo-u /proc/2097/
639 /bin/sh/home/alex/.xinitrc /proc/2180/
736 /usr/lib/GConf/gconfd-2 /proc/18578/
861 tail-f/var/log/httpd/error.log /proc/1621/
1122 urxvtd-q-f-o /proc/2189/
1151 /usr/lib/chromium/chromium /proc/2220/
1172 -bash /proc/22276/
1351 mutt-y /proc/27637/
1497 alsamixer /proc/3528/
1528 ssh192.168.0.15 /proc/1523/
1534 -bash /proc/2863/
1534 -bash /proc/2881/
1534 -bash /proc/2891/
1793 /usr/lib/chromium/chromium /proc/2219/
1828 ssh70.98.186.4 /proc/2860/
2066 pickup-l-tfifo-u /proc/26254/
2195 postgres: stats collector process /proc/10602/
2281 -bash /proc/3521/
2299 -bash /proc/1516/
2447 -bash /proc/23990/
2573 -bash /proc/1538/
3082 /usr/lib/chromium/chromium --channel=2219.aa9eb00.196295212 --type=renderer --lang=en-US --force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/CacheSize/CacheSizeGroup_6/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/ /proc/2392/
3305 -bash /proc/20490/
3796 /usr/sbin/httpd-kstart /proc/1553/
4697 /usr/bin/knotify4 /proc/31084/
6117 /usr/lib/chromium/chromium --channel=2219.afcb450.251437212 --type=renderer --lang=en-US --force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/CacheSize/CacheSizeGroup_6/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/ /proc/18311/
6564 /proc/self/exe--channel=2219.a684ce8.115295409--type=extension--lang=en-US--force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/ /proc/2242/
6921 /proc/self/exe--channel=2219.a6862e0.261342415--type=extension--lang=en-US--force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/ /proc/2245/
7453 /proc/self/exe--type=plugin--plugin-path=/usr/lib/mozilla/plugins/libflashplayer.so--lang=en-US--plugin-data-dir=/home/alex/.config/chromium/Default--channel=2219.afd16a58.340779570
/proc/18316/
10132 -bash /proc/2195/
10154 postgres: wal writer process /proc/10600/
10154 postgres: writer process /proc/10599/
10299 postgres: autovacuum launcher process /proc/10601/
11019 init [3] /proc/1/
17332 /usr/sbin/httpd-kstart /proc/1616/
17365 /usr/sbin/httpd-kstart /proc/1611/
17366 /usr/sbin/httpd-kstart /proc/1612/
17398 /usr/sbin/httpd-kstart /proc/1613/
17567 /usr/sbin/httpd-kstart /proc/1614/
19954 xinit/home/alex/.xinitrc--/etc/X11/xinit/xserverrc:0-auth/tmp/serverauth.NHLLZS74xg
/proc/2172/
26558 bin/postgres-Dblah /proc/10597/
26882 /proc/self/exe--channel=2219.b1692d18.1483224477--type=extension--lang=en-US--force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/CacheSize/CacheSizeGroup_6/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/
/proc/1597/
52201 /usr/lib/chromium/chromium--type=zygote /proc/2221/
53839 /home/alex/.xmonad/xmonad-i386-linux /proc/2187/
61530 /usr/lib/chromium/chromium --channel=2219.aa993d0.1197730980
--type=renderer --lang=en-US
--force-fieldtest=AsyncSlowStart/_AsyncSlowStart_off/CacheSize/CacheSizeGroup_6/DnsImpact/_max_500ms_queue_prefetch/GlobalSdch/_global_disable_sdch/SocketLateBinding/_enable_late_binding/
/proc/1527/


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 22:24:06
Message-ID: 20100108222406.GW17756@tamriel.snowman.net
Lists: pgsql-hackers

* Tom Lane (tgl(at)sss(dot)pgh(dot)pa(dot)us) wrote:
> Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> > Sure this was openssh? I just looked through the entire cvs history
> > for openssh and found 0 references to 'oom' let alone 'oom_adj'.
> > Maybe something distro specific?
>
> FWIW, I see no evidence that sshd on Fedora does anything to change its
> oom score --- the oom_adj file reads as zero for both the parent daemon
> and its children. Kinda scary to realize the OOM killer could easily
> lock me out of boxes I run headless.

There were a few issues, as it turns out. The particularly annoying one
was in the init script, which caused upgrades to fail because sshd was
not restarted; bug report here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=473573

The other issue was with a Debian-specific patch which was applied to
OpenSSH which basically just created noise in the log file, bug report
here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=487325

In the end, the problem was with errors being returned from attempts to
modify oom_adj. As long as we can just ignore those, hopefully there
won't be any issues.

Thanks,

Stephen


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-08 22:38:22
Message-ID: 14771.1262990302@sss.pgh.pa.us
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> The other issue was with a Debian-specific patch which was applied to
> OpenSSH which basically just created noise in the log file, bug report
> here:

> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=487325

Hmm, that's pretty interesting, specifically this:

: After some discussion with the Linux-Vserver folks, they found some
: interesting information I thought it worth adding. First EPERM is not
: the error that they expected, and that inside a vserver guest its really
: strict about what options you open it with, both O_CREAT and O_TRUNC are
: forbidden, and O_WRONLY lets you write 0\n to it.

That suggests it might be worth our trouble to use open/write rather
than fopen, so that we can ensure the flags are correct to avoid this
type of restriction.

But in any case the main takeaway seems to be to not insist on the
operation succeeding ;-)

regards, tom lane
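
A minimal sketch of the open/write approach suggested above, assuming a hypothetical helper name write_oom_adj (it is not taken from the patch). It opens the file with O_WRONLY only, avoiding the O_CREAT and O_TRUNC flags that the quoted bug report says are forbidden inside a vserver guest, and it reports failure without treating it as fatal:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * write_oom_adj is a hypothetical helper, not a PostgreSQL function.
 * It writes "<value>\n" to the given file using open()/write() so that
 * the open flags are exactly O_WRONLY (no O_CREAT, no O_TRUNC).
 * Returns 0 on success and -1 on any failure; callers may ignore the result.
 */
int
write_oom_adj(const char *path, int value)
{
    char    buf[16];
    int     len;
    int     fd;
    ssize_t written;

    assert(path != NULL);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;          /* e.g. EPERM or ENOENT: not fatal for the caller */

    len = snprintf(buf, sizeof(buf), "%d\n", value);
    written = write(fd, buf, (size_t) len);
    close(fd);

    return (written == (ssize_t) len) ? 0 : -1;
}
```

A caller following the "do not insist on success" advice would write something like `(void) write_oom_adj("/proc/self/oom_adj", 0);` and carry on regardless of the result.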


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-09 03:46:10
Message-ID: 34d269d41001081946tad0030s1fb1be40f9861e03@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 8, 2010 at 15:24, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> There were a few issues, as it turns out, the particularly annoying one
> was in the init script which caused upgrades to fail due to sshd not
> being restarted, bug report here:

Thanks for the pointers!

> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=473573

The changes I proposed to the example Linux startup script won't suffer
from that.

> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=487325
>
> In the end, the problem was with errors being returned from attempts to
> modify oom_adj.  As long as we can just ignore those hopefully there
> won't be any issues.

Yep sounds good.

Thanks again!

Tom, sounds like you got busy with other stuff :) Should I submit a
new patch that uses open and O_WRONLY?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-09 03:57:07
Message-ID: 27996.1263009427@sss.pgh.pa.us
Lists: pgsql-hackers

Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> Tom, sounds like you got busy with other stuff :) Should I submit a
> new patch that uses open and O_WRONLY?

No, I was just waiting to see if there were more comments. I can
take it from here.

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-09 21:06:59
Message-ID: 1263071219.1339.20.camel@vanquo.pezone.net
Lists: pgsql-hackers

On Fri, 2010-01-08 at 11:37 -0500, Tom Lane wrote:
> Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> > On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >> Then, somebody who wants the feature would build with, say,
> >> -DLINUX_OOM_ADJ=0
> >> or another value if they want that.
>
> > Here is a stab at that.
>
> Anybody have an objection to this basic approach? I'm in a bit of a
> hurry to get something like this into the Fedora RPMs, so barring
> objections I'm going to review this, commit it into HEAD, and then
> make a back-ported patch I can use with 8.4 in Fedora.

I find this whole approach a bit evil. If word of this gets out, every
server process on Linux will excuse itself from the OOM killer. And
then the kernel guys will add another setting to override the process
preference. It's an arms race, but maybe that's what's needed.


From: Alex Hunsaker <badalex(at)gmail(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-09 21:53:24
Message-ID: 34d269d41001091353hf69576dx16fc256d42b041e2@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jan 9, 2010 at 14:06, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> I find this whole approach a bit evil.

I would tend to agree but this type of thing has been known since about 2004...

See http://thoughts.j-davis.com/2009/11/29/linux-oom-killer/,
particularly the comment from Greg Smith.

> If word of this gets out, every
> server process on Linux will excuse itself from the OOM killer.  And
> then the kernel guys will add another setting to override the process
> preference.

Yes, and note Debian is already doing that with things like ssh. Who
knows what else. (I'd be curious to know)

Plus maybe it will convince them it's time to fix the damn thing.
Although postgres really is kind of special in this regard. All the
other daemons on my system, including X, had way lower oom scores.
Alsamixer was 3 times more likely to get killed than the first daemon
with the highest score (hald), while postgres was 55 times more likely.
Yes, it's the kernel being stupid, but it's been known for more than 6
years...

(oom scores: alsamixer: 1497, hald: 487, postgres: 26558)

> It's an arms race, but maybe that's what's needed.

Well *shrug* regardless of what core does... I'll certainly be doing it
on my postgres linux builds :) Maybe it would convince them more if
we could get distros to accept patches that fix the kernel to do
correct/better shared mem accounting? May I add good luck? :)
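
For reference, the "3 times" and "55 times" figures in the message above follow from the listed scores when hald's score is taken as the baseline. A small check, with score_ratio and print_ratios as hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: how many times larger one oom score is than another. */
double
score_ratio(long score, long baseline)
{
    assert(baseline > 0);
    return (double) score / (double) baseline;
}

/* Print the two ratios discussed in the message, using the scores it lists. */
void
print_ratios(void)
{
    const long hald = 487;
    const long alsamixer = 1497;
    const long postgres = 26558;

    printf("alsamixer/hald: %.1f\n", score_ratio(alsamixer, hald));
    printf("postgres/hald:  %.1f\n", score_ratio(postgres, hald));
}
```

The ratios come out at roughly 3.1 and 54.5, which the message rounds to 3 and 55.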


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alex Hunsaker <badalex(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-09 22:04:34
Message-ID: 4B48FD72.9030004@dunslane.net
Lists: pgsql-hackers

Peter Eisentraut wrote:
> On Fri, 2010-01-08 at 11:37 -0500, Tom Lane wrote:
>
>> Alex Hunsaker <badalex(at)gmail(dot)com> writes:
>>
>>> On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>>
>>>> Then, somebody who wants the feature would build with, say,
>>>> -DLINUX_OOM_ADJ=0
>>>> or another value if they want that.
>>>>
>>> Here is a stab at that.
>>>
>> Anybody have an objection to this basic approach? I'm in a bit of a
>> hurry to get something like this into the Fedora RPMs, so barring
>> objections I'm going to review this, commit it into HEAD, and then
>> make a back-ported patch I can use with 8.4 in Fedora.
>>
>
> I find this whole approach a bit evil. If word of this gets out, every
> server process on Linux will excuse itself from the OOM killer. And
> then the kernel guys will add another setting to override the process
> preference. It's an arms race, but maybe that's what's needed.
>

The trouble is that the OOM heuristics are pretty bad, and many Linux
hackers aren't interested in improving them. One of the most prominent
told me some years ago "Just turn it off."

And the point of this patch is to allow the postmaster to *remove* OOM
protection from normal postgres backends. We at least would be playing
nice, and not engaging in an arms race.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-09 22:07:29
Message-ID: 17917.1263074849@sss.pgh.pa.us
Lists: pgsql-hackers

Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> On Sat, Jan 9, 2010 at 14:06, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> If word of this gets out, every
>> server process on Linux will excuse itself from the OOM killer. And
>> then the kernel guys will add another setting to override the process
>> preference.

> ... maybe it will convince them it's time to fix the damn thing.
> Although postgres really is kind of special in this regard.

Yeah. If they had saner handling of shared-memory accounting, maybe
there wouldn't be a need for us to kluge around the OOM logic.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alex Hunsaker <badalex(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Setting oom_adj on linux?
Date: 2010-01-11 18:43:27
Message-ID: 17315.1263235407@sss.pgh.pa.us
Lists: pgsql-hackers

Alex Hunsaker <badalex(at)gmail(dot)com> writes:
> On Fri, Jan 8, 2010 at 07:27, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Then, somebody who wants the feature would build with, say,
>> -DLINUX_OOM_ADJ=0
>> or another value if they want that.

> Here is a stab at that.

Applied with some editorialization. I concluded that it'd be better to
put the oom_adj reset right into fork_process, rather than scattering
the support across several different files. The latter seems vulnerable
to errors of omission in future versions, and there's no really strong
reason to not have all the child processes behave the same. Also, a
single-file patch is a lot easier for packagers to borrow and apply to
existing releases, should they choose to. (Already done and tested in
Fedora packages ...)

regards, tom lane
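
The placement described above can be sketched as follows. This is an illustrative stand-in, not the committed code: fork_process_sketch is a hypothetical name, and the reset is compiled in only when LINUX_OOM_ADJ is defined at build time (for example with -DLINUX_OOM_ADJ=0), as proposed earlier in the thread. Every child created through the wrapper gets the same treatment, and all errors are ignored:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * fork_process_sketch is a hypothetical wrapper around fork().
 * In the child only, and only when built with -DLINUX_OOM_ADJ=<value>,
 * it writes that value to /proc/self/oom_adj, ignoring every error.
 * Returns whatever fork() returns.
 */
pid_t
fork_process_sketch(void)
{
    pid_t pid = fork();

    if (pid == 0)
    {
#ifdef LINUX_OOM_ADJ
        /* O_WRONLY only: no O_CREAT or O_TRUNC */
        int fd = open("/proc/self/oom_adj", O_WRONLY);

        if (fd >= 0)
        {
            char    buf[16];
            int     len = snprintf(buf, sizeof(buf), "%d\n", LINUX_OOM_ADJ);
            ssize_t rc;

            assert(len > 0 && len < (int) sizeof(buf));
            rc = write(fd, buf, (size_t) len);
            (void) rc;      /* failure to adjust is deliberately ignored */
            close(fd);
        }
#endif
    }
    return pid;
}
```

Keeping the reset in the one function that creates children means a new kind of child process cannot be added later without picking it up, which is the "errors of omission" concern raised above.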