configurability of OOM killer

Lists:	pgsql-hackers

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	configurability of OOM killer
Date:	2008-02-01 22:33:36
Message-ID:	20080201223336.GC24780@alvh.no-ip.org
Lists:	pgsql-hackers

This page

http://linux-mm.org/OOM_Killer

says that you can hint the OOM killer to be more deferential towards
certain processes.

I am wondering if we can set the system up so that it skips postmaster,
bgwriter etc., and prefers normal backends as victims instead (though
we would still try to give those fewer points than other regular
processes). That could make the system more robust overall, even if the
sysadmin hasn't configured it.

Incidentally, the same page notes that points are subtracted from
processes with raw IO capability; which means *r*cle is probably
avoiding this problem altogether.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 00:08:24
Message-ID:	18162.1201910904@sss.pgh.pa.us
Lists:	pgsql-hackers

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> This page
> http://linux-mm.org/OOM_Killer

Egad. Whoever thought *this* was a good idea should be taken out
and shot:

The independent memory size of any child (except a kernel thread) is added to the score:

/*
* Processes which fork a lot of child processes are likely
* a good choice. We add the vmsize of the childs if they
* have an own mm. This prevents forking servers to flood the
* machine with an endless amount of childs
*/

In other words, server daemons are preferentially killed, and the parent
will *always* get zapped in place of its child (since the child cannot
have a higher score). No wonder we have to turn off OOM kill.

regards, tom lane

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 00:36:54
Message-ID:	47A3BB26.2030602@dunslane.net
Lists:	pgsql-hackers

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>
>> This page
>> http://linux-mm.org/OOM_Killer
>>
>
> Egad. Whoever thought *this* was a good idea should be taken out
> and shot:
>
> The independent memory size of any child (except a kernel thread) is added to the score:
>
> /*
> * Processes which fork a lot of child processes are likely
> * a good choice. We add the vmsize of the childs if they
> * have an own mm. This prevents forking servers to flood the
> * machine with an endless amount of childs
> */
>
> In other words, server daemons are preferentially killed, and the parent
> will *always* get zapped in place of its child (since the child cannot
> have a higher score). No wonder we have to turn off OOM kill.
>
>

That was pretty much my reaction.

And it looks like you can't turn it off for postgres processes because
that works by process group and we call setsid(), so we aren't in a
single process group.

cheers

andrew

From:	Florian Weimer <fweimer(at)bfk(dot)de>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 16:21:50
Message-ID:	82k5lngwgx.fsf@mid.bfk.de
Lists:	pgsql-hackers

* Alvaro Herrera:

> I am wondering if we can set the system up so that it skips postmaster,
> bgwriter etc., and prefers normal backends as victims instead (though
> we would still try to give those fewer points than other regular
> processes). That could make the system more robust overall, even if the
> sysadmin hasn't configured it.

How much does that help? Postmaster &c still need to be shut down
when a regular backend dies due to SIGKILL.

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
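Florian's point depends on how a supervising parent learns that a child was SIGKILLed. As a minimal sketch in plain POSIX C (the helper names reap_child and spawn_victim are hypothetical, and this is not the postmaster's actual code), waitpid() reports a death by signal separately from a normal exit. A postmaster-like parent would rely on that distinction before deciding to reinitialize shared state:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: wait for one child and classify how it ended.
 * Returns the terminating signal number if the child was killed by a
 * signal (the OOM killer sends SIGKILL), 0 if it exited normally
 * (storing the exit code in *exit_code when non-NULL), or -1 on error. */
static int reap_child(pid_t pid, int *exit_code)
{
    int status;

    assert(pid > 0);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    if (WIFEXITED(status)) {
        if (exit_code != NULL)
            *exit_code = WEXITSTATUS(status);
        return 0;
    }
    return -1;
}

/* Hypothetical helper: fork a child that simulates an OOM kill by
 * sending itself SIGKILL. Returns the child's pid, or -1 on failure. */
static pid_t spawn_victim(void)
{
    pid_t pid = fork();

    if (pid == 0) {
        raise(SIGKILL);
        _exit(0);                     /* not reached */
    }
    return pid;
}
```

PostgreSQL's real crash-recovery path does considerably more than this. The sketch covers only the detection step.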

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Florian Weimer <fweimer(at)bfk(dot)de>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 17:40:35
Message-ID:	7788.1201974035@sss.pgh.pa.us
Lists:	pgsql-hackers

Florian Weimer <fweimer(at)bfk(dot)de> writes:
> * Alvaro Herrera:
>> I am wondering if we can set the system up so that it skips postmaster,

> How much does that help? Postmaster &c still need to be shut down
> when a regular backend dies due to SIGKILL.

The $64 problem is that if the parent postmaster process is victimized
by the OOM killer, you won't get an automatic restart. In most people's
eyes that is considerably worse than the momentary DOS imposed by a kill
of a child backend. And what we now find, which is truly staggeringly
stupid on the kernel's part, is that it *preferentially* kills the
parent instead of whatever child might actually be eating the memory.

regards, tom lane

From:	Florian Weimer <fweimer(at)bfk(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 18:14:59
Message-ID:	8263x7gr8c.fsf@mid.bfk.de
Lists:	pgsql-hackers

* Tom Lane:

>> How much does that help? Postmaster &c still need to be shut down
>> when a regular backend dies due to SIGKILL.
>
> The $64 problem is that if the parent postmaster process is victimized
> by the OOM killer, you won't get an automatic restart.

The classic answer to that is to put it into inittab. 8-/

> In most people's eyes that is considerably worse than the momentary
> DOS imposed by a kill of a child backend. And what we now find,
> which is truly staggeringly stupid on the kernel's part, is that it
> *preferentially* kills the parent instead of whatever child might
> actually be eating the memory.

IIRC, the idea is to get the machine out of OOM land with one killed
process, even if it causes dependent processes to fail. No matter
what you do at this point, you lose. If you prefer the child instead
of the parent, the parent might just reattempt the fork() (which
succeeds thanks to COW), and the child runs into the same OOM
condition.

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Florian Weimer <fweimer(at)bfk(dot)de>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 19:15:48
Message-ID:	8953.1201979748@sss.pgh.pa.us
Lists:	pgsql-hackers

Florian Weimer <fweimer(at)bfk(dot)de> writes:
> * Tom Lane:
>> The $64 problem is that if the parent postmaster process is victimized
>> by the OOM killer, you won't get an automatic restart.

> The classic answer to that is to put it into inittab. 8-/

Except that no standard services are actually run that way, for
sundry good-n-sufficient reasons.

>> In most people's eyes that is considerably worse than the momentary
>> DOS imposed by a kill of a child backend. And what we now find,
>> which is truly staggeringly stupid on the kernel's part, is that it
>> *preferentially* kills the parent instead of whatever child might
>> actually be eating the memory.

> IIRC, the idea is to get the machine out of OOM land with one killed
> process, even if it causes dependent processes to fail.

You're just parroting the reasoning given on the cited webpage, which
is loony because it takes no account whatsoever of actual practice.
Postgres is hardly the only daemon for which killing the parent results
in far worse DOS than not doing so. sendmail, sshd, inetd, and mysqld
are examples that come to mind immediately, and I am fairly sure that
it's true for apache as well.

Also, how is killing parent and child less invasive than killing only
the child (which is the one actually eating memory, in these cases)?
The reasoning isn't even self-consistent.

> No matter what you do at this point, you lose.

Well, since the authors of the overcommit logic appear uninterested
in running stable userland services, turning it off is the only way
not to lose.

regards, tom lane

From:	"Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Florian Weimer <fweimer(at)bfk(dot)de>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 19:24:27
Message-ID:	47A4C36B.8060703@phlo.org
Lists:	pgsql-hackers

Tom Lane wrote:
> Florian Weimer <fweimer(at)bfk(dot)de> writes:
>> * Alvaro Herrera:
>>> I am wondering if we can set the system up so that it skips postmaster,
>
>> How much does that help? Postmaster &c still need to be shut down
>> when a regular backend dies due to SIGKILL.
>
> The $64 problem is that if the parent postmaster process is victimized
> by the OOM killer, you won't get an automatic restart. In most people's
> eyes that is considerably worse than the momentary DOS imposed by a kill
> of a child backend. And what we now find, which is truly staggeringly
> stupid on the kernel's part, is that it *preferentially* kills the
> parent instead of whatever child might actually be eating the memory.

Maybe we should just react equally brute-force, and just disable the
OOM-Killer for the postmaster if we're running on linux. It seems that
something like "echo -17 > /proc/<pid>/oom_adj" should do the trick.

And maybe add a note to the docs telling people to disable memory
overcommit on dedicated database servers if that isn't already there...

regards, Florian Pflug
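For reference, Florian's `echo -17 > /proc/<pid>/oom_adj` can be done from C as shown below. This is a minimal sketch with hypothetical helper names (proc_oom_path, write_oom_adjust). The value -17 (OOM_DISABLE) belongs to the oom_adj interface discussed in this thread. Kernels from 2.6.36 onward deprecate that interface in favour of /proc/<pid>/oom_score_adj, where -1000 has the same effect. The adjustment is copied to children at fork(), so a protected process's children start out protected too unless they reset their own value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Legacy /proc/<pid>/oom_adj value that exempts a process (OOM_DISABLE). */
#define OOM_ADJ_DISABLE (-17)

/* Hypothetical helper: build "/proc/<pid>/<file>", e.g. file = "oom_adj"
 * (or "oom_score_adj" on newer kernels). Returns 0, or -1 if buf is too small. */
static int proc_oom_path(char *buf, size_t len, pid_t pid, const char *file)
{
    int n;

    assert(buf != NULL && file != NULL);
    n = snprintf(buf, len, "/proc/%ld/%s", (long) pid, file);
    return (n > 0 && (size_t) n < len) ? 0 : -1;
}

/* Hypothetical helper: the C equivalent of `echo <value> > path`.
 * Lowering the value requires CAP_SYS_RESOURCE (in practice, root).
 * Returns 0 on success, -1 on failure. */
static int write_oom_adjust(const char *path, int value)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (f == NULL)
        return -1;
    ok = fprintf(f, "%d\n", value) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A root-run init script would write to the postmaster's path after starting it. An unprivileged postmaster cannot lower its own value.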

From:	Florian Weimer <fweimer(at)bfk(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 19:39:15
Message-ID:	82wspnf8rg.fsf@mid.bfk.de
Lists:	pgsql-hackers

* Tom Lane:

>> IIRC, the idea is to get the machine out of OOM land with one killed
>> process, even if it causes dependent processes to fail.
>
> You're just parroting the reasoning given on the cited webpage, which
> is loony because it takes no account whatsoever of actual practice.

Oops, I hadn't actually read it (I can't reach the Web from this
terminal).

> Postgres is hardly the only daemon for which killing the parent results
> in far worse DOS than not doing so. sendmail, sshd, inetd, and mysqld
> are examples that come to mind immediately, and I am fairly sure that
> it's true for apache as well.

Historically, the OOM killer was mainly there to avoid a total lock-up
or straight reboot on single-user machines with text-mode console and
the occasional broken shell script. For example, it used to kill the
X server, too. Anyway, a dead SSH session or database server is less
of a DoS than a lock-up due to the OOM killer's inability to recover
resources in a reasonable time frame. (I'd need to check if it
prefers killing the main sshd daemon. That would be rather
inconvenient.)

And let me repeat: If some shell script à la

for x in *; do foo $x; done

is causing the trouble, you need to kill the parent (the shell) to
bring the system back. Killing foo brings only very short-term
relief.

Fortunately, it's possible to turn off overcommitment nowadays, so
it's not such a huge issue anymore (for me, at least). Some
applications are still not fully compatible with this mode (SBCL, for
instance, and the Sun JVM doesn't perform as well as it could,
either), but there are astonishingly few problems with
vm.overcommit_memory=2.

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
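Florian mentions vm.overcommit_memory=2. In that mode the kernel refuses an allocation once total committed address space (Committed_AS in /proc/meminfo) would exceed CommitLimit. CommitLimit is roughly swap plus vm.overcommit_ratio percent of RAM, and the ratio defaults to 50. The arithmetic sketch below (hypothetical function name) ignores huge pages and the newer vm.overcommit_kbytes setting:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: approximate CommitLimit in kB under
 * vm.overcommit_memory=2:
 *     CommitLimit = RAM * overcommit_ratio / 100 + swap
 * Huge pages and vm.overcommit_kbytes are ignored in this sketch. */
static uint64_t commit_limit_kb(uint64_t ram_kb, uint64_t swap_kb,
                                unsigned overcommit_ratio)
{
    assert(ram_kb > 0);
    return ram_kb * overcommit_ratio / 100 + swap_kb;
}
```

For example, a 4 GB box with 2 GB of swap and the default ratio can commit about 4 GB in total. Allocations past that fail up front with ENOMEM instead of succeeding and being reclaimed later by killing something. For that reason the ratio usually has to be tuned together with the mode.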

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	"Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Weimer <fweimer(at)bfk(dot)de>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 20:01:41
Message-ID:	47A4CC25.8080501@dunslane.net
Lists:	pgsql-hackers

Florian G. Pflug wrote:
>
> Maybe we should just react equally brute-force, and just disable the
> OOM-Killer for the postmaster if we're running on linux. It seems that
> something like "echo -17 > /proc/<pid>/oom_adj" should do the trick.

That will protect the postmaster but none of the children. And it will
be very fragile, as only root can do it.

>
> And maybe add a note to the docs telling people to disable memory
> overcommit on dedicated database servers if that isn't already there...
>
>

It is there, and has been for years.

cheers

andrew

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc:	"Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, Florian Weimer <fweimer(at)bfk(dot)de>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 20:17:29
Message-ID:	9756.1201983449@sss.pgh.pa.us
Lists:	pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Florian G. Pflug wrote:
>> Maybe we should just react equally brute-force, and just disable the
>> OOM-Killer for the postmaster if we're running on linux. It seems that
>> something like "echo -17 > /proc/<pid>/oom_adj" should do the trick.

> That will protect the postmaster but none of the children. And it will
> be very fragile, as only root can do it.

However, init-scripts do run as root, so this is something that the RPM
packages could theoretically do. I wonder whether it would be seen as
good packaging practice ;-)

Not protecting the children is probably sane, since it's perfectly
possible for one of them to blow up memory-wise. If you're going
to protect them then there's little point in enabling the OOM killer
at all.

>> And maybe add a note to the docs telling people to disable memory
>> overcommit on dedicated database servers if that isn't already there...

> It is there, and has been for years.

Another thought is to tell people to run the postmaster under a
per-process memory ulimit that is conservative enough so that the
system can't get into the regime where the OOM killer activates.
ulimit actually behaves the way we want, ie, it's polite about
telling you you can't have more memory ;-).

The problem with that is that the DBA has to do the math about what he
can afford as a per-process ulimit, and it seems a fairly error-prone
calculation. Is there any way we could automate it, in whole or
in part? We are certainly capable of setting the ulimit ourselves
if we can figure out what it should be.

regards, tom lane
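Tom's ulimit idea corresponds to RLIMIT_AS, the limit that `ulimit -v` sets. The minimal sketch below uses hypothetical helper names. The limit applies to each process separately and is inherited across fork(). It also counts every mapping, including the shared memory segment that each backend attaches, which is one reason the per-process arithmetic is error-prone. When the cap is reached, malloc() simply returns NULL, the "polite" behaviour Tom describes. The check runs in a forked child so the caller's own limit is left untouched:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: lower this process's soft address-space limit
 * (what `ulimit -v` sets). Children forked afterwards inherit it. */
static int cap_address_space(rlim_t bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
        return -1;                    /* cannot exceed the hard limit */
    rl.rlim_cur = bytes;
    return setrlimit(RLIMIT_AS, &rl);
}

/* Hypothetical helper: in a forked child, cap the address space at
 * `limit` bytes and try to malloc `request` bytes. Returns 1 if the
 * request was refused (NULL), 0 if it succeeded, -1 on error. */
static int refused_under_cap(rlim_t limit, size_t request)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        void *volatile p;             /* volatile: keep the allocation */

        assert(limit > 0);
        if (cap_address_space(limit) != 0)
            _exit(2);
        p = malloc(request);
        _exit(p == NULL ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```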

From:	"Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andrew Dunstan <andrew(at)dunslane(dot)net>, Florian Weimer <fweimer(at)bfk(dot)de>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 20:49:05
Message-ID:	47A4D741.1060604@phlo.org
Lists:	pgsql-hackers

Tom Lane wrote:
> Another thought is to tell people to run the postmaster under a
> per-process memory ulimit that is conservative enough so that the
> system can't get into the regime where the OOM killer activates.
> ulimit actually behaves the way we want, ie, it's polite about
> telling you you can't have more memory ;-).

That will only work if postgres is the only service running on the
machine, though, no? If the postmaster and its children use up 80% of
the available memory, then launching a forkbomb will still lead to the
postmaster being killed (since it will get the most points). Or at least
this is how I interpret the link posted originally.

And *if* postgres is the only service, does setting a ulimit have an
advantage over disabling memory overcommitting?

AFAICS, memory overcommit helps if a program creates 50mb of mostly
read-only data, and then forks 10 times, or if it maps a large amount of
memory but writes to that block only sparsely. Since postgres does
neither, a dedicated postgres server won't see any benefits from
overcommitting memory I'd think.

regards, Florian Pflug

From:	Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Cc:	Florian Weimer <fweimer(at)bfk(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject:	Re: configurability of OOM killer
Date:	2008-02-02 21:37:32
Message-ID:	200802022237.35904.dfontaine@hi-media.com
Lists:	pgsql-hackers

Hi,

On Saturday 02 February 2008 20:39:15, Florian Weimer wrote:
> Oops, I hadn't actually read it (I can't reach the Web from this
> terminal).

I had a friend in the same situation as you seem to be in, and I implemented a
mail bot so that he could access documents on the web to some extent:
http://mbot.nah-ko.org/
http://packages.debian.org/mbot

If you have a "friendly" mail server machine on which to host the mbot software,
you then regain the ability to read online content from any mail-only
terminal setup.

Please bear in mind that I haven't worked on this software for a long time now
(several *years*) and I'm not planning to anytime soon... and it only
relays GET requests, not POST ones, at the moment.

Regards, hope this helps,
--
dim

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	"Florian G(dot) Pflug" <fgp(at)phlo(dot)org>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Florian Weimer <fweimer(at)bfk(dot)de>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-03 11:55:31
Message-ID:	20080203115531.GA11431@svana.org
Lists:	pgsql-hackers

On Sat, Feb 02, 2008 at 09:49:05PM +0100, Florian G. Pflug wrote:
> AFAICS, memory overcommit helps if a program creates 50mb of mostly
> read-only data, and then forks 10 times, or if it maps a large amount of
> memory but writes to that block only sparsely. Since postgres does
> neither, a dedicated postgres server won't see any benefits from
> overcommitting memory I'd think.

While this was probably intended to be funny, postgres does in fact
load 10mb of mostly read-only data (the
binary/libc/ssl/locales/kerberos add up to about 10mb on my machine), and
it subsequently forks a dozen times, once for each connection. So postgres
is *exactly* such a program. If you start preloading
plperl/plpython/etc it grows even faster.

Now, postgres almost certainly will never change much of it so it's not
a big deal, but it could if it wanted to, and that's what overcommit was
designed for: banking on the fact that 99% of the time, that space
isn't written to. Overcommit is precisely what makes forking as cheap
as threads.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Those who make peaceful revolution impossible will make violent revolution inevitable.
> -- John F Kennedy

From:	Andrew Dunstan <andrew(at)dunslane(dot)net>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	"Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Florian Weimer <fweimer(at)bfk(dot)de>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-03 13:16:01
Message-ID:	47A5BE91.5080902@dunslane.net
Lists:	pgsql-hackers

Martijn van Oosterhout wrote:
> On Sat, Feb 02, 2008 at 09:49:05PM +0100, Florian G. Pflug wrote:
>
>> AFAICS, memory overcommit helps if a program creates 50mb of mostly
>> read-only data, and then forks 10 times, or if it maps a large amount of
>> memory but writes to that block only sparsely. Since postgres does
>> neither, a dedicated postgres server won't see any benefits from
>> overcommitting memory I'd think.
>>
>
> While this was probably intended to be funny, postgres does in fact
> load 10mb of mostly read-only data (the
> binary/libc/ssl/locales/kerberos add up to about 10mb on my machine), and
> it subsequently forks a dozen times, once for each connection. So postgres
> is *exactly* such a program. If you start preloading
> plperl/plpython/etc it grows even faster.
>
> Now, postgres almost certainly will never change much of it so it's not
> a big deal, but it could if it wanted to, and that's what overcommit was
> designed for: banking on the fact that 99% of the time, that space
> isn't written to. Overcommit is precisely what makes forking as cheap
> as threads.
>
>
>

1. Isn't most of that space program text in segments marked read-only?
2. I always turn on strict memory accounting on Linux. I haven't noticed
that it has had any performance effect. But it does pretty much do away
with the likelihood of having postgres killed from under me, AFAIK.

cheers

andrew
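Andrew's first question can be checked empirically. The rough, Linux-specific sketch below (hypothetical helper names) sums the private writable mappings listed in /proc/<pid>/maps. It skips read-only text segments and shared mappings, so what remains is approximately what strict accounting has to charge again for each fork() of that process:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse one /proc/<pid>/maps line, e.g.
 *   "0060a000-0060c000 rw-p 0000a000 08:01 123 /bin/cat"
 * Returns the mapping size in bytes if it is private and writable,
 * otherwise 0 (including for lines that do not parse). */
static unsigned long writable_private_bytes(const char *line)
{
    unsigned long start, end;
    char perms[5];

    assert(line != NULL);
    if (sscanf(line, "%lx-%lx %4s", &start, &end, perms) != 3)
        return 0;
    if (perms[1] == 'w' && perms[3] == 'p' && end > start)
        return end - start;
    return 0;
}

/* Hypothetical helper: total over a maps file such as "/proc/self/maps".
 * Returns 0 if the file cannot be opened. */
static unsigned long total_writable_private(const char *maps_path)
{
    char line[512];
    unsigned long total = 0;
    FILE *f = fopen(maps_path, "r");

    if (f == NULL)
        return 0;
    while (fgets(line, sizeof line, f) != NULL)
        total += writable_private_bytes(line);
    fclose(f);
    return total;
}
```

Pointing total_writable_private() at /proc/<postmaster pid>/maps would show how much of the postmaster's footprint is actually writable and private.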

From:	Gregory Stark <stark(at)enterprisedb(dot)com>
To:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc:	"Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Florian Weimer" <fweimer(at)bfk(dot)de>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-03 13:25:45
Message-ID:	873asa9nom.fsf@oxford.xeocode.com
Lists:	pgsql-hackers

"Martijn van Oosterhout" <kleptog(at)svana(dot)org> writes:

> On Sat, Feb 02, 2008 at 09:49:05PM +0100, Florian G. Pflug wrote:
>> AFAICS, memory overcommit helps if a program creates 50mb of mostly
>> read-only data, and then forks 10 times, or if it maps a large amount of
>> memory but writes to that block only sparsely. Since postgres does
>> neither, a dedicated postgres server won't see any benefits from
>> overcommitting memory I'd think.
>
> While this was probably intended to be funny, postgres does in fact
> load 10mb of mostly read-only data (the
> binary/libc/ssl/locales/kerberos add up to about 10mb on my machine), and
> it subsequently forks a dozen times, once for each connection. So postgres
> is *exactly* such a program. If you start preloading
> plperl/plpython/etc it grows even faster.
>
> Now, postgres almost certainly will never change much of it so it's not
> a big deal, but it could if it wanted to

Actually no, at least on Linux the shared library linker maps shared libraries
read-only, so it really can't. Not without changing the mapping at which point
the kernel could adjust its memory counts.

However the reference to plperl and plpython is more apt. At least with perl
on Apache it's quite common to arrange to load as many modules as possible
before forking. That way the worker processes have shared copies of those
modules which, even though they're most certainly in writable memory, are
mostly kept shared.

The real screw case that overcommit is intended for is actually large programs
-- like postgres -- which call fork/exec small programs often. So for example
if you have postgres calling system() it should be allowed to do the fork even
if there aren't many megabytes free because it's only going to exec some small
program like pg_standby. This is especially nasty when you realize that bash
itself is one such large program...

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!
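Greg's fork/exec case is where strict accounting hurts most. A plain fork() of a large process is charged for a copy of all its private writable memory, even if the child immediately execs something small. One common mitigation is posix_spawn(), sketched below with a hypothetical helper name. As we understand it, glibc 2.24 and later implement posix_spawn() on Linux with clone(CLONE_VM | CLONE_VFORK), so no copy of the parent's address space is created or charged. Older C libraries may fall back to fork(), so treat this as an assumption about the platform:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper: run argv[0] (an absolute path) with argv, wait
 * for it, and return its exit status, or -1 if it could not be started
 * or did not exit normally. No shell is involved. */
static int run_small_program(char *const argv[])
{
    pid_t pid;
    int status;

    assert(argv != NULL && argv[0] != NULL);
    if (posix_spawn(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Compared with system(), this also avoids starting a shell, which matters given Greg's remark that bash is itself a large program.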

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	"Florian G(dot) Pflug" <fgp(at)phlo(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Florian Weimer <fweimer(at)bfk(dot)de>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-03 15:56:32
Message-ID:	11881.1202054192@sss.pgh.pa.us
Lists:	pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> Now, postgres almost certainly will never change much of it so it's not
> a big deal, but it could if it wanted to, and that's what overcommit was
> designed for: banking on the fact that 99% of the time, that space
> isn't written to. Overcommit is precisely what makes forking as cheap
> as threads.

Nonsense. Copy-on-write is what makes forking as cheap as threads.

Now it's true that strict accounting requires the kernel to be prepared
to make a lot of page copies that it will never actually need in
practice. In my mind that's what swap space is for: it's the buffer
that the kernel *would* need if there were suddenly a lot more
copies-on-write than it'd been expecting.

As already noted, code pages are generally read-only and need not factor
into the calculation at all. I'm not sure how much potentially-writable
storage is really forked off by the postmaster, but I doubt it's in the
tens-of-MB range.

regards, tom lane
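Tom's distinction between copy-on-write and overcommit can be illustrated with a small sketch (hypothetical function name). After fork(), both processes see the same data, and a write by the child is visible only to the child. With copy-on-write, the kernel makes the private copy at the moment of that write. Strict accounting only reserves room in RAM or swap for such copies in advance, which is the buffer Tom says swap space is for:

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static char cow_buffer[4096];

/* Hypothetical helper: returns 1 if a write made by a forked child stayed
 * private to the child, 0 if not, -1 on error. */
static int child_write_stays_private(void)
{
    int status;
    pid_t pid;

    memset(cow_buffer, 'P', sizeof cow_buffer);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        cow_buffer[0] = 'C';          /* first write: kernel copies this page */
        _exit(cow_buffer[0] == 'C' ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    /* The child saw its own write; the parent's page is unchanged. */
    assert(cow_buffer[sizeof cow_buffer - 1] == 'P');
    return WEXITSTATUS(status) == 0 && cow_buffer[0] == 'P';
}
```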

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 09:39:59
Message-ID:	1202117999.4252.281.camel@ebony.site
Lists:	pgsql-hackers

On Fri, 2008-02-01 at 19:08 -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > This page
> > http://linux-mm.org/OOM_Killer
>
> Egad. Whoever thought *this* was a good idea should be taken out
> and shot:
>
> The independent memory size of any child (except a kernel thread) is added to the score:
>
> /*
> * Processes which fork a lot of child processes are likely
> * a good choice. We add the vmsize of the childs if they
> * have an own mm. This prevents forking servers to flood the
> * machine with an endless amount of childs
> */
>
> In other words, server daemons are preferentially killed, and the parent
> will *always* get zapped in place of its child (since the child cannot
> have a higher score). No wonder we have to turn off OOM kill.

This does look bad.

I think we should fix this problem, though I see the problem as being
Postgres not being able to set and adhere to memory limits. The OS
doesn't favour us on this point, but I think we will be ignored when we
have to explain that we don't strictly control the memory we allocate
and use.

I would like there to be a way for us to say "The server is limited to
using at most X amount of memory." There might be various ways of doing
it, but I'd like us to agree on that as an important goal for 8.4 development.

The benefit of doing this is that we won't have to allocate a certain
percentage of memory as contingency to avoid swapping and OOM killers.
So putting in place a memory limit will effectively increase the
available memory the server has access to and/or limit swapping, either
of which will be a performance increase.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
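One way to read Simon's "at most X amount of memory" goal is an allocator that refuses requests beyond a budget, rather than leaving enforcement to the kernel. The sketch below is hypothetical and per-process only. A server-wide limit across backends would need the counter in shared memory, protected by a lock. It is not PostgreSQL's memory-context API:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-process allocation budget. */
struct mem_budget {
    size_t limit;   /* maximum bytes this process may hold */
    size_t used;    /* bytes currently allocated; invariant: used <= limit */
};

/* Allocate n bytes if the budget allows; otherwise return NULL without
 * touching the heap. */
static void *budget_alloc(struct mem_budget *b, size_t n)
{
    void *p;

    assert(b != NULL && b->used <= b->limit);
    if (n > b->limit - b->used)
        return NULL;
    p = malloc(n);
    if (p != NULL)
        b->used += n;
    return p;
}

/* Release a block previously obtained with budget_alloc(b, n). */
static void budget_free(struct mem_budget *b, void *p, size_t n)
{
    assert(b != NULL && n <= b->used);
    free(p);
    b->used -= n;
}
```

Callers have to treat NULL as an ordinary out-of-memory error, which is the polite failure mode Tom contrasts with the OOM killer.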

From:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 14:00:01
Message-ID:	47A71A61.4030608@cheapcomplexdevices.com
Lists:	pgsql-hackers

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>
>> ... OOM_Killer
>>
>
> Egad. Whoever thought *this* was a good idea should be taken out
> and shot:
>
If I read http://lkml.org/lkml/2007/2/9/275 right, even the
shared memory is counted many times (once per child) for the
parent process, even though it's (obviously) not copy-on-write,
so the shared memory is unlikely to contribute to problems.

I wonder if postgres startup should write something (warning?
at least log?) in the log file if the OOM killer is enabled. I assume
most people who care deeply about their database dying would notice a
warning in log files, while most people who don't mind the OOM killer
also wouldn't be too bothered by extra noise in the file.
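Ron's startup warning could be approximated by reading /proc/sys/vm/overcommit_memory, where 0 means heuristic overcommit (the default), 1 means always overcommit, and 2 means strict accounting. The sketch below is hypothetical: the function names and the message wording are made up for illustration. Note also that mode 2 makes the OOM killer much less likely to fire but does not rule it out entirely:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: read the overcommit mode from `path`
 * (normally "/proc/sys/vm/overcommit_memory"). Returns the mode, or -1
 * if the file cannot be read. */
static int read_overcommit_mode(const char *path)
{
    int mode = -1;
    FILE *f;

    assert(path != NULL);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Hypothetical startup hook: log a warning unless strict accounting is on.
 * Returns 1 if a warning was written, 0 otherwise. */
static int warn_if_oom_killer_likely(const char *path, FILE *log)
{
    int mode = read_overcommit_mode(path);

    if (mode == 2)
        return 0;
    if (mode < 0)
        fprintf(log, "WARNING: could not read %s\n", path);
    else
        fprintf(log, "WARNING: vm.overcommit_memory is %d; the Linux OOM "
                     "killer may terminate the postmaster\n", mode);
    return 1;
}
```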

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 18:57:26
Message-ID:	1202151446.10057.759.camel@dogma.ljc.laika.com
Lists:	pgsql-hackers

> /*
> * Processes which fork a lot of child processes are likely
> * a good choice. We add the vmsize of the childs if they
> * have an own mm. This prevents forking servers to flood the
> * machine with an endless amount of childs
> */
>
> In other words, server daemons are preferentially killed, and the parent
> will *always* get zapped in place of its child (since the child cannot
> have a higher score). No wonder we have to turn off OOM kill.
>

Technically, the child could have a higher score, because it only counts
half of the total vm size of the children. At first glance it's not that
bad of an idea, except that it takes into account the total vm size
(including shared memory), not only memory that is exclusive to the
process in question.

It's pretty easy to see that badness() (the function that determines
which process is killed when the OOM killer is invoked) will count the
same byte of memory many times over when calculating the "badness" of a
process like the postgres daemon. If you have shared_buffers=1GB on a
4GB box, and 100 connections open, badness() apparently thinks
postgresql is using about 50GB of memory. Oops. One would think a VM
hacker would know better.
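Jeff's arithmetic can be reproduced with a simplified model of the memory part of badness() as he describes it: the process's own total VM size plus half the total VM size of each child that has its own mm. The model leaves out CPU time, run time, nice level, and capability adjustments, and it uses MB instead of pages, so it illustrates the effect rather than reproducing the kernel's code. The oom_adj handling follows our understanding of the 2.6-era interface discussed in this thread: positive values shift the score left, negative values shift it right, and -17 exempts the process. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define OOM_DISABLE (-17)

/* Hypothetical simplified model of the memory terms of 2.6-era badness().
 * Sizes are in MB. Returns 0 for an exempt (OOM_DISABLE) process. */
static unsigned long badness_mb(unsigned long own_vm_mb,
                                const unsigned long *child_vm_mb,
                                size_t nchildren, int oom_adj)
{
    unsigned long points = own_vm_mb;

    assert(nchildren == 0 || child_vm_mb != NULL);
    if (oom_adj == OOM_DISABLE)
        return 0;                       /* never selected */
    for (size_t i = 0; i < nchildren; i++)
        points += child_vm_mb[i] / 2;   /* shared_buffers counted again per child */
    if (oom_adj > 0)
        points <<= oom_adj;
    else if (oom_adj < 0)
        points >>= -oom_adj;
    return points;
}
```

With shared_buffers=1GB and 100 backends that each map it, the postmaster scores 1024 + 100 * 512 = 52224 MB, about 51 GB. Any single backend scores roughly 1 GB, so the postmaster is chosen first even though the buffers exist only once.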

I tried bringing this up on LKML several times (Ron Mayer linked to one
of my posts: http://lkml.org/lkml/2007/2/9/275). If anyone has an inside
connection to the linux developer community, I suggest that they raise
this issue.

If you want to experiment, start a postgres process with shared_buffers
set at 25% of the available memory, and then start about 100 idle
connections. Then, start a process that just slowly eats memory, such
that it will invoke the OOM killer after a couple minutes (badness()
takes into account the time the process has been alive, as well, so you
can't just eat memory in a tight loop).

The postgres process will always be killed, and then it will realize
that it didn't alleviate the memory pressure much, and then kill the
runaway process.

Regards,
Jeff Davis

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 19:29:33
Message-ID:	1202153373.4252.555.camel@ebony.site
Lists:	pgsql-hackers

On Mon, 2008-02-04 at 10:57 -0800, Jeff Davis wrote:

> I tried bringing this up on LKML several times (Ron Mayer linked to one
> of my posts: http://lkml.org/lkml/2007/2/9/275). If anyone has an inside
> connection to the linux developer community, I suggest that they raise
> this issue.
>
> If you want to experiment, start a postgres process with shared_buffers
> set at 25% of the available memory, and then start about 100 idle
> connections. Then, start a process that just slowly eats memory, such
> that it will invoke the OOM killer after a couple minutes (badness()
> takes into account the time the process has been alive, as well, so you
> can't just eat memory in a tight loop).
>
> The postgres process will always be killed, and then it will realize
> that it didn't alleviate the memory pressure much, and then kill the
> runaway process.

I think the badness() thing sucks badly too, but if we don't keep our
own house in order then they're not going to listen.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 19:38:44
Message-ID:	1202153924.10057.762.camel@dogma.ljc.laika.com
Lists:	pgsql-hackers

On Mon, 2008-02-04 at 19:29 +0000, Simon Riggs wrote:
> On Mon, 2008-02-04 at 10:57 -0800, Jeff Davis wrote:
>
> > I tried bringing this up on LKML several times (Ron Mayer linked to one
> > of my posts: http://lkml.org/lkml/2007/2/9/275). If anyone has an inside
> > connection to the linux developer community, I suggest that they raise
> > this issue.
> >
> > If you want to experiment, start a postgres process with shared_buffers
> > set at 25% of the available memory, and then start about 100 idle
> > connections. Then, start a process that just slowly eats memory, such
> > that it will invoke the OOM killer after a couple minutes (badness()
> > takes into account the time the process has been alive, as well, so you
> > can't just eat memory in a tight loop).
> >
> > The postgres process will always be killed, and then it will realize
> > that it didn't alleviate the memory pressure much, and then kill the
> > runaway process.
>
> I think the badness() thing sucks badly too, but if we don't keep our
> own house in order then they're not going to listen.

I am missing something, can you elaborate? What is postgresql doing
wrong?

Regards,
Jeff Davis

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 20:06:47
Message-ID:	1202155608.4252.570.camel@ebony.site
Lists:	pgsql-hackers

On Mon, 2008-02-04 at 11:38 -0800, Jeff Davis wrote:
> On Mon, 2008-02-04 at 19:29 +0000, Simon Riggs wrote:
> > On Mon, 2008-02-04 at 10:57 -0800, Jeff Davis wrote:
> >
> > > I tried bringing this up on LKML several times (Ron Mayer linked to one
> > > of my posts: http://lkml.org/lkml/2007/2/9/275). If anyone has an inside
> > > connection to the linux developer community, I suggest that they raise
> > > this issue.
> > >
> > > If you want to experiment, start a postgres process with shared_buffers
> > > set at 25% of the available memory, and then start about 100 idle
> > > connections. Then, start a process that just slowly eats memory, such
> > > that it will invoke the OOM killer after a couple minutes (badness()
> > > takes into account the time the process has been alive, as well, so you
> > > can't just eat memory in a tight loop).
> > >
> > > The postgres process will always be killed, and then it will realize
> > > that it didn't alleviate the memory pressure much, and then kill the
> > > runaway process.
> >
> > I think the badness() thing sucks badly too, but if we don't keep our
> > own house in order then they're not going to listen.
>
> I am missing something, can you elaborate? What is postgresql doing
> wrong?

We make no attempt to limit our overall memory usage. We limit
individual sessions by default, but don't prevent them from increasing
that allocation as they choose. We don't try to reallocate memory once
it has finished being used.

This isn't criticism; we are where we are. I just want to gain agreement
that we should be looking at that as a high priority for the next
release.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 20:31:15
Message-ID:	7806.1202157075@sss.pgh.pa.us
Lists:	pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> On Mon, 2008-02-04 at 11:38 -0800, Jeff Davis wrote:
>> I am missing something, can you elaborate? What is postgresql doing
>> wrong?

> We make no attempt to limit our overall memory usage. We limit
> individual sessions by default, but don't prevent them from increasing
> that allocation as they choose. We don't try to reallocate memory once
> it has finished being used.

Even if that were true (and of your three claims, the last two are
wrong), there's still not anything much wrong with what Postgres is
doing. The problem is with which process the kernel chooses to kill
when it's under memory pressure. We cannot guarantee that the kernel
will never be under memory pressure, at least not in a machine that is
doing anything at all besides running Postgres ... and on a dedicated
machine you might just as well disable overcommit.

> This isn't criticism; we are where we are. I just want to gain agreement
> that we should be looking at that as a high priority for the next
> release.

Frankly, I'm entirely unpersuaded. It will do zilch to improve the OOM
problem, and I cannot see any way of restricting global memory
consumption that won't hurt performance and flexibility.

regards, tom lane

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 20:41:29
Message-ID:	20080204204129.GI16380@alvh.no-ip.org
Lists:	pgsql-hackers

Tom Lane wrote:

> Frankly, I'm entirely unpersuaded. It will do zilch to improve the OOM
> problem, and I cannot see any way of restricting global memory
> consumption that won't hurt performance and flexibility.

Yeah, the only way to improve the OOM problem would be to harass the
Linux developers to tweak badness() so that it considers the postmaster
to be an essential process rather than the one to preferentially kill.

As you said, perhaps the way to improve the current situation is to get
packagers to tweak /proc/xyz/oom_adj in the initscript.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 20:46:26
Message-ID:	1202157986.4252.590.camel@ebony.site
Lists:	pgsql-hackers

On Mon, 2008-02-04 at 15:31 -0500, Tom Lane wrote:

> I cannot see any way of restricting global memory
> consumption that won't hurt performance and flexibility.

We've discussed particular ways of doing this previously and not got
very far, it's true. I think we need to separate problem identification
from problem resolution, so we can get past the first stage and look for
solutions.

This is my longest running outstanding problem with managing Postgres on
operational systems.

Sure, OOM killer sucks. So there are two problems, not one.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

From:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 21:31:53
Message-ID:	47A78449.4000104@cheapcomplexdevices.com
Lists:	pgsql-hackers

Alvaro Herrera wrote:
> Yeah, the only way to improve the OOM problem would be to harass the
> Linux developers to tweak badness() so that it considers the postmaster
> to be an essential process rather than the one to preferentially kill.

Wouldn't the more general rule that Jeff Davis pointed out upstream
make more sense?

That shared memory of the children should not be added to the size
of the parent process multiple times regardless of if something's
an essential process or not. Since those bytes are shared, it
seems such bytes should only be added to the badness once, no?

(assuming I understood Jeff correctly)

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 21:40:49
Message-ID:	1202161249.10057.787.camel@dogma.ljc.laika.com
Lists:	pgsql-hackers

On Mon, 2008-02-04 at 20:06 +0000, Simon Riggs wrote:
> We make no attempt to limit our overall memory usage. We limit
> individual sessions by default, but don't prevent them from increasing
> that allocation as they choose. We don't try to reallocate memory once
> it has finished being used.
>

Did you read my post on LKML? Varying memory allocations are not the
problem here. The problem is if you have a daemon-subprocess
architecture that uses substantial amounts of shared memory.

Here's another post that explains it:
http://kerneltrap.org/mailarchive/linux-kernel/2007/2/12/54202

In that case (i.e. in the case of postgresql), it can count the same
byte of allocated memory against the postgresql daemon (1+N/2) times,
where N is the number of processes. That is plain wrong, in my opinion.

PostgreSQL does not change the shared memory allocation at all during
operation. Yet, even with a reasonable shared memory size that doesn't
cause any memory pressure, and many *idle* connections, badness()
decides (almost invariably) that the PostgreSQL daemon has more
"badness" than anything else, even if a much worse process exists. On a
box with 4GB of memory, badness() might plausibly think that

This overcounting that punishes postgresql is the problem.

Regards,
Jeff Davis

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 21:48:19
Message-ID:	8985.1202161699@sss.pgh.pa.us
Lists:	pgsql-hackers

Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
> That shared memory of the children should not be added to the size
> of the parent process multiple times regardless of if something's
> an essential process or not. Since those bytes are shared, it
> seems such bytes should only be added to the badness once, no?

Certainly that would help, and it might be an easier sell to the kernel
hackers: instead of arguing "this policy is foolish", we only have to
say "your VM accounting is wildly inaccurate". We'd still end up with a
postmaster at more risk than we'd like, but at least not at dozens of
times more risk than any backend.

regards, tom lane

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 21:52:50
Message-ID:	1202161970.10057.797.camel@dogma.ljc.laika.com
Lists:	pgsql-hackers

On Mon, 2008-02-04 at 13:31 -0800, Ron Mayer wrote:
> Alvaro Herrera wrote:
> > Yeah, the only way to improve the OOM problem would be to harass the
> > Linux developers to tweak badness() so that it considers the postmaster
> > to be an essential process rather than the one to preferentially kill.
>
> Wouldn't the more general rule that Jeff Davis pointed out upstream
> make more sense?
>
> That shared memory of the children should not be added to the size
> of the parent process multiple times regardless of if something's
> an essential process or not. Since those bytes are shared, it
> seems such bytes should only be added to the badness once, no?
>
>
> (assuming I understood Jeff correctly)

Yes, that is exactly my complaint.

I am not trying to delve into the heuristics used by badness. It is not
some subtle thing that I think linux should tweak in the favor of
postgresql.

I just see something that (as I see it) is clearly wrong with the
calculation that they are using, and I want linux to fix it. It's very
easy to see, if you look at the badness algorithm, that even a well-
behaved idle postgresql daemon (or any other software of similar
architecture) will almost always be the target of the OOM killer -- even
if another process has a larger VM size (larger than postgresql
*including shared memory*, just to be clear) and is growing. And I can
demonstrate the problem with a simple test, too.

Regards,
Jeff Davis

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Jeff Davis <pgsql(at)j-davis(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 22:00:23
Message-ID:	1202162423.4252.610.camel@ebony.site
Lists:	pgsql-hackers

On Mon, 2008-02-04 at 13:40 -0800, Jeff Davis wrote:

> Did you read my post on LKML?

Nice post, BTW. I think you should just submit a patch. There was a
similar problem sometime recently with counting mapped files incorrectly
towards the dirty ratio, so your issue has both clear error and
similar-ish precedent.

My comments are slightly tangential, but nonetheless relevant. I don't
think we will be listened to if we have unresolved issues in that area,
if you say you are from PGDG and want it fixed.

Maybe if I said nothing, it would help with LKML, but I say what I see.
Find me a person that thinks our memory management is easy to tune in
comparison to other DBMS.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

From:	Jeff Davis <pgsql(at)j-davis(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-04 22:01:44
Message-ID:	1202162504.10057.806.camel@dogma.ljc.laika.com
Lists:	pgsql-hackers

On Mon, 2008-02-04 at 16:48 -0500, Tom Lane wrote:
> Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> writes:
> > That shared memory of the children should not be added to the size
> > of the parent process multiple times regardless of if something's
> > an essential process or not. Since those bytes are shared, it
> > seems such bytes should only be added to the badness once, no?
>
> Certainly that would help, and it might be an easier sell to the kernel
> hackers: instead of arguing "this policy is foolish", we only have to
> say "your VM accounting is wildly inaccurate". We'd still end up with a
> postmaster at more risk than we'd like, but at least not at dozens of
> times more risk than any backend.
>

I agree completely, and that's exactly the argument I tried to make on
LKML a year ago:

http://kerneltrap.org/mailarchive/linux-kernel/2007/2/12/54202

Regards,
Jeff Davis

From:	Decibel! <decibel(at)decibel(dot)org>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-05 21:33:39
Message-ID:	20080205213339.GI1212@decibel.org
Lists:	pgsql-hackers

On Mon, Feb 04, 2008 at 08:46:26PM +0000, Simon Riggs wrote:
> On Mon, 2008-02-04 at 15:31 -0500, Tom Lane wrote:
>
> > I cannot see any way of restricting global memory
> > consumption that won't hurt performance and flexibility.
>
> We've discussed particular ways of doing this previously and not got
> very far, it's true. I think we need to separate problem identification
> from problem resolution, so we can get past the first stage and look for
> solutions.
>
> This is my longest running outstanding problem with managing Postgres on
> operational systems.
>
> Sure, OOM killer sucks. So there are two problems, not one.

Yes, this problem goes way beyond OOM. Just try and configure
work_mem aggressively on a server that might see 50 database
connections, and do it in such a way that you won't swap. Good luck.

We really do need a way to limit how much memory we will use in total.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

From:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To:	"Decibel!" <decibel(at)decibel(dot)org>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-05 21:54:17
Message-ID:	47A8DB09.6000803@cheapcomplexdevices.com
Lists:	pgsql-hackers

Decibel! wrote:
>
> Yes, this problem goes way beyond OOM. Just try and configure
> work_mem aggressively on a server that might see 50 database
> connections, and do it in such a way that you won't swap. Good luck.

That sounds like an even broader and more difficult problem
than managing memory.

If you have 50 connections that all want to perform large sorts,
what do you want to have happen?

a) they each do their sorts in parallel with small amounts
of memory for each; probably all spilling to disk?
b) they each get a big chunk of memory but some have to
wait for each other?
c) something else?

Seems (a)'s already possible today with setting small work_mem.
Seems (b)'s already possible today with a larger work_mem and
pgpool.

Stepping back from the technical details, what do you think
should happen? (though perhaps it should be taken to a different
thread)

From:	Decibel! <decibel(at)decibel(dot)org>
To:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
Cc:	Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-06 01:39:48
Message-ID:	20080206013948.GL1212@decibel.org
Lists:	pgsql-hackers

On Tue, Feb 05, 2008 at 01:54:17PM -0800, Ron Mayer wrote:
> Decibel! wrote:
> >
> > Yes, this problem goes way beyond OOM. Just try and configure
> > work_mem aggressively on a server that might see 50 database
> > connections, and do it in such a way that you won't swap. Good luck.
>
> That sounds like an even broader and more difficult problem
> than managing memory.
>
> If you have 50 connections that all want to perform large sorts,
> what do you want to have happen?
>
> a) they each do their sorts in parallel with small amounts
> of memory for each; probably all spilling to disk?
> b) they each get a big chunk of memory but some have to
> wait for each other?
> c) something else?
>
> Seems (a)'s already possible today with setting small work_mem.
> Seems (b)'s already possible today with a larger work_mem and
> pgpool.

b is not possible with pgpool; you're assuming that all connections are
trying to use work_mem.

> Stepping back from the technical details, what do you think
> should happen? (though perhaps it should be taken to a different
> thread)

Yes... it's been discussed in the past. As Simon said, the first step is
deciding that this is a problem, then we can try and figure out a
solution.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

From:	"Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To:	"Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>
Cc:	Decibel! <decibel(at)decibel(dot)org>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-07 19:22:42
Message-ID:	758d5e7f0802071122w6c5cd873l29ffe72d125cec8e@mail.gmail.com
Lists:	pgsql-hackers

On Feb 5, 2008 10:54 PM, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> wrote:
> Decibel! wrote:
> >
> > Yes, this problem goes way beyond OOM. Just try and configure
> > work_mem aggressively on a server that might see 50 database
> > connections, and do it in such a way that you won't swap. Good luck.
>
> That sounds like an even broader and more difficult problem
> than managing memory.
>
> If you have 50 connections that all want to perform large sorts,
> what do you want to have happen?
>
> a) they each do their sorts in parallel with small amounts
> of memory for each; probably all spilling to disk?
> b) they each get a big chunk of memory but some have to
> wait for each other?
> c) something else?

Something else. :-)

I think there could be some additional parameter which would
control how much memory there is in total, say:
process_work_mem = 128MB # Some other name needed...
process_work_mem_percent = 20% # Yeah, definitely some other name...
total_work_mem = 1024MB # how much there is for you in total.

Your postgres spawns 50 processes which initially don't
use much work_mem. They would all register their current
work_mem usage in shared memory.

Each process, when it expects a largish sort, tries to determine
how much memory there is for the taking, to calculate its own
work_mem. work_mem should not exceed process_work_mem,
and not exceed 20% of total available free mem.

So, one backend needs to make a huge sort. Determines the
limit for it is 128MB and allocates it.

Another backend starts sorting. Determines the current free
mem is about (1024-128)*20% =~ 179MB. Takes 128MB.

Some time passes, 700MB of total_work_mem is used, and
another backend decides it needs a lot of memory.
It determines its current free mem to be not more than
(1024-700) * 20% =~ 64MB, so it sets its work_mem to 64MB
and sorts away.

Noooow, I know work_mem is not "total per process limit", but
rather per sort/hash/etc operation. I know the scheme is a bit
sketchy, but I think this would allow more memory-greedy
operations to use memory, while taking into consideration that
they are not the only ones out there. And that these settings
would be more like hints than the actual limits.

....while we are at it -- one feature would be great for 8.4, an
ability to change the shared buffers size "on the fly". I expect
it is not trivial, but it would help fine-tune a running database.
I think the DBA would need to set a maximum shared buffers size
alongside the normal setting.

Regards,
Dawid

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Decibel! <decibel(at)decibel(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-07 22:59:32
Message-ID:	20080207225932.GD15660@svana.org
Lists:	pgsql-hackers

On Thu, Feb 07, 2008 at 08:22:42PM +0100, Dawid Kuroczko wrote:
> Noooow, I know work_mem is not "total per process limit", but
> rather per sort/hash/etc operation. I know the scheme is a bit
> sketchy, but I think this would allow more memory-greedy
> operations to use memory, while taking into consideration that
> they are not the only ones out there. And that these settings
> would be more like hints than the actual limits.

Given that we don't even control memory usage within a single process
that accurately, it seems a bit difficult to do it across the board. You
just don't know when you start a query how much memory you're going to
use...

> ....while we are at it -- one feature would be great for 8.4, an
> ability to change the shared buffers size "on the fly". I expect
> it is not trivial, but it would help fine-tune a running database.
> I think the DBA would need to set a maximum shared buffers size
> alongside the normal setting.

Shared memory segments can't be resized... There's not even a kernel
API to do it.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Dawid Kuroczko <qnex42(at)gmail(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Decibel! <decibel(at)decibel(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-07 23:25:07
Message-ID:	18184.1202426707@sss.pgh.pa.us
Lists:	pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Thu, Feb 07, 2008 at 08:22:42PM +0100, Dawid Kuroczko wrote:
>> ....while we are at it -- one feature would be great for 8.4, an
>> ability to change the shared buffers size "on the fly".

> Shared memory segments can't be resized... There's not even a kernel
> API to do it.

Even if there were, it seems unlikely that we could reallocate shared
memory without stopping all active transactions, so it'd be barely less
invasive than a postmaster restart anyhow.

regards, tom lane

From:	"Markus Bertheau" <mbertheau(dot)pg(at)googlemail(dot)com>
To:	"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Dawid Kuroczko" <qnex42(at)gmail(dot)com>, "Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>, Decibel! <decibel(at)decibel(dot)org>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-08 02:45:37
Message-ID:	684362e10802071845o3f4dd58auf1398c5aab40a56c@mail.gmail.com
Lists:	pgsql-hackers

2008/2/8, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
>
> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > On Thu, Feb 07, 2008 at 08:22:42PM +0100, Dawid Kuroczko wrote:
> >> ....while we are at it -- one feature would be great for 8.4, an
> >> ability to change the shared buffers size "on the fly".
>
> > Shared memory segments can't be resized... There's not even a kernel
> > API to do it.
>
> Even if there were, it seems unlikely that we could reallocate shared
> memory without stopping all active transactions, so it'd be barely less
> invasive than a postmaster restart anyhow.

What about allowing shared_buffers to be only greater than it was at server
start and allocating the extra shared_buffers in one or more additional shm
segments?

Markus

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Decibel! <decibel(at)decibel(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-08 06:40:40
Message-ID:	1202452840.4247.41.camel@ebony.site
Lists:	pgsql-hackers

On Thu, 2008-02-07 at 20:22 +0100, Dawid Kuroczko wrote:
> On Feb 5, 2008 10:54 PM, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com> wrote:
> > Decibel! wrote:
> > >
> > > Yes, this problem goes way beyond OOM. Just try and configure
> > > work_mem aggressively on a server that might see 50 database
> > > connections, and do it in such a way that you won't swap. Good luck.
> >
> > That sounds like an even broader and more difficult problem
> > than managing memory.
> >
> > If you have 50 connections that all want to perform large sorts,
> > what do you want to have happen?
> >
> > a) they each do their sorts in parallel with small amounts
> > of memory for each; probably all spilling to disk?
> > b) they each get a big chunk of memory but some have to
> > wait for each other?
> > c) something else?
>
> Something else. :-)
>
> I think there could be some additional parameter which would
> control how much memory there is in total, say:
> process_work_mem = 128MB # Some other name needed...
> process_work_mem_percent = 20% # Yeah, definitely some other name...
> total_work_mem = 1024MB # how much there is for you in total.
>
>
> Your postgres spawns 50 processes which initially don't
> use much work_mem. They would all register their current
> work_mem usage, in shared memory.
>
> Each process, when it expects largish sort tries to determine
> how much memory there is for the taking, to calculate its own
> work_mem. work_mem should not exceed process_work_mem,
> and not exceed 20% of total available free mem.
>
> So, one backend needs to make a huge sort. Determines the
> limit for it is 128MB and allocates it.
>
> Another backend starts sorting. Determines the current free
> mem is about (1024-128)*20% =~ 179MB. Takes 128MB
>
> Some time passes, 700MB of total_work_mem is used, and
> another backend decides it needs much memory.
> It determines its current free mem to be not more than
> (1024-700) * 20% =~ 64MB, so it sets its work_mem to 64MB
> and sorts away.
>
> Noooow, I know work_mem is not "total per process limit", but
> rather per sort/hash/etc operation. I know the scheme is a bit
> sketchy, but I think this would allow more memory-greedy
> operations to use memory, while taking into consideration that
> they are not the only ones out there. And that these settings
> would be more like hints than the actual limits.

I like the sketch and I think we need to look for a solution along those
lines.

Perhaps we might go for a mechanism that allows us to increase but not
decrease memory. Perhaps that might be easier.
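The budget rule in the quoted proposal can be made concrete with a small model. This is a sketch under stated assumptions, not PostgreSQL code: process_work_mem, process_work_mem_percent and total_work_mem are the hypothetical settings named in the proposal (not real GUCs), the dict stands in for the per-backend registry that would live in shared memory, and all sizes are whole megabytes.

```python
from dataclasses import dataclass, field


@dataclass
class WorkMemBudget:
    """Toy model of the proposed shared work_mem registry; sizes in MB.

    All three settings are hypothetical, taken from the proposal.
    """
    process_work_mem: int = 128   # cap for any single backend's request
    work_mem_percent: int = 20    # share of the *currently free* budget one backend may take
    total_work_mem: int = 1024    # budget shared by all backends
    # backend id -> MB it has registered; stands in for a shared-memory array
    in_use: dict = field(default_factory=dict)

    def free(self) -> int:
        """Budget not yet registered by any backend (never negative)."""
        return max(0, self.total_work_mem - sum(self.in_use.values()))

    def acquire(self, backend: int) -> int:
        """Choose a work_mem for a large sort and register it as in use."""
        limit = min(self.process_work_mem,
                    self.free() * self.work_mem_percent // 100)
        self.in_use[backend] = self.in_use.get(backend, 0) + limit
        return limit

    def release(self, backend: int) -> None:
        """Backend finished its sort: give its share back to the pool."""
        self.in_use.pop(backend, None)
```

With the defaults, the first backend gets 128MB, the second also gets 128MB (free budget 896MB, 20% of which is 179MB), and once 700MB is registered the next one gets (1024-700) * 20% = 64MB, matching the figures in the proposal. Nothing here stops a backend from exceeding its figure, which fits the idea that these values are hints rather than hard limits.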

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Markus Bertheau <mbertheau(dot)pg(at)googlemail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Dawid Kuroczko <qnex42(at)gmail(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Decibel! <decibel(at)decibel(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-08 06:42:07
Message-ID:	1202452927.4247.44.camel@ebony.site
Lists:	pgsql-hackers

On Fri, 2008-02-08 at 08:45 +0600, Markus Bertheau wrote:

> What about allowing shared_buffers to be only greater than it was at
> server start and allocating the extra shared_buffers in one or more
> additional shm segments?

Sounds possible.

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

From:	Simon Riggs <simon(at)2ndquadrant(dot)com>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Dawid Kuroczko <qnex42(at)gmail(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Decibel! <decibel(at)decibel(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-08 06:43:47
Message-ID:	1202453027.4247.47.camel@ebony.site
Lists:	pgsql-hackers

On Thu, 2008-02-07 at 23:59 +0100, Martijn van Oosterhout wrote:
> On Thu, Feb 07, 2008 at 08:22:42PM +0100, Dawid Kuroczko wrote:
> > Noooow, I know work_mem is not "total per process limit", but
> > rather per sort/hash/etc operation. I know the scheme is a bit
> > sketchy, but I think this would allow more memory-greedy
> > operations to use memory, while taking into consideration that
> > they are not the only ones out there. And that these settings
> > would be more like hints than the actual limits.
>
> Given that we don't even control memory usage within a single process
> that accurately, it seems a bit difficult to do it across the board. You
> just don't know when you start a query how much memory you're going to
> use...

I know systems that do manage memory well, so I have a different
perspective. It is a problem and we should look for solutions; there are
always many non-solutions out there.

We could, for example, allocate large query workspace out of a shared
memory pool. When we have finished with it we could return it to the
pool.
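A minimal sketch of the shared workspace pool idea, assuming equally sized blocks guarded by a single lock. WorkspacePool, allocate and release are hypothetical names, and threading.Lock is only a stand-in for whatever shared-memory lock a real implementation would use.

```python
import threading


class WorkspacePool:
    """Hypothetical fixed pool of equally sized query-workspace blocks.

    A Python list plus threading.Lock stands in for a shared-memory free
    list protected by a lock; this is a sketch, not backend code.
    """

    def __init__(self, total_blocks: int, block_mb: int):
        self.block_mb = block_mb
        self._free = list(range(total_blocks))  # indices of unused blocks
        self._lock = threading.Lock()

    def allocate(self, mb: int):
        """Take enough blocks for `mb` megabytes, or None if the pool is short."""
        needed = -(-mb // self.block_mb)  # ceiling division
        if needed <= 0:
            return []
        with self._lock:
            if needed > len(self._free):
                return None  # caller would fall back to a smaller, disk-based plan
            blocks = self._free[-needed:]
            del self._free[-needed:]
            return blocks

    def release(self, blocks) -> None:
        """Return blocks to the pool once the query is done with them."""
        with self._lock:
            self._free.extend(blocks)

    def free_mb(self) -> int:
        with self._lock:
            return len(self._free) * self.block_mb
```

Returning None instead of blocking is one possible design choice: a query that cannot get a large workspace can still run with a small one, instead of waiting for memory.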

--
Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com

From:	"Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>
Cc:	"Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>, Decibel! <decibel(at)decibel(dot)org>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-08 08:59:37
Message-ID:	758d5e7f0802080059t69196689n3c2c2ee56d54c2fe@mail.gmail.com
Lists:	pgsql-hackers

On Feb 7, 2008 11:59 PM, Martijn van Oosterhout <kleptog(at)svana(dot)org> wrote:
> On Thu, Feb 07, 2008 at 08:22:42PM +0100, Dawid Kuroczko wrote:
> > Noooow, I know work_mem is not "total per process limit", but
> > rather per sort/hash/etc operation. I know the scheme is a bit
> > sketchy, but I think this would allow more memory-greedy
> > operations to use memory, while taking into consideration that
> > they are not the only ones out there. And that these settings
> > would be more like hints than the actual limits.
>
> Given that we don't even control memory usage within a single process
> that accurately, it seems a bit difficult to do it across the board. You
> just don't know when you start a query how much memory you're going to
> use...

Of course. My idea does nothing to guarantee memory usage control.
It is that backends are slightly more aware of their siblings when they
allocate memory. There is nothing wrong with one backend taking
512MB of RAM for its use, when nobody else is needing it. There is
something wrong with it taking 512MB of RAM when three others
already did the same.

Hmm, I guess it would be possible to emulate this with the help of a cron job
which would examine PostgreSQL's current memory consumption, calculate
the new "suggested work_mem", write it into postgresql.conf and reload the
config file. Ugly at best (and calculating total memory used would be a pain),
but it could be used to test whether this proposal has any merit at all.
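The cron-job emulation might look roughly like this. It assumes the job is already given the memory currently in use (the painful part), and suggest_work_mem and set_work_mem are hypothetical helpers. After rewriting the file, the job would reload the configuration, for example with pg_ctl reload or SELECT pg_reload_conf().

```python
import re

# Matches an active (uncommented) work_mem line, including its newline.
_WORK_MEM_LINE = re.compile(r"^[ \t]*work_mem[ \t]*=.*\n?", re.MULTILINE)


def suggest_work_mem(total_mb: int, used_mb: int,
                     cap_mb: int = 128, percent: int = 20) -> int:
    """Same rule as the proposal: min(cap, percent of what is still free)."""
    return max(0, min(cap_mb, (total_mb - used_mb) * percent // 100))


def set_work_mem(conf_text: str, mb: int) -> str:
    """Drop active work_mem lines and append a fresh one.

    postgresql.conf honours the last entry for a parameter, so appending
    is enough; commented-out lines are left untouched.
    """
    body = _WORK_MEM_LINE.sub("", conf_text).rstrip("\n")
    return (body + "\n" if body else "") + f"work_mem = {mb}MB\n"
```

Because work_mem is per operation and not per backend, a value written this way only loosely tracks real usage, which is in line with treating the test as a rough experiment.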

> > ....while we are at it -- one feature would be great for 8.4, an
> > ability to change shared buffers size "on the fly". I expect
> > it is not trivial, but would help fine-tuning running database.
> > I think the DBA would need to set a maximum shared buffers size
> > alongside the normal setting.
>
> Shared memory segments can't be resized... There's not even a kernel
> API to do it.

That is true. However it is possible to allocate more than one shared memory
segment. At simplest I would assume that DBA should specify minimum
shared memory size (say, 1GB) and expected maximum (2GB). And that
between minimum and maximum SHM should be allocated in reasonably
sized chunks. Say 128MB chunks. So that DBA could resize shared buffers
to 1.5GB, decide this was not a good idea after all and reduce it to 1280MB.
From the allocation point of view it would be:
1) one big chunk of 1GB
2) one 128MB chunk
3) another 128MB chunk
4) 128MB chunk declared dead -- new pages are prohibited, old pages are
there until every backend gets rid of them.
5) 128MB same as 4.

I am not sure that chunk size should be constant -- but it should be something
reasonably small IF we want to be able to deallocate them.
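The chunk bookkeeping in the list above can be modelled as follows. This is a toy sketch, assuming a fixed base segment, 128MB chunks and a per-chunk count of backends still holding pages. The class names are hypothetical, and the model ignores how the segments would actually be created and attached in every process.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    size_mb: int
    dead: bool = False  # dead: no new pages may be placed here
    users: int = 0      # backends still holding pages in this chunk


@dataclass
class ChunkedSharedBuffers:
    """Toy model: one base segment plus resizable fixed-size chunks (MB)."""
    min_mb: int = 1024   # one big segment allocated at startup
    max_mb: int = 2048   # expected maximum configured by the DBA
    chunk_mb: int = 128
    chunks: list = field(default_factory=list)

    def usable_mb(self) -> int:
        return self.min_mb + sum(c.size_mb for c in self.chunks if not c.dead)

    def resize(self, target_mb: int) -> None:
        if not self.min_mb <= target_mb <= self.max_mb:
            raise ValueError("target must lie between min_mb and max_mb")
        # Grow: append chunks until the target is met (rounds up to a chunk).
        while self.usable_mb() < target_mb:
            self.chunks.append(Chunk(self.chunk_mb))
        # Shrink: retire the newest live chunks while staying at or above target.
        for c in reversed(self.chunks):
            if c.dead:
                continue
            if self.usable_mb() - c.size_mb < target_mb:
                break
            c.dead = True
        self.reap()

    def reap(self) -> None:
        """Free dead chunks that no backend uses any more."""
        self.chunks = [c for c in self.chunks if not (c.dead and c.users == 0)]
```

Resizing to 1536MB adds four chunks; resizing back to 1280MB marks the last two dead, and they are only dropped by reap() once no backend still uses them, which mirrors steps 4 and 5 above.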

Now, it would give the DBA the ability to start with fail-safe settings and
gradually increase shared buffers without forcing a restart, and the ability
to (yes, it would be a slow process) roll back ;-) from overallocating memory.

Regards,
Dawid

From:	"Zeugswetter Andreas ADI SD" <Andreas(dot)Zeugswetter(at)s-itsolutions(dot)at>
To:	"Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
Cc:	"Ron Mayer" <rm_pg(at)cheapcomplexdevices(dot)com>, "Decibel!" <decibel(at)decibel(dot)org>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Jeff Davis" <pgsql(at)j-davis(dot)com>, "Alvaro Herrera" <alvherre(at)commandprompt(dot)com>, "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-08 09:22:12
Message-ID:	E1539E0ED7043848906A8FF995BDA57902C24A2F@m0143.s-mxs.net
Lists:	pgsql-hackers

Yes, but the typical way around that is to allocate additional segments.
You would want a configurable size and a limit though.
Just wanted to air this possibility, because it seems nobody here is
aware of it.
It does cause all sorts of issues, but it's not like there is no way to
increase shared memory.

The DBA would then reconfigure and restart at a convenient time to
reduce the number of segments, because that is typically more performant.

Andreas

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Decibel! <decibel(at)decibel(dot)org>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-08 10:35:21
Message-ID:	20080208103521.GA4162@svana.org
Lists:	pgsql-hackers

On Fri, Feb 08, 2008 at 09:59:37AM +0100, Dawid Kuroczko wrote:
> That is true. However it is possible to allocate more than one shared memory
> segment. At simplest I would assume that DBA should specify minimum
> shared memory size (say, 1GB) and expected maximum (2GB). And that
> between minimum and maximum SHM should be allocated in reasonably
> sized chunks. Say 128MB chunks. So that DBA could resize shared buffers
> to 1.5GB, decide this was not a good idea after all and reduce it to 1280MB.

I think the biggest problem is that the shared memory segments have to be
mapped to the same address in every process. The chance that this is
possible after the server has been running for a while is small. Perhaps
if the postmaster allocated it and waited for all the clients to refork...

Would people be OK with an indeterminate delay between changing the
setting and when it takes effect?

From:	Decibel! <decibel(at)decibel(dot)org>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Markus Bertheau <mbertheau(dot)pg(at)googlemail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Dawid Kuroczko <qnex42(at)gmail(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Jeff Davis <pgsql(at)j-davis(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-08 17:36:14
Message-ID:	20080208173614.GX1212@decibel.org
Lists:	pgsql-hackers

On Fri, Feb 08, 2008 at 06:42:07AM +0000, Simon Riggs wrote:
> On Fri, 2008-02-08 at 08:45 +0600, Markus Bertheau wrote:
>
> > What about allowing shared_buffers to be only greater than it was at
> > server start and allocating the extra shared_buffers in one or more
> > additional shm segments?
>
> Sounds possible.

If we build that, it's probably not a far stretch to just allocate
shared memory as a number of smaller segments; that would allow us to
grow as well as shrink.
--
Decibel!, aka Jim C. Nasby, Database Architect decibel(at)decibel(dot)org
Give your computer some brain candy! www.distributed.net Team #1828

From:	Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Markus Bertheau <mbertheau(dot)pg(at)googlemail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Dawid Kuroczko <qnex42(at)gmail(dot)com>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, Decibel! <decibel(at)decibel(dot)org>, Jeff Davis <pgsql(at)j-davis(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: configurability of OOM killer
Date:	2008-02-08 19:59:05
Message-ID:	20080208195905.GI31022@alvh.no-ip.org
Lists:	pgsql-hackers

Simon Riggs wrote:
> On Fri, 2008-02-08 at 08:45 +0600, Markus Bertheau wrote:
>
> > What about allowing shared_buffers to be only greater than it was at
> > server start and allocating the extra shared_buffers in one or more
> > additional shm segments?
>
> Sounds possible.

Hmm, but then you have to create new locks too. Seems really messy.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support