Re: Pre-allocation of shared memory ...

Lists: pgsql-hackers
From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-13 15:55:49
Message-ID: 200306130855.49217.josh@agliodbs.com
Lists: pgsql-hackers

Tom, et al,

> > Given that swap space is cheap, and that killing random processes is
> > obviously bad, it's not apparent to me why people think this is not
> > a good approach --- at least for high-reliability servers. And Linux
> > would definitely like to think of itself as a server-grade OS.

Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
example) include adequate swap space in their "suggested" disk formatting.
Some versions of some distributions do not create a swap partition at all;
others allocate only 130 MB to this partition regardless of actual RAM.

So regardless of what they *should* be doing, there are thousands of Linux
users out there with too little or no swap on disk ...

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-13 16:04:38
Message-ID: 200306131604.h5DG4ck16508@candle.pha.pa.us
Lists: pgsql-hackers

Josh Berkus wrote:
> Tom, et al,
>
> > > Given that swap space is cheap, and that killing random processes is
> > > obviously bad, it's not apparent to me why people think this is not
> > > a good approach --- at least for high-reliability servers. And Linux
> > > would definitely like to think of itself as a server-grade OS.
>
> Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> example), include adequate swap space in their "suggested" disk formatting.
> Some versions of some distributions do not create a swap partition at all;
> others allocate only 130mb to this partition regardless of actual RAM.
>
> So regardless of what they *should* be doing, there's thousands of Linux users
> out there with too little or no swap on disk ...

Yes, I have seen that on BSDs too. I am unsure whether we need actual swap
backing store, or just enough RAM to allow for fork expansion of dirty
pages.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
To: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-13 16:32:24
Message-ID: 200306131232.24233.lamar.owen@wgcr.org
Lists: pgsql-hackers

On Friday 13 June 2003 11:55, Josh Berkus wrote:
> Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> example), include adequate swap space in their "suggested" disk formatting.
> Some versions of some distributions do not create a swap partition at all;
> others allocate only 130mb to this partition regardless of actual RAM.

Incidentally, Red Hat as of about 7.0 began insisting on swap space at least
twice the size of RAM. On my 512MB RAM notebook, that meant it wanted 1GB of
swap. If you upgrade your RAM you could get into trouble. In that case, you
can create a swap file on one of your other partitions for the kernel to
use.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-13 16:41:28
Message-ID: 200306131641.h5DGfSF19866@candle.pha.pa.us
Lists: pgsql-hackers

Lamar Owen wrote:
> On Friday 13 June 2003 11:55, Josh Berkus wrote:
> > Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> > example), include adequate swap space in their "suggested" disk formatting.
> > Some versions of some distributions do not create a swap partition at all;
> > others allocate only 130mb to this partition regardless of actual RAM.
>
> Incidentally, Red Hat as of about 7.0 began insisting on swap space at least
> as large as twice RAM size. In my case on my 512MB RAM notebook, that meant
> it wanted 1GB swap. If you upgrade your RAM you could get into trouble. In
> that case, you create a swap file on one of your other partitions that the
> kernel can use.

Oh, that's interesting. I know the newer BSD releases got rid of the
large swap requirement, on the understanding that you usually aren't
going to be using it anyway.

Old BSD releases allocated swap space to back _all_ of RAM, even when it
was not going to be needed. Later releases allocated swap only when it was
needed, i.e. only for cases _exceeding_ RAM, so your virtual memory became
RAM _plus_ swap.

Of course, if you exceed swap, your system hangs.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
To: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-13 16:46:43
Message-ID: Pine.LNX.4.21.0306131739000.15872-100000@ponder.fairway2k.co.uk
Lists: pgsql-hackers

On Fri, 13 Jun 2003, Lamar Owen wrote:

> On Friday 13 June 2003 11:55, Josh Berkus wrote:
> > Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> > example), include adequate swap space in their "suggested" disk formatting.
> > Some versions of some distributions do not create a swap partition at all;
> > others allocate only 130mb to this partition regardless of actual RAM.
>
> Incidentally, Red Hat as of about 7.0 began insisting on swap space at least
> as large as twice RAM size. In my case on my 512MB RAM notebook, that meant
> it wanted 1GB swap. If you upgrade your RAM you could get into trouble. In
> that case, you create a swap file on one of your other partitions that the
> kernel can use.

I'm not sure I agree with this. In these days of cheap memory, swap space
is largely there to give you time to notice excessive use of it and repair
the system, since you'd normally be running everything in RAM.

Using the old measure of twice physical memory for swap is excessive on a
decent system, imo. I certainly would not allocate 1GB of swap! Well, okay,
I might if I had a 16GB machine with the potential for an excessive but
transitory workload, or, say, a 4-8GB machine with a few very memory-hungry
processes that can be started as part of the normal workload.

In short, imo these days swap is there to prevent valid processes dying for
lack of system memory and not to provide normal workspace for them.

Having said all that, I haven't read the start of this thread, so I've
probably missed the reason for the complaint about lack of swap space, such
as a problem on a small-memory system.

--
Nigel J. Andrews


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
Cc: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-13 17:10:04
Message-ID: 200306131710.h5DHA4P22365@candle.pha.pa.us
Lists: pgsql-hackers


I will say I do use swap sometimes when I am editing a huge image or
something --- there are peak times when it is required.

---------------------------------------------------------------------------

Nigel J. Andrews wrote:
> On Fri, 13 Jun 2003, Lamar Owen wrote:
>
> > On Friday 13 June 2003 11:55, Josh Berkus wrote:
> > > Regrettably, few of the GUI installers for Linux (SuSE or Red Hat, for
> > > example), include adequate swap space in their "suggested" disk formatting.
> > > Some versions of some distributions do not create a swap partition at all;
> > > others allocate only 130mb to this partition regardless of actual RAM.
> >
> > Incidentally, Red Hat as of about 7.0 began insisting on swap space at least
> > as large as twice RAM size. In my case on my 512MB RAM notebook, that meant
> > it wanted 1GB swap. If you upgrade your RAM you could get into trouble. In
> > that case, you create a swap file on one of your other partitions that the
> > kernel can use.
>
> I'm not sure I agree with this. To a large extent these days of cheap memory
> swap space is there to give you time to notice the excessive use of it and
> repair the system, since you'd normally be running everything in RAM.
>
> Using the old measure of twice physical memory for swap is excessive on a
> decent system imo. I certainly would not allocate 1GB of swap! Well, okay, I
> might if I've got a 16GB machine with the potential for an excessive
> but transitory workload, or say 4-8GB machine with a few very large memory
> usage processes that can be started as part of the normal work load.
>
> In short, imo these days swap is there to prevent valid processes dying for
> lack of system memory and not to provide normal workspace for them.
>
> Having said all that, I haven't read the start of this thread so I've probably
> missed the reason for the complaint about lack of swap space, like a problem on
> a small memory system.
>
>
> --
> Nigel J. Andrews
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: most folks find a random_page_cost between 1 or 2 is ideal
>

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: "Jeroen T(dot) Vermeulen" <jtv(at)xs4all(dot)nl>
To: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-13 19:18:51
Message-ID: 20030613191851.GO31141@xs4all.nl
Lists: pgsql-hackers

On Fri, Jun 13, 2003 at 12:32:24PM -0400, Lamar Owen wrote:
>
> Incidentally, Red Hat as of about 7.0 began insisting on swap space at least
> as large as twice RAM size. In my case on my 512MB RAM notebook, that meant
> it wanted 1GB swap. If you upgrade your RAM you could get into trouble. In
> that case, you create a swap file on one of your other partitions that the
> kernel can use.

Red Hat's position may be influenced by the fact that, AFAIR, they use
the Rik van Riel virtual memory system, which is inclusive, i.e. you need
at least as much swap as you have physical memory before you really have
any virtual memory at all. This was fixed by the competing Andrea
Arcangeli system, which became standard for the Linux kernel around
2.4.10 or so.

Jeroen


From: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
To: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-13 19:29:03
Message-ID: 200306131529.03944.lamar.owen@wgcr.org
Lists: pgsql-hackers

On Friday 13 June 2003 12:46, Nigel J. Andrews wrote:
> On Fri, 13 Jun 2003, Lamar Owen wrote:
> > Incidentally, Red Hat as of about 7.0 began insisting on swap space at
> > least as large as twice RAM size. In my case on my 512MB RAM notebook,
> > that meant it wanted 1GB swap. If you upgrade your RAM you could get
> > into trouble. In that case, you create a swap file on one of your other
> > partitions that the kernel can use.

> I'm not sure I agree with this. To a large extent these days of cheap
> memory swap space is there to give you time to notice the excessive use of
> it and repair the system, since you'd normally be running everything in
> RAM.

It is or was a Linux kernel problem. The 2.2 kernel required double swap
space, even though it wasn't well documented. Early 2.4 kernels also
required double swap space, and that was better documented. As for current
Red Hat 2.4 kernels, I'm not sure which VM system is in use. The old VM
certainly DID require swap space of double physical memory.

From a message I wrote in January of 2002:
"On Tuesday 22 January 2002 03:48 pm, Jim Wilcoxson wrote:
> I should have said, we're running this way on 2.2.19, not 2.4 -J

> > Is this Linux requirement documented anywhere? We're running 256MB
> > of swap on 1GB machines and have not had any problems. But we don't
> > swap much either.

2.2 actually needs 2x swap, but the problems are worse with 2.4. 2.2 won't
die a horrible screaming death -- but 2.4 WILL DIE if you run out of swap in
the wrong way. As to documentation, I can't tell you how I found out about
it, as I'm under NDA from that source.

However, it is public information: see http://lwn.net/2001/0607/kernel.php3
for some pointers. Also see
http://www.geocrawler.com/archives/3/84/2001/5/0/5867356/
http://www.tuxedo.org/~esr/writings/ultimate-linux-box/configuration.html
and
http://www.ultraviolet.org/mail-archives/linux-kernel.2001/28831.html

And note that Red Hat Linux 7.1 and 7.2 will complain vociferously if you
create a swap partition smaller than 2x RAM during installation (anaconda).
What it doesn't do is complain when you upgrade RAM but don't upgrade your
swap."

Now, as to whether this is _still_ a requirement or not, I don't know. Search
the lkml (Linux Kernel Mailing List) for it.

However, understand that the Red Hat kernel is closer to an Alan Cox kernel
than to a Linus kernel. At least that was true up to 2.4.18; the Red Hat
2.4.20 is very different, with NPTL and its ilk thrown in.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


From: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
To: "Nigel J(dot) Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-14 15:52:34
Message-ID: 200306141152.34936.lamar.owen@wgcr.org
Lists: pgsql-hackers

On Friday 13 June 2003 15:29, Lamar Owen wrote:
> It is or was a Linux kernel problem. The 2.2 kernel required double swap
> space, even though it wasn't well documented. Early 2.4 kernels also
> required double swap space, and it was better documented. Current Red Hat
> 2.4 kernels, I'm not sure which VM system is in use. The old VM certainly
> DID require double physical memory swap space.

After consulting with some kernel gurus: you can upgrade to a straight Alan
Cox (-ac) kernel and turn off overcommit, which makes it fail the allocation
instead of blowing processes out at random when the overcommit bites.
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-14 16:30:21
Message-ID: 000b01c33292$405433e0$6401a8c0@DUNSLANE
Lists: pgsql-hackers

The trouble with this advice is that if I am an SA wanting to run a DBMS
server, I will want to run a kernel supplied by a vendor, not an arbitrary
kernel released by a developer, even one as respected as Alan Cox.

andrew

----- Original Message -----
From: "Lamar Owen" <lamar(dot)owen(at)wgcr(dot)org>
To: "Nigel J. Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>; <pgsql-hackers(at)postgresql(dot)org>
Sent: Saturday, June 14, 2003 11:52 AM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...

> On Friday 13 June 2003 15:29, Lamar Owen wrote:
> > It is or was a Linux kernel problem. The 2.2 kernel required double swap
> > space, even though it wasn't well documented. Early 2.4 kernels also
> > required double swap space, and it was better documented. Current Red Hat
> > 2.4 kernels, I'm not sure which VM system is in use. The old VM certainly
> > DID require double physical memory swap space.
>
> After consulting with some kernel gurus, you can upgrade to a straight Alan
> Cox (-ac) kernel and turn off overcommits to cause it to fail the allocation
> instead of blowing processes out at random when the overcommit bites.


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-14 18:31:35
Message-ID: 002101c332a3$30a1e440$6401a8c0@DUNSLANE
Lists: pgsql-hackers


http://lwn.net/Articles/4628/ has this possibly useful info:

---------------
So what is strict VM overcommit? We introduce new overcommit policies
that attempt to never succeed an allocation that can not be fulfilled by
the backing store and consequently never OOM. This is achieved through
strict accounting of the committed address space and a policy to
allow/refuse allocations based on that accounting.

In the strictest of modes, it should be impossible to allocate more
memory than available and impossible to OOM. All memory failures should
be pushed down to the allocation routines -- malloc, mmap, etc.
--------------
But see also the discussion from July last year:
http://www.ussg.iu.edu/hypermail/linux/kernel/0207.2/index.html

A quick investigation of 2.4 releases on kernel.org appears to show this
still hasn't made it into mainline kernels. Apparently Alan did this work
originally because RH had customers using Oracle who were running into OOM
... Surprise!

I don't keep copies of old kernel sources around on my Linux machine, so I
don't know when it went into the RH kernel series - that at least would be
nice to know.

andrew

----- Original Message -----
From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: <pgsql-hackers(at)postgresql(dot)org>
Sent: Saturday, June 14, 2003 12:30 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...

> The trouble with this advice is that if I am an SA wanting to run a DBMS
> server, I will want to run a kernel supplied by a vendor, not an arbitrary
> kernel released by a developer, even one as respected as Alan Cox.
>
> andrew
>
> ----- Original Message -----
> From: "Lamar Owen" <lamar(dot)owen(at)wgcr(dot)org>
> To: "Nigel J. Andrews" <nandrews(at)investsystems(dot)co(dot)uk>
> Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>; <pgsql-hackers(at)postgresql(dot)org>
> Sent: Saturday, June 14, 2003 11:52 AM
> Subject: Re: [HACKERS] Pre-allocation of shared memory ...
>
>
> > On Friday 13 June 2003 15:29, Lamar Owen wrote:
> > > It is or was a Linux kernel problem. The 2.2 kernel required double swap
> > > space, even though it wasn't well documented. Early 2.4 kernels also
> > > required double swap space, and it was better documented. Current Red Hat
> > > 2.4 kernels, I'm not sure which VM system is in use. The old VM certainly
> > > DID require double physical memory swap space.
> >
> > After consulting with some kernel gurus, you can upgrade to a straight Alan
> > Cox (-ac) kernel and turn off overcommits to cause it to fail the allocation
> > instead of blowing processes out at random when the overcommit bites.


From: Matthew Kirkwood <matthew(at)hairy(dot)beasts(dot)org>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-14 19:32:40
Message-ID: Pine.LNX.4.33.0306142031000.29823-100000@sphinx.mythic-beasts.com
Lists: pgsql-hackers

On Sat, 14 Jun 2003, Andrew Dunstan wrote:

> The trouble with this advice is that if I am an SA wanting to run a
> DBMS server, I will want to run a kernel supplied by a vendor, not an
> arbitrary kernel released by a developer, even one as respected as
> Alan Cox.

Like, say, Red Hat:

$ ls -l /proc/sys/vm/overcommit_memory
-rw-r--r-- 1 root root 0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
$ uname -a
Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux

(This is a Rawhide kernel, but I think that control has been
in stock RH kernels for some time now.)

Matthew.


From: Kurt Roeckx <Q(at)ping(dot)be>
To: Matthew Kirkwood <matthew(at)hairy(dot)beasts(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-14 19:44:48
Message-ID: 20030614194448.GA23483@ping.be
Lists: pgsql-hackers

On Sat, Jun 14, 2003 at 08:32:40PM +0100, Matthew Kirkwood wrote:
> On Sat, 14 Jun 2003, Andrew Dunstan wrote:
>
> > The trouble with this advice is that if I am an SA wanting to run a
> > DBMS server, I will want to run a kernel supplied by a vendor, not an
> > arbitrary kernel released by a developer, even one as respected as
> > Alan Cox.
>
> Like, say, Red Hat:
>
> $ ls -l /proc/sys/vm/overcommit_memory
> -rw-r--r-- 1 root root 0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
> $ uname -a
> Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux

I also got that /proc/sys/vm/overcommit_memory on a plain 2.4.21.

Kurt


From: Matthew Kirkwood <matthew(at)hairy(dot)beasts(dot)org>
To: Kurt Roeckx <Q(at)ping(dot)be>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-14 19:59:24
Message-ID: Pine.LNX.4.33.0306142045100.31213-100000@sphinx.mythic-beasts.com
Lists: pgsql-hackers

On Sat, 14 Jun 2003, Kurt Roeckx wrote:

> > $ ls -l /proc/sys/vm/overcommit_memory
> > -rw-r--r-- 1 root root 0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
> > $ uname -a
> > Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux
>
> I also got that /proc/sys/vm/overcommit_memory on a plain 2.4.21.

This might also be interesting:

http://www.cs.helsinki.fi/linux/linux-kernel/2002-33/0826.html

I couldn't say how much of it is in the stock RH kernels,
or how successful the heuristic is.

Matthew.


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Kurt Roeckx" <Q(at)ping(dot)be>, "Matthew Kirkwood" <matthew(at)hairy(dot)beasts(dot)org>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-14 20:38:31
Message-ID: 003c01c332b4$eb966530$6401a8c0@DUNSLANE
Lists: pgsql-hackers

Yes, but it's only a binary flag. Non-zero says "cheerfully overcommit" and
0 says "try not to overcommit" but there isn't a value that says "make sure
not to overcommit".

Have a look in mm/mmap.c in the plain 2.4.21 sources for evidence. There's
nothing like the Alan Cox patch.

IOW, simply the presence of /proc/sys/vm/overcommit_memory with a value set
to 0 doesn't guarantee you won't get an OOM kill, AFAICS.

I *know* the latest RH kernel docs *say* they have paranoid mode that
supposedly guarantees against OOM - it was me that pointed that out
originally :-). I just checked on the latest sources (today it's RH8, kernel
2.4.20-18.8) to be doubly sure, and can't see the patches. (That would be
really bad of RH, btw, if I'm correct - saying in your docs you support
something that you don't)

The proof, if any is needed, that the mainline kernel still does not have
this, is that it is still in Alan's patch set against 2.4.21, at
http://www.kernel.org/pub/linux/kernel/people/alan/linux-2.4/2.4.21/patch-2.4.21-ac1.gz

Summary: don't take shortcuts looking for this - Read the Source, Luke. It's
important not to give people false expectations. For now, I'm leaning in
Tom's direction of advising people to avoid Linux for mission-critical
situations that could run into an OOM.

cheers

andrew

----- Original Message -----
From: "Kurt Roeckx" <Q(at)ping(dot)be>
To: "Matthew Kirkwood" <matthew(at)hairy(dot)beasts(dot)org>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>; <pgsql-hackers(at)postgresql(dot)org>
Sent: Saturday, June 14, 2003 3:44 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...

> On Sat, Jun 14, 2003 at 08:32:40PM +0100, Matthew Kirkwood wrote:
> > On Sat, 14 Jun 2003, Andrew Dunstan wrote:
> >
> > > The trouble with this advice is that if I am an SA wanting to run a
> > > DBMS server, I will want to run a kernel supplied by a vendor, not an
> > > arbitrary kernel released by a developer, even one as respected as
> > > Alan Cox.
> >
> > Like, say, Red Hat:
> >
> > $ ls -l /proc/sys/vm/overcommit_memory
> > -rw-r--r-- 1 root root 0 Jun 14 18:58 /proc/sys/vm/overcommit_memory
> > $ uname -a
> > Linux stinky.hoopy.net 2.4.20-20.1.1995.2.2.nptl #1 Fri May 23 12:18:31 EDT 2003 i686 i686 i386 GNU/Linux
>
>
> I also got that /proc/sys/vm/overcommit_memory on a plain 2.4.21.
>
>
> Kurt


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Kurt Roeckx" <Q(at)ping(dot)be>, "Matthew Kirkwood" <matthew(at)hairy(dot)beasts(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-14 21:16:56
Message-ID: 9343.1055625416@sss.pgh.pa.us
Lists: pgsql-hackers

"Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
> I *know* the latest RH kernel docs *say* they have paranoid mode that
> supposedly guarantees against OOM - it was me that pointed that out
> originally :-). I just checked on the latest sources (today it's RH8, kernel
> 2.4.20-18.8) to be doubly sure, and can't see the patches.

I think you must be looking in the wrong place. Red Hat's kernels have
included the mode 2/3 overcommit logic since RHL 7.3, according to
what I can find. (Don't forget Alan Cox works for Red Hat ;-).)

But it is true that it's not in Linus' tree yet. This may be because
there are still some loose ends. The copy of the overcommit document
in my RHL 8.0 system lists some ToDo items down at the bottom:

To Do
-----
o Account ptrace pages (this is hard)
o Disable MAP_NORESERVE in mode 2/3
o Account for shared anonymous mappings properly
- right now we account them per instance

I have not installed RHL 9 yet --- is the ToDo list any shorter there?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Kurt Roeckx" <Q(at)ping(dot)be>, "Matthew Kirkwood" <matthew(at)hairy(dot)beasts(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-14 21:38:29
Message-ID: 9431.1055626709@sss.pgh.pa.us
Lists: pgsql-hackers

"Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
> I *know* the latest RH kernel docs *say* they have paranoid mode that
> supposedly guarantees against OOM - it was me that pointed that out
> originally :-). I just checked on the latest sources (today it's RH8, kernel
> 2.4.20-18.8) to be doubly sure, and can't see the patches. (That would be
> really bad of RH, btw, if I'm correct - saying in your docs you support
> something that you don't)

I tried a direct test on my RHL 8.0 box, and was able to prove that
indeed the overcommit 2/3 modes do something, though whether they work
exactly as documented is another question.

I wrote this silly little test program to get an approximate answer
about the largest amount a program could malloc:

#include <stdio.h>
#include <stdlib.h>

int
main (int argc, char **argv)
{
    size_t min = 1024;          /* assume this'd work */
    size_t max = (size_t) -1;   /* = max unsigned */
    size_t sz;
    void *ptr;

    /* Binary search for the largest malloc() that succeeds, to 1 KB */
    while ((max - min) >= 1024ul) {
        sz = min + (max - min) / 2;   /* midpoint without overflow */
        ptr = malloc(sz);
        if (ptr) {
            free(ptr);
            /* printf("malloc(%zu) succeeded\n", sz); */
            min = sz;
        } else {
            /* printf("malloc(%zu) failed\n", sz); */
            max = sz;
        }
    }

    printf("Max malloc is %zu Kb\n", min / 1024);

    return 0;
}

and got these results:

[root(at)rh1 tmp]# echo 0 > /proc/sys/vm/overcommit_memory
[root(at)rh1 tmp]# ./alloc
Max malloc is 1489075 Kb
[root(at)rh1 tmp]# echo 1 > /proc/sys/vm/overcommit_memory
[root(at)rh1 tmp]# ./alloc
Max malloc is 2063159 Kb
[root(at)rh1 tmp]# echo 2 > /proc/sys/vm/overcommit_memory
[root(at)rh1 tmp]# ./alloc
Max malloc is 1101639 Kb
[root(at)rh1 tmp]# echo 3 > /proc/sys/vm/overcommit_memory
[root(at)rh1 tmp]# ./alloc
Max malloc is 974179 Kb

So it's definitely doing something. /proc/meminfo shows

total: used: free: shared: buffers: cached:
Mem: 261042176 160456704 100585472 0 72015872 63344640
Swap: 1077501952 44974080 1032527872
MemTotal: 254924 kB
MemFree: 98228 kB
MemShared: 0 kB
Buffers: 70328 kB
Cached: 59244 kB
SwapCached: 2616 kB
Active: 102532 kB
Inact_dirty: 11644 kB
Inact_clean: 21840 kB
Inact_target: 27200 kB
HighTotal: 0 kB
HighFree: 0 kB
LowTotal: 254924 kB
LowFree: 98228 kB
SwapTotal: 1052248 kB
SwapFree: 1008328 kB
Committed_AS: 77164 kB

It does appear that the limit in mode 3 is not far from where you'd
expect (SwapTotal - Committed_AS), and mode 2 allows about 128M more,
which is correct: mode 2 also counts half of RAM, and there's 256M of RAM.

regards, tom lane


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Kurt Roeckx" <Q(at)ping(dot)be>, "Matthew Kirkwood" <matthew(at)hairy(dot)beasts(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-14 21:39:56
Message-ID: 001201c332bd$7fee5780$6401a8c0@DUNSLANE
Lists: pgsql-hackers

I know he does - *but* I think it has probably been wiped out by accident
somewhere along the line (like when they went to 2.4.20?)

Here's what's in RH sources - tell me after you look that I am looking in
the wrong place. (Or did RH get cute and decide to do this only for the AS
product?)

first, RH7.3/kernel 2.4.18-3 (patch present):

----------------
int vm_enough_memory(long pages, int charge)
{
    /* Stupid algorithm to decide if we have enough memory: while
     * simple, it hopefully works in most obvious cases.. Easy to
     * fool it, but this should catch most mistakes.
     *
     * 23/11/98 NJC: Somewhat less stupid version of algorithm,
     * which tries to do "TheRightThing". Instead of using half of
     * (buffers+cache), use the minimum values. Allow an extra 2%
     * of num_physpages for safety margin.
     *
     * 2002/02/26 Alan Cox: Added two new modes that do real accounting
     */
    unsigned long free, allowed;
    struct sysinfo i;

    if(charge)
        atomic_add(pages, &vm_committed_space);

    /* Sometimes we want to use more memory than we have. */
    if (sysctl_overcommit_memory == 1)
        return 1;
    if (sysctl_overcommit_memory == 0)
    {
        /* The page cache contains buffer pages these days.. */
        free = atomic_read(&page_cache_size);
        free += nr_free_pages();
        free += nr_swap_pages;

        /*
         * This double-counts: the nrpages are both in the page-cache
         * and in the swapper space. At the same time, this compensates
         * for the swap-space over-allocation (ie "nr_swap_pages" being
         * too small.
         */
        free += swapper_space.nrpages;

        /*
         * The code below doesn't account for free space in the inode
         * and dentry slab cache, slab cache fragmentation, inodes and
         * dentries which will become freeable under VM load, etc.
         * Lets just hope all these (complex) factors balance out...
         */
        free += (dentry_stat.nr_unused * sizeof(struct dentry)) >> PAGE_SHIFT;
        free += (inodes_stat.nr_unused * sizeof(struct inode)) >> PAGE_SHIFT;

        if(free > pages)
            return 1;
        atomic_sub(pages, &vm_committed_space);
        return 0;
    }
    allowed = total_swap_pages;

    if(sysctl_overcommit_memory == 2)
    {
        /* FIXME - need to add arch hooks to get the bits we need
           without the higher overhead crap */
        si_meminfo(&i);
        allowed += i.totalram >> 1;
    }
    if(atomic_read(&vm_committed_space) < allowed)
        return 1;
    if(charge)
        atomic_sub(pages, &vm_committed_space);
    return 0;

}
---------
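As a reading aid, here is a user-space model of the strict-mode branch of the RH 7.3 function above. It is not kernel code: the names commit_state and strict_enough_memory are hypothetical, a plain long stands in for the atomic vm_committed_space counter, and mode 0 and all locking are left out. It shows the charge-first, undo-on-failure pattern.

```c
#include <stdbool.h>

/* Hypothetical model state; all quantities are in pages. */
struct commit_state {
    long committed;    /* stands in for vm_committed_space */
    long total_swap;   /* stands in for total_swap_pages */
    long total_ram;    /* stands in for i.totalram */
};

/* Mirrors the mode 1/2/3 logic of the RH 7.3 vm_enough_memory().
 * mode must be 1, 2 or 3; the mode 0 heuristic is not modelled. */
static bool strict_enough_memory(struct commit_state *s, int mode,
                                 long pages, bool charge)
{
    long allowed;

    if (charge)
        s->committed += pages;      /* charge first ... */

    if (mode == 1)
        return true;                /* always overcommit */

    allowed = s->total_swap;        /* mode 3: swap only */
    if (mode == 2)
        allowed += s->total_ram / 2;    /* mode 2: plus half of RAM */

    if (s->committed < allowed)
        return true;

    if (charge)
        s->committed -= pages;      /* ... and undo on failure */
    return false;
}
```

The point of the design is that the counter is cumulative: each request is judged against everything already committed, not against how much memory happens to be free at that moment.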
and here's what's in RH9/2.4.20-18 (patch absent):
--------------
int vm_enough_memory(long pages)
{
	/* Stupid algorithm to decide if we have enough memory: while
	 * simple, it hopefully works in most obvious cases.. Easy to
	 * fool it, but this should catch most mistakes.
	 */
	/* 23/11/98 NJC: Somewhat less stupid version of algorithm,
	 * which tries to do "TheRightThing". Instead of using half of
	 * (buffers+cache), use the minimum values. Allow an extra 2%
	 * of num_physpages for safety margin.
	 */

	unsigned long free;

	/* Sometimes we want to use more memory than we have. */
	if (sysctl_overcommit_memory)
		return 1;

	/* The page cache contains buffer pages these days.. */
	free = atomic_read(&page_cache_size);
	free += nr_free_pages();
	free += nr_swap_pages;

	/*
	 * This double-counts: the nrpages are both in the page-cache
	 * and in the swapper space. At the same time, this compensates
	 * for the swap-space over-allocation (ie "nr_swap_pages" being
	 * too small.
	 */
	free += swapper_space.nrpages;

	/*
	 * The code below doesn't account for free space in the inode
	 * and dentry slab cache, slab cache fragmentation, inodes and
	 * dentries which will become freeable under VM load, etc.
	 * Lets just hope all these (complex) factors balance out...
	 */
	free += (dentry_stat.nr_unused * sizeof(struct dentry)) >>
		PAGE_SHIFT;
	free += (inodes_stat.nr_unused * sizeof(struct inode)) >>
		PAGE_SHIFT;

	return free > pages;
}
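By contrast, the RH9 function only has the heuristic, and it never consults a running total. The hypothetical sketch below condenses its arithmetic into one function over a snapshot of the counters it reads (the two slab terms are folded into a single field) to illustrate why mode 0 cannot rule out an OOM kill: nothing is charged, so a series of requests can each pass while together exceeding RAM plus swap.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the counters the heuristic reads, in pages. */
struct vm_snapshot {
    long page_cache;       /* page_cache_size */
    long free_pages;       /* nr_free_pages() */
    long free_swap;        /* nr_swap_pages */
    long swapper_pages;    /* swapper_space.nrpages */
    long reclaimable_slab; /* unused dentries + inodes, already in pages */
};

/* Condensed mode-0 test from the RH9 vm_enough_memory(): nothing is
 * charged, so the answer depends only on the current snapshot. */
static bool heuristic_enough_memory(const struct vm_snapshot *v, long pages)
{
    long avail = v->page_cache + v->free_pages + v->free_swap
               + v->swapper_pages + v->reclaimable_slab;
    return avail > pages;
}
```

With 350 pages apparently available, ten successive 300-page requests all pass this check, because untouched allocations do not change the snapshot the next request sees.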

----- Original Message -----
From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Cc: "Kurt Roeckx" <Q(at)ping(dot)be>; "Matthew Kirkwood"
<matthew(at)hairy(dot)beasts(dot)org>; <pgsql-hackers(at)postgresql(dot)org>
Sent: Saturday, June 14, 2003 5:16 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...

> "Andrew Dunstan" <andrew(at)dunslane(dot)net> writes:
> > I *know* the latest RH kernel docs *say* they have paranoid mode that
> > supposedly guarantees against OOM - it was me that pointed that out
> > originally :-). I just checked on the latest sources (today it's RH8, kernel
> > 2.4.20-18.8) to be doubly sure, and can't see the patches.
>
> I think you must be looking in the wrong place. Red Hat's kernels have
> included the mode 2/3 overcommit logic since RHL 7.3, according to
> what I can find. (Don't forget Alan Cox works for Red Hat ;-).)
>
> But it is true that it's not in Linus' tree yet. This may be because
> there are still some loose ends. The copy of the overcommit document
> in my RHL 8.0 system lists some ToDo items down at the bottom:
>
> To Do
> -----
> o Account ptrace pages (this is hard)
> o Disable MAP_NORESERVE in mode 2/3
> o Account for shared anonymous mappings properly
> - right now we account them per instance
>
> I have not installed RHL 9 yet --- is the ToDo list any shorter there?
>
> regards, tom lane


From: Lamar Owen <lamar(dot)owen(at)wgcr(dot)org>
To: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Kurt Roeckx" <Q(at)ping(dot)be>, "Matthew Kirkwood" <matthew(at)hairy(dot)beasts(dot)org>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-15 03:46:48
Message-ID: 200306142346.48798.lamar.owen@wgcr.org
Lists: pgsql-hackers

On Saturday 14 June 2003 16:38, Andrew Dunstan wrote:
> IOW, simply the presence of /proc/sys/vm/overcommit_memory with a value set
> to 0 doesn't guarantee you won't get an OOM kill, AFAICS.

Right. You need the value to be 2 or 3, which means you need Alan's patch to
do that.
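A quick way to see which mode a kernel is set to is to read /proc/sys/vm/overcommit_memory. The C sketch below does that; the helper names are hypothetical, and the classification assumes the RH 7.3 semantics quoted above (0 heuristic, 1 always, 2 and 3 strict). Per the RH9 source above, a kernel without the patch treats every nonzero value like 1, so seeing 2 or 3 in the file is not by itself proof that strict accounting is active.

```c
#include <stdio.h>

/* Hypothetical helper: parse the text of overcommit_memory into a mode,
 * or return -1 if it does not start with an integer. */
static int parse_overcommit_mode(const char *text)
{
    int mode;
    if (sscanf(text, "%d", &mode) != 1)
        return -1;
    return mode;
}

/* Assuming the RH 7.3 patch: 2 and 3 are the accounting (strict) modes. */
static int is_strict_mode(int mode)
{
    return mode == 2 || mode == 3;
}

/* Read the live setting; returns -1 if the file cannot be read. */
static int read_overcommit_mode(void)
{
    char buf[32];
    FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_overcommit_mode(buf);
}
```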

> I *know* the latest RH kernel docs *say* they have paranoid mode that
> supposedly guarantees against OOM - it was me that pointed that out
> originally :-). I just checked on the latest sources (today it's RH8,
> kernel 2.4.20-18.8) to be doubly sure, and can't see the patches. (That
> would be really bad of RH, btw, if I'm correct - saying in your docs you
> support something that you don't)

But note these two lines in the docs with 2.4.20-13.9 (RHL9 errata):
* This describes the overcommit management facility in the latest kernel
tree (FIXME: actually it also describes the stuff that isnt yet done)

Pay double attention to the line that says FIXME. IOW, they've documented
stuff that might not be done!

You can try Red Hat's enterprise kernel, but you'll have to build it from
source. RHEL AS is available online as source RPMs.

Also understand that the official Red Hat kernel is very close to an Alan Cox
kernel. If you really want to get down and dirty testing the kernel, there is
a test suite to help with that, known as Cerberus, with configs specifically
tuned to stress-test kernels. I think Cerberus is on SourceForge.

So, make sure you have a kernel that allows overcommit-accounting mode 2 to
prevent kills on OOM. Theoretically mode 2 will prevent the possibility of
OOM completely.

If I read things right, if you have double swap space, mode 0 will not OOM
nearly as quickly.
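Preventing OOM kills this way does not make memory pressure go away; under strict accounting it surfaces at the allocation call instead: malloc() returns NULL, and calls such as mmap() fail with ENOMEM, once the commit limit is reached. That only helps if the application checks. A minimal sketch, where dup_string is a hypothetical helper and the error handling is purely illustrative:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a string, reporting allocation failure to
 * the caller instead of assuming malloc() always succeeds. */
static char *dup_string(const char *src)
{
    size_t len = strlen(src) + 1;   /* include the terminating NUL */
    char *copy = malloc(len);

    if (copy == NULL) {
        /* Under strict overcommit, a full commit limit shows up here, so
         * the process can back off instead of being OOM-killed later. */
        fprintf(stderr, "dup_string: out of memory (%zu bytes)\n", len);
        return NULL;
    }
    memcpy(copy, src, len);
    return copy;
}
```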
--
Lamar Owen
WGCR Internet Radio
1 Peter 4:11


From: "Shridhar Daithankar" <shridhar_daithankar(at)persistent(dot)co(dot)in>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-15 10:43:07
Message-ID: 3EEC9B13.3586.45A335@localhost
Lists: pgsql-hackers

On 14 Jun 2003 at 16:38, Andrew Dunstan wrote:
> Summary: don't take shortcuts looking for this - Read the Source, Luke. It's
> important not to give people false expectations. For now, I'm leaning in
> Tom's direction of advising people to avoid Linux for mission-critical
> situations that could run into an OOM.

While I agree that vanilla Linux does not handle the situation gracefully
enough, anybody running a mission-critical application should spec the machine
and the demands on it carefully enough. Surely Linux won't start OOM-killing
just because it is running low on buffer memory. (At least I hope so.)

If one expects to throw an uncalculated amount of load on a mission-critical
box, until it hits swap for every malloc in a strcpy, there are things that
need to be checked before which kernel/OS you are running.

And BTW, was that original comment about vanilla Linux or Linux in general? :-)

Bye
Shridhar

--
Adore, v.: To venerate expectantly. -- Ambrose Bierce, "The Devil's
Dictionary"


From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Kurt Roeckx" <Q(at)ping(dot)be>, "Matthew Kirkwood" <matthew(at)hairy(dot)beasts(dot)org>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-15 11:00:40
Message-ID: 001b01c3332d$5d0a3200$6401a8c0@DUNSLANE
Lists: pgsql-hackers

Alan Cox has written to me thus:

> It got dropped for RH9 and some errata kernels because of clashes between
> the old stuff and the rmap vm and other weird RH patches

andrew

----- Original Message -----
From: "Andrew Dunstan" <andrew(at)dunslane(dot)net>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Kurt Roeckx" <Q(at)ping(dot)be>; "Matthew Kirkwood"
<matthew(at)hairy(dot)beasts(dot)org>; <pgsql-hackers(at)postgresql(dot)org>
Sent: Saturday, June 14, 2003 5:39 PM
Subject: Re: [HACKERS] Pre-allocation of shared memory ...

> I know he does - *but* I think it has probably been wiped out by accident
> somewhere along the line (like when they went to 2.4.20?)
>
> Here's what's in RH sources - tell me after you look that I am looking in
> the wrong place. (Or did RH get cute and decide to do this only for the AS
> product?)
>


From: "Jim C(dot) Nasby" <jim(at)nasby(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Pre-allocation of shared memory ...
Date: 2003-06-16 16:21:11
Message-ID: 20030616162111.GG40542@flake.decibel.org
Lists: pgsql-hackers

On Fri, Jun 13, 2003 at 12:41:28PM -0400, Bruce Momjian wrote:
> Of course, if you exceed swap, your system hangs.

Are you sure? I ran out of swap once or came damn close, due to a cron
job gone amuck. My clue was starting to see lots of memory allocation
errors. After I fixed what was blocking all the backed-up cron jobs, the
machine ground to a crawl (mmm... system load of 400+ on a dual
PII-375), and X did crash (though I think that's because I tried
switching to a different virtual console), but the machine stayed up and
eventually worked through everything.
--
Jim C. Nasby (aka Decibel!) jim(at)nasby(dot)net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"