Re: memory strangeness (fwd)

Lists: pgsql-admin
From: Gregor Mosheh <stigmata(at)blackangel(dot)net>
To: pgsql-admin(at)postgresql(dot)org
Subject: memory strangeness (fwd)
Date: 2002-07-05 01:11:52
Message-ID: 20020704181141.V44600-100000@osiris.deathkeep.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin


Hiya. I've installed Postgres 7.2 on a dedicated FreeBSD system with 384
MB RAM. Because the system will be doing nothing except PG, I'd like to
dump as much memory as possible into PG's shared memory.

I rebuilt the kernel with very large limits: 330 MB on the MAXDSIZ and
DFLDSIZ, and 330 MB for SHMMAXPAGES. This gives me:

# sysctl kern.ipc.shmall
kern.ipc.shmall: 42240
# sysctl kern.ipc.shmmax
kern.ipc.shmmax: 346030080

I still cannot set PG's shared_buffers higher than 20000 (160 MB):

IpcMemoryCreate: shmget(key=8971001, size=183017472, 03600) failed: Cannot
allocate memory
This error usually means that PostgreSQL's request for a shared
memory segment exceeded available memory or swap space.
To reduce the request size (currently 183017472 bytes), reduce
PostgreSQL's shared_buffers parameter (currently 22000) and/or
its max_connections parameter (currently 5).

That I'm out of RAM seems fishy, though, since there's very little running
and vmstat reports plenty of free RAM:

procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad0 ad3 in sy cs us sy
id
0 0 0 8484 353908 1 0 0 0 2 0 2 0 231 5 5 0 0
100

Any ideas as to what I missed?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregor Mosheh <stigmata(at)blackangel(dot)net>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: memory strangeness (fwd)
Date: 2002-07-05 01:30:32
Message-ID: 7081.1025832632@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Gregor Mosheh <stigmata(at)blackangel(dot)net> writes:
> Hiya. I've installed Postgres 7.2 on a dedicated FreeBSD system with 384
> MB RAM. Because the system will be doing nothing except PG, I'd like to
> dump as much memory as possible into PG's shared memory.
> I rebuilt the kernel with very large limits: 330 MB on the MAXDSIZ and
> DFLDSIZ, and 330 MB for SHMMAXPAGES. This gives me:

Very likely the kernel has its own ideas on how much memory it needs
to reserve for other purposes, and is unwilling to give you a shmem
segment that represents the bulk of physical RAM.

While I don't know FreeBSD well enough to speculate on exactly why it's
limiting you, I do think that you are going in the wrong direction
for Postgres anyhow. Pretty much everyone who has looked at the issue
has concluded that it's a mistake to try to set shared memory that high.
If you're trying to set it to more than a quarter of physical RAM you're
off track IMHO. In practice, given that this isn't an especially huge
system, I'd think a shared_buffers setting in the low thousands would be
appropriate.

Also, reducing max_connections as low as 5 seems the wrong direction
too ...

regards, tom lane


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Gregor Mosheh <stigmata(at)blackangel(dot)net>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: memory strangeness (fwd)
Date: 2002-07-05 02:31:45
Message-ID: Pine.NEB.4.44.0207051104460.478-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Thu, 4 Jul 2002, Gregor Mosheh wrote:

> Hiya. I've installed Postgres 7.2 on a dedicated FreeBSD system with 384
> MB RAM. Because the system will be doing nothing except PG, I'd like to
> dump as much memory as possible into PG's shared memory.

Well, see Tom's comments on this, 'cause that's what I always say
too. (I'm a NetBSD developer, but FreeBSD's internals aren't so
different, really.)

However, if you wanted to do it both ways, and benchmark your
application, I'd be really interested in hearing about the results.

> I rebuilt the kernel with very large limits: 330 MB on the MAXDSIZ and
> DFLDSIZ, and 330 MB for SHMMAXPAGES. This gives me:

You don't need to rebuild for DFLDSIZ; you can always bump that
up to MAXDSIZ with sysctl (assuming you're root--do it before
you "su pgsql -c nadanada" in your startup script). And moving
MAXDSIZ to 330 MB looks like a reduction to me; FreeBSD's default in
4.6-RELEASE is 512 MB. (NetBSD's is 1GB.) You probably want to check
/usr/include/machine/vmparam.h for default settings before changing
stuff like this, lest you accidently lower it instead.

> I still cannot set PG's shared_buffers higher than 20000 (160 MB):

Shared memory pages, IIRC, are locked, meaning that they cannot be
swapped. There's always a limit on the number of locked pages you
may have in the system in total, if only becuase the system has only
so much physical RAM. But for some reasont he RLIMIT_MEMLOCK parameter
on my nearest FreeBSD 4.6 system is a bit bogus:

server2 $ ulimit -Ha | grep locked
lockedmem(kbytes) unlimited

(I certainly cannot lock an unlimited number of pages in this 1GB
machine! And for some reason it's also saying that my RLIMIT_RSS
is unlimited; again, this should return no more than the physical
RAM available in the machine.)

So I suspect you're running into some other, more secret, limit on
the number of pages that the system or a process may lock into RAM.
So you'll want to dig around the kernel to find out where this is,
and tweak it.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: Gregor Mosheh <stigmata(at)blackangel(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject: Re: memory strangeness (fwd)
Date: 2002-07-05 03:18:22
Message-ID: 7780.1025839102@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Curt Sampson <cjs(at)cynic(dot)net> writes:
>> I still cannot set PG's shared_buffers higher than 20000 (160 MB):

> Shared memory pages, IIRC, are locked, meaning that they cannot be
> swapped.

Is that really how it works on *BSD? That's great if so --- it's
exactly what Postgres wants --- but you'll pardon my paranoia about
assuming that any part of the universe works the way I want it to...

There are Unixen that will happily swap shared-memory segments just
like anything else. You don't have to think hard to see that this
would be a dead loss for Postgres' shared disk buffers. For a clean
buffer page, if we don't have room for it we can:
(a) drop it from shared mem, and re-read it from the database file
next time we need it. Cost: one disk read.
(b) swap it out to swap area, and swap it in next time we need it.
Cost: one disk write and one read.
For a dirty page that we no longer have room for, we can:
(a) write it out to the database file and then drop it from shmem.
Cost: one disk write, and possibly a disk read if we need
the page again later.
(b) swap it out to swap area, then sometime later swap it in so
we can write it out to the database file. Cost: two writes
and a read.
So a swappable shared buffer always comes out behind.

The prospect of this effect is one of the reasons that I discourage
people from setting shared_buffers really high. If their kernel
will let them set shared_buffers to a very large fraction of RAM,
it probably means that the kernel is willing to swap shared memory,
and so they are going to end up behind not ahead.

What's more, exactly the same analysis applies to other chunks of
memory that the kernel might choose to swap out, if it's forced to
because it's under memory pressure due to PG shared_buffers chewing
too much of system RAM. Unless your swap disk is way faster than
your database disk, it's never a win to swap instead of sending
pages to their final resting place on the database disk.

Bottom line: you don't ever want to have PG shmem large enough that
it pushes the kernel into swapping. Even if the kernel will lock
shmem into RAM, you can still lose due to swapping other stuff.
So I'm pretty conservative about how much of physical RAM to assign
to shared buffers.

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Curt Sampson <cjs(at)cynic(dot)net>, Gregor Mosheh <stigmata(at)blackangel(dot)net>, pgsql-admin(at)postgresql(dot)org
Subject: Re: memory strangeness (fwd)
Date: 2002-07-05 03:33:00
Message-ID: 200207050333.g653X1t28750@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Tom Lane wrote:
> Curt Sampson <cjs(at)cynic(dot)net> writes:
> >> I still cannot set PG's shared_buffers higher than 20000 (160 MB):
>
> > Shared memory pages, IIRC, are locked, meaning that they cannot be
> > swapped.
>
> Is that really how it works on *BSD? That's great if so --- it's
> exactly what Postgres wants --- but you'll pardon my paranoia about
> assuming that any part of the universe works the way I want it to...

Sure does. FreeBSD has a sysctl option to enable it, BSD/OS and I bet
NetBSD do it by default.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gregor Mosheh <stigmata(at)blackangel(dot)net>, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: memory strangeness (fwd)
Date: 2002-07-05 05:29:14
Message-ID: Pine.NEB.4.44.0207051247210.478-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Thu, 4 Jul 2002, Tom Lane wrote:

> Curt Sampson <cjs(at)cynic(dot)net> writes:
> > Shared memory pages, IIRC, are locked, meaning that they cannot be
> > swapped.
>
> Is that really how it works on *BSD? That's great if so --- it's
> exactly what Postgres wants --- but you'll pardon my paranoia about
> assuming that any part of the universe works the way I want it to...

Well, note the IIRC. :-)

But as it happens, I'm wrong. (I was probably remembering the old
behaviour, from before the new VM system in NetBSD.)

Now, under NetBSD, the memory segment is an anonymous memory object
that is indeed pagable, so anybody allocating tons and tons of SysV
shared memory to postgres on a NetBSD box is going very, very wrong
indeed. FreeBSD uses pagable as well, though it appears that you can
set a kern.ipc.shm_use_phys sysctl to a value other than 0 to make it
allocate non-pagable memory to all future requests. (Segments already
created will still be pagable.)

> You don't have to think hard to see that this
> would be a dead loss for Postgres' shared disk buffers.

Quite so.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Gregor Mosheh <stigmata(at)blackangel(dot)net>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: memory strangeness (fwd)
Date: 2002-07-05 17:47:23
Message-ID: 20020705101359.V48217-100000@osiris.deathkeep.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin


First off, thanks for your help, guys.

I'm aware of the problems of over-allocating RAM, and I surely wouldn't
want to force the buffers into swap. (thanks, Curt, for
kern.ipc.shm_use_phys) On this particular system, though, it's doing
nothing except PG. 384 MB of RAM, I can give PG 160 of it, which leaves me
with some 170 MB of idle RAM.

Here are the limits pulled from vmparam.h, by Curt's suggestion:
#define MAXTSIZ (128UL*1024*1024) /* max text size */
#define DFLDSIZ (128UL*1024*1024) /* initial data size limit */
#define MAXDSIZ (512UL*1024*1024) /* max data size */

I'd read somewhere (obiously outdated) that the MAXDSIZ was 128 MB. I've
since rebooted with the kernel.GENERIC... What's the sysctl setting I use
to set/check the data size limits?

Tom: You said that max_connections shouldn't be set as low as 5. I only
intend to use 1 for the application, and I'll only need 1-2 for my own
admin use, so a setting of 3 should work for my needs. Is there a
technical reason it should be higher?

On Fri, 5 Jul 2002, Curt Sampson wrote:

> On Thu, 4 Jul 2002, Gregor Mosheh wrote:
>
> > Hiya. I've installed Postgres 7.2 on a dedicated FreeBSD system with 384
> > MB RAM. Because the system will be doing nothing except PG, I'd like to
> > dump as much memory as possible into PG's shared memory.
>
> Well, see Tom's comments on this, 'cause that's what I always say
> too. (I'm a NetBSD developer, but FreeBSD's internals aren't so
> different, really.)
>
> However, if you wanted to do it both ways, and benchmark your
> application, I'd be really interested in hearing about the results.
>
> > I rebuilt the kernel with very large limits: 330 MB on the MAXDSIZ and
> > DFLDSIZ, and 330 MB for SHMMAXPAGES. This gives me:
>
> You don't need to rebuild for DFLDSIZ; you can always bump that
> up to MAXDSIZ with sysctl (assuming you're root--do it before
> you "su pgsql -c nadanada" in your startup script). And moving
> MAXDSIZ to 330 MB looks like a reduction to me; FreeBSD's default in
> 4.6-RELEASE is 512 MB. (NetBSD's is 1GB.) You probably want to check
> /usr/include/machine/vmparam.h for default settings before changing
> stuff like this, lest you accidently lower it instead.
>
> > I still cannot set PG's shared_buffers higher than 20000 (160 MB):
>
> Shared memory pages, IIRC, are locked, meaning that they cannot be
> swapped. There's always a limit on the number of locked pages you
> may have in the system in total, if only becuase the system has only
> so much physical RAM. But for some reasont he RLIMIT_MEMLOCK parameter
> on my nearest FreeBSD 4.6 system is a bit bogus:
>
> server2 $ ulimit -Ha | grep locked
> lockedmem(kbytes) unlimited
>
> (I certainly cannot lock an unlimited number of pages in this 1GB
> machine! And for some reason it's also saying that my RLIMIT_RSS
> is unlimited; again, this should return no more than the physical
> RAM available in the machine.)
>
> So I suspect you're running into some other, more secret, limit on
> the number of pages that the system or a process may lock into RAM.
> So you'll want to dig around the kernel to find out where this is,
> and tweak it.
>
> cjs
> --
> Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
> Don't you know, in this new Dark Age, we're all light. --XTC
>
>

--
Gregor Mosheh, B.S. http://www.blackangel.net/

COMPUTER:
An electronic entity which performs sequences of useful steps in a
totally understandable, rigorously logical manner. If you believe
this, see me about a bridge I have for sale in Manhattan.


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Gregor Mosheh <stigmata(at)blackangel(dot)net>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: memory strangeness (fwd)
Date: 2002-07-07 04:06:00
Message-ID: Pine.NEB.4.44.0207071301240.487-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Fri, 5 Jul 2002, Gregor Mosheh wrote:

> I'm aware of the problems of over-allocating RAM, and I surely wouldn't
> want to force the buffers into swap. (thanks, Curt, for
> kern.ipc.shm_use_phys) On this particular system, though, it's doing
> nothing except PG. 384 MB of RAM, I can give PG 160 of it, which leaves me
> with some 170 MB of idle RAM.

No, that's not idle RAM; that's buffer cache. (FreeBSD, like most
modern Unix operating systems, will use any spare physical memory
to cache blocks read from the disk.) If you allocate 160/170, you
now have pretty much maximized your chances that postgres and the
operating system will be buffering the same data, and made your
memory as minimally effective as possible.

> I'd read somewhere (obiously outdated) that the MAXDSIZ was 128 MB. I've
> since rebooted with the kernel.GENERIC... What's the sysctl setting I use
> to set/check the data size limits?

It's not a sysctl, because those limits are settable per-process.
Use "ulimit -aS" to check the current (soft) limits, and "ulimit
-aH" to check the hard limits (beyond which the soft limits may
not be raised). Also see the getrlimit manpage.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Gregor Mosheh <stigmata(at)blackangel(dot)net>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: memory strangeness (fwd)
Date: 2002-07-07 17:14:03
Message-ID: 20020707101235.G61642-100000@osiris.deathkeep.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin


> > kern.ipc.shm_use_phys) On this particular system, though, it's doing
> > nothing except PG. 384 MB of RAM, I can give PG 160 of it, which leaves me
> > with some 170 MB of idle RAM.
> No, that's not idle RAM; that's buffer cache. (FreeBSD, like most
> modern Unix operating systems, will use any spare physical memory

Oh, of course! How silly of me to forget that! You're entirely right.

> It's not a sysctl, because those limits are settable per-process.
> Use "ulimit -aS" to check the current (soft) limits, and "ulimit
> -aH" to check the hard limits (beyond which the soft limits may
> not be raised). Also see the getrlimit manpage.

I shall. Thanks a lot!


From: Tim Ellis <Tim(dot)Ellis(at)gamet(dot)com>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: stigmata(at)blackangel(dot)net, pgsql-admin(at)postgresql(dot)org
Subject: Re: memory strangeness (fwd)
Date: 2002-07-08 18:23:04
Message-ID: 20020708112304.733fbe30.Tim.Ellis@gamet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Sun, 7 Jul 2002 13:06:00 +0900 (JST)
Curt Sampson <cjs(at)cynic(dot)net> wrote:

> > On this particular system, though, it's doing
> > nothing except PG. 384 MB of RAM, I can give PG 160 of it, which
> > leaves me with some 170 MB of idle RAM.
>
> No, that's not idle RAM; that's buffer cache. (FreeBSD, like most
> modern Unix operating systems, will use any spare physical memory
> to cache blocks read from the disk.) If you allocate 160/170, you
> now have pretty much maximized your chances that postgres and the
> operating system will be buffering the same data, and made your
> memory as minimally effective as possible.

Is there not a method of reading a disk block and instructing the OS not
to buffer it, or am I hallucinating? (My main RDBMS experience experience
is on Solaris, and I could swear that major RDBMSs on that OS will ask the
OS not to buffer disk I/O on behalf of the database)

--
Tim Ellis
DBA, Gamet


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Tim Ellis <Tim(dot)Ellis(at)gamet(dot)com>
Cc: stigmata(at)blackangel(dot)net, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: memory strangeness (fwd)
Date: 2002-07-09 02:31:05
Message-ID: Pine.NEB.4.44.0207091117510.21914-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Mon, 8 Jul 2002, Tim Ellis wrote:

> Is there not a method of reading a disk block and instructing the OS not
> to buffer it, or am I hallucinating?

Yes. Read the block from a raw device. Of course, it's up to you to deal
with an filesystem issues or whatever. :-)

FreeBSD also has the O_DIRECT flag, but that's not guaranteed not to
buffer the read. A much smarter way to do this is just to let the OS
deal with the caching, and use the mmap system call to map blocks you
want to modify into your processes' address space. This will also let
you buffer more than 2GB of data, since the number of blocks buffered
will not be limited by the maximum address space of a single process.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC


From: Tim Ellis <Tim(dot)Ellis(at)gamet(dot)com>
To: Curt Sampson <cjs(at)cynic(dot)net>
Cc: stigmata(at)blackangel(dot)net, pgsql-admin(at)postgresql(dot)org
Subject: Re: memory strangeness (fwd)
Date: 2002-07-09 17:19:47
Message-ID: 20020709101947.3bd8e3d0.Tim.Ellis@gamet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

> > Is there not a method of reading a disk block and instructing the OS
> > not to buffer it, or am I hallucinating?
>
> Yes. Read the block from a raw device. Of course, it's up to you to deal
> with an filesystem issues or whatever. :-)
>
> FreeBSD also has the O_DIRECT flag, but that's not guaranteed not to
> buffer the read. A much smarter way to do this is just to let the OS
> deal with the caching, and use the mmap system call to map blocks you
> want to modify into your processes' address space. This will also let
> you buffer more than 2GB of data, since the number of blocks buffered
> will not be limited by the maximum address space of a single process.

A very diplomatic way to say: "Essentially you are hallucinating" :)

I've always been a proponent of letting the OS do what it does best, so
this letting-the-OS-buffer-for-you concept, to me, is very appealing.

You still must know when the OS-buffered data has truly made it to disk
while not needing to submit lots of sync() calls. Part of that nasty
requirement that databases should guarantee data consistency and
integrity, while still performing well...

--
Tim Ellis
DBA, Gamet


From: Curt Sampson <cjs(at)cynic(dot)net>
To: Tim Ellis <Tim(dot)Ellis(at)gamet(dot)com>
Cc: stigmata(at)blackangel(dot)net, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: memory strangeness (fwd)
Date: 2002-07-10 04:24:41
Message-ID: Pine.NEB.4.44.0207101319080.21914-100000@angelic.cynic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Tue, 9 Jul 2002, Tim Ellis wrote:

> I've always been a proponent of letting the OS do what it does best, so
> this letting-the-OS-buffer-for-you concept, to me, is very appealing.

Me too. :-) Especially since it's generally going to be doing some sort
of buffering anyway, and has a whole bunch of code designed to optimzize
this buffering as best as possible.

> You still must know when the OS-buffered data has truly made it to disk
> while not needing to submit lots of sync() calls.

When you need to force blocks to disk, you just msync() them.

The bigger problem, if you're using mmap, is how to make sure your
modifications *don't* get written to disk until the WAL blocks have been
written. There's no standard way to tell the OS that it may *not* yet
write back certain blocks. So instead you'd probably have to reserve the
space in a block, keep a separate list of changes to the block that any
other processes must also examine, and only apply those changes to the
block after the WAL entry is confirmed written to stable storage.

cjs
--
Curt Sampson <cjs(at)cynic(dot)net> +81 90 7737 2974 http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light. --XTC