Re: Postgres filling up hard drive with swap files

Lists: pgsql-general
From: Joe Lester <joe_lester(at)sweetwater(dot)com>
To: postgres list <pgsql-general(at)postgresql(dot)org>
Subject: Postgres filling up hard drive with swap files
Date: 2004-08-20 15:25:20
Message-ID: 25976F38-F2BD-11D8-BDD7-000A95A58EA0@sweetwater.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

This thread was renamed. It used to be: "shared_buffers Question". The
old thread kind of died out. I'm hoping to get some more direction by
rephrasing the problem, along with some extra observations I've
recently made.

The core of the problem is that Postgres is filling up my hard drive
with swap files at the rate of around 3 to 7 GB per week (that's
Gigabytes not Megabytes) . At this rate it takes roughly two months to
fill up my 40 GB hard drive with swap files. When the hard drive is
full, Postgres crashes and all the 200 connected clients hang. It's not
pretty. I can see the swap files and watch them grow over time. They
are located on my Mac OS 10.3.2 eMac at /var/vm/. I know the swap files
come from Postgres processes because, if I stop the Postgres server,
the swap files go away and the hard drive space is freed up.

When I look at each individual Postgres back-end process in the
Activity Monitor, each one seems to take up a lot of memory. Processes
usually jump to 60.04 MB of "virtual memory" soon after they are first
created. However, "real memory" use ranges 1 MB to 10 MB per process.

I'm looking for advice on what settings to look at in order to solve
the problem of my hard drive filling up with swap files. My guess is
that it must be something I'm doing, since I would find it hard to
imagine that Postgres has an *innate* memory leak that big on OS X.
Here are the only settings I have customized for my Postgres
installation, in case it helps:

In /etc/profile:
ulimit -u 512 # increases the max processes per user (from 100 to 512)
at the shell level.
ulimit -n 40000 # increases the max number of open files allowed by the
shell level (used to be 8000).

In postgresql.conf:
max_connections = 200
shared_buffers = 2000

In /etc/rc:
sysctl -w kern.sysv.shmmax=167772160
sysctl -w kern.sysv.shmall=65536

In /etc/sysctl.conf:
kern.maxproc=2048
kern.maxprocperuid=512 # Turn up the maxproc per user
kern.maxfiles=40000 # Turn up max files
kern.maxfilesperproc=30000 # Turn up max files per process

Thanks for any insight you can provide!

Joe


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Joe Lester <joe_lester(at)sweetwater(dot)com>
Cc: postgres list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 16:15:26
Message-ID: 20040820161526.GB5940@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Aug 20, 2004 at 10:25:20AM -0500, Joe Lester wrote:
> This thread was renamed. It used to be: "shared_buffers Question".
> The old thread kind of died out. I'm hoping to get some more direction
> by rephrasing the problem, along with some extra observations I've
> recently made.
>
> The core of the problem is that Postgres is filling up my hard drive
> with swap files at the rate of around 3 to 7 GB per week (that's
> Gigabytes not Megabytes).

Maybe you said this before, but I didn't follow the previous thread.
What do you mean by swap files? Where are those files located? Can you
show a listing of them?

I assume, of course, that your data proper does not grow linearly with
those "swap files," does it?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Si no sabes adonde vas, es muy probable que acabes en otra parte.


From: Joe Lester <joe_lester(at)sweetwater(dot)com>
To: postgres list <pgsql-general(at)postgresql(dot)org>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 16:45:18
Message-ID: 51925696-F2C8-11D8-BDD7-000A95A58EA0@sweetwater.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thanks for the response!

> Maybe you said this before, but I didn't follow the previous thread.
> What do you mean by swap files?

By swap files I mean the files that contain the memory that the OS is
maintaining on the hard disk, after physical RAM has overflowed.

> Where are those files located?

I can see the swap files and watch them grow over time. They are
located on my Mac OS 10.3.2 eMac at /var/vm/.

> Can you show a listing of them?

Right now there are not that many since I just restarted the server
recently. But already, they total 3 GB.

officelink:/var/vm root# ls -lh
drwx--x--x 14 root wheel 476B 17 Feb 2004 app_profile
-rw------T 1 root wheel 64M 31 Jul 09:10 swapfile0
-rw------T 1 root wheel 64M 31 Jul 15:13 swapfile1
-rw------T 1 root wheel 128M 31 Jul 16:08 swapfile2
-rw------T 1 root wheel 256M 3 Aug 13:20 swapfile3
-rw------T 1 root wheel 512M 18 Aug 14:29 swapfile4
-rw------T 1 root wheel 512M 18 Aug 16:24 swapfile5
-rw------T 1 root wheel 512M 18 Aug 23:30 swapfile6
-rw------T 1 root wheel 512M 19 Aug 12:59 swapfile7
-rw------T 1 root wheel 512M 20 Aug 09:56 swapfile8

> I assume, of course, that your data proper does not grow linearly with
> those "swap files," does it?

The total amount of disk space consumed by the swap files vastly
exceeds the size of my postgres data directory, if that's what you're
asking, which is about 300 MB or so.


From: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
To: Joe Lester <joe_lester(at)sweetwater(dot)com>
Cc: postgres list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 16:57:29
Message-ID: 20040820165729.GI5940@dcc.uchile.cl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Aug 20, 2004 at 11:45:18AM -0500, Joe Lester wrote:

Joe,

> >Maybe you said this before, but I didn't follow the previous thread.
> >What do you mean by swap files?
>
> By swap files I mean the files that contain the memory that the OS is
> maintaining on the hard disk, after physical RAM has overflowed.

Wow, this would be the most spectacular memory leak I have ever seen.
How do you know these files are related to the Postgres processes?
Could we see a top extract, or some equivalent showing a process list
and their memory sizes?

--
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Jajaja! Solo hablaba en serio!


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Joe Lester <joe_lester(at)sweetwater(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, alvherre(at)dcc(dot)uchile(dot)cl
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 17:02:18
Message-ID: 20040820130218.41736683.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Joe Lester <joe_lester(at)sweetwater(dot)com> wrote:

> Thanks for the response!
>
> > Maybe you said this before, but I didn't follow the previous thread.
> > What do you mean by swap files?
>
> By swap files I mean the files that contain the memory that the OS is
> maintaining on the hard disk, after physical RAM has overflowed.
>
> > Where are those files located?
>
> I can see the swap files and watch them grow over time. They are
> located on my Mac OS 10.3.2 eMac at /var/vm/.
>
> > Can you show a listing of them?
>
> Right now there are not that many since I just restarted the server
> recently. But already, they total 3 GB.
>
> officelink:/var/vm root# ls -lh
> drwx--x--x 14 root wheel 476B 17 Feb 2004 app_profile
> -rw------T 1 root wheel 64M 31 Jul 09:10 swapfile0
> -rw------T 1 root wheel 64M 31 Jul 15:13 swapfile1
> -rw------T 1 root wheel 128M 31 Jul 16:08 swapfile2
> -rw------T 1 root wheel 256M 3 Aug 13:20 swapfile3
> -rw------T 1 root wheel 512M 18 Aug 14:29 swapfile4
> -rw------T 1 root wheel 512M 18 Aug 16:24 swapfile5
> -rw------T 1 root wheel 512M 18 Aug 23:30 swapfile6
> -rw------T 1 root wheel 512M 19 Aug 12:59 swapfile7
> -rw------T 1 root wheel 512M 20 Aug 09:56 swapfile8
>
> > I assume, of course, that your data proper does not grow linearly with
> > those "swap files," does it?
>
> The total amount of disk space consumed by the swap files vastly
> exceeds the size of my postgres data directory, if that's what you're
> asking, which is about 300 MB or so.

How many Postgres processes are running? Does this number increase with
the memory usage.

Simple fact of the matter is that you have apparently found a memory
leak. How Mac OS X deals with swapping is (more or less) unimportant.

I'm wondering, however, if you have a connection leak instead. i.e.
is it possible that your client application is opening a whole bunch
of connections and never closing them? You did show that you have
a max # of connection of 200. That's pretty high, unless you've got
a lot of RAM in that machine.

A more accurate description of the problem would be:
How many connections are actually open?
How much memory is actually in use by Postgres processes? (The amount of
swap in use is unimportant to the Postgres folks, it's an OS thing)

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


From: Joe Lester <joe_lester(at)sweetwater(dot)com>
To: postgres list <pgsql-general(at)postgresql(dot)org>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 18:09:38
Message-ID: 197B34DA-F2D4-11D8-BDD7-000A95A58EA0@sweetwater.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> Wow, this would be the most spectacular memory leak I have ever seen.
> How do you know these files are related to the Postgres processes?

Because if I stop the Postgres server, the swap files go away and the
hard drive space is freed up.

> Could we see a top extract, or some equivalent showing a process list
> and their memory sizes?

Here's a partial top extract. As noted before, the VSIZE is very high
on all.

Processes: 231 total, 3 running, 228 sleeping... 300 threads
13:05:16
Load Avg: 0.12, 0.07, 0.02 CPU usage: 6.3% user, 15.9% sys, 77.8%
idle
SharedLibs: num = 102, resident = 14.0M code, 1.21M data, 2.77M
LinkEdit
MemRegions: num = 9888, resident = 134M + 7.73M private, 32.9M shared
PhysMem: 78.6M wired, 283M active, 141M inactive, 504M used, 7.82M
free
VM: 12.1G + 70.5M 465206(0) pageins, 1792391(0) pageouts

PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE
VSIZE
14326 top 19.7% 0:03.13 1 17 26 364K 384K 740K
27.1M
14293 bash 0.0% 0:00.13 1 12 18 124K 852K 796K
18.2M
14291 sshd 0.0% 0:00.49 1 15 42 504K 884K 1.43M
30.0M
14290 postgres 0.0% 0:00.31 1 9 32 860K 16.9M 6.90M
59.5M
14289 ftpd 0.0% 0:00.02 1 15 34 152K 640K 772K
27.4M
14264 postgres 0.0% 0:00.31 1 9 32 860K 16.9M 6.32M
59.5M
14235 postgres 0.0% 0:01.36 1 9 33 880K 16.9M 9.62M
60.0M
14224 postgres 0.0% 0:00.27 1 9 32 856K 16.9M 6.26M
59.5M
14220 postgres 0.0% 0:00.24 1 9 32 840K 16.9M 6.50M
59.5M
14218 postgres 0.0% 0:39.17 1 9 32 860K 16.9M 10.7M
59.5M
14113 postgres 0.0% 0:00.21 1 9 32 648K 16.9M 4.69M
59.5M
14075 postgres 0.0% 1:19.88 1 9 33 756K 16.9M 7.84M
60.0M
14063 postgres 0.0% 0:00.46 1 9 33 680K 16.9M 7.38M
60.0M
14061 postgres 0.0% 0:00.18 1 9 32 616K 16.9M 4.48M
59.5M
14058 postgres 0.0% 0:00.60 1 9 33 720K 16.9M 6.53M
60.0M
14055 postgres 0.0% 0:00.17 1 9 32 496K 16.9M 1.77M
59.5M
14028 postgres 0.0% 0:00.18 1 9 32 448K 16.9M 1.68M
59.5M
14015 postgres 0.0% 0:18.61 1 9 33 844K 16.9M 8.61M
60.0M
14013 postgres 0.0% 0:19.65 1 9 33 784K 16.9M 7.21M
60.0M
14009 postgres 0.0% 2:05.01 1 9 33 940K 16.9M 10.9M
60.0M
13980 postgres 0.0% 0:00.47 1 9 32 804K 16.9M 2.70M
59.5M
13444 postgres 0.0% 0:16.70 1 9 33 792K 16.9M 10.7M
60.0M
13415 postgres 0.0% 0:02.03 1 9 33 936K 16.9M 6.74M
60.0M
13409 postgres 0.0% 0:01.75 1 9 32 508K 16.9M 1.96M
59.5M
13355 postgres 0.0% 0:01.75 1 9 33 884K 16.9M 5.50M
60.0M
13339 postgres 0.0% 0:19.49 1 9 33 716K 16.9M 7.21M
60.0M
13337 postgres 0.0% 0:01.93 1 9 33 848K 16.9M 3.56M
60.0M
13321 postgres 0.0% 0:01.88 1 9 33 868K 16.9M 5.89M
60.0M


From: Jeff <threshar(at)torgo(dot)978(dot)org>
To: Joe Lester <joe_lester(at)sweetwater(dot)com>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, postgres list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 18:28:43
Message-ID: C4393ECA-F2D6-11D8-AAA9-000D9366F0C4@torgo.978.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Aug 20, 2004, at 2:09 PM, Joe Lester wrote:

>
> H PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE
> VSIZE
> 14326 top 19.7% 0:03.13 1 17 26 364K 384K 740K
> 27.1M
> 14293 bash 0.0% 0:00.13 1 12 18 124K 852K 796K
> 18.2M
> 14291 sshd 0.0% 0:00.49 1 15 42 504K 884K 1.43M
> 30.0M
> 14290 postgres 0.0% 0:00.31 1 9 32 860K 16.9M 6.90M
> 59.5M

On osx the number to look at isn't vsize, but actually rprvt. vsize is
its size in virtual memory (basically useless)- rprvt is the size of
its "private memory" - non shared memory. It is a pretty good number to
go by.

But I believe the whole problem is the emac only has 512MB of ram and
you simply don't have enough memory for that many connections. Even
with each connection sucking up 2MB of memory that is 400MB and as you
can see.. most of them are sucking 4-8MB.

If you cannot upgrade I'd recommend looking into something like pgpool
so you can pool your connections (this results in having a lot fewer
instances of postgres, thus saving tons of memory)

If you watch top I'm guessing you'll see a TON of pagein/pageouts
(watch the numbers change. it displays the difference in parenthesis
after updates)

--
Jeff Trout <jeff(at)jefftrout(dot)com>
http://www.jefftrout.com/
http://www.stuarthamm.net/


From: Joe Lester <joe_lester(at)sweetwater(dot)com>
To: postgres list <pgsql-general(at)postgresql(dot)org>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Bill Moran <wmoran(at)potentialtech(dot)com>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 18:33:21
Message-ID: 69C06FBA-F2D7-11D8-BDD7-000A95A58EA0@sweetwater.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> How many Postgres processes are running?

146 right now. Most of them are idle at any given point in time.

> Does this number increase with
> the memory usage.

No. There are 140 - 150 postgres processes running on average, but the
memory usage keeps going up and up.

> Simple fact of the matter is that you have apparently found a memory
> leak. How Mac OS X deals with swapping is (more or less) unimportant.

Wow, I sure hope not. I was really hoping to take care of the problem.

> I'm wondering, however, if you have a connection leak instead. i.e.
> is it possible that your client application is opening a whole bunch
> of connections and never closing them?

No. The clients open only one connection (and hang onto it for dear
life :-).

> You did show that you have
> a max # of connection of 200. That's pretty high, unless you've got
> a lot of RAM in that machine.

I have 512 MB of RAM in the machine. The server is performing
wonderfully. It's just that the swap files keep sprouting like weeds.

> A more accurate description of the problem would be:
> How many connections are actually open?

146 right now. That's about average.

> How much memory is actually in use by Postgres processes? (The amount
> of
> swap in use is unimportant to the Postgres folks, it's an OS thing)

This is where I could use some pointers. The following line is a top
entry for a single postgres process. Hope that helps.

PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE
VSIZE
14235 postgres 0.0% 0:01.36 1 9 33 880K 16.9M 9.62M
60.0M


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Joe Lester <joe_lester(at)sweetwater(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, alvherre(at)dcc(dot)uchile(dot)cl
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 19:16:07
Message-ID: 20040820151607.414c7703.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Joe Lester <joe_lester(at)sweetwater(dot)com> wrote:
> > How many Postgres processes are running?
>
> 146 right now. Most of them are idle at any given point in time.

That's a lot for a machine with only 512M of RAM.

> > I'm wondering, however, if you have a connection leak instead. i.e.
> > is it possible that your client application is opening a whole bunch
> > of connections and never closing them?
>
> No. The clients open only one connection (and hang onto it for dear
> life :-).

If these clients aren't utilizing the database, it might be worthwhile
to have them disconnect after a period of inactivity, and reconnect when
things get busy again.

> > You did show that you have
> > a max # of connection of 200. That's pretty high, unless you've got
> > a lot of RAM in that machine.
>
> I have 512 MB of RAM in the machine.

That's not a lot of RAM. I have 512M in a machine that's only designed
to handle 20 connections (although that's MS Windows + MSSQL ... but
you get the idea ... we're talking a factor of 10 here)

> The server is performing
> wonderfully. It's just that the swap files keep sprouting like weeds.

I would expect that if you ignore it for a while, eventually it will
reach an equalibrium. (where it's not increasing the amount of swap in
use) but it will always hurt performance any time is has to page in or
out.

> > How much memory is actually in use by Postgres processes? (The amount
> > of
> > swap in use is unimportant to the Postgres folks, it's an OS thing)
>
> This is where I could use some pointers. The following line is a top
> entry for a single postgres process. Hope that helps.
>
> PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE
> 14235 postgres 0.0% 0:01.36 1 9 33 880K 16.9M 9.62M 60.0M

Please don't wrap machine-generated output ... it makes it VERY difficult
to understand.

I'll defer this answer to Jeff, as he seems to know quite a bit more about
how Darwin manages memory than I do.

His recommendation to try pgpool was also good.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


From: Joe Lester <joe_lester(at)sweetwater(dot)com>
To: postgres list <pgsql-general(at)postgresql(dot)org>
Cc: Jeff <threshar(at)torgo(dot)978(dot)org>, Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 19:18:43
Message-ID: C0517E07-F2DD-11D8-BDD7-000A95A58EA0@sweetwater.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> On osx the number to look at isn't vsize, but actually rprvt. vsize
> is its size in virtual memory (basically useless)- rprvt is the size
> of its "private memory" - non shared memory. It is a pretty good
> number to go by.
> But I believe the whole problem is the emac only has 512MB of ram and
> you simply don't have enough memory for that many connections. Even
> with each connection sucking up 2MB of memory that is 400MB and as you
> can see.. most of them are sucking 4-8MB.

Thanks for the response, Jeff. The RPRVT, which you say is best to go
by, shows only 4K-2MB per connection. That's less than the 4-8MB size
you reference which I believe comes from the RSIZE column in top. So
according to RSIZE I should add more RAM. But according to RPRVT I
should be OK with 512 MB of RAM. I'm a little confused on which column
to go by.

Also, even if I need to add more RAM, it still strikes me as more than
a little weird that a memory overflow situation gradually eats up all
the space on my hard drive. That's the problem I'd really like to solve
if possible.

PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE
VSIZE
14378 postgres 0.0% 0:00.41 1 9 33 880K 17.8M 8.31M
60.0M
14359 postgres 0.0% 0:00.90 1 9 32 860K 17.8M 4.12M
59.5M
14346 postgres 0.0% 0:00.31 1 9 32 840K 17.8M 2.26M
59.5M
14342 postgres 0.0% 0:01.08 1 9 33 880K 17.8M 7.86M
60.0M
14290 postgres 0.0% 0:00.55 1 9 32 868K 17.8M 3.09M
59.5M
14235 postgres 0.0% 0:03.39 1 9 33 880K 17.8M 7.68M
60.0M
14224 postgres 0.0% 0:00.43 1 9 32 864K 17.8M 2.55M
59.5M
14220 postgres 0.0% 0:00.40 1 9 33 888K 17.8M 3.02M
60.0M
14218 postgres 0.0% 1:07.53 1 9 33 2.04M 17.8M 10.9M
60.0M
14075 postgres 0.0% 1:40.98 1 9 33 872K 17.8M 10.5M
60.0M
14063 postgres 0.0% 0:00.55 1 9 33 892K 17.8M 3.27M
60.0M

> If you cannot upgrade I'd recommend looking into something like pgpool
> so you can pool your connections (this results in having a lot fewer
> instances of postgres, thus saving tons of memory)

Huh. Cool. Do you have any experience of what the performance hit would
be for using pgpool?


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Lester <joe_lester(at)sweetwater(dot)com>
Cc: postgres list <pgsql-general(at)postgresql(dot)org>, Nader Nafissi <nader(at)apple(dot)com>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 19:28:39
Message-ID: 1648.1093030119@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Joe Lester <joe_lester(at)sweetwater(dot)com> writes:
>> Simple fact of the matter is that you have apparently found a memory
>> leak. How Mac OS X deals with swapping is (more or less) unimportant.

> Wow, I sure hope not. I was really hoping to take care of the problem.

I think what you've found is an OS X bug.

I was able to replicate this behavior on OS X 10.3.5. All I did was
start the postmaster and then start a continuous loop in a shell window:
while true
do
psql -c "select count(*) from tenk1" regression
done
(tenk1 is just a test table with 10000 or so rows ... not very big.)
After a few thousand iterations I have more swapfiles than I did before.
The postmaster itself is certainly not leaking memory, and there are no
backends lasting longer than a fraction of a second, but the machine is
acting like it's got a problem. Watching top, I see the "free PhysMem"
steadily decrease to zero and then bounce back up to about 645M (out of
768M installed). Each time it bounces up, the VM pageouts count takes a
jump (otherwise pageouts doesn't move), and there's also a burst of disk
activity according to iostat. The cycle repeats every 45 seconds or so.
Meanwhile, the entire system has become exceedingly sluggish (opening
Safari is painful, for example, and even just switching front
application is visibly slow).

What I think is happening is that the system thinks that the memory
associated with the exited backends is still in use, and hence
faithfully hangs onto it and eventually swaps it out. Almost certainly,
this bug is specifically triggered by our use of SysV shared memory.
If Apple had this bug for every process spawned by OS X, they'd have
noticed and fixed it long since ;-). But SysV shmem is not a popular
feature on OS X (as evidenced by the ridiculously low default limits
on it) and it's easy to imagine such a bug going unnoticed if it's
only triggered by shmem usage.

A crude estimate of the leakage rate I'm seeing is about 2.5MB per
exited backend, which is in the same general area as the size of the
shared memory segment, so it seems plausible that the system is somehow
treating an "afterimage" of the shmem segment as live data that it needs
to swap out.

BTW, I tried hacking the backend to forcibly shmdt() just before exit,
just to see if that would work around the problem. No go; doesn't seem
to change the behavior at all. I can't think of anything else we could
do at the application level to dodge the problem.

Time to file a bug report. With Apple, not with us.

regards, tom lane


From: Joe Lester <joe_lester(at)sweetwater(dot)com>
To: postgres list <pgsql-general(at)postgresql(dot)org>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, Bill Moran <wmoran(at)potentialtech(dot)com>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 19:30:00
Message-ID: 540811EE-F2DF-11D8-BDD7-000A95A58EA0@sweetwater.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> If these clients aren't utilizing the database, it might be worthwhile
> to have them disconnect after a period of inactivity, and reconnect
> when
> things get busy again.

That's a good idea, except in the future, all the clients will be
active most of the time. So, I'd like to get the server to the point
where it can handle 150-200 client connections gracefully.

> I would expect that if you ignore it for a while, eventually it will
> reach an equalibrium. (where it's not increasing the amount of swap in
> use) but it will always hurt performance any time is has to page in or
> out.

Unfortunately, it does not reach an equilibrium. It just keeps eating
disk space until it's all gone.

> Please don't wrap machine-generated output ... it makes it VERY
> difficult
> to understand.

Please forgive me. What do you mean by "wrap machine-generated output".
I would love to oblige.


From: Joe Lester <joe_lester(at)sweetwater(dot)com>
To: postgres list <pgsql-general(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nader Nafissi <nader(at)apple(dot)com>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 19:40:37
Message-ID: CF4861C4-F2E0-11D8-BDD7-000A95A58EA0@sweetwater.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Aug 20, 2004, at 2:28 PM, Tom Lane wrote:
> I think what you've found is an OS X bug.
>
> I was able to replicate this behavior on OS X 10.3.5. All I did was
> start the postmaster and then start a continuous loop in a shell
> window:
> while true
> do
> psql -c "select count(*) from tenk1" regression
> done
> (tenk1 is just a test table with 10000 or so rows ... not very big.)
> After a few thousand iterations I have more swapfiles than I did
> before.
> The postmaster itself is certainly not leaking memory, and there are no
> backends lasting longer than a fraction of a second, but the machine is
> acting like it's got a problem. Watching top, I see the "free PhysMem"
> steadily decrease to zero and then bounce back up to about 645M (out of
> 768M installed). Each time it bounces up, the VM pageouts count takes
> a
> jump (otherwise pageouts doesn't move), and there's also a burst of
> disk
> activity according to iostat. The cycle repeats every 45 seconds or
> so.
> Meanwhile, the entire system has become exceedingly sluggish (opening
> Safari is painful, for example, and even just switching front
> application is visibly slow).

Yes, my system also becomes increasingly sluggish in the manner you
describe as the swap files increase in number and size.

> Time to file a bug report. With Apple, not with us.

I will. Thanks Tom!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: Joe Lester <joe_lester(at)sweetwater(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 19:43:15
Message-ID: 1784.1093030995@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Bill Moran <wmoran(at)potentialtech(dot)com> writes:
> Joe Lester <joe_lester(at)sweetwater(dot)com> wrote:
>>> I'm wondering, however, if you have a connection leak instead. i.e.
>>> is it possible that your client application is opening a whole bunch
>>> of connections and never closing them?
>>
>> No. The clients open only one connection (and hang onto it for dear
>> life :-).

> If these clients aren't utilizing the database, it might be worthwhile
> to have them disconnect after a period of inactivity, and reconnect when
> things get busy again.

If my theory is right, this would actually be counterproductive. The
leak I think I'm seeing is associated with backend exit and so the way
to slow it as much as possible is to prolong backend lifetime as much
as possible. Joe, what is the mean lifetime of your connections anyway?
I assume they don't stay up forever.

regards, tom lane


From: Joe Lester <joe_lester(at)sweetwater(dot)com>
To: postgres list <pgsql-general(at)postgresql(dot)org>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 19:59:27
Message-ID: 71050674-F2E3-11D8-BDD7-000A95A58EA0@sweetwater.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Aug 20, 2004, at 2:43 PM, Tom Lane wrote:
> Bill Moran <wmoran(at)potentialtech(dot)com> writes:
>> Joe Lester <joe_lester(at)sweetwater(dot)com> wrote:
>>>> I'm wondering, however, if you have a connection leak instead. i.e.
>>>> is it possible that your client application is opening a whole bunch
>>>> of connections and never closing them?
>>>
>>> No. The clients open only one connection (and hang onto it for dear
>>> life :-).
>
>> If these clients aren't utilizing the database, it might be worthwhile
>> to have them disconnect after a period of inactivity, and reconnect
>> when
>> things get busy again.
>
> If my theory is right, this would actually be counterproductive. The
> leak I think I'm seeing is associated with backend exit and so the way
> to slow it as much as possible is to prolong backend lifetime as much
> as possible. Joe, what is the mean lifetime of your connections
> anyway?
> I assume they don't stay up forever.

They are "permanent connections", meaning that the same connection
stays open on the server as long as the client application is running.
And it's common for the clients to stay running for days at a time. I'd
say the average length of a connection is 3 days.


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Joe Lester <joe_lester(at)sweetwater(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 20:01:11
Message-ID: 20040820160111.00bc2454.wmoran@potentialtech.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Joe Lester <joe_lester(at)sweetwater(dot)com> wrote:

> > If these clients aren't utilizing the database, it might be worthwhile
> > to have them disconnect after a period of inactivity, and reconnect
> > when
> > things get busy again.
>
> That's a good idea, except in the future, all the clients will be
> active most of the time. So, I'd like to get the server to the point
> where it can handle 150-200 client connections gracefully.

Ahh ...

> > I would expect that if you ignore it for a while, eventually it will
> > reach an equalibrium. (where it's not increasing the amount of swap in
> > use) but it will always hurt performance any time is has to page in or
> > out.
>
> Unfortunately, it does not reach an equilibrium. It just keeps eating
> disk space until it's all gone.

Well ... I was wrong, and per Tom's post, you've found a problem in
Darwin/OS X.

> > Please don't wrap machine-generated output ... it makes it VERY
> > difficult
> > to understand.
>
> Please forgive me. What do you mean by "wrap machine-generated output".
> I would love to oblige.

Well, you're wrapping everything. Notice how my part of the conversation
above is ugly. It should look like this:

> > Please don't wrap machine-generated output ... it makes it VERY difficult
> > to understand.

In the case of 'machine-generated output', I was specifically talking
about top. You sent this:

Processes: 231 total, 3 running, 228 sleeping... 300 threads
13:05:16
Load Avg: 0.12, 0.07, 0.02 CPU usage: 6.3% user, 15.9% sys, 77.8%
idle
SharedLibs: num = 102, resident = 14.0M code, 1.21M data, 2.77M
LinkEdit
MemRegions: num = 9888, resident = 134M + 7.73M private, 32.9M shared
PhysMem: 78.6M wired, 283M active, 141M inactive, 504M used, 7.82M
free
VM: 12.1G + 70.5M 465206(0) pageins, 1792391(0) pageouts

PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE
VSIZE
14326 top 19.7% 0:03.13 1 17 26 364K 384K 740K
27.1M
...

Notice how incredibly difficult it is to make sense of the top output.
Compare with this:

Processes: 231 total, 3 running, 228 sleeping... 300 threads 13:05:16
Load Avg: 0.12, 0.07, 0.02 CPU usage: 6.3% user, 15.9% sys, 77.8% idle
SharedLibs: num = 102, resident = 14.0M code, 1.21M data, 2.77M LinkEdit
MemRegions: num = 9888, resident = 134M + 7.73M private, 32.9M shared
PhysMem: 78.6M wired, 283M active, 141M inactive, 504M used, 7.82M free
VM: 12.1G + 70.5M 465206(0) pageins, 1792391(0) pageouts

PID COMMAND %CPU TIME #TH #PRTS #MREGS RPRVT RSHRD RSIZE VSIZE
14326 top 19.7% 0:03.13 1 17 26 364K 384K 740K 27.1M
...

This is usually caused by a setting in your mail client that reads
something like "wrap lines at 72 characters" being turned on.

You should wrap your text at 72 chars when you're typing, (so it displays
readibly on most mail programs) but it's not a good idea to arbitrarily
wrap _all_ text in a message to any line length. Doing so usually ends
up making a mess of some part of the message.

On another note: sorry about leading you in the wrong direction on the
problem, but I'm glad Tom was able to isolate it for you.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Joe Lester <joe_lester(at)sweetwater(dot)com>
Cc: postgres list <pgsql-general(at)postgresql(dot)org>, Bill Moran <wmoran(at)potentialtech(dot)com>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 20:08:14
Message-ID: 2031.1093032494@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Joe Lester <joe_lester(at)sweetwater(dot)com> writes:
> And it's common for the clients to stay running for days at a time. I'd
> say the average length of a connection is 3 days.

Uh-huh. Can you do anything to increase that?

Another possibility for slowing the leakage rate (pending a real fix)
is to decrease the size of your shared memory segment, ie, reduce
shared_buffers and the other shared-memory sizing parameters as much
as you can without killing performance.

regards, tom lane


From: Joe Lester <joe_lester(at)sweetwater(dot)com>
To: postgres list <pgsql-general(at)postgresql(dot)org>
Cc: Bill Moran <wmoran(at)potentialtech(dot)com>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-20 20:27:07
Message-ID: 4EA49CC2-F2E7-11D8-BDD7-000A95A58EA0@sweetwater.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Aug 20, 2004, at 3:01 PM, Bill Moran wrote:
>>> Please don't wrap machine-generated output ... it makes it VERY
>>> difficult
>>> to understand.
>
> This is usually caused by a setting in your mail client that reads
> something like "wrap lines at 72 characters" being turned on.
>
> You should wrap your text at 72 chars when you're typing, (so it
> displays
> readibly on most mail programs) but it's not a good idea to arbitrarily
> wrap _all_ text in a message to any line length. Doing so usually ends
> up making a mess of some part of the message.

I'll try to be sensitive to that. Unfortunately, my mail client forces
the text to wrap and provides no override preference. I'm using
Mail.app, so if anyone knows of a workaround or solution, please let me
know. Thanks.


From: Jeff <threshar(at)torgo(dot)978(dot)org>
To: Joe Lester <joe_lester(at)sweetwater(dot)com>
Cc: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>, postgres list <pgsql-general(at)postgresql(dot)org>
Subject: Re: Postgres filling up hard drive with swap files
Date: 2004-08-21 01:46:23
Message-ID: E82CAF4B-F313-11D8-AAA9-000D9366F0C4@torgo.978.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Aug 20, 2004, at 3:18 PM, Joe Lester wrote:

> Huh. Cool. Do you have any experience of what the performance hit
> would be for using pgpool?
>
I've only had performance increases with pgpool. Then again, my
connections are common of web apps (as web apps connect) - many many
short lived connections.

Given you have long lived connections.. I'm not sure if pgpool would
help.
It may, give it a whirl.

however it seems Tom found the actual problem.

--
Jeff Trout <jeff(at)jefftrout(dot)com>
http://www.jefftrout.com/
http://www.stuarthamm.net/