Re: One source of constant annoyance identified

From: "Markus Wollny" <Markus(dot)Wollny(at)computec(dot)de>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: One source of constant annoyance identified
Date: 2002-07-03 13:23:39
Message-ID: 2266D0630E43BB4290742247C8910575014CE2C8@dozer.computec.de
Lists: pgsql-general

Okay, now it's even more annoying...

We just upgraded the server from 1GB to 2GB of RAM because it kept
swapping out about 300MB.

I updated postgresql.conf accordingly, setting

max_connections = 190 (this is the minimum we need to satisfy the
webservers and backend jobs)
shared_buffers = 60000 (that's 468.75MB; I took this value assuming a
recommended value of 25% of RAM)
sort_mem = 80000 (that's 78.125MB; the recommended value is 4% of RAM,
equalling 82MB).
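
For reference, here's the arithmetic behind those figures (just a
back-of-the-envelope sketch; it assumes the default 8kB block size for
shared_buffers, and that sort_mem is specified in kB):

# shared_buffers is counted in 8kB disk blocks, sort_mem in kB
echo "shared_buffers: $((60000 * 8)) kB"   # 480000 kB = 468.75 MB
echo "sort_mem:       80000 kB"            # = 78.125 MB per sort
# note: sort_mem applies per sort operation, so several concurrent
# sorts in different backends can each claim that much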

Now the top output, sorted by memory usage, looks like this:

2:43pm up 1:34, 3 users, load average: 3.29, 2.25, 2.08
123 processes: 118 sleeping, 5 running, 0 zombie, 0 stopped
CPU0 states: 60.5% user, 3.1% system, 0.0% nice, 35.4% idle
CPU1 states: 47.3% user, 15.3% system, 0.0% nice, 36.4% idle
CPU2 states: 51.0% user, 17.1% system, 0.0% nice, 31.3% idle
CPU3 states: 75.4% user, 4.0% system, 0.0% nice, 20.0% idle
Mem: 2061560K av, 2054268K used, 7292K free, 0K shrd, 13924K buff
Swap: 2097136K av, 7788K used, 2089348K free, 1825104K cached

PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
1652 postgres 9 0 469M 469M 467M S 0.0 23.3 2:21 postmaster
2041 postgres 9 0 431M 431M 429M S 1.2 21.4 1:32 postmaster
1588 postgres 9 0 411M 411M 410M S 15.6 20.4 1:25 postmaster
1597 postgres 9 0 289M 289M 287M S 0.1 14.3 0:31 postmaster
1849 postgres 9 0 259M 259M 258M S 0.0 12.9 0:14 postmaster
2046 postgres 9 0 239M 239M 238M S 0.0 11.9 0:25 postmaster
1973 postgres 9 0 172M 172M 171M S 0.0 8.5 0:21 postmaster
2142 postgres 9 0 128M 128M 127M S 0.0 6.3 0:04 postmaster
2156 postgres 9 0 116M 116M 114M S 0.0 5.7 0:02 postmaster
1598 postgres 9 0 86548 84M 84752 S 1.7 4.1 2:36 postmaster
1608 postgres 9 0 60932 59M 59356 S 0.0 2.9 2:09 postmaster
1582 postgres 9 0 57624 56M 55444 S 0.0 2.7 1:58 postmaster
1609 postgres 9 0 56408 55M 55164 S 8.3 2.7 2:05 postmaster
1766 postgres 9 0 45248 44M 43824 S 0.0 2.1 1:09 postmaster
2139 postgres 9 0 45276 44M 43892 S 0.0 2.1 0:01 postmaster
2045 postgres 9 0 41500 40M 39820 S 1.7 2.0 2:43 postmaster
1610 postgres 9 0 41336 40M 40172 S 0.0 2.0 0:53 postmaster
2044 postgres 14 0 38328 37M 37056 R 25.0 1.8 0:33 postmaster
1881 postgres 9 0 34936 34M 33676 S 17.2 1.6 0:29 postmaster
2042 postgres 9 0 33144 32M 31920 S 3.7 1.6 1:13 postmaster
1679 postgres 9 0 32516 31M 31288 S 0.0 1.5 0:05 postmaster
1678 postgres 9 0 31996 31M 30812 S 0.0 1.5 0:19 postmaster
1653 postgres 9 0 29424 28M 28180 S 0.0 1.4 0:05 postmaster
2048 postgres 9 0 27772 27M 26556 S 0.0 1.3 0:09 postmaster
1802 postgres 9 0 26676 26M 25504 S 0.0 1.2 0:06 postmaster
2211 postgres 9 0 25940 25M 24592 S 0.0 1.2 0:02 postmaster
2047 postgres 14 0 25588 24M 23532 R 8.0 1.2 0:55 postmaster
2065 postgres 11 0 25584 24M 24404 S 3.9 1.2 0:04 postmaster
1980 postgres 11 0 24584 24M 22864 S 3.5 1.1 0:07 postmaster
1872 postgres 9 0 23908 23M 22800 S 0.0 1.1 0:04 postmaster
2068 postgres 9 0 21352 20M 20188 S 0.0 1.0 0:01 postmaster
2138 postgres 9 0 20928 20M 19644 S 0.0 1.0 0:06 postmaster
1983 postgres 9 0 20544 20M 19344 S 0.0 0.9 0:04 postmaster
2342 postgres 9 0 20352 19M 13772 S 0.0 0.9 0:23 postmaster
2357 postgres 15 0 20260 19M 18532 R 93.1 0.9 1:34 postmaster
2204 postgres 9 0 19816 19M 18244 S 0.0 0.9 0:01 postmaster
2199 postgres 9 0 16840 16M 15452 S 0.0 0.8 0:00 postmaster
2207 postgres 9 0 16784 16M 15512 S 0.0 0.8 0:00 postmaster
2050 postgres 9 0 15880 15M 14136 S 0.0 0.7 0:06 postmaster
2200 postgres 9 0 15568 15M 14080 S 0.0 0.7 0:00 postmaster
2301 postgres 9 0 15076 14M 13940 S 0.0 0.7 0:00 postmaster
2236 postgres 9 0 14132 13M 12824 S 0.0 0.6 0:00 postmaster
2346 postgres 9 0 14080 13M 12972 S 0.0 0.6 0:09 postmaster
2347 postgres 9 0 14064 13M 12960 S 0.0 0.6 0:09 postmaster
2205 postgres 9 0 13904 13M 12412 S 0.0 0.6 0:00 postmaster
2339 postgres 11 0 12660 12M 11448 S 36.1 0.6 0:27 postmaster

which is not at all good...

And still the major problem in finding the cause of all this is
identifying the queries that produce these huge backends. At the moment
I suspect that there are several queries which need a fair bit of RAM -
not that much, however - and that the backends fail to free the memory
used by completed queries. What puzzles me here is that all these 400MB+
backend processes are marked as idle (in the ps output) or sleeping (in
top), so they don't actually do much. But how does a sleeping process
come to grab 20% of 2GB of RAM and 15% of processing time?
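
One thing I could try, to tie one of the fat backends back to a query,
is something like this (just a sketch - it assumes stats_command_string
is switched on in postgresql.conf, otherwise current_query stays empty
in 7.2):

# take the PID of a 400MB+ backend from top (e.g. 1652) and see which
# database/client it belongs to
ps auxww | grep 1652

# then ask the stats collector what each backend is currently running
psql -U postgres template1 -c \
  "SELECT datname, procpid, usename, current_query FROM pg_stat_activity;"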

Another fact that hints at the backends just eating memory for no reason
is that when I stop the database, processes keep lingering - a plain
kill just doesn't help, I have to kill -9 the last of the lot to get rid
of them before restarting the database:

dunkles:/var/lib/pgsql/data/base # /etc/init.d/postgresql stop
Shutting down PostgreSQL
done
dunkles:/var/lib/pgsql/data/base # ps ax|grep post
863 pts/1 S 0:00 login -- postgres
885 pts/2 S 0:00 login -- postgres
1552 pts/0 S 0:01 /opt/pgsql/bin/postmaster -i -D/var/lib/pgsql/data/base
1553 pts/0 S 0:00 postgres: stats buffer process
1554 pts/0 S 0:01 postgres: stats collector process
1650 pts/0 S 0:00 postgres: postgres template1 212.123.109.25 idle
1681 pts/0 S 0:00 postgres: postgres abo 212.123.109.25 idle
1682 pts/0 S 0:00 postgres: postgres bluebox 212.123.109.25 idle
1683 pts/0 S 0:00 postgres: postgres kidszone 212.123.109.25 idle
1684 pts/0 S 0:00 postgres: postgres mcv 212.123.109.25 idle
1685 pts/0 S 0:00 postgres: postgres mpo 212.123.109.25 idle
1686 pts/0 S 0:00 postgres: postgres nzone 212.123.109.25 idle
1687 pts/0 S 0:00 postgres: postgres pcaction 212.123.109.25 idle
1688 pts/0 S 0:00 postgres: postgres pcgames 212.123.109.25 idle
1689 pts/0 S 0:00 postgres: postgres phppgadmin 212.123.109.25 idle
1690 pts/0 S 0:00 postgres: postgres pszone 212.123.109.25 idle
1691 pts/0 S 0:00 postgres: postgres saturn 212.123.109.25 idle
1693 pts/0 S 0:00 postgres: postgres template1 212.123.109.25 idle
1780 pts/0 S 0:00 postgres: postgres template1 212.123.108.149 idle
1781 pts/0 S 0:00 postgres: postgres template1 212.123.108.149 idle
1784 pts/0 S 0:00 postgres: postgres pcgames 212.123.108.149 idle

This has some sort of Windows look & feel to it - processes that just
don't respond any more.

Now I have set back the values we used before upgrading the RAM:
shared_buffers = 32768 (256MB) and sort_mem = 51200 (50MB). The backend
size now seems to top out at some 265-odd MB (about four to six of those
around at any given time, as always) and top reports:

3:11pm up 2:03, 3 users, load average: 3.24, 1.65, 1.81
163 processes: 156 sleeping, 7 running, 0 zombie, 0 stopped
CPU0 states: 81.3% user, 15.5% system, 0.0% nice, 2.2% idle
CPU1 states: 67.1% user, 26.0% system, 0.0% nice, 6.3% idle
CPU2 states: 69.2% user, 21.0% system, 0.0% nice, 9.3% idle
CPU3 states: 65.4% user, 20.2% system, 0.0% nice, 13.3% idle
Mem: 2061560K av, 2025392K used, 36168K free, 0K shrd, 13108K buff
Swap: 2097136K av, 7732K used, 2089404K free, 1750556K cached

Which isn't quite as bad any more - although I can't imagine that this
is the optimum performance for this machine.

Now what on earth can I do to get this DB running - and when I say
"running" I don't mean "sort of crawling uphill"? Are there any
OS patches or environment settings I need to consider? Are there known
memory leaks? It just doesn't seem to matter which settings I put in
postgresql.conf; the database eats up any available memory in any case.
I cannot use it for production purposes this way. I am quite in despair
right now - and I am definitely running out of time. Which bits of Linux
(glibc, kernel, whatnot) does PostgreSQL 7.2.1 need, and in which
versions, to run smoothly? As I mentioned before, we don't use any
"fancy" features of the database like foreign keys or triggers (except
one); it's just basic functionality that seems to be falling down on
us...

I am willing to try almost anything - but I need to squeeze more
performance out of this thing and I need some hints on which tools to
use to identify the problem. And unfortunately I need all this as soon
as possible... Hints and help are very, very much appreciated. Thank
you!

Regards,

Markus


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Markus Wollny <Markus(dot)Wollny(at)computec(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-general(at)postgresql(dot)org
Subject: Re: One source of constant annoyance identified
Date: 2002-07-03 15:00:22
Message-ID: 20020704010022.A8722@svana.org
Lists: pgsql-general

On Wed, Jul 03, 2002 at 03:23:39PM +0200, Markus Wollny wrote:
> Okay, now it's even more annoying...
>
> We just upgraded the server from 1GB to 2GB of RAM because it kept
> swapping out about 300MB.

Well, you've solved the swapping problem.

> PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
> 1652 postgres 9 0 469M 469M 467M S 0.0 23.3 2:21 postmaster
> 2041 postgres 9 0 431M 431M 429M S 1.2 21.4 1:32 postmaster
> 1588 postgres 9 0 411M 411M 410M S 15.6 20.4 1:25 postmaster
> 1597 postgres 9 0 289M 289M 287M S 0.1 14.3 0:31 postmaster

That TIME column is the cumulative CPU time over the lifetime of the
process. For example, that first process has done nearly two and a half
minutes of work. That's either a lot of queries or one really big one. I
can well imagine that many queries end up mapping in all of the shared
memory. Note that the RSS is around the size of the shared memory you
have, so all those 460M processes seem to be looking at the same memory,
all of it shared.

What exactly is the problem here? You're not swapping.

> And still the major problem in finding the cause of all this is
> identifying the queries that produce these huge backends. At the moment
> I suspect that there are several queries which need a fair bit of RAM -
> not that much, however - and that the backends fail to free the memory
> used by completed queries. What puzzles me here is that all these 400MB+
> backend processes are marked as idle (in the ps output) or sleeping (in
> top), so they don't actually do much. But how does a sleeping process
> come to grab 20% of 2GB of RAM and 15% of processing time?

My theory is that it's not using RAM at all, it's simply mapping the shared
memory in, which inflates the RSS. If you reduce the amount of shared
memory, does the RSS of the processes go down too?
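
An easy way to check that is to compare the shared memory segment itself
against the top figures - something like this (a sketch; the large
segment owned by the postgres user is the one that matters):

# the big SysV shared memory segment should be roughly
# shared_buffers * 8kB in size
ipcs -m

# and in top, RSS minus SHARE is a better guide to the private memory
# a single backend really uses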

> Another fact that hints at the backends just eating memory for no reason
> is that when I stop the database, processes keep lingering - a plain
> kill just doesn't help, I have to kill -9 the last of the lot to get rid
> of them before restarting the database:

Never kill -9; that's a recipe for disaster. If you want those processes
to die, perhaps you should stop the clients that are accessing the DB -
they're the ones holding those idle connections. Are you using
persistent connections at all? If so, for how long?

Note that pg_ctl has various stop modes: smart, fast and immediate. Make
sure you're using the one you want.
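
For example (just a sketch, using the data directory from your ps
listing):

# "smart" waits for clients to disconnect, "fast" rolls back open
# transactions and disconnects them, "immediate" aborts without a
# clean shutdown
pg_ctl stop -D /var/lib/pgsql/data/base -m fast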

> 1689 pts/0 S 0:00 postgres: postgres phppgadmin 212.123.109.25 idle
> 1690 pts/0 S 0:00 postgres: postgres pszone 212.123.109.25 idle
> 1691 pts/0 S 0:00 postgres: postgres saturn 212.123.109.25 idle
> 1693 pts/0 S 0:00 postgres: postgres template1 212.123.109.25 idle
> 1780 pts/0 S 0:00 postgres: postgres template1 212.123.108.149 idle
> 1781 pts/0 S 0:00 postgres: postgres template1 212.123.108.149 idle
> 1784 pts/0 S 0:00 postgres: postgres pcgames 212.123.108.149 idle

Note that these are not the processes you listed above. The process times
don't match. You also have quite a lot of connections to template1.

> 3:11pm up 2:03, 3 users, load average: 3.24, 1.65, 1.81

Look at that load average. You have a whole CPU idle. CPU power is not your
limitation. No swap so that's not the problem. You really need to work out
what is slow.

> Which isn't quite as bad any more - although I can't imagine that this
> is the optimum performance for this machine.

So is it actually slow, or are you looking for some ethereal "faster"?

> Now what on earth can I do to get this DB running - and when I say
> "running" I don't mean "sort of crawling uphill"? Are there any

Look: 90% of performance improvement comes from modifying queries;
tweaking the config settings can only account for so much. In all of
your messages you have not provided anything that we can actually use to
help you. There are no universally optimal settings - they depend
entirely on what your queries are.

> OS patches or environment settings I need to consider? Are there known
> memory leaks? It just doesn't seem to matter which settings I put in
> postgresql.conf; the database eats up any available memory in any case.
> I cannot use it for production purposes this way. I am quite in despair
> right now - and I am definitely running out of time. Which bits of Linux
> (glibc, kernel, whatnot) does PostgreSQL 7.2.1 need, and in which
> versions, to run smoothly? As I mentioned before, we don't use any
> "fancy" features of the database like foreign keys or triggers (except
> one); it's just basic functionality that seems to be falling down on
> us...

By the way, did you notice the figure in your top output labeled
"cached", which is around 1.7GB? I believe that means the kernel is
using that much memory for disk cache. You are not running out of memory
by any stretch of the imagination (unless one of your queries is using a
lot, but then it appears to be getting freed at the end).
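
If you want to see it more directly than top shows it, free gives the
same picture (the "-/+ buffers/cache" line is memory use with the
kernel's cache taken out):

free -k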

> I am willing to try almost anything - but I need to squeeze more
> performance out of this thing and I need some hints on which tools to
> use to identify the problem. And unfortunately I need all this as soon
> as possible... Hints and help are very, very much appreciated. Thank
> you!

Work out where your bottleneck is. It's not memory and it doesn't appear to
be CPU. With that much cache I can't imagine it's disk transfer rate either.
So the only explanation is the individual queries.

As for hints, the straightforward method is the best. Find whatever
action is the slowest and profile it: how much time goes to the front
end, how much to the database, how much to client latency? Only once you
understand where the time is going can you do any meaningful
optimisation. Tweaking config settings will not help you here.
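
On the database side, something along these lines is usually enough to
get started (a sketch; EXPLAIN ANALYZE is available as of 7.2, and "abo"
is just one of the databases from your ps listing - substitute a real
suspect query):

# wall-clock time for a query, as a client would see it
time psql -U postgres abo -c "SELECT ... your suspect query ..."

# per-node execution times and row counts from inside the backend
psql -U postgres abo -c "EXPLAIN ANALYZE SELECT ... your suspect query ..."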

--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Markus Wollny <Markus(dot)Wollny(at)computec(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: One source of constant annoyance identified
Date: 2002-07-03 15:43:56
Message-ID: 25151.1025711036@sss.pgh.pa.us
Lists: pgsql-general

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> My theory is that it's not using RAM at all, it's simply mapping the shared
> memory in, which inflates the RSS.

Interesting idea. By rights the whole shared memory segment should be
charged against a backend's SIZE all the time, since it's certainly all
there in the address space. But if for some reason a page of shmem
isn't charged until first touched, then indeed that could explain Markus'
observation --- and, perhaps, my inability to duplicate it on other
platforms. (I tried on Red Hat 7.2, which you'd think would be fairly
close to his SuSE release, but maybe it's different on just this point.
HPUX seems to completely ignore shared segments in both SIZE and RSS,
but it's way out in left field on a number of fronts...)

regards, tom lane