Re: /proc/self/oom_adj is deprecated in newer Linux kernels

From: Gurjeet Singh <gurjeet(at)singh(dot)im>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: /proc/self/oom_adj is deprecated in newer Linux kernels
Date: 2014-06-10 15:45:41
Message-ID: CABwTF4W8YBEfgHm64CcZLvYHBj7yRYKM9ZcThW2aNsdDPeKixw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 10, 2014 at 11:14 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> In my view, the root-owned startup script grants OOM exemption to
> the postmaster because it *knows* that the postmaster's children
> will drop the exemption. If that trust can be violated because some
> clueless DBA decided to frob a GUC setting, I'd be a lot more hesitant
> about giving the postmaster the exemption in the first place.

Even if a clueless DBA tries to set a backend's oom_score_adj below the
postmaster's, the Linux kernel refuses. The terminal session below
demonstrates this: I used the GUC as well as the plain command line to
try to set a child's badness (oom_score_adj) lower than the
postmaster's, and Linux disallows it unless I use root's powers.

So the argument that this GUC is a security concern can be set aside.
The root user (the one who controls the start script) still controls the
lowest badness setting available to any Postgres process. A child
created at fork_process time simply inherits its parent's badness
setting.
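
To make the check above easy to repeat, here is a minimal sketch of a
helper that attempts the write and reports whether it was accepted.
try_set_badness and its optional third argument (an alternate file path,
handy for dry runs) are hypothetical names of mine, not part of
PostgreSQL or the proposed patch.

```shell
#!/usr/bin/env bash
# Sketch only: try_set_badness is a hypothetical helper.
# Usage: try_set_badness PID VALUE [FILE]
# FILE defaults to /proc/PID/oom_score_adj.
try_set_badness() {
    local pid=$1 value=$2
    local file=${3:-/proc/$pid/oom_score_adj}

    # The redirection is performed by the current shell, so the write
    # runs with this shell's privileges, not with any elevated ones.
    if { echo "$value" > "$file"; } 2>/dev/null; then
        echo "pid $pid: set to $(cat "$file")"
    else
        echo "pid $pid: write of $value refused, still $(cat "$file" 2>/dev/null)"
        return 1
    fi
}
```

For example, `try_set_badness 1837 -101` run as the unprivileged user
would be expected to take the "refused" branch in the scenario shown
further down. Inheritance can be checked the same way: the output of
`bash -c 'cat /proc/self/oom_score_adj'` should match
`cat /proc/$$/oom_score_adj`.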

Best regards,

# Set postmaster badness: -1000
$ sudo echo -1000 > /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-1000

# Set backend badness the same as postmaster: -1000
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj = -1000/' $PGDATA/postgresql.conf ; grep oom $PGDATA/postgresql.conf
linux_oom_score_adj = -1000

$ pgstop; pgstart
pg_ctl: server is running (PID: 65355)
/home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres "-D" "/home/gurjeet/dev/pgdbuilds/oom_guc/db/data"
waiting for server to shut down.... done
server stopped
pg_ctl: no server running
waiting for server to start.... done
server started
Database cluster state: in production

# Backends and the postmaster have the same badness; lower than default 0
$ for p in $(echo $(pgserverPIDList)| cut -d , --output-delimiter=' ' -f 1-); do echo $p $(cat /proc/$p/oom_score_adj) $(cat /proc/$p/cmdline); done
537 -1000 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
539 -1000 postgres: checkpointer process
540 -1000 postgres: writer process
541 -1000 postgres: wal writer process
542 -1000 postgres: autovacuum launcher process
543 -1000 postgres: stats collector process

# Set postmaster badness: -100
$ sudo echo -100 > /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-100

# Set backend badness lower than the postmaster's: -500 vs. -100
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj = -500...
linux_oom_score_adj = -500

$ pgstop; pgstart
...

# Backends are capped to parent's badness.
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
716 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
718 -100 postgres: checkpointer process
719 -100 postgres: writer process
720 -100 postgres: wal writer process
721 -100 postgres: autovacuum launcher process
722 -100 postgres: stats collector process

# Set postmaster badness: -100
$ sudo echo -100 > /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-100

# Set backend badness higher than the postmaster's: +500 vs. -100
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj = 500/' $PGDATA/postgresql.conf ; grep oom $PGDATA/postgresql.conf
linux_oom_score_adj = 500

$ pgstop; pgstart
...

# Backends have higher badness, hence a higher likelihood of being killed than the postmaster.
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1602 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1604 500 postgres: checkpointer process
1605 500 postgres: writer process
1606 500 postgres: wal writer process
1607 500 postgres: autovacuum launcher process
1608 500 postgres: stats collector process

# Set postmaster badness: -100
$ sudo echo -100 > /proc/$$/oom_score_adj ; cat /proc/self/oom_score_adj
-100

# Reset backend badness to default: 0
$ perl -pi -e 's/linux_oom_score_adj = .*/linux_oom_score_adj = 0/' $PGDATA/postgresql.conf ; grep oom $PGDATA/postgresql.conf
linux_oom_score_adj = 0

$ pgstop; pgstart
...

# Backends have higher badness, hence a higher likelihood of being killed than the postmaster.
$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 0 postgres: checkpointer process
1838 0 postgres: writer process
1839 0 postgres: wal writer process
1840 0 postgres: autovacuum launcher process
1841 0 postgres: stats collector process

# Lower checkpointer's badness, but keep it higher than postmaster's
$ echo -1 > /proc/1837/oom_score_adj

$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -1 postgres: checkpointer process
...

# Lower checkpointer's badness, but keep it same as postmaster's
$ echo -100 > /proc/1837/oom_score_adj

$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -100 postgres: checkpointer process
...

# Lower checkpointer's badness, and try to lower it below postmaster's
$ echo -101 > /proc/1837/oom_score_adj
bash: echo: write error: Permission denied

$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -100 postgres: checkpointer process
...

# As root, force the child process's badness to be lower than the postmaster's
$ sudo echo -101 > /proc/1837/oom_score_adj
[sudo] password for gurjeet:

$ for p in $(echo $(pgserverPIDList)| cut -d , ...
1835 -100 /home/gurjeet/dev/pgdbuilds/oom_guc/db/bin/postgres-D/home/gurjeet/dev/pgdbuilds/oom_guc/db/data
1837 -101 postgres: checkpointer process
...
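
The per-process listings above come from a one-liner built around my
pgserverPIDList alias. A rough, self-contained equivalent might look like
the sketch below. show_badness is a hypothetical name; it takes the PIDs
as arguments and prints PID, oom_score_adj and command line, turning the
NUL separators in /proc/PID/cmdline into spaces so that the arguments do
not run together.

```shell
#!/usr/bin/env bash
# Sketch only: show_badness is a hypothetical helper; pass it the PIDs
# to inspect.
show_badness() {
    local pid
    for pid in "$@"; do
        # Skip PIDs that have exited or expose no oom_score_adj file.
        [ -r "/proc/$pid/oom_score_adj" ] || continue
        printf '%s %s %s\n' "$pid" \
            "$(cat "/proc/$pid/oom_score_adj")" \
            "$(tr '\0' ' ' < "/proc/$pid/cmdline")"
    done
}
```

Assuming the server processes are named postgres, something like
`show_badness $(pgrep -x postgres)` should give one line per process.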

--
Gurjeet Singh http://gurjeet.singh.im/

EDB www.EnterpriseDB.com
