Configuration for a new server.

From: "Benjamin Krajmalnik" <kraj(at)servoyant(dot)com>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Configuration for a new server.
Date: 2011-01-31 23:55:32
Message-ID: F4E6A2751A2823418A21D4A160B689887B0E2A@fletch.stackdump.local
Lists: pgsql-performance

Scott,

I don't know if you received my private email, but just in case you did not, I am posting the information here.

I have a new set of servers coming in - Dual Xeon E5620's, 96GB RAM, 18 spindles (1 RAID1 for OS - SATA, 12 disk RAID10 for data - SAS, RAID-1 for logs - SAS, 2 hot spares SAS). They are replacing a single Dual Xeon E5406 with 16GB RAM and 2x RAID1 - one for OS/Data, one for Logs.

Current server is using 3840MB of shared buffers.

It will be running FreeBSD 8.1 x64 and PG 9.0.2, with streaming replication to an identical server.

I have read the performance tuning book written by Greg Smith, and am using it as a guide to configure the new servers for performance.

The main questions which I have are the following:

Is 25% of RAM for shared_buffers still a good number to go with for a server this size?

There are approximately 50 tables in which almost 100% of the records are updated every 5 minutes - what is a good number of autovacuum processes to have on these? The current server I am replacing only has 3 of them, but I think I may gain a benefit from having more.

Currently I have what I believe to be an aggressive bgwriter setting as follows:

bgwriter_delay = 200ms # 10-10000ms between rounds

bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round

bgwriter_lru_multiplier = 10 # 0-10.0 multiplier on buffers scanned/round

Does this look right?

I have the following settings:

work_mem = 64MB # min 64kB

maintenance_work_mem = 128MB # min 1MB

And, of course, some of the most critical ones - the WAL settings. Right now, in order to give the best performance to the end users given the size of the current box, I have what is, in my opinion, a very suboptimal setting:

fsync = off # turns forced synchronization on or off

#synchronous_commit = on # immediate fsync at commit

#wal_sync_method = fsync # the default is the first option

# supported by the operating system:

# open_datasync

# fdatasync

# fsync

# fsync_writethrough

# open_sync

full_page_writes = on # recover from partial page writes

wal_buffers = 16MB

#wal_buffers = 1024KB # min 32kB

# (change requires restart)

# wal_writer_delay = 100ms # 1-10000 milliseconds

#commit_delay = 0 # range 0-100000, in microseconds

#commit_siblings = 5 # range 1-1000

# - Checkpoints -

#checkpoint_segments = 128 # in logfile segments, min 1, 16MB each

checkpoint_segments = 1024

checkpoint_timeout = 60min # range 30s-1h

#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0

checkpoint_completion_target = 0.1

checkpoint_warning = 45min # 0 disables

These are values which I arrived at by playing with them to make sure that the end user performance did not suffer. The checkpoints are taking about 8 minutes to complete, but between checkpoints the disk i/o on the data partition is very minimal - when I had fewer segments, running a 15-minute timeout with a 0.9 completion target, the platform was fairly slow vis-à-vis the end user.

The above configuration is from the current server, which is running PG 8.4.

Thanks in advance for any insight.


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Benjamin Krajmalnik <kraj(at)servoyant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Configuration for a new server.
Date: 2011-02-01 11:53:49
Message-ID: 4D47F44D.2000305@2ndquadrant.com
Lists: pgsql-performance

Benjamin Krajmalnik wrote:
>
> have a new set of servers coming in -- Dual Xeon E5620's, 96GB RAM,
> 18 spindles (1 RAID1 for OS -- SATA, 12 disk RAID10 for data -- SAS,
> RAID-1 for logs -- SAS, 2 hot spares SAS).
>

You didn't mention the RAID controller and its cache setup. That's a
critical piece to get write, err, right. Presumably you've got a
battery-backed RAID cache on your SAS controller. Knowing that and what
model it is (to make sure it's one of the ones that performs well) would
be good info to pass along here.

>
> Is the 25% RAM for shared memory still a good number to go with for
> this size server?
>

Several people have reported to me that they see drop-offs in performance
between 8GB and 10GB for that setting. I currently recommend limiting
shared_buffers to 8GB until we have more data on why that is. You
mentioned already having checkpoint issues, too; if that's true, that's
another reason not to dedicate too much RAM to the database.
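
For reference, on the new box that would look something like this (the
number is only illustrative; on FreeBSD you will also likely need to raise
the SysV shared memory limits, kern.ipc.shmmax and kern.ipc.shmall, or
PostgreSQL will refuse to start with a shared memory error):

shared_buffers = 8GB                  # rather than ~24GB (25% of 96GB)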

> There are approximately 50 tables which get updated with almost 100%
> records updated every 5 minutes -- what is a good number of autovacuum
> processes to have on these? The current server I am replacing only
> has 3 of them but I think I may gain a benefit from having more.
>

Watch pg_stat_user_tables and you can figure this out for your
workload. There are no generic answers in this area.
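
For example, something as simple as this (just a sketch) will show which
tables are accumulating dead rows and when autovacuum last got to each one:

select relname, n_live_tup, n_dead_tup, last_autovacuum, last_autoanalyze
from pg_stat_user_tables
order by n_dead_tup desc
limit 20;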

> Currently I have what I believe to be an aggressive bgwriter setting
> as follows:
>
>
>
> bgwriter_delay = 200ms # 10-10000ms between rounds
>
> bgwriter_lru_maxpages = 1000 # 0-1000 max buffers
> written/round
>
> bgwriter_lru_multiplier = 10 # 0-10.0 multiplier on buffers
> scanned/round
>
>
>
> Does this look right?
>

You'd probably be better off decreasing the delay rather than pushing up
the other two parameters. It's easy to tell if you did it right or not;
just look at pg_stat_bgwriter. If buffers_backend is high relative to
the others, that means the multiplier or delay is wrong. Or if
maxwritten_clean is increasing fast, that means bgwriter_lru_maxpages is
too low.
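
A rough way to keep an eye on that split (untested sketch) is:

select checkpoints_timed, checkpoints_req,
       buffers_checkpoint, buffers_clean, buffers_backend, maxwritten_clean,
       round(100.0 * buffers_backend /
             nullif(buffers_checkpoint + buffers_clean + buffers_backend, 0), 1)
         as pct_written_by_backends
from pg_stat_bgwriter;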

> These are values which I arrived at by playing with them to make sure
> that the end user performance did not suffer. The checkpoints are
> taking about 8 minutes to complete, but between checkpoints the disk
> i/o on the data partition is very minimal -- when I had fewer segments,
> running a 15-minute timeout with a 0.9 completion target, the platform
> was fairly slow vis-à-vis the end user.
>

The completion target isn't the main driver here; the number of
segments/timeout is. When you space checkpoints out further, the actual
amount of total I/O the server does decreases, both to the WAL and to
the main database. So I suspect your tweaking the target had little
impact, and it's possible you might even get smoother performance if you
put it back to a higher value again.
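
To be concrete (the numbers just illustrate the idea, using the values you
already posted), that would mean keeping the wide spacing you have now but
raising the target back up:

checkpoint_segments = 1024
checkpoint_timeout = 60min
checkpoint_completion_target = 0.9    # back up from 0.1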

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: "Benjamin Krajmalnik" <kraj(at)servoyant(dot)com>
To: "Greg Smith" <greg(at)2ndquadrant(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Configuration for a new server.
Date: 2011-02-01 16:46:59
Message-ID: F4E6A2751A2823418A21D4A160B689887B0E34@fletch.stackdump.local
Lists: pgsql-performance

Greg,

Thank you very much for your quick response.

The servers are using Areca 1600 series controllers with battery backup and 2GB cache.

I really enjoyed your book (actually, both of the books your company published). They were extremely helpful and filled a lot of gaps in my still-gappy knowledge :)


From: "Benjamin Krajmalnik" <kraj(at)servoyant(dot)com>
To: "Greg Smith" <greg(at)2ndquadrant(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Configuration for a new server.
Date: 2011-02-01 17:22:21
Message-ID: F4E6A2751A2823418A21D4A160B689887B0E39@fletch.stackdump.local
Lists: pgsql-performance

There are approximately 50 tables which get updated with almost 100%
records updated every 5 minutes - what is a good number of autovacuum
processes to have on these? The current server I am replacing only has
3 of them but I think I may gain a benefit from having more.

Watch pg_stat_user_tables and you can figure this out for your workload.
There are no generic answers in this area.

What in particular should I be looking at to help me decide?

Currently I have what I believe to be an aggressive bgwriter setting as
follows:

bgwriter_delay = 200ms # 10-10000ms between rounds

bgwriter_lru_maxpages = 1000 # 0-1000 max buffers
written/round

bgwriter_lru_multiplier = 10 # 0-10.0 multiplier on buffers
scanned/round

Does this look right?

You'd probably be better off decreasing the delay rather than pushing up
the other two parameters. It's easy to tell if you did it right or not;
just look at pg_stat_bgwriter. If buffers_backend is high relative to
the others, that means the multiplier or delay is wrong. Or if
maxwritten_clean is increasing fast, that means bgwriter_lru_maxpages is
too low.

checkpoints_timed = 261

checkpoints_req = 0

buffers_checkpoint = 49058438

buffers_clean = 3562421

maxwritten_clean = 243

buffers_backend = 11774254

buffers_alloc = 42816578


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Benjamin Krajmalnik <kraj(at)servoyant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Configuration for a new server.
Date: 2011-02-02 01:16:26
Message-ID: 4D48B06A.6020100@2ndquadrant.com
Lists: pgsql-performance

Benjamin Krajmalnik wrote:
>
> There are approximately 50 tables which get updated with almost 100%
> records updated every 5 minutes -- what is a good number of autovacuum
> processes to have on these? The current server I am replacing only
> has 3 of them but I think I may gain a benefit from having more.
>
>
> Watch pg_stat_user_tables and you can figure this out for your
> workload. There are no generic answers in this area.
>
> What in particular should I be looking at to help me decide?
>

The information reported there that's related to vacuuming. If you don't
have enough workers, you'll see the dead row counts climb upwards without
enough "last autovacuum time" updates on enough tables to suggest it's
keeping up. If you see >20% dead rows on lots of tables and they're not
being processed by autovacuum (their timestamps aren't advancing), that's
your sign that you don't have enough workers.
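
As a sketch (the 20% figure is the same rule of thumb, not a hard limit),
something like this flags the tables to worry about:

select relname, n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 1)
         as pct_dead,
       last_autovacuum
from pg_stat_user_tables
where n_dead_tup > 0
  and 100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0) > 20
order by pct_dead desc;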

> You'd probably be better off decreasing the delay rather than pushing
> up the other two parameters. It's easy to tell if you did it right or
> not; just look at pg_stat_bgwriter. If buffers_backend is high
> relative to the others, that means the multiplier or delay is wrong.
> Or if maxwritten_clean is increasing fast, that means
> bgwriter_lru_maxpages is too low.
>
> checkpoints_timed = 261
>
> checkpoints_req = 0
>
> buffers_checkpoint = 49,058,438
>
> buffers_clean = 3,562,421
>
> maxwritten_clean = 243
>
> buffers_backend = 11,774,254
>
> buffers_alloc = 42,816,578
>

See how buffers_backend is much larger than buffers_clean, even though
maxwritten_clean is low? That means the background writer isn't running
often enough to keep up with cleaning things, even though it does a lot
of work when it does kick in. In your situation I'd normally do a first
pass by cutting bgwriter_lru_maxpages to 1/4 of what it is now and
bgwriter_delay to 1/4 as well (to 50ms), and then see how the
proportions change. You can probably cut the multiplier, too, yet still
see more pages written by the cleaner.
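
Concretely, a first pass from your current values might look like this (a
starting point to iterate from, not a final answer):

bgwriter_delay = 50ms                 # was 200ms
bgwriter_lru_maxpages = 250           # was 1000
bgwriter_lru_multiplier = 10          # leave for now; can likely come down once you see the new stats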

I recommend saving a snapshot of this data with a timestamp, i.e.:

select now(),* from pg_stat_bgwriter;

anytime you make a change to one of the background writer or checkpoint
timing parameters. That way you have a new baseline to compare
against. These numbers aren't very useful with a single value, but once
you get two of them with timestamps you can compute all sorts of fun
statistics from the pair.
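
One way to do that (just a sketch; the table name is made up) is to keep
the snapshots in a small history table and diff consecutive rows:

create table bgwriter_snapshot as
  select now() as ts, * from pg_stat_bgwriter;

-- after each change has been running for a while, take another snapshot
insert into bgwriter_snapshot
  select now(), * from pg_stat_bgwriter;

-- per-interval deltas between consecutive snapshots
select ts,
       buffers_checkpoint - lag(buffers_checkpoint) over (order by ts) as ckpt_delta,
       buffers_clean      - lag(buffers_clean)      over (order by ts) as clean_delta,
       buffers_backend    - lag(buffers_backend)    over (order by ts) as backend_delta
from bgwriter_snapshot
order by ts;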

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: "Benjamin Krajmalnik" <kraj(at)servoyant(dot)com>
To: "Greg Smith" <greg(at)2ndquadrant(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Configuration for a new server.
Date: 2011-02-02 17:46:06
Message-ID: F4E6A2751A2823418A21D4A160B689887B0E51@fletch.stackdump.local
Lists: pgsql-performance

>See how buffers_backend is much larger than buffers_clean, even though maxwritten_clean is low? That means the background writer isn't running often enough to keep up with cleaning things, even though it does a lot of work when it does kick in. In your situation I'd normally do a first pass by cutting bgwriter_lru_maxpages to 1/4 of what it is now and bgwriter_delay to 1/4 as well (to 50ms), and then see how the proportions change. You can probably cut the multiplier, too, yet still see more pages written by the cleaner.

>I recommend saving a snapshot of this data with a timestamp, i.e.:

>select now(),* from pg_stat_bgwriter;

>anytime you make a change to one of the background writer or checkpoint timing parameters. That way you have a new baseline to compare against. These numbers aren't very useful with a single value, but once you get two of them with timestamps you can compute all sorts of fun statistics from the pair.

So, if I understand correctly, I should strive for an increase in buffers_clean relative to buffers_backend?


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Benjamin Krajmalnik <kraj(at)servoyant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Configuration for a new server.
Date: 2011-02-02 18:22:55
Message-ID: 4D49A0FF.4010104@2ndquadrant.com
Lists: pgsql-performance

Benjamin Krajmalnik wrote:
> So, if I understand correctly, I should strive for an increase in buffers_clean relative to buffers_backend?
>

Right. Buffers written by a backend are the least efficient way to get
data out of the buffer cache, because the client running into that is
stuck waiting for a write call before it can use the resulting free
block. You want to avoid those in favor of checkpoint and
background-writer cleaner writes instead.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books