How to setup disk spindles for best performance

Lists: pgsql-performance
From: Christiaan Willemsen <cwillemsen(at)technocon(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: How to setup disk spindles for best performance
Date: 2008-08-20 22:25:59
Message-ID: 972BDF1A-B4ED-46EC-B416-0CDF15A0439A@technocon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

I'm currently trying to find out what the best configuration is for
our new database server. It will server a database of about 80 GB and
growing fast. The new machine has plenty of memory (64GB) and 16 SAS
disks, of wich two are already in use as a mirror for the OS.

The rest can be used for PostgreSQL. So that makes a total of 14 15k.5
SAS diks. There is obviously a lot to interesting reading to be found,
most of them stating that the transaction log should be put onto a
separate disk spindle. You can also do this with the indexes. Since
they will be updated a lot, I guess that might be a good idea. But
what no-one states, is what performance these spindle should have in
comparison to the data spindle? If I create a raid 10 of 6 disks for
the data, 4 disk raid 10 for the log, and 4 disk raid 10 for the
indexes, will this yield best performance? Or is it sufficient to just
have a simple mirror for the log and/or indexes...? I have not found
any information about these figures, and I guess it should be possible
to give some pointers on how these different setup might affect
performance?

So I hope someone has already tested this and can give some tips...

Kind regards,

Christiaan


From: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
To: "Christiaan Willemsen" <cwillemsen(at)technocon(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to setup disk spindles for best performance
Date: 2008-08-21 01:49:54
Message-ID: b42b73150808201849t3577d65eq76e7484d8519af51@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Aug 20, 2008 at 6:25 PM, Christiaan Willemsen
<cwillemsen(at)technocon(dot)com> wrote:
> I'm currently trying to find out what the best configuration is for our new
> database server. It will server a database of about 80 GB and growing fast.
> The new machine has plenty of memory (64GB) and 16 SAS disks, of wich two
> are already in use as a mirror for the OS.
>
> The rest can be used for PostgreSQL. So that makes a total of 14 15k.5 SAS
> diks. There is obviously a lot to interesting reading to be found, most of
> them stating that the transaction log should be put onto a separate disk
> spindle. You can also do this with the indexes. Since they will be updated a
> lot, I guess that might be a good idea. But what no-one states, is what
> performance these spindle should have in comparison to the data spindle? If
> I create a raid 10 of 6 disks for the data, 4 disk raid 10 for the log, and
> 4 disk raid 10 for the indexes, will this yield best performance? Or is it
> sufficient to just have a simple mirror for the log and/or indexes...? I
> have not found any information about these figures, and I guess it should be
> possible to give some pointers on how these different setup might affect
> performance?

Well, the speed of your logging device puts an upper bound on the
write speed of the database. While modern sas drives can do 80mb/sec
+ with sequential ops, this can turn to 1mb/sec real fast if the
logging is duking it out with the other generally random work the
database has to do, which is why it's often separated out.

80mb/sec is actually quite a lot in database terms and you will likely
only get anything close to that when doing heavy insertion, so that
it's unlikely to become the bottleneck. Even if you hit that limit
sometimes, those drives are probably put to better use in the data
volume somewhere.

As for partitioning the data volume, I'd advise this only if you have
a mixed duty database that does different tasks with different
performance requirements. You may be serving a user interface which
has very low maximum transaction time and therefore gets dedicated
disk i/o apart from the data churn that is going on elsewhere. Apart
from that though, I'd keep it in a single volume.

merlin


From: "Scott Marlowe" <scott(dot)marlowe(at)gmail(dot)com>
To: "Christiaan Willemsen" <cwillemsen(at)technocon(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to setup disk spindles for best performance
Date: 2008-08-21 03:41:05
Message-ID: dcc563d10808202041t16c45243q17b364dd151d419c@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Wed, Aug 20, 2008 at 4:25 PM, Christiaan Willemsen
<cwillemsen(at)technocon(dot)com> wrote:
> I'm currently trying to find out what the best configuration is for our new
> database server. It will server a database of about 80 GB and growing fast.
> The new machine has plenty of memory (64GB) and 16 SAS disks, of wich two
> are already in use as a mirror for the OS.

Got almost the same setup with 32 Gig of ram... I'm running the Areca
1680 controller with 512M battery backed cache and I'm quite happy
with the performance so far.

Mine is set up to use the OS mirror set for the pg_xlog as well, but
on a separate logical partition. I figure the RAID controller will
even out the writes there to keep it fast. The next 12 drives are a
RAID-10 set and the last two are hot spares. On an array this big you
should always have at least one hot spare.

Generally breaking the disks up for index versus tables etc is a lot
of work for minimal gain. A good RAID controller will make up for
having to do that, and usually do better, since if the indexes are
getting hit a LOT then you have all 12 disks in the RAID-10 working
together.

But what's your workload look like? Lotsa updates, inserts, deletes,
big selects, bulk loads in the middle of the day, etc...?


From: Christiaan Willemsen <cwillemsen(at)technocon(dot)com>
To: "Merlin Moncure" <mmoncure(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to setup disk spindles for best performance
Date: 2008-08-21 05:38:39
Message-ID: 331B8FD1-7460-4B97-BD2C-03D8381E1C4D@technocon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

So, what you are basically saying, is that a single mirror is in
general more than enough to facilitate the transaction log.

So it would not be smart to put the indexes onto a separate disk
spindle to improve index performance?

On Aug 21, 2008, at 3:49 AM, Merlin Moncure wrote:

> On Wed, Aug 20, 2008 at 6:25 PM, Christiaan Willemsen
> <cwillemsen(at)technocon(dot)com> wrote:
>> I'm currently trying to find out what the best configuration is for
>> our new
>> database server. It will server a database of about 80 GB and
>> growing fast.
>> The new machine has plenty of memory (64GB) and 16 SAS disks, of
>> wich two
>> are already in use as a mirror for the OS.
>>
>> The rest can be used for PostgreSQL. So that makes a total of 14
>> 15k.5 SAS
>> diks. There is obviously a lot to interesting reading to be found,
>> most of
>> them stating that the transaction log should be put onto a separate
>> disk
>> spindle. You can also do this with the indexes. Since they will be
>> updated a
>> lot, I guess that might be a good idea. But what no-one states, is
>> what
>> performance these spindle should have in comparison to the data
>> spindle? If
>> I create a raid 10 of 6 disks for the data, 4 disk raid 10 for the
>> log, and
>> 4 disk raid 10 for the indexes, will this yield best performance?
>> Or is it
>> sufficient to just have a simple mirror for the log and/or
>> indexes...? I
>> have not found any information about these figures, and I guess it
>> should be
>> possible to give some pointers on how these different setup might
>> affect
>> performance?
>
> Well, the speed of your logging device puts an upper bound on the
> write speed of the database. While modern sas drives can do 80mb/sec
> + with sequential ops, this can turn to 1mb/sec real fast if the
> logging is duking it out with the other generally random work the
> database has to do, which is why it's often separated out.
>
> 80mb/sec is actually quite a lot in database terms and you will likely
> only get anything close to that when doing heavy insertion, so that
> it's unlikely to become the bottleneck. Even if you hit that limit
> sometimes, those drives are probably put to better use in the data
> volume somewhere.
>
> As for partitioning the data volume, I'd advise this only if you have
> a mixed duty database that does different tasks with different
> performance requirements. You may be serving a user interface which
> has very low maximum transaction time and therefore gets dedicated
> disk i/o apart from the data churn that is going on elsewhere. Apart
> from that though, I'd keep it in a single volume.
>
> merlin


From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: Christiaan Willemsen <cwillemsen(at)technocon(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: How to setup disk spindles for best performance
Date: 2008-08-21 05:50:38
Message-ID: 48AD022E.7090400@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Christiaan Willemsen wrote:
> So, what you are basically saying, is that a single mirror is in general
> more than enough to facilitate the transaction log.

http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/
http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide

And to answer your question, yes. Transaction logs are written
sequentially. You do not need a journaled file system and raid 1 is
plenty for most if not all work loads.

Sincerely,

Joshua D. Drake


From: Christiaan Willemsen <cwillemsen(at)technocon(dot)com>
To: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: How to setup disk spindles for best performance
Date: 2008-08-21 08:34:19
Message-ID: 48AD288B.8030902@technocon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Thanks Joshua,

So what about putting the indexes on a separate array? Since we do a lot
of inserts indexes are going to be worked on a lot of the time.

Regards,

Christiaan

Joshua D. Drake wrote:
> Christiaan Willemsen wrote:
>> So, what you are basically saying, is that a single mirror is in
>> general more than enough to facilitate the transaction log.
>
> http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/
>
> http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide
>
> And to answer your question, yes. Transaction logs are written
> sequentially. You do not need a journaled file system and raid 1 is
> plenty for most if not all work loads.
>
> Sincerely,
>
> Joshua D. Drake
>


From: "Scott Carey" <scott(at)richrelevance(dot)com>
To: "Christiaan Willemsen" <cwillemsen(at)technocon(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Merlin Moncure" <mmoncure(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: How to setup disk spindles for best performance
Date: 2008-08-21 14:53:05
Message-ID: a1ec7d000808210753w3a954e5g1d4f4325775d8eca@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Indexes will be random write workload, but these won't by synchronous writes
and will be buffered by the raid controller's cache. Assuming you're using
a hardware raid controller that is, and one that doesn't have major
performance problems on your platform. Which brings those questions up ---
what is your RAID card and OS?

For reads, if your shared_buffers is large enough, your heavily used indexes
won't likely go to disk much at all.

A good raid controller will typically help distribute the workload
effectively on a large array.

You probably want a simple 2 disk mirror or 4 disks in raid 10 for your OS +
xlog, and the rest for data + indexes -- with hot spares IF your card
supports them.

The biggest risk to splitting up data and indexes is that you don't know how
much I/O each needs relative to each other, and if this isn't a relatively
constant ratio you will have one subset busy while the other subset is idle.
Unless you have extensively profiled your disk activity into index and data
subsets and know roughly what the optimal ratio is, its probably going to
cause more problems than it fixes.
Furthermore, if this ratio changes at all, its a maintenance nightmare. How
much each would need in a perfect world is application dependant, so there
can be no general recommendation other than: don't do it.

On Thu, Aug 21, 2008 at 1:34 AM, Christiaan Willemsen <
cwillemsen(at)technocon(dot)com> wrote:

> Thanks Joshua,
>
> So what about putting the indexes on a separate array? Since we do a lot of
> inserts indexes are going to be worked on a lot of the time.
>
> Regards,
>
> Christiaan
>
>
> Joshua D. Drake wrote:
>
>> Christiaan Willemsen wrote:
>>
>>> So, what you are basically saying, is that a single mirror is in general
>>> more than enough to facilitate the transaction log.
>>>
>>
>>
>> http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/
>> http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide
>>
>> And to answer your question, yes. Transaction logs are written
>> sequentially. You do not need a journaled file system and raid 1 is plenty
>> for most if not all work loads.
>>
>> Sincerely,
>>
>> Joshua D. Drake
>>
>>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


From: Christiaan Willemsen <cwillemsen(at)technocon(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: How to setup disk spindles for best performance
Date: 2008-08-21 15:06:13
Message-ID: 48AD8465.7090604@technocon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Hi Scott,

Great info! Our RAID card is at the moment a ICP vortex (Adaptec)
ICP5165BR, and I'll be using it with Ubuntu server 8.04. I tried
OpenSolaris, but it yielded even more terrible performance, specially
using ZFS.. I guess that was just a missmatch. Anyway, I'm going to
return the controller, because it does not scale very well with more
that 4 disks in raid 10. Bandwidth is limited to 350MB/sec, and IOPS
scale badly with extra disks...

So I guess, I'll be waiting for another controller first. The idea for
xlog + os on 4 disk raid 10 and the rest for the data sound good :) I
hope it will turn out that way too.. First another controller..

Regards,

Christiaan

Scott Carey wrote:
> Indexes will be random write workload, but these won't by synchronous
> writes and will be buffered by the raid controller's cache. Assuming
> you're using a hardware raid controller that is, and one that doesn't
> have major performance problems on your platform. Which brings those
> questions up --- what is your RAID card and OS?
>
> For reads, if your shared_buffers is large enough, your heavily used
> indexes won't likely go to disk much at all.
>
> A good raid controller will typically help distribute the workload
> effectively on a large array.
>
> You probably want a simple 2 disk mirror or 4 disks in raid 10 for
> your OS + xlog, and the rest for data + indexes -- with hot spares IF
> your card supports them.
>
> The biggest risk to splitting up data and indexes is that you don't
> know how much I/O each needs relative to each other, and if this isn't
> a relatively constant ratio you will have one subset busy while the
> other subset is idle.
> Unless you have extensively profiled your disk activity into index and
> data subsets and know roughly what the optimal ratio is, its probably
> going to cause more problems than it fixes.
> Furthermore, if this ratio changes at all, its a maintenance
> nightmare. How much each would need in a perfect world is application
> dependant, so there can be no general recommendation other than:
> don't do it.
>
> On Thu, Aug 21, 2008 at 1:34 AM, Christiaan Willemsen
> <cwillemsen(at)technocon(dot)com <mailto:cwillemsen(at)technocon(dot)com>> wrote:
>
> Thanks Joshua,
>
> So what about putting the indexes on a separate array? Since we do
> a lot of inserts indexes are going to be worked on a lot of the time.
>
> Regards,
>
> Christiaan
>
>
> Joshua D. Drake wrote:
>
> Christiaan Willemsen wrote:
>
> So, what you are basically saying, is that a single mirror
> is in general more than enough to facilitate the
> transaction log.
>
>
> http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/
>
> http://wiki.postgresql.org/wiki/HP_ProLiant_DL380_G5_Tuning_Guide
>
> And to answer your question, yes. Transaction logs are written
> sequentially. You do not need a journaled file system and raid
> 1 is plenty for most if not all work loads.
>
> Sincerely,
>
> Joshua D. Drake
>
>
> --
> Sent via pgsql-performance mailing list
> (pgsql-performance(at)postgresql(dot)org
> <mailto:pgsql-performance(at)postgresql(dot)org>)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>
>


From: Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Christiaan Willemsen <cwillemsen(at)technocon(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: How to setup disk spindles for best performance
Date: 2008-08-21 19:14:47
Message-ID: 48ADBEA7.5060107@cheapcomplexdevices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

Scott Carey wrote:
> For reads, if your shared_buffers is large enough, your heavily used
> indexes won't likely go to disk much at all.

ISTM this would happen regardless of your shared_buffers setting.
If you have enough memory the OS should cache the frequently used
pages regardless of shared_buffers; and if you don't have enough
memory it won't.

> ... splitting up data and indexes ...

FWIW, I've had a system where moving pgsql_tmp to different disks
helped more than moving indexes.


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Christiaan Willemsen <cwillemsen(at)technocon(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to setup disk spindles for best performance
Date: 2008-08-29 02:43:54
Message-ID: Pine.GSO.4.64.0808282231330.11207@westnet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Thu, 21 Aug 2008, Christiaan Willemsen wrote:

> Anyway, I'm going to return the controller, because it
> does not scale very well with more that 4 disks in raid 10. Bandwidth is
> limited to 350MB/sec, and IOPS scale badly with extra disks...

How did you determine that upper limit? Usually it takes multiple
benchmark processes running at once in order to get more than 350MB/s out
of a controller. For example, if you look carefully at the end of
http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/
you can see that Joshua had to throw 8 threads at the disks in order to
reach maximum bandwidth.

> The idea for xlog + os on 4 disk raid 10 and the rest for the data sound
> good

I would just use a RAID1 pair for the OS, another pair for the xlog, and
throw all the other disks into a big 0+1 set. There is some value to
separating the WAL from the OS disks, from both the performance and the
management perspectives. It's nice to be able to monitor the xlog write
bandwidth rate under load easily for example.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Christiaan Willemsen <cwillemsen(at)technocon(dot)com>
To: Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: How to setup disk spindles for best performance
Date: 2008-08-29 05:07:58
Message-ID: E3A3D7E0-04D5-48F6-B498-8A02F864316D@technocon.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance


On Aug 29, 2008, at 4:43 AM, Greg Smith wrote:

> On Thu, 21 Aug 2008, Christiaan Willemsen wrote:
>
>> Anyway, I'm going to return the controller, because it does not
>> scale very well with more that 4 disks in raid 10. Bandwidth is
>> limited to 350MB/sec, and IOPS scale badly with extra disks...
>
> How did you determine that upper limit? Usually it takes multiple
> benchmark processes running at once in order to get more than 350MB/
> s out of a controller. For example, if you look carefully at the
> end of http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/
> you can see that Joshua had to throw 8 threads at the disks in
> order to reach maximum bandwidth.

I used IOmeter to do some tests, with 50 worker thread doing jobs. I
can get more than 350 MB/sec, I'll have to use huge blocksizes
(something like 8 MB). Even worse is random read and 70%read, 50%
random tests. They don't scale at all when you add disks. A 6 disk
raid 5 is exactly as fast as a 12 disk raid 10 :(

>> The idea for xlog + os on 4 disk raid 10 and the rest for the data
>> sound good
>
> I would just use a RAID1 pair for the OS, another pair for the xlog,
> and throw all the other disks into a big 0+1 set. There is some
> value to separating the WAL from the OS disks, from both the
> performance and the management perspectives. It's nice to be able
> to monitor the xlog write bandwidth rate under load easily for
> example.

Yes, that's about what I had in mind.

Kind regards,

Christiaan