Re: RAID vs. Single Big SCSI Disk

Lists: pgsql-admin
From: "G(dot) Anthony Reina" <reina(at)nsi(dot)edu>
To: "pgsql-admin(at)postgreSQL(dot)org" <pgsql-admin(at)postgreSQL(dot)org>
Subject: RAID vs. Single Big SCSI Disk
Date: 2000-12-08 02:24:20
Message-ID: 3A304654.6EB760D5@nsi.edu

We have three databases for our scientific research and are getting
close to filling our 12 Gig partition. My boss thinks that just getting
a really big (i.e. > 30 Gig) SCSI drive will be cheaper and should do
nicely. Currently, we only have 4 people accessing the database and
usually only have 1-2 jobs (e.g. selects, updates, etc.) going at any
one time (probably a high estimate). The db sits on a Pentium II/400 MHz
with RedHat 6.0.

Other than mirroring, are there any advantages (e.g. speed, cost)
to getting a RAID controller over, say, a 73 Gig Ultra SCSI Cheetah
drive (which costs in the neighborhood of $1300)?

Also, can Postgres handle being spread over several disks? I'd think
that the RAID must control disk spanning, but just want to make sure
that Postgres would be compatible.

Thanks
-Tony Reina


From: bob(at)bob(dot)usuhs(dot)mil
To: "G(dot) Anthony Reina" <reina(at)nsi(dot)edu>
Cc: "pgsql-admin(at)postgreSQL(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: RAID vs. Single Big SCSI Disk
Date: 2000-12-12 16:20:08
Message-ID: 3A365038.3E21FDAE@bob.usuhs.mil

"G. Anthony Reina" wrote:

> We have three databases for our scientific research and are getting
> close to filling our 12 Gig partition. My boss thinks that just getting
> a really big (i.e. > 30 Gig) SCSI drive will be cheaper and should do
> nicely. Currently, we only have 4 people accessing the database and
> usually only have 1-2 jobs (e.g. selects, updates, etc.) going at any
> one time (probably a high estimate). The db sits on a Pentium II/400 MHz
> with RedHat 6.0.
>
> Other than mirroring, are there any other advantages (e.g. speed, cost)
> of just getting a RAID controller over, say, a 73 Gig Ultra SCSI Cheetah
> drive (which cost in the neighborhood of $1300).

It sounds like you would be much better off with an Ultra ATA 66
software or hardware RAID solution. Maxtor 40 GB ATA100 disks
can be had for $100 each. Alone they operate near 20 MB/sec,
and in a striped two-disk RAID they can do 30-40 MB/sec, probably
faster than your Cheetah configuration for a fraction of the cost.
3ware makes a hardware RAID controller that would get you to
40 MB/sec with two, or 70 MB/sec with four, of these disks in RAID 0.
With four disks in RAID 0+1 you can mirror and still get near 40 MB/sec.
The 3ware solution also relieves your CPU of the usual ATA overhead.
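
As a rough check on the cost side, the price per usable gigabyte can be
worked out from the figures quoted in this thread. This is only a sketch:
the controller price is left out because no figure was given, and the
function and dictionary names are made up for illustration.

```python
# Drive prices and sizes are the ones quoted in this thread.
# Controller cost is NOT included, since no price was mentioned.

def cost_per_gb(disks, price_each, usable_gb):
    """Return total drive cost divided by usable capacity in GB."""
    return disks * price_each / usable_gb

options = {
    # One 73 GB SCSI drive at about $1300.
    "single 73 GB SCSI Cheetah": cost_per_gb(1, 1300, 73),
    # Striping two 40 GB disks gives 80 GB usable.
    "2 x 40 GB ATA, RAID 0": cost_per_gb(2, 100, 80),
    # Striping plus mirroring with four disks still gives 80 GB usable.
    "4 x 40 GB ATA, RAID 0+1": cost_per_gb(4, 100, 80),
}

for name, dollars in options.items():
    print(f"{name}: ${dollars:.2f} per usable GB")
```

Throughput and reliability are separate questions; this only compares
drive cost against usable space.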

>
>
> Also, can Postgres handle being spread over several disks? I'd think
> that the RAID must control disk spanning, but just want to make sure
> that Postgres would be compatible.

That is transparent to Postgres: the array appears to the OS as a
single disk.


From: Ragnar Kjørstad <postgres(at)ragnark(dot)vestdata(dot)no>
To: "G(dot) Anthony Reina" <reina(at)nsi(dot)edu>
Cc: "pgsql-admin(at)postgreSQL(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: RAID vs. Single Big SCSI Disk
Date: 2000-12-12 16:33:38
Message-ID: 20001212173338.B10484@vestdata.no

On Thu, Dec 07, 2000 at 06:24:20PM -0800, G. Anthony Reina wrote:
> We have three databases for our scientific research and are getting
> close to filling our 12 Gig partition. My boss thinks that just getting
> a really big (i.e. > 30 Gig) SCSI drive will be cheaper and should do
> nicely. Currently, we only have 4 people accessing the database and
> usually only have 1-2 jobs (e.g. selects, updates, etc.) going at any
> one time (probably a high estimate). The db sits on a Pentium II/400 MHz
> with RedHat 6.0.
>
> Other than mirroring, are there any other advantages (e.g. speed, cost)
> of just getting a RAID controller over, say, a 73 Gig Ultra SCSI Cheetah
> drive (which cost in the neighborhood of $1300).

A RAID can be both faster and more reliable than a single disk.
I say "can" because not all RAID configurations will be.

> Also, can Postgres handle being spread over several disks? I'd think
> that the RAID must control disk spanning, but just want to make sure
> that Postgres would be compatible.

You can spread the data over several disks by moving some of the files
and creating symlinks, or by using striping (software or hardware) or
concatenation (software or hardware).
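
The move-and-symlink approach can be sketched roughly as below. This is
an illustration only: the helper name and the file name are hypothetical,
temporary directories stand in for the two disks, and on a real
installation the database server should be stopped before any of its
files are moved.

```python
import os
import shutil
import tempfile

def relocate_with_symlink(src, dest_dir):
    """Move src into dest_dir and leave a symlink at the old path."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    shutil.move(src, dest)   # copies and deletes if dest is on another filesystem
    os.symlink(dest, src)    # the old path now points at the new location
    return dest

# Demo: two temporary directories stand in for two separate disks.
with tempfile.TemporaryDirectory() as disk1, tempfile.TemporaryDirectory() as disk2:
    table_file = os.path.join(disk1, "big_table")   # hypothetical file name
    with open(table_file, "w") as f:
        f.write("data")

    new_path = relocate_with_symlink(table_file, disk2)

    # Reads through the old path still work, because it is now a symlink.
    demo_is_link = os.path.islink(table_file)
    with open(table_file) as f:
        demo_content = f.read()
```

Striping or concatenation avoids this bookkeeping, since the filesystem
then spans the disks by itself.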

Your alternatives for software concatenation/striping will depend on
your OS.

Software RAID will be neither as fast (because there is no cache and
because it's using your main CPU instead of a dedicated one) nor as
reliable (because there is no battery backup for updates) as a hardware
RAID.

--
Ragnar Kjørstad
Big Storage


From: Ragnar Kjørstad <postgres(at)ragnark(dot)vestdata(dot)no>
To: bob(at)bob(dot)usuhs(dot)mil
Cc: "G(dot) Anthony Reina" <reina(at)nsi(dot)edu>, "pgsql-admin(at)postgreSQL(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: RAID vs. Single Big SCSI Disk
Date: 2000-12-12 22:16:12
Message-ID: 20001212231612.D10484@vestdata.no

On Tue, Dec 12, 2000 at 11:20:08AM -0500, bob(at)bob(dot)usuhs(dot)mil wrote:
> > We have three databases for our scientific research and are getting
> > close to filling our 12 Gig partition. My boss thinks that just getting
> > a really big (i.e. > 30 Gig) SCSI drive will be cheaper and should do
> > nicely. Currently, we only have 4 people accessing the database and
> > usually only have 1-2 jobs (e.g. selects, updates, etc.) going at any
> > one time (probably a high estimate). The db sits on a Pentium II/400 MHz
> > with RedHat 6.0.
> >
> > Other than mirroring, are there any other advantages (e.g. speed, cost)
> > of just getting a RAID controller over, say, a 73 Gig Ultra SCSI Cheetah
> > drive (which cost in the neighborhood of $1300).
>
> It sounds like you would be much better off with an Ultra ATA 66
> software or hardware RAID solution. Maxtor 40 Gb ATA100 disks
> can be had for $100. each. Alone they operate near 20 Mb/sec
> and in a striped 2 disk Raid they can do 30-40 Mb/sec, probably
> faster than your Cheetah configuration for a fraction of the cost.
> 3ware makes a hardware RAID controller that would get you to
> 40 Mb/sec with two, or 70 mb/sec with four of these disks in RAID 0.
> With four disks in RAID 01 you can mirror and still get near 40 Mb/sec.
> The 3ware solution also relieves your cpu from the usual ATA overhead.

There is more to a disk than just theoretical throughput:

* SCSI disks support tagged command queueing (TCQ). This means a disk
  can handle multiple requests at the same time, which means less pause
  between requests and optimal request ordering.
* Disk cache: Seagate Cheetah disks come with up to 16 MB of cache,
  which makes a big difference for performance (and hardware RAID
  controllers come with several hundred MB of cache).
* Seek time
* Remapping bad blocks
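
To illustrate why request ordering matters, here is a toy model of head
movement. It is only a sketch: the block addresses are made up, and real
drive firmware uses its own scheduling (which also has to account for
rotational position), not a plain sort.

```python
def head_travel(start, requests):
    """Total distance the head moves when serving requests in the given order."""
    position, total = start, 0
    for block in requests:
        total += abs(block - position)
        position = block
    return total

# Made-up block addresses, in the order the host issued them.
queued = [900, 10, 850, 20, 800]

# Without queueing the disk must serve requests strictly in arrival order.
fifo = head_travel(0, queued)

# With several requests queued, the disk is free to serve them in one sweep.
reordered = head_travel(0, sorted(queued))

print("arrival order:", fifo)
print("reordered:", reordered)
```

In this model the reordered sweep moves the head far less than serving
the same requests in arrival order.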

These are not strictly IDE vs SCSI issues, but SCSI disks are usually
intended for the higher-end market, so they usually score better than
the average IDE disk. Some IDE RAID controllers "fix" some of these
problems by having an internal cache on the controller and using a SCSI
interface with TCQ to connect to the host, but as far as I know, 3ware
does not.

Another thing is that some IDE disks (Maxtor is one of them, I think)
come with write-back cache enabled, without any battery backup. This
means that when you write to disk, the disk will only put the data in
its cache and write it to the platters later. This improves performance,
but it will kill your data if your application relies on write ordering.
In other words, if your system crashes while writing to disk, it's
likely that your database will be corrupt when booting up again!
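
To show where the problem sits: an application that relies on write
ordering typically flushes one write before issuing the next, roughly as
in the sketch below. This is an illustration with hypothetical names, not
how Postgres itself is implemented.

```python
import os
import tempfile

def durable_write(path, data):
    """Write data and ask the OS to flush it to the storage device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # Python's buffer -> operating system
        os.fsync(f.fileno())   # OS cache -> device, as far as the OS can tell

# Demo in a temporary directory; "log" is a made-up file name.
with tempfile.TemporaryDirectory() as d:
    log_path = os.path.join(d, "log")
    durable_write(log_path, b"commit record\n")
    written_size = os.path.getsize(log_path)
```

The application continues only after os.fsync() returns. A disk with
write-back cache enabled may report the write as complete while the data
is still only in its cache, so a crash or power loss at that point can
lose data the application believed was safely on disk.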

OK, no attempt to start another SCSI vs IDE flamewar :-)

--
Ragnar Kjørstad
Big Storage