Re: Bad iostat numbers

From: "Carlos H(dot) Reimer" <carlos(dot)reimer(at)opendb(dot)com(dot)br>
To: <pgsql-performance(at)postgresql(dot)org>
Subject: Bad iostat numbers
Date: 2006-12-01 00:44:25
Message-ID: PEEPKDFEHHEMKBBFPOOKKEHCDKAA.carlos.reimer@opendb.com.br
Lists: pgsql-performance

Hi,

I was called to find out why one of our PostgreSQL servers does not have a
satisfactory response time. The server has only two SCSI disks configured
as software RAID1.
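
One quick sanity check first, since a failed or resyncing md member can
produce terrible I/O numbers on its own (a sketch; the md device names
show up in the iostat output below):

cat /proc/mdstat                   # both arrays should show [UU]
mdadm --detail /dev/md1            # resync progress and failed disks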

While collecting performance data I discovered very bad numbers in the I/O
subsystem and I would like to know if I'm thinking correctly.

Here is a typical iostat -x:

avg-cpu:  %user   %nice %system %iowait   %idle
          50.40    0.00    0.50    1.10   48.00

Device:  rrqm/s wrqm/s  r/s   w/s rsec/s wsec/s rkB/s wkB/s avgrq-sz  avgqu-sz await  svctm  %util
sda        0.00   7.80 0.40  6.40  41.60 113.60 20.80 56.80    22.82 570697.50 10.59 147.06 100.00
sdb        0.20   7.80 0.60  6.40  40.00 113.60 20.00 56.80    21.94 570697.50  9.83 142.86 100.00
md1        0.00   0.00 1.20 13.40  81.60 107.20 40.80 53.60    12.93      0.00  0.00   0.00   0.00
md0        0.00   0.00 0.00  0.00   0.00   0.00  0.00  0.00     0.00      0.00  0.00   0.00   0.00

Are they not saturated?

What kind of parameters should I pay attention to when comparing SCSI
controllers and disks? I would like to discover how much cache is present in
the controller; how can I find this value from Linux?
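
A sketch of how to dig into this from a shell, assuming the sdparm
package is available (as the dmesg output below shows, the kernel
already reports the drives' own cache mode at probe time):

dmesg | grep -i 'drive cache'      # what the kernel saw at probe time
sdparm --get=WCE /dev/sda          # WCE=1: drive write cache enabled
sdparm --get=RCD /dev/sda          # RCD=0: drive read cache enabled
lspci | grep -i scsi               # identify the controller model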

Thank you in advance!

dmesg output:

...

SCSI subsystem initialized

ACPI: PCI Interrupt 0000:04:02.0[A] -> GSI 18 (level, low) -> IRQ 18

scsi0 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 3.0

<Adaptec (Dell OEM) 39320 Ultra320 SCSI adapter>

aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI 33 or 66Mhz, 512
SCBs

Vendor: SEAGATE Model: ST336607LW Rev: DS10

Type: Direct-Access ANSI SCSI revision: 03

target0:0:0: asynchronous

scsi0:A:0:0: Tagged Queuing enabled. Depth 4

target0:0:0: Beginning Domain Validation

target0:0:0: wide asynchronous

target0:0:0: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RDSTRM RTI WRFLOW
PCOMP (6.25 ns, offset 63)

target0:0:0: Ending Domain Validation

SCSI device sda: 71132959 512-byte hdwr sectors (36420 MB)

sda: Write Protect is off

sda: Mode Sense: ab 00 10 08

SCSI device sda: drive cache: write back w/ FUA

SCSI device sda: 71132959 512-byte hdwr sectors (36420 MB)

sda: Write Protect is off

sda: Mode Sense: ab 00 10 08

SCSI device sda: drive cache: write back w/ FUA

sda: sda1 sda2 sda3

sd 0:0:0:0: Attached scsi disk sda

Vendor: SEAGATE Model: ST336607LW Rev: DS10

Type: Direct-Access ANSI SCSI revision: 03

target0:0:1: asynchronous

scsi0:A:1:0: Tagged Queuing enabled. Depth 4

target0:0:1: Beginning Domain Validation

target0:0:1: wide asynchronous

target0:0:1: FAST-160 WIDE SCSI 320.0 MB/s DT IU QAS RDSTRM RTI WRFLOW
PCOMP (6.25 ns, offset 63)

target0:0:1: Ending Domain Validation

SCSI device sdb: 71132959 512-byte hdwr sectors (36420 MB)

sdb: Write Protect is off

sdb: Mode Sense: ab 00 10 08

SCSI device sdb: drive cache: write back w/ FUA

SCSI device sdb: 71132959 512-byte hdwr sectors (36420 MB)

sdb: Write Protect is off

sdb: Mode Sense: ab 00 10 08

SCSI device sdb: drive cache: write back w/ FUA

sdb: sdb1 sdb2 sdb3

sd 0:0:1:0: Attached scsi disk sdb

ACPI: PCI Interrupt 0000:04:02.1[B] -> GSI 19 (level, low) -> IRQ 19

scsi1 : Adaptec AIC79XX PCI-X SCSI HBA DRIVER, Rev 3.0

<Adaptec (Dell OEM) 39320 Ultra320 SCSI adapter>

aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI 33 or 66Mhz, 512
SCBs

...

Reimer


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: carlos(dot)reimer(at)opendb(dot)com(dot)br
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-01 01:47:24
Message-ID: 456F89AC.9070207@paradise.net.nz
Lists: pgsql-performance

Carlos H. Reimer wrote:
> While collecting performance data I discovered very bad numbers in the
> I/O subsystem and I would like to know if I'm thinking correctly.
>
> Here is a typical iostat -x:
> [...]
>
> Are they not saturated?
>

They look it (if I'm reading your typical numbers correctly) - %util 100
and svctm in the region of 100 ms!

On the face of it, it looks like you need something better than a RAID1
setup - probably RAID10 (RAID5 is probably no good, as it seems you are
writing more than you are reading). However, read on...

If this is a sudden change in system behavior, then it is probably worth
trying to figure out what is causing it (i.e. which queries) - for
instance, it might be that you have some new queries that are doing
disk-based sorts (this would mean you really need more memory rather than
better disks...)
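
A minimal way to test that theory, using hypothetical table and database
names (work_mem is the 8.x setting, given in kB here; older releases
call it sort_mem):

psql -d mydb -c "EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at;"
psql -d mydb -c "SET work_mem = 65536; EXPLAIN ANALYZE SELECT * FROM orders ORDER BY created_at;"

If the second run is much faster, the sort was spilling to disk and more
memory will buy more than faster disks.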

Cheers

Mark


From: David Boreham <david_list(at)boreham(dot)org>
To: carlos(dot)reimer(at)opendb(dot)com(dot)br
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-01 02:24:34
Message-ID: 456F9262.4080207@boreham.org
Lists: pgsql-performance

Carlos H. Reimer wrote:

> [...]
>
> Are they not saturated?
>
>
>
> What kind of parameters should I pay attention to when comparing SCSI
> controllers and disks? I would like to discover how much cache is
> present in the controller; how can I find this value from Linux?
>
>
These numbers look a bit strange. I am wondering if there is a hardware
problem on one of the drives or on the controller. Check in syslog for
messages about disk timeouts etc. 100% util but 6 writes/s is just
wrong (unless the drive is a 1980's vintage floppy).
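
For example, something along these lines (device names as in the
original post; smartctl assumes smartmontools is installed):

grep -iE 'aic79xx|timeout|abort|reset' /var/log/messages | tail -n 50
smartctl -H /dev/sda               # overall SMART health verdict
smartctl -l error /dev/sda         # the drive's own error log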


From: Mark Kirkwood <markir(at)paradise(dot)net(dot)nz>
To: david_list(at)boreham(dot)org
Cc: carlos(dot)reimer(at)opendb(dot)com(dot)br, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-01 03:00:05
Message-ID: 456F9AB5.1070409@paradise.net.nz
Lists: pgsql-performance

David Boreham wrote:

>
> These numbers look a bit strange. I am wondering if there is a hardware
> problem on one of the drives or on the controller. Check in syslog for
> messages about disk timeouts etc. 100% util but 6 writes/s is just
> wrong (unless the drive is a 1980's vintage floppy).
>

Agreed - good call, I was misreading the wkB/s as wMB/s...


From: "Carlos H(dot) Reimer" <carlos(dot)reimer(at)opendb(dot)com(dot)br>
To: <david_list(at)boreham(dot)org>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: RES: Bad iostat numbers
Date: 2006-12-01 14:29:11
Message-ID: PEEPKDFEHHEMKBBFPOOKOEIEDKAA.carlos.reimer@opendb.com.br
Lists: pgsql-performance

Hi,

I've taken a look at /var/log/messages and found some temperature
messages about the disk drives:

Nov 30 11:08:07 totall smartd[1620]: Device: /dev/sda, Temperature changed 2
Celsius to 51 Celsius since last report

Can this temperature influence the performance?
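
The same smartmontools package that produced that log line can also show
the current temperature next to the drive's rated trip point, for
example (for SCSI drives the output includes a Drive Trip Temperature
line):

smartctl -A /dev/sda | grep -i temperature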

Reimer


From: "Carlos H(dot) Reimer" <carlos(dot)reimer(at)opendb(dot)com(dot)br>
To: "Mark Kirkwood" <markir(at)paradise(dot)net(dot)nz>
Cc: <pgsql-performance(at)postgresql(dot)org>
Subject: RES: Bad iostat numbers
Date: 2006-12-01 14:47:28
Message-ID: PEEPKDFEHHEMKBBFPOOKCEIGDKAA.carlos.reimer@opendb.com.br
Lists: pgsql-performance

Hi,

If you look at the iostat data, it shows that the system is doing many
more writes than reads. It is strange, because if you look at the
pg_stat tables we see a completely different scenario: many more reads
than writes. I was monitoring the presence of temporary files in the
data directory, which could denote big sorts, but nothing there either.

But I think it is explained by the high number of indexes present on
those tables: one write to the base table causes many more in the
indexes.
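
A rough way to see that amplification, with a hypothetical database name
(every row written to the table is also written to each of its indexes):

psql -d mydb -c "SELECT tablename, count(*) AS n_indexes FROM pg_indexes WHERE schemaname = 'public' GROUP BY tablename ORDER BY n_indexes DESC;"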

Well, about the server behaviour: it has not changed suddenly, but the
performance is becoming worse day by day.

Reimer


From: David Boreham <david_list(at)boreham(dot)org>
To: carlos(dot)reimer(at)opendb(dot)com(dot)br
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: RES: Bad iostat numbers
Date: 2006-12-01 15:32:03
Message-ID: 45704AF3.9090309@boreham.org
Lists: pgsql-performance

Carlos H. Reimer wrote:

>I've taken a look at /var/log/messages and found some temperature
>messages about the disk drives:
>
>Nov 30 11:08:07 totall smartd[1620]: Device: /dev/sda, Temperature changed 2
>Celsius to 51 Celsius since last report
>
>Can this temperature influence the performance?
>
>
it can influence 'working-ness' which I guess in turn affects performance ;)

But I'm not sure if 50C is too high for a disk drive, it might be ok.

If you are able to, I'd say just replace the drives and see if that
improves things.


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-04 05:44:47
Message-ID: Pine.GSO.4.64.0612032336250.19679@westnet.com
Lists: pgsql-performance

On Thu, 30 Nov 2006, Carlos H. Reimer wrote:

> I would like to discover how much cache is present in
> the controller; how can I find this value from Linux?

As far as I know there is no cache on an Adaptec 39320. The write-back
cache Linux was reporting on was the one in the drives, which is 8MB; see
http://www.seagate.com/cda/products/discsales/enterprise/tech/1,1593,541,00.html
Be warned that running your database with the combination of an uncached
controller plus disks with write caching is dangerous to your database
integrity.

There is a common problem with the Linux driver for this card (aic7902)
where it enters what they're calling an "Infinite Interrupt Loop".
That seems to match your readings:

> Here is a typical iostat -x:
> Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s rkB/s wkB/s
> sda 0.00 7.80 0.40 6.40 41.60 113.60 20.80 56.80
> avgrq-sz avgqu-sz await svctm %util
> 22.82 570697.50 10.59 147.06 100.00

An avgqu-sz of 570697.50 is extremely large. That explains why the
utilization is 100%, because there's a massive number of I/O operations
queued up that aren't getting flushed out. The read and write data says
these drives are barely doing anything, as 20kB/s and 57kB/s are
practically idle; they're not even remotely close to saturated.
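
One way to check for that loop on a live box (a sketch; the interrupt
count is assumed to appear under the driver's name, and should climb
only modestly while the disks are near idle, not by millions):

grep aic79xx /proc/interrupts; sleep 5; grep aic79xx /proc/interrupts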

See http://lkml.org/lkml/2005/10/1/47 for a suggested workaround that may
reduce the magnitude of this issue; lowering the card's speed to U160 in
the BIOS was also listed as a useful workaround. You might get better results
by upgrading to a newer Linux kernel, and just rebooting to clear out the
garbage might help if you haven't tried that yet.

On the pessimistic side, other people reporting issues with this
controller are:

http://lkml.org/lkml/2005/12/17/55
http://www.ussg.iu.edu/hypermail/linux/kernel/0512.2/0390.html
http://www.linuxforums.org/forum/peripherals-hardware/59306-scsi-hangs-boot.html
and even under FreeBSD at
http://lists.freebsd.org/pipermail/aic7xxx/2003-August/003973.html

This Adaptec card just barely works under Linux, which happens regularly
with their controllers, and my guess is that you've run into one of the
ways it goes crazy sometimes. I just chuckled when checking
http://linux.adaptec.com/ again and noticing they can't even be bothered
to keep that server up at all. According to
http://www.adaptec.com/en-US/downloads/linux_source/linux_source_code?productId=ASC-39320-R&dn=Adaptec+SCSI+Card+39320-R
the driver for your card is "*minimally tested* for Linux Kernel v2.6 on
all platforms." Adaptec doesn't care about Linux support on their
products; if you want a SCSI controller that actually works under Linux,
get an LSI MegaRAID.

If this were really a Postgres problem, I wouldn't expect %iowait=1.10.
Were the database engine waiting to read/write data, that number would be
dramatically higher. Whatever is generating all these I/O requests, it's
not waiting for them to complete like the database would be. Besides the
driver problems that I'm very suspicious of, I'd suspect a runaway process
writing garbage to the disks might also cause this behavior.

> I've taken a look at /var/log/messages and found some temperature
> messages about the disk drives:
> Nov 30 11:08:07 totall smartd[1620]: Device: /dev/sda, Temperature changed 2
> Celsius to 51 Celsius since last report
> Can this temperature influence the performance?

That's close to the upper tolerance for this drive (55 degrees), which
means the drive is being cooked and will likely wear out quickly. But
that won't slow it down, and you'd get much scarier messages out of smartd
if the drives had a real problem. You should improve cooling in this case
if you want the drives to have a healthy life; odds are low this is
relevant to your performance issue though.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: "Alex Turner" <armtuk(at)gmail(dot)com>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-04 07:17:48
Message-ID: 33c6269f0612032317t9f0bc77n3ff791a5cd88fc80@mail.gmail.com
Lists: pgsql-performance

People recommend LSI MegaRAID controllers on here regularly, but I have
found that they do not work that well. I have bonnie++ numbers that show
the controller is not performing anywhere near the disk's saturation level
in a simple RAID 1 on RedHat Linux EL4 on two separate machines provided by
two different hosting companies. In one case I asked them to replace the
card, and the numbers got a bit better, but still not optimal.

LSI MegaRAID has proved to be a bit of a disappointment. I have seen better
numbers from the HP SmartArray 6i, and from 3ware cards with 7200RPM SATA
drives.

For the output: http://www.infoconinc.com/test/bonnie++.html (the first line
is a six drive RAID 10 on a 3ware 9500S, the next three are all RAID 1s on
LSI MegaRAID controllers, verified by lspci).

Alex.


From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Alex Turner <armtuk(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-04 16:25:14
Message-ID: 1165249513.14565.329.camel@state.g2switchworks.com
Lists: pgsql-performance

On Mon, 2006-12-04 at 01:17, Alex Turner wrote:
> People recommend LSI MegaRAID controllers on here regularly, but I
> have found that they do not work that well. I have bonnie++ numbers
> that show the controller is not performing anywhere near the disk's
> saturation level in a simple RAID 1 on RedHat Linux EL4 on two
> separate machines provided by two different hosting companies. In one
> case I asked them to replace the card, and the numbers got a bit
> better, but still not optimal.
>
> LSI MegaRAID has proved to be a bit of a disappointment. I have seen
> better numbers from the HP SmartArray 6i, and from 3ware cards with
> 7200RPM SATA drives.
>
> For the output: http://www.infoconinc.com/test/bonnie++.html (the
> first line is a six drive RAID 10 on a 3ware 9500S, the next three are
> all RAID 1s on LSI MegaRAID controllers, verified by lspci).

Wait, you're comparing a MegaRAID running a RAID 1 against another
controller running a 6 disk RAID10? That's hardly fair.

My experience with the LSI was that with the 1.18 series drivers, they
were slow but stable.

With the version 2.x drivers, I found that the performance was very good
with RAID-5 and fair with RAID-1 and that layered RAID was not any
better than unlayered (i.e. layering RAID0 over RAID1 resulted in basic
RAID-1 performance).

OTOH, with the choice at my last place of employment being LSI or
Adaptec, LSI was a much better choice. :)

I'd ask which LSI megaraid you've tested, and what driver was used.
Does RHEL4 have the megaraid 2 driver?


From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Alex Turner <armtuk(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-04 16:37:34
Message-ID: 1165250254.14565.333.camel@state.g2switchworks.com
Lists: pgsql-performance

On Mon, 2006-12-04 at 10:25, Scott Marlowe wrote:
>
> OTOH, with the choice at my last place of employment being LSI or
> Adaptec, LSI was a much better choice. :)
>
> I'd ask which LSI megaraid you've tested, and what driver was used.
> Does RHEL4 have the megaraid 2 driver?

Just wanted to add that what we used our database for at my last company
was for lots of mostly small writes / reads. I.e. sequential throughput
didn't really matter, but random write speed did. For that application,
the LSI Megaraid with battery backed cache was great.

Last point, bonnie++ is a good benchmarking tool, but until you test
your app / postgresql on top of the hardware, you can't really say how
well it will perform.

A controller that looks fast under a single bonnie++ thread might
perform poorly when there are 100+ pending writes, and vice versa, a
controller that looks mediocre under bonnie++ might shine when there's
heavy parallel write load to handle.


From: "Alex Turner" <armtuk(at)gmail(dot)com>
To: "Scott Marlowe" <smarlowe(at)g2switchworks(dot)com>
Cc: "Greg Smith" <gsmith(at)gregsmith(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-04 17:37:29
Message-ID: 33c6269f0612040937l276c5178naecac1983d355e29@mail.gmail.com
Lists: pgsql-performance

The RAID 10 was in there merely for filling in, not really as a comparison;
indeed it would be ludicrous to compare a RAID 1 to a 6 drive RAID 10!!

How do I find out if it has version 2 of the driver?

This discussion I think is important, as I think it would be useful for this
list to have a list of RAID cards that _do_ work well under Linux/BSD for
people as recommended hardware for Postgresql. So far, all I can recommend
is what I've found to be good, which is 3ware 9500 series cards with 10k
SATA drives. Throughput was great until you reached higher levels of RAID
10 (the bonnie++ mark I posted showed write speed is a bit slow). But that
doesn't solve the problem for SCSI. What cards in the SCSI arena solve the
problem optimally? Why should we settle for sub-optimal performance in SCSI
when there are a number of almost optimally performing cards in the SATA
world (Areca, 3Ware/AMCC, LSI)?

Thanks,

Alex


From: Michael Stone <mstone+postgres(at)mathom(dot)us>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-04 17:43:22
Message-ID: 20061204174320.GX1622@mathom.us
Lists: pgsql-performance

On Mon, Dec 04, 2006 at 12:37:29PM -0500, Alex Turner wrote:
>This discussion I think is important, as I think it would be useful for this
>list to have a list of RAID cards that _do_ work well under Linux/BSD for
>people as recommended hardware for Postgresql. So far, all I can recommend
>is what I've found to be good, which is 3ware 9500 series cards with 10k
>SATA drives. Throughput was great until you reached higher levels of RAID
>10 (the bonnie++ mark I posted showed write speed is a bit slow). But that
>doesn't solve the problem for SCSI. What cards in the SCSI arena solve the
>problem optimally? Why should we settle for sub-optimal performance in SCSI
>when there are a number of almost optimally performing cards in the SATA
>world (Areca, 3Ware/AMCC, LSI)?

Well, one factor is to be more precise about what you're looking for; a
HBA != RAID controller, and you may be comparing apples and oranges. (If
you have an external array with an onboard controller you probably want
a simple HBA rather than a RAID controller.)

Mike Stone


From: "Alex Turner" <armtuk(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-04 17:52:46
Message-ID: 33c6269f0612040952l461407dcsd52b3f2c57825431@mail.gmail.com
Lists: pgsql-performance

http://en.wikipedia.org/wiki/RAID_controller

Alex


From: Michael Stone <mstone+postgres(at)mathom(dot)us>
To: Alex Turner <armtuk(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-04 18:03:17
Message-ID: 20061204180315.GY1622@mathom.us
Lists: pgsql-performance

On Mon, Dec 04, 2006 at 12:52:46PM -0500, Alex Turner wrote:
>http://en.wikipedia.org/wiki/RAID_controller

What is the Wikipedia quote supposed to prove? Pray tell, if you
consider RAID==HBA, what would you call a SCSI (e.g.) controller that
has no RAID functionality? If you'd call it an HBA, then there is a
useful distinction to be made, no?

Mike Stone


From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Michael Stone <mstone+postgres(at)mathom(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-04 18:05:15
Message-ID: 1165255515.14565.337.camel@state.g2switchworks.com
Lists: pgsql-performance

On Mon, 2006-12-04 at 11:43, Michael Stone wrote:
> Well, one factor is to be more precise about what you're looking for; a
> HBA != RAID controller, and you may be comparing apples and oranges. (If
> you have an external array with an onboard controller you probably want
> a simple HBA rather than a RAID controller.)

I think he's been pretty clear. He's just talking about SCSI based RAID
controllers is all.


From: Scott Marlowe <smarlowe(at)g2switchworks(dot)com>
To: Alex Turner <armtuk(at)gmail(dot)com>
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-04 18:13:12
Message-ID: 1165255991.14565.346.camel@state.g2switchworks.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-performance

On Mon, 2006-12-04 at 11:37, Alex Turner wrote:
> The RAID 10 was in there merely for filling in, not really as a
> comparison; indeed it would be ludicrous to compare a RAID 1 to a 6 drive
> RAID 10!!
>
> How do I find out if it has version 2 of the driver?

Go to the directory it lives in (on my Fedora Core 2 box, it's in
something like: /lib/modules/2.6.10-1.9_FC2/kernel/drivers/scsi )
and run modinfo on the driver:

modinfo megaraid.ko
author: LSI Logic Corporation
description: LSI Logic MegaRAID driver
license: GPL
version: 2.00.3

SNIPPED extra stuff

> This discussion I think is important, as I think it would be useful
> for this list to have a list of RAID cards that _do_ work well under
> Linux/BSD for people as recommended hardware for Postgresql. So far,
> all I can recommend is what I've found to be good, which is 3ware 9500
> series cards with 10k SATA drives. Throughput was great until you
> reached higher levels of RAID 10 (the bonnie++ mark I posted showed
> write speed is a bit slow). But that doesn't solve the problem for
> SCSI. What cards in the SCSI arena solve the problem optimally? Why
> should we settle for sub-optimal performance in SCSI when there are a
> number of almost optimally performing cards in the SATA world (Areca,
> 3Ware/AMCC, LSI)?

Well, I think the LSI works VERY well under Linux. And I've always made
it quite clear in my posts that while I find it an acceptable performer,
my main recommendation is based on its stability, not speed, and that
the Areca and 3Ware cards are generally regarded as faster. And all
three beat the Adaptecs, which are observed as being rather unstable.

Does this LSI have battery backed cache? Are you testing it under heavy
parallel load versus single threaded to get an idea how it scales with
multiple processes hitting it at once?

Don't get me wrong, I'm a big fan of running tools like bonnie to get a
basic idea of how good the hardware is, but benchmarks that simulate
real production loads are the only ones worth putting your trust in.
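
pgbench (from contrib) is one easy way to generate that kind of
concurrent load; a small sketch, with the scale and client counts purely
illustrative:

createdb bench
pgbench -i -s 10 bench             # initialize, scale factor 10
pgbench -c 50 -t 200 bench         # 50 concurrent clients, 200 transactions each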


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-05 04:10:00
Message-ID: Pine.GSO.4.64.0612041946120.9323@westnet.com
Lists: pgsql-performance

On Mon, 4 Dec 2006, Alex Turner wrote:

> People recommend LSI MegaRAID controllers on here regularly, but I have
> found that they do not work that well. I have bonnie++ numbers that
> show the controller is not performing anywhere near the disk's
> saturation level in a simple RAID 1 on RedHat Linux EL4 on two separate
> machines provided by two different hosting companies.
> http://www.infoconinc.com/test/bonnie++.html

I don't know what's going on with your www-september-06 machine, but the
other two are giving 32-40MB/s writes and 53-68MB/s reads. For a RAID-1
volume, these aren't awful numbers, but I agree they're not great.

My results are no better. For your comparison, here's a snippet of
bonnie++ results from one of my servers: RHEL 4, P4 3GHz, MegaRAID
firmware 1L37, write-thru cache setup, RAID 1; I think the drives are 10K
RPM Seagate Cheetahs. This is from the end of the drive where performance
is the worst (I partitioned the important stuff at the beginning where
it's fastest and don't have enough free space to run bonnie there):

------Sequential Output------ --Sequential Input- --Random-
-Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
20708 50 21473 9 9603 3 34419 72 55799 7 467.1 1

21MB/s writes, 56MB/s reads. Not too different from yours (especially if
your results were from the beginning of the disk), and certainly nothing
special. I might be able to tune the write performance higher if I cared;
the battery backed cache sits unused and everything is tuned for paranoia
rather than performance. On this machine it doesn't matter.
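
For reference, an invocation along these lines (the -s size should be at
least twice RAM so the OS cache can't satisfy the test; the directory
and user are placeholders):

bonnie++ -d /mnt/test -s 2048 -u nobody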

The thing is, even though it's rarely the top performing card even when
set up perfectly, the LSI SCSI Megaraid just works. The driver is stable,
caching behavior is well defined, it's a pleasure to administer. I'm
never concerned that it's lying to me or doing anything to put data at
risk. The command-line tools for Linux work perfectly, let me look at or
control whatever I want, and it was straightforward for me to make my own
customized monitoring script using them.

> LSI MegaRAID has proved to be a bit of a disapointment. I have seen
> better numbers from the HP SmartArray 6i, and from 3ware cards with
> 7200RPM SATA drives.

Whereas although I use 7200RPM SATA drives, I always try to keep an eye on
them because I never really trust them. The performance list archives
here also have plenty of comments about people having issues with the
SmartArray controllers; search the archives for "cciss" and you'll see
what I'm talking about.

The Megaraid controller is very boring. That's why I like it. As a Linux
distribution, RedHat has similar characteristics. If I were going for a
performance setup, I'd dump that, too, for something sexier with a newish
kernel. It all depends on which side of the performance/stability
tradeoff you're aiming at.

On Mon, 4 Dec 2006, Scott Marlowe wrote:
> Does RHEL4 have the megaraid 2 driver?

This is from the moderately current RHEL4 installation I had results from
above. Redhat has probably done a kernel rev since I last updated back in
September, haven't needed or wanted to reboot since then:

megaraid cmm: 2.20.2.6 (Release Date: Mon Mar 7 00:01:03 EST 2005)
megaraid: 2.20.4.6-rh2 (Release Date: Wed Jun 28 12:27:22 EST 2006)

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: "Alex Turner" <armtuk(at)gmail(dot)com>
To: "Greg Smith" <gsmith(at)gregsmith(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-05 06:21:38
Message-ID: 33c6269f0612042221x5820e98bl96f8d0dc4a87bd3f@mail.gmail.com
Lists: pgsql-performance

I agree that MegaRAID is very stable, and it's very appealing from that
perspective. And two years ago I would have never even mentioned cciss
based cards on this list, because they sucked wind big time, but I believe
some people have started seeing better numbers from the 6i. 20MB/sec write,
when the number should be closer to 60... that's off by a factor of 3. For
my data warehouse application, that's a big difference, and if I can get a
better number from 7200RPM drives and a good SATA controller, I'm gonna do
that because my data isn't OLTP, and I don't care if the whole system shits
itself and I have to restore from backup one day.

My other and most important point is that I can't find any solid
recommendations for a SCSI card that can perform optimally in Linux or
*BSD. Off by a factor of 3x is pretty sad IMHO. (And yes, we know the
Adaptec cards suck worse; that doesn't bring us to a _good_ card.)

Alex.


From: Michael Stone <mstone+postgres(at)mathom(dot)us>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-05 12:15:09
Message-ID: 20061205121507.GZ1622@mathom.us
Lists: pgsql-performance

On Tue, Dec 05, 2006 at 01:21:38AM -0500, Alex Turner wrote:
>My other and most important point is that I can't find any solid
>recommendations for a SCSI card that can perform optimally in Linux or
>*BSD. Off by a factor of 3x is pretty sad IMHO. (And yes, we know the
>Adaptec cards suck worse; that doesn't bring us to a _good_ card.)

This gets back to my point about terminology. As a SCSI HBA the Adaptec
is decent: I can sustain about 300MB/s off a single channel of the
39320A using an external RAID controller. As a RAID controller I can't
even imagine using the Adaptec; I'm fairly certain they put that
"functionality" on there just so they could charge more for the card. It
may be that there's not much market for on-board SCSI RAID controllers;
between SATA on the low end and SAS & FC on the high end, there isn't a
whole lotta space left for SCSI. I definitely don't think much
R&D is going into SCSI controllers any more, compared to other solutions
like SATA or SAS RAID (the 39320 hasn't changed in at least 3 years,
IIRC). Anyway, since the Adaptec part is a decent SCSI controller and a
lousy RAID controller, have you tried just using software RAID?
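
A minimal sketch of that, with hypothetical partition names, leaving the
Adaptec to act as the plain HBA it is:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
cat /proc/mdstat                   # watch the initial mirror resync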

Mike Stone


From: "Alex Turner" <armtuk(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-05 12:57:43
Message-ID: 33c6269f0612050457s4a8eb0eas662d7a50fb5dfa89@mail.gmail.com
Lists: pgsql-performance

The problem I see with software RAID is the issue of a battery backed unit:
if the computer loses power, then the 'cache', which is held in system
memory, goes away and fubars your RAID.

Alex


From: "Craig A(dot) James" <cjames(at)modgraph-usa(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-05 14:46:34
Message-ID: 4575864A.8060005@modgraph-usa.com
Lists: pgsql-performance

Alex Turner wrote:
> The problem I see with software RAID is the issue of a battery backed
> unit: if the computer loses power, then the 'cache', which is held in
> system memory, goes away and fubars your RAID.

I'm not sure I see the difference. If data are cached, they're not
written whether it is software or hardware RAID. I guess if you're
writing RAID 1, the N disks could be out of sync, but the system can
synchronize them once the array is restored, so that's no different than
a single disk or a hardware RAID. If you're writing RAID 5, then the
blocks are inherently error detecting/correcting, so you're still OK if
a partial write occurs, right?

I'm not familiar with the inner details of software RAID, but the only
circumstance I can see where things would get corrupted is if the RAID
driver writes a LOT of blocks to one disk of the array before
synchronizing the others, but my guess (and it's just a guess) is that
the writes to the N disks are tightly coupled.

If I'm wrong about this, I'd like to know, because I'm using software RAID 1 and 1+0, and I'm pretty happy with it.

Craig


From: Michael Stone <mstone+postgres(at)mathom(dot)us>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-05 14:49:41
Message-ID: 20061205144939.GA1622@mathom.us
Lists: pgsql-performance

On Tue, Dec 05, 2006 at 07:57:43AM -0500, Alex Turner wrote:
>The problem I see with software RAID is the issue of a battery backed unit:
>if the computer loses power, then the 'cache', which is held in system
>memory, goes away and fubars your RAID.

Since the Adaptec doesn't have a BBU, it's a lateral move. Also, this is
less an issue of data integrity than performance; you can get exactly
the same level of integrity, you just have to wait for the data to sync
to disk. If you're read-mostly that's irrelevant.

Mike Stone


From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-06 04:54:58
Message-ID: Pine.GSO.4.64.0612052327080.7579@westnet.com
Lists: pgsql-performance

On Tue, 5 Dec 2006, Craig A. James wrote:

> I'm not familiar with the inner details of software RAID, but the only
> circumstance I can see where things would get corrupted is if the RAID driver
> writes a LOT of blocks to one disk of the array before synchronizing the
> others...

You're talking about whether the discs in the RAID are kept consistent.
While it's helpful with that, too, that's not the main reason the
battery-backed cache is so helpful. When PostgreSQL writes to the WAL, it
waits until that data has really been placed on the drive before it enters
that update into the database. In a normal situation, that means that you
have to pause until the disk has physically written the blocks out, and
that puts a fairly low upper limit on write performance that's based on
how fast your drives rotate. RAID 0, RAID 1, none of that will speed up
the time it takes to complete a single synchronized WAL write.

When your controller has a battery-backed cache, it can immediately tell
Postgres that the WAL write completed successfully, while actually putting
it on the disk later. On my systems, this results in simple writes going
2-4X as fast as they do without a cache. Should there be a PC failure, as
long as power is restored before the battery runs out that transaction
will be preserved.
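
A crude way to see that rotation-bound ceiling for yourself, assuming
GNU dd with oflag support: each 8kB block below must reach the disk
before the next is issued, much like a synchronized WAL write. Run it on
the filesystem you care about; without any write cache in the path it
typically reports only a MB/s or two, and a battery-backed cache raises
it dramatically:

dd if=/dev/zero of=testfile bs=8k count=1000 oflag=dsync
rm testfile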

What Alex is rightly pointing out is that a software RAID approach doesn't
have this feature. In fact, in this area performance can be even worse
under SW RAID than what you get from a single disk, because you may have
to wait for multiple discs to spin to the correct position and write data
out before you can consider the transaction complete.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD


From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Bad iostat numbers
Date: 2006-12-06 16:19:18
Message-ID: B2BB633E-3C15-4011-A390-D76112488D00@blighty.com
Lists: pgsql-performance


On Dec 5, 2006, at 8:54 PM, Greg Smith wrote:

> [...]
>
> What Alex is rightly pointing out is that a software RAID approach
> doesn't have this feature. In fact, in this area performance can
> be even worse under SW RAID than what you get from a single disk,
> because you may have to wait for multiple discs to spin to the
> correct position and write data out before you can consider the
> transaction complete.

So... the ideal might be a RAID1 controller with BBU for the WAL and
something else, such as software RAID, for the main data array?

Cheers,
Steve