SSDs with Postgresql?

Lists: pgsql-general
From: Benjamin Smith <lists(at)benjamindsmith(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: SSDs with Postgresql?
Date: 2011-04-14 04:19:55
Message-ID: 201104132119.55264.lists@benjamindsmith.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

The speed benefits of SSDs as benchmarked would seem incredible. Can anybody
comment on SSD benefits and problems in real life use?

I maintain some 100 databases on 3 servers, with 32 GB of RAM each and an
extremely rich, complex schema. (300+ normalized tables)

I was wondering if anybody here could comment on the benefits of SSD in
similar, high-demand rich schema situations?

-Ben

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 06:17:49
Message-ID: 4DA6918D.5030804@hogranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 04/13/11 9:19 PM, Benjamin Smith wrote:
>
> The speed benefits of SSDs as benchmarked would seem incredible. Can
> anybody comment on SSD benefits and problems in real life use?
>
>
> I maintain some 100 databases on 3 servers, with 32 GB of RAM each and
> an extremely rich, complex schema. (300+ normalized tables)
>
>
> I was wondering if anybody here could comment on the benefits of SSD
> in similar, high-demand rich schema situations?
>
>

consumer grade MLC SSD's will crash and burn in short order under a
heavy transactional workload characterized by sustained small block
random writes.

The enterprise grade SLC SSDs' will perform very nicely, but they are
very very expensive, and found in high end enterprise database servers
like Oracle's Exadata machines.


From: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
To: "Benjamin Smith" <lists(at)benjamindsmith(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 08:35:03
Message-ID: 8b80298d9d78e9f3da384cca030949b4.squirrel@zenmail.co.za
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Thu, April 14, 2011 06:19, Benjamin Smith wrote:
> The speed benefits of SSDs as benchmarked would seem incredible. Can anybody
> comment on SSD benefits and problems in real life use?
>
> I maintain some 100 databases on 3 servers, with 32 GB of RAM each and an
> extremely rich, complex schema. (300+ normalized tables)
>
> I was wondering if anybody here could comment on the benefits of SSD in
> similar, high-demand rich schema situations?

Even if you only use SSDs for your indexes, the gains are staggering. We use
them on several servers, one of which is extremely busy (xid wraparound stuff)
and the performance gains are game-changing.

There is no going back. Hint: don't use cheap SSDs - cough up and use Intel.

Cheers
Henry


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
Cc: Benjamin Smith <lists(at)benjamindsmith(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 08:51:22
Message-ID: 4DA6B58A.6040000@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 14/04/2011 4:35 PM, Henry C. wrote:

> There is no going back. Hint: don't use cheap SSDs - cough up and use Intel.

The server-grade SLC stuff with a supercap, I hope, not the scary
consumer-oriented MLC "pray you weren't writing anything during
power-loss" devices?

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: John R Pierce <pierce(at)hogranch(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 08:54:48
Message-ID: 4DA6B658.1060506@hogranch.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 04/14/11 1:35 AM, Henry C. wrote:
> Hint: don't use cheap SSDs - cough up and use Intel.

aren't most of the Intel SSD's still MLC, and still have performance and
reliability issues with sustained small block random writes such as are
generated by database servers? the enterprise grade SLC SSD drives are
things like STEC ZeusIOPS and Seagate Pulsar. and the majority of them
end up in EMC and other big iron SAN systems.


From: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 09:30:06
Message-ID: 786754.16958.qm@web29013.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

have a look at

http://postgresql.1045698.n5.nabble.com/Intel-SSDs-that-may-not-suck-td4268261.html

It looks like those are "safe" to use with a db, and aren't that expensive.


From: Arnaud Lesauvage <arnaud(dot)listes(at)codata(dot)eu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 09:37:39
Message-ID: 4DA6C063.1060904@codata.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Le 14/04/2011 10:54, John R Pierce a écrit :
> On 04/14/11 1:35 AM, Henry C. wrote:
>> Hint: don't use cheap SSDs - cough up and use Intel.
>
> aren't most of the Intel SSD's still MLC, and still have performance and
> reliability issues with sustained small block random writes such as are
> generated by database servers? the enterprise grade SLC SSD drives are
> things like STEC ZeusIOPS and Seagate Pulsar. and the majority of them
> end up in EMC and other big iron SAN systems.

I thnik Henry is referring to Intel's X25-E line. They are SLC,
enterprise grade.
Quite expensive though, ~700 euros for the 64GB version.
We have one of them in a production server (light load), it works very
well so far.
Performance gain versus WD Raptor RAID array is huge. I never tried to
quantify it.

Arnaud


From: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
To: "Craig Ringer" <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 09:40:57
Message-ID: 2d9a4e2e660ba58bd1ea4039aedb2d43.squirrel@zenmail.co.za
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Thu, April 14, 2011 10:51, Craig Ringer wrote:
> On 14/04/2011 4:35 PM, Henry C. wrote:
>
>
>> There is no going back. Hint: don't use cheap SSDs - cough up and use
>> Intel.
>>
>
> The server-grade SLC stuff with a supercap, I hope, not the scary
> consumer-oriented MLC "pray you weren't writing anything during power-loss"
> devices?

That's what a UPS and genset are for. Who writes critical stuff to *any*
drive without power backup?

You have a valid point about using SLC if that's what you need though.
However, MLC works just fine provided you stick them into RAID1. In fact, we
use a bunch of them in RAID0 on top of RAID1.

In our environment (clusters) it's all about using cheap consumer-grade
commodity hardware with lots of redundancy to cater for the inevitable
failures. The trade-off is huge: performance with low cost.

We've been using MLC intel drives since they came out and have never had a
failure. Other SSDs we've tried have failed, and so have hard drives. The
point though, is that there are tremendous performance gains to be had with
commodity h/w if you factor in failure rates and make *sure* you have
redundancy.

h


From: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
To: "Leonardo Francalanci" <m_lists(at)yahoo(dot)it>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 09:46:12
Message-ID: cf3c990934aa74b49fe9077147dc06fb.squirrel@zenmail.co.za
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, April 14, 2011 11:30, Leonardo Francalanci wrote:
> have a look at
>
> http://postgresql.1045698.n5.nabble.com/Intel-SSDs-that-may-not-suck-td426826
> 1.html
>
>
>
> It looks like those are "safe" to use with a db, and aren't that expensive.

The new SSDs look great. From our experience, we trust SSDs (even MLC) far
more than mechanical hard drives.

I believe this perception that SSDs are less "safe" than failure-prone
mechanical hard drives will eventually change.

In the meantime, we've embraced them and the advantages are compelling.

h


From: Arnaud Lesauvage <arnaud(dot)listes(at)codata(dot)eu>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 09:58:43
Message-ID: 4DA6C553.2020702@codata.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Le 14/04/2011 11:40, Henry C. a écrit :
> You have a valid point about using SLC if that's what you need though.
> However, MLC works just fine provided you stick them into RAID1. In fact, we
> use a bunch of them in RAID0 on top of RAID1.

AFAIK, you won't have TRIM support on RAID-arrayed SSDs.
That might change soon, but I think that RAID boards supporting TRIM are
still a work in progress.

Arnaud


From: Radosław Smogura <rsmogura(at)softperience(dot)eu>
To: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
Cc: Leonardo Francalanci <m_lists(at)yahoo(dot)it>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 10:07:24
Message-ID: f4f330b78c98300a55306153ef4fbfd7@mail.softperience.eu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, 14 Apr 2011 11:46:12 +0200, Henry C. wrote:
> On Thu, April 14, 2011 11:30, Leonardo Francalanci wrote:
>> have a look at
>>
>>
>> http://postgresql.1045698.n5.nabble.com/Intel-SSDs-that-may-not-suck-td426826
>> 1.html
>>
>>
>>
>> It looks like those are "safe" to use with a db, and aren't that
>> expensive.
>
> The new SSDs look great. From our experience, we trust SSDs (even
> MLC) far
> more than mechanical hard drives.
>
> I believe this perception that SSDs are less "safe" than
> failure-prone
> mechanical hard drives will eventually change.
>
> In the meantime, we've embraced them and the advantages are
> compelling.
>
> h

One thing you should care about is such called write endurance - number
of writes to one memory region before it will be destroyed - if your SSD
driver do not have transparent allocation, then you may destroy it
really fast, because write of each "block" will be in same memory
segment, clog/xlog may be failed with 10k-100k writes. But if your SSD
has transparent allocation, then internal controller will count your
writes to given memory cell, and when lifetime of this cell will be at
the end, it will "associate" block with different cell. With transparent
allocation, You may sometimes do not fear if system uses journaling, you
store logs there on any kind of often updatable data. You may calculate
life time of your SSD with:
WritesToDestroyCells = "write_endurance" * "disk_size"
AvgLifeTime = WritesToDestroyCells / writes_per_sec

Those are high numbers, even with simply disks as 10.000 * 60GB, means
you need to send 600TB of data to one SSD (not completely true, as you
can't send one byte, but full blocks) . Ofc, In order to extend life
time of SSD you should provide file systems cache, or SSD with cache, as
well turn off FS journaling.

Regards,
Radek


From: Leonardo Francalanci <m_lists(at)yahoo(dot)it>
To: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 10:39:21
Message-ID: 813628.54999.qm@web29019.mail.ird.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

> I believe this perception that SSDs are less "safe" than failure-prone
> mechanical hard drives will eventually change.

By "safe" I mean they won't corrupt data in case of crash of the machine.


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 10:48:46
Message-ID: 4DA6D10E.3020005@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 14/04/2011 5:40 PM, Henry C. wrote:

>> The server-grade SLC stuff with a supercap, I hope, not the scary
>> consumer-oriented MLC "pray you weren't writing anything during power-loss"
>> devices?
>
> That's what a UPS and genset are for. Who writes critical stuff to *any*
> drive without power backup?

Even a server with redundant PSUs on a UPS backed by generators can go
down hard and unexpectedly. I'd be extremely nervous unless I could
afford to lose data since the last backup, or unless I had a really
trustworthy replication setup going.

Of course, it's wise to have one or both of those conditions be true
anyway, because no redundant storage system will save you from file
system corruption caused by an OS bug, data corruption caused by a Pg
bug, or a "DELETE FROM critical_table;" by a careless superuser. So I
guess it doesn't cost you more than the risk of some downtime to use
potentially corruption-prone non-supercap MLC, and it's probably worth
it for the performance in your clustered environment.

All I meant with my post was to raise the concern that the OP needs to
be aware of the untrustworthy nature of even the low-end Intel SSDs.
They're usable, you just have to compensate for their deficiencies.

> You have a valid point about using SLC if that's what you need though.
> However, MLC works just fine provided you stick them into RAID1. In fact, we
> use a bunch of them in RAID0 on top of RAID1.

RAID won't help you if they all drop their caches if the power supply
throws a wobbly. That said, it's certainly good for the lifetime issues
especially if the units are upgraded or rotated out in phases.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Vick Khera <vivek(at)khera(dot)org>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 13:25:06
Message-ID: BANLkTinCSMQvdq5g=-a3wsq7rEFyhRCVtA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Apr 14, 2011 at 12:19 AM, Benjamin Smith
<lists(at)benjamindsmith(dot)com>wrote:

> I was wondering if anybody here could comment on the benefits of SSD in
> similar, high-demand rich schema situations?
>
>
For the last several months, I've been using Texas Memory Systems RamSAN 620
drives on my main DB servers. Having near zero seek times has been a
tremendous boon to our performance, and will have pretty much paid for
themselves within the next couple of months. Ie, the "throw hardware at it"
solution worked really well :)


From: Lincoln Yeoh <lyeoh(at)pop(dot)jaring(dot)my>
To: Radosław Smogura <rsmogura(at)softperience(dot)eu>, "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
Cc: Leonardo Francalanci <m_lists(at)yahoo(dot)it>,<pgsql-general(at)postgresql(dot)org>
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 14:36:51
Message-ID: 20110414145207.41D291337BCF@mail.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

At 06:07 PM 4/14/2011, Radosław Smogura wrote:

>One thing you should care about is such called
>write endurance - number of writes to one memory
>region before it will be destroyed - if your SSD
>driver do not have transparent allocation, then
>you may destroy it really fast, because write of
>each "block" will be in same memory segment,
>clog/xlog may be failed with 10k-100k writes.
>But if your SSD has transparent allocation, then
>internal controller will count your writes to
>given memory cell, and when lifetime of this
>cell will be at the end, it will "associate"
>block with different cell. With transparent
>allocation, You may sometimes do not fear if
>system uses journaling, you store logs there on
>any kind of often updatable data. You may calculate life time of your SSD with:
>WritesToDestroyCells = "write_endurance" * "disk_size"
>AvgLifeTime = WritesToDestroyCells / writes_per_sec
>
>Those are high numbers, even with simply disks
>as 10.000 * 60GB, means you need to send 600TB
>of data to one SSD (not completely true, as you
>can't send one byte, but full blocks) . Ofc, In
>order to extend life time of SSD you should
>provide file systems cache, or SSD with cache, as well turn off FS journaling.

I'm not an expert on SSDs, but I believe modern
SSDs are supposed to automatically spread the
writes across the entire disk where possible -
even to the extent of moving already written stuff.

So if the drives are full or near full, the
tradeoff is between lower performance (because
you have to keep moving stuff about) or lower
lifespan (one area gets overused).

If the drives are mostly empty the SSD's
controller has an easier job - it doesn't have to move as much data around.

Regards,
Link.


From: Benjamin Smith <lists(at)benjamindsmith(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 16:56:12
Message-ID: 201104140956.13059.lists@benjamindsmith.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

After a glowing review at AnandTech (including DB benchmarks!) I decided to
spring for an OCX Vertex 3 Pro 120 for evaluation purposes. It cost about $300
with shipping, etc and at this point, won't be putting any

Considering that I sprang for 96 GB of ECC RAM last spring for around $5000,
even if I put the OCX drives in pairs w/RAID1, I'd still come out well ahead
if it allows me to put off buying more servers for a year or two.

-Ben

On Thursday, April 14, 2011 02:30:06 AM Leonardo Francalanci wrote:
> have a look at
>
> http://postgresql.1045698.n5.nabble.com/Intel-SSDs-that-may-not-suck-td426
> 8261.html
>
>
> It looks like those are "safe" to use with a db, and aren't that expensive.

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 18:27:34
Message-ID: BANLkTimUtTmfSqY+r+2Lty8ErQUsHQLR0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Apr 14, 2011 at 3:40 AM, Henry C. <henka(at)cityweb(dot)co(dot)za> wrote:
>
> On Thu, April 14, 2011 10:51, Craig Ringer wrote:
>> On 14/04/2011 4:35 PM, Henry C. wrote:
>>
>>
>>> There is no going back.  Hint: don't use cheap SSDs - cough up and use
>>> Intel.
>>>
>>
>> The server-grade SLC stuff with a supercap, I hope, not the scary
>> consumer-oriented MLC "pray you weren't writing anything during power-loss"
>> devices?
>
> That's what a UPS and genset are for.  Who writes critical stuff to *any*
> drive without power backup?

Because power supply systems with UPS never fail.

(hint, I've seen them fail, more than once)


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 18:54:29
Message-ID: 20110414185428.GT24471@shinkuro.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Apr 14, 2011 at 12:27:34PM -0600, Scott Marlowe wrote:
> > That's what a UPS and genset are for.  Who writes critical stuff to *any*
> > drive without power backup?
>
> Because power supply systems with UPS never fail.

Right, there's obviously a trade-off here. Some of this has to do
with how much your data is worth vs. how much the speed is worth.
There's also the issue of whether you can stand to lose a few rows,
and whether you can stand to lose them for a short time. For
instance, collecting user comments might be a matter of great value,
but if you write them to more than one system, you might not care
whether one of the systems fails briefly. In that case, maybe big
redundancy of cheap disks with power backup is good enough to meet the
price:value ratio. On stock trades worth maybe millions of dollars,
not so much: you miss your teeny window of opportunity to do a trade
and suddenly you're out in the street wearing a barrel.

I can think of lots of different points to be along that continuum,
and surely nobody is suggesting that there is one right answer for
everything.

A

--
Andrew Sullivan
ajs(at)crankycanuck(dot)ca


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
Cc: Leonardo Francalanci <m_lists(at)yahoo(dot)it>, pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 19:14:11
Message-ID: 4DA74783.4010804@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Henry C. wrote:
> I believe this perception that SSDs are less "safe" than failure-prone
> mechanical hard drives will eventually change.
>

Only because the manufacturers are starting to care about write
durability enough to include the right hardware for it. Many of them
are less safe right now on some common database tasks. Intel's gen 1
and gen 2 drives are garbage for database use. I've had customers lose
terabytes of data due to them. Yes, every system can fail, but these
*will* fail and corrupt your database the first time there's a serious
power problem of some sort. And the idea that a UPS is sufficient to
protect against that even happening in wildly optimistic.

See http://wiki.postgresql.org/wiki/Reliable_Writes for more background
here, and links to reading on the older Intel drives. I summarized the
situation with their newer 320 series drives at
http://blog.2ndquadrant.com/en/2011/04/intel-ssd-now-off-the-sherr-sh.html
Those finally get the write flushing right. But the random seeks IOPS
is wildly lower than you might expect on read/write workloads. My own
tests and other sources have all come up with around 3500 IOPS as being
a real-world expectation for the larger sizes of these drives. Also, it
is cheap flash, so durability in a server environment won't be great.
Don't put your WAL on them if you have a high transaction rate. Put
some indexes there instead.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>, Leonardo Francalanci <m_lists(at)yahoo(dot)it>, pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-14 20:42:24
Message-ID: BANLkTimVu2-KKafntgYnvm8LSH7BXht8QA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Apr 14, 2011 at 1:14 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:

>  And the idea that a UPS is sufficient to protect against that even happening in
> wildly optimistic.

Note that the real danger in relying on a UPS is that most power
conditioning / UPS setups tend to fail in total, not in parts. The
two times I've seen it happen, the whole grid shut down completely for
a few hours. The first time we had Oracle, Ingress, Sybase,
SQL-Server, etc. etc. database server across the company corrupted.
DAYS of recovery time, and since they all failed at once, the machines
in replication got corrupted as well. Restoring production dbs from
backups took days.

The only machine to survive was the corporate intranet running pgsql
on twin 15k SCSI drives with a proven reliable battery backed
controller on it. It was mine. This was a company that lost
something like $10k a minute for downtime. And the downtime was
measured not in seconds, minutes or hours, but days because everyone
had said the same thing, "The UPS and power conditioners make power
plug pull survivability a non issue." When the only machine with an
uncorrupted database is the corporate intranet server the 24/7
production guys look pretty stupid. They also suddenly decided to
start doing power plug pull tests on all database servers.

To make matters worse, the kind of system to NEED the higher
throughput from SSDs is likely the kind of system to be the worst kind
to suffer downtime due to corruption. OTOH, restores from backups
should run pretty fast. :)


From: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
To: "Benjamin Smith" <lists(at)benjamindsmith(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-17 08:55:02
Message-ID: 23515578878a3df874d891f56ccbe423.squirrel@zenmail.co.za
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general


On Thu, April 14, 2011 18:56, Benjamin Smith wrote:
> After a glowing review at AnandTech (including DB benchmarks!) I decided to
> spring for an OCX Vertex 3 Pro 120 for evaluation purposes. It cost about $300
> with shipping, etc and at this point, won't be putting any
>
> Considering that I sprang for 96 GB of ECC RAM last spring for around $5000,
> even if I put the OCX drives in pairs w/RAID1, I'd still come out well ahead if
> it allows me to put off buying more servers for a year or two.

Exactly. Be aware of the risks, plan for failure and reap the rewards.


From: "Henry C(dot)" <henka(at)cityweb(dot)co(dot)za>
To: "Andrew Sullivan" <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-17 09:17:18
Message-ID: 966a942b7d8f2b4daed52b30f7996a4e.squirrel@zenmail.co.za
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, April 14, 2011 20:54, Andrew Sullivan wrote:
> On Thu, Apr 14, 2011 at 12:27:34PM -0600, Scott Marlowe wrote:
>
>>> That's what a UPS and genset are for.  Who writes critical stuff to
>>> *any*
>>> drive without power backup?
>>
>> Because power supply systems with UPS never fail.
>>
>
> Right, there's obviously a trade-off here. Some of this has to do
> with how much your data is worth vs. how much the speed is worth. There's also
> the issue of whether you can stand to lose a few rows, and whether you can
> stand to lose them for a short time. For instance, collecting user comments
> might be a matter of great value, but if you write them to more than one
> system, you might not care whether one of the systems fails briefly. In that
> case, maybe big redundancy of cheap disks with power backup is good enough to
> meet the price:value ratio. On stock trades worth maybe millions of dollars,
> not so much: you miss your teeny window of opportunity to do a trade and
> suddenly you're out in the street wearing a barrel.
>
> I can think of lots of different points to be along that continuum,
> and surely nobody is suggesting that there is one right answer for everything.

Exactly. To address the issue of something going wrong (it will, given enough
time), we recently started trying out Pg's built-in replication (we were using
Londiste - nothing wrong with it, but life is simpler now that the Pg
overlords have realised what everyone has known for a long time).

When the super-fast SSD-based machine fails, switching to a (slower) standard
hard-drive based machine provides continuity and buys time while we get the
primary machine back online.

For us, life is better, thanks to SSDs.

h


From: Vick Khera <vivek(at)khera(dot)org>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: SSDs with Postgresql?
Date: 2011-04-18 18:01:59
Message-ID: BANLkTi=r4SMsHw7uDcZO0YYiboD47pBWRQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sun, Apr 17, 2011 at 5:17 AM, Henry C. <henka(at)cityweb(dot)co(dot)za> wrote:

> When the super-fast SSD-based machine fails, switching to a (slower)
> standard
> hard-drive based machine provides continuity and buys time while we get the
> primary machine back online.
>

The question you should ask yourself is: can my business run at needed
capacity on the slower spinning media? Our answer was "no", so we ended up
with a second SSD array for the backup server as well. It was *still*
totally worth the metric gazillion dollars that they cost.


From: Benjamin Smith <lists(at)benjamindsmith(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-19 17:07:52
Message-ID: 201104191007.52159.lists@benjamindsmith.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Sunday, April 17, 2011 01:55:02 AM Henry C. wrote:
> On Thu, April 14, 2011 18:56, Benjamin Smith wrote:
> > After a glowing review at AnandTech (including DB benchmarks!) I decided
> > to spring for an OCX Vertex 3 Pro 120 for evaluation purposes. It cost
> > about $300
> >
> > with shipping, etc and at this point, won't be putting any
> >
> > Considering that I sprang for 96 GB of ECC RAM last spring for around
> > $5000, even if I put the OCX drives in pairs w/RAID1, I'd still come out
> > well ahead if it allows me to put off buying more servers for a year or
> > two.
>
> Exactly. Be aware of the risks, plan for failure and reap the rewards.

Just curious what your thoughts are with respect to buying SSDs and mirroring
them with software RAID 1. (I use Linux/CentOS)

-Ben

--
This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Benjamin Smith <lists(at)benjamindsmith(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-19 18:28:50
Message-ID: 4DADD462.20607@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 2011-04-19 19:07, Benjamin Smith wrote:
>
> On Sunday, April 17, 2011 01:55:02 AM Henry C. wrote:
>
> >
>
> > Exactly. Be aware of the risks, plan for failure and reap the rewards.
>
>
> Just curious what your thoughts are with respect to buying SSDs and
> mirroring them with software RAID 1. (I use Linux/CentOS)
>

Since SSD fail when the write cycles are gone, it wouldn't make sense to
buy two identical ones and put them in a RAID 1: under normal
circumstances they'd fail simultanously. An idea I'm thinking of is to
put both a OCZ Vertex 2 Pro and Intel 320 in a software RAID 1 setup. It
would have the benefit that there are no complications to be expected
with an extra device layer from a hardware RAID card, such as
incompatibilites between controller and SSD firmware, or not being able
to access the physical disk's smart values, which in the SSD case are
important to figure out the remaining lifetime. Also since both drives
have a supercap, PostgreSQL data on it would survive power failures. It
would be interesting if md could be configured to do reads on both
mirror legs and compare them before returning values, like a continual
check, but there doesn't seem to be a option for something like that.

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data


From: Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-20 05:50:12
Message-ID: 4DAE7414.40503@strategicdata.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 20/04/11 04:28, Yeb Havinga wrote:
> On 2011-04-19 19:07, Benjamin Smith wrote:
>>
>> On Sunday, April 17, 2011 01:55:02 AM Henry C. wrote:
>>
>> >
>>
>> > Exactly. Be aware of the risks, plan for failure and reap the rewards.
>>
>>
>> Just curious what your thoughts are with respect to buying SSDs and
>> mirroring them with software RAID 1. (I use Linux/CentOS)
>>
>
> Since SSD fail when the write cycles are gone, it wouldn't make sense to
> buy two identical ones and put them in a RAID 1: under normal
> circumstances they'd fail simultanously.

I don't think SSD drives wear out in quite the manner you seem to
describe. The wearing out of blocks is not something that occurs at an
exact number; the quoted amounts are more of an average, and will vary
from block to block and from drive to drive.

Also, all decent SSD drives will remap bad blocks as they wear out, so
you don't get just one and then die.

Also, the number of erase cycles you can get, over the whole disk, is
quite large on modern disks!

So large that you'll probably go decades before you wear the disk out,
even with continual writes.

Don't buy into the SSD FUD myths..

Cheers,
Toby


From: Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-20 06:15:19
Message-ID: 4DAE79F7.7010503@strategicdata.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 14/04/11 23:25, Vick Khera wrote:
> On Thu, Apr 14, 2011 at 12:19 AM, Benjamin Smith
> <lists(at)benjamindsmith(dot)com <mailto:lists(at)benjamindsmith(dot)com>> wrote:
>
> I was wondering if anybody here could comment on the benefits of SSD
> in similar, high-demand rich schema situations?
>
>
> For the last several months, I've been using Texas Memory Systems RamSAN
> 620 drives on my main DB servers. Having near zero seek times has been
> a tremendous boon to our performance, and will have pretty much paid for
> themselves within the next couple of months. Ie, the "throw hardware at
> it" solution worked really well :)

hey, I wonder - could you, or someone else with some SSD drives running
their DBs in production - check the SMART attributes for their drives?

In particular, the Media_Wearout_Indicator - this starts at 100 and goes
down towards 1 as the erase cycles add up..

So you can calculate the total estimated lifetime by looking at how much
has been used up over how long you've been using the drive in production.

I have a very cheap 64GB consumer SSD used in a personal server (so not
in serious production use, but it does see some traffic), and I note
that after a year it's still on 100%!

Toby


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-21 14:49:28
Message-ID: 4DB043F8.7070000@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 04/20/2011 01:50 AM, Toby Corkindale wrote:
> Also, the number of erase cycles you can get, over the whole disk, is
> quite large on modern disks!
>
> So large that you'll probably go decades before you wear the disk out,
> even with continual writes.
>
> Don't buy into the SSD FUD myths..

There is no FUD being spread here. Particularly given the PostgreSQL
WAL write pattern, it's not impossible to wear out a SSD placed there in
a small number of years. A system with a trivial but not completely
idle workload will generate one 16MB WAL segment every 5 minutes, which
works out to 4.5GB/day of writes. That's the baseline--the reality is
much, much higher than that on most systems. The fact that every row
update can temporarily use more than 8K means that actual write
throughput on the WAL can be shockingly large. The smallest customer I
work with regularly has a 50GB database, yet they write 20GB of WAL
every day. You can imagine how much WAL is generated daily on systems
with terabyte databases.

As for what this translates into in the real world, go read
http://archives.postgresql.org/message-id/BANLkTi=GsyBfq+ApWPR_qCA7AN+NqT=zww@mail.gmail.com
as one worked out sample. Anyone deploying PostgreSQL onto MLC can't
necessarily ignore this issue.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Florian Weimer <fweimer(at)bfk(dot)de>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-21 15:33:45
Message-ID: 82fwpbaaau.fsf@mid.bfk.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

* Greg Smith:

> The fact that every row update can temporarily use more than 8K means
> that actual write throughput on the WAL can be shockingly large. The
> smallest customer I work with regularly has a 50GB database, yet they
> write 20GB of WAL every day. You can imagine how much WAL is
> generated daily on systems with terabyte databases.

Interesting. Is there an easy way to monitor WAL traffic in away? It
does not have to be finegrained, but it might be helpful to know if
we're doing 10 GB, 100 GB or 1 TB of WAL traffic on a particular
database, should the question of SSDs ever come up.

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: Florian Weimer <fweimer(at)bfk(dot)de>, Greg Smith <greg(at)2ndquadrant(dot)com>
Subject: Re: SSDs with Postgresql?
Date: 2011-04-21 15:42:43
Message-ID: 201104210842.44007.adrian.klaver@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thursday, April 21, 2011 8:33:45 am Florian Weimer wrote:
> * Greg Smith:
> > The fact that every row update can temporarily use more than 8K means
> > that actual write throughput on the WAL can be shockingly large. The
> > smallest customer I work with regularly has a 50GB database, yet they
> > write 20GB of WAL every day. You can imagine how much WAL is
> > generated daily on systems with terabyte databases.
>
> Interesting. Is there an easy way to monitor WAL traffic in away? It
> does not have to be finegrained, but it might be helpful to know if
> we're doing 10 GB, 100 GB or 1 TB of WAL traffic on a particular
> database, should the question of SSDs ever come up.

They are found in $DATA/pg_xlog so checking the size of that directory regularly
would get you the information.

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Florian Weimer <fweimer(at)bfk(dot)de>
To: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Greg Smith <greg(at)2ndquadrant(dot)com>
Subject: Re: SSDs with Postgresql?
Date: 2011-04-21 15:44:04
Message-ID: 82bozza9tn.fsf@mid.bfk.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

* Adrian Klaver:

>> Interesting. Is there an easy way to monitor WAL traffic in away? It
>> does not have to be finegrained, but it might be helpful to know if
>> we're doing 10 GB, 100 GB or 1 TB of WAL traffic on a particular
>> database, should the question of SSDs ever come up.
>
> They are found in $DATA/pg_xlog so checking the size of that
> directory regularly would get you the information.

But log files are recycled, so looking at the directory alone does not
seem particularly helpful.

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: Florian Weimer <fweimer(at)bfk(dot)de>
Cc: "Postgresql (General)" <pgsql-general(at)postgresql(dot)org>
Subject: Re: SSDs with Postgresql?
Date: 2011-04-21 16:01:25
Message-ID: A3A17242-3176-43D1-837B-26E82ECF57E4@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Apr 21, 2011, at 9:44 AM, Florian Weimer wrote:

> But log files are recycled, so looking at the directory alone does not
> seem particularly helpful.

You have to look at the file timestamps. From that you can get an idea of traffic.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: Florian Weimer <fweimer(at)bfk(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-21 16:10:40
Message-ID: 4DB05700.9090201@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 04/21/2011 11:33 AM, Florian Weimer wrote:
> Is there an easy way to monitor WAL traffic in away? It
> does not have to be finegrained, but it might be helpful to know if
> we're doing 10 GB, 100 GB or 1 TB of WAL traffic on a particular
> database, should the question of SSDs ever come up.
>

You can use functions like pg_current_xlog_location() :
http://www.postgresql.org/docs/9.0/interactive/functions-admin.html

Save a copy of this periodically:

select now(),pg_current_xlog_location();

And you can see WAL volume over time given any two points from that set
of samples.

To convert the internal numbers returned by that into bytes, you'll need
to do some math on them. Examples showing how that works and code in a
few languages:

http://archives.postgresql.org/pgsql-general/2010-10/msg00077.php (by hand)
http://munin-monitoring.org/browser/trunk/plugins/node.d/postgres_streaming_.in?rev=3905
(in Perl)
http://archives.postgresql.org/pgsql-general/2010-10/msg00079.php (in C)
http://postgresql.1045698.n5.nabble.com/How-can-we-tell-how-far-behind-the-standby-is-td3252297.html
(in bash with bc(!), other links)

What I keep meaning to write is something that does that as part of the
SQL itself, so it gets pulled out of the database already in bytes.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Florian Weimer <fweimer(at)bfk(dot)de>
Cc: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org, Greg Smith <greg(at)2ndquadrant(dot)com>
Subject: Re: SSDs with Postgresql?
Date: 2011-04-21 16:22:18
Message-ID: 24484.1303402938@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Florian Weimer <fweimer(at)bfk(dot)de> writes:
> * Adrian Klaver:
>>> Interesting. Is there an easy way to monitor WAL traffic in away?

>> They are found in $DATA/pg_xlog so checking the size of that
>> directory regularly would get you the information.

> But log files are recycled, so looking at the directory alone does not
> seem particularly helpful.

"du" would be useless, but you could check the name of the newest WAL
segment file from time to time, and do a bit of math to see how much
WAL had been written since the previous time.

regards, tom lane


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Florian Weimer <fweimer(at)bfk(dot)de>, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org, Greg Smith <greg(at)2ndquadrant(dot)com>
Subject: Re: SSDs with Postgresql?
Date: 2011-04-21 20:08:46
Message-ID: BANLkTinZmaVLUR-vu4qHphPn0p5-h7oHWA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Apr 21, 2011 at 11:22 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Florian Weimer <fweimer(at)bfk(dot)de> writes:
>> * Adrian Klaver:
>>>> Interesting.  Is there an easy way to monitor WAL traffic in away?
>
>>> They are found in $DATA/pg_xlog so checking the size of that
>>> directory regularly would get you the information.
>
>> But log files are recycled, so looking at the directory alone does not
>> seem particularly helpful.
>
> "du" would be useless, but you could check the name of the newest WAL
> segment file from time to time, and do a bit of math to see how much
> WAL had been written since the previous time.

I'd think using sysstat packages sar is the way to see how much work
your drives are doing. Assuming the sysstat package / daemon is set
to monitor disk block activity.


From: Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-26 14:30:41
Message-ID: 1000688582.482.1303828241339.JavaMail.root@dmz03
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

----- Original Message -----
> From: "Greg Smith" <greg(at)2ndQuadrant(dot)com>
> To: pgsql-general(at)postgresql(dot)org
> Sent: Friday, 22 April, 2011 12:49:28 AM
> Subject: Re: [GENERAL] SSDs with Postgresql?
> On 04/20/2011 01:50 AM, Toby Corkindale wrote:
> > Also, the number of erase cycles you can get, over the whole disk,
> > is
> > quite large on modern disks!
> >
> > So large that you'll probably go decades before you wear the disk
> > out,
> > even with continual writes.
> >
> > Don't buy into the SSD FUD myths..
>
> There is no FUD being spread here. Particularly given the PostgreSQL
> WAL write pattern, it's not impossible to wear out a SSD placed there
> in
> a small number of years. A system with a trivial but not completely
> idle workload will generate one 16MB WAL segment every 5 minutes,
> which
> works out to 4.5GB/day of writes. That's the baseline--the reality is
> much, much higher than that on most systems. The fact that every row
> update can temporarily use more than 8K means that actual write
> throughput on the WAL can be shockingly large. The smallest customer I
> work with regularly has a 50GB database, yet they write 20GB of WAL
> every day. You can imagine how much WAL is generated daily on systems
> with terabyte databases.

The larger the database, the larger number of disks the DB will be spread over, thus keeping the amount of data written per-disk to a manageable amount. (Also, larger SSDs seem to have higher write-limits too)

I see Intel is/was claiming their SLC SSDs had a *minimum* lifetime of 2PB in writes for their 64GB disks; for your customer with a 50GB db and 20GB/day of WAL, that would work out at a minimum lifetime of a million days, or about 273 years!
The cheaper "consumer grade" MLC drives should still last minimum 5 years at 20GB/day according to their literature. (And what I found was fairly out of date)
That doesn't seem too bad to me - I don't think I've worked anywhere that keeps their traditional spinning disks in service beyond 5 years either.

> As for what this translates into in the real world, go read
> http://archives.postgresql.org/message-id/BANLkTi=GsyBfq+ApWPR_qCA7AN+NqT=zww@mail.gmail.com
> as one worked out sample. Anyone deploying PostgreSQL onto MLC can't
> necessarily ignore this issue.

Agreed, it shouldn't be ignored, but it seems like you have to have an early-generation SSD and combine that with very heavy write loads in order to risk wearing them out within a matter of years - and there are SMART attributes that should be able to indicate when the drive has used a lot of its life-span as an early-warning signal.
You've linked to one anecdotal failure, but surely there are many more people using SSDs successfully, who just haven't posted about it because it's working fine..?

-Toby


From: Michael Nolan <htfoot(at)gmail(dot)com>
To: Florian Weimer <fweimer(at)bfk(dot)de>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-26 16:26:01
Message-ID: BANLkTikm9G-3XKe5cM9z=PsK9kCPUtt7aA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Apr 21, 2011 at 10:33 AM, Florian Weimer <fweimer(at)bfk(dot)de> wrote:

> * Greg Smith:
>
> > The fact that every row update can temporarily use more than 8K means
> > that actual write throughput on the WAL can be shockingly large. The
> > smallest customer I work with regularly has a 50GB database, yet they
> > write 20GB of WAL every day. You can imagine how much WAL is
> > generated daily on systems with terabyte databases.
>
> Interesting. Is there an easy way to monitor WAL traffic in away? It
> does not have to be finegrained, but it might be helpful to know if
> we're doing 10 GB, 100 GB or 1 TB of WAL traffic on a particular
> database, should the question of SSDs ever come up.
>
> If you archive your WAL files, wouldn't that give you a pretty good
indication of write activity?

For example, yesterday I archived 74 WAL files, each 16MB. That's about 1.2
gigabytes for a database that takes up about 58 GB.
--
Mike Nolan


From: Florian Weimer <fweimer(at)bfk(dot)de>
To: Michael Nolan <htfoot(at)gmail(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-28 13:08:29
Message-ID: 82sjt2a5gy.fsf@mid.bfk.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

* Michael Nolan:

> If you archive your WAL files, wouldn't that give you a pretty good
> indication of write activity?

WAL archiving may increase WAL traffic considerably, I think. Fresh
table contents (after CREATE TABLE or TRUNCATE) is written to the log
if WAL archiving is active. This would have a significant performance
impact on some of our loads.

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99


From: Florian Weimer <fweimer(at)bfk(dot)de>
To: Greg Smith <greg(at)2ndQuadrant(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-28 13:11:14
Message-ID: 82oc3qa5cd.fsf@mid.bfk.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

* Greg Smith:

> To convert the internal numbers returned by that into bytes, you'll
> need to do some math on them. Examples showing how that works and
> code in a few languages:

Thanks for the pointers.

Those examples are slightly incongruent, but I think I've distilled
something that should be reasonably accurate. The numbers look as if
they are valid, they match my expectations for different databases
with different loads.

--
Florian Weimer <fweimer(at)bfk(dot)de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99


From: David Boreham <david_list(at)boreham(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-28 13:21:41
Message-ID: 4DB969E5.9070504@boreham.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

One thing to remember in this discussion about SSD longevity is that the
underlying value of interest is the total number of erase cycles, per
block, on the flash devices. Vendors quote lifetime as a number of
bytes, but this is calculated using an assumed write amplification
factor. That constant varies with the workload, and I suspect that a
database WAL will not produce the value used in the vendor marketing
material. I don't think you can simply say that I am writing so many Gb
WAL files, therefore according to the vendor's spec the device should
have lifetime of X years -- presumably you'd need to know how many
fsync() calls per second were being made, and then how those translate
to flash block erase rate on the far side of the SSD controller firmware.


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: David Boreham <david_list(at)boreham(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-28 14:20:27
Message-ID: 495A28CE-FD74-471A-B8DF-EAD6BA22D034@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Apr 28, 2011, at 7:21 AM, David Boreham wrote:

> I don't think you can simply say that I am writing so many Gb WAL files, therefore according to the vendor's spec

Also, I fully expect the vendors lie about erase cycles as baldly as they lie about MTBF, so I would divide by a very healthy skepticism factor.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: David Boreham <david_list(at)boreham(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-28 14:48:32
Message-ID: 4DB97E40.9010501@boreham.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 4/28/2011 8:20 AM, Scott Ribe wrote:
>> I don't think you can simply say that I am writing so many Gb WAL files, therefore according to the vendor's spec
> Also, I fully expect the vendors lie about erase cycles as baldly as they lie about MTBF, so I would divide by a very healthy skepticism factor.
>

As a former card-carrying semiconductor company employee, I'm not so
sure about this. I'd expect a healthy guard-band on the erase cycles
spec (so if they say 100K, the devices will actually achieve better than
2X). MTBF otoh is a mythical computed value that everyone takes with a
big pinch of salt, in my experience. I'm not sure it is possible to lie
about MTBF since it's essentially just the result of a calculation based
on the number of contacts, cells, traces, etc and the known failure
mechanisms for those things. Therefore it will be "true", but not
necessarily useful (e.g it does not take account of process defects or
an aging mechanism that were not known at the time of manufacture).


From: Scott Ribe <scott_ribe(at)elevated-dev(dot)com>
To: David Boreham <david_list(at)boreham(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-28 16:54:00
Message-ID: A32C6C50-BCDE-4ACD-A5B0-2BCA033AC384@elevated-dev.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Apr 28, 2011, at 8:48 AM, David Boreham wrote:

> As a former card-carrying semiconductor company employee, I'm not so sure about this.

Well, yes, you have a good point that in many, if not all, cases we're dealing with different companies. That really should have occurred to me, that manufacturers of SSDDs (or at least some of them) might not have an ingrained culture of extreme cost cutting and deceptive ratings--I'm going to use "feeling under the weather" as my excuse. (Of course reliability of some early consumer-grade SSDDs was abysmal, but that should be a fairly easy problem to avoid.)

> MTBF otoh is a mythical computed value...

It's not only mythical, it's not even remotely realistic, to the point that it is no exaggeration to call it a bald-faced lie. Sorry, don't remember the university, but there was a nice study of large numbers of disks in data centers, and the result was that actual lifespans were so far from MBTF specs, that the remaining disks would have to just about outlive the universe in order to get the mean near the same order of magnitude as the published numbers.

--
Scott Ribe
scott_ribe(at)elevated-dev(dot)com
http://www.elevated-dev.com/
(303) 722-0567 voice


From: Robert Treat <rob(at)xzilla(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Florian Weimer <fweimer(at)bfk(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-28 19:34:00
Message-ID: BANLkTinBAXmhBUhOnyNBtV0WpAajZhi4Nw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Apr 21, 2011 at 12:10 PM, Greg Smith <greg(at)2ndquadrant(dot)com> wrote:
> On 04/21/2011 11:33 AM, Florian Weimer wrote:
>>
>> Is there an easy way to monitor WAL traffic in away? It
>> does not have to be finegrained, but it might be helpful to know if
>> we're doing 10 GB, 100 GB or 1 TB of WAL traffic on a particular
>> database, should the question of SSDs ever come up.
>>
>
> You can use functions like pg_current_xlog_location() :
> http://www.postgresql.org/docs/9.0/interactive/functions-admin.html
>
> Save a copy of this periodically:
>
> select now(),pg_current_xlog_location();
>
> And you can see WAL volume over time given any two points from that set of
> samples.
>
> To convert the internal numbers returned by that into bytes, you'll need to
> do some math on them.  Examples showing how that works and code in a few
> languages:
>
> http://archives.postgresql.org/pgsql-general/2010-10/msg00077.php (by hand)
> http://munin-monitoring.org/browser/trunk/plugins/node.d/postgres_streaming_.in?rev=3905
> (in Perl)
> http://archives.postgresql.org/pgsql-general/2010-10/msg00079.php (in C)
> http://postgresql.1045698.n5.nabble.com/How-can-we-tell-how-far-behind-the-standby-is-td3252297.html
> (in bash with bc(!), other links)
>
> What I keep meaning to write is something that does that as part of the SQL
> itself, so it gets pulled out of the database already in bytes.
>

We have an open task to work on this same problem. What we had cobbled
together so far was some sql which converted the xlog value into an
integer (it's pretty ugly, but I could send it over if you think it
would help), which we could then stick in a monitoring system and
graph. To get an idea of traffic, I just multiplied this by 16MB. End
result ended up looking like this:
https://circonus.com/shared/graphs/9497d906-4c5b-e6d2-bf91-d8869e7c1668/OnxdZG

Couldn't decide on exactly where to go from there. That's graphing
MB/sec, which does map easily in my mind, since xlogs streams are in
16mb bursts. It would make more sense for wal streaming though (but in
that case we'd probably want to measure it more precisely).

Robert Treat
play: http://xzilla.net
work: http://omniti.com


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Robert Treat <rob(at)xzilla(dot)net>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Florian Weimer <fweimer(at)bfk(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-28 20:25:46
Message-ID: 4DB9CD4A.7050903@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 2011-04-28 21:34, Robert Treat wrote:
>
> We have an open task to work on this same problem. What we had cobbled
> together so far was some sql which converted the xlog value into an
> integer (it's pretty ugly, but I could send it over if you think it
> would help), which we could then stick in a monitoring system and
> graph. To get an idea of traffic, I just multiplied this by 16MB. End
> result ended up looking like this:
> https://circonus.com/shared/graphs/9497d906-4c5b-e6d2-bf91-d8869e7c1668/OnxdZG
>
> Couldn't decide on exactly where to go from there. That's graphing
> MB/sec, which does map easily in my mind, since xlogs streams are in
> 16mb bursts. It would make more sense for wal streaming though (but in
> that case we'd probably want to measure it more precisely).
If the goal is predicting the EOL of the SSD, graphing IO to the
disk/partition, or perhaps graphing the smart values containing write
cycles/GBs written/lifetime curve could work. Both monitoring disk IO
(and iops) as well as smart values can be done with Symon: example
picture with smart attributes graphed at http://i.imgur.com/T4NAq.png -
the actual smart values for a SSD firmware would have to be configured
though, since they vary a lot.

(*) http://www.xs4all.nl/~wpd/symon/

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data


From: Basil Bourque <basil(dot)list(at)me(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-28 22:27:04
Message-ID: 57D29247-23B3-4197-9450-879982D0DCE8@me.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Instead of the usual SSD, you may want to consider the 'ioDrive' products from "Fusion-io".
http://www.fusionio.com/

This company makes enterprise-class storage on a board populated with flash and managed by their own supposedly-sophisticated drivers. The board + drivers are meant to get around the problems of SSDs, such as write-failures. They make both SLC and MLC products, at different sizes, to meet different budgets and purposes.

I tried them about 3 years ago on a database server (not Postgres). The real-world speed was excellent, yet disappointing in that it performed as well as our high-end RAID-10 from HP. We were looking for even more speed, and were promised that, but did not see it in our trials.
Caveats:
• This was using their first attempt at Windows drivers.
• We may not have tuned the settings properly.
In the end, we chose to keep our RAID for the time being.

So, while I can't specifically recommend their products, I certainly suggest considering them. They have drivers for Linux and Windows, but apparently their rumored Mac OS X drivers never came to fruition.

Other benefits beyond speed include size, power, and heat. An internal board with flash saves on all three, compared to a RAID made of either discs or SSDs.

--Basil Bourque


From: "Mark Felder" <feld(at)feld(dot)me>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-28 23:00:50
Message-ID: op.vun7josd34t2sn@tech304
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, 28 Apr 2011 17:27:04 -0500, Basil Bourque <basil(dot)list(at)me(dot)com>
wrote:

> So, while I can't specifically recommend their products, I certainly
> suggest considering them.

Customer of ours is probably lurking on here. We host their servers in our
datacenter -- we had a UPS go "pop" after an amazing surge and their
servers all went down (weren't paying for N+1 power). They had several
FusionIO cards in servers running Postgres and experienced zero
corruption. YMMV.


From: Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-29 04:32:43
Message-ID: 4DBA3F6B.1010309@strategicdata.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 22/04/11 01:33, Florian Weimer wrote:
> * Greg Smith:
>
>> The fact that every row update can temporarily use more than 8K means
>> that actual write throughput on the WAL can be shockingly large. The
>> smallest customer I work with regularly has a 50GB database, yet they
>> write 20GB of WAL every day. You can imagine how much WAL is
>> generated daily on systems with terabyte databases.
>
> Interesting. Is there an easy way to monitor WAL traffic in away? It
> does not have to be finegrained, but it might be helpful to know if
> we're doing 10 GB, 100 GB or 1 TB of WAL traffic on a particular
> database, should the question of SSDs ever come up.

One thought I had on monitoring write usage..

If you're on Linux with the ext4 filesystem, then it keeps track of some
statistics for you.
Check out /sys/fs/ext4/$DEV/lifetime_write_kbytes
(where $DEV is the device the fs is mounted on, eg. sda1, or dm-0, or
whatnot - see /dev/mapper to get mappings from LVMs to dm-numbers)

If you log that value every day, you could get an idea of your daily
write load.

-Toby


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-29 06:35:00
Message-ID: 4DBA5C14.6020105@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 04/26/2011 10:30 AM, Toby Corkindale wrote:
> I see Intel is/was claiming their SLC SSDs had a *minimum* lifetime of
> 2PB in writes for their 64GB disks; for your customer with a 50GB db
> and 20GB/day of WAL, that would work out at a minimum lifetime of a
> million days, or about 273 years!
> The cheaper "consumer grade" MLC drives should still last minimum 5 years at 20GB/day according to their literature. (And what I found was fairly out of date)
> That doesn't seem too bad to me - I don't think I've worked anywhere that keeps their traditional spinning disks in service beyond 5 years either.
>

The comment I made there was that the 20GB/day system was a very small
customer. One busy server, who are also the ones most likely to want
SSD, I just watched recently chug through 16MB of WAL every 3
seconds=450GB/day. Now, you're right that those systems also aren't
running with a tiny amount of flash, either. But the write volume
scales along with the size, too. If you're heavily updating records in
particular, the WAL volume can be huge relative to the drive space
needed to store the result.

As for the idea that I'm just singling out one anecdote, I have
terabytes of lost data on multiple systems behind my negativity here. I
was just pointing out a public failure that included some post-mortem I
liked. I'm not sure if I have any happy customers who were early
adopters of regular SLC or MLC drives really; the disaster rate is very
close to 100% for the first few generations of those drives I've seen,
and I've been around 50-ish of them. I'm hoping the current models
shipping now are better, getting the write cache stuff sorted out better
will be a big help. But it's been a scary technology for database use
so far. The published numbers from the manufacturer literature are a
very rosy best case when you're hitting the disk with this type of workload.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-29 07:23:35
Message-ID: 4DBA6777.7020208@strategicdata.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 29/04/11 16:35, Greg Smith wrote:
> On 04/26/2011 10:30 AM, Toby Corkindale wrote:
>> I see Intel is/was claiming their SLC SSDs had a *minimum* lifetime of
>> 2PB in writes for their 64GB disks; for your customer with a 50GB db
>> and 20GB/day of WAL, that would work out at a minimum lifetime of a
>> million days, or about 273 years!
>> The cheaper "consumer grade" MLC drives should still last minimum 5
>> years at 20GB/day according to their literature. (And what I found was
>> fairly out of date)
>> That doesn't seem too bad to me - I don't think I've worked anywhere
>> that keeps their traditional spinning disks in service beyond 5 years
>> either.
>
>
> The comment I made there was that the 20GB/day system was a very small
> customer. One busy server, who are also the ones most likely to want
> SSD, I just watched recently chug through 16MB of WAL every 3
> seconds=450GB/day. Now, you're right that those systems also aren't
> running with a tiny amount of flash, either. But the write volume scales
> along with the size, too. If you're heavily updating records in
> particular, the WAL volume can be huge relative to the drive space
> needed to store the result.

For that sort of workload, it does sound like SSD isn't ideal.
Although if the Intel figure of 2PB is to be believed, it'd still take
>200 years to wear the drives out (assuming you'll need at least ten 64
GB disks just to store the DB on).
(Reference: http://download.intel.com/design/flash/nand/extreme/319984.pdf )

But yeah, much sooner on the cheaper MLC drives, as I understand it..
seems harder to get info out of Intel on their max write amounts.

> As for the idea that I'm just singling out one anecdote, I have
> terabytes of lost data on multiple systems behind my negativity here. I
> was just pointing out a public failure that included some post-mortem I
> liked. I'm not sure if I have any happy customers who were early
> adopters of regular SLC or MLC drives really; the disaster rate is very
> close to 100% for the first few generations of those drives I've seen,
> and I've been around 50-ish of them. I'm hoping the current models
> shipping now are better, getting the write cache stuff sorted out better
> will be a big help. But it's been a scary technology for database use so
> far. The published numbers from the manufacturer literature are a very
> rosy best case when you're hitting the disk with this type of workload.

Ah, thanks - it's interesting to hear more of your experiences there.

I do note that more recent SSD drives have made many improvements for
durability - now some of them are doing a sort of internal-RAID over
their NAND chips, so that if/when bits of them die, you don't lose your
data, and can keep operating.
(Reference: http://www.anandtech.com/show/4244/intel-ssd-320-review/2 )

It sounds like the technology is more mature now, but I guess we won't
know until more people have been using it, successfully, for a while..

-Toby


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Toby Corkindale <toby(dot)corkindale(at)strategicdata(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-29 10:42:08
Message-ID: BANLkTimdxwVvpMXhsuMpGOXU=_Yb62JVWQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Fri, Apr 29, 2011 at 1:23 AM, Toby Corkindale
<toby(dot)corkindale(at)strategicdata(dot)com(dot)au> wrote:
> On 29/04/11 16:35, Greg Smith wrote:
>>
>> On 04/26/2011 10:30 AM, Toby Corkindale wrote:
>>>
>>> I see Intel is/was claiming their SLC SSDs had a *minimum* lifetime of
>>> 2PB in writes for their 64GB disks; for your customer with a 50GB db
>>> and 20GB/day of WAL, that would work out at a minimum lifetime of a
>>> million days, or about 273 years!
>>> The cheaper "consumer grade" MLC drives should still last minimum 5
>>> years at 20GB/day according to their literature. (And what I found was
>>> fairly out of date)
>>> That doesn't seem too bad to me - I don't think I've worked anywhere
>>> that keeps their traditional spinning disks in service beyond 5 years
>>> either.
>>
>>
>> The comment I made there was that the 20GB/day system was a very small
>> customer. One busy server, who are also the ones most likely to want
>> SSD, I just watched recently chug through 16MB of WAL every 3
>> seconds=450GB/day. Now, you're right that those systems also aren't
>> running with a tiny amount of flash, either. But the write volume scales
>> along with the size, too. If you're heavily updating records in
>> particular, the WAL volume can be huge relative to the drive space
>> needed to store the result.
>
> For that sort of workload, it does sound like SSD isn't ideal.
> Although if the Intel figure of 2PB is to be believed, it'd still take >200
> years to wear the drives out (assuming you'll need at least ten 64 GB disks
> just to store the DB on).
> (Reference: http://download.intel.com/design/flash/nand/extreme/319984.pdf )

I think you misunderstood. He's not storing 480GB on the drives,
that's how much WAL is moving across it. It could easily be a single
80GB SSD drive or something like that.


From: Robert Treat <rob(at)xzilla(dot)net>
To: Mark Felder <feld(at)feld(dot)me>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-29 21:35:57
Message-ID: BANLkTimTwm+J5dE+gJF68tPeJ_zj39-a-w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Thu, Apr 28, 2011 at 7:00 PM, Mark Felder <feld(at)feld(dot)me> wrote:
> On Thu, 28 Apr 2011 17:27:04 -0500, Basil Bourque <basil(dot)list(at)me(dot)com> wrote:
>
>> So, while I can't specifically recommend their products, I certainly
>> suggest considering them.
>
> Customer of ours is probably lurking on here. We host their servers in our
> datacenter -- we had a UPS go "pop" after an amazing surge and their servers
> all went down (weren't paying for N+1 power). They had several FusionIO
> cards in servers running Postgres and experienced zero corruption. YMMV.
>

Yeah, we're running fusion-io on some pretty heavily traffic'd
servers, and the performance has been good, and durability there when
needed. It's certainly worth checking out for those investigating
these options.

Robert Treat
conjecture: xzilla.net
consulting: omniti.com


From: Greg Smith <greg(at)2ndQuadrant(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: SSDs with Postgresql?
Date: 2011-04-30 04:59:46
Message-ID: 4DBB9742.6070508@2ndQuadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 04/29/2011 06:42 AM, Scott Marlowe wrote:
> I think you misunderstood. He's not storing 480GB on the drives,
> that's how much WAL is moving across it. It could easily be a single
> 80GB SSD drive or something like that.
>

Right; that's why you don't necessarily get saved by the fact that
larger databases must go onto more flash cells, too. Sometimes, yes,
but not always. The WAL is really close to a worst-case for flash:
lots of redundant information that's constantly overwritten. It's the
last thing you want to consider putting onto SSD. There's a good reason
why so many of the "enterprise" SSDs try to distinguish themselves with
redundancy and wear leveling advancements; it's so this sort of workload
doesn't kill them.

Combine that workload possibility with the limitations of MLC flash, and
you can see why the lifetimes actually are a serious concern in some
situations. Not all of them, of course, but this is why I recommend
things like directly measuring your WAL volume.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books