Re: Recommendations for SSDs in production?

From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: David Boreham <david_list(at)boreham(dot)org>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Recommendations for SSDs in production?
Date: 2011-11-24 13:20:59
Message-ID: 4ECE44BB.9060102@gmail.com
Lists: pgsql-general

On 2011-11-04 16:24, David Boreham wrote:
> On 11/4/2011 8:26 AM, Yeb Havinga wrote:
>>
>> First, if you're interested in doing a test like this yourself, I'm
>> testing on ubuntu 11.10, but even though this is a brand new
>> distribution, the smart database was a few months old.
>> 'update-smart-drivedb' had as effect that the names of the values
>> turned into something useful: instead of #LBA's written, it now shows
>> #32MiB's written. Also there are now three 'workload' related
>> parameters.
>>
> I submitted the patch for these to smartmontools a few weeks ago and
> it is now in the current db but not yet in any of the distro update
> packages. I probably forgot to mention in my post here that you need
> the latest db for the 710. Also, if you pull the trunk source code and
> build it yourself it has the ability to decode the drive stats log
> data (example pasted below). I haven't yet found a use for this
> myself, but it does seem to have a little more information than the
> SMART attributes. (Thanks to Christian Franke of the smartmontools
> project for implementing this feature)
>
> Your figures from the workload wear roughly match mine. In production
> we don't expect to subject the drives to anything close to 100% of the
> pgbench workload (probably around 1/10 of that on average), so the
> predicted wear life of the drive is 10+ years in our estimates, under
> production loads.
>
> The big question of course is can the drive's wearout estimate be
> trusted? A little more information from Intel about how it is
> calculated would help allay concerns in this area.

TLDR: some numbers after three weeks of media wear testing on a software
mirror with an Intel 710 and an OCZ Vertex 2 Pro.

For the last couple of weeks I've been running pgbench for an hour, then
sleeping for 10 minutes, in an infinite loop, just to see how the values
would grow.
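
A minimal sketch of such a run-and-rest loop, for anyone who wants to repeat the test. The database name "bench" is hypothetical, and the actual pgbench options used for this test (scale, client count) are not stated here, so only the duration flag -T is set.

```python
import subprocess
import time


def pgbench_cycle(db="bench", run_seconds=3600, rest_seconds=600,
                  run=subprocess.run, sleep=time.sleep):
    """Run pgbench once for run_seconds, then rest for rest_seconds.

    db="bench" is a hypothetical database name; it must already have been
    initialised with `pgbench -i`. Returns the command that was run.
    """
    cmd = ["pgbench", "-T", str(run_seconds), db]
    run(cmd, check=False)  # keep looping even if one run fails
    sleep(rest_seconds)
    return cmd


def run_forever():
    """One hour of pgbench, ten minutes of rest, repeated until interrupted."""
    while True:
        pgbench_cycle()
```

Calling run_forever() starts the loop; the run and sleep parameters are only there so the cycle can be exercised without a database.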

This is the Intel 710 mirror leg:

225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       3020093
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2803
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       0
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       21444
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   098   098   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       3020093
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       22259

Note: the raw value of 226 (E2) is 2803. According to
http://www.tomshardware.com/reviews/ssd-710-enterprise-x25-e,3038-4.html
you have to divide it by 1024 to get a percentage. That gives about 2.7%,
which is consistent with the normalized (not raw) value of 098 at 233 (E9).
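
The arithmetic can be written out as a small sketch. The divide-by-1024 rule comes from the article linked above; the linear extrapolation to 100% wear is my own assumption, not something Intel documents, so treat the projected figure as a rough estimate only.

```python
# Raw SMART values copied from the Intel 710 listing above.
WEAR_RAW_226 = 2803           # Workld_Media_Wear_Indic
WORKLOAD_MINUTES_228 = 21444  # Workload_Minutes

MINUTES_PER_YEAR = 60 * 24 * 365


def wear_percent(raw_226):
    """Raw attribute 226 divided by 1024, per the linked article."""
    return raw_226 / 1024


def years_to_full_wear(raw_226, workload_minutes):
    """Assumes wear keeps growing linearly with workload time."""
    total_minutes = workload_minutes * 100 / wear_percent(raw_226)
    return total_minutes / MINUTES_PER_YEAR


pct = wear_percent(WEAR_RAW_226)
years = years_to_full_wear(WEAR_RAW_226, WORKLOAD_MINUTES_228)
print(f"wear so far: {pct:.2f}%")
print(f"projected time to 100% wear at this load: {years:.2f} years")
```

With the values above this gives about 2.74% wear and roughly 1.5 years of continuous pgbench load. At one tenth of that load, the production figure quoted above, the same projection gives roughly 15 years, which is in line with the 10+ year estimate.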

This is the OCZ Vertex 2 Pro mirror leg:

  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
100 Gigabytes_Erased        0x0032   000   000   000    Old_age   Always       -       21120
170 Reserve_Block_Count     0x0032   000   000   000    Old_age   Always       -       34688
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       3
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
232 Available_Reservd_Space 0x0000   000   000   000    Old_age   Offline      -       33
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       21184
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       94656
235 SuperCap_Health         0x0033   100   100   002    Pre-fail  Always       -       0
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       94656
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       960

Here the 177 (B1) Wear_Range_Delta has a raw value of 3. This isn't SSD
life left, but the delta between the most-worn and least-worn flash blocks.
I really wonder at which point SSD_Life_Left will change to 99 on this
drive.
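
As a sanity check on the units, the host writes reported by the two legs can be compared. This is only a sketch: it assumes that attribute 241 counts 32 MiB units on the Intel and GiB on the OCZ, as the smartmontools names suggest, and that neither drive saw significant writes before it joined the mirror.

```python
MIB_PER_GIB = 1024

# Raw values of attribute 241 from the two listings above.
intel_host_writes_32mib = 3020093  # Host_Writes_32MiB, Intel 710
ocz_lifetime_writes_gib = 94656    # Lifetime_Writes_GiB, OCZ Vertex 2 Pro

# Convert the Intel counter from 32 MiB units to GiB.
intel_writes_gib = intel_host_writes_32mib * 32 / MIB_PER_GIB

ratio = intel_writes_gib / ocz_lifetime_writes_gib
print(f"Intel 710:        {intel_writes_gib:.0f} GiB written")
print(f"OCZ Vertex 2 Pro: {ocz_lifetime_writes_gib} GiB written")
print(f"ratio: {ratio:.3f}")
```

The two counters agree to within about half a percent, which is what you would expect from two legs of the same mirror.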

regards,
Yeb Havinga
