Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

Lists: pgsql-performance
From: <gnuoytr(at)rcn(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-12 00:53:56
Message-ID: 20100811205356.AHB77050@ms14.lnh.mail.rcn.net

A number of amusing aspects to this discussion.

- I've carried out similar tests using the Intel X-25M with both PG and DB2 (both on Linux). On DB2 it is a simple matter to build parallel databases on HDD and SSD within a single engine instance, with buffers, tablespaces, logging, and so on set up to recreate as many scenarios as one wishes; not so for PG. While PG is the "best" open source database, from a tuning and admin point of view there's rather a long way to go. No one should think that retail SSD should be used to support an enterprise database. People have been lulled into thinking otherwise by the blurring of the two use cases in the HDD world, where the difference is generally just QA.

- All flash SSD munge the byte stream, some (SandForce-controlled in particular) more than others. Industrial strength flash SSD can have 64 internal channels, written in parallel; they don't run on commodity controllers. Treating SSD as just a faster HDD is a trip on the road to perdition. Industrial strength (DRAM) SSDs have been used by serious database folks for a couple of decades, but not by the storefront semi-professionals who pervade the web start-up world.

- The value of SSD in the database world is not as A Faster HDD(tm). Never was, despite the naïve who assert otherwise. The value of SSD is to enable BCNF datastores. Period. If you're not going to do that, don't bother. Silicon storage will never reach equivalent volumetric density, ever. SSD will never be useful in the byte bloat world of XML and other flat file datastores (resident in databases or not). Industrial strength SSD will always be more expensive/GB, and likely by a lot. (Re)factoring to high normalization strips out an order of magnitude of byte bloat, increases native data integrity by as much, reduces much of the redundant code, and puts the ACID where it belongs. All good things, but not effortless.
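The byte-bloat point in the last item can be made concrete with a rough sketch. Everything in it is an assumption for illustration: the sample rows, the 4-byte surrogate key, and the use of character counts as a stand-in for stored bytes. It ignores per-row and per-page overhead in a real engine, so it only shows the direction of the effect, not the order-of-magnitude figure claimed above.

```python
# Hypothetical comparison of a flat (repeating) layout with a normalized one.
# Character counts stand in for stored bytes; real engines add row overhead.

KEY_BYTES = 4  # assumed size of an integer surrogate key


def flat_size(orders):
    """Size when every order row repeats the full customer details."""
    return sum(len(name) + len(addr) + len(item) for name, addr, item in orders)


def normalized_size(orders):
    """Size when customer details are stored once and referenced by key."""
    customers = {(name, addr) for name, addr, _ in orders}
    customer_table = sum(KEY_BYTES + len(name) + len(addr) for name, addr in customers)
    order_table = sum(KEY_BYTES + len(item) for _, _, item in orders)
    return customer_table + order_table


# Made-up workload: one customer placing 1000 identical orders.
orders = [("Acme Industrial Supply", "100 Main Street, Springfield", "widget")] * 1000

flat = flat_size(orders)
norm = normalized_size(orders)
print(f"flat: {flat} bytes, normalized: {norm} bytes, ratio: {flat / norm:.1f}x")
```

The saving depends entirely on how often the repeated values recur; with little repetition the extra keys can make the normalized layout larger.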

You're arguing about the wrong problem. Sufficiently bulletproof flash SSD exist and have for years, but their names are not well known (no one on this thread has named any). Neither the Intel parts nor any of their retail cousins have any place in the mix except in development machines. Real SSD have MTBFs measured in decades; OEMs have qualified such parts, but you won't find them on the shelf at Best Buy. You need to concentrate on understanding what can be done with such drives that can't be done with vanilla HDD that cost 1/50 as much. Just being faster won't be the answer. Removing the difference between sequential file processing and true random access is what makes SSD worth the bother; it makes true relational datastores second nature rather than rocket science.

Robert


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: gnuoytr(at)rcn(dot)com
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-12 04:49:06
Message-ID: 4C637D42.5040208@2ndquadrant.com

gnuoytr(at)rcn(dot)com wrote:
> Sufficiently bulletproof flash SSD exist and have for years, but their names are not well known (no one on this thread has named any)

The models perceived as bulletproof are the really dangerous ones to
deploy. First, people let their guard down and stop being as paranoid
as they should be when they use them. Second, it becomes much more
difficult for them to justify buying more than one of the uber-SSD.
That combination makes it easier to go back to having a single copy of
their data, and that's a really bad road to wander down.

The whole idea that kicked off this thread was to enable building
systems cheap enough to allow making more inexpensive copies of the
data. My systems at home for example follow this model to some degree.
There's not a single drive more expensive than $100 to be found here,
but everything important to me is sitting on four of them in two systems
within seconds after I save it. However, even here I've found it worth
dropping enough money for a real battery-backed write cache, to reduce
the odds of write corruption on the more important of the servers. Not
doing so would be a dangerously cheap decision. That's similar to how I
feel about SSDs right now too. You need them to be expensive enough
that corruption is unusual rather than expected after a crash--it's
ridiculous to not spend enough to get something that's not completely
broken by design--while not spending so much that you can't afford to
deploy many of them.
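The trade-off between one premium drive and several cheap copies can be sketched with simple probability. This is a back-of-the-envelope illustration only: the failure probabilities are invented, and it assumes failures are independent, which is often not true in practice (drives from the same batch, a shared power event, a common firmware bug).

```python
# Hypothetical sketch: chance of losing every copy of the data.
# Assumes independent failures; all probabilities are made-up examples.


def loss_probability(per_drive_failure: float, copies: int) -> float:
    """Probability that all copies fail, given independent failures."""
    return per_drive_failure ** copies


# One "bulletproof" drive with an assumed 1% chance of loss.
single_premium = loss_probability(0.01, 1)

# Four cheap drives, each with an assumed 5% chance of loss.
four_cheap = loss_probability(0.05, 4)

print(f"single premium drive: {single_premium:.6f}")
print(f"four cheap copies:    {four_cheap:.6f}")
```

Under these assumptions the four cheap copies come out well ahead, but the result is only as good as the independence assumption, which is why a device that corrupts data on every crash is still a poor building block.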

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-12 07:22:19
Message-ID: 4C63A12B.7050607@tweakers.net

On 12-8-2010 2:53 gnuoytr(at)rcn(dot)com wrote:
> - The value of SSD in the database world is not as A Faster HDD(tm).
> Never was, despite the naïve who assert otherwise. The value of SSD
> is to enable BCNF datastores. Period. If you're not going to do
> that, don't bother. Silicon storage will never reach equivalent
> volumetric density, ever. SSD will never be useful in the byte bloat
> world of XML and other flat file datastores (resident in databases or
> not). Industrial strength SSD will always be more expensive/GB, and
> likely by a lot. (Re)factoring to high normalization strips out an
> order of magnitude of byte bloat, increases native data integrity by
> as much, reduces much of the redundant code, and puts the ACID where
> it belongs. All good things, but not effortless.

It is actually quite common to under-utilize (short stroke) hard drives
in the enterprise world, simply because 'they' need more IOps per amount
of data than a completely utilized disk can offer.
As such the expense/GB can be much higher than simply dividing the
price by the capacity (and if you're looking at fiber channel disks,
that price is quite high already). And then it is relatively easy to
find enterprise SSDs with better pricing for the whole system as soon
as the IOps are more important than the capacity.

So in the current market, you may already be better off, price-wise,
with (expensive) SSD if you need IOps rather than huge amounts of
storage. And while in both cases you're not comparing separate disks to
SSD, you are replacing a 'disk based storage system' with a '(flash)
memory based storage system', and it basically becomes 'A Faster HDD' ;)
But you're right that for data-heavy applications, completely replacing
HDDs with some form of SSD is not going to happen soon, maybe never.
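The short-stroking argument can be worked through as arithmetic. The sketch below is illustrative only: the workload, per-drive IOps, capacities, and prices are all invented numbers, not measurements or quotes for any real product. It sizes a system by whichever constraint (IOps or capacity) needs more drives, then divides the total price by the data actually stored.

```python
import math

# Hypothetical sizing sketch; every figure below is an assumption.


def drives_needed(required_iops, required_gb, iops_per_drive, gb_per_drive):
    """Drive count is set by the tighter of the IOps and capacity limits."""
    by_iops = math.ceil(required_iops / iops_per_drive)
    by_capacity = math.ceil(required_gb / gb_per_drive)
    return max(by_iops, by_capacity)


def cost_per_used_gb(required_iops, required_gb,
                     iops_per_drive, gb_per_drive, price_per_drive):
    """Total hardware price divided by the data actually stored."""
    n = drives_needed(required_iops, required_gb, iops_per_drive, gb_per_drive)
    return n * price_per_drive / required_gb


# Made-up workload: 20,000 IOps over 500 GB of data.
hdd = cost_per_used_gb(20_000, 500, iops_per_drive=200,
                       gb_per_drive=300, price_per_drive=300)
ssd = cost_per_used_gb(20_000, 500, iops_per_drive=10_000,
                       gb_per_drive=100, price_per_drive=1_500)

print(f"HDD array: ${hdd:.2f} per used GB")
print(f"SSD array: ${ssd:.2f} per used GB")
```

With these assumed figures the HDD array is sized by IOps and leaves most of its capacity idle, while the SSD array is sized by capacity. Swap in a capacity-heavy workload and the comparison reverses.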

Best regards,

Arjen


From: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-12 12:35:14
Message-ID: 4C63EA82.7080505@ca.afilias.info

On 10-08-12 03:22 AM, Arjen van der Meijden wrote:
> On 12-8-2010 2:53 gnuoytr(at)rcn(dot)com wrote:
>> - The value of SSD in the database world is not as A Faster HDD(tm).
>> Never was, despite the naïve who assert otherwise. The value of SSD
>> is to enable BCNF datastores. Period. If you're not going to do
>> that, don't bother. Silicon storage will never reach equivalent
>> volumetric density, ever. SSD will never be useful in the byte bloat
>> world of XML and other flat file datastores (resident in databases or
>> not). Industrial strength SSD will always be more expensive/GB, and
>> likely by a lot. (Re)factoring to high normalization strips out an
>> order of magnitude of byte bloat, increases native data integrity by
>> as much, reduces much of the redundant code, and puts the ACID where
>> it belongs. All good things, but not effortless.
>
> It is actually quite common to under-utilize (short stroke) hard
> drives in the enterprise world, simply because 'they' need more IOps
> per amount of data than a completely utilized disk can offer.
> As such the expense/GB can be much higher than simply dividing the
> price by the capacity (and if you're looking at fiber channel disks,
> that price is quite high already). And then it is relatively easy to
> find enterprise SSDs with better pricing for the whole system as soon
> as the IOps are more important than the capacity.

And when you compare the ongoing operational costs of rack space,
power, and cooling for big arrays full of spinning disks to those of
flash-based solutions, the price comparison evens itself out even more.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


From: david(at)lang(dot)hm
To: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-18 06:37:26
Message-ID: alpine.DEB.2.00.1008172333040.21463@asgard.lang.hm

On Thu, 12 Aug 2010, Brad Nicholson wrote:

> On 10-08-12 03:22 AM, Arjen van der Meijden wrote:
>> On 12-8-2010 2:53 gnuoytr(at)rcn(dot)com wrote:
>>> - The value of SSD in the database world is not as A Faster HDD(tm).
>>> Never was, despite the naïve who assert otherwise. The value of SSD
>>> is to enable BCNF datastores. Period. If you're not going to do
>>> that, don't bother. Silicon storage will never reach equivalent
>>> volumetric density, ever. SSD will never be useful in the byte bloat
>>> world of XML and other flat file datastores (resident in databases or
>>> not). Industrial strength SSD will always be more expensive/GB, and
>>> likely by a lot. (Re)factoring to high normalization strips out an
>>> order of magnitude of byte bloat, increases native data integrity by
>>> as much, reduces much of the redundant code, and puts the ACID where
>>> it belongs. All good things, but not effortless.
>>
>> It is actually quite common to under-utilize (short stroke) hard drives in
>> the enterprise world, simply because 'they' need more IOps per amount of
>> data than a completely utilized disk can offer.
>> As such the expense/GB can be much higher than simply dividing the price
>> by the capacity (and if you're looking at fiber channel disks, that price is
>> quite high already). And then it is relatively easy to find enterprise
>> SSDs with better pricing for the whole system as soon as the IOps are more
>> important than the capacity.
>
> And when you compare the ongoing operational costs of rack space, power,
> and cooling for big arrays full of spinning disks to those of flash-based
> solutions, the price comparison evens itself out even more.

Check your SSD specs; some of the high-performance ones draw quite a bit
of power.

David Lang


From: <gnuoytr(at)rcn(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-18 11:49:19
Message-ID: 20100818074919.AHR05405@ms14.lnh.mail.rcn.net

If you can cite a specific device that draws more than 10% of the power of an equivalently performing (e.g., short-stroked) HDD array, I would be very interested. There may be a DRAM SSD that draws more than a flash SSD, but I'd be really surprised to find a flash SSD that draws as much as any HDD, even at gross capacity.
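The 10% threshold can be checked against a spec sheet with a few lines of arithmetic. The sketch below shows only the shape of that calculation: the IOps and wattage figures are invented placeholders, not specifications of any real drive, and should be replaced with numbers from the datasheets being compared.

```python
import math

# Hypothetical power comparison at equal delivered IOps.
# All per-drive figures are placeholders, not real device specifications.


def array_watts(required_iops, iops_per_drive, watts_per_drive):
    """Power drawn by the smallest array that meets the IOps target."""
    drives = math.ceil(required_iops / iops_per_drive)
    return drives * watts_per_drive


TARGET_IOPS = 20_000  # assumed workload

hdd_watts = array_watts(TARGET_IOPS, iops_per_drive=200, watts_per_drive=10)
ssd_watts = array_watts(TARGET_IOPS, iops_per_drive=10_000, watts_per_drive=25)

ratio = ssd_watts / hdd_watts
print(f"HDD array: {hdd_watts} W, SSD array: {ssd_watts} W, ratio: {ratio:.0%}")
```

Even with a per-device draw well above a single HDD, the SSD side stays under the threshold in this example because far fewer devices are needed; a device with much lower IOps or much higher wattage would change that.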

Robert
