Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD

From: Michael March <mmarch(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-07 23:47:55
Message-ID: AANLkTikXLYc32Jyr5Fb8VeVNt6R86mCcF+0CkTHALd8W@mail.gmail.com
Lists: pgsql-performance

If anyone is interested I just completed a series of benchmarks of stock
Postgresql running on a normal HDD vs a SSD.

If you don't want to read the post, the summary is that SSDs are 5 to 7
times faster than a 7200RPM HDD drive under a pgbench load.

http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html

Is this what everyone else is seeing?

Thanks!

--
mmarch(at)gmail(dot)com


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Michael March <mmarch(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-08 06:36:25
Message-ID: BAE01AAD-6C69-48C7-BD2E-77E463D49903@richrelevance.com
Lists: pgsql-performance

SSD's actually vary quite a bit with typical postgres benchmark workloads. Many of them also do not guarantee data that has been sync'd will not be lost if power fails (most hard drives with a sane OS and file system do).

On Aug 7, 2010, at 4:47 PM, Michael March wrote:

If anyone is interested I just completed a series of benchmarks of stock Postgresql running on a normal HDD vs a SSD.

If you don't want to read the post, the summary is that SSDs are 5 to 7 times faster than a 7200RPM HDD drive under a pgbench load.

http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html

Is this what everyone else is seeing?

Thanks!

--
mmarch(at)gmail(dot)com<mailto:mmarch(at)gmail(dot)com>


From: Michael March <mmarch(at)gmail(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-08 06:49:38
Message-ID: AANLkTi=km+H+mZ=yyx0_HTFZ-yU2sh85D+p5cs+ZcUSF@mail.gmail.com
Lists: pgsql-performance

> SSD's actually vary quite a bit with typical postgres benchmark workloads.
>

You mean various SSDs from different vendors? Or are you saying the same SSD
model might vary in performance from drive to drive?

> Many of them also do not guarantee data that has been sync'd will not be
> lost if power fails (most hard drives with a sane OS and file system do).
>

What feature does an SSD need to have to ensure that sync'd data is indeed
written to the SSD in the case of power loss?

>
>
> On Aug 7, 2010, at 4:47 PM, Michael March wrote:
>
> If anyone is interested I just completed a series of benchmarks of stock
> Postgresql running on a normal HDD vs a SSD.
>
> If you don't want to read the post, the summary is that SSDs are 5 to 7
> times faster than a 7200RPM HDD drive under a pgbench load.
>
>
> http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html
>
> Is this what everyone else is seeing?
>
> Thanks!
>
>


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Michael March <mmarch(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-08 06:55:09
Message-ID: AANLkTin=s7md9j6zoWRaH2VtoH3j4Q+tzj1w53C615Nm@mail.gmail.com
Lists: pgsql-performance

On Sat, Aug 7, 2010 at 5:47 PM, Michael March <mmarch(at)gmail(dot)com> wrote:
> If anyone is interested I just completed a series of benchmarks of stock
> Postgresql running on a normal HDD vs a SSD.
> If you don't want to read the post, the summary is that SSDs are 5 to 7
> times faster than a 7200RPM HDD drive under a pgbench load.
> http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html
>
> Is this what everyone else is seeing?
> Thanks!

It's a good first swing, but I'd be interested in seeing how it runs
with various numbers of clients, and how it runs as the number of
clients goes past optimal. I.e. a nice smooth downward trend or a
horrible drop-off for whichever drives.

--
To understand recursion, one must first understand recursion.


From: Michael March <mmarch(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-08 07:03:32
Message-ID: AANLkTim0QD7+E+3CO0K+nGuUq44N46yk=fTs-H6XrqWY@mail.gmail.com
Lists: pgsql-performance

>
> On Sat, Aug 7, 2010 at 5:47 PM, Michael March <mmarch(at)gmail(dot)com> wrote:
> > If anyone is interested I just completed a series of benchmarks of stock
> > Postgresql running on a normal HDD vs a SSD.
> > If you don't want to read the post, the summary is that SSDs are 5 to 7
> > times faster than a 7200RPM HDD drive under a pgbench load.
> >
> http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html
> >
> > Is this what everyone else is seeing?
> > Thanks!
>
> It's a good first swing, but I'd be interested in seeing how it runs
> with various numbers of clients, and how it runs as the number of
> clients goes past optimal. I.e. a nice smooth downward trend or a
> horrible drop-off for whichever drives.
>
>
Yeah. I was thinking the same thing..

I need to automate the tests so I can incrementally increase the scaling of
the seed tables and the number of simultaneous clients over time. Put
another way, I need to do A LOT more tests that will gently increment all the
testable factors one small step at a time.
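
Roughly, the sketch I have in mind looks like this (database name, scale
steps and client counts are placeholders, not the final test matrix):

createdb pgbench_test
for scale in 10 100 500; do
    # rebuild the seed tables at the next scaling factor
    pgbench -i -s $scale pgbench_test
    for clients in 1 2 4 8 16 32 64; do
        # timed 5 minute run per client count, appended to a results log
        pgbench -c $clients -T 300 pgbench_test >> results-$scale.txt
    done
done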


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Michael March <mmarch(at)gmail(dot)com>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-08 07:30:27
Message-ID: AANLkTi=vPV4mfDFKUmYNK7avf8owS6RfKBt43_ypVBDk@mail.gmail.com
Lists: pgsql-performance

On Sun, Aug 8, 2010 at 12:49 AM, Michael March <mmarch(at)gmail(dot)com> wrote:
>
>> SSD's actually vary quite a bit with typical postgres benchmark workloads.
>
> You mean various SSDs from different vendors? Or are you saying the same SSD
> model might vary in performance from drive to drive?
>
>>
>>  Many of them also do not guarantee data that has been sync'd will not be
>> lost if power fails (most hard drives with a sane OS and file system do).
>
> What feature does an SSD need to have to ensure that sync'd data is indeed
> written to the SSD in the case of power loss?

A big freaking capacitor and the ability to notice power's been cut
and start writing out the cache. There are a few that have it that
are coming out right about now. There was a post about one such drive
a few days ago; it was like 50G and $450 or so, so not cheap, but not
that bad compared to the $7000 drive bays with 16 15k6 drives I've
used in the past to get good performance (3 to 4k tps).

--
To understand recursion, one must first understand recursion.


From: Yeb Havinga <yebhavinga(at)gmail(dot)com>
To: Michael March <mmarch(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-08 10:12:07
Message-ID: 4C5E82F7.8040700@gmail.com
Lists: pgsql-performance

Michael March wrote:
> If anyone is interested I just completed a series of benchmarks of
> stock Postgresql running on a normal HDD vs a SSD.
>
> If you don't want to read the post, the summary is that SSDs are 5 to
> 7 times faster than a 7200RPM HDD drive under a pgbench load.
>
> http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html
>
> Is this what everyone else is seeing?
I tested a SSD with a capacitor and posted conclusions here
http://archives.postgresql.org/pgsql-performance/2010-07/msg00449.php

regards,
Yeb Havinga


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Michael March <mmarch(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-09 16:49:50
Message-ID: A1640F99-A7C5-4F04-9A25-782C46BE7DAB@richrelevance.com
Lists: pgsql-performance


On Aug 7, 2010, at 11:49 PM, Michael March wrote:

SSD's actually vary quite a bit with typical postgres benchmark workloads.

You mean various SSDs from different vendors? Or are you saying the same SSD model might vary in performance from drive to drive?

Model to model (more specifically, controller chip to controller chip -- i.e. most 'Indilinx Barefoot' controller-based SSDs perform similarly).

Many of them also do not guarantee data that has been sync'd will not be lost if power fails (most hard drives with a sane OS and file system do).

What feature does an SSD need to have to ensure that sync'd data is indeed written to the SSD in the case of power loss?

Either properly flush to storage when the OS / file system asks for it (most SSDs don't, most hard drives do), or have a supercapacitor to flush data on power loss.

The former can be achieved by turning off the write cache on some drives (such as Intel's X25-M and -E), but hurts performance.
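
On Linux that looks something like the following (the device name is just an
example, and whether the drive actually obeys the setting is exactly the
open question):

# show the current volatile write cache setting
hdparm -W /dev/sda
# turn the volatile write cache off (costs write performance)
hdparm -W0 /dev/sda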

Also, the amount of data at risk in a power loss varies between drives. For Intel's drives, it's a small chunk of data ( < 256K). For some other drives, the cache can be over 30MB of outstanding writes.
For some workloads this is acceptable -- not every application is doing financial transactions. Not every part of the system needs to be on an SSD either -- the WAL and the various tablespaces can all have different data integrity and performance requirements.

On Aug 7, 2010, at 4:47 PM, Michael March wrote:

If anyone is interested I just completed a series of benchmarks of stock Postgresql running on a normal HDD vs a SSD.

If you don't want to read the post, the summary is that SSDs are 5 to 7 times faster than a 7200RPM HDD drive under a pgbench load.

http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html

Is this what everyone else is seeing?

Thanks!


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-09 22:41:59
Message-ID: 1281393719.3667.8.camel@jdavis-ux.asterdata.local
Lists: pgsql-performance

On Mon, 2010-08-09 at 09:49 -0700, Scott Carey wrote:
> Also, the amount of data at risk in a power loss varies between
> drives. For Intel's drives, its a small chunk of data ( < 256K). For
> some other drives, the cache can be over 30MB of outstanding writes.
> For some workloads this is acceptable -- not every application is
> doing financial transactions. Not every part of the system needs to
> be on an SSD either -- the WAL, and various table spaces can all have
> different data integrity and performance requirements.

I don't think it makes sense to speak about the data integrity of a
drive in terms of the amount of data at risk, especially with a DBMS.
Depending on which 256K you lose, you might as well lose your entire
database.

That may be an exaggeration, but the point is that it's not as simple as
"this drive is only risking 256K data loss per outage".

Regards,
Jeff Davis


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 16:21:20
Message-ID: 4C617C80.9080509@2ndquadrant.com
Lists: pgsql-performance

Scott Carey wrote:
> Also, the amount of data at risk in a power loss varies between
> drives. For Intel's drives, its a small chunk of data ( < 256K). For
> some other drives, the cache can be over 30MB of outstanding writes.
> For some workloads this is acceptable

No, it isn't ever acceptable. You can expect that the type of data loss
you get when a cache fails to honor write flush calls will result in
catastrophic database corruption. It's not "I lost the last few
seconds"; it's "the database is corrupted and won't start" after a
crash. This is why we pound on this topic on this list. An SSD that
fails to honor flush requests is completely worthless for anything other
than toy databases. You can expect significant work to recover any
portion of your data after the first unexpected power loss under heavy
write load in this environment, during which you're down. We do
database corruption recovery at 2ndQuadrant; while I can't talk about
the details of some recent incidents, I am not speaking theoretically
when I warn about this.

Michael, I would suggest you read
http://www.postgresql.org/docs/current/static/wal-reliability.html and
link to it at the end of your article. You are recommending that people
consider a configuration that will result in their data being lost.
That can be acceptable if, for example, your data can be recreated
from backups or the like. But people should be extremely clear that
this trade-off is happening, and your blog post is not doing that yet. Part
of the reason for the bang per buck you're seeing here is that cheap
SSDs are cheating.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 16:27:20
Message-ID: 4C617DE8.4030203@2ndquadrant.com
Lists: pgsql-performance

Jeff Davis wrote:
> Depending on which 256K you lose, you might as well lose your entire
> database.
>

Let's be nice and assume that you only lose one 8K block because of the
SSD write cache; that's not so bad, right? Guess what--you could easily
be the next lucky person who discovers that the corrupted block is actually
in the middle of the pg_class system catalog, where the list of tables
in the database is at! Enjoy getting your data back again with that
piece missing. It's really a fun time, I'll tell you that.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 17:57:20
Message-ID: 4C619300.3080400@ca.afilias.info
Lists: pgsql-performance

On 8/10/2010 12:21 PM, Greg Smith wrote:
> Scott Carey wrote:
>> Also, the amount of data at risk in a power loss varies between
>> drives. For Intel's drives, its a small chunk of data ( < 256K).
>> For some other drives, the cache can be over 30MB of outstanding writes.
>> For some workloads this is acceptable
>
> No, it isn't ever acceptable. You can expect the type of data loss
> you get when a cache fails to honor write flush calls results in
> catastrophic database corruption. It's not "I lost the last few
> seconds"; it's "the database is corrupted and won't start" after a
> crash. This is why we pound on this topic on this list. A SSD that
> fails to honor flush requests is completely worthless for anything
> other than toy databases. You can expect significant work to recover
> any portion of your data after the first unexpected power loss under
> heavy write load in this environment, during which you're down. We do
> database corruption recovery at 2ndQuadrant; while I can't talk about
> the details of some recent incidents, I am not speaking theoretically
> when I warn about this.
>

What about putting indexes on them? If the drive fails and drops writes
on those, they could be rebuilt - assuming your system can function
without the index(es) temporarily.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


From: Karl Denninger <karl(at)denninger(dot)net>
To: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 18:13:47
Message-ID: 4C6196DB.8050307@denninger.net
Lists: pgsql-performance

Brad Nicholson wrote:
> On 8/10/2010 12:21 PM, Greg Smith wrote:
>> Scott Carey wrote:
>>> Also, the amount of data at risk in a power loss varies between
>>> drives. For Intel's drives, its a small chunk of data ( < 256K).
>>> For some other drives, the cache can be over 30MB of outstanding
>>> writes.
>>> For some workloads this is acceptable
>>
>> No, it isn't ever acceptable. You can expect the type of data loss
>> you get when a cache fails to honor write flush calls results in
>> catastrophic database corruption. It's not "I lost the last few
>> seconds"; it's "the database is corrupted and won't start" after a
>> crash. This is why we pound on this topic on this list. A SSD that
>> fails to honor flush requests is completely worthless for anything
>> other than toy databases. You can expect significant work to recover
>> any portion of your data after the first unexpected power loss under
>> heavy write load in this environment, during which you're down. We
>> do database corruption recovery at 2ndQuadrant; while I can't talk
>> about the details of some recent incidents, I am not speaking
>> theoretically when I warn about this.
>
> What about putting indexes on them? If the drive fails and drops
> writes on those, they could be rebuilt - assuming your system can
> function without the index(es) temporarily.
You could put indices on them, but as noted above, Scott is SPOT ON.

ANY disk that says "write is complete" when it really is not is entirely
unsuitable for ANY real database use. It is simply a matter of time
before you have a failure of some sort that results in catastrophic data
loss. If you're LUCKY the database won't start and you know you're in
trouble. If you're UNLUCKY the database DOES start but there is
undetected and unrecoverable data corruption somewhere inside the data
tables, which you WILL discover at the most inopportune moment (like
when you desperately NEED that business record for some reason). You
cannot put either the tables or the logs on such a drive without running
the risk of a data corruption problem that WILL lose data and MAY be
catastrophic, depending on exactly what fails when.

While it is possible to recover what is not damaged from a database
that has corruption like this, it simply is not possible to recover
data that never made it to the disk - no matter what you do - and the
time and effort expended (not to mention money, if you have to bring in
someone with specialized skills you do not possess) that result from
such "decisions" when things go wrong are extreme.

Don't do it.

-- Karl


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Karl Denninger <karl(at)denninger(dot)net>
Cc: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>, Greg Smith <greg(at)2ndquadrant(dot)com>, Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 18:23:22
Message-ID: AANLkTi=pehZj1Bb8KVusGykyEzWn__pif-87Tz2beJYj@mail.gmail.com
Lists: pgsql-performance

On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl(at)denninger(dot)net> wrote:

> ANY disk that says "write is complete" when it really is not is entirely
> unsuitable for ANY real database use.  It is simply a matter of time

What about read-only slaves where there's a master with 100+ spinning
hard drives "getting it right" and you need a half dozen or so read
slaves? I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it.


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 18:28:31
Message-ID: 4C619A4F.7080306@2ndquadrant.com
Lists: pgsql-performance

Brad Nicholson wrote:
> What about putting indexes on them? If the drive fails and drops
> writes on those, they could be rebuilt - assuming your system can
> function without the index(es) temporarily.

Dumping indexes on SSD is one of the better uses for them, presuming you
can survive what is likely to be an outage from a "can the site handle
full load?" perspective while they rebuild after a crash. As I'm sure
Brad is painfully aware of already, index rebuilding in PostgreSQL can
take a while. To spin my broken record here again, the main thing to
note when you consider that option--relocating indexes onto SSD--is that the ones
you are most concerned about the performance of were likely to be
already sitting in RAM anyway, meaning the SSD speedup doesn't help
reads much. So the giant performance boost just isn't there in that case.
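
For anyone who wants to try it anyway, relocating an index is just a
tablespace move; a minimal sketch, with the mount point, database and index
names made up for illustration:

mkdir -p /ssd/pg_indexes
chown postgres /ssd/pg_indexes
# create a tablespace on the SSD and move one index onto it
# (the ALTER physically rewrites the index and locks it while it runs)
psql -d mydb -c "CREATE TABLESPACE ssd_idx LOCATION '/ssd/pg_indexes'"
psql -d mydb -c "ALTER INDEX orders_created_at_idx SET TABLESPACE ssd_idx"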

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Karl Denninger <karl(at)denninger(dot)net>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>, Greg Smith <greg(at)2ndquadrant(dot)com>, Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 18:38:08
Message-ID: 4C619C90.3090005@denninger.net
Lists: pgsql-performance

Scott Marlowe wrote:
> On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl(at)denninger(dot)net> wrote:
>
>> ANY disk that says "write is complete" when it really is not is entirely
>> unsuitable for ANY real database use. It is simply a matter of time
>>
>
> What about read only slaves where there's a master with 100+spinning
> hard drives "getting it right" and you need a half dozen or so read
> slaves? I can imagine that being ok, as long as you don't restart a
> server after a crash without checking on it.
>
A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?

IF you mean "a server that only accepts SELECTs, does not accept UPDATEs
or INSERTs, and on a crash **reloads the entire database from the
master**", then ok.

Most people who will do this won't reload it after a crash. They'll
"inspect" the database and say "ok", and put it back online. Bad Karma
will ensue in the future.

Incidentally, that risk is not theoretical either (I know about this one
from hard experience. Fortunately the master was still ok and I was
able to force a full-table copy.... I didn't like it as the database was
a few hundred GB, but I had no choice.)

-- Karl


From: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 18:47:16
Message-ID: 4C619EB4.9080008@ca.afilias.info
Lists: pgsql-performance

On 8/10/2010 2:28 PM, Greg Smith wrote:
> Brad Nicholson wrote:
>> What about putting indexes on them? If the drive fails and drops
>> writes on those, they could be rebuilt - assuming your system can
>> function without the index(es) temporarily.
>
> Dumping indexes on SSD is one of the better uses for them, presuming
> you can survive what is likely to be an outage from a "can the site
> handle full load?" perspective while they rebuild after a crash. As
> I'm sure Brad is painfully aware of already, index rebuilding in
> PostgreSQL can take a while. To spin my broken record here again, the
> main thing to note when you consider that--relocate indexes onto
> SSD--is that the ones you are most concerned about the performance of
> were likely to be already sitting in RAM anyway, meaning the SSD
> speedup doesn't help reads much. So the giant performance boost just
> isn't there in that case.
>
The case where I'm thinking they may be of use is for indexes you can
afford to lose. I'm thinking of ones that are needed by nightly batch
jobs, downstream systems or reporting - the sorts of things that you
can turn off during a rebuild, and where the data sets are not likely
to be in cache.

We have a few such cases, but we don't need the speed of SSD's for them.

Personally, I wouldn't entertain any SSD without a capacitor backing it for
anything, even indexes. Not worth the hassle to me.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


From: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
To: Karl Denninger <karl(at)denninger(dot)net>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 18:54:12
Message-ID: 4C61A054.6080400@ca.afilias.info
Lists: pgsql-performance

On 8/10/2010 2:38 PM, Karl Denninger wrote:
> Scott Marlowe wrote:
>> On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger<karl(at)denninger(dot)net> wrote:
>>
>>> ANY disk that says "write is complete" when it really is not is entirely
>>> unsuitable for ANY real database use. It is simply a matter of time
>>>
>>
>> What about read only slaves where there's a master with 100+spinning
>> hard drives "getting it right" and you need a half dozen or so read
>> slaves? I can imagine that being ok, as long as you don't restart a
>> server after a crash without checking on it.
>>
> A read-only slave isn't read-only, is it?
>
> I mean, c'mon - how does the data get there?
>

A valid case is a Slony replica if used for query offloading (not for
DR). It's considered a read-only subscriber from the perspective of
Slony as only Slony can modify the data (although you are technically
correct, it is not read only - controlled write may be more accurate).

In case of failure, a rebuild + resubscribe gets you back to the same
consistency. If you have high IO requirements, and don't have the
budget to rack up extra disk arrays to meet them, it could be an option.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


From: Karl Denninger <karl(at)denninger(dot)net>
To: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 19:28:18
Message-ID: 4C61A852.9040309@denninger.net
Lists: pgsql-performance

Brad Nicholson wrote:
> On 8/10/2010 2:38 PM, Karl Denninger wrote:
>> Scott Marlowe wrote:
>>> On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl(at)denninger(dot)net> wrote:
>>>
>>>> ANY disk that says "write is complete" when it really is not is entirely
>>>> unsuitable for ANY real database use. It is simply a matter of time
>>>>
>>>
>>> What about read only slaves where there's a master with 100+spinning
>>> hard drives "getting it right" and you need a half dozen or so read
>>> slaves? I can imagine that being ok, as long as you don't restart a
>>> server after a crash without checking on it.
>>>
>> A read-only slave isn't read-only, is it?
>>
>> I mean, c'mon - how does the data get there?
>>
> A valid case is a Slony replica if used for query offloading (not for
> DR). It's considered a read-only subscriber from the perspective of
> Slony as only Slony can modify the data (although you are technically
> correct, it is not read only - controlled write may be more accurate).
>
> In case of failure, a rebuild + resubscribe gets you back to the same
> consistency. If you have high IO requirements, and don't have the
> budget to rack up extra disk arrays to meet them, it could be an option.
CAREFUL with that model and beliefs.

Specifically, the following will hose you without warning:

1. SLONY gets a change on the master.
2. SLONY commits it to the (read-only) slave.
3. Confirmation comes back to the master that the change was propagated.
4. Slave CRASHES without actually committing the changed data to stable
storage.

When the slave restarts it will not know that the transaction was lost.
Neither will the master, since it was told that it was committed. Slony
will happily go on its way and replicate forward, without any indication
of a problem - except that on the slave, there are one or more
transactions that are **missing**.

Some time later you issue an update that goes to the slave, but the
change previously lost causes the slave commit to violate referential
integrity. SLONY will fail to propagate that change and all behind it
- it effectively locks at that point in time.

You can recover from this by dropping the slave from replication and
re-inserting it, but that forces a full-table copy of everything in the
replication set. The bad news is that the queries to the slave in
question may have been returning erroneous data for some unknown period
of time prior to the lockup in replication (which hopefully you detect
reasonably quickly - you ARE watching SLONY queue depth with some
automated process, right?)
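
For reference, the automated check I mean is nothing fancy - a cron job
against Slony's lag view, roughly like this (the cluster name "mycluster",
database name and alert address are all placeholders):

#!/bin/sh
# page someone if any subscriber falls more than 5 minutes behind
LAG=$(psql -d mydb -At -c "SELECT st_received, st_lag_num_events, st_lag_time
      FROM _mycluster.sl_status
      WHERE st_lag_time > interval '5 minutes'")
[ -n "$LAG" ] && echo "$LAG" | mail -s 'Slony replication lag' dba@example.com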

I can both cause this in the lab and have had it happen in the field.
It's a nasty little problem that bit me on a series of disks that
claimed to have write caching off, but in fact did not. I was very
happy that the data on the master was good at that point, as if I had
needed to failover to the slave (thinking it was a "good" copy) I would
have been in SERIOUS trouble.

-- Karl


From: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>
To: Karl Denninger <karl(at)denninger(dot)net>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 19:51:31
Message-ID: 4C61ADC3.6050907@ca.afilias.info
Lists: pgsql-performance

On 8/10/2010 3:28 PM, Karl Denninger wrote:
> Brad Nicholson wrote:
>> On 8/10/2010 2:38 PM, Karl Denninger wrote:
>>> Scott Marlowe wrote:
>>>> On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger<karl(at)denninger(dot)net> wrote:
>>>>
>>>>> ANY disk that says "write is complete" when it really is not is entirely
>>>>> unsuitable for ANY real database use. It is simply a matter of time
>>>>>
>>>>
>>>> What about read only slaves where there's a master with 100+spinning
>>>> hard drives "getting it right" and you need a half dozen or so read
>>>> slaves? I can imagine that being ok, as long as you don't restart a
>>>> server after a crash without checking on it.
>>>>
>>> A read-only slave isn't read-only, is it?
>>>
>>> I mean, c'mon - how does the data get there?
>>>
>> A valid case is a Slony replica if used for query offloading (not for
>> DR). It's considered a read-only subscriber from the perspective of
>> Slony as only Slony can modify the data (although you are
>> technically correct, it is not read only - controlled write may be
>> more accurate).
>>
>> In case of failure, a rebuild + resubscribe gets you back to the
>> same consistency. If you have high IO requirements, and don't have
>> the budget to rack up extra disk arrays to meet them, it could be an
>> option.
> CAREFUL with that model and beliefs.
>
> Specifically, the following will hose you without warning:
>
> 1. SLONY gets a change on the master.
> 2. SLONY commits it to the (read-only) slave.
> 3. Confirmation comes back to the master that the change was propagated.
> 4. Slave CRASHES without actually committing the changed data to
> stable storage.
>
What will hose you is assuming that your data will be okay in the case
of a failure, which is a very bad assumption to make in the case of
unreliable SSDs. You are assuming I am implying that these should be
treated like reliable media - I am not.

In case of failure, you need to assume data loss until proven
otherwise. If there is a problem, rebuild.

> When the slave restarts it will not know that the transaction was
> lost. Neither will the master, since it was told that it was
> committed. Slony will happily go on its way and replicate forward,
> without any indication of a problem - except that on the slave, there
> are one or more transactions that are **missing**.
>

Correct.
> Some time later you issue an update that goes to the slave, but the
> change previously lost causes the slave commit to violate referential
> integrity. SLONY will fail to propagate that change and all behind
> it - it effectively locks at that point in time.
>
It will lock data flow to that subscriber, but not to others.

> You can recover from this by dropping the slave from replication and
> re-inserting it, but that forces a full-table copy of everything in
> the replication set. The bad news is that the queries to the slave in
> question may have been returning erroneous data for some unknown
> period of time prior to the lockup in replication (which hopefully you
> detect reasonably quickly - you ARE watching SLONY queue depth with
> some automated process, right?)
>
There are ways around that - run two subscribers and redirect your
queries on failure. Don't bring up the failed replica until it is
verified or rebuilt.

> I can both cause this in the lab and have had it happen in the field.
> It's a nasty little problem that bit me on a series of disks that
> claimed to have write caching off, but in fact did not. I was very
> happy that the data on the master was good at that point, as if I had
> needed to failover to the slave (thinking it was a "good" copy) I
> would have been in SERIOUS trouble.

It's very easy to cause those sorts of problems.

What I am saying is that the technology can have a use, if you are
aware of the sharp edges, and can both work around them and live with
them. Everything you are citing is correct, but it implies that they are
blindly thrown in without understanding the risks and mitigating them.

I'm also not suggesting that this is a configuration I would endorse,
but it could potentially save a lot of money in certain use cases.

--
Brad Nicholson 416-673-4106
Database Administrator, Afilias Canada Corp.


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Karl Denninger <karl(at)denninger(dot)net>
Cc: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>, Greg Smith <greg(at)2ndquadrant(dot)com>, Scott Carey <scott(at)richrelevance(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 19:52:16
Message-ID: AANLkTi=Se3LDknyNu58QLjSPEEb3MyHzMPX0cA_a893A@mail.gmail.com
Lists: pgsql-performance

On Tue, Aug 10, 2010 at 12:38 PM, Karl Denninger <karl(at)denninger(dot)net> wrote:
> Scott Marlowe wrote:
>
> On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl(at)denninger(dot)net> wrote:
>
>
> ANY disk that says "write is complete" when it really is not is entirely
> unsuitable for ANY real database use.  It is simply a matter of time
>
>
> What about read only slaves where there's a master with 100+spinning
> hard drives "getting it right" and you need a half dozen or so read
> slaves? I can imagine that being ok, as long as you don't restart a
> server after a crash without checking on it.
>
>
> A read-only slave isn't read-only, is it?
>
> I mean, c'mon - how does the data get there?

Well, duh. However, what I'm looking at is having two big servers in
failover running on solid reliable hardware, and then a small army of
read only slony slaves that are used for things like sending user rss
feeds and creating weekly reports and such. These 1U machines with 12
to 24 cores and a single SSD drive are "disposable" in the sense that if
they ever crash, there's a simple script to run that reinits the db
and then subscribes them to the set.

My point being, no matter how terrible an idea a certain storage media
is, there's always a use case for it. Even if it's very narrow.


From: Christopher Browne <cbbrowne(at)gmail(dot)com>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 20:00:24
Message-ID: AANLkTim+z+FV30=2g-RQMPirGQj27dUt+PjwaozENbwD@mail.gmail.com
Lists: pgsql-performance

On Tue, Aug 10, 2010 at 3:52 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
> My point being, no matter how terrible an idea a certain storage media
> is, there's always a use case for it.  Even if it's very narrow.

The trouble is, if extra subscribers induce load on the "master,"
which they presumably will, then that sliver of "use case" may very
well get obscured by the cost, such that the sliver should be treated
as not existing :-(.
--
http://linuxfinances.info/info/linuxdistributions.html


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Christopher Browne <cbbrowne(at)gmail(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-10 20:06:27
Message-ID: AANLkTimU1PnRrzHcUU+SRVGDORo6FTphdK0DuSsQOO3P@mail.gmail.com
Lists: pgsql-performance

On Tue, Aug 10, 2010 at 2:00 PM, Christopher Browne <cbbrowne(at)gmail(dot)com> wrote:
> On Tue, Aug 10, 2010 at 3:52 PM, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>> My point being, no matter how terrible an idea a certain storage media
>> is, there's always a use case for it.  Even if it's very narrow.
>
> The trouble is, if extra subscribers induce load on the "master,"
> which they presumably will, then that sliver of "use case" may very
> well get obscured by the cost, such that the sliver should be treated
> as not existing :-(.

One master, one slave, master handles all writes, slave handles all of
the other subscribers. I've run a setup like this with as many as 8
or so slaves at the bottom of the pile with no problems at all.
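
In Slony terms that's just the leaf nodes subscribing with the middle node
as their provider; a rough slonik sketch, with the cluster name, node ids
and conninfo strings made up, and assuming the set, nodes and paths are
already defined:

# node 2 subscribes with forwarding on; the read-only boxes then feed from node 2
slonik <<'EOF'
cluster name = mycluster;
node 1 admin conninfo = 'dbname=app host=master user=slony';
node 2 admin conninfo = 'dbname=app host=relay user=slony';
node 3 admin conninfo = 'dbname=app host=reader1 user=slony';
subscribe set (id = 1, provider = 1, receiver = 2, forward = yes);
subscribe set (id = 1, provider = 2, receiver = 3, forward = no);
EOF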


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-11 23:05:35
Message-ID: 72E29316-5F4C-4A01-8924-5340DF2FAE65@richrelevance.com
Lists: pgsql-performance


On Aug 10, 2010, at 9:21 AM, Greg Smith wrote:

> Scott Carey wrote:
>> Also, the amount of data at risk in a power loss varies between
>> drives. For Intel's drives, its a small chunk of data ( < 256K). For
>> some other drives, the cache can be over 30MB of outstanding writes.
>> For some workloads this is acceptable
>
> No, it isn't ever acceptable. You can expect the type of data loss you
> get when a cache fails to honor write flush calls results in
> catastrophic database corruption. It's not "I lost the last few
> seconds";

I never said it was.

> it's "the database is corrupted and won't start" after a
> crash.

Which is sometimes acceptable. There is NO GUARANTEE that you won't lose data, ever. An increase in the likelihood is an acceptable tradeoff in some situations, especially when it is small. On ANY power loss event, with or without battery-backed caches and such, you should do a consistency check on the system proactively. With less reliable hardware, that task becomes much more of a burden, and is much more likely to require restoring data from somewhere.

What is the likelihood that your RAID card fails, or that the battery that reported 'good health' only lasts 5 minutes and you lose data before power is restored? What is the likelihood of human error?
Not that far off from the likelihood of power failure in a datacenter with redundant power. One MUST have a DR plan. Never assume that your perfect hardware won't fail.

> This is why we pound on this topic on this list. A SSD that
> fails to honor flush requests is completely worthless for anything other
> than toy databases.

Overblown. Not every DB and use case is a financial application or business critical app. Many are not toys at all. Slave, read only DB's (or simply subset tablespaces) ...

Indexes. (per application, schema)
Tables. (per application, schema)
System tables / indexes.
WAL.

Each has different reliability requirements and consequences from losing recently written data. Less than 8K can be fatal to the WAL or to table data. Corrupting some tablespaces is not a big deal. Corrupting others is catastrophic. The problem with the assertion that this hardware is worthless is that it implies that every user, every use case, is at the far end of the reliability requirement spectrum.

Yes, that can be a critical requirement for many, perhaps most, DB's. But there are many uses for slightly unsafe storage systems.
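
Concretely, that split is just separate mount points: WAL and the system
catalogs stay on the trusted, battery-backed volume, and the expendable
tablespaces go on the fast-but-risky SSD. A sketch with made-up paths:

# with the cluster shut down, move the WAL onto the battery-backed array
mv /pgdata/pg_xlog /safe_array/pg_xlog
ln -s /safe_array/pg_xlog /pgdata/pg_xlog

# with the cluster back up, expendable objects go on the cheap SSD tablespace
psql -d mydb -c "CREATE TABLESPACE ssd_scratch LOCATION '/cheap_ssd/pg'"
psql -d mydb -c "ALTER TABLE big_report_table SET TABLESPACE ssd_scratch"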

> You can expect significant work to recover any
> portion of your data after the first unexpected power loss under heavy
> write load in this environment, during which you're down. We do
> database corruption recovery at 2ndQuadrant; while I can't talk about
> the details of some recent incidents, I am not speaking theoretically
> when I warn about this.

I've done the single-user mode recover system tables by hand thing myself at 4AM, on a system with battery backed RAID 10, redundant power, etc. Raid cards die, and 10TB recovery times from backup are long.

It's a game of balancing your data loss tolerance against the likelihood of power failure. Both of these vary widely, and not just with 'toy' dbs. If you know what you are doing, you can use 'fast but not completely safe' storage for many things safely. The chance of loss is NEVER zero; do not assume that 'good' hardware is flawless.

Imagine a common internet case where synchronous_commit=false is fine. Recovery from backups is a pain (but a daily snapshot is taken of the important tables, and weekly for easily recoverable other stuff). If you expect one power related failure every 2 years, it might be perfectly reasonable to use 'unsafe' SSD's in order to support high transaction load on the risk that that once every 2 year downtime is 12 hours long instead of 30 minutes, and includes losing up to a day's information. Applications like this exist all over the place.

> --
> Greg Smith 2ndQuadrant US Baltimore, MD
> PostgreSQL Training, Services and Support
> greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
>


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-11 23:10:29
Message-ID: 6FDD30A3-FEA2-45D3-8B09-2D28C6588185@richrelevance.com
Lists: pgsql-performance


On Aug 10, 2010, at 11:28 AM, Greg Smith wrote:

> Brad Nicholson wrote:
>> What about putting indexes on them? If the drive fails and drops
>> writes on those, they could be rebuilt - assuming your system can
>> function without the index(es) temporarily.
>
> Dumping indexes on SSD is one of the better uses for them, presuming you
> can survive what is likely to be an outage from a "can the site handle
> full load?" perspective while they rebuild after a crash. As I'm sure
> Brad is painfully aware of already, index rebuilding in PostgreSQL can
> take a while. To spin my broken record here again, the main thing to
> note when you consider that--relocate indexes onto SSD--is that the ones
> you are most concerned about the performance of were likely to be
> already sitting in RAM anyway, meaning the SSD speedup doesn't help
> reads much. So the giant performance boost just isn't there in that case.
>

For an OLTP type system, yeah. But for DW/OLAP and batch processing the gains are pretty big. Those indexes get kicked out of RAM and then pulled back in a lot. I'm talking about a server with 72GB of RAM that can't keep enough indexes in memory to avoid a lot of random access. Putting the indexes on an SSD has lowered the random I/O load on the other drives a lot, letting them get through sequential scans a lot faster.

Estimated power failure, once every 18 months (mostly due to human error). Rebuild indexes offline for 40 minutes every 18 months? No problem.

> --
> Greg Smith 2ndQuadrant US Baltimore, MD
> PostgreSQL Training, Services and Support
> greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
>


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Karl Denninger <karl(at)denninger(dot)net>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>, Greg Smith <greg(at)2ndquadrant(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-11 23:52:26
Message-ID: F695002B-AD66-4C8D-A76A-E3D6773EA8F1@richrelevance.com
Lists: pgsql-performance


On Aug 10, 2010, at 11:38 AM, Karl Denninger wrote:

Scott Marlowe wrote:

On Tue, Aug 10, 2010 at 12:13 PM, Karl Denninger <karl(at)denninger(dot)net><mailto:karl(at)denninger(dot)net> wrote:

ANY disk that says "write is complete" when it really is not is entirely
unsuitable for ANY real database use. It is simply a matter of time

What about read only slaves where there's a master with 100+spinning
hard drives "getting it right" and you need a half dozen or so read
slaves? I can imagine that being ok, as long as you don't restart a
server after a crash without checking on it.

A read-only slave isn't read-only, is it?

I mean, c'mon - how does the data get there?

IF you mean "a server that only accepts SELECTs, does not accept UPDATEs or INSERTs, and on a crash **reloads the entire database from the master**", then ok.

"ENTIRE database"?

Depends on your tablespace setup and schema usage pattern.
If:
* 90% of your data tables are partitioned by date, and untouched a week after insert. Partitions are backed up incrementally.
* The remaining 10% of it is backed up daily, and of that 9% can be re-generated from data elsewhere if data is lost.
* System catalog and wal are on 'safest of safe' hardware.

Then your 'bulk' data on a slave can be on less than flawless hardware. Simply restore the tables from the last week from the master or backup when the (rare) power failure occurs. The remaining data is safe, since it is not written to.
Split up your 10% of non-date partitioned data into what needs to be on safe hardware and what does not (maybe some indexes, etc).

Most of the time, the incremental cost of getting a BBU is too small to not do it, so the above hardly applies. But if you have data that is known to be read-only, you can do many unconventional things with it safely.
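
As a sketch of what that layout looks like (table, column and path names are
invented; on 8.4/9.0 the date partitions are just inherited child tables):

# write-once-then-read-only monthly partitions live on the cheap SSD tablespace
psql -d mydb -c "CREATE TABLESPACE bulk_ssd LOCATION '/cheap_ssd/pg'"
psql -d mydb -c "CREATE TABLE events_2010_08 (
    CHECK (created >= '2010-08-01' AND created < '2010-09-01')
  ) INHERITS (events) TABLESPACE bulk_ssd"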

Most people who will do this won't reload it after a crash. They'll "inspect" the database and say "ok", and put it back online. Bad Karma will ensue in the future.

Anyone going with something unconventional better know what they are doing and not just blindly plug it in and think everything will be OK. I'd never recommend unconventional setups for a user that wasn't an expert and understood the tradeoff.

Incidentally, that risk is not theoretical either (I know about this one from hard experience. Fortunately the master was still ok and I was able to force a full-table copy.... I didn't like it as the database was a few hundred GB, but I had no choice.)

Been there with 10TB on hardware that should have been perfectly safe. Five days of copying, and wishing that pg_dump supported lzo compression so that the dump side had a chance of keeping up with the much faster restore side while still getting some compression to save the copy bandwidth.

-- Karl
<karl.vcf>


From: Karl Denninger <karl(at)denninger(dot)net>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Brad Nicholson <bnichols(at)ca(dot)afilias(dot)info>, Greg Smith <greg(at)2ndquadrant(dot)com>, Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-12 00:45:22
Message-ID: 4C634422.2090602@denninger.net
Lists: pgsql-performance

Scott Carey wrote:
>
> On Aug 10, 2010, at 11:38 AM, Karl Denninger wrote:
>
> .....
>>
>> Most people who will do this won't reload it after a crash. They'll
>> "inspect" the database and say "ok", and put it back online. Bad
>> Karma will ensue in the future.
>
> Anyone going with something unconventional better know what they are
> doing and not just blindly plug it in and think everything will be OK.
> I'd never recommend unconventional setups for a user that wasn't an
> expert and understood the tradeoff.
True.
>>
>> Incidentally, that risk is not theoretical either (I know about this
>> one from hard experience. Fortunately the master was still ok and I
>> was able to force a full-table copy.... I didn't like it as the
>> database was a few hundred GB, but I had no choice.)
>
> Been there with 10TB with hardware that should have been perfectly
> safe. 5 days of copying, and wishing that pg_dump supported lzo
> compression so that the dump portion had a chance at keeping up with
> the much faster restore portion with some level of compression on to
> save the copy bandwidth.
Pipe it through ssh -C

PS: This works for SLONY and Bucardo too - set up a tunnel and then
change the port temporarily. This is especially useful when the DB
being COPY'd across has big fat honking BYTEA fields in it, which
otherwise expand about 400% - or more - on the wire.
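
Something along these lines (hostnames and database names are placeholders;
use a plain-text dump so ssh's compression actually has something to work
with):

# stream a plain dump through a compressed ssh pipe straight into the slave
pg_dump -h master.example.com bigdb | ssh -C slave.example.com "psql -d bigdb"

# or, for the Slony/Bucardo case: open a compressed tunnel and point the
# replication conninfo at the local port for the duration of the copy
ssh -C -N -L 5433:localhost:5432 master.example.com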

-- Karl


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-12 04:30:47
Message-ID: 4C6378F7.9040000@2ndquadrant.com
Lists: pgsql-performance

Scott Carey wrote:
> What is the likelihood that your RAID card fails, or that the battery that reported 'good health' only lasts 5 minutes and you lose data before power is restored? What is the likelihood of human error?
>

These are all things that happen sometimes, sure. The problem with the
cheap SSDs is that they happen downright often if you actually test for
it. If someone is aware of the risk and makes an informed decision,
fine. But most of the time I see articles like the one that started
this thread that are oblivious to the issue, and that's really bad.

--
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us


From: Scott Carey <scott(at)richrelevance(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Michael March <mmarch(at)gmail(dot)com>, "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-12 23:40:11
Message-ID: FEDD0586-54B5-4F42-94AF-ACB3D89075E8@richrelevance.com
Lists: pgsql-performance


On Aug 11, 2010, at 9:30 PM, Greg Smith wrote:

> Scott Carey wrote:
>> What is the likelihood that your RAID card fails, or that the battery that reported 'good health' only lasts 5 minutes and you lose data before power is restored? What is the likelihood of human error?
>>
>
> These are all things that happen sometimes, sure. The problem with the
> cheap SSDs is that they happen downright often if you actually test for
> it. If someone is aware of the risk and makes an informed decision,
> fine. But most of the time I see articles like the one that started
> this thread that are oblivious to the issue, and that's really bad.
>

Agreed. There is a HUGE gap between "ooh ssd's are fast, look!" and engineering a solution that uses them properly with all their strengths and faults. And as 'gnuoytr' points out, there is a big difference between an Intel SSD and say, this thing: http://www.nimbusdata.com/products/s-class_overview.html

> --
> Greg Smith 2ndQuadrant US Baltimore, MD
> PostgreSQL Training, Services and Support
> greg(at)2ndQuadrant(dot)com www.2ndQuadrant.us
>


From: Arjen van der Meijden <acmmailing(at)tweakers(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-13 05:59:29
Message-ID: 4C64DF41.40803@tweakers.net
Lists: pgsql-performance

On 13-8-2010 1:40 Scott Carey wrote:
> Agreed. There is a HUGE gap between "ooh ssd's are fast, look!" and
> engineering a solution that uses them properly with all their
> strengths and faults. And as 'gnuoytr' points out, there is a big
> difference between an Intel SSD and say, this thing:
> http://www.nimbusdata.com/products/s-class_overview.html

From the description it sounds as if it's either FreeBSD or OpenSolaris
with ZFS plus some web-interface layer. That's not a bad thing per se,
but as the site suggests, it's 'only' $25k for the smallest (2.5TB?)
device. That makes it very likely that these are "off the shelf" MLC flash
drives. Given the design of the device and the pricing, they are probably
your average 2.5" drives with 100, 200 or 400GB capacity (maybe OCZ
Vertex 2 Pro, which do have such a capacitor?), similar to the Intel SSD
you compared it to.
And then we're basically back to square one, unless the devices have a
capacitor or ZFS works better with SSDs to begin with (it will at
least know that silent data corruption occurred).

There are of course devices that are not built on top of normal
disk-form-factor SSDs, like the RamSan devices or Sun's F5100.

Best regards,

Arjen


From: Michael March <mmarch(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Completely un-tuned Postgresql benchmark results: SSD vs desktop HDD
Date: 2010-08-13 06:06:59
Message-ID: AANLkTi=uNPBFeo5OOmp4Z5TPQms83hs=qYcQA_-4typs@mail.gmail.com
Lists: pgsql-performance

As a postscript to these tests, I just tried the 500GB Momentus XT hybrid
SSD/HDD drive. I had this fantasy that it would at least do better than the
7200 rpm desktop drive.

Oh lord, my gut was wrong. The performance was inconsistent and never over
2/3rds the performance of the slowest desktop drive.

On Sat, Aug 7, 2010 at 4:47 PM, Michael March <mmarch(at)gmail(dot)com> wrote:

> If anyone is interested I just completed a series of benchmarks of stock
> Postgresql running on a normal HDD vs a SSD.
>
> If you don't want to read the post, the summary is that SSDs are 5 to 7
> times faster than a 7200RPM HDD drive under a pgbench load.
>
>
> http://it-blog.5amsolutions.com/2010/08/performance-of-postgresql-ssd-vs.html
>
> Is this what everyone else is seeing?
>
> Thanks!
>
> --
> mmarch(at)gmail(dot)com
>

--
<admiral>

Michael F. March ----- mmarch(at)gmail(dot)com