Re: Raw devices vs. Filesystems

From: "Gregory S(dot) Williamson" <gsw(at)globexplorer(dot)com>
To: "Christopher Browne" <cbbrowne(at)acm(dot)org>, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Raw devices vs. Filesystems
Date: 2004-04-05 19:43:21
Message-ID: 71E37EF6B7DCC1499CEA0316A25683280105786A@loki.wc.globexplorer.net
Lists: pgsql-admin pgsql-performance


No point to beating a dead horse (other than the sheer joy of the thing) since postgres does not have raw device support, but ...

Raw devices, at least on Solaris, are about 10 times as fast as cooked file systems for Informix. This might still be a gain for postgres' performance, but the portability issues remain.

Raw device use in Informix is safer for data because Informix never has to go through the regular file system, so issues of buffering and so on go away. My understanding -- fortunately never tested in the real world -- is that postgres' WAL system is as reliable as Informix writing to raw devices.

Raw devices can't be copied or tampered with using regular file tools (mv, cp etc.); this changes how backups get done but also adds a layer of insulation between valuable data and users.

Greg Williamson
DBA
GlobeXplorer LLC
-----Original Message-----
From: Christopher Browne [mailto:cbbrowne(at)acm(dot)org]
Sent: Mon 3/29/2004 10:28 AM
To: pgsql-admin(at)postgresql(dot)org
Cc:
Subject: Re: [ADMIN] Raw devices vs. Filesystems
After takin a swig o' Arrakan spice grog, el_vigia_ec(at)hotmail(dot)com ("Jaime Casanova") belched out:
> Can you tell me (or at least guide me to a place where i can find the
> answer) what are the benefits of filesystems over raw devices?

For PostgreSQL, filesystems have the merit that you can actually use
them. PostgreSQL doesn't support use of "raw devices."

Two major benefits of using filesystems as opposed to raw devices are
that:

a) The use of raw devices is dramatically non-portable; you have to
reimplement data access on every platform you are trying to
support;

b) The use of raw devices essentially mandates that you implement
some form of generic filesystem on top of them, which adds
considerable complexity to your code.

Two benefits to raw devices are claimed...

c) It's faster. But that assumes that the "cooked" filesystems are
implemented fairly badly. That was typically true, a dozen
years ago, but it isn't so typical now, particularly with a
fancy caching controller.

d) It guarantees application control of update ordering. Of course,
with a caching controller, or disk drives that lie to one degree
or another, those guarantees might be gone anyways.

There are other filesystem advantages, such as

e) Shifting "cooked" data around may be as simple as a "mv," whereas
reorganizing on raw disk requires DB-specific tools...
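
[Editorial sketch, not part of the original message.] Point (d) is about an application controlling the order in which updates reach the disk. On a "cooked" filesystem the usual tool for that is fsync. The Python sketch below only illustrates the idea and is not how PostgreSQL or Informix is implemented: the function name ordered_write and the file names are hypothetical, and, as point (d) notes, a caching controller or a drive that lies about flushes can still defeat the ordering.

```python
import os
import tempfile


def ordered_write(log_path: str, data_path: str, record: bytes) -> None:
    """Append `record` to a log, flush it, and only then update the data file."""
    # Step 1: write the log entry and ask the OS to push it to the device
    # before the data file is touched.
    fd = os.open(log_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        os.write(fd, record)  # small records only; no partial-write handling
        os.fsync(fd)
    finally:
        os.close(fd)

    # Step 2: apply the change to the data file, then flush that as well.
    fd = os.open(data_path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.write(fd, record)
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ordered_write(os.path.join(d, "log"), os.path.join(d, "data"), b"row 1\n")
```

The ordering here depends on fsync being honoured all the way down to the platters, which is exactly the guarantee point (d) warns may be gone.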

> And what filesystem is the best for postgresql performance?

That would depend, assortedly, on what OS you are using, what kind of
hardware you are running on, what kind of usage patterns you have, as
well as on how you define the notion of "best."

Absent any indication of any of those things, the best that can be
said is "that depends..."
--
(format nil "~S(at)~S" "cbbrowne" "acm.org")
http://cbbrowne.com/info/languages.html
TTY Message from The-XGP at MIT-AI:
The-XGP(at)AI 02/59/69 02:59:69
Your XGP output is startling.



From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Raw devices vs. Filesystems
Date: 2004-04-06 20:57:02
Message-ID: 603c7gj04h.fsf@dev6.int.libertyrms.info
Lists: pgsql-admin pgsql-performance

gsw(at)globexplorer(dot)com ("Gregory S. Williamson") writes:
> No point to beating a dead horse (other than the sheer joy of the
> thing) since postgres does not have raw device support, but ... raw
> devices, at least on solaris, are about 10 times as fast as cooked
> file systems for Informix. This might still be a gain for postgres'
> performance, but the portability issues remain.

That claim seems really rather remarkable.

It implies an entirely stunning degree of inefficiency in the
implementation of filesystems on Solaris.

The amount of indirection involved in walking through i-nodes and such
is something I would expect to introduce some percentage of
performance loss, but for it to introduce overhead of over 900%
presumably implies that Sun (and/or Veritas) got something really
horribly wrong.
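
[Editorial sketch, not part of the original message.] The step from "10 times as fast" to "overhead of over 900%" is simple arithmetic: if the raw path takes time t and the cooked path takes 10t, the extra 9t is 900% of t. A small Python check follows; the function name overhead_percent is made up for illustration.

```python
def overhead_percent(speedup: float) -> float:
    """Extra time of the slower path, as a percentage of the faster path's time."""
    return (speedup - 1.0) * 100.0


if __name__ == "__main__":
    # "About 10 times as fast" corresponds to roughly 900% overhead.
    print(overhead_percent(10))
```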
--
select 'cbbrowne' || '@' || 'cbbrowne.com';
http://www.ntlug.org/~cbbrowne/nonrdbms.html
Rules of the Evil Overlord #1. "My Legions of Terror will have helmets
with clear plexiglass visors, not face-concealing ones."
<http://www.eviloverlord.com/>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chris Browne <cbbrowne(at)acm(dot)org>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Raw devices vs. Filesystems
Date: 2004-04-07 05:26:02
Message-ID: 5719.1081315562@sss.pgh.pa.us
Lists: pgsql-admin pgsql-performance

Chris Browne <cbbrowne(at)acm(dot)org> writes:
> That claim seems really rather remarkable.
> It implies an entirely stunning degree of inefficiency in the
> implementation of filesystems on Solaris.

Solaris has a reputation for having stunning degrees of inefficiency
in a number of places :-(. On the other hand I've also heard it praised
for its ability to survive partial hardware failures (eg, N out of M
CPUs down), so maybe that's the price you gotta pay.

But to get back to the point of this discussion: to allow PG to use raw
devices instead of filesystems, we'd first have to do a ton of
portability work (since raw disk access is nowhere standard), and
abandon our principle that Postgres does not run as root (since raw disk
access is not permitted to non-root processes by any sane sysadmin).
But that last is a mighty comforting principle to have, anytime someone
complains that their el cheapo whitebox PC locks up as soon as they
start to stress the database. I know I'd have wasted a lot more time
chasing random hardware breakages if I couldn't say "system freezes and
filesystem corruption are Clearly Not Our Fault".

After that, we get to implement our own filesystem-equivalent management
of disk space allocation, disk I/O scheduling, etc. Are we really
smarter than all those kernel hackers doing this for a living? I doubt it.
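
[Editorial sketch, not part of the original message.] To give a feel for what "filesystem-equivalent management of disk space allocation" involves, here is a toy first-fit extent allocator in Python. It is purely illustrative: the class name ExtentAllocator is hypothetical, it is not taken from PostgreSQL or any real filesystem, and it ignores everything that makes the real problem hard (persistence, crash safety, fragmentation, concurrency, I/O scheduling).

```python
class ExtentAllocator:
    """Toy first-fit allocator over a fixed number of equal-sized blocks."""

    def __init__(self, total_blocks: int) -> None:
        # Free space is tracked as a sorted list of (start, length) extents.
        self.free = [(0, total_blocks)]

    def allocate(self, n: int) -> int:
        """Return the first block of a run of n free blocks (first fit)."""
        for i, (start, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    del self.free[i]
                else:
                    self.free[i] = (start + n, length - n)
                return start
        raise MemoryError("no free extent large enough")

    def release(self, start: int, n: int) -> None:
        """Give a run of blocks back and merge it with adjacent free extents."""
        self.free.append((start, n))
        self.free.sort()
        merged = [self.free[0]]
        for s, length in self.free[1:]:
            last_start, last_len = merged[-1]
            if last_start + last_len == s:
                merged[-1] = (last_start, last_len + length)
            else:
                merged.append((s, length))
        self.free = merged
```

Even this much is only the bookkeeping; a database on raw devices would also have to store that bookkeeping durably on the device itself.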

After that, we get to re-optimize all the existing Postgres behaviors
that are designed to sit on top of a standard Unix buffering filesystem
layer.

After that, we might reap some performance benefits. Or maybe not.
There's not a heck of a lot of hard evidence that we would --- and
what there is traces to twenty-year-old assumptions about disk drive
and OS behavior, which are quite unlikely to still apply today.

Personally, I have a lot of more-promising projects to pursue...

regards, tom lane


From: Grega Bremec <gregab(at)noviforum(dot)si>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-admin(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: Raw devices vs. Filesystems
Date: 2004-04-07 07:18:58
Message-ID: 20040407071858.GA7973@elbereth.noviforum.si
Lists: pgsql-admin pgsql-performance

...and on Wed, Apr 07, 2004 at 01:26:02AM -0400, Tom Lane used the keyboard:
>
> After that, we get to implement our own filesystem-equivalent management
> of disk space allocation, disk I/O scheduling, etc. Are we really
> smarter than all those kernel hackers doing this for a living? I doubt it.
>
> After that, we get to re-optimize all the existing Postgres behaviors
> that are designed to sit on top of a standard Unix buffering filesystem
> layer.
>
> After that, we might reap some performance benefits. Or maybe not.
> There's not a heck of a lot of hard evidence that we would --- and
> what there is traces to twenty-year-old assumptions about disk drive
> and OS behavior, which are quite unlikely to still apply today.
>
> Personally, I have a lot of more-promising projects to pursue...
>

Has anyone tried PostgreSQL on top of OCFS? Personally, I'm not sure it
would even work, as Oracle clearly state that OCFS was _never_ meant to
be a fully fledged UNIX filesystem with POSIX features such as correct
timestamp updates, inode changes, etc., but OCFSv2 brings some features
that might lead one into thinking they're about to make it suitable for
uses beyond that of just having Oracle databases sitting on top of it.

Furthermore, this filesystem would be a blazing one stop solution for
all replication issues PostgreSQL currently suffers from, as its main
design goal was to present "a consistent file system image across the
servers in a cluster".

Now, if both goals can be achieved in one go, hell, I'm willing to try
it out myself in an attempt to extract off of it, some performance
indicators that could be compared to other database performance tests
sent to both this and the PERFORM mailing list.

So, anyone? :)

Cheers,
--
Grega Bremec
Senior Administrator
Noviforum Ltd., Software & Media
http://www.noviforum.si/


From: Harald Fuchs <hf320(at)protecting(dot)net>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: Raw devices vs. Filesystems
Date: 2004-04-07 13:05:55
Message-ID: pu3c7gx7ik.fsf@srv.protecting.net
Lists: pgsql-admin pgsql-performance

In article <5719(dot)1081315562(at)sss(dot)pgh(dot)pa(dot)us>,
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> But to get back to the point of this discussion: to allow PG to use raw
> devices instead of filesystems, we'd first have to do a ton of
> portability work (since raw disk access is nowhere standard), and
> abandon our principle that Postgres does not run as root (since raw disk
> access is not permitted to non-root processes by any sane sysadmin).

Why not? In MySQL/InnoDB, you do a "chown mysql.daemon /dev/raw/raw1"
(or whatever raw disk you want to access), and that's all.
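
[Editorial sketch, not part of the original message.] The chown approach means the database user, not root, owns the device node. A quick way to see whether that worked is to ask, as that user, whether the path is readable and writable. The Python sketch below assumes a Unix-like system; the helper names are hypothetical, and /dev/raw/raw1 is just the path from the example above.

```python
import os


def running_as_root() -> bool:
    """True if the effective user is root (Unix-only: uses os.geteuid)."""
    return os.geteuid() == 0


def can_read_write(path: str) -> bool:
    """True if the current user may both read and write `path`."""
    return os.access(path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    device = "/dev/raw/raw1"  # path taken from the example; may not exist here
    print("running as root:", running_as_root())
    print("can read/write", device, ":", can_read_write(device))
```

Run as the daemon user after the chown, the second line would be expected to report True while the first stays False.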

> After that, we get to implement our own filesystem-equivalent management
> of disk space allocation, disk I/O scheduling, etc. Are we really
> smarter than all those kernel hackers doing this for a living? I doubt it.

Ditto. I don't have hard numbers for MySQL, but I didn't see any
noticeable improvement when messing with raw disks (at least under
Linux).


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Grega Bremec <gregab(at)noviforum(dot)si>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Chris Browne <cbbrowne(at)acm(dot)org>, pgsql-admin(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Raw devices vs. Filesystems
Date: 2004-04-07 16:09:16
Message-ID: 200404070909.16123.josh@agliodbs.com
Lists: pgsql-admin pgsql-performance

Grega,

> Furthermore, this filesystem would be a blazing one stop solution for
> all replication issues PostgreSQL currently suffers from, as its main
> design goal was to present "a consistent file system image across the
> servers in a cluster".

Does it work, though? Without Oracle admin tools?

> Now, if both goals can be achieved in one go, hell, I'm willing to try
> it out myself in an attempt to extract off of it, some performance
> indicators that could be compared to other database performance tests
> sent to both this and the PERFORM mailing list.

Hey, any test you wanna run is fine with us. I'm pretty sure that OCFS
belongs to Oracle, though, patent & copyright, so we couldn't actually use it
in practice.

If your intention in this test is to show the superiority of raw devices, let
me give you a reality check: barring some major corporate backing getting
involved, we can't possibly implement our own PG-FS for database support. We
already have a TODO list which is far too long for our developer pool, and
implementing a custom FS either takes a large team (OCFS) or several years of
development (Reiser).

Now, if you know somebody who might pay for one, then great ....

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Steve Atkins <steve(at)blighty(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Raw devices vs. Filesystems
Date: 2004-04-07 16:29:47
Message-ID: 20040407162946.GA7271@gp.word-to-the-wise.com
Lists: pgsql-admin pgsql-performance

On Wed, Apr 07, 2004 at 09:09:16AM -0700, Josh Berkus wrote:

> If your intention in this test is to show the superiority of raw devices, let
> me give you a reality check: barring some major corporate backing getting
> involved, we can't possibly implement our own PG-FS for database support. We
> already have a TODO list which is far too long for our developer pool, and
> implementing a custom FS either takes a large team (OCFS) or several years of
> development (Reiser).

Is there any documentation as to what guarantees PostgreSQL requires
from the filesystem, or what posix semantics can be relaxed?

Cheers,
Steve


From: Grega Bremec <gregab(at)noviforum(dot)si>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Raw devices vs. Filesystems
Date: 2004-04-08 04:33:04
Message-ID: 20040408043304.GA28539@elbereth.noviforum.si
Lists: pgsql-admin pgsql-performance

...and on Wed, Apr 07, 2004 at 09:09:16AM -0700, Josh Berkus used the keyboard:
>
> Does it work, though? Without Oracle admin tools?

Hello, Josh. :)

Well, as I said, that's why I was asking - I'm willing to give it a go
if nobody can prove me wrong. :)

> > Now, if both goals can be achieved in one go, hell, I'm willing to try
> > it out myself in an attempt to extract off of it, some performance
> > indicators that could be compared to other database performance tests
> > sent to both this and the PERFORM mailing list.
>
> Hey, any test you wanna run is fine with us. I'm pretty sure that OCFS
> belongs to Oracle, though, patent & copyright, so we couldn't actually use it
> in practice.

I thought you knew - OCFS, OCFS-Tools and OCFSv2 have not only been open-
source for quite a while now - they're released under the GPL.

http://oss.oracle.com/projects/ocfs/
http://oss.oracle.com/projects/ocfs-tools/
http://oss.oracle.com/projects/ocfs2/

I don't know what that means to you (probably nothing good, as PostgreSQL
is released under the BSD license), but it most definitely can be considered
a good thing for the end user, as she can download it, compile, and set it
up on her disks, without the need to pay Oracle royalties. :)

> If your intention in this test is to show the superiority of raw devices, let
> me give you a reality check: barring some major corporate backing getting
> involved, we can't possibly implement our own PG-FS for database support. We
> already have a TODO list which is far too long for our developer pool, and
> implementing a custom FS either takes a large team (OCFS) or several years of
> development (Reiser).

Not really - I was just thinking about something not-entirely-a-filesystem
and POK!, OCFS sprang to mind. It omits many POSIX features that slow down
a traditional filesystem, yet it does know the concept of inodes and most
of all, it's _really_ heavy on caching. As such, it sounded quite promising
to me, but trial, I think, is the best test.

The question does spring up though, that Steve raised in another post - just
for the record, what POSIX semantics can a postmaster live without in a
filesystem?

Cheers,
--
Grega Bremec
Senior Administrator
Noviforum Ltd., Software & Media
http://www.noviforum.si/


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Grega Bremec <gregab(at)noviforum(dot)si>
Cc: pgsql-admin(at)postgresql(dot)org, pgsql-performance(at)postgresql(dot)org
Subject: Re: [PERFORM] Raw devices vs. Filesystems
Date: 2004-04-09 16:02:00
Message-ID: 200404090902.00934.josh@agliodbs.com
Lists: pgsql-admin pgsql-performance

Grega,

> Well, as I said, that's why I was asking - I'm willing to give it a go
> if nobody can prove me wrong. :)

Why not? If you have time?

> I thought you knew - OCFS, OCFS-Tools and OCFSv2 have not only been open-
> source for quite a while now - they're released under the GPL.

Keen! Wonder if we can make them regret it.

Seriously, if Oracle opened this stuff, it's probably because they used some
GPL components in it. It also probably means that it won't work for
anything but Oracle ...

> I don't know what that means to you (probably nothing good, as PostgreSQL
> is released under the BSD license),

Well, it just means that we can't ship OCFS with PostgreSQL.

> The question does spring up though, that Steve raised in another post -
> just for the record, what POSIX semantics can a postmaster live without in
> a filesystem?

You might want to ask that question again on Hackers. I don't know the
answer, myself.

--
Josh Berkus
Aglio Database Solutions
San Francisco


From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-admin(at)postgresql(dot)org
Subject: Re: [PERFORM] Raw devices vs. Filesystems
Date: 2004-04-09 19:34:44
Message-ID: m3ekqxdjxn.fsf@wolfe.cbbrowne.com
Lists: pgsql-admin pgsql-performance

josh(at)agliodbs(dot)com (Josh Berkus) wrote:
>> Well, as I said, that's why I was asking - I'm willing to give it a go
>> if nobody can prove me wrong. :)
>
> Why not? If you have time?

True enough.

>> I thought you knew - OCFS, OCFS-Tools and OCFSv2 have not only been
>> open- source for quite a while now - they're released under the
>> GPL.
>
> Keen! Wonder if we can make them regret it.
>
> Seriously, if Oracle opened this stuff, it's probably because they
> used some GPL components in it. It also probably means that it
> won't work for anything but Oracle ...

It could be that the experiment shows that OCFS isn't all that
helpful. Or that it helps cover inadequacies in certain aspects of
how Oracle accesses filesystems.

If it _does_ show that it is helpful, then that may suggest a
filesystem implementation strategy useful for the BSD folks.

The main "failure case" would be if the exercise shows that using OCFS
is pretty futile.
--
select 'cbbrowne' || '@' || 'acm.org';
http://www3.sympatico.ca/cbbrowne/linux.html
Do you know where your towel is?