Re: O_DSYNC broken on MacOS X?

From: Darren Duncan <darren(at)darrenduncan(dot)net>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: O_DSYNC broken on MacOS X?
Date: 2010-10-01 03:03:21
Message-ID: 4CA54F79.6030508@darrenduncan.net
Lists: pgsql-hackers

Greg Smith wrote:
> You didn't quote the next part of that, which says "fsync() is not
> sufficient to guarantee that your data is on stable
> storage and on MacOS X we provide a fcntl(), called F_FULLFSYNC, to ask
> the drive to flush all buffered data to stable storage." That's exactly
> what turning on fsync_writethrough does in PostgreSQL. See
> http://archives.postgresql.org/pgsql-hackers/2005-04/msg00390.php as the
> first post on this topic that ultimately led to that behavior being
> implemented.
>
> From the perspective of the database, whether or not the behavior is
> standards compliant isn't the issue. Whether pages make it to physical
> disk or not when fsync is called, or when O_DSYNC writes are done on
> platforms that support them, is the important part. If the OS
> doesn't do that, it is doing nothing useful from the perspective of the
> database's expectations. And that's not true on Darwin unless you
> specify F_FULLFSYNC, which doesn't happen by default in PostgreSQL. It
> only does that when you switch wal_sync_method=fsync_writethrough.
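
For illustration, the difference between a plain fsync() and the F_FULLFSYNC request described above can be sketched as follows. This is a minimal sketch, not PostgreSQL's actual implementation: the helper name flush_to_stable_storage is hypothetical, and the fallback to fsync() when F_FULLFSYNC is unavailable or rejected is an assumption made for the example.

```python
import fcntl
import os


def flush_to_stable_storage(fd: int) -> str:
    """Flush fd as durably as the platform allows; return the method used.

    Hypothetical helper for illustration only.
    """
    # fcntl.F_FULLFSYNC is only defined on platforms that provide it (macOS).
    full_fsync = getattr(fcntl, "F_FULLFSYNC", None)
    if full_fsync is not None:
        try:
            # Asks the drive itself to flush its buffered data.
            fcntl.fcntl(fd, full_fsync)
            return "F_FULLFSYNC"
        except OSError:
            # Assumption: if the request is rejected, fall back to fsync().
            pass
    # Plain fsync(): per the quoted text, on MacOS X this alone is not
    # sufficient to guarantee the data is on stable storage.
    os.fsync(fd)
    return "fsync"
```

On a platform without F_FULLFSYNC the helper simply reports "fsync", which makes it visible which guarantee the caller actually got.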

Greg Smith also wrote:
> The main downside to switching the default on either OS X or Windows is
> developers using those platforms for test deployments will suffer greatly from a
> performance drop for data they don't really care about. As those two in
> particular are much more likely to be client development platforms, too, that's
> a scary thing to consider.

I think that, bottom line, Postgres should default to whatever the safest and
most reliable behavior is on each platform, because data integrity is the most
important thing: when a commit returns, its data must actually have been written
to disk. If performance is worse, then so what? Code that does nothing has the
best performance of all, and is also generally useless.

Whenever reliability has to be traded for speed, users should have to explicitly
choose the less reliable option, which would demonstrate they know what they're
doing. Let the testers explicitly choose a faster and less reliable option for
the data they don't care about; otherwise, by default, users who don't know
better should get the safest option, for data they likely care about. That is a
DBMS priority.

This matter reminds me of a discussion on the SQLite list years ago about
whether pragma synchronous=normal or synchronous=full should be the default, and
thankfully 'full' won.
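
For reference, the SQLite setting in question can be chosen explicitly per connection rather than left to the default. A minimal sketch using Python's standard sqlite3 module; the helper names open_with_full_sync and current_sync_level are hypothetical, introduced only for this example.

```python
import sqlite3


def open_with_full_sync(path: str) -> sqlite3.Connection:
    """Open a database and explicitly request synchronous=FULL."""
    conn = sqlite3.connect(path)
    # Set outside any transaction; explicit, so it does not depend on
    # whatever default the SQLite build was compiled with.
    conn.execute("PRAGMA synchronous = FULL")
    return conn


def current_sync_level(conn: sqlite3.Connection) -> int:
    """Return the numeric synchronous level (1 = NORMAL, 2 = FULL)."""
    return conn.execute("PRAGMA synchronous").fetchone()[0]
```

A tester who does not care about the data would instead issue PRAGMA synchronous = NORMAL, which is exactly the kind of explicit opt-out argued for above.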

-- Darren Duncan
