From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 21:53:37
Message-ID: 4CD47CE1.20800@2ndquadrant.com
Lists: pgsql-hackers

Tom Lane wrote:
> If open_dsync is so bad for performance on Linux, maybe it's bad
> everywhere? Should we be rethinking the default preference order?
>

And I've seen the expected sync-write performance gain over fdatasync on
a system with a battery-backed cache running VxFS on Linux, because a
working open_[d]sync means O_DIRECT writes that bypass the OS cache,
thereby reducing cache pollution from WAL writes. This doesn't work by
default on Solaris, because there you have to execute a special system
call (directio()) to get direct output, but if you trick the OS into
doing that via mount options you can observe it there too. The last serious tests
of this area I saw on that platform were from Jignesh, and they
certainly didn't show a significant performance regression running in
sync mode. I vaguely recall seeing a set once that showed a minor loss
compared to fdatasync, but it was too close to make any definitive
statement about reordering.
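The distinction above comes down to which flags the WAL file is opened with versus which sync call is issued after each write. Here is a minimal sketch of the two approaches, in Python for brevity (the flag names mirror the C-level O_DSYNC/O_DIRECT constants; the file path is purely illustrative):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "wal_segment")

# wal_sync_method = fdatasync: open normally, flush data after each write.
fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
os.write(fd, b"wal record")
os.fdatasync(fd)  # flush file data (and only data-critical metadata) to disk
os.close(fd)

# wal_sync_method = open_datasync: O_DSYNC makes every write synchronous,
# so no separate sync call is needed afterward.
fd = os.open(path, os.O_WRONLY | os.O_DSYNC)
os.write(fd, b"wal record")  # returns only once the data is on stable storage
os.close(fd)

# Platforms where open_datasync really wins typically add O_DIRECT as well,
# bypassing the OS cache entirely. Actually writing through O_DIRECT requires
# aligned buffers, so this line only shows the flag combination:
flags = os.O_WRONLY | os.O_DSYNC | getattr(os, "O_DIRECT", 0)
```

The cache-pollution point is exactly that last combination: with O_DIRECT the WAL traffic never displaces useful pages from the OS cache.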

I haven't seen any report yet of a serious performance regression in the
new Linux case that was written by someone who understands fully how
fsync and drive cache flushing are supposed to interact. It's been
obvious for a year now that the reports from Phoronix about this had no
idea what they were actually testing. I didn't see anything from
Marti's report that definitively answers whether this is anything other
than Linux finally doing the right thing to flush drive caches out when
sync writes happen. There may be a performance regression here related
to WAL data going out in smaller chunks than it used to, but in all the
reports I've seen, that hasn't been isolated well enough to tell whether
it's a performance loss or a reliability gain we're seeing, so it's too
early to consider making any changes.

I'd like to see some output from the 9.0 test_fsync on one of these
RHEL6 systems without a battery-backed write cache as a first step
here. That should start to shed some light on what's happening. I just
bumped up the priority on the pending upgrade of my spare laptop to the
RHEL6 beta, which I had been trying to find time for, so I can
investigate this further myself.
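Until real test_fsync numbers from RHEL6 show up, the kind of measurement that tool performs can be approximated in a few lines. This is a rough stand-in, not the actual test_fsync program: it just times small sequential writes under each sync method (the iteration count and block size here are arbitrary choices):

```python
import os
import tempfile
import time

def time_sync_writes(sync, n=50, size=8192):
    """Write n blocks of `size` bytes, syncing after each, and return seconds."""
    path = os.path.join(tempfile.mkdtemp(), "fsync_test")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    block = b"\0" * size
    start = time.monotonic()
    for _ in range(n):
        os.write(fd, block)
        sync(fd)
    elapsed = time.monotonic() - start
    os.close(fd)
    return elapsed

# fdatasync flushes the file data plus only the metadata needed to read it
# back (such as the file size), skipping things like mtime, which is why it
# can come out ahead of a full fsync on some filesystems.
t_fsync = time_sync_writes(os.fsync)
t_fdatasync = time_sync_writes(os.fdatasync)
print(f"fsync: {t_fsync:.3f}s  fdatasync: {t_fdatasync:.3f}s")
```

On a drive that honors cache flushes, both numbers should be dominated by the flush latency; wildly fast results are themselves a hint that the cache isn't actually being flushed.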

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
