[PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+

Lists: pgsql-hackers
From: Marti Raudsepp <marti(at)juffo(dot)org>
To: pgsql-hackers(at)postgresql(dot)org
Subject: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 17:23:35
Message-ID: AANLkTim+iqYNf2tAoz+q1OoVdGdK__V0dnvJ-+2JUPD9@mail.gmail.com
Lists: pgsql-hackers

Hi list,

PostgreSQL's default settings change when built with Linux kernel
headers 2.6.33 or newer. As discussed on the pgsql-performance list,
this causes a significant performance regression:
http://archives.postgresql.org/pgsql-performance/2010-10/msg00602.php

NB! I am not proposing to change the default -- to the contrary --
this patch restores old behavior. Users might be in for a nasty
performance surprise when re-building their Postgres with newer Linux
headers (as was I), so I propose that this change should be made in
all supported releases.

-- commit message --
Revert default wal_sync_method to fdatasync on Linux 2.6.33+

Linux kernel headers from 2.6.33 (and later) change the behavior of the
O_SYNC flag. Previously O_SYNC was aliased to O_DSYNC, which caused
PostgreSQL to use fdatasync as the default instead.

Starting with kernel 2.6.33, the definitions of O_DSYNC and
O_SYNC differ. When built with headers from these newer kernels,
PostgreSQL will default to using open_datasync. This patch reverts the
Linux default to fdatasync, which has had much more testing over time
and also significantly better performance.
-- end commit message --

Earlier kernel headers defined both O_SYNC and O_DSYNC as 0x1000.
Headers from 2.6.33 and later define O_SYNC=0x101000 and O_DSYNC=0x1000
(since the old behavior on most filesystems was always equivalent to
POSIX O_DSYNC).

More details at:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=6b2f3d1f769be5779b479c37800229d9a4809fc3
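
As a rough illustration of the flag values quoted above, the following
Python sketch models how the sync bits relate to each other. The constant
values come from this message; the names EXTRA_SYNC_BIT and
semantics_on_new_kernel are made up for the example, and the function is
a simplified model, not kernel code.

```python
# Flag values as quoted in the message above.
OLD_O_SYNC = 0x1000      # headers before 2.6.33
OLD_O_DSYNC = 0x1000     # same value, so the two flags were indistinguishable
NEW_O_DSYNC = 0x1000     # headers from 2.6.33 on
NEW_O_SYNC = 0x101000    # now a superset of the O_DSYNC bit

# Hypothetical name: the bit that separates the new O_SYNC from O_DSYNC.
EXTRA_SYNC_BIT = NEW_O_SYNC & ~NEW_O_DSYNC


def semantics_on_new_kernel(open_flags):
    """Simplified model: which sync semantics the given open() flags request
    when interpreted with the 2.6.33+ definitions."""
    if (open_flags & NEW_O_SYNC) == NEW_O_SYNC:
        return "O_SYNC"
    if open_flags & NEW_O_DSYNC:
        return "O_DSYNC"
    return "none"
```

Under this model, a binary built against old headers that passes the old
O_SYNC value is treated as requesting O_DSYNC, which matches the remark
that the old behavior was equivalent to POSIX O_DSYNC.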

Currently PostgreSQL's include/access/xlogdefs.h defaults to using
open_datasync when O_SYNC != O_DSYNC, otherwise fdatasync is used.

Since other platforms might want to default to fdatasync in the
future, too, I defined a new PLATFORM_DEFAULT_SYNC_METHOD constant in
include/port/linux.h. I don't know if this is the best way to do it.
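
The selection rule and the proposed override can be sketched roughly as
follows. This is a simplified Python model of the behavior described in
this message, not the actual preprocessor logic; the function name and
its parameters are assumptions made for illustration, and only the
PLATFORM_DEFAULT_SYNC_METHOD idea comes from the patch description.

```python
def default_sync_method(o_sync, o_dsync, have_fdatasync=True,
                        platform_default=None):
    """Pick a default wal_sync_method from header-level facts.

    o_sync / o_dsync   -- flag values from the system headers (o_dsync may
                          be None if the platform does not define it)
    platform_default   -- stands in for PLATFORM_DEFAULT_SYNC_METHOD; when
                          set, it wins over the flag comparison
    """
    if platform_default is not None:
        return platform_default
    # A distinct O_DSYNC makes open_datasync the preferred default.
    if o_dsync is not None and o_dsync != o_sync:
        return "open_datasync"
    if have_fdatasync:
        return "fdatasync"
    return "fsync"
```

With the old header values (both 0x1000) this returns "fdatasync"; with
the new ones it returns "open_datasync" unless a platform default is
supplied, which is the behavior change the patch is meant to undo on
Linux.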

Regards,
Marti

Attachment Content-Type Size
0001-Revert-default-wal_sync_method-to-fdatasync-on-Linux.patch text/x-patch 2.4 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 18:13:47
Message-ID: 27641.1288980827@sss.pgh.pa.us
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> PostgreSQL's default settings change when built with Linux kernel
> headers 2.6.33 or newer. As discussed on the pgsql-performance list,
> this causes a significant performance regression:
> http://archives.postgresql.org/pgsql-performance/2010-10/msg00602.php

> NB! I am not proposing to change the default -- to the contrary --
> this patch restores old behavior.

I'm less than convinced this is the right approach ...

If open_dsync is so bad for performance on Linux, maybe it's bad
everywhere? Should we be rethinking the default preference order?

regards, tom lane


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 19:15:26
Message-ID: AANLkTikSsmirokZe5GmSTvtt0qYYOojADoc_kZEf=xAQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 5, 2010 at 20:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I'm less than convinced this is the right approach ...
>
> If open_dsync is so bad for performance on Linux, maybe it's bad
> everywhere?  Should we be rethinking the default preference order?

Sure, maybe for PostgreSQL 9.1

But the immediate problem is older releases (8.1 - 9.0) specifically
on Linux. Something as innocuous as re-building your DB on a newer
kernel will radically affect performance -- even when the DB kernel
didn't change.

So I think we should aim to fix old versions first. Do you disagree?

Regards,
Marti


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 19:20:08
Message-ID: 29570.1288984808@sss.pgh.pa.us
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> On Fri, Nov 5, 2010 at 20:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> If open_dsync is so bad for performance on Linux, maybe it's bad
>> everywhere? Should we be rethinking the default preference order?

> So I think we should aim to fix old versions first. Do you disagree?

What's that got to do with it?

regards, tom lane


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 19:52:45
Message-ID: AANLkTimiM-3qG50rk07H2gXpu=vwFGyKWhRFte9Arkg9@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 5, 2010 at 21:20, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Marti Raudsepp <marti(at)juffo(dot)org> writes:
>> On Fri, Nov 5, 2010 at 20:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> If open_dsync is so bad for performance on Linux, maybe it's bad
>>> everywhere?  Should we be rethinking the default preference order?
>
>> So I think we should aim to fix old versions first. Do you disagree?
>
> What's that got to do with it?

I'm not sure what you're asking.

Surely changing the default wal_sync_method for all OSes in
maintenance releases is out of the question, no?

Regards,
Marti


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 20:16:40
Message-ID: 896.1288988200@sss.pgh.pa.us
Lists: pgsql-hackers

Marti Raudsepp <marti(at)juffo(dot)org> writes:
> On Fri, Nov 5, 2010 at 21:20, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> What's that got to do with it?

> I'm not sure what you're asking.

> Surely changing the default wal_sync_method for all OSes in
> maintenance releases is out of the question, no?

Well, if we could leave well enough alone it would be fine with me,
but I think our hand is being forced by the Linux kernel hackers.
I don't really think that "change the default on Linux" is that
much nicer than "change the default everywhere" when it comes to
what we ought to consider back-patching. In any case, you're getting
ahead of the game: we need to decide on the desired behavior first and
then think about what to patch. Do the performance results that were
cited show that open_dsync is generally inferior to fdatasync? If so,
why would we think that that conclusion is Linux-specific?

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marti Raudsepp <marti(at)juffo(dot)org>
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 20:54:57
Message-ID: 201011052154.57775.andres@anarazel.de
Lists: pgsql-hackers

On Friday 05 November 2010 19:13:47 Tom Lane wrote:
> Marti Raudsepp <marti(at)juffo(dot)org> writes:
> > PostgreSQL's default settings change when built with Linux kernel
> > headers 2.6.33 or newer. As discussed on the pgsql-performance list,
> > this causes a significant performance regression:
> > http://archives.postgresql.org/pgsql-performance/2010-10/msg00602.php
> >
> > NB! I am not proposing to change the default -- to the contrary --
> > this patch restores old behavior.
>
> I'm less than convinced this is the right approach ...
>
> If open_dsync is so bad for performance on Linux, maybe it's bad
> everywhere? Should we be rethinking the default preference order?
I fail to see how it could be beneficial on *any* non-buggy platform.
Especially with small wal_buffers and larger commits (but also otherwise) it
increases the amount of synchronous writes the OS has to do tremendously.

* It removes about all benefits of XLogBackgroundFlush()
* It removes any chances of reordering after writing.
* It makes AdvanceXLInsertBuffer synchronous if it has to write out

What's the theory about placing it so high in the preferences list?
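
To make the argument about synchronous writes concrete, here is a small
back-of-the-envelope model in Python. It is only a sketch under stated
assumptions: an 8 kB WAL page, one synchronous wait per write() call for
the open_* methods, and a single flush per commit for fsync/fdatasync.
The function name and the pages_per_write parameter are hypothetical and
do not correspond to anything in the backend.

```python
import math

WAL_PAGE = 8192  # assumed WAL block size in bytes


def sync_waits(commit_bytes, method, pages_per_write=1):
    """Estimate how many times a commit has to wait for stable storage.

    pages_per_write models how many pages go out in one write() call;
    small wal_buffers would push this toward 1.
    """
    pages = max(1, math.ceil(commit_bytes / WAL_PAGE))
    if method in ("open_sync", "open_datasync"):
        # Every write() on an O_SYNC/O_DSYNC descriptor is itself synchronous.
        return math.ceil(pages / pages_per_write)
    if method in ("fsync", "fdatasync"):
        # Writes are buffered; only the final flush waits.
        return 1
    raise ValueError("unknown method: %s" % method)
```

In this model a 64 kB commit written one page at a time waits eight times
with open_datasync and once with fdatasync.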

Andres


From: Marti Raudsepp <marti(at)juffo(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 21:09:48
Message-ID: AANLkTi=3wMSSTYpGanRRNtADf3TpbrNY9on+AoyHhfzy@mail.gmail.com
Lists: pgsql-hackers

On Fri, Nov 5, 2010 at 22:16, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I don't really think that "change the default on Linux" is that
> much nicer than "change the default everywhere" when it comes to
> what we ought to consider back-patching.  In any case, you're getting
> ahead of the game: we need to decide on the desired behavior first and
> then think about what to patch.

We should be trying to guarantee the stability of maintenance
releases. "Stability" includes consistent defaults. The fact that
Linux now distinguishes between these two flags has a very surprising
effect on PostgreSQL's defaults; an effect that wasn't intended by any
developer, is not documented anywhere, and certainly won't be
anticipated by users.

Do you reject this premise?

As newer distros are adopting 2.6.33+ kernels, more and more people
will shoot themselves in the foot by this change. I am also worried
that it will have a direct effect on PostgreSQL adoption.

Regards,
Marti


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Marti Raudsepp <marti(at)juffo(dot)org>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 21:53:37
Message-ID: 4CD47CE1.20800@2ndquadrant.com
Lists: pgsql-hackers

Tom Lane wrote:
> If open_dsync is so bad for performance on Linux, maybe it's bad
> everywhere? Should we be rethinking the default preference order?
>

And I've seen the expected sync write performance gain over fdatasync on
a system with a battery-backed cache running VxFS on Linux, because
working open_[d]sync means O_DIRECT writes bypassing the OS cache, and
therefore reducing cache pollution from WAL writes. This doesn't work
by default on Solaris because they have a special system call you have
to execute for direct output, but if you trick the OS into doing that
via mount options you can observe it there too. The last serious tests
of this area I saw on that platform were from Jignesh, and they
certainly didn't show a significant performance regression running in
sync mode. I vaguely recall seeing a set once that showed a minor loss
compared to fdatasync, but it was too close to make any definitive
statement about reordering.

I haven't seen any report yet of a serious performance regression in the
new Linux case that was written by someone who understands fully how
fsync and drive cache flushing are supposed to interact. It's been
obvious for a year now that the reports from Phoronix about this had no
idea what they were actually testing. I didn't see anything from
Marti's report that definitively answers whether this is anything other
than Linux finally doing the right thing to flush drive caches out when
sync writes happen. There may be a performance regression here related
to WAL data going out in smaller chunks than it used to, but in all the
reports I've seen, that hasn't been isolated well enough to consider
making any changes yet--to tell if it's a performance loss or a
reliability gain we're seeing.

I'd like to see some output from the 9.0 test_fsync on one of these
RHEL6 systems on a system without a battery backed write cache as a
first step here. That should start to shed some light on what's
happening. I just bumped up the priority on the pending upgrade of my
spare laptop to the RHEL6 beta I had been trying to find time for, so I
can investigate this further myself.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)postgresql(dot)org, Marti Raudsepp <marti(at)juffo(dot)org>
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 22:07:10
Message-ID: 3002.1288994830@sss.pgh.pa.us
Lists: pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On Friday 05 November 2010 19:13:47 Tom Lane wrote:
>> If open_dsync is so bad for performance on Linux, maybe it's bad
>> everywhere? Should we be rethinking the default preference order?

> I fail to see how it could be beneficial on *any* non-buggy platform.
> Especially with small wal_buffers and larger commits (but also otherwise) it
> increases the amount of synchronous writes the OS has to do tremendously.

> * It removes about all benefits of XLogBackgroundFlush()
> * It removes any chances of reordering after writing.
> * It makes AdvanceXLInsertBuffer synchronous if it has to write out

> What's the theory about placing it so high in the preferences list?

I think the original idea was that if you had a dedicated WAL drive then
sync-on-write would be reasonable. But that was a very long time ago
and I'm not sure that the system's behavior is anything like what it was
then; for that matter I'm not sure we had proof that it was an optimal
choice even back then. That's why I want to revisit the choice of
default and not just go for "minimum" change.

regards, tom lane


From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Marti Raudsepp <marti(at)juffo(dot)org>
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 22:08:00
Message-ID: 201011052308.00616.andres@anarazel.de
Lists: pgsql-hackers

On Friday 05 November 2010 22:53:37 Greg Smith wrote:
> > If open_dsync is so bad for performance on Linux, maybe it's bad
> > everywhere? Should we be rethinking the default preference order?
> >
> >
>
> And I've seen the expected sync write performance gain over fdatasync on
> a system with a battery-backed cache running VxFS on Linux, because
> working open_[d]sync means O_DIRECT writes bypassing the OS cache, and
> therefore reducing cache pollution from WAL writes.
That looks like a setup where you definitely need to know what you are
doing, i.e. one that doesn't need wal_sync_method to default to
open_datasync...

Andres


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Marti Raudsepp <marti(at)juffo(dot)org>
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 22:14:02
Message-ID: 4CD481AA.6030407@agliodbs.com
Lists: pgsql-hackers


> I think the original idea was that if you had a dedicated WAL drive then
> sync-on-write would be reasonable. But that was a very long time ago
> and I'm not sure that the system's behavior is anything like what it was
> then; for that matter I'm not sure we had proof that it was an optimal
> choice even back then. That's why I want to revisit the choice of
> default and not just go for "minimum" change.

What platforms do we need to test to get a reasonable idea? Solaris,
FreeBSD, Windows?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Marti Raudsepp <marti(at)juffo(dot)org>
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 22:31:07
Message-ID: 3471.1288996267@sss.pgh.pa.us
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> What platforms do we need to test to get a reasonable idea? Solaris,
> FreeBSD, Windows?

At least. I'm hoping that Greg Smith will take the lead on testing
this, since he seems to have spent the most time in the area so far.

regards, tom lane


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Marti Raudsepp <marti(at)juffo(dot)org>
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-05 22:33:03
Message-ID: 4CD4861F.3040207@agliodbs.com
Lists: pgsql-hackers

On 11/5/10 3:31 PM, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> What platforms do we need to test to get a reasonable idea? Solaris,
>> FreeBSD, Windows?
>
> At least. I'm hoping that Greg Smith will take the lead on testing
> this, since he seems to have spent the most time in the area so far.

I could test at least 1 version of Solaris, I think.

Greg, any recommendations on pgbench parameters?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org, Marti Raudsepp <marti(at)juffo(dot)org>
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-07 23:44:48
Message-ID: 4CD739F0.20308@2ndquadrant.com
Lists: pgsql-hackers

Tom Lane wrote:
> I'm hoping that Greg Smith will take the lead on testing
> this, since he seems to have spent the most time in the area so far.
>

It's not coincidence that the chapter of my book I convinced the
publisher to release as a sample is the one that covers this area; this
mess has been visibly approaching for some time now. I'm going to put
RHEL6 onto a system and start collecting some proper slowdown numbers
this week, then pass along a suggested test regime for others.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Marti Raudsepp <marti(at)juffo(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-11-15 01:09:27
Message-ID: 4CE08847.9060006@2ndquadrant.com
Lists: pgsql-hackers

Marti Raudsepp wrote:
> PostgreSQL's default settings change when built with Linux kernel
> headers 2.6.33 or newer. As discussed on the pgsql-performance list,
> this causes a significant performance regression:
> http://archives.postgresql.org/pgsql-performance/2010-10/msg00602.php
>
> NB! I am not proposing to change the default -- to the contrary --
> this patch restores old behavior.

Following our standard community development model, I've put this patch
onto our CommitFest list:
https://commitfest.postgresql.org/action/patch_view?id=432 and assigned
myself as the reviewer. I didn't look at this until now because I
already had some patch development and review work to finish before the
CommitFest deadline we just crossed. Now I can go back to reviewing
other people's work.

P.S. There is no pgsql-patch list anymore; everything goes through the
hackers list now.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-03 20:25:02
Message-ID: 4CF9521E.2090708@agliodbs.com
Lists: pgsql-hackers

All,

So, this week I've had my hands on a medium-high-end test system where I
could test various wal_sync_methods. This is a 24-core Intel Xeon
machine with 72GB of ram, and 8 internal 10K SAS disks attached to a
raid controller with 512MB BBU write cache. 2 of the disks are in a
RAID1, which supports both an Ext4 partition and an XFS partition. The
remaining disks are in a RAID10 which only supports a single pgdata
partition.

This is running on RHEL6, Linux Kernel: 2.6.32-71.el6.x86_64

I think this kind of a system much better represents our users who are
performance-conscious than testing on people's laptops or on VMs does.

I modified test_fsync in two ways to run this; first, to make it support
O_DIRECT, and second to make it run in the *current* directory. I think
the second change should be permanent; I imagine that a lot of people
who are running test_fsync are not aware that they're actually testing
the performance of /var/tmp, not whatever FS mount they wanted to test.
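
For readers who want to try something similar without patching
test_fsync, a much smaller probe can be sketched in Python. This is not
test_fsync and not the attached patch; it is a minimal sketch, assuming a
Linux system where os.O_DSYNC, os.O_SYNC and os.fdatasync are available.
The function name bench and its parameters are made up for the example.
The point it illustrates is the second change: the test file is created
in a directory the caller chooses, not in a fixed temporary location.

```python
import os
import tempfile
import time

BLOCK = b"\0" * 8192  # one 8k write per loop, like the "8k write" rows


def bench(directory, method, loops=200):
    """Return 8k writes per second for `method`, using a file in `directory`."""
    flags = os.O_RDWR
    if method == "open_datasync":
        flags |= os.O_DSYNC
    elif method == "open_sync":
        flags |= os.O_SYNC
    elif method not in ("fdatasync", "fsync"):
        raise ValueError("unknown method: %s" % method)

    # Create the scratch file on the filesystem that is actually under test.
    tmp_fd, path = tempfile.mkstemp(dir=directory, prefix="sync_bench_")
    os.close(tmp_fd)
    fd = os.open(path, flags)
    try:
        start = time.perf_counter()
        for _ in range(loops):
            os.lseek(fd, 0, os.SEEK_SET)
            os.write(fd, BLOCK)
            if method == "fdatasync":
                os.fdatasync(fd)
            elif method == "fsync":
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return loops / elapsed
```

Calling bench(os.getcwd(), "fdatasync") measures the filesystem holding
the current directory, while bench("/var/tmp", "fdatasync") measures
whatever /var/tmp lives on; the two can differ a lot on a machine like
the one described here.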

Here are the results. I think you'll agree that, at least on Linux, the
benefits of o_sync and o_dsync as defaults would be highly questionable.
Particularly, it seems that if O_DIRECT support is absent, fdatasync is
across-the-board faster:

=============

test_fsync with directIO, on 2 drives, XFS tuned:

Loops = 10000

Simple write:
8k write 198629.457/second

Compare file sync methods using one write:
open_datasync 8k write 14798.263/second
open_sync 8k write 14316.864/second
8k write, fdatasync 12198.871/second
8k write, fsync 12371.843/second

Compare file sync methods using two writes:
2 open_datasync 8k writes 7362.805/second
2 open_sync 8k writes 7156.685/second
8k write, 8k write, fdatasync 10613.525/second
8k write, 8k write, fsync 10597.396/second

Compare open_sync with different sizes:
open_sync 16k write 13631.816/second
2 open_sync 8k writes 7645.038/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close 11427.096/second
8k write, close, fsync 11321.220/second

test_fsync with directIO, on 6 drives RAID10, XFS tuned:

Loops = 10000

Simple write:
8k write 196494.537/second

Compare file sync methods using one write:
open_datasync 8k write 14909.974/second
open_sync 8k write 14559.326/second
8k write, fdatasync 11046.025/second
8k write, fsync 11046.916/second

Compare file sync methods using two writes:
2 open_datasync 8k writes 7349.223/second
2 open_sync 8k writes 7667.395/second
8k write, 8k write, fdatasync 9560.495/second
8k write, 8k write, fsync 9557.287/second

Compare open_sync with different sizes:
open_sync 16k write 12060.049/second
2 open_sync 8k writes 7650.746/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close 9377.107/second
8k write, close, fsync 9251.233/second

test_fsync without directIO on RAID1, Ext4, data=journal:

Loops = 10000

Simple write:
8k write 150514.005/second

Compare file sync methods using one write:
open_datasync 8k write 4012.070/second
open_sync 8k write 5476.898/second
8k write, fdatasync 5512.649/second
8k write, fsync 5803.814/second

Compare file sync methods using two writes:
2 open_datasync 8k writes 2910.401/second
2 open_sync 8k writes 2817.377/second
8k write, 8k write, fdatasync 5041.608/second
8k write, 8k write, fsync 5155.248/second

Compare open_sync with different sizes:
open_sync 16k write 4895.956/second
2 open_sync 8k writes 2720.875/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close 4724.052/second
8k write, close, fsync 4694.776/second

test_fsync without directIO on RAID1, XFS, tuned:

Loops = 10000

Simple write:
8k write 199796.208/second

Compare file sync methods using one write:
open_datasync 8k write 12553.525/second
open_sync 8k write 12535.978/second
8k write, fdatasync 12268.298/second
8k write, fsync 12305.875/second

Compare file sync methods using two writes:
2 open_datasync 8k writes 6323.835/second
2 open_sync 8k writes 6285.169/second
8k write, 8k write, fdatasync 10893.756/second
8k write, 8k write, fsync 10752.607/second

Compare open_sync with different sizes:
open_sync 16k write 11053.510/second
2 open_sync 8k writes 6293.270/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close 11087.482/second
8k write, close, fsync 11157.477/second

test_fsync without directIO on RAID10, 6 drives, XFS Tuned:

Loops = 10000

Simple write:
8k write 197262.003/second

Compare file sync methods using one write:
open_datasync 8k write 12784.699/second
open_sync 8k write 12684.512/second
8k write, fdatasync 12404.547/second
8k write, fsync 12452.757/second

Compare file sync methods using two writes:
2 open_datasync 8k writes 6376.587/second
2 open_sync 8k writes 6364.113/second
8k write, 8k write, fdatasync 9895.699/second
8k write, 8k write, fsync 9866.886/second

Compare open_sync with different sizes:
open_sync 16k write 10156.491/second
2 open_sync 8k writes 6400.889/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
8k write, fsync, close 11142.620/second
8k write, close, fsync 11076.393/second
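
One way to read the five result sets above is to compute, per
configuration, how much faster fdatasync is than open_datasync. The
Python sketch below does that with the numbers copied from the output
above; the dictionary layout, the short labels and the function name are
only illustrative.

```python
# writes/second, copied from the test_fsync output above:
# label -> {"one": (open_datasync, fdatasync), "two": (open_datasync, fdatasync)}
RESULTS = {
    "directIO, 2 drives, XFS": {
        "one": (14798.263, 12198.871), "two": (7362.805, 10613.525)},
    "directIO, RAID10, XFS": {
        "one": (14909.974, 11046.025), "two": (7349.223, 9560.495)},
    "no directIO, RAID1, Ext4": {
        "one": (4012.070, 5512.649), "two": (2910.401, 5041.608)},
    "no directIO, RAID1, XFS": {
        "one": (12553.525, 12268.298), "two": (6323.835, 10893.756)},
    "no directIO, RAID10, XFS": {
        "one": (12784.699, 12404.547), "two": (6376.587, 9895.699)},
}


def fdatasync_speedup(results):
    """Return label -> (one-write ratio, two-write ratio) of fdatasync
    throughput over open_datasync throughput; above 1.0 means fdatasync wins."""
    ratios = {}
    for label, rows in results.items():
        one_open, one_fdata = rows["one"]
        two_open, two_fdata = rows["two"]
        ratios[label] = (one_fdata / one_open, two_fdata / two_open)
    return ratios


if __name__ == "__main__":
    for label, (one, two) in fdatasync_speedup(RESULTS).items():
        print("%-28s one write: %.2fx  two writes: %.2fx" % (label, one, two))
```

On these numbers fdatasync is ahead in every two-write test, while in the
one-write tests it is ahead only on the Ext4 run; on the XFS runs
open_datasync is slightly faster for a single write.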

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-05 03:12:57
Message-ID: 4CFB0339.7040009@agliodbs.com
Lists: pgsql-hackers

All,

While I have this machine available I've been trying to run some
performance tests using pgbench and various wal_sync_methods. However,
I seem to be maxing out at the speed of pgbench itself; no matter which
wal_sync_method I use (including "fsync"), it tops out at around 2750 TPS.

Of course, it's also possible that the wal_sync_method does not in fact
make a difference in throughput.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Greg Smith <greg(at)2ndquadrant(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-05 22:12:18
Message-ID: 4CFC0E42.6010801@2ndquadrant.com
Lists: pgsql-hackers

Josh Berkus wrote:
> I modified test_fsync in two ways to run this; first, to make it support
> O_DIRECT, and second to make it run in the *current* directory.

Patch please? I agree with the latter change; what test_fsync does is
surprising.

I suggested a while ago that we refactor test_fsync to use a common set
of source code as the database itself for detecting things related to
wal_sync_method, perhaps just extract that whole set of DEFINE macro
logic to somewhere else. That happened at a bad time in the development
cycle (right before a freeze) and nobody ever got back to the idea
afterwards. If this code is getting touched, and it's clear it is in
some direction, I'd like to see things change so it's not possible for
the two to diverge again afterwards.

--
Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
PostgreSQL Training, Services and Support www.2ndQuadrant.us


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-07 00:41:06
Message-ID: 4CFD82A2.3040405@agliodbs.com
Lists: pgsql-hackers

On 12/5/10 2:12 PM, Greg Smith wrote:
> Josh Berkus wrote:
>> I modified test_fsync in two ways to run this; first, to make it support
>> O_DIRECT, and second to make it run in the *current* directory.
>
> Patch please? I agree with the latter change; what test_fsync does is
> surprising.

Attached.

Making it support O_DIRECT would be possible but more complex; I don't
see the point unless we think we're going to have open_sync_with_odirect
as a separate option.

> I suggested a while ago that we refactor test_fsync to use a common set
> of source code as the database itself for detecting things related to
> wal_sync_method, perhaps just extract that whole set of DEFINE macro
> logic to somewhere else. That happened at a bad time in the development
> cycle (right before a freeze) and nobody ever got back to the idea
> afterwards. If this code is getting touched, and it's clear it is in
> some direction, I'd like to see things change so it's not possible for
> the two to diverge again afterwards.

I don't quite follow you. Maybe nobody else did last time, either.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

Attachment Content-Type Size
test_fsync.patch text/x-patch 700 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-07 01:38:14
Message-ID: 5948.1291685894@sss.pgh.pa.us
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> Making it support O_DIRECT would be possible but more complex; I don't
> see the point unless we think we're going to have open_sync_with_odirect
> as a separate option.

Whether it's complex or not isn't really the issue. The issue is that
what test_fsync is testing had better match what the backend does, or
people will be making choices based on not-comparable test results.
I think we should have test_fsync just automatically fold in O_DIRECT
the same way the backend does.
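
A sketch of what "folding in O_DIRECT" could look like, written in Python
for brevity. It assumes, based on the earlier remark in this thread that
working open_[d]sync implies O_DIRECT writes, that the direct-I/O flag is
added only for the open_sync and open_datasync methods. The function name
and the allow_direct switch are hypothetical; this is not the backend's
code, and it assumes a platform where os.O_SYNC and os.O_DSYNC exist.

```python
import os


def sync_open_flags(method, allow_direct=True):
    """Return the extra open() flags a test tool would use for `method`.

    O_DIRECT is folded in only for the open_* methods; the flush-based
    methods add no open() flags and sync explicitly after writing.
    """
    # os.O_DIRECT is missing on some platforms; fall back to no extra bit.
    direct = getattr(os, "O_DIRECT", 0) if allow_direct else 0
    if method == "open_sync":
        return os.O_SYNC | direct
    if method == "open_datasync":
        return os.O_DSYNC | direct
    if method in ("fsync", "fdatasync", "fsync_writethrough"):
        return 0
    raise ValueError("unknown method: %s" % method)
```

Keeping the rule in one function is what makes the comparison fair: if
the tool and the server both derive their flags from the same rule, their
numbers describe the same kind of write.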

regards, tom lane


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-07 01:59:53
Message-ID: 4CFD9519.6020705@dunslane.net
Lists: pgsql-hackers

On 12/06/2010 08:38 PM, Tom Lane wrote:
> Josh Berkus<josh(at)agliodbs(dot)com> writes:
>> Making it support O_DIRECT would be possible but more complex; I don't
>> see the point unless we think we're going to have open_sync_with_odirect
>> as a separate option.
> Whether it's complex or not isn't really the issue. The issue is that
> what test_fsync is testing had better match what the backend does, or
> people will be making choices based on not-comparable test results.
> I think we should have test_fsync just automatically fold in O_DIRECT
> the same way the backend does.
>
>

Indeed. We were quite confused for a while when we were dealing with
this about a week ago, and my handwritten test program failed as
expected but test_fsync didn't. Anything other than behaving just as the
backend does violates POLA, in my view.

cheers

andrew


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-07 02:05:12
Message-ID: 4CFD9658.6090304@agliodbs.com
Lists: pgsql-hackers


> Whether it's complex or not isn't really the issue. The issue is that
> what test_fsync is testing had better match what the backend does, or
> people will be making choices based on not-comparable test results.
> I think we should have test_fsync just automatically fold in O_DIRECT
> the same way the backend does.

OK, patch coming then. Right now test_fsync aborts when O_DIRECT fails.
What should I have it do instead?

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-07 02:13:29
Message-ID: 6900.1291688009@sss.pgh.pa.us
Lists: pgsql-hackers

Josh Berkus <josh(at)agliodbs(dot)com> writes:
> OK, patch coming then. Right now test_fsync aborts when O_DIRECT fails.
> What should I have it do instead?

Report that it fails, and keep testing the other methods.

regards, tom lane
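
A rough sketch of that behavior, for illustration only: each method is tried in turn, and a failed open() prints an "unavailable" line instead of aborting the run. The function names and the label/flag arrays are hypothetical and do not come from the actual patch:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to open the test file with the extra flags for
 * one method.  Returns 1 if the open succeeded (the method would then be
 * timed), or 0 after reporting that the method is unavailable.
 */
static int
sketch_try_method(const char *path, const char *label, int extra_flags)
{
    int fd = open(path, O_RDWR | O_CREAT | extra_flags, 0600);

    if (fd < 0)
    {
        printf("\t(unavailable: %s)\n", label);
        return 0;
    }
    /* The timed write/sync loop for this method would go here. */
    close(fd);
    return 1;
}

/* Try every method; a failure in one does not stop the others. */
static int
sketch_try_all(const char *path, const char *const labels[],
               const int extra_flags[], int n)
{
    int i;
    int tested = 0;

    assert(path != NULL && n >= 0);
    for (i = 0; i < n; i++)
        tested += sketch_try_method(path, labels[i], extra_flags[i]);
    return tested;
}
```

The return value is the number of methods that could actually be tested, which a caller could use to warn when nothing ran at all.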


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-09 20:14:25
Message-ID: 4D0138A1.1070603@agliodbs.com
Lists: pgsql-hackers

On 12/6/10 6:13 PM, Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> OK, patch coming then. Right now test_fsync aborts when O_DIRECT fails.
>> What should I have it do instead?
>
> Report that it fails, and keep testing the other methods.

Patch attached. Includes a fair amount of comment cleanup, since
existing comments did not meet our current project standards. Tests all
6 of the methods we support separately.

Some questions, though:

(1) Why are we doing the open_sync different-size write test? AFAIK,
this doesn't match any behavior which PostgreSQL has.

(2) In this patch, I'm stepping down the number of loops which
fsync_writethrough does by 90%. The reason for that was that on the
platforms where I tested writethrough (desktop machines), doing 10,000
loops took 15-20 *minutes*, which seems hard on the user. Would be easy
to revert if you think it's a bad idea.
Possibly auto-sizing the number of loops based on the first fsync test
might be a good idea, but seems like going a bit too far.

(3) Should the multi-descriptor test be using writethrough on platforms
which support it?
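
For question (2), the auto-sizing idea could be sketched roughly as below: time a short probe run, then scale the loop count to a time budget. This is only an illustration; sketch_loops_for_budget() and its parameters are hypothetical and not part of the attached patch. At the rates above (10,000 loops in 15-20 minutes, so roughly 8 to 11 loops per second), a 60-second budget would fit roughly 500 to 670 loops.

```c
#include <assert.h>

/*
 * Hypothetical helper: given how long a short probe run took, pick a loop
 * count that should fit within budget_seconds, clamped to [min_loops,
 * max_loops].
 */
static long
sketch_loops_for_budget(long probe_loops, double probe_seconds,
                        double budget_seconds, long min_loops, long max_loops)
{
    double loops;

    assert(probe_loops > 0 && probe_seconds > 0.0 && min_loops <= max_loops);

    /* Loops that fit in the budget at the measured rate. */
    loops = budget_seconds * (double) probe_loops / probe_seconds;

    if (loops < (double) min_loops)
        return min_loops;
    if (loops > (double) max_loops)
        return max_loops;
    return (long) loops;
}
```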

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com

Attachment Content-Type Size
test_fsync_expanded.patch text/x-patch 18.5 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-23 01:38:01
Message-ID: 201012230138.oBN1c1U15913@momjian.us
Lists: pgsql-hackers

Josh Berkus wrote:
> On 12/6/10 6:13 PM, Tom Lane wrote:
> > Josh Berkus <josh(at)agliodbs(dot)com> writes:
> >> OK, patch coming then. Right now test_fsync aborts when O_DIRECT fails.
> >> What should I have it do instead?
> >
> > Report that it fails, and keep testing the other methods.
>
> Patch attached. Includes a fair amount of comment cleanup, since
> existing comments did not meet our current project standards. Tests all
> 6 of the methods we support separately.
>
> Some questions, though:
>
> (1) Why are we doing the open_sync different-size write test? AFAIK,
> this doesn't match any behavior which PostgreSQL has.

I did that so we could see the impact of doing 2 8k writes that were
both fsync'ed vs doing one 16k write and then fsync:

Compare open_sync with different sizes:
        open_sync 16k write               201.323/second
        2 open_sync 8k writes             332.466/second

We often write multiple 8k WAL pages and then fsync on commit.
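
As an illustration of the two patterns being compared, here is a minimal sketch that times N operations on a descriptor opened with O_SYNC, where one operation is either a single 16k write or two 8k writes. sketch_sync_ops_per_second() is a hypothetical name and this is not the test_fsync code itself:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for pwrite() and clock_gettime() */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: perform `loops` operations, each made of
 * `writes_per_op` consecutive writes of `write_size` bytes starting at
 * offset 0, and return operations per second (or -1.0 on error).
 * The caller is expected to pass a descriptor opened with O_SYNC.
 */
static double
sketch_sync_ops_per_second(int fd, size_t write_size, int writes_per_op,
                           int loops)
{
    struct timespec start;
    struct timespec stop;
    double      elapsed;
    char       *buf;
    int         i;
    int         j;

    assert(fd >= 0 && write_size > 0 && writes_per_op > 0 && loops > 0);

    buf = calloc(1, write_size);
    if (buf == NULL)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < loops; i++)
    {
        for (j = 0; j < writes_per_op; j++)
        {
            off_t offset = (off_t) ((size_t) j * write_size);

            if (pwrite(fd, buf, write_size, offset) != (ssize_t) write_size)
            {
                free(buf);
                return -1.0;
            }
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &stop);
    free(buf);

    elapsed = (double) (stop.tv_sec - start.tv_sec) +
        (double) (stop.tv_nsec - start.tv_nsec) / 1e9;
    return elapsed > 0.0 ? (double) loops / elapsed : -1.0;
}
```

Calling it once with (16384, 1) and once with (8192, 2) on the same O_SYNC descriptor gives the two figures to compare.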

> (2) In this patch, I'm stepping down the number of loops which
> fsync_writethrough does by 90%. The reason for that was that on the
> platforms where I tested writethrough (desktop machines), doing 10,000
> loops took 15-20 *minutes*, which seems hard on the user. Would be easy
> to revert if you think it's a bad idea.
> Possibly auto-sizing the number of loops based on the first fsync test
> might be a good idea, but seems like going a bit too far.

Sure, I recently increased the number, probably too much.

> (3) Should the multi-descriptor test be using writethrough on platforms
> which support it?

Uh, I didn't think that would matter because the test is to test kernel
behavior of writing to one file descriptor and fsyncing using another.
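
In other words, the pattern under test looks roughly like the sketch below: write and close on one descriptor, then open a second descriptor on the same file and fsync that. The function name is hypothetical, and the timing loop is omitted:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for fsync() under strict C modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write through one descriptor, close it, then fsync
 * through a second descriptor on the same file.  Returns 0 on success,
 * -1 on any failure.
 */
static int
sketch_write_close_fsync(const char *path, const void *buf, size_t len)
{
    int wfd;
    int sfd;
    int rc;

    assert(path != NULL && buf != NULL);

    /* First descriptor: write the data, then close without syncing. */
    wfd = open(path, O_RDWR | O_CREAT, 0600);
    if (wfd < 0)
        return -1;
    if (write(wfd, buf, len) != (ssize_t) len)
    {
        close(wfd);
        return -1;
    }
    close(wfd);

    /* Second descriptor: fsync the same file. */
    sfd = open(path, O_RDWR);
    if (sfd < 0)
        return -1;
    rc = fsync(sfd);
    close(sfd);
    return rc;
}
```

If timing this against the "write, fsync, close" order on a single descriptor gives similar rates, that suggests the kernel does flush data written through the other descriptor.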

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-23 01:49:17
Message-ID: 201012230149.oBN1nHs17239@momjian.us
Lists: pgsql-hackers

Tom Lane wrote:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
> > Making it support O_DIRECT would be possible but more complex; I don't
> > see the point unless we think we're going to have open_sync_with_odirect
> > as a separate option.
>
> Whether it's complex or not isn't really the issue. The issue is that
> what test_fsync is testing had better match what the backend does, or
> people will be making choices based on not-comparable test results.
> I think we should have test_fsync just automatically fold in O_DIRECT
> the same way the backend does.

The problem is that O_DIRECT was not implemented in macros but rather
down in the code:

    if (!XLogIsNeeded() && !am_walreceiver)
        o_direct_flag = PG_O_DIRECT;

Which means if we just export the macros, we would still not have caught
this. I would like to share all the defines --- I am just saying it
isn't trivial.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-23 22:45:44
Message-ID: 4D13D118.9030406@agliodbs.com
Lists: pgsql-hackers


> Which means if we just export the macros, we would still not have caught
> this. I would like to share all the defines --- I am just saying it
> isn't trivial.

I just called all the define variables manually rather than relying on
the macros. Seemed to work fine.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2010-12-23 23:29:24
Message-ID: 4D13DB54.60900@agliodbs.com
Lists: pgsql-hackers

Greg, All:

Results for Solaris 10u8, on ZFS on a 7-drive attached storage array:

bash-3.00# ./test_fsync -f /dbdata/pgdata/test.out
Loops = 10000

Simple write:
        8k write                        59988.002/second

Compare file sync methods using one write:
        open_datasync 8k write            214.125/second
        (unavailable: o_direct)
        open_sync 8k write                222.155/second
        (unavailable: o_direct)
        8k write, fdatasync               214.086/second
        8k write, fsync                   215.035/second
        (unavailable: fsync_writethrough)

Compare file sync methods using two writes:
        2 open_datasync 8k writes         108.227/second
        (unavailable: o_direct)
        2 open_sync 8k writes             106.935/second
        (unavailable: o_direct)
        8k write, 8k write, fdatasync     205.525/second
        8k write, 8k write, fsync         210.483/second
        (unavailable: fsync_writethrough)

Compare open_sync with different sizes:
        open_sync 16k write               211.481/second
        2 open_sync 8k writes             106.202/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        8k write, fsync, close            207.499/second
        8k write, close, fsync            213.656/second

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Smith <greg(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Revert default wal_sync_method to fdatasync on Linux 2.6.33+
Date: 2011-01-15 16:58:46
Message-ID: 201101151658.p0FGwk218391@momjian.us
Lists: pgsql-hackers

Josh Berkus wrote:
> On 12/6/10 6:13 PM, Tom Lane wrote:
> > Josh Berkus <josh(at)agliodbs(dot)com> writes:
> >> OK, patch coming then. Right now test_fsync aborts when O_DIRECT fails.
> >> What should I have it do instead?
> >
> > Report that it fails, and keep testing the other methods.
>
> Patch attached. Includes a fair amount of comment cleanup, since
> existing comments did not meet our current project standards. Tests all
> 6 of the methods we support separately.
>
> Some questions, though:
>
> (1) Why are we doing the open_sync different-size write test? AFAIK,
> this doesn't match any behavior which PostgreSQL has.

I added program output to explain this.

> (2) In this patch, I'm stepping down the number of loops which
> fsync_writethrough does by 90%. The reason for that was that on the
> platforms where I tested writethrough (desktop machines), doing 10,000
> loops took 15-20 *minutes*, which seems hard on the user. Would be easy
> to revert if you think it's a bad idea.
> Possibly auto-sizing the number of loops based on the first fsync test
> might be a good idea, but seems like going a bit too far.

I did not know why writethrough would always be much slower than other
sync methods, so I just reduced the loop count to 2k.

> (3) Should the multi-descriptor test be using writethrough on platforms
> which support it?

Thank you for your patch. I have applied most of it, attached.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

Attachment Content-Type Size
/rtmp/fsync text/x-diff 17.1 KB