Re: [HACKERS] win32 performance - fsync question

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "Michael Paesold" <mpaesold(at)gmx(dot)at>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>, <pgsql-hackers-win32(at)postgresql(dot)org>, "Merlin Moncure" <merlin(dot)moncure(at)rcsonline(dot)com>
Subject: Re: [pgsql-hackers-win32] win32 performance - fsync question
Date: 2005-03-17 09:05:39
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCE6C70BD@algol.sollentuna.se
Lists: pgsql-hackers pgsql-hackers-win32 pgsql-patches

> > > > * Win32, with fsync, write-cache disabled: no data corruption
> > > > * Win32, with fsync, write-cache enabled: no data corruption
> > > > * Win32, with osync, write cache disabled: no data corruption
> > > > * Win32, with osync, write cache enabled: no data corruption.
> > > >   Once I got:
> > > > 2005-02-24 12:19:54 LOG: could not open file "C:/Program
> > > > Files/PostgreSQL/8.0/data/pg_xlog/000000010000000000000010"
> > > > (log file 0, segment 16): No such file or directory
> > > > but the data in the database was consistent.
> > >
> > > It disturbs me that you couldn't produce data corruption in the
> > > cases where it theoretically should occur. Seems like this is an
> > > indication that your test was insufficiently severe, or that there
> > > is something going on we don't understand.
> >
> > The Windows driver knows about the write cache, and at least fsync()
> > pushes through the write cache even if it's there. This seems to
> > indicate that O_SYNC at least partially does this as well. This is
> > why there is no performance difference at all on fsync() with write
> > cache on or off.
> >
> > I don't know if this is true for all IDE disks. Could be that my disk
> > is particularly well-behaved.
>
> This indicated to me that open_sync did not require any
> additional changes than our current fsync.

fsync and open_sync both write through the write cache in the operating
system. Only fsync=off turns this off.

fsync also writes through the hardware write cache. o_sync does not.
This is what causes the large slowdown with write cache enabled,
*including* most battery backed write cache systems (pretty much making
the write-cache a waste of money). This may be a good thing on IDE
systems (for admins that don't know how to remove the little check in
the box for "enable write caching on the disk" that MS provides, which
*explicitly* warns that you may lose data if you enabled it), but it's a
very bad thing for anything higher end.

fsync also syncs the directory metadata. o_sync only cares about the
file's contents. (This is what causes the large slowdown with write cache
*disabled*, because it requires multiple writes on multiple disk
locations for each fsync).

Basically, fsync hurts people who configure their box correctly, or who
use things like SCSI disks. o_sync hurts people who configure their
machine in an unsafe way.
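
For readers who want the two code paths side by side, here is a minimal
Python sketch of the two styles being compared: an ordinary write followed
by a separate flush call (the fsync style), and a file opened with a
synchronous flag so that each write is itself synchronous (the o_sync /
open_datasync style). The function names are hypothetical and are not
PostgreSQL code. Python's os.fsync calls _commit() on Windows, and
os.O_DSYNC is not defined on every platform, so the sketch falls back to no
flag where it is missing. Nothing here shows what a drive's hardware cache
does with the data.

```python
import os

def write_then_fsync(path: str, data: bytes) -> int:
    """fsync style: ordinary write, then an explicit flush call."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, data)
        os.fsync(fd)  # separate call; Python maps this to _commit() on Windows
        return written
    finally:
        os.close(fd)

def write_with_sync_flag(path: str, data: bytes) -> int:
    """open_datasync style: the open flag makes each write synchronous."""
    # O_DSYNC is not defined on every platform; fall back to 0 (no flag).
    dsync = getattr(os, "O_DSYNC", 0)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | dsync, 0o600)
    try:
        return os.write(fd, data)  # no separate flush call is issued
    finally:
        os.close(fd)
```

The first function pays for one extra call per flush; the second pays on
every write. Which one is slower in practice depends on the cache behavior
described above, not on anything visible in the code.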

//Magnus


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: Michael Paesold <mpaesold(at)gmx(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-win32(at)postgresql(dot)org, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>
Subject: Re: [pgsql-hackers-win32] win32 performance - fsync question
Date: 2005-03-17 18:35:14
Message-ID: 200503171835.j2HIZEP05129@candle.pha.pa.us

Magnus Hagander wrote:
> > This indicated to me that open_sync did not require any
> > additional changes than our current fsync.
>
> fsync and open_sync both write through the write cache in the operating
> system. Only fsync=off turns this off.
>
> fsync also writes through the hardware write cache. o_sync does not.
> This is what causes the large slowdown with write cache enabled,
> *including* most battery backed write cache systems (pretty much making
> the write-cache a waste of money). This may be a good thing on IDE
> systems (for admins that don't know how to remove the little check in
> the box for "enable write caching on the disk" that MS provides, which
> *explicitly* warns that you may lose data if you enabled it), but it's a
> very bad thing for anything higher end.

I found the checkbox on XP looking at "Properties" for the drive, then
choosing "Hardware", the drive, "Properties", and "Policies".

> fsync also syncs the directory metadata. o_sync only cares about the
> file's contents. (This is what causes the large slowdown with write cache
> *disabled*, because it requires multiple writes on multiple disk
> locations for each fsync).
>
> Basically, fsync hurts people who configure their box correctly, or who
> use things like SCSI disks. o_sync hurts people who configure their
> machine in an unsafe way.

So, it seems that Win32 open_sync is exactly the same as our
"wal_sync_method = open_datasync" on Unix (it needs to be renamed), and
"wal_sync_method = fsync" on Win32 is something we don't have on Unix:
it writes through the disk write cache even if the cache is enabled.

I have developed the following patch which renames our wal_sync_method
Win32 support from open_sync to open_datasync:

ftp://candle.pha.pa.us/pub/postgresql/mypatches

One issue with this patch is that if applied it would make open_datasync
the default sync method on Win32 because we prefer open_datasync over
all other sync methods. If we don't want to do that, I think we should
still do the rename for accuracy and add a !WIN32 test to prevent
open_datasync from being the default.
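
The default-selection issue can be sketched in a few lines of Python. This
is an illustration only: the real selection is done in the PostgreSQL C
source and is not reproduced here. The only fact taken from this thread is
that open_datasync is preferred over all other methods; the rest of the
preference order, the function name, and its parameters are assumptions.

```python
# Hypothetical preference order. Only "open_datasync comes first" is taken
# from the thread; the order of the remaining entries is an assumption.
PREFERENCE = ["open_datasync", "fdatasync", "fsync", "open_sync"]

def default_sync_method(available, is_win32=False, exclude_datasync_on_win32=False):
    """Return the first preferred method that the platform supports."""
    for method in PREFERENCE:
        if method not in available:
            continue
        if is_win32 and exclude_datasync_on_win32 and method == "open_datasync":
            continue  # models the "!WIN32 test" alternative
        return method
    raise ValueError("no supported sync method")
```

With available = {"open_datasync", "fsync"} on Win32, the sketch returns
"open_datasync" after the rename, and "fsync" if the exclusion is turned on.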

However, I would prefer to apply this patch and let Win32 have the same
write cache issues as Unix, for consistency.

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, Michael Paesold <mpaesold(at)gmx(dot)at>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-win32(at)postgresql(dot)org, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>
Subject: Re: [pgsql-hackers-win32] win32 performance - fsync question
Date: 2005-03-17 18:41:00
Message-ID: 26109.1111084860@sss.pgh.pa.us

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> However, I do prefer this patch and let Win32 have the same write cache
> issues as Unix, for consistency.

I agree that the open flag is more nearly O_DSYNC than O_SYNC.

ISTM Windows' idea of fsync is quite different from Unix's and therefore
we should name the wal_sync_method that invokes it something different
than fsync. "write_through" or some such? We already have precedent
that not all wal_sync_method values are available on all platforms.

I'm not taking a position on which the default should be ...

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, Michael Paesold <mpaesold(at)gmx(dot)at>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-win32(at)postgresql(dot)org, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>
Subject: Re: [pgsql-hackers-win32] win32 performance - fsync question
Date: 2005-03-17 18:53:11
Message-ID: 200503171853.j2HIrBk08017@candle.pha.pa.us

Tom Lane wrote:
> Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> > However, I do prefer this patch and let Win32 have the same write cache
> > issues as Unix, for consistency.
>
> I agree that the open flag is more nearly O_DSYNC than O_SYNC.
>
> ISTM Windows' idea of fsync is quite different from Unix's and therefore
> we should name the wal_sync_method that invokes it something different
> than fsync. "write_through" or some such? We already have precedent
> that not all wal_sync_method values are available on all platforms.
>
> I'm not taking a position on which the default should be ...

Yes, I am thinking that too. I hesitated because it adds yet another
sync method, and we have to document it works only on Win32, but I see
no better solution.

I am going to let the Win32 users mostly vote on what the default should
be.



From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, Michael Paesold <mpaesold(at)gmx(dot)at>, pgsql-hackers(at)postgresql(dot)org, pgsql-hackers-win32(at)postgresql(dot)org, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>
Subject: Re: [pgsql-hackers-win32] win32 performance - fsync question
Date: 2005-03-17 19:02:29
Message-ID: 26335.1111086149@sss.pgh.pa.us

Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> writes:
> Tom Lane wrote:
>> we should name the wal_sync_method that invokes it something different
>> than fsync. "write_through" or some such? We already have precedent
>> that not all wal_sync_method values are available on all platforms.

> Yes, I am thinking that too. I hesitated because it adds yet another
> sync method, and we have to document it works only on Win32, but I see
> no better solution.

It occurs to me that it'd probably be a good idea if the error message
for an unsupported wal_sync_method value explicitly listed the allowed
values for the platform. If there's no objection I'll try to make
that happen. (I'm not sure if it's trivial or not: I think the GUC
framework is a bit restrictive about custom error messages from GUC
assign hooks...)
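
As a rough illustration of the idea (not the GUC framework itself, which is
C and has its own rules for assign hooks), a check of this shape would name
the allowed values in the error. The function name, the message wording, and
the way the supported set is passed in are all assumptions.

```python
def check_wal_sync_method(value, supported):
    """Accept value if it is supported here, else raise with the allowed list."""
    if value in supported:
        return value
    allowed = ", ".join(sorted(supported))
    raise ValueError(
        f'invalid value for wal_sync_method: "{value}" '
        f"(allowed on this platform: {allowed})"
    )
```

The supported set would differ per platform, which is the point: the user
sees the values that actually work on the machine in front of them.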

regards, tom lane


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: Michael Paesold <mpaesold(at)gmx(dot)at>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers-win32(at)postgresql(dot)org, Merlin Moncure <merlin(dot)moncure(at)rcsonline(dot)com>, PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] win32 performance - fsync question
Date: 2005-03-24 04:31:15
Message-ID: 200503240431.j2O4VFG07349@candle.pha.pa.us


I have applied the following patch to CVS HEAD and 8.0.X. First, it
changes the Win32 O_SYNC flag to O_DSYNC, because this is the actual
behavior of the flag. open_datasync is now the default WAL sync method on
Win32 because we prefer O_DSYNC to fsync().

Second, it changes Win32 fsync to a new WAL sync method called
fsync_writethrough, which is the old Win32 fsync behavior and uses
_commit().
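
The two Win32 renames described in this message can be summarized as a small
lookup. This is a sketch for readers updating an existing setting; the
helper name is hypothetical, and whether the old names are still accepted
after the patch is not stated here.

```python
# Win32-only renames described in this message: old name -> new name.
WIN32_RENAMES = {
    "open_sync": "open_datasync",    # the open flag behaves like O_DSYNC
    "fsync": "fsync_writethrough",   # old Win32 fsync behavior, uses _commit()
}

def renamed_win32_method(old_value: str) -> str:
    """Hypothetical helper: map a pre-patch Win32 setting to its new name."""
    return WIN32_RENAMES.get(old_value, old_value)
```

Values that were not renamed pass through unchanged.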


Attachment: unknown_filename (text/plain, 6.1 KB)