Re: win32 performance - fsync question

Lists: pgsql-hackers
From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "E(dot)Rodichev" <er(at)sai(dot)msu(dot)su>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 10:01:56
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCE4768DB@algol.sollentuna.se

> Hi,
>
> looking for the way how to increase performance at Windows XP
> box, I found the parameters
>
> #fsync = true                  # turns forced synchronization on or off
> #wal_sync_method = fsync       # the default varies across platforms:
>                                # fsync, fdatasync, open_sync, or open_datasync
>
> I have no idea how it works with win32. May I try fsync =
> false, or it is dangerous? Which of wal_sync_method may I try
> at WinXP?

You can try it, but it is dangerous.
fsync is the correct wal_sync_method.

For some reason the syncing is quite a lot slower on win32. One reason
might be that it flushes metadata about the file as well, which I
believe at least Linux doesn't.

If it wasn't clear already: if you're running antivirus software, try
uninstalling it. Note that you may need to uninstall it to get all the
performance back; just disabling it is often *not* enough, as the kernel
driver is still loaded.

Things worth experimenting with (these are all untested, so please
report any successes):
1) Try reformatting with a cluster size of 8 kB (the pg page size), if
you can.
2) Disable the last access time (like noatime on linux). "fsutil
behavior set disablelastaccess 1"
3) Disable 8.3 filenames "fsutil behavior set disable8dot3 1"

2 and 3 may require a reboot.

(2 and 3 can be done on earlier windows through registry settings only,
in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem)

//Magnus


From: lsunley(at)mb(dot)sympatico(dot)ca
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, "E(dot)Rodichev" <er(at)sai(dot)msu(dot)su>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 14:24:59
Message-ID: 0IC20071MB20PW@l-daemon
Lists: pgsql-hackers

In <4214B68C(dot)8000901(at)dunslane(dot)net>, on 02/17/05
at 10:21 AM, Andrew Dunstan <andrew(at)dunslane(dot)net> said:

>E.Rodichev wrote:

>>
>> This problem is addressed by the file system (fsck, journalling, etc.).
>> Is it reasonable to handle it directly within the application?
>>
>>

>In the words of the Duke of Wellington, "If you believe that you'll
>believe anything."

>Please review past discussions on the mailing lists on this point.

>BTW, most journalling file systems do not guarantee file integrity, only
>file metadata integrity. In particular, I believe this is true of NTFS
>(and whether it even does that has been debated).

>So by all means turn off fsync if you want the performance gain *and*
>you accept the risk. But if you do, don't come crying later that your
>data has been lost or corrupted.

>(the results are interesting, though - with fsync off Windows and Linux
>are in the same performance ballpark.)

>cheers

>andrew

In anything I've done, Windows is very slow when you use fsync or the
Windows API equivalent.

If you need the performance, you had better have the machine hooked up to
a UPS (probably a good idea in any case) and set up something that is
triggered by the UPS running down to signal PostgreSQL to do an immediate
shutdown.

--
-----------------------------------------------------------
lsunley(at)mb(dot)sympatico(dot)ca
-----------------------------------------------------------


From: "E(dot)Rodichev" <er(at)sai(dot)msu(dot)su>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 14:29:36
Message-ID: Pine.GSO.4.62.0502171635200.14407@ra.sai.msu.su
Lists: pgsql-hackers

On Thu, 17 Feb 2005, Magnus Hagander wrote:

>> Hi,
>>
>> looking for the way how to increase performance at Windows XP
>> box, I found the parameters
>>
>> #fsync = true                  # turns forced synchronization on or off
>> #wal_sync_method = fsync       # the default varies across platforms:
>>                                # fsync, fdatasync, open_sync, or open_datasync
>>
>> I have no idea how it works with win32. May I try fsync =
>> false, or it is dangerous? Which of wal_sync_method may I try
>> at WinXP?
>
> You can try it, but it is dangerous.
> fsync is the correct wal_sync_method.
>
> For some reason the syncing is quite a lot slower on win32. One reason
> might be that it flushes metadata about the file as well, which I
> believe at least Linux doesn't.
>
> If it wasn't clear already: if you're running antivirus software, try
> uninstalling it. Note that you may need to uninstall it to get all the
> performance back; just disabling it is often *not* enough, as the kernel
> driver is still loaded.

No, I don't have any resident disk-related stuff running.

>
> Things worth experimenting with (these are all untested, so please
> report any successes):
> 1) Try reformatting with a cluster size of 8 kB (the pg page size), if
> you can.
> 2) Disable the last access time (like noatime on linux). "fsutil
> behavior set disablelastaccess 1"
> 3) Disable 8.3 filenames "fsutil behavior set disable8dot3 1"
>
> 2 and 3 may require a reboot.
>
> (2 and 3 can be done on earlier windows through registry settings only,
> in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem)

I repeated the test with 2 and 3 applied: no noticeable difference. With
disablelastaccess I got about 10-15% better results, but that is not
very significant.

Finally I tried

fsync = false

and got 580-620 tps. So, the short summary:

WinXP   fsync = true    20-28 tps
WinXP   fsync = false   600 tps
Linux                   800 tps

The general question is: does PostgreSQL really need fsync? I suppose it
is a design question, not a platform-specific one. It seems that the only
scenario in which fsync is useful is interprocess communication via an
open file. But PostgreSQL uses IPC for this, so is fsync really
required?

E.R.
_________________________________________________________________________
Evgeny Rodichev Sternberg Astronomical Institute
email: er(at)sai(dot)msu(dot)su Moscow State University
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: "E(dot)Rodichev" <er(at)sai(dot)msu(dot)su>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 14:32:40
Message-ID: 4214AB08.2020108@familyhealth.com.au
Lists: pgsql-hackers

> The general question is: does PostgreSQL really need fsync? I suppose it
> is a design question, not a platform-specific one. It seems that the only
> scenario in which fsync is useful is interprocess communication via an
> open file. But PostgreSQL uses IPC for this, so is fsync really
> required?

NO!

Fsync is so that when your computer loses power without warning, you
will have no data loss.

If you turn it off, you run the risk of losing data if you lose power.

Chris


From: "E(dot)Rodichev" <er(at)sai(dot)msu(dot)su>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 14:54:38
Message-ID: Pine.GSO.4.62.0502171750470.14407@ra.sai.msu.su
Lists: pgsql-hackers

On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:

>> The general question is: does PostgreSQL really need fsync? I suppose it
>> is a design question, not a platform-specific one. It seems that the only
>> scenario in which fsync is useful is interprocess communication via an
>> open file. But PostgreSQL uses IPC for this, so is fsync really
>> required?
>
> NO!
>
> Fsync is so that when your computer loses power without warning, you will
> have no data loss.
>
> If you turn it off, you run the risk of losing data if you lose power.
>
> Chris

This problem is addressed by the file system (fsck, journalling, etc.).
Is it reasonable to handle it directly within the application?

Regards,
E.R.
_________________________________________________________________________
Evgeny Rodichev Sternberg Astronomical Institute
email: er(at)sai(dot)msu(dot)su Moscow State University
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er


From: "D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
To: "E(dot)Rodichev" <er(at)sai(dot)msu(dot)su>
Cc: chriskl(at)familyhealth(dot)com(dot)au, mha(at)sollentuna(dot)net, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 15:09:35
Message-ID: 20050217100935.71c71e51.darcy@druid.net
Lists: pgsql-hackers

On Thu, 17 Feb 2005 17:54:38 +0300 (MSK)
"E.Rodichev" <er(at)sai(dot)msu(dot)su> wrote:
> On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:
>
> >> The general question is: does PostgreSQL really need fsync? I suppose
> >> it is a design question, not a platform-specific one. It seems that the
> >> only scenario in which fsync is useful is interprocess communication via
> >> an open file. But PostgreSQL uses IPC for this, so is fsync really
> >> required?
> >
> > NO!
> >
> > Fsync is so that when your computer loses power without warning, you
> > will have no data loss.
> >
> > If you turn it off, you run the risk of losing data if you lose
> > power.
> >
> > Chris
>
> This problem is addressed by the file system (fsck, journalling, etc.).
> Is it reasonable to handle it directly within the application?

NO again!

Fsck only fixes up file system pointers after a crash. If the data did
not make it to the disk, no amount of fscking will put it there.

I'm not positive but I think that journalled file systems also need
fsync to guarantee that the information gets journalled but in any case,
journalling only helps if you have a journalled file system. Not
everyone does.

This is not to say that fsync is always required, just that it solves a
different problem than all those other tools.

--
D'Arcy J.M. Cain <darcy(at)druid(dot)net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.


From: Doug McNaught <doug(at)mcnaught(dot)org>
To: "E(dot)Rodichev" <er(at)sai(dot)msu(dot)su>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 15:18:22
Message-ID: 87psyzuse9.fsf@asmodeus.mcnaught.org
Lists: pgsql-hackers

"E.Rodichev" <er(at)sai(dot)msu(dot)su> writes:

> On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:
>
>> Fsync is so that when your computer loses power without warning, you
>> will have no data loss.
>>
>> If you turn it off, you run the risk of losing data if you lose power.
>>
>> Chris
>
> This problem is addressed by the file system (fsck, journalling, etc.).
> Is it reasonable to handle it directly within the application?

No, it's not addressed by the file system. fsync() tells the OS to
make sure the data is on disk. Without that, the OS is free to just
keep the WAL data in memory cache, and a power failure could cause
data from committed transactions to be lost (we don't report commit
success until fsync() tells us the file data is on disk).
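
To illustrate the ordering described above, here is a minimal C sketch. It
is not PostgreSQL's actual WAL code: durable_append() is a hypothetical
helper, and the idea is only that the caller gets a success result (and so
may acknowledge the commit) after fsync() has returned, never after write()
alone.

```c
#include <fcntl.h>      /* open() flags, for callers of this helper */
#include <sys/types.h>  /* ssize_t */
#include <unistd.h>     /* write(), fsync() */

/*
 * Hypothetical helper (illustration only, not PostgreSQL code).
 * Appends one record to fd and returns 0 only after fsync() has
 * reported success; returns -1 on any write or sync failure.
 */
int durable_append(int fd, const char *rec, size_t len)
{
    size_t done = 0;

    /* write() may accept fewer bytes than requested, so loop. */
    while (done < len) {
        ssize_t n = write(fd, rec + done, len - done);
        if (n <= 0)
            return -1;
        done += (size_t) n;
    }

    /*
     * At this point the data may exist only in the OS cache.
     * fsync() asks the OS to push it to the storage device.
     */
    if (fsync(fd) != 0)
        return -1;

    return 0;   /* only now is it reasonable to report "committed" */
}
```

A caller would open the log with something like
open(path, O_WRONLY | O_CREAT | O_APPEND, 0600) and report commit success
only when durable_append() returns 0. With fsync = false, the equivalent of
the fsync() call is skipped, which is where both the speed and the risk
come from.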

-Doug


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: "E(dot)Rodichev" <er(at)sai(dot)msu(dot)su>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 15:21:48
Message-ID: 4214B68C.8000901@dunslane.net
Lists: pgsql-hackers

E.Rodichev wrote:

>
> This problem is addressed by the file system (fsck, journalling, etc.).
> Is it reasonable to handle it directly within the application?
>
>

In the words of the Duke of Wellington, "If you believe that you'll
believe anything."

Please review past discussions on the mailing lists on this point.

BTW, most journalling file systems do not guarantee file integrity, only
file metadata integrity. In particular, I believe this is true of NTFS
(and whether it even does that has been debated).

So by all means turn off fsync if you want the performance gain *and*
you accept the risk. But if you do, don't come crying later that your
data has been lost or corrupted.

(the results are interesting, though - with fsync off Windows and Linux
are in the same performance ballpark.)

cheers

andrew


From: Evgeny Rodichev <er(at)sai(dot)msu(dot)su>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 18:07:51
Message-ID: Pine.GSO.4.62.0502172105420.21310@ra.sai.msu.su
Lists: pgsql-hackers

On Thu, 17 Feb 2005, Andrew Dunstan wrote:

> (the results are interesting, though - with fsync off Windows and Linux are
> in the same performance ballpark.)

One addition:

WinXP   fsync = true    20-28 tps
WinXP   fsync = false   600 tps
Linux   fsync = true    800 tps
Linux   fsync = false   980 tps

Regards,
E.R.
_________________________________________________________________________
Evgeny Rodichev Sternberg Astronomical Institute
email: er(at)sai(dot)msu(dot)su Moscow State University
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er


From: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
To: Evgeny Rodichev <er(at)sai(dot)msu(dot)su>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 18:36:31
Message-ID: 4214E42F.6030807@familyhealth.com.au
Lists: pgsql-hackers

> One addition:
>
> WinXP   fsync = true    20-28 tps
> WinXP   fsync = false   600 tps
> Linux   fsync = true    800 tps
> Linux   fsync = false   980 tps

Wow, that's terrible on Windows. If there's a solution, it'd be nice to
backport it...

Chris


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>
Cc: Evgeny Rodichev <er(at)sai(dot)msu(dot)su>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 19:20:50
Message-ID: 9957.1108668050@sss.pgh.pa.us
Lists: pgsql-hackers

Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
>> WinXP   fsync = true    20-28 tps
>> WinXP   fsync = false   600 tps
>> Linux   fsync = true    800 tps
>> Linux   fsync = false   980 tps

> Wow, that's terrible on Windows. If there's a solution, it'd be nice to
> backport it...

Actually, the number that's way out of line there is the Linux w/fsync
one. I infer that he's got disk write cache enabled and therefore the
transactions aren't really being synced to disk at all.

Any claimed TPS rate exceeding your disk drive's rotation rate is a
red flag.
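
As a worked example of that rule of thumb, here is a small C sketch. It
assumes a single client whose commits each wait for their own synced WAL
write, and that each such write costs roughly one platter rotation, so the
ceiling is rpm / 60 commits per second. The thread does not state the
drive's speed, so the rpm values in the note below are assumptions, and the
function names are illustrative.

```c
/*
 * Rough upper bound on serialized, truly synced commits per second:
 * one commit per platter rotation (an assumption, see above).
 */
double max_synced_commits_per_sec(double rpm)
{
    return rpm / 60.0;
}

/*
 * Returns 1 when the reported rate exceeds the rotational bound,
 * which suggests the writes are being acknowledged from a cache
 * rather than from the platter; returns 0 otherwise.
 */
int tps_is_suspicious(double reported_tps, double rpm)
{
    return reported_tps > max_synced_commits_per_sec(rpm);
}
```

For an assumed 7200 rpm drive the bound is 120 commits per second, so a
reported 800 tps with fsync on would be flagged, while 20-28 tps would not.
Several concurrent clients sharing one flush can legitimately exceed this
bound, so it applies to a single serialized stream of commits.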

regards, tom lane


From: Evgeny Rodichev <er(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 19:54:22
Message-ID: Pine.GSO.4.62.0502172242520.23498@ra.sai.msu.su
Lists: pgsql-hackers

On Thu, 17 Feb 2005, Tom Lane wrote:

> Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au> writes:
>>> WinXP   fsync = true    20-28 tps
>>> WinXP   fsync = false   600 tps
>>> Linux   fsync = true    800 tps
>>> Linux   fsync = false   980 tps
>
>> Wow, that's terrible on Windows. If there's a solution, it'd be nice to
>> backport it...
>
> Actually, the number that's way out of line there is the Linux w/fsync
> one. I infer that he's got disk write cache enabled and therefore the
> transactions aren't really being synced to disk at all.
>
> Any claimed TPS rate exceeding your disk drive's rotation rate is a
> red flag.

Write cache has been enabled by default under Linux for as long as I have
dealt with it (since 1993).

It doesn't interfere with fsync(), as the Linux kernel uses a cache flush
for fsync.

I am running a 2.6.10 kernel *without* any additional patches, and without
any specific hdparm settings.

fsync() really works fine: I switch off my notebook 2-3 times every day
and have never had any data loss :)

The related line from dmesg is

hda: cache flushes supported

Regards,
E.R.
_________________________________________________________________________
Evgeny Rodichev Sternberg Astronomical Institute
email: er(at)sai(dot)msu(dot)su Moscow State University
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Evgeny Rodichev <er(at)sai(dot)msu(dot)su>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 20:01:09
Message-ID: 10347.1108670469@sss.pgh.pa.us
Lists: pgsql-hackers

Evgeny Rodichev <er(at)sai(dot)msu(dot)su> writes:
>> Any claimed TPS rate exceeding your disk drive's rotation rate is a
>> red flag.

> Write cache has been enabled by default under Linux for as long as I have
> dealt with it (since 1993).

You're playing with fire.

> fsync() really works fine: I switch off my notebook 2-3 times every day
> and have never had any data loss :)

Given that it's a notebook, it's possible that the hardware is smart
enough not to power down the disk until the disk is done writing
everything it's cached. Do you care to try some experiments with
pulling out the battery while Postgres is busy making updates?

regards, tom lane


From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Evgeny Rodichev <er(at)sai(dot)msu(dot)su>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 20:24:06
Message-ID: 4214FD66.2040307@opencloud.com
Lists: pgsql-hackers

Evgeny Rodichev wrote:

> Write cache has been enabled by default under Linux for as long as I have
> dealt with it (since 1993).
>
> It doesn't interfere with fsync(), as the Linux kernel uses a cache flush
> for fsync.

The problem is that most IDE drives lie (or perhaps you could say the
specification is ambiguous) about completion of the cache-flush command
-- they say "Yeah, I've flushed" when they have not actually written the
data to the media and have no provision for making sure it will get
there in the event of power failure.

So Linux is indeed doing a cache flush on fsync, but the hardware is not
behaving as expected. By turning off the write-cache on the disk via
hdparm, you manage to get the hardware to behave better. The kernel is
caching anyway, so the loss of the drive's write cache doesn't make a
big difference.

There was some work done for better IDE write-barrier support (related
to TCQ/SATA support?) in the kernel, but I'm not sure how far that has
progressed.

-O


From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 22:14:24
Message-ID: 87is4qygu7.fsf@stark.xeocode.com
Lists: pgsql-hackers


Oliver Jowett <oliver(at)opencloud(dot)com> writes:

> So Linux is indeed doing a cache flush on fsync

Actually I think the root of the problem was precisely that Linux does not
issue any sort of cache flush commands to drives on fsync. There was some talk
on linux-kernel of how they could take advantage of new ATA features
planned on new SATA drives coming out now to solve this. But they didn't seem
to think it was urgent or worth the performance hit of doing a complete cache
flush.

--
greg


From: Oliver Jowett <oliver(at)opencloud(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 23:41:54
Message-ID: 42152BC2.40400@opencloud.com
Lists: pgsql-hackers

Greg Stark wrote:
> Oliver Jowett <oliver(at)opencloud(dot)com> writes:
>
>
>>So Linux is indeed doing a cache flush on fsync
>
>
> Actually I think the root of the problem was precisely that Linux does not
> issue any sort of cache flush commands to drives on fsync. There was some talk
> on linux-kernel of how they could take advantage of new ATA features
> planned on new SATA drives coming out now to solve this. But they didn't seem
> to think it was urgent or worth the performance hit of doing a complete cache
> flush.

Oh, ok. I haven't really kept up to date with it; I just run with
write-cache disabled on my IDE drives as a matter of course.

I did see this:
http://www.ussg.iu.edu/hypermail/linux/kernel/0304.1/0471.html

which implies you're never going to get an implementation that is safe
across all IDE hardware :(

-O


From: Evgeny Rodichev <er(at)sai(dot)msu(dot)su>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-17 23:56:02
Message-ID: Pine.GSO.4.62.0502180235250.347@ra.sai.msu.su
Lists: pgsql-hackers

On Thu, 17 Feb 2005, Tom Lane wrote:

> Evgeny Rodichev <er(at)sai(dot)msu(dot)su> writes:
>>> Any claimed TPS rate exceeding your disk drive's rotation rate is a
>>> red flag.
>
>> Write cache has been enabled by default under Linux for as long as I have
>> dealt with it (since 1993).
>
> You're playing with fire.

Yes. I've been lucky at this game :)

More seriously, Oleg Bartunov and I have investigated many platforms and
OSes for commercial, scientific, and other applications over the past 10-12
years: virtually all of them, I suppose, excluding modern mainframes.

For reliability, Linux + PostgreSQL was found to be the best combination
(including in environments with very frequent unexpected power-offs, such as
some astronomical observatories high in the mountains).

Hence, I'm lucky :)

>
>> fsync() really works fine: I switch off my notebook 2-3 times every day
>> and have never had any data loss :)
>
> Given that it's a notebook, it's possible that the hardware is smart
> enough not to power down the disk until the disk is done writing
> everything it's cached. Do you care to try some experiments with
> pulling out the battery while Postgres is busy making updates?

Yes, you are exactly right. All modern HDDs (other than entry-level ones)
have a huge cache (on the device, not on the controller), and provide a safe
hardware flush of the cache *after* power-off (thanks to capacitors). My HDD
has a 16MB cache, and that is the reason for the excellent performance.

Regards,
E.R.
_________________________________________________________________________
Evgeny Rodichev Sternberg Astronomical Institute
email: er(at)sai(dot)msu(dot)su Moscow State University
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er


From: Evgeny Rodichev <er(at)sai(dot)msu(dot)su>
To: Oliver Jowett <oliver(at)opencloud(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Christopher Kings-Lynne <chriskl(at)familyhealth(dot)com(dot)au>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-18 00:16:09
Message-ID: Pine.GSO.4.62.0502180258540.347@ra.sai.msu.su
Lists: pgsql-hackers

On Fri, 18 Feb 2005, Oliver Jowett wrote:

> Evgeny Rodichev wrote:
>
>> Write cache has been enabled by default under Linux for as long as I have
>> dealt with it (since 1993).
>>
>> It doesn't interfere with fsync(), as the Linux kernel uses a cache flush
>> for fsync.
>
> The problem is that most IDE drives lie (or perhaps you could say the
> specification is ambiguous) about completion of the cache-flush command --
> they say "Yeah, I've flushed" when they have not actually written the data to
> the media and have no provision for making sure it will get there in the
> event of power failure.

Yes, I agree. But in my real SA practice I have seen, 50-100 times, an HDD
become unexpectedly physically damaged (the heads touching the surface),
with no possibility of recovery. And I have never seen any corruption caused
by a possible "hardware lie".

>
> So Linux is indeed doing a cache flush on fsync, but the hardware is not
> behaving as expected. By turning off the write-cache on the disk via hdparm,
> you manage to get the hardware to behave better. The kernel is caching
> anyway, so the loss of the drive's write cache doesn't make a big difference.

Again, in practice, it is different. FreeBSD had a "true" flush (at least
2-3 years ago; I'm not sure about the modern versions), and for
write-intensive applications it was a bit slower (compared with Linux), but
it was never more reliable (since 1996, at least).

Another practical example is Google :) Isn't it reliable?

>
> There was some work done for better IDE write-barrier support (related to
> TCQ/SATA support?) in the kernel, but I'm not sure how far that has
> progressed.

Yes, but IMHO it is not stable enough at the moment.

Regards,
E.R.
_________________________________________________________________________
Evgeny Rodichev Sternberg Astronomical Institute
email: er(at)sai(dot)msu(dot)su Moscow State University
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er


From: Evgeny Rodichev <er(at)sai(dot)msu(dot)su>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-18 00:25:41
Message-ID: Pine.GSO.4.62.0502180319270.347@ra.sai.msu.su
Lists: pgsql-hackers

On Thu, 17 Feb 2005, Greg Stark wrote:

>
> Oliver Jowett <oliver(at)opencloud(dot)com> writes:
>
>> So Linux is indeed doing a cache flush on fsync
>
> Actually I think the root of the problem was precisely that Linux does not
> issue any sort of cache flush commands to drives on fsync.

No, it does. Let's try the simplest test:

for (i = 0; i < LEN; i++) {
    write (fd, buf, 512);
    if (sync) fsync (fd);
}

with sync = 0 and 1, and you'll see the difference.
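
A fuller, compilable version of that loop might look like the sketch below.
The helper name, its signature, and the choice of clock_gettime() for timing
are assumptions added for illustration; only the write/fsync loop itself
comes from the snippet above. Note that a timing difference shows that
fsync() does extra work, but by itself it does not show whether the drive's
own write cache was flushed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() and fsync() */
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/*
 * Illustrative helper: writes `count` 512-byte blocks to `path`,
 * calling fsync() after each write when do_sync is nonzero.
 * Returns the elapsed wall-clock time in seconds, or -1.0 on error.
 */
double time_block_writes(const char *path, int count, int do_sync)
{
    char buf[512];
    struct timespec start, end;
    int fd, i;

    memset(buf, 'x', sizeof buf);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < count; i++) {
        /* Same loop body as the original test, with error checks. */
        if (write(fd, buf, sizeof buf) != (ssize_t) sizeof buf ||
            (do_sync && fsync(fd) != 0)) {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Calling it twice on the same path, once with do_sync = 0 and once with
do_sync = 1, and dividing count by each result gives writes per second for
the two cases.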

> There was some talk
> on linux-kernel of how they could take advantage of new ATA features
> planned on new SATA drives coming out now to solve this. But they didn't seem
> to think it was urgent or worth the performance hit of doing a complete cache
> flush.

That was a slightly different topic.

Regards,
E.R.
_________________________________________________________________________
Evgeny Rodichev Sternberg Astronomical Institute
email: er(at)sai(dot)msu(dot)su Moscow State University
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Evgeny Rodichev <er(at)sai(dot)msu(dot)su>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: win32 performance - fsync question
Date: 2005-02-18 04:31:37
Message-ID: 87acq2xzdi.fsf@stark.xeocode.com
Lists: pgsql-hackers


Evgeny Rodichev <er(at)sai(dot)msu(dot)su> writes:

> No, it does. Let's try the simplest test:
>
> for (i = 0; i < LEN; i++) {
>     write (fd, buf, 512);
>     if (sync) fsync (fd);
> }
>
> with sync = 0 and 1, and you'll see the difference.

Uh, I'm sure you'll see a difference: one will be limited by the I/O
throughput the IDE interface is capable of, the other will be limited purely
by the memory bandwidth and kernel syscall latency.

Try it with sync=1 and write caching disabled on your IDE drive and you should
see an even larger difference.

However, no filesystem and ide driver combination in linux 2.4 and afaik none
in 2.6 either issue any special ATA commands to force the drive to

> > There was some talk on linux-kernel of how they could take advantage
> > of new ATA features planned on new SATA drives coming out now to solve
> > this. But they didn't seem to think it was urgent or worth the performance
> > hit of doing a complete cache flush.
>
> That was a slightly different topic.

Well, there's no way to tell if we're talking about the same threads. But in
the discussion I saw, it was clear they were talking about adding an
interface to drivers for filesystems to issue cache flushes when necessary to
guarantee filesystem integrity. They still didn't seem to get that users
cared about their data too, not just filesystem integrity.

--
greg