Re: sync()

Lists: pgsql-hackers
From: Giles Lean <giles(at)nemeton(dot)com(dot)au>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Brown <kevin(at)sysexperts(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sync()
Date: 2003-01-13 08:31:08
Message-ID: 4993.1042446668@nemeton.com.au


Tom Lane writes:

> Right. "Portably" was the key word in my comment (sorry for not
> emphasizing this more clearly). The real problem here is how to know
> what is the actual behavior of each platform? I'm certainly not
> prepared to trust reading-between-the-lines-of-some-man-pages. And I
> can't think of a simple yet reliable direct test.

Is the "Single UNIX Specification, Version 2" (aka UNIX98) any better?
It says for fsync():

"The fsync() function forces all currently queued I/O operations
associated with the file indicated by file descriptor fildes to
the synchronised I/O completion state. All I/O operations are
completed as defined for synchronised I/O file integrity
completion."

To me, this clearly says that changes to the file must be written,
not just changes made via this file descriptor.
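A minimal C sketch of the situation the SUSv2 wording covers, assuming a SUSv3/POSIX system: data is written through one descriptor, and fsync() is then called on a second descriptor opened separately on the same file. The function name write_then_fsync_other_fd() is purely illustrative. The code only exercises the calling pattern; confirming that the blocks actually reached the disk remains a separate, non-portable problem.

```c
#define _XOPEN_SOURCE 600 /* request SUSv3 / POSIX.1-2001 interfaces */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Illustrative only: write through one descriptor, then fsync()
 * through a second descriptor opened separately on the same file.
 * Under the SUSv2 wording, fsync() acts on the file, so the data
 * written via write_fd should be flushed too.  This cannot check that
 * the data really reached the disk; it only exercises the calls.
 * Returns 0 on success, -1 on error.
 */
int write_then_fsync_other_fd(const char *path, const char *data)
{
    int write_fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (write_fd < 0)
        return -1;

    size_t len = strlen(data);
    ssize_t n = write(write_fd, data, len);
    if (n < 0 || (size_t) n != len) {
        close(write_fd);
        return -1;
    }

    /*
     * Second descriptor that performs no writes of its own.  Opened
     * O_WRONLY (without O_TRUNC) because some platforms reject fsync()
     * on a read-only descriptor.
     */
    int sync_fd = open(path, O_WRONLY);
    if (sync_fd < 0) {
        close(write_fd);
        return -1;
    }
    int rc = fsync(sync_fd);

    close(sync_fd);
    close(write_fd);
    return rc;
}
```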

I did have to test this behaviour once (for a customer, strange
situation) but I couldn't find a portable way to do it, either.

What I did was read the appropriate disk block from the raw device to
bypass the buffer cache. As this required low-level knowledge of the
on-disk filesystem layout, it was not very portable. For anyone
interested, Tom Christiansen's "icat" program can be ported to
UFS-derived filesystems fairly easily:

http://www.rosat.mpe-garching.mpg.de/mailing-lists/perl5-porters/1997-04/msg00487.html

> AFAIK, all Unix implementations are paranoid about consistency of
> filesystem metadata, including directory contents. So fsync'ing
> directories from a user process strikes me as a waste of time, ...

There is one variant where this is not the case: Linux using ext2fs
and possibly other filesystems.

There was a flame fest of great entertainment value a few years ago
between Linus Torvalds and Dan Bernstein. Of course, neither was able
to influence the opinion of the other to any noticeable degree, but it
made fun reading. I think this might be a starting point:

http://www.ornl.gov/cts/archives/mailing-lists/qmail/1998/05/msg00667.html

A more recent posting from Linus where he continues to recommend
fsync() is this:

http://www.cs.helsinki.fi/linux/linux-kernel/2001-29/0659.html

I've not heard that any other Unix-like OS has abandoned the
traditional and POSIX semantics.
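A sketch, assuming a SUSv3/POSIX system, of what fsync()ing a directory looks like from a user process. On filesystems that do not write directory updates synchronously, such as Linux ext2, creating a file durably means flushing the file and then the directory that holds it. fsync_directory() and create_durably() are hypothetical helper names. Some systems may reject fsync() on a directory descriptor outright, so a caller may prefer to treat that failure as "not supported here" rather than as data loss.

```c
#define _XOPEN_SOURCE 600 /* request SUSv3 / POSIX.1-2001 interfaces */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Flush a directory's own contents (its entries) by fsync()ing a
 * descriptor for the directory.  Needed on filesystems, such as Linux
 * ext2, that do not write directory updates synchronously.  Some
 * systems reject fsync() on a directory, in which case this returns -1.
 */
int fsync_directory(const char *dirpath)
{
    int fd = open(dirpath, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Create (or truncate) filepath inside dirpath, flush the file, then
 * flush the directory so the new entry is also on disk.
 * Returns 0 on success, -1 on error.
 */
int create_durably(const char *dirpath, const char *filepath)
{
    int fd = open(filepath, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return fsync_directory(dirpath);
}
```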

> assuming that it were portable, which I doubt. What we need to worry
> about is whether fsync'ing a bunch of our own data files is a practical
> substitute for a global sync() call.

I wish that it were. There are situations (buffer caches of several
GB, for example) where I doubt that the current approach of calling
sync() and then sleep() ensures that all writes have completed by the
time sleep() returns. My concern is theoretical at the moment -- I
never get to play with machines that large!
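Tom's alternative, fsync()ing each of our own data files instead of calling sync(), might look roughly like this. The sketch assumes a SUSv3/POSIX system and that the caller already has the list of paths to flush; fsync_files() is a hypothetical name. SUSv3 allows sync() to return once writes are merely scheduled, whereas fsync() should not return until the data is written or an error is detected. This loop therefore needs no sleep() afterwards.

```c
#define _XOPEN_SOURCE 600 /* request SUSv3 / POSIX.1-2001 interfaces */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Flush each of the given files individually instead of calling
 * sync() and sleeping.  Returns 0 if every file was flushed, -1 if
 * any open() or fsync() failed (the remaining files are still tried).
 */
int fsync_files(const char *const paths[], size_t npaths)
{
    int result = 0;
    for (size_t i = 0; i < npaths; i++) {
        /* O_WRONLY: some platforms reject fsync() on read-only descriptors. */
        int fd = open(paths[i], O_WRONLY);
        if (fd < 0) {
            result = -1;
            continue;
        }
        if (fsync(fd) != 0)
            result = -1;
        close(fd);
    }
    return result;
}
```

Attempting every file even after a failure is a design choice: it flushes as much as possible before reporting the error.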

Regards,

Giles


From: Kurt Roeckx <Q(at)ping(dot)be>
To: Giles Lean <giles(at)nemeton(dot)com(dot)au>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Brown <kevin(at)sysexperts(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sync()
Date: 2003-01-31 21:11:47
Message-ID: 20030131211147.GA3192@ping.be

On Mon, Jan 13, 2003 at 07:31:08PM +1100, Giles Lean wrote:
>
> Is the "Single UNIX Specification, Version 2" (aka UNIX98) any better?
> It says for fsync():
>
> "The fsync() function forces all currently queued I/O operations
> associated with the file indicated by file descriptor fildes to
> the synchronised I/O completion state. All I/O operations are
> completed as defined for synchronised I/O file integrity
> completion."

In version 3 it says:

The fsync() function shall request that all data for the open file
descriptor named by fildes is to be transferred to the storage
device associated with the file described by fildes in an
implementation-defined manner. The fsync() function shall not
return until the system has completed that action or until an error
is detected.

[SIO] [Option Start] If _POSIX_SYNCHRONIZED_IO is defined, the
fsync() function shall force all currently queued I/O operations
associated with the file indicated by file descriptor fildes to the
synchronized I/O completion state. All I/O operations shall be
completed as defined for synchronized I/O file integrity
completion. [Option End]

Kurt


From: Kevin Brown <kevin(at)sysexperts(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: sync()
Date: 2003-02-01 16:15:17
Message-ID: 20030201161517.GN12957@filer

Kurt Roeckx wrote:
> [SIO] [Option Start] If _POSIX_SYNCHRONIZED_IO is defined, the
> fsync() function shall force all currently queued I/O operations
> associated with the file indicated by file descriptor fildes to the
> synchronized I/O completion state. All I/O operations shall be
> completed as defined for synchronized I/O file integrity
> completion. [Option End]

Hmmm....so if I consistently want these semantics out of fsync() I
have to #define _POSIX_SYNCHRONIZED_IO? Or does the above mean that
you'll get those semantics if and only if the OS defines the above for
you?

I certainly hope the former is the case, because the newer semantics
which you mentioned in the section I cut don't do us any good at all
and we can't rely on the OS to define something like
_POSIX_SYNCHRONIZED_IO for us...

Being able to open a file, do an fsync(), and have the kernel actually
write all the buffers associated with that file to disk could be, I
think, a significant performance win compared with the "flush
everything known to the kernel" approach we take now, at least on
systems that also run something other than PostgreSQL...

--
Kevin Brown kevin(at)sysexperts(dot)com


From: Kurt Roeckx <Q(at)ping(dot)be>
To: Kevin Brown <kevin(at)sysexperts(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: sync()
Date: 2003-02-01 16:56:21
Message-ID: 20030201165621.GA7198@ping.be

On Sat, Feb 01, 2003 at 08:15:17AM -0800, Kevin Brown wrote:
> Kurt Roeckx wrote:
> > [SIO] [Option Start] If _POSIX_SYNCHRONIZED_IO is defined, the
> > fsync() function shall force all currently queued I/O operations
> > associated with the file indicated by file descriptor fildes to the
> > synchronized I/O completion state. All I/O operations shall be
> > completed as defined for synchronized I/O file integrity
> > completion. [Option End]
>
> Hmmm....so if I consistently want these semantics out of fsync() I
> have to #define _POSIX_SYNCHRONIZED_IO? Or does the above mean that
> you'll get those semantics if and only if the OS defines the above for
> you?

It's something that will be defined in unistd.h. Depending on its
value, you can tell whether the system supports it at all, whether you
can turn it on per application, or whether it's always on.
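Here is a sketch of how a C program might check the option. It assumes the usual POSIX convention for option constants in <unistd.h>: a positive value means always supported, -1 means unsupported, and 0 means the answer has to be obtained at run time with sysconf(_SC_SYNCHRONIZED_IO). The function name is hypothetical. Treating an undefined constant as "unsupported" is a deliberately conservative choice.

```c
#define _XOPEN_SOURCE 600 /* request SUSv3 / POSIX.1-2001 interfaces */
#include <assert.h>
#include <unistd.h>

/*
 * Hypothetical helper: is the Synchronized I/O option available?
 * Follows the POSIX convention for option constants in <unistd.h>:
 *   > 0 : always supported
 *     0 : may vary, so ask the running system via sysconf()
 *    -1 : not supported
 * An undefined constant is treated as "not supported" (conservative).
 * Returns 1 if supported, 0 otherwise.
 */
int synchronized_io_supported(void)
{
#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    return 1;
#elif defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO == 0
    return sysconf(_SC_SYNCHRONIZED_IO) > 0;
#else
    return 0;
#endif
}
```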

Did you know that this standard is freely available on the Internet?
(http://www.unix-systems.org/version3/online.html)

It contains further comments about its usage.

Note that there is also a function fdatasync() in the
Synchronized IO extension.

Kurt


From: Kevin Brown <kevin(at)sysexperts(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: sync()
Date: 2003-02-01 17:20:36
Message-ID: 20030201172036.GR12957@filer

Kurt Roeckx wrote:
> On Sat, Feb 01, 2003 at 08:15:17AM -0800, Kevin Brown wrote:
> > Kurt Roeckx wrote:
> > > [SIO] [Option Start] If _POSIX_SYNCHRONIZED_IO is defined, the
> > > fsync() function shall force all currently queued I/O operations
> > > associated with the file indicated by file descriptor fildes to the
> > > synchronized I/O completion state. All I/O operations shall be
> > > completed as defined for synchronized I/O file integrity
> > > completion. [Option End]
> >
> > Hmmm....so if I consistently want these semantics out of fsync() I
> > have to #define _POSIX_SYNCHRONIZED_IO? Or does the above mean that
> > you'll get those semantics if and only if the OS defines the above for
> > you?
>
> It's something that will be defined in unistd.h. Depending on its
> value, you can tell whether the system supports it at all, whether you
> can turn it on per application, or whether it's always on.
>
> Did you know that this standard is freely available on the Internet?
> (http://www.unix-systems.org/version3/online.html)
>
> It contains further comments about its usage.
>
> Note that there is also a function fdatasync() in the
> Synchronized IO extension.

Ah, excellent, thank you. Yes, fdatasync() is *exactly* what we need,
since it's defined thusly: "The functionality shall be equivalent to
fsync() with the symbol _POSIX_SYNCHRONIZED_IO defined, with the
exception that all I/O operations shall be completed as defined for
synchronized I/O data integrity completion".

Looks to me like we have a winner. Question is, can we bank on its
existence and, if so, is it properly implemented on all platforms that
support it?

Since we've been talking about porting to rather different platforms
(win32 in particular), it seems logical to build a PGFileSync()
function or something similar (perhaps a single PGSync() which
synchronizes all relevant PG files to disk, using sync() if necessary)
that would use fdatasync() or its equivalent.
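Here is a minimal sketch of such a wrapper, assuming SUSv3/POSIX headers. PGFileSync() is only the hypothetical name suggested above, not an existing PostgreSQL function. It calls fdatasync() when <unistd.h> advertises the Synchronized I/O option as always supported, and otherwise falls back to fsync(). A win32 branch and the whole-cluster sync() fallback are not shown.

```c
#define _XOPEN_SOURCE 600 /* request SUSv3 / POSIX.1-2001 interfaces */
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical PGFileSync(): flush one file's data to disk using the
 * cheapest call the platform advertises.  fdatasync() (data integrity
 * completion) may skip metadata, such as the modification time, that
 * fsync() (file integrity completion) must also write.
 * Returns 0 on success, -1 on error.
 */
int PGFileSync(int fd)
{
#if defined(_POSIX_SYNCHRONIZED_IO) && _POSIX_SYNCHRONIZED_IO > 0
    return fdatasync(fd);
#else
    /* Option absent or only possibly present: fall back to fsync(). */
    return fsync(fd);
#endif
}
```

Checking only the compile-time constant keeps the sketch short. A platform that defines _POSIX_SYNCHRONIZED_IO as 0 might still provide fdatasync(), which a run-time sysconf() check would detect.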

--
Kevin Brown kevin(at)sysexperts(dot)com