Re: initdb and fsync

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Jeff Davis <pgsql(at)j-davis(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>
Subject: Re: initdb and fsync
Date: 2012-06-18 16:05:29
Message-ID: 201206181805.29195.andres@2ndquadrant.com
Lists: pgsql-hackers

On Wednesday, June 13, 2012 06:53:17 PM Jeff Davis wrote:
> On Wed, 2012-06-13 at 13:53 +0300, Peter Eisentraut wrote:
> > The --help output for the -N option was copy-and-pasted wrongly.
> >
> > The message issued when using -N is also a bit content-free. Maybe
> > something like
> >
> > "Running in nosync mode. The data directory might become corrupt if the
> > operating system crashes.\n"
>
> Thank you, fixed.
>
> > Which leads to the question, how does one get out of this state? Is
> > running sync(1) enough? Is starting the postgres server enough?
>
> sync(1) calls sync(2), and the man page says:
>
> "According to the standard specification (e.g., POSIX.1-2001), sync()
> schedules the writes, but may return before the actual writing is done.
> However, since version 1.3.20 Linux does actually wait. (This still
> does not guarantee data integrity: modern disks have large caches.)"
>
> So it looks like sync is enough if you are on linux *and* you have any
> unprotected write cache disabled.
Protection can include write barriers; it doesn't need to be a BBU.

> I don't think starting the postgres server is enough.
Agreed.

> Before, I think we were safe because we could assume that the OS would
> flush the buffers before you had time to store any important data. But
> now, that window can be much larger.
I think to a large degree we didn't see any problems because of the old ext3
"sync the whole world" behaviour, which was common for a very long time.
On ext3 (with data=ordered, the default) any fsync (checkpoint, commit, ...)
would lead to all the other files being synced as well, which means that
starting the server once would have been enough on such a system.

Quick review:
- defaulting to initdb -N in the regression suite is not a good idea imo,
because that way the buildfarm won't catch problems in that area...
- could the copydir.c and initdb.c versions of walkdir/sync_fname et al be
unified?
- I personally would find it way nicer to put USE_PRE_SYNC into pre_sync_fname
instead of cluttering the main function with it
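
To illustrate the last two points, here is a minimal sketch in Python rather
than the patch's C. It is not the code under review. The names walkdir,
pre_sync_fname, sync_fname and USE_PRE_SYNC are borrowed from the patch, but
their bodies here are my assumptions; sync_tree is a hypothetical caller, and
the hasattr() check merely stands in for the compile-time switch. It assumes a
POSIX system and ignores symlinks and special files.

```python
import errno
import os

# Stand-in for the compile-time USE_PRE_SYNC switch (assumption).
USE_PRE_SYNC = hasattr(os, "posix_fadvise")


def walkdir(root):
    """Yield root, every directory below it, and every file in them."""
    for dirpath, _dirnames, filenames in os.walk(root):
        yield dirpath
        for name in filenames:
            yield os.path.join(dirpath, name)


def pre_sync_fname(path):
    """Hint the kernel to start writeback; a no-op where unsupported.

    The feature test lives here, so callers need no conditional code.
    """
    if not USE_PRE_SYNC or os.path.isdir(path):
        return
    fd = os.open(path, os.O_RDONLY)
    try:
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    except OSError:
        pass  # The hint is advisory only; the later fsync is what counts.
    finally:
        os.close(fd)


def sync_fname(path):
    """fsync a single file or directory."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    except OSError as err:
        # Some systems refuse to fsync a directory; tolerate only that case.
        if not (os.path.isdir(path) and err.errno in (errno.EBADF, errno.EINVAL)):
            raise
    finally:
        os.close(fd)


def sync_tree(root):
    """Two passes: hint writeback everywhere, then fsync everything."""
    paths = list(walkdir(root))
    for path in paths:
        pre_sync_fname(path)
    for path in paths:
        sync_fname(path)
    return len(paths)
```

With the feature test inside pre_sync_fname, the main function is just two
loops over the same walk, and a single shared walkdir/sync_fname pair could
serve both callers.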

Looks good otherwise!

Thanks,

Andres
--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
