Re: initdb -S and tablespaces

Lists: pgsql-hackers
From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: initdb -S and tablespaces
Date: 2014-09-29 08:39:01
Message-ID: 20140929083901.GA30946@toroid.org

Hi.

I just noticed that initdb -S ("Safely write all database files to disk
and exit") does (only) the following in perform_fsync:

pre_sync_fname(pdir, true);
walkdir(pg_data, pre_sync_fname);

fsync_fname(pdir, true);
walkdir(pg_data, fsync_fname);

walkdir() reads the directory and calls itself recursively for S_ISDIR
entries, or calls the function for S_ISREG entries… which means it
doesn't follow links.

Which means it doesn't fsync the contents of tablespaces.
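
To make the distinction concrete, here is a minimal standalone sketch (not
initdb's code) of how an lstat()-based check sees directory entries. The
helper name entry_kind is hypothetical. A symlink, such as an entry under
pg_tblspc, is reported as a link, so a walker that handles only S_ISDIR
and S_ISREG never descends into it.

```c
#define _XOPEN_SOURCE 700
#include <sys/stat.h>

/*
 * Hypothetical helper: classify a path without following symlinks.
 * Returns 'd' (directory), 'f' (regular file), 'l' (symlink),
 * 'o' (anything else), or '?' if lstat() fails.
 */
static char
entry_kind(const char *path)
{
    struct stat st;

    /* lstat() describes the link itself, not the link's target. */
    if (lstat(path, &st) < 0)
        return '?';
    if (S_ISDIR(st.st_mode))
        return 'd';
    if (S_ISREG(st.st_mode))
        return 'f';
    if (S_ISLNK(st.st_mode))
        return 'l';
    return 'o';
}
```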

-- Abhijit


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2014-09-29 09:54:10
Message-ID: 20140929095410.GC4716@alap3.anarazel.de

On 2014-09-29 14:09:01 +0530, Abhijit Menon-Sen wrote:
> Hi.
>
> I just noticed that initdb -S ("Safely write all database files to disk
> and exit") does (only) the following in perform_fsync:
>
> pre_sync_fname(pdir, true);
> walkdir(pg_data, pre_sync_fname);
>
> fsync_fname(pdir, true);
> walkdir(pg_data, fsync_fname);
>
> walkdir() reads the directory and calls itself recursively for S_ISDIR
> entries, or calls the function for S_ISREG entries… which means it
> doesn't follow links.
>
> Which means it doesn't fsync the contents of tablespaces.

Which means at least pg_upgrade is unsafe right
now... c.f. 630cd14426dc1daf85163ad417f3a224eb4ac7b0.

Note that the perform_fsync() *was* ok for its original purpose in
initdb. At the end of initdb there's no relevant tablespaces. But if
used *after* pg_upgrade, that's not necessarily the case.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2014-09-29 10:13:32
Message-ID: 20140929101332.GA32005@toroid.org

At 2014-09-29 11:54:10 +0200, andres(at)2ndquadrant(dot)com wrote:
>
> Note that the perform_fsync() *was* ok for its original purpose in
> initdb. At the end of initdb there's no relevant tablespaces. But if
> used *after* pg_upgrade, that's not necessarily the case.

Right.

So, since I'm writing a function to fsync everything inside PGDATA
anyway, it makes sense to call it both from initdb and StartupXLOG.
It'll do what initdb -S now does, plus follow links in pg_tblspc.

Any suggestions about where to put such a function? (I was looking at
backend/utils/init, but I'm not sure that's a good place for this.)

-- Abhijit


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2014-09-29 10:59:09
Message-ID: 20140929105909.GE4716@alap3.anarazel.de

On 2014-09-29 15:43:32 +0530, Abhijit Menon-Sen wrote:
> At 2014-09-29 11:54:10 +0200, andres(at)2ndquadrant(dot)com wrote:
> >
> > Note that the perform_fsync() *was* ok for its original purpose in
> > initdb. At the end of initdb there's no relevant tablespaces. But if
> > used *after* pg_upgrade, that's not necessarily the case.
>
> Right.
>
> So, since I'm writing a function to fsync everything inside PGDATA
> anyway, it makes sense to call it both from initdb and StartupXLOG.
> It'll do what initdb -S now does, plus follow links in pg_tblspc.
>
> Any suggestions about where to put such a function? (I was looking at
> backend/utils/init, but I'm not sure that's a good place for this.)

That can't work unfortunately. Both frontend and backend code need to
execute it... I'm not sure it's realistic to handle both cases the
same. The error handling, opening files/directories, and all will be
different. It'll also make backpatching hard :(.

So I'm afraid at least in a first patch it'll need to be a bit of
duplication. Fixing initdb's code back to 9.3 and the backend all the
way back to 9.0.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2014-09-29 11:02:32
Message-ID: 20140929110232.GA32705@toroid.org

At 2014-09-29 12:59:09 +0200, andres(at)2ndquadrant(dot)com wrote:
>
> So I'm afraid at least in a first patch it'll need to be a bit of
> duplication. Fixing initdb's code back to 9.3 and the backend all
> the way back to 9.0.

OK, thanks, I'll submit two separate patches then.

-- Abhijit


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2014-10-30 09:00:28
Message-ID: 20141030090027.GA1214@toroid.org

At 2014-09-29 11:54:10 +0200, andres(at)2ndquadrant(dot)com wrote:
>
> On 2014-09-29 14:09:01 +0530, Abhijit Menon-Sen wrote:
> >
> > I just noticed that initdb -S ("Safely write all database files to disk
> > and exit") does (only) the following in perform_fsync:
> >
> > pre_sync_fname(pdir, true);
> > walkdir(pg_data, pre_sync_fname);
> >
> > fsync_fname(pdir, true);
> > walkdir(pg_data, fsync_fname);
> >
> > walkdir() reads the directory and calls itself recursively for S_ISDIR
> > entries, or calls the function for S_ISREG entries… which means it
> > doesn't follow links.
> >
> > Which means it doesn't fsync the contents of tablespaces.
>
> Which means at least pg_upgrade is unsafe right
> now... c.f. 630cd14426dc1daf85163ad417f3a224eb4ac7b0.

Here's a proposed patch to initdb to make initdb -S fsync everything
under pg_tblspc. It introduces a new function that calls walkdir on
every entry under pg_tblspc. This is only one approach: I could have
also changed walkdir to follow links, but that would have required a
bunch of #ifdefs for Windows (because it doesn't have symlinks), and I
guessed a separate function with two calls might be easier to patch into
back branches. I've tested this patch under various conditions on Linux,
but it could use some testing on Windows.
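
A rough standalone sketch of the approach described above, for readers
without the attachment. This is not the patch itself: walk_files,
walk_tblspc_entries and count_file are hypothetical names, plain
opendir()/readdir() stand in for the real code, and error handling is
reduced to skipping what cannot be read. The inner walker still ignores
symlinks it meets inside a tree, but opendir() follows a symlink passed
as the top-level path, which is what lets the second function reach each
tablespace.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Callback applied to each regular file that is found. */
typedef void (*walk_action) (const char *fname);

/* Example action: count files instead of calling fsync() on them. */
static int files_seen = 0;

static void
count_file(const char *fname)
{
    (void) fname;
    files_seen++;
}

/*
 * Recursively apply "action" to regular files under "path". Entries are
 * examined with lstat(), so symlinks found inside the tree are skipped.
 */
static void
walk_files(const char *path, walk_action action)
{
    DIR *dir = opendir(path);
    struct dirent *de;

    if (dir == NULL)
        return;                 /* sketch: silently skip unreadable dirs */

    while ((de = readdir(dir)) != NULL)
    {
        char subpath[4096];
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        snprintf(subpath, sizeof(subpath), "%s/%s", path, de->d_name);
        if (lstat(subpath, &st) < 0)
            continue;

        if (S_ISDIR(st.st_mode))
            walk_files(subpath, action);
        else if (S_ISREG(st.st_mode))
            action(subpath);
    }
    closedir(dir);
}

/*
 * Call walk_files() on every entry directly under "tblspc_dir", whether
 * that entry is a symlink or a plain directory.
 */
static void
walk_tblspc_entries(const char *tblspc_dir, walk_action action)
{
    DIR *dir = opendir(tblspc_dir);
    struct dirent *de;

    if (dir == NULL)
        return;

    while ((de = readdir(dir)) != NULL)
    {
        char subpath[4096];

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;

        snprintf(subpath, sizeof(subpath), "%s/%s", tblspc_dir, de->d_name);
        walk_files(subpath, action);
    }
    closedir(dir);
}
```

Calling walk_files() on the data directory and then walk_tblspc_entries()
on its pg_tblspc subdirectory corresponds to the "separate function with
two calls" shape mentioned above.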

-- Abhijit

Attachment: tblspclinks.diff (text/x-diff, 2.6 KB)

From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2014-11-06 12:26:53
Message-ID: 20141106122653.GA18963@toroid.org

At 2014-10-30 14:30:27 +0530, ams(at)2ndQuadrant(dot)com wrote:
>
> Here's a proposed patch to initdb to make initdb -S fsync everything
> under pg_tblspc.

Oops, I meant to include the corresponding patch to xlog.c to do the
same at startup. It's based on the initdb patch, but modified to not
use fprintf/exit_nicely and so on. (Note that this was written in a
single chunk to aid backpatching. There's no attempt made to share
code in this set of patches.)

Now attached.

-- Abhijit

Attachment: 0001-If-we-need-to-perform-crash-recovery-fsync-PGDATA-re.patch (text/x-diff, 6.2 KB)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2014-11-15 00:37:33
Message-ID: 20141115003733.GA27042@awork2.anarazel.de

On 2014-10-30 14:30:28 +0530, Abhijit Menon-Sen wrote:
> Here's a proposed patch to initdb to make initdb -S fsync everything
> under pg_tblspc. It introduces a new function that calls walkdir on
> every entry under pg_tblspc. This is only one approach: I could have
> also changed walkdir to follow links, but that would have required a
> bunch of #ifdefs for Windows (because it doesn't have symlinks), and I
> guessed a separate function with two calls might be easier to patch into
> back branches. I've tested this patch under various conditions on Linux,
> but it could use some testing on Windows.

I've pushed this. The windows buildfarm animals that run pg_upgrade (and
thus --sync-only) will have to tell us whether there's a problem. I sure
hope there's none...

Thanks for that patch!

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-01-14 10:59:08
Message-ID: 20150114105908.GK5245@awork2.anarazel.de

Hi,

On 2014-11-06 17:56:53 +0530, Abhijit Menon-Sen wrote:
> +    /*
> +     * If we need to perform crash recovery, we issue an fsync on the
> +     * data directory and its contents to try to ensure that any data
> +     * written before the crash are flushed to disk. Otherwise a power
> +     * failure in the near future might cause earlier unflushed writes
> +     * to be lost, even though more recent data written to disk from
> +     * here on would be persisted.
> +     */
> +
> +    if (ControlFile->state != DB_SHUTDOWNED &&
> +        ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> +        perform_fsync(data_directory);
> +

a) Please think of a slightly more descriptive name than perform_fsync
b) I think we should check here whether fsync is enabled.
c) I'm wondering if we should add fsync to the control file and also
perform an fsync if the last shutdown was clean, but fsync was
disabled.

> if (ControlFile->state == DB_SHUTDOWNED)
> {
> /* This is the expected case, so don't be chatty in standalone mode */
> @@ -11262,3 +11281,168 @@ SetWalWriterSleeping(bool sleeping)
> XLogCtl->WalWriterSleeping = sleeping;
> SpinLockRelease(&XLogCtl->info_lck);
> }
> +
> +/*
> + * Hint to the OS that it should get ready to fsync() this file.
> + *
> + * Adapted from pre_sync_fname in initdb.c
> + */
> +static void
> +pre_sync_fname(char *fname, bool isdir)
> +{

this essentially already exists in the backend in parts. C.f. pg_flush_data.

> +/*
> + * walkdir: recursively walk a directory, applying the action to each
> + * regular file and directory (including the named directory itself).
> + *
> + * Adapted from copydir() in copydir.c.
> + */
> +static void
> +walkdir(char *path, void (*action) (char *fname, bool isdir))
> +{
> +    DIR *dir;
> +    struct dirent *de;
> +
> +    dir = AllocateDir(path);
> +    while ((de = ReadDir(dir, path)) != NULL)
> +    {
> +        char subpath[MAXPGPATH];
> +        struct stat fst;
> +
> +        CHECK_FOR_INTERRUPTS();
> +
> +        if (strcmp(de->d_name, ".") == 0 ||
> +            strcmp(de->d_name, "..") == 0)
> +            continue;
> +
> +        snprintf(subpath, MAXPGPATH, "%s/%s", path, de->d_name);
> +
> +        if (lstat(subpath, &fst) < 0)
> +            ereport(ERROR,
> +                    (errcode_for_file_access(),
> +                     errmsg("could not stat file \"%s\": %m", subpath)));
> +
> +        if (S_ISDIR(fst.st_mode))
> +            walkdir(subpath, action);
> +        else if (S_ISREG(fst.st_mode))
> +            (*action) (subpath, false);

Theoretically you should also invoke fsync on directories.
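
For illustration, a minimal POSIX sketch of what an fsync on a directory
looks like. The name fsync_directory is hypothetical and this is not the
backend's code, which has its own file-descriptor and error-reporting
machinery. Treating EBADF and EINVAL as "directory fsync not supported
here" is an assumption of this sketch.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush a directory's own metadata (its entries)
 * by opening it read-only and calling fsync() on the descriptor.
 * Returns 0 on success, -1 on failure.
 */
static int
fsync_directory(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;

    rc = fsync(fd);

    /* Assumption: platforms that cannot fsync a directory say so this way. */
    if (rc < 0 && (errno == EBADF || errno == EINVAL))
        rc = 0;

    close(fd);
    return rc;
}
```

Without this step, a newly created or renamed file may have its contents
on disk while the directory entry that names it does not.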

> +/*
> + * Issue fsync recursively on PGDATA and all its contents, including the
> + * links under pg_tblspc.
> + *
> + * Adapted from perform_fsync in initdb.c
> + */
> +static void
> +perform_fsync(char *pg_data)
> +{
> +    char pdir[MAXPGPATH];
> +    char pg_tblspc[MAXPGPATH];
> +
> +    /*
> +     * We need to name the parent of PGDATA. get_parent_directory() isn't
> +     * enough here, because it can result in an empty string.
> +     */
> +    snprintf(pdir, MAXPGPATH, "%s/..", pg_data);
> +    canonicalize_path(pdir);

Hm. Why is this needed? Just syncing . should work?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-01-15 05:32:43
Message-ID: 20150115053243.GA29279@toroid.org

At 2015-01-14 11:59:08 +0100, andres(at)2ndquadrant(dot)com wrote:
>
> > +    if (ControlFile->state != DB_SHUTDOWNED &&
> > +        ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> > +        perform_fsync(data_directory);
> > +
>
> a) Please think of a slightly more descriptive name than perform_fsync

OK. (I just copied the name initdb uses, because at the time I was still
thinking in terms of a later patch moving this to src/common.) What do
you think of fsync_recursively? fsync_pgdata?

I think fsync_recursively(data_directory) reads well.

> b) I think we should check here whether fsync is enabled.

OK, will do.

> c) I'm wondering if we should add fsync to the control file and also
> perform an fsync if the last shutdown was clean, but fsync was
> disabled.

Explain? "Add fsync to the control file" means store the value of the
fsync GUC setting in the control file? And would the fsync you mention
be dependent on the setting, or unconditional?

> > +static void
> > +pre_sync_fname(char *fname, bool isdir)
> > +{
>
> this essentially already exists in the backend in parts. C.f.
> pg_flush_data.

OK, I missed that. I'll rework the patch to use it.

> Theoretically you should also invoke fsync on directories.

OK.

> > + * We need to name the parent of PGDATA. get_parent_directory() isn't
> > + * enough here, because it can result in an empty string.
> > + */
> > + snprintf(pdir, MAXPGPATH, "%s/..", pg_data);
> > + canonicalize_path(pdir);
>
> Hm. Why is this needed? Just syncing . should work?

Not sure, will investigate.

Thanks.

-- Abhijit


From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-01-15 13:32:45
Message-ID: 20150115133245.GG5245@awork2.anarazel.de

On 2015-01-15 11:02:43 +0530, Abhijit Menon-Sen wrote:
> At 2015-01-14 11:59:08 +0100, andres(at)2ndquadrant(dot)com wrote:
> >
> > > +    if (ControlFile->state != DB_SHUTDOWNED &&
> > > +        ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
> > > +        perform_fsync(data_directory);
> > > +
> >
> > a) Please think of a slightly more descriptive name than perform_fsync
>
> OK. (I just copied the name initdb uses, because at the time I was still
> thinking in terms of a later patch moving this to src/common.) What do
> you think of fsync_recursively? fsync_pgdata?

I like fsync_pgdata/datadir or something.

Note that I think you'll have to check/handle pg_xlog being a symlink -
we explicitly support that as a usecase...

> > c) I'm wondering if we should add fsync to the control file and also
> > perform an fsync if the last shutdown was clean, but fsync was
> > disabled.
>
> Explain? "Add fsync to the control file" means store the value of the
> fsync GUC setting in the control file?

Yes.

> And would the fsync you mention be dependent on the setting, or unconditional?

What I am thinking of is that, currently, if you start the server for
initial loading with fsync=off, and then restart it, you're open to data
loss. So when the current config file setting is changed from off to on,
we should fsync the data directory. Even if there was no crash restart.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-03-10 07:49:48
Message-ID: 20150310074948.GA22050@toroid.org

At 2015-01-15 14:32:45 +0100, andres(at)2ndquadrant(dot)com wrote:
>
> What I am thinking of is that, currently, if you start the server for
> initial loading with fsync=off, and then restart it, you're open to
> data loss. So when the current config file setting is changed from off
> to on, we should fsync the data directory. Even if there was no crash
> restart.

Patch attached.

Changes:

1. Renamed perform_fsync to fsync_recursively (otherwise it would read
"fsync_pgdata(pg_data)")
2. Added ControlData->fsync_disabled to record whether fsync was ever
disabled while the server was running (tested in CreateCheckPoint)
3. Run fsync_recursively at startup only if fsync is enabled
4. Run it if we're doing crash recovery, or fsync was disabled
5. Use pg_flush_data in pre_sync_fname
6. Issue fsync on directories too
7. Tested that it works if pg_xlog is a symlink (no changes).

(In short, everything you mentioned in your earlier mail.)
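
Items 3 and 4 in the list above combine into a single startup condition.
A small sketch of that logic follows; the function and parameter names
are hypothetical and the attached patch may express it differently.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: sync the data directory at startup only when
 * fsync is currently enabled (item 3), and only if crash recovery is
 * needed or fsync had been disabled earlier (item 4).
 */
static bool
startup_needs_datadir_fsync(bool fsync_enabled,
                            bool crash_recovery,
                            bool fsync_was_disabled)
{
    if (!fsync_enabled)
        return false;

    return crash_recovery || fsync_was_disabled;
}
```

With fsync currently off the answer is always "no", regardless of how the
previous run ended.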

Note that I set ControlData->fsync_disabled to false in BootstrapXLOG,
but it gets set to true during a later CreateCheckPoint(). This means
we run fsync again at startup after initdb. I'm not sure what to do
about that.

Is this about what you had in mind?

-- Abhijit

Attachment: 0001-Recursively-fsync-PGDATA-at-startup-if-needed.patch (text/x-diff, 8.2 KB)

From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-04-03 16:32:32
Message-ID: 20150403163232.GA28444@eldon.alvh.no-ip.org

Abhijit Menon-Sen wrote:
> At 2015-01-15 14:32:45 +0100, andres(at)2ndquadrant(dot)com wrote:

> Patch attached.
>
> Changes:
>
> 1. Renamed perform_fsync to fsync_recursively (otherwise it would read
> "fsync_pgdata(pg_data)")

Okay, but as far as I can tell this function is very specific to
PGDATA; you couldn't use it in any other directory (or pg_tblspc would
be missing and that would cause an error, right?) Therefore I think it
would make sense to have the name reflect this; maybe
fsync_datadir_recursively(data_directory)
or
fsync_pgdata_recursively(data_directory)
would work? But then, since the name is already telling us that it's
all about PGDATA, maybe we can remove the "recursively" part of the
name? Not sure about any of this; other opinions?

I also noticed that walkdir() thinks it is completely agnostic on what
the passed action is; except that the comment at the bottom talks about
how fsync on directories is important, which seems out of place.

I wonder about walktblspc_links() too. Seems to me that that function
is pretty much the same as walkdir(); would it work to add a flag to the
latter to change the behavior in whatever way needs to be changed, and
remove the former? Hmm ... Actually, since surely we must follow
symlinks everywhere, why do we have to do this separately for pg_tblspc?
Shouldn't that link-following occur automatically when walking PGDATA in
the first place?

> 2. Added ControlData->fsync_disabled to record whether fsync was ever
> disabled while the server was running (tested in CreateCheckPoint)
> 3. Run fsync_recursively at startup only if fsync is enabled
> 4. Run it if we're doing crash recovery, or fsync was disabled
> 5. Use pg_flush_data in pre_sync_fname
> 6. Issue fsync on directories too
> 7. Tested that it works if pg_xlog is a symlink (no changes).
>
> (In short, everything you mentioned in your earlier mail.)
>
> Note that I set ControlData->fsync_disabled to false in BootstrapXLOG,
> but it gets set to true during a later CreateCheckPoint(). This means
> we run fsync again at startup after initdb. I'm not sure what to do
> about that.

This all looks reasonable to me. I just noticed, though, that
the fd.c routines test enableFsync and do nothing if it's not enabled;
but fsync_recursively goes to all the trouble of doing stuff even if
disabled, and the actions are skipped later; the enableFsync check is
then responsibility of the caller. This seems a bit prone to later
confusion. Maybe fsync_recursively should Assert() that it's only being
called with enableFsync=on; or perhaps we can have it return early if
it's unset.
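
A tiny sketch of the "return early" variant, to show the shape being
suggested. Everything here is a stand-in: enable_fsync mimics the
backend's enableFsync, fsync_tree_if_enabled is a hypothetical name, and
the counter replaces the real directory walk.

```c
#include <stdbool.h>

static bool enable_fsync = true;    /* stand-in for the backend's enableFsync */
static int  sync_passes = 0;        /* how many times the work actually ran */

/*
 * Do nothing at all when fsync is disabled, so callers need not repeat
 * the check and no directory walk is done only to skip every action.
 */
static void
fsync_tree_if_enabled(const char *datadir)
{
    if (!enable_fsync)
        return;

    (void) datadir;     /* a real implementation would walk datadir here */
    sync_passes++;
}
```

The Assert() alternative would instead make a call with fsync disabled a
programming error caught in assert-enabled builds.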

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-04-06 07:16:07
Message-ID: 20150406071607.GA16629@toroid.org

Hi Álvaro.

Thanks for taking a look at the patch.

At 2015-04-03 13:32:32 -0300, alvherre(at)2ndquadrant(dot)com wrote:
>
> But then, since the name is already telling us that it's all about
> PGDATA, maybe we can remove the "recursively" part of the name?

Sure, that makes sense too. Since you and Andres both like
«fsync_pgdata(data_directory)», I'll change it accordingly.

> I also noticed that walkdir() thinks it is completely agnostic on
> what the passed action is; except that the comment at the bottom talks
> about how fsync on directories is important, which seems out of place.

Yes. The function behaves as documented, but the comment is clearly too
specific. I'm not sure where to put it. I could make walkdir() NOT do
it, and instead do it in the caller with the same comment. Thoughts?

> Hmm ... Actually, since surely we must follow symlinks everywhere,
> why do we have to do this separately for pg_tblspc? Shouldn't that
> link-following occur automatically when walking PGDATA in the first
> place?

I'm not sure about that (and that's why I've not attached an updated
patch here). The original idea was to follow only those links that we
expect to be in PGDATA.

I suppose it would be easier in terms of the code to follow all links,
but I don't know if it's the right thing. If that's what you think we
should do, I can post a simplified patch.

> Maybe fsync_recursively should Assert() that it's only being called
> with enableFsync=on; or perhaps we can have it return early if it's
> unset.

I prefer the latter. Will change.

-- Abhijit


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-04-06 13:16:36
Message-ID: 20150406131636.GD4369@alvh.no-ip.org

Abhijit Menon-Sen wrote:

Hi,

> At 2015-04-03 13:32:32 -0300, alvherre(at)2ndquadrant(dot)com wrote:

> > I also noticed that walkdir() thinks it is completely agnostic on
> > what the passed action is; except that the comment at the bottom talks
> > about how fsync on directories is important, which seems out of place.
>
> Yes. The function behaves as documented, but the comment is clearly too
> specific. I'm not sure where to put it. I could make walkdir() NOT do
> it, and instead do it in the caller with the same comment. Thoughts?

I think it's enough to state in the function comment that the action is
applied to the top element too. Maybe add the fsync comment on the
callsite.

> > Hmm ... Actually, since surely we must follow symlinks everywhere,
> > why do we have to do this separately for pg_tblspc? Shouldn't that
> > link-following occur automatically when walking PGDATA in the first
> > place?
>
> I'm not sure about that (and that's why I've not attached an updated
> patch here). The original idea was to follow only those links that we
> expect to be in PGDATA.
>
> I suppose it would be easier in terms of the code to follow all links,
> but I don't know if it's the right thing. If that's what you think we
> should do, I can post a simplified patch.

Well, we have many things that can be set as symlinks; some you can have
initdb create the links for you (initdb --xlogdir), others you can move
manually. I think not following all links might lead to impossible-to-
detect bugs such as failing to fsync new pgdata subdirectories we add in
the future, for example.
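
As a sketch of what "following all links" means at the level of a single
directory entry: lstat() describes the link itself, while stat() follows
it to the target. The helper name below is hypothetical and this is not
taken from any posted patch.

```c
#define _XOPEN_SOURCE 700
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: true if "path" is a directory, or a symlink whose
 * target is a directory. Dangling links and missing paths yield false.
 */
static bool
is_dir_following_links(const char *path)
{
    struct stat st;

    if (lstat(path, &st) < 0)
        return false;

    /* For a symlink, look again, this time through the link. */
    if (S_ISLNK(st.st_mode) && stat(path, &st) < 0)
        return false;

    return S_ISDIR(st.st_mode);
}
```

A walker that recursed wherever this returns true would descend into any
linked directory, not only the entries under pg_tblspc. It would also
need some guard against link cycles, which this sketch does not attempt.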

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-04-15 06:10:34
Message-ID: 20150415061034.GA16656@toroid.org

At 2015-04-06 10:16:36 -0300, alvherre(at)2ndquadrant(dot)com wrote:
>
> Well, we have many things that can be set as symlinks; some you can
> have initdb create the links for you (initdb --xlogdir), others you
> can move manually.

Looking at sendDir() in backend/replication/basebackup.c, I notice that
the only places where it expects/allows symlinks are for pg_xlog and in
pg_tblspc.

Still, thanks to the example in basebackup.c, I've got most of a patch
that should follow any symlinks in PGDATA. I just need to test a little
more before I post it.

-- Abhijit


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-04-15 06:23:28
Message-ID: 20150415062328.GB16656@toroid.org

At 2015-04-15 11:40:34 +0530, ams(at)2ndQuadrant(dot)com wrote:
>
> Still, thanks to the example in basebackup.c, I've got most of a patch
> that should follow any symlinks in PGDATA.

I notice that src/bin/pg_rewind/copy_fetch.c has a traverse_datadir()
function that does what we want (but it recurses into symlinks only
inside pg_tblspc).

-- Abhijit


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-04-16 13:24:59
Message-ID: 20150416132459.GA22686@toroid.org

Hi.

Here's a variation of the earlier patch that follows all links in
PGDATA. Does this look more like what you had in mind?

-- Abhijit

Attachment: 0001-20150416-Recursively-fsync-PGDATA-at-startup-if-needed.patch (text/x-diff, 8.0 KB)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-04-30 19:37:44
Message-ID: CA+TgmoazKw=HEH=CKNJpgE847n5qCCLJR_XwKZq+hAJ5fKJ7+A@mail.gmail.com

On Thu, Apr 16, 2015 at 9:24 AM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
> Here's a variation of the earlier patch that follows all links in
> PGDATA. Does this look more like what you had in mind?

I'm really confused by the additional control-file field. It is
documented as indicating whether fsync was ever disabled while the
server was running. But:

1. It doesn't do that. As soon as we fsync the data directory, we
reset the flag. That's not what "ever disabled" means to me.

2. I don't know why it's part of this patch. Tracking whether fsync
was ever disabled could be useful forensically, but isn't related to
fsync-ing the data directory after a crash, so I dunno why we'd put
that in this patch. Tracking whether fsync was disabled recently, as
the patch actually does, doesn't seem to be of any obvious value in
preventing corruption either.

Also, it seems awfully unfortunate to me that we're duplicating a
whole pile of code into xlog.c here. Maybe there's no way to avoid
the code duplication, but pre_sync_fname() seems like it'd more
naturally go in fd.c than here. I dunno where walkdir should go, but
again, not in xlog.c.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-04-30 22:18:08
Message-ID: 20150430221808.GW4369@alvh.no-ip.org

Robert Haas wrote:

> Also, it seems awfully unfortunate to me that we're duplicating a
> whole pile of code into xlog.c here. Maybe there's no way to avoid
> the code duplication, but pre_sync_fname() seems like it'd more
> naturally go in fd.c than here. I dunno where walkdir should go, but
> again, not in xlog.c.

Hm, there's an interest in backpatching this as a bugfix, if I
understand correctly; hence the duplicated code. We could remove the
duplication later with a refactoring patch in master only.

However, now that you mention a pg_control flag, it becomes evident to
me that a change to pg_control cannot be back-patched ...

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-04-30 22:20:44
Message-ID: CA+TgmobEDoqpAQZkFvusMa91fsUtJVx42wdTyKLK2hGu0YForw@mail.gmail.com

On Thu, Apr 30, 2015 at 6:18 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
>> Also, it seems awfully unfortunate to me that we're duplicating a
>> whole pile of code into xlog.c here. Maybe there's no way to avoid
>> the code duplication, but pre_sync_fname() seems like it'd more
>> naturally go in fd.c than here. I dunno where walkdir should go, but
>> again, not in xlog.c.
>
> Hm, there's an interest in backpatching this as a bugfix, if I
> understand correctly; hence the duplicated code. We could remove the
> duplication later with a refactoring patch in master only.

That seems pretty silly. If we're going to add pre_sync_fname() to every
branch, we should add it to the same (correct) file in all of them,
not put it in xlog.c in the back-branches and fd.c in master.

> However, now that you mention a pg_control flag, it becomes evident to
> me that a change to pg_control cannot be back-patched ...

Indeed. But I think we can solve that problem by just ripping that
part out. Unless I'm missing something, it's not really doing
anything useful.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-04-30 22:44:14
Message-ID: 20150430224414.GA4369@alvh.no-ip.org

Robert Haas wrote:
> On Thu, Apr 30, 2015 at 6:18 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
> >> Also, it seems awfully unfortunate to me that we're duplicating a
> >> whole pile of code into xlog.c here. Maybe there's no way to avoid
> >> the code duplication, but pre_sync_fname() seems like it'd more
> >> naturally go in fd.c than here. I dunno where walkdir should go, but
> >> again, not in xlog.c.
> >
> > Hm, there's an interest in backpatching this as a bugfix, if I
> > understand correctly; hence the duplicated code. We could remove the
> > duplication later with a refactoring patch in master only.
>
> That seems pretty silly. If we're going to add pre_sync_fname() to every
> branch, we should add it to the same (correct) file in all of them,
> not put it in xlog.c in the back-branches and fd.c in master.

Ah, so that's not the duplicate code that I was remembering -- I think
it's walkdir() or something like that, which is in initdb IIRC.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-04-30 23:49:32
Message-ID: CA+TgmoZ=nm5wZG3gNPydmfGZoDH8zgUKbUvFOBi9JKQ6jTUhCg@mail.gmail.com

On Thu, Apr 30, 2015 at 6:44 PM, Alvaro Herrera
<alvherre(at)2ndquadrant(dot)com> wrote:
> Robert Haas wrote:
>> On Thu, Apr 30, 2015 at 6:18 PM, Alvaro Herrera
>> <alvherre(at)2ndquadrant(dot)com> wrote:
>> >> Also, it seems awfully unfortunate to me that we're duplicating a
>> >> whole pile of code into xlog.c here. Maybe there's no way to avoid
>> >> the code duplication, but pre_sync_fname() seems like it'd more
>> >> naturally go in fd.c than here. I dunno where walkdir should go, but
>> >> again, not in xlog.c.
>> >
>> > Hm, there's an interest in backpatching this as a bugfix, if I
>> > understand correctly; hence the duplicated code. We could remove the
>> > duplication later with a refactoring patch in master only.
>>
>> That seems pretty silly. If we're going to add pre_sync_fname() to every
>> branch, we should add it to the same (correct) file in all of them,
>> not put it in xlog.c in the back-branches and fd.c in master.
>
> Ah, so that's not the duplicate code that I was remembering -- I think
> it's walkdir() or something like that, which is in initdb IIRC.

Yeah, walkdir() is there too. But if we're going to add that to the
backend, I think it should go in src/backend/storage/file, not
src/backend/access/transam.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-04-30 23:56:17
Message-ID: 37007.1430438177@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Apr 30, 2015 at 6:44 PM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
>> Ah, so that's not the duplicate code that I was remembering -- I think
>> it's walkdir() or something like that, which is in initdb IIRC.

> Yeah, walkdir() is there too. But if we're going to add that to the
> backend, I think it should go in src/backend/storage/file, not
> src/backend/access/transam.

Agreed that .../transam is a pretty horrid choice; but maybe we should
be thinking about putting it in src/common, so there's only one copy?

As for the notion that this needs to be back-patched, I would say no.

regards, tom lane


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-05-01 03:29:38
Message-ID: 20150501032938.GA17628@toroid.org

At 2015-04-30 15:37:44 -0400, robertmhaas(at)gmail(dot)com wrote:
>
> 1. It doesn't do that. As soon as we fsync the data directory, we
> reset the flag. That's not what "ever disabled" means to me.

Could you suggest an acceptable alternative wording? I can't immediately
think of anything better than "disabled since the last restart". That is
conditional on our resetting the flag, which we will do only if fsync is
enabled at startup. So it's true, but not the whole truth.

> 2. I don't know why it's part of this patch.

In 20150115133245(dot)GG5245(at)awork2(dot)anarazel(dot)de, Andres explained his
rationale as follows:

«What I am thinking of is that, currently, if you start the server
for initial loading with fsync=off, and then restart it, you're open
to data loss. So when the current config file setting is changed
from off to on, we should fsync the data directory. Even if there
was no crash restart.»

That's what I tried to implement.

> Also, it seems awfully unfortunate to me that we're duplicating a
> whole pile of code into xlog.c here.

I have pointed out and discussed the duplication several times. I did it
this way only because we were considering backporting the changes, and
at the time it seemed better to do this and fix the duplication in a
separate patch.

-- Abhijit


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-05-01 03:34:19
Message-ID: 20150501033419.GB17628@toroid.org

At 2015-04-30 16:56:17 -0700, tgl(at)sss(dot)pgh(dot)pa(dot)us wrote:
>
> As for the notion that this needs to be back-patched, I would say no.

Not even just the "fsync after crash" part? I could separate that out
from the control file changes and try to eliminate the duplication. I
think that would be worth back-patching, at least.

-- Abhijit


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-05-01 12:10:16
Message-ID: CA+TgmoZBioCxkqP+TOGNOrEZ8DcbEooGR4gQumBEgy+7dxi2dg@mail.gmail.com

On Thu, Apr 30, 2015 at 11:29 PM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
>> 2. I don't know why it's part of this patch.
>
> In 20150115133245(dot)GG5245(at)awork2(dot)anarazel(dot)de, Andres explained his
> rationale as follows:
>
> «What I am thinking of is that, currently, if you start the server
> for initial loading with fsync=off, and then restart it, you're open
> to data loss. So when the current config file setting is changed
> from off to on, we should fsync the data directory. Even if there
> was no crash restart.»

That's awfully clever, but I'm not sure I like the idea of trying to
be that clever. I think if users temporarily disable fsync, they
should be responsible for running initdb -S afterwards if that is needed in
their situation, and this should be documented.

It seems to me that, at a minimum, it would be good to split those
controversial and definitely not-back-patchable changes into their own
patch.

>> Also, it seems awfully unfortunate to me that we're duplicating a
>> whole pile of code into xlog.c here.
>
> I have pointed out and discussed the duplication several times. I did it
> this way only because we were considering backporting the changes, and
> at the time it seemed better to do this and fix the duplication in a
> separate patch.

As I've mentioned a few times, I don't mind duplicating the code if we
have frontend and backend versions that are materially different. But
I do mind putting it into xlog.c instead of some place that's actually
appropriate.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-05-01 12:42:53
Message-ID: 20150501124253.GA2556@toroid.org

At 2015-05-01 08:10:16 -0400, robertmhaas(at)gmail(dot)com wrote:
>
> It seems to me that, at a minimum, it would be good to split those
> controversial and definitely not-back-patchable changes into their
> own patch.

OK, split here (0002*).

> I do mind putting it into xlog.c instead of some place that's actually
> appropriate.

OK, moved to storage/file/fd.c (0001*).

-- Abhijit

Attachment Content-Type Size
0001-Recursively-fsync-PGDATA-at-startup-after-a-crash.patch text/x-diff 6.3 KB
0002-Recursively-fsync-PGDATA-on-the-next-restart-after-f.patch text/x-diff 3.5 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-05-01 13:57:28
Message-ID: CA+TgmoZ_ZAGLt7QB9tmYN3-htVPJe_qvyrqogWg-J2O8hPHS0w@mail.gmail.com

On Fri, May 1, 2015 at 8:42 AM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
> At 2015-05-01 08:10:16 -0400, robertmhaas(at)gmail(dot)com wrote:
>> It seems to me that, at a minimum, it would be good to split those
>> controversial and definitely not-back-patchable changes into their
>> own patch.
>
> OK, split here (0002*).
>
>> I do mind putting it into xlog.c instead of some place that's actually
>> appropriate.
>
> OK, moved to storage/file/fd.c (0001*).

Here's a revised version of your 0001 patch which I am comfortable
with. I changed some of the comments, and I moved the fsync_pgdata()
call slightly later, so that we don't do a (possibly long) set of
fsyncs before printing out the first log message that tells the user
what is going on.

If you don't object to this version, I'll commit it. I believe this
part *should* be back-patched, but Tom seemed to disagree, for reasons
I'm not really clear on. This is a potential data corrupting bug as
legitimate as any other, so a back-patch seems right to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
fsync-pgdata-rmh.patch binary/octet-stream 5.7 KB

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-05-01 14:03:15
Message-ID: 20150501140315.GD22649@awork2.anarazel.de

Hi,

I agree that splitting the patch into two separate ones is a good idea.

On 2015-05-01 09:57:28 -0400, Robert Haas wrote:
> If you don't object to this version, I'll commit it. I believe this
> part *should* be back-patched, but Tom seemed to disagree, for reasons
> I'm not really clear on. This is a potential data corrupting bug as
> legitimate as any other, so a back-patch seems right to me.

Agreed. Especially for WAL files this seems to be a pretty clear
correctness issue to me.

I unsurprisingly think the other patch is a good idea too. But it's
clearly *not* something for the back branches.

Greetings,

Andres Freund


From: Abhijit Menon-Sen <ams(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: initdb -S and tablespaces
Date: 2015-05-01 14:41:58
Message-ID: 20150501144158.GA4388@toroid.org

At 2015-05-01 09:57:28 -0400, robertmhaas(at)gmail(dot)com wrote:
>
> If you don't object to this version, I'll commit it.

Looks fine to me, thank you.

As for the non-backpatchable 0002, I agree with Andres that it should be
included in 9.5; but I take it you're still not convinced? Should I add
that to the CF separately for discussion, or what?

-- Abhijit


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-05-04 18:23:16
Message-ID: CA+TgmoYhCJzNt4gyPfw05H_e0H7kMPpy7g_PnCiP_WjgaEd7MA@mail.gmail.com

On Fri, May 1, 2015 at 10:41 AM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
> At 2015-05-01 09:57:28 -0400, robertmhaas(at)gmail(dot)com wrote:
>>
>> If you don't object to this version, I'll commit it.
>
> Looks fine to me, thank you.

OK, committed and back-patched.

> As for the non-backpatchable 0002, I agree with Andres that it should be
> included in 9.5; but I take it you're still not convinced?

No, I'm not convinced. That patch will protect you in one extremely
specific scenario: you turn off fsync, do some stuff, shut down, turn
fsync back on again, and start up. But it won't protect you if you
crash while fsync is off, or after you shut down with fsync=off and
before you restart with fsync=on. And there's no documentation change
here that would help anyone distinguish between the situations in
which they are protected and the situations in which they are not
protected. Without that, a lot of people are going to get this wrong.

As an alternative, how about an fsync=shutdown parameter? This could be
documented to fsync the data directory at shutdown. It could document
that there is a risk of corruption if the server crashes, but that the
database is OK if shut down cleanly. fsync=off could document that
you must run initdb --sync-only after shutting down, else you are
unsafe.

I'm not wedded to any particular solution, but an undocumented hack
that some people will manage to use safely some of the time doesn't
seem good enough to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-05-05 10:26:46
Message-ID: CAApHDvq+wcO7LfnjuUnzcO76EhW-bKXnona=oPR=a4DGinxWZA@mail.gmail.com

On 5 May 2015 at 06:23, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>
> OK, committed and back-patched.
>

There are a couple of problems with this that the Windows buildfarm
members are not happy with.

The attached patch seems to work locally.

Regards

David Rowley

Attachment Content-Type Size
fsync_win32_fix.patch application/octet-stream 975 bytes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-05-05 13:14:11
Message-ID: CA+TgmoZRvH+SSOjbzVVbqMrVy1-KV2+z9EO1CcL-eeLhEG0ovQ@mail.gmail.com

On Tue, May 5, 2015 at 6:26 AM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On 5 May 2015 at 06:23, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>
>> OK, committed and back-patched.
>
> There are a couple of problems with this that the Windows buildfarm
> members are not happy with.
>
> The attached patch seems to work locally.

Thanks. I think the open() stuff should be fixed by using
BasicOpenFile() rather than introducing support for the two-argument
form of open().

I'll push a fix shortly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-05-08 23:53:06
Message-ID: 20150508235306.GC12950@alap3.anarazel.de

On 2015-05-04 14:23:16 -0400, Robert Haas wrote:
> On Fri, May 1, 2015 at 10:41 AM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
> > As for the non-backpatchable 0002, I agree with Andres that it should be
> > included in 9.5; but I take it you're still not convinced?
>
> No, I'm not convinced. That patch will protect you in one extremely
> specific scenario: you turn off fsync, do some stuff, shut down, turn
> fsync back on again, and start up.

Hm. ISTM it'd not be hard to actually make it safe in nearly all
situations. What about tracking the last checkpoint's fsync setting and
doing an fsync_pgdata() in the checkpointer at the end of a checkpoint if
the previous setting was off and the current one is on? Combined with
doing so at startup if the settings changed between runs, that should
give pretty decent protection. And seems fairly simple to implement.
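
The checkpoint-time check proposed above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
FsyncTracker struct and checkpoint_needs_full_sync() are hypothetical
names, and where the previous setting would actually be stored (for
example in pg_control) is left open, as it is in the discussion.

```c
#include <stdbool.h>

/* Hypothetical state: the fsync setting observed at the previous checkpoint. */
typedef struct FsyncTracker
{
    bool last_checkpoint_fsync;
} FsyncTracker;

/*
 * Called at the end of a checkpoint with the current fsync setting.
 * Returns true when the setting went from off to on since the last
 * checkpoint, i.e. when a full data-directory sync would be wanted.
 * Always records the current setting for the next checkpoint.
 */
static bool
checkpoint_needs_full_sync(FsyncTracker *t, bool current_fsync)
{
    bool needed = (!t->last_checkpoint_fsync && current_fsync);

    t->last_checkpoint_fsync = current_fsync;
    return needed;
}
```

The same comparison could be made once at startup against the value
saved by the previous run, which is the second half of the proposal.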

Greetings,

Andres Freund


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-05-09 02:08:31
Message-ID: CA+TgmoZWaYy1pJO72vuKaiK4Q5FLLpyXEYRTedqtmAQQTcJ8Jw@mail.gmail.com

On Fri, May 8, 2015 at 7:53 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2015-05-04 14:23:16 -0400, Robert Haas wrote:
>> On Fri, May 1, 2015 at 10:41 AM, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com> wrote:
>> > As for the non-backpatchable 0002, I agree with Andres that it should be
>> > included in 9.5; but I take it you're still not convinced?
>>
>> No, I'm not convinced. That patch will protect you in one extremely
>> specific scenario: you turn off fsync, do some stuff, shut down, turn
>> fsync back on again, and start up.
>
> Hm. ISTM it'd not be hard to actually make it safe in nearly all
> situations. What about tracking the last checkpoint's fsync setting and
> doing an fsync_pgdata() in the checkpointer at the end of a checkpoint if
> the previous setting was off and the current one is on? Combined with
> doing so at startup if the settings changed between runs, that should
> give pretty decent protection. And seems fairly simple to implement.

That seems a bit better. I think it's really important, if we're
going to start to try to make fsync=off anything other than a toy,
that we document really clearly the circumstances in which it is or is
not safe:

- If you crash while fsync=off, your cluster may be corrupted.
- If you crash while fsync=on, but it was off at the last checkpoint,
your cluster may be corrupted.
- If you turn fsync=off, do stuff, turn fsync=on, and checkpoint
successfully, a subsequent crash should not corrupt anything.

Of course, even the last one isn't totally bullet-proof. Suppose one
backend fails to absorb the new setting for some reason...
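
The three cases listed above amount to a small decision table, which
can be written out as a sketch. The enum and function names are
hypothetical, and the "safe" result assumes every backend has absorbed
the new setting, which is exactly the caveat just raised.

```c
#include <stdbool.h>

typedef enum CrashOutcome
{
    CRASH_SHOULD_BE_SAFE,   /* a crash should not corrupt anything */
    CRASH_MAY_CORRUPT       /* the cluster may be corrupted */
} CrashOutcome;

/*
 * Classify what a crash could do, given the fsync setting at the time
 * of the crash and the setting in effect at the last completed checkpoint.
 */
static CrashOutcome
crash_outcome(bool fsync_now, bool fsync_at_last_checkpoint)
{
    if (!fsync_now)
        return CRASH_MAY_CORRUPT;       /* crashed while fsync=off */
    if (!fsync_at_last_checkpoint)
        return CRASH_MAY_CORRUPT;       /* on now, but off at last checkpoint */
    return CRASH_SHOULD_BE_SAFE;        /* on now and checkpointed with it on */
}
```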

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-05-09 20:56:57
Message-ID: 20150509205657.GF12950@alap3.anarazel.de

On 2015-05-08 22:08:31 -0400, Robert Haas wrote:
> That seems a bit better. I think it's really important, if we're
> going to start to try to make fsync=off anything other than a toy,

I think it's long past that. I've seen many, many people use it during
initial data loading.

> that we document really clearly the circumstances in which it is or is
> not safe:

Yea, we really should have done that a long time ago.

> - If you crash while fsync=off, your cluster may be corrupted.

HW crash, right?

> - If you crash while fsync=on, but it was off at the last checkpoint,
> your cluster may be corrupted.
> - If you turn fsync=off, do stuff, turn fsync=on, and checkpoint
> successfully, a subsequent crash should not corrupt anything.

Yep.

> Of course, even the last one isn't totally bullet-proof. Suppose one
> backend fails to absorb the new setting for some reason...

I've a hard time worrying much about that one...

Greetings,

Andres Freund


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Abhijit Menon-Sen <ams(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: initdb -S and tablespaces
Date: 2015-05-09 21:40:50
Message-ID: 13545.1431207650@sss.pgh.pa.us

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2015-05-08 22:08:31 -0400, Robert Haas wrote:
>> Of course, even the last one isn't totally bullet-proof. Suppose one
>> backend fails to absorb the new setting for some reason...

> I've a hard time worrying much about that one...

You should. At the very least, whatever recipe we write for changing
fsync safely has to include a clause like "wait for all postmaster
children to have absorbed the new fsync setting". The facts that (a) this
could be a long time and (b) there's no easy way to be entirely certain
about when it's done don't make it something you should ignore.

I wonder whether we should change fsync to be PGC_POSTMASTER and then
document the safe procedure as requiring a postmaster restart.

regards, tom lane