Re: WIP: Restricting pg_rewind to data/wal dirs

Lists: pgsql-hackers
From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-28 11:22:20
Message-ID: CAN-RpxDPE4baiMMJ6TLd6AiUvrG=YrC05tGxrgp4aUutH9j5TQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi;

There are still some cleanup bits needed here but I wanted to get feedback
on my general direction.

I hope to submit for commit fest soon if the general feedback is good.
Tests are passing (with adjustments intended for change of behaviour in one
test script). I want to note that Crimson Thompson (also with Adjust) has
provided some valuable discussion on this code.

The Problem:

pg_rewind makes no real guarantees regarding the final state of non-data
files or directories. It.checks to see if the timeline has incremented
(and therefore guarantees that if successful the data directories are on
the same timeline) but for non-data files, these are clobbered if we rewind
and left intact if not. These other files include postgresql.auto.conf,
replication slots, and can include log files.

Copying logs over to the new slave is something one probably never wants to
do (same with replication slots), and the behaviours here can mask
troubleshooting regarding what a particular master failed, cause wal
segments to build up, automatic settings changes, and other undesirable
behaviours. Together these make pg_rewind very difficult to use properly
and push tasks to replication management tooling that the management tools
are not in a good position to handle correctly.

Designing the Fix:

Two proposed fixes have been considered and one selected: Either we
whitelist directories and only rewind those. The other was to allow shell
globs to be provided that could be used to exclude files. The whitelisting
solution was chosen for a number of reasons.

When pg_rewind "succeeds" but not quite correctly the result is usually a
corrupted installation which requires a base backup to replace it anyway.
In a recovery situation, sometimes pressures occur which render human
judgment less effective, and therefore glob exclusion strikes me as
something which would probably do more harm than good, but maybe I don't
understand the use case (comments as to why some people look at the other
solution as preferable would be welcome).

In going with the whitelisting solution, we chose to include all
directories with WAL-related information. This allows more predicable
interaction with other parts of the replication chain. Consequently not
only do we copy pg_wal and pg_xact but also commit timestamps and so forth.

The Solution:

The solution is a whitelist of directories specified which are the only
ones which are synchronised. The relevant part of this patch is:

+/* List of directories to synchronize:

+ * base data dirs (and ablespaces)

+ * wal/transaction data

+ * and that is it.

+ *

+ * This array is null-terminated to make

+ * it easy to expand

+ */

+

+const char *rewind_dirs[] = {

+ "base",

+ "global",

+ "pg_commit_ts",

+ "pg_logical",

+ "pg_multixact",

+ "pg_serial",

+ "pg_subtrans",

+ "pg_tblspc",

+ "pg_twophase",

+ "pg_wal",

+ "pg_xact",

+ NULL

+};

From there we iterate over this array for directory-based approaches in
copy_fetch.c and with one query per directory in libpq_fetch.c. This also
means shifting from the basic interface from PQexec to PQexecParams. It
would be possible to move to binary formats too, but this was not done
currently in order to simplify code review (that could be a separate
independent patch at a later time).

Testing Done:

The extra files tests correctly test this change in behaviour. The tests
have been modified, and diagnostics in cases of failures expanded, in this
case. The other tests provide good coverage of whether pg_rewind does what
it is supposed to do.

Cleanup still required:

I accidentally left Carp::Always in the PM in this perl module. It will be
fixed.

I renamed one of the functions used to have a more descriptive name but
currently did not remove the old function yet.

Feedback is very welcome. pg_rewind is a very nice piece of software. I
am hoping that these sorts of changes will help ensure that it is easier to
use and provides more predictable results.
--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

Attachment Content-Type Size
pg_rewind_restrict.patch application/octet-stream 11.4 KB

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-29 13:48:59
Message-ID: CA+Tgmoa1tpohba2=UERY9objJy02ugZA8KZQhR8rZdtWCXbckg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 28, 2017 at 4:52 PM, Chris Travers <chris(dot)travers(at)adjust(dot)com> wrote:
> There are still some cleanup bits needed here but I wanted to get feedback
> on my general direction.
>
> I hope to submit for commit fest soon if the general feedback is good.

I think you should submit to the CommitFest regardless, to increase
your chances of getting feedback.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-30 08:04:06
Message-ID: CAB7nPqS1oqOcRO=g4ej1_iVUJSzQngut4FjNHBUeawbUTO0++g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers <chris(dot)travers(at)adjust(dot)com> wrote:
> The Solution:
> The solution is a whitelist of directories specified which are the only ones
> which are synchronised. The relevant part of this patch is:
>
> +/* List of directories to synchronize:
> + * base data dirs (and ablespaces)
> + * wal/transaction data
> + * and that is it.
> + *
> + * This array is null-terminated to make
> + * it easy to expand
> + */
>
> +const char *rewind_dirs[] = {
> + "base",
> + "global",
> + "pg_commit_ts",
> + "pg_logical",
> + "pg_multixact",
> + "pg_serial",
> + "pg_subtrans",
> + "pg_tblspc",
> + "pg_twophase",
> + "pg_wal",
> + "pg_xact",
> + NULL
> +};

The problem with a white list, which is one reason why I do not favor
it, is in the case where a feature adds in the data folder a new path
for its data, particularly in the case where this is critical for a
base backup. If this folder is not added in this list, then a rewind
may be silently corrupted as well. There are also a set of directories
in this list that we do not care about, pg_serial being one.
pg_subtrans is a second, as it gets zeroed on startup.

And if you think about it, pg_rewind is actually the *same* processing
as a base backup taken using the replication protocol. So instead of
this patch I would recommend using excludeFiles and excludeDirContents
by moving this list to a common header where frontend and backend can
refer to. Note that basebackup.h includes replnodes.h, so my thoughts
is that you should split the current header with something like
basebackup_internal.h which is backend-only, and have pg_rewind fetch
the list of directories to automatically ignore as those ones. You can
also simplify a bit the code of pg_rewind a bit so as things like
postmaster.opts. On top of that, there is visibly no need to tweak the
SQL query fetching the directory list (which is useful for debugging
as shaped now actually), but just change process_source_file so as its
content is not added in the file map.

Then there is a second case where you do not want a set of folders to
be included, but those can be useful by default, an example here being
pg_wal where one might want to have an empty path to begin with. A
third case is a set of folders that you do not care about, but depends
on the deployment, being here "log" for example. For those you could
just add an --exclude-path option which simply piles up paths into a
linked list when called multiple times. This could happen with a
second patch.

> From there we iterate over this array for directory-based approaches in
> copy_fetch.c and with one query per directory in libpq_fetch.c. This also
> means shifting from the basic interface from PQexec to PQexecParams. It
> would be possible to move to binary formats too, but this was not done
> currently in order to simplify code review (that could be a separate
> independent patch at a later time).

- res = PQexec(conn, sql);
+ for (p = 0; rewind_dirs[p] != NULL; ++p)
+ {
+ const char *paths[1];
+ paths[0] = rewind_dirs[p];
+ res = PQexecParams(conn, sql, 1, NULL, paths, NULL, NULL, 0);
Calling multiple times the query to list all directories is messy IMO.
And this is N-costly processing if there are many files to look at,
say many relation files.
--
Michael


From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-30 09:43:13
Message-ID: CAN-RpxBvGPC9UjPe-=_Vq3p6qK7yj5ajT2RgZHtGcFKQjvQzwA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

First, thanks for your thoughts on this, and I am interested in probing
them more.

On Mon, Oct 30, 2017 at 9:04 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
wrote:

> On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers <chris(dot)travers(at)adjust(dot)com>
> wrote:
> > The Solution:
> > The solution is a whitelist of directories specified which are the only
> ones
> > which are synchronised. The relevant part of this patch is:
> >
> > +/* List of directories to synchronize:
> > + * base data dirs (and ablespaces)
> > + * wal/transaction data
> > + * and that is it.
> > + *
> > + * This array is null-terminated to make
> > + * it easy to expand
> > + */
> >
> > +const char *rewind_dirs[] = {
> > + "base",
> > + "global",
> > + "pg_commit_ts",
> > + "pg_logical",
> > + "pg_multixact",
> > + "pg_serial",
> > + "pg_subtrans",
> > + "pg_tblspc",
> > + "pg_twophase",
> > + "pg_wal",
> > + "pg_xact",
> > + NULL
> > +};
>
> The problem with a white list, which is one reason why I do not favour
> it, is in the case where a feature adds in the data folder a new path
> for its data, particularly in the case where this is critical for a
> base backup. If this folder is not added in this list, then a rewind
> may be silently corrupted as well. There are also a set of directories
> in this list that we do not care about, pg_serial being one.
> pg_subtrans is a second, as it gets zeroed on startup.
>

The problem with an exclude list is that we cannot safely exclude
configuration files or logs (because these could be named anything), right?

Are there any cases right now where you have features added by extensions
that write to directories which are required for a rewind? I am asking
because I would like to see if this is the minimum working change or if
this change is fundamentally broken currently and must be extended to allow
custom paths to be sync'd as well.

>
> And if you think about it, pg_rewind is actually the *same* processing
> as a base backup taken using the replication protocol. So instead of
> this patch I would recommend using excludeFiles and excludeDirContents
> by moving this list to a common header where frontend and backend can
> refer to. Note that basebackup.h includes replnodes.h, so my thoughts
> is that you should split the current header with something like
> basebackup_internal.h which is backend-only, and have pg_rewind fetch
> the list of directories to automatically ignore as those ones. You can
> also simplify a bit the code of pg_rewind a bit so as things like
> postmaster.opts. On top of that, there is visibly no need to tweak the
> SQL query fetching the directory list (which is useful for debugging
> as shaped now actually), but just change process_source_file so as its
> content is not added in the file map.
>

I am not sure it *should* be the same, however. In a backup we probably
want to backup the postgresql.auto.conf, but on a failover, I don't think
we want to clobber configuration. We certainly do not want to sometimes
clobber configuration but not other times (which is what happens right now
in some cases). And under no circumstances do we want to clobber logs on a
failed server with logs on a working server. That's asking for serious
problems in my view.

If you think about it, there's a huge difference in use case in backing up
a database cluster (Including replication slots, configs in the dir etc)
and re-syncing the data so that replication can resume, and I think there
are some dangers that come up when assuming these should be closely tied
together.

>
> Then there is a second case where you do not want a set of folders to
> be included, but those can be useful by default, an example here being
> pg_wal where one might want to have an empty path to begin with. A
> third case is a set of folders that you do not care about, but depends
> on the deployment, being here "log" for example. For those you could
> just add an --exclude-path option which simply piles up paths into a
> linked list when called multiple times. This could happen with a
> second patch.
>

Agreed. And one could add an "--include-path" option to allow for unusual
cases where you want extra directories, such as replication slots, or the
like.

I think another patch could also specifically empty and perhaps create a
replication slot allowing for one to bring tings back up via streaming
replication more safely.

>
> > From there we iterate over this array for directory-based approaches in
> > copy_fetch.c and with one query per directory in libpq_fetch.c. This
> also
> > means shifting from the basic interface from PQexec to PQexecParams. It
> > would be possible to move to binary formats too, but this was not done
> > currently in order to simplify code review (that could be a separate
> > independent patch at a later time).
>
> - res = PQexec(conn, sql);
> + for (p = 0; rewind_dirs[p] != NULL; ++p)
> + {
> + const char *paths[1];
> + paths[0] = rewind_dirs[p];
> + res = PQexecParams(conn, sql, 1, NULL, paths, NULL, NULL, 0);
> Calling multiple times the query to list all directories is messy IMO.
> And this is N-costly processing if there are many files to look at,
> say many relation files.
>

We wouldn't recurse into the relation files dir more than once since this
is filtered out on the initial query in the recursive CTE before
recursion. In other words, the query filters out the top-level query
before it recurses into any directory so there is a smalll cost (planning,
the fact that the root of the pgdata is queried once per) but this seems
pretty minor given that we need at least one query per file that we sync
anyway.

I am open to changing it to an array, but one thing to think about is that
if we want to add an ability to add extra directories, this makes it much
easier to do that.

> --
> Michael
>

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-30 09:57:16
Message-ID: CAB7nPqTO3ZzQgsjctoYK+biT9TETpLe5FvbxG26ehdXA_e2x8A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 30, 2017 at 9:43 AM, Chris Travers <chris(dot)travers(at)adjust(dot)com> wrote:
> Are there any cases right now where you have features added by extensions that write to directories which are required for a rewind?

In some of the stuff I maintain, I actually have one case now of a
configuration file included with include_if_exists by postgresql.conf
and is expected to be generated by a component that my code doing the
rewind has no direct access on... I can control how pg_rewind kicks
in, but I think that you would actually break silently the current
code, which is scary especially if I don't control the code where
pg_rewind is called when Postgres 11 gets integrated into the product
I am thinking about, and even more scary if there is no way to include
something.

> The problem with an exclude list is that we cannot safely exclude
> configuration files or logs (because these could be named anything), right?

You have the exact same problem with base backups now, and any
configuration files created by extensions are a per-case problem...
The pattern that base backups now use is an exclude list. In the
future I would rather see base backups and pg_rewind using the same
infrastructure for both things:
- pg_rewind should use the replication protocol with BASE_BACKUP.
Having it rely on root access now is crazy.
- BASE_BACKUP should have an option where it is possible to exclude
custom paths.
What you are proposing here would make both diverge more, which is in
my opinion not a good approach.
--
Michael


From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-30 10:15:31
Message-ID: CAN-RpxDmJ_agTqWNnfZ5125iXrtHuy-t+kQapu7JTvVQ0Bn3pA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 30, 2017 at 10:57 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com
> wrote:

> On Mon, Oct 30, 2017 at 9:43 AM, Chris Travers <chris(dot)travers(at)adjust(dot)com>
> wrote:
> > Are there any cases right now where you have features added by
> extensions that write to directories which are required for a rewind?
>
> In some of the stuff I maintain, I actually have one case now of a
> configuration file included with include_if_exists by postgresql.conf
> and is expected to be generated by a component that my code doing the
> rewind has no direct access on... I can control how pg_rewind kicks
> in, but I think that you would actually break silently the current
> code, which is scary especially if I don't control the code where
> pg_rewind is called when Postgres 11 gets integrated into the product
> I am thinking about, and even more scary if there is no way to include
> something.
>

Ok, so there is an argument that there needs to be a way to include
additional paths in this patch. One important question I would have in
these cases is if you expect these to be set only on the master. If not,
then is this a problem and if not then how do you handle the fail-back
problem etc?

This also brings up a fairly major concern more generally about control by
the way. A lot of cases where pg_rewind is called, the user doesn't
necessarily have much control on how it is called. Moreover in many of
these cases, the user is probably not in a position to understand the
internals well enough to grasp what to check after.

>
> > The problem with an exclude list is that we cannot safely exclude
> > configuration files or logs (because these could be named anything),
> right?
>
> You have the exact same problem with base backups now, and any
> configuration files created by extensions are a per-case problem...
>

Right, but there is a use case difference between "I am taking a backup of
a server" and "I need to get the server into rejoin the replication as a
standby."

A really good example of where this is a big problem is with replication
slots. On a backup I would expect you want replication slots to be copied
in. However when setting yourself up as a slave you most certainly do not
want to just copy these over from the master unless you have infinite disk
space. I would argue that under *no* circumstance should pg_rewind *ever*
copy replication slots. But pg_base_backup really *should* (and when
provisioning a new slave you should clear these as soon as it is set up).

The pattern that base backups now use is an exclude list. In the
> future I would rather see base backups and pg_rewind using the same
> infrastructure for both things:
> - pg_rewind should use the replication protocol with BASE_BACKUP.
> Having it rely on root access now is crazy.
> - BASE_BACKUP should have an option where it is possible to exclude
> custom paths.
> What you are proposing here would make both diverge more, which is in
> my opinion not a good approach.
>

How does rep mgr or other programs using pg_rewind know what to exclude?

Again I am not convinced setting up a replica and taking a backup for
disaster recovery are the same use case or have the same requirements.

--
> Michael
>

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, marco(dot)nenciarini(at)2ndquadrant(dot)it
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-30 10:36:06
Message-ID: CAB7nPqTv2DH0_QW4SafVdbnvPaG305cuYLoqY_Yv8iJ497ZzXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers
<chris(dot)travers(at)adjust(dot)com> wrote:
> This also brings up a fairly major concern more generally about control by
> the way. A lot of cases where pg_rewind is called, the user doesn't
> necessarily have much control on how it is called. Moreover in many of
> these cases, the user is probably not in a position to understand the
> internals well enough to grasp what to check after.

Likely they are not.

> Right, but there is a use case difference between "I am taking a backup of a
> server" and "I need to get the server into rejoin the replication as a
> standby."

The intersection of the first and second categories is not empty. Some
take backups and use them to deploy standbys.

> A really good example of where this is a big problem is with replication
> slots. On a backup I would expect you want replication slots to be copied
> in.

I would actually expect the contrary, and please note that replication
slots are not taken in a base backup, which is what the documentation
says as well:
https://www.postgresql.org/docs/10/static/protocol-replication.html
"pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots,
pg_stat_tmp, and pg_subtrans are copied as empty directories (even if
they are symbolic links)."

Some code I have with 9.6's pg_bsaebackup removes manually replication
slots as this logic is new in 10 ;)

>> The pattern that base backups now use is an exclude list. In the
>> future I would rather see base backups and pg_rewind using the same
>> infrastructure for both things:
>> - pg_rewind should use the replication protocol with BASE_BACKUP.
>> Having it rely on root access now is crazy.
>> - BASE_BACKUP should have an option where it is possible to exclude
>> custom paths.
>> What you are proposing here would make both diverge more, which is in
>> my opinion not a good approach.
>
> How does rep mgr or other programs using pg_rewind know what to exclude?

Good question. Answers could come from folks such as David Steele
(pgBackRest) or Marco (barman) whom I am attaching in CC.
--
Michael


From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, marco(dot)nenciarini(at)2ndquadrant(dot)it
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-30 11:22:59
Message-ID: CAN-RpxB1jJ5pF5G3cbTe1BmXZx_2JOZLMCHW9zEo+t2XDSRhJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Oct 30, 2017 at 11:36 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com
> wrote:

> On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers
> <chris(dot)travers(at)adjust(dot)com> wrote:
> > This also brings up a fairly major concern more generally about control
> by
> > the way. A lot of cases where pg_rewind is called, the user doesn't
> > necessarily have much control on how it is called. Moreover in many of
> > these cases, the user is probably not in a position to understand the
> > internals well enough to grasp what to check after.
>
> Likely they are not.
>

So currently I am submitting the current patch to commit fest. I am open
to adding a new
--Include-path option in this patch, but I am worried about atomicity
concerns, and I am still
not really sure what the impact is for you (I haven't heard whether you
expect this file to be
there before the rewind, i.e. whether it would be on both master and slave,
or just on the slave).

Sp there is the question of under what specific circumstances this would
break for you.

>
> > Right, but there is a use case difference between "I am taking a backup
> of a
> > server" and "I need to get the server into rejoin the replication as a
> > standby."
>
> The intersection of the first and second categories is not empty. Some
> take backups and use them to deploy standbys.
>

Sure, but it is not complete either.

>
> > A really good example of where this is a big problem is with replication
> > slots. On a backup I would expect you want replication slots to be
> copied
> > in.
>
> I would actually expect the contrary, and please note that replication
> slots are not taken in a base backup, which is what the documentation
> says as well:
> https://www.postgresql.org/docs/10/static/protocol-replication.html
> "pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots,
> pg_stat_tmp, and pg_subtrans are copied as empty directories (even if
> they are symbolic links)."
>

Many of these are emptied on server restart but repslot is not. What is
the logic
for excluding it from backups other than to avoid problems with replication
provisioning?

I mean if I have a replication slot for taking backups with barman (and I
am not actually doing replication) why would I *not* want that in my base
backup if I might have to restore to a new machine somewhere?

>
> Some code I have with 9.6's pg_bsaebackup removes manually replication
> slots as this logic is new in 10 ;)
>
> >> The pattern that base backups now use is an exclude list. In the
> >> future I would rather see base backups and pg_rewind using the same
> >> infrastructure for both things:
> >> - pg_rewind should use the replication protocol with BASE_BACKUP.
> >> Having it rely on root access now is crazy.
> >> - BASE_BACKUP should have an option where it is possible to exclude
> >> custom paths.
> >> What you are proposing here would make both diverge more, which is in
> >> my opinion not a good approach.
> >
> > How does rep mgr or other programs using pg_rewind know what to exclude?
>
> Good question. Answers could come from folks such as David Steele
> (pgBackRest) or Marco (barman) whom I am attaching in CC.
>

Two points that further occur to me:

Shell globs seem to me to be foot guns in this area, I think special paths
should be one path per invocation of the option not "--exclude=pg_w*" since
this is just asking for problems as time goes on and things get renamed.

It also seems to me that adding specific paths is far safer than removing
specific paths.

> --
> Michael
>

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


From: David Steele <david(at)pgmasters(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, marco(dot)nenciarini(at)2ndquadrant(dot)it
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-31 22:24:06
Message-ID: f731beb1-466c-a5bb-0745-78ee701c8f95@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10/30/17 6:36 AM, Michael Paquier wrote:
> On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers
>>
>> How does rep mgr or other programs using pg_rewind know what to exclude?
>
> Good question. Answers could come from folks such as David Steele
> (pgBackRest) or Marco (barman) whom I am attaching in CC.

pgBackRest does not use pg_rewind. However, pgBackRest does use the
same exclusion list for backups as pg_basebackup.

--
-David
david(at)pgmasters(dot)net


From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-11-01 08:58:05
Message-ID: CAN-RpxDq1zBo7_6JGQ0g4BqV6GVa_BGkw8_c5OH6zuWmVyuT5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Attached is the patch as submitted for commitfest.

Please note, I am not adverse to adding an additional --Include-path
directive if that avoids backwards-compatibility problems. However the
patch is complex enough I would really prefer review on the rest of it to
start first. This doesn't strike me as perfectly trivial and I think it is
easier to review pieces separately. I have already started on the
--Include-path directive and could probably get it attached to a later
version of the patch very soon.

I would also like to address a couple of important points here:

1. I think default restrictions plus additional paths is the best, safest
way forward. Excluding shell-globs doesn't solve the "I need this
particular config file" very well particularly if we want to support this
outside of an internal environment. Shell globs also tend to be overly
inclusive and so if you exclude based on them, you run into a chance that
your rewind is corrupt for being overly exclusive.

2. I would propose any need for an additional paths be specified using an
--Include-path directive. This could be specified multiple times and could
point to a file or directory which would be added to the paths rewound. I
have a patch for this mostly done but I am concerned that these sorts of
changes result in a combination of changes that are easier to review
separately than together. So if needed, I can add it or in a separate
patch following.

3. I think it would be a mistake to tie backup solutions in non-replicated
environments to replication use cases, and vice versa. Things like
replication slots (used for streaming backups) have different
considerations in different environments. Rather than build the same
infrastructure first, I think it is better to support different use cases
well and then build common infrastructure to support the different cases.
I am not against building common infrastructure for pg_rewind and
pg_basebackup. I am very much against having the core guarantees being the
exact same.

Best Wishes,
Chris Travers

On Sat, Oct 28, 2017 at 1:22 PM, Chris Travers <chris(dot)travers(at)adjust(dot)com>
wrote:

> Hi;
>
> There are still some cleanup bits needed here but I wanted to get feedback
> on my general direction.
>
> I hope to submit for commit fest soon if the general feedback is good.
> Tests are passing (with adjustments intended for change of behaviour in one
> test script). I want to note that Crimson Thompson (also with Adjust) has
> provided some valuable discussion on this code.
>
> The Problem:
>
> pg_rewind makes no real guarantees regarding the final state of non-data
> files or directories. It.checks to see if the timeline has incremented
> (and therefore guarantees that if successful the data directories are on
> the same timeline) but for non-data files, these are clobbered if we rewind
> and left intact if not. These other files include postgresql.auto.conf,
> replication slots, and can include log files.
>
> Copying logs over to the new slave is something one probably never wants
> to do (same with replication slots), and the behaviours here can mask
> troubleshooting regarding what a particular master failed, cause wal
> segments to build up, automatic settings changes, and other undesirable
> behaviours. Together these make pg_rewind very difficult to use properly
> and push tasks to replication management tooling that the management tools
> are not in a good position to handle correctly.
>
> Designing the Fix:
>
> Two proposed fixes have been considered and one selected: Either we
> whitelist directories and only rewind those. The other was to allow shell
> globs to be provided that could be used to exclude files. The whitelisting
> solution was chosen for a number of reasons.
>
> When pg_rewind "succeeds" but not quite correctly the result is usually a
> corrupted installation which requires a base backup to replace it anyway.
> In a recovery situation, sometimes pressures occur which render human
> judgment less effective, and therefore glob exclusion strikes me as
> something which would probably do more harm than good, but maybe I don't
> understand the use case (comments as to why some people look at the other
> solution as preferable would be welcome).
>
> In going with the whitelisting solution, we chose to include all
> directories with WAL-related information. This allows more predicable
> interaction with other parts of the replication chain. Consequently not
> only do we copy pg_wal and pg_xact but also commit timestamps and so forth.
>
> The Solution:
>
> The solution is a whitelist of directories specified which are the only
> ones which are synchronised. The relevant part of this patch is:
>
> +/* List of directories to synchronize:
>
> + * base data dirs (and ablespaces)
>
> + * wal/transaction data
>
> + * and that is it.
>
> + *
>
> + * This array is null-terminated to make
>
> + * it easy to expand
>
> + */
>
> +
>
> +const char *rewind_dirs[] = {
>
> + "base",
>
> + "global",
>
> + "pg_commit_ts",
>
> + "pg_logical",
>
> + "pg_multixact",
>
> + "pg_serial",
>
> + "pg_subtrans",
>
> + "pg_tblspc",
>
> + "pg_twophase",
>
> + "pg_wal",
>
> + "pg_xact",
>
> + NULL
>
> +};
>
>
> From there we iterate over this array for directory-based approaches in
> copy_fetch.c and with one query per directory in libpq_fetch.c. This also
> means shifting from the basic interface from PQexec to PQexecParams. It
> would be possible to move to binary formats too, but this was not done
> currently in order to simplify code review (that could be a separate
> independent patch at a later time).
>
> Testing Done:
>
> The extra files tests correctly test this change in behaviour. The tests
> have been modified, and diagnostics in cases of failures expanded, in this
> case. The other tests provide good coverage of whether pg_rewind does what
> it is supposed to do.
>
> Cleanup still required:
>
> I accidentally left Carp::Always in the PM in this perl module. It will
> be fixed.
>
> I renamed one of the functions used to have a more descriptive name but
> currently did not remove the old function yet.
>
>
> Feedback is very welcome. pg_rewind is a very nice piece of software. I
> am hoping that these sorts of changes will help ensure that it is easier to
> use and provides more predictable results.
> --
> Best Regards,
> Chris Travers
> Database Administrator
>
> Tel: +49 162 9037 210 <+49%20162%209037210> | Skype: einhverfr |
> www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

Attachment Content-Type Size
pg_rewind_restrict_dirs.v2.patch application/octet-stream 11.0 KB

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-11-29 05:46:33
Message-ID: CAB7nPqStpjPqjT5wLqkt8Qc+JeUDyjtby-cND5uTKheLtO010A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Nov 1, 2017 at 5:58 PM, Chris Travers <chris(dot)travers(at)adjust(dot)com> wrote:
> I would also like to address a couple of important points here:
>
> 1. I think default restrictions plus additional paths is the best, safest
> way forward. Excluding shell-globs doesn't solve the "I need this
> particular config file" very well particularly if we want to support this
> outside of an internal environment. Shell globs also tend to be overly
> inclusive and so if you exclude based on them, you run into a chance that
> your rewind is corrupt for being overly exclusive.
>
> 2. I would propose any need for an additional paths be specified using an
> --Include-path directive. This could be specified multiple times and could
> point to a file or directory which would be added to the paths rewound. I
> have a patch for this mostly done but I am concerned that these sorts of
> changes result in a combination of changes that are easier to review
> separately than together. So if needed, I can add it or in a separate patch
> following.
>
> 3. I think it would be a mistake to tie backup solutions in non-replicated
> environments to replication use cases, and vice versa. Things like
> replication slots (used for streaming backups) have different considerations
> in different environments. Rather than build the same infrastructure first,
> I think it is better to support different use cases well and then build
> common infrastructure to support the different cases. I am not against
> building common infrastructure for pg_rewind and pg_basebackup. I am very
> much against having the core guarantees being the exact same.

+const char *rewind_dirs[] = {
+ "base", // Default tablespace
+ "global", // global tablespace
+ "pg_commit_ts", // In case we need to do PITR before up to sync
+ "pg_logical", // WAL related and no good reason to exclude
+ "pg_multixact", // WAL related and may need for vacuum-related reasons
+ "pg_tblspc", // Pther tablespaces
+ "pg_twophase", // mostly to *clear*
+ "pg_wal", // WAL
+ "pg_xact", // Commits of transactions
+ NULL
+};
Incorrect comment format here.

+ in full. The advantage of <application>pg_rewind</> over taking a new base
+ backup, or tools like <application>rsync</>, is that
<application>pg_rewind</>
This generates warnings on HEAD when compiling the documentation.

Please note that I am still -1 for using a methodology different than
what is used for base backups with an inclusive method, and would much
prefer an exclusive method by reusing the existing entries in
basebackup.c. Still, I am the only one who expressed an opinion about
this patch, so moved to next CF with waiting on author as status.
--
Michael


From: David Steele <david(at)pgmasters(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-11-29 12:30:01
Message-ID: b984796f-3177-67c7-a09b-0050d3440ef0@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 11/29/17 12:46 AM, Michael Paquier wrote:> On Wed, Nov 1, 2017 at
5:58 PM, Chris Travers <chris(dot)travers(at)adjust(dot)com> wrote:
>
> Please note that I am still -1 for using a methodology different than
> what is used for base backups with an inclusive method, and would much
> prefer an exclusive method by reusing the existing entries in
> basebackup.c. Still, I am the only one who expressed an opinion about
> this patch, so moved to next CF with waiting on author as status.

I'm also -1 on the inclusive methodology. Forgetting something in the
exclusion list just makes the process less efficient while forgetting
something in the inclusion list may mean breakage. Furthermore,
maintaining two lists does not sound like a good idea.

I worry that extensions using generic WAL might be writing in places we
don't expect and don't think manually adding inclusions is a good
solution as the requirement will not be obvious to the user.

Regards,
--
-David
david(at)pgmasters(dot)net


From: Stephen Frost <sfrost(at)snowman(dot)net>
To: Chris Travers <chris(dot)travers(at)adjust(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
Date: 2018-01-22 20:12:58
Message-ID: 20180122201258.GT2416@tamriel.snowman.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Chris,

* Chris Travers (chris(dot)travers(at)adjust(dot)com) wrote:
> Attached is the patch as submitted for commitfest.

This has been stuck in Waiting for Author since before the commitfest
started. I'll try to kick-start it but it seems like it's stuck because
there's a fundamental disagreement about if we should be using include
or exclude for pg_rewind (and, possibly, pg_basebackup, et al).

> Please note, I am not adverse to adding an additional --Include-path
> directive if that avoids backwards-compatibility problems. However the
> patch is complex enough I would really prefer review on the rest of it to
> start first. This doesn't strike me as perfectly trivial and I think it is
> easier to review pieces separately. I have already started on the
> --Include-path directive and could probably get it attached to a later
> version of the patch very soon.
>
> I would also like to address a couple of important points here:
>
> 1. I think default restrictions plus additional paths is the best, safest
> way forward. Excluding shell-globs doesn't solve the "I need this
> particular config file" very well particularly if we want to support this
> outside of an internal environment. Shell globs also tend to be overly
> inclusive and so if you exclude based on them, you run into a chance that
> your rewind is corrupt for being overly exclusive.

There's been a number of strong points made as to why pg_basebackup uses
an exclude list but you seem to keep coming back to a concern around
shell globs when considering exclusion lists.

Having an exclude list doesn't mean that shell globs have to be used to
in the exclude list and, indeed, there are no such globs used in the
exclude list for pg_basebackup today- every single file or directory
excluded by pg_basebackup is an explicit file or directory (compared
using a full strcmp()).

Further, the specific concerns raised are about things which are very
clearly able to be dealt with using specific paths
(postgresql.auto.conf, pg_log) and which an administrator is likely to
be familiar with (such as pg_log, in particular).

Personally, I'm a big advocate of *not* having PG's logs in the data
directory and part of it is because, really, the data directory is PG's
purview while logs are for the administrator. I wasn't ever a fan of
postgresql.auto.conf and I'm not surprised that we're having this
discussion about if it should be pulled over by pg_rewind or not.

> 3. I think it would be a mistake to tie backup solutions in non-replicated
> environments to replication use cases, and vice versa. Things like
> replication slots (used for streaming backups) have different
> considerations in different environments. Rather than build the same
> infrastructure first, I think it is better to support different use cases
> well and then build common infrastructure to support the different cases.
> I am not against building common infrastructure for pg_rewind and
> pg_basebackup. I am very much against having the core guarantees being the
> exact same.

Backup solutions are already involved in understanding of replicated
environments as they can be used to back up from replicas rather than
the primary (or perhaps using both in some cases), so I'm not really
sure why you're argueing that backups are somehow different between
non-replicated and replicated environments.

As it relates to the question about if the core guarantees between
pg_basebackup and pg_rewind being the same or not, I've not see much
argument for why they'd be different. The intent of pg_rewind is to do
what pg_basebackup would do, but in a more efficient manner, as
discussed in the documentation for pg_rewind.

I would think the next step here, as Michael suggested very early on in
this thread, would be to bring the exclude list and perhaps logic for
pg_basebackup into the common code and have pg_rewind leverage that
instead of having its own code that largely does the same and then
adding an option to exclude additional items to that. There's no sense
having pg_rewind operate on files that are going to end up getting wiped
out when recovery starts anyway. Perhaps there's a use-case for
overriding the exclude list with a 'include' option too, but I'm not
convinced there is.

Thanks!

Stephen


From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Chris Travers <chris(dot)travers(at)adjust(dot)com>, PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
Date: 2018-02-01 06:41:25
Message-ID: 20180201064125.GF6398@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Jan 22, 2018 at 03:12:58PM -0500, Stephen Frost wrote:
> I would think the next step here, as Michael suggested very early on in
> this thread, would be to bring the exclude list and perhaps logic for
> pg_basebackup into the common code and have pg_rewind leverage that
> instead of having its own code that largely does the same and then
> adding an option to exclude additional items to that. There's no sense
> having pg_rewind operate on files that are going to end up getting wiped
> out when recovery starts anyway. Perhaps there's a use-case for
> overriding the exclude list with a 'include' option too, but I'm not
> convinced there is.

Me neither. I'll look into getting something for the next commit fest.
There have been way too many complaints about how pg_rewind copies too
much data for nothing to ignore doing something (the last one about
pg_replslot data). A good first step would be what you are writing in
the paragraph above, so I intend to do that as I am sure that it would
be a good addition. For now I have marked the proposed patches as
returned with feedback.
--
Michael