Re: WIP: Restricting pg_rewind to data/wal dirs

From: Chris Travers <chris(dot)travers(at)adjust(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>, David Steele <david(at)pgmasters(dot)net>, marco(dot)nenciarini(at)2ndquadrant(dot)it
Subject: Re: WIP: Restricting pg_rewind to data/wal dirs
Date: 2017-10-30 11:22:59
Message-ID: CAN-RpxB1jJ5pF5G3cbTe1BmXZx_2JOZLMCHW9zEo+t2XDSRhJQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Oct 30, 2017 at 11:36 AM, Michael Paquier <michael(dot)paquier(at)gmail(dot)com
> wrote:

> On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers
> <chris(dot)travers(at)adjust(dot)com> wrote:
> > This also brings up a fairly major concern more generally about control
> by
> > the way. A lot of cases where pg_rewind is called, the user doesn't
> > necessarily have much control on how it is called. Moreover in many of
> > these cases, the user is probably not in a position to understand the
> > internals well enough to grasp what to check after.
>
> Likely they are not.
>

So currently I am submitting the current patch to commit fest. I am open
to adding a new
--Include-path option in this patch, but I am worried about atomicity
concerns, and I am still
not really sure what the impact is for you (I haven't heard whether you
expect this file to be
there before the rewind, i.e. whether it would be on both master and slave,
or just on the slave).

Sp there is the question of under what specific circumstances this would
break for you.

>
> > Right, but there is a use case difference between "I am taking a backup
> of a
> > server" and "I need to get the server into rejoin the replication as a
> > standby."
>
> The intersection of the first and second categories is not empty. Some
> take backups and use them to deploy standbys.
>

Sure, but it is not complete either.

>
> > A really good example of where this is a big problem is with replication
> > slots. On a backup I would expect you want replication slots to be
> copied
> > in.
>
> I would actually expect the contrary, and please note that replication
> slots are not taken in a base backup, which is what the documentation
> says as well:
> https://www.postgresql.org/docs/10/static/protocol-replication.html
> "pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots,
> pg_stat_tmp, and pg_subtrans are copied as empty directories (even if
> they are symbolic links)."
>

Many of these are emptied on server restart but repslot is not. What is
the logic
for excluding it from backups other than to avoid problems with replication
provisioning?

I mean if I have a replication slot for taking backups with barman (and I
am not actually doing replication) why would I *not* want that in my base
backup if I might have to restore to a new machine somewhere?

>
> Some code I have with 9.6's pg_bsaebackup removes manually replication
> slots as this logic is new in 10 ;)
>
> >> The pattern that base backups now use is an exclude list. In the
> >> future I would rather see base backups and pg_rewind using the same
> >> infrastructure for both things:
> >> - pg_rewind should use the replication protocol with BASE_BACKUP.
> >> Having it rely on root access now is crazy.
> >> - BASE_BACKUP should have an option where it is possible to exclude
> >> custom paths.
> >> What you are proposing here would make both diverge more, which is in
> >> my opinion not a good approach.
> >
> > How does rep mgr or other programs using pg_rewind know what to exclude?
>
> Good question. Answers could come from folks such as David Steele
> (pgBackRest) or Marco (barman) whom I am attaching in CC.
>

Two points that further occur to me:

Shell globs seem to me to be foot guns in this area, I think special paths
should be one path per invocation of the option not "--exclude=pg_w*" since
this is just asking for problems as time goes on and things get renamed.

It also seems to me that adding specific paths is far safer than removing
specific paths.

> --
> Michael
>

--
Best Regards,
Chris Travers
Database Administrator

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-10-30 11:27:51 Re: Current int & float overflow checking is slow.
Previous Message Oleg Bartunov 2017-10-30 11:07:41 Re: How to implement a SP-GiST index as a extension module?