Feature request: pg_basebackup --force

Lists: pgsql-hackers
From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Feature request: pg_basebackup --force
Date: 2011-04-09 18:26:14
Message-ID: 2014240922.53212.1302373574331.JavaMail.root@mail-1.01.com

Magnus, all:

It seems a bit annoying to have to do an rm -rf $PGDATA/* before resynching a standby using pg_basebackup. This means that I still need to wrap pg_basebackup in a shell script, instead of having it do everything for me ... especially if I have multiple tablespaces.

Couldn't we have a --force option which would clear all data and tablespace directories before resynching?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco
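
The wrapper Josh describes could look roughly like the following. This is only a minimal sketch of the "clear everything first" step: the function names `clear_directory` and `prepare_resync` and the host argument are hypothetical, and the tablespace locations are assumed to be known to the caller. It builds the pg_basebackup command line (using the existing -h and -D options) but does not run it.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def clear_directory(directory: Path) -> None:
    """Delete everything inside `directory` but keep the directory itself."""
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            # Regular files and symlinks; a symlink is removed, not followed.
            entry.unlink()


def prepare_resync(data_dir: Path, tablespace_dirs: Iterable[Path], host: str) -> List[str]:
    """Empty the data and tablespace directories, then return the command to run.

    Assumption: the caller lists the tablespace directories explicitly,
    because symlinks under the data directory are not followed.
    """
    for directory in [data_dir, *tablespace_dirs]:
        clear_directory(directory)
    return ["pg_basebackup", "-h", host, "-D", str(data_dir)]
```

The returned list can be handed to `subprocess.run` once the directories have been emptied. A built-in --force option would fold both steps into pg_basebackup itself.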


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Feature request: pg_basebackup --force
Date: 2011-04-10 10:46:45
Message-ID: BANLkTi=KRWB9Ju+7H9XTKjjvpcuJM-0KQA@mail.gmail.com

On Sat, Apr 9, 2011 at 20:26, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
> Magnus, all:
>
> It seems a bit annoying to have to do an rm -rf $PGDATA/* before resynching a standby using pg_basebackup.  This means that I still need to wrap pg_basebackup in a shell script, instead of having it do everything for me ... especially if I have multiple tablespaces.
>
> Couldn't we have a --force option which would clear all data and tablespace directories before resynching?

That could certainly be useful, yes. But I have a feeling whoever
tries to get that into 9.1 will be killed - but it's certainly good to
put on the list of things for 9.2.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


From: Joshua Berkus <josh(at)agliodbs(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Feature request: pg_basebackup --force
Date: 2011-04-10 16:06:16
Message-ID: 1997520414.3818.1302451576638.JavaMail.root@mail-1.01.com

Magnus,

> That could certainly be useful, yes. But I have a feeling whoever
> tries to get that into 9.1 will be killed - but it's certainly good to
> put on the list of things for 9.2.

Oh, no question. At some point in 9.2 we should also discuss how pg_basebackup treats "empty" directories. Because the other thing I find myself constantly scripting is replacing the conf files on the replica after the base backup sync.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
San Francisco


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joshua Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Feature request: pg_basebackup --force
Date: 2011-04-10 16:28:49
Message-ID: BANLkTi=DbJPpVAKJ+gtO-V4ye6FDghzamA@mail.gmail.com

On Sat, Apr 9, 2011 at 2:26 PM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
> It seems a bit annoying to have to do an rm -rf $PGDATA/* before resynching a standby using pg_basebackup.  This means that I still need to wrap pg_basebackup in a shell script, instead of having it do everything for me ... especially if I have multiple tablespaces.
>
> Couldn't we have a --force option which would clear all data and tablespace directories before resynching?

What would be even more useful is some kind of support for
differential copy, a la rsync.

(Now I'm waiting for someone to tell me this is a pipe dream.)

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Feature request: pg_basebackup --force
Date: 2011-04-10 16:35:36
Message-ID: 9784.1302453336@sss.pgh.pa.us

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Sat, Apr 9, 2011 at 2:26 PM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
>> Couldn't we have a --force option which would clear all data and tablespace directories before resynching?

> What would be even more useful is some kind of support for
> differential copy, a la rsync.

> (Now I'm waiting for someone to tell me this is a pipe dream.)

Not so much a pipe dream as reinventing the wheel. Why not use rsync?

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Feature request: pg_basebackup --force
Date: 2011-04-10 16:41:36
Message-ID: BANLkTinuRX8dCPPSfSgS+m5BYp8WjJk=Ag@mail.gmail.com

On Sun, Apr 10, 2011 at 12:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Sat, Apr 9, 2011 at 2:26 PM, Joshua Berkus <josh(at)agliodbs(dot)com> wrote:
>>> Couldn't we have a --force option which would clear all data and tablespace directories before resynching?
>
>> What would be even more useful is some kind of support for
>> differential copy, a la rsync.
>
>> (Now I'm waiting for someone to tell me this is a pipe dream.)
>
> Not so much a pipe dream as reinventing the wheel.  Why not use rsync?

It's not integrated and I doubt it's conveniently available on Windows.

One of the biggest problems with our replication functionality right
now is that it's hard to set up. We've actually done a good job
making the very simplest case (one slave, no archive) reasonably
simple, but how many PostgreSQL users do you think can manage to set
up SR + HS + archiving, with two slaves that can use the archive if
they fall too far behind the master, but with the archive regularly
trimmed to the farthest-back segment that is still needed?

We have pg_archivecleanup, but AIUI that's only smart enough to handle
the one-standby case.

Admittedly, the above is a slightly different problem, but I think it
all points in the direction of needing more automation and more ease
of use.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
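
The multi-standby trimming rule Robert describes (keep everything back to the farthest-back segment any standby still needs) can be sketched as below. This is an illustration under stated assumptions, not how pg_archivecleanup is implemented: the function names and the `oldest_needed` mapping are hypothetical, and all segments are assumed to be on a single timeline so that the 24-hex-digit file names sort in WAL order.

```python
from typing import Dict, List


def is_wal_segment(name: str) -> bool:
    """True for plain 24-hex-digit WAL segment file names."""
    return len(name) == 24 and all(c in "0123456789ABCDEF" for c in name)


def removable_segments(archive_files: List[str], oldest_needed: Dict[str, str]) -> List[str]:
    """Return the archived segments that no standby still needs.

    `oldest_needed` maps a standby name (hypothetical) to the oldest
    segment that standby still requires. Assumes a single timeline.
    """
    if not oldest_needed:
        # Without a report from any standby, remove nothing.
        return []
    cutoff = min(oldest_needed.values())  # farthest-back segment still needed
    return sorted(f for f in archive_files if is_wal_segment(f) and f < cutoff)
```

With two standbys reporting ...04 and ...05 as their oldest needed segments, only segments before ...04 are candidates for removal. Files that are not plain segments, such as backup history files, are left alone.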


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Joshua Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Feature request: pg_basebackup --force
Date: 2011-04-10 17:06:24
Message-ID: BANLkTi=aKzjwWY7JCk9UjXreHsx5RW3-zw@mail.gmail.com

On Sun, Apr 10, 2011 at 12:41 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> It's not integrated and I doubt it's conveniently available on Windows.
>
> One of the biggest problems with our replication functionality right
> now is that it's hard to set up.  We've actually done a good job
> making the very simplest case (one slave, no archive) reasonably
> simple, but how many PostgreSQL users do you think can manage to set
> up SR + HS + archiving, with two slaves that can use the archive if
> they fall too far behind the master, but with the archive regularly
> trimmed to the farthest-back segment that is still needed?
>
> We have pg_archivecleanup, but AIUI that's only smart enough to handle
> the one-standby case.
>
> Admittedly, the above is a slightly different problem, but I think it
> all points in the direction of needing more automation and more ease
> of use.

And let me also note that the difficulty of getting this all exactly
right is one of the things that causes people to come up with creative
solutions like this:

http://archives.postgresql.org/pgsql-hackers/2010-12/msg02514.php

That's why we need to put it in a box, tie a bow around it, and put up
a big sign that says "do not look into laser with remaining eye".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Joshua Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Feature request: pg_basebackup --force
Date: 2011-04-10 17:15:39
Message-ID: 4DA1E5BB.1040207@enterprisedb.com

On 10.04.2011 20:06, Robert Haas wrote:
> On Sun, Apr 10, 2011 at 12:41 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> Admittedly, the above is a slightly different problem, but I think it
>> all points in the direction of needing more automation and more ease
>> of use.
>
> And let me also note that the difficulty of getting this all exactly
> right is one of the things that causes people to come up with creative
> solutions like this:
>
> http://archives.postgresql.org/pgsql-hackers/2010-12/msg02514.php
>
> That's why we need to put it in a box, tie a bow around it, and put up
> a big sign that says "do not look into laser with remaining eye".

That's exactly what pg_basebackup does. Once you move into more
complicated scenarios with multiple standbys and WAL archiving, it's
inevitably going to be more complicated to set up.

That doesn't mean that we can't make it easier - we can and we should -
but I don't think the common complaint that replication is hard to set
up is true anymore.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "Robert Haas" <robertmhaas(at)gmail(dot)com>
Cc: "Joshua Berkus" <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org>,"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Feature request: pg_basebackup --force
Date: 2011-04-11 15:03:49
Message-ID: 4DA2D205020000250003C62C@gw.wicourts.gov

Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> That's exactly what pg_basebackup does. Once you move into more
> complicated scenarios with multiple standbys and WAL archiving,
> it's inevitably going to be more complicated to set up.
>
> That doesn't mean that we can't make it easier - we can and we
> should - but I don't think the common complaint that replication
> is hard to set up is true anymore.

Getting back to the rsync-like behavior, which is what led the
conversation in this direction, I think -- the point of that seemed
to be to allow similar ease of use for those activating a replicated
node as the master, without requiring that the entire data directory
be sent over a slow WAN or Internet path when the delta needed to
modify what was already at the remote end to match the new master
might be orders of magnitude less data than that.

The intelligence to support that would be a fraction of what is in
rsync. In fact, since we might want to ignore hint bit differences
where possible, rsync might not work nearly as well as a home-grown
solution.

-Kevin
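
The delta transfer Kevin has in mind can be illustrated with a block-by-block comparison. This is a rough sketch only: it compares fixed 8 kB blocks (PostgreSQL's default page size) by checksum, which is much simpler than rsync's rolling-checksum algorithm, and it makes no attempt to ignore hint-bit differences. The function names are hypothetical.

```python
import hashlib
from typing import List

BLOCK_SIZE = 8192  # PostgreSQL's default page size


def block_digests(data: bytes) -> List[bytes]:
    """Checksum each fixed-size block of a file's contents."""
    return [
        hashlib.sha1(data[offset:offset + BLOCK_SIZE]).digest()
        for offset in range(0, len(data), BLOCK_SIZE)
    ]


def changed_blocks(local: bytes, remote_digests: List[bytes]) -> List[int]:
    """Block numbers the remote end is missing or holds with different contents.

    `local` is the file on the new master; `remote_digests` is what the
    remote node reports for its existing copy of the same file.
    """
    return [
        number
        for number, digest in enumerate(block_digests(local))
        if number >= len(remote_digests) or digest != remote_digests[number]
    ]
```

Only the blocks returned by `changed_blocks` would need to cross the slow link. A home-grown version that masked hint bits before checksumming would report fewer changed blocks.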