Re: Streaming Replication and archiving

Lists: pgsql-hackers
From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Streaming Replication and archiving
Date: 2010-01-19 22:13:36
Message-ID: 4B562E90.3080107@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I've been having a look at this, one master + one replica and also one
master + 2 replicas. I gotta say this is a nice piece of functionality
(particularly the multiple replicas).

I've been using the wiki page
(http://wiki.postgresql.org/wiki/Streaming_Replication) as a guide, and
I notice that it recommends the master (and replicas) have a non-trivial
archive_command even after the backup step is completed. ISTM that after
the backup the master's archive_command can be set to '' or '/bin/true'
as the walsender does not make any use of the WAL archive (AFAICS
anyway). Clearly it might be desirable to have the archived segments
around for other reasons - but equally it might be desirable *not* to
have to have to (e.g disk space), or am I overlooking something?

Cheers

Mark


From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-19 22:35:40
Message-ID: m2hbqh7mf7.fsf@hi-media.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
> I've been using the wiki page
> (http://wiki.postgresql.org/wiki/Streaming_Replication) as a guide, and I
> notice that it recommends the master (and replicas) have a non-trivial
> archive_command even after the backup step is completed. ISTM that after the
> backup the master's archive_command can be set to '' or '/bin/true' as the
> walsender does not make any use of the WAL archive (AFAICS anyway). Clearly
> it might be desirable to have the archived segments around for other reasons
> - but equally it might be desirable *not* to have to have to (e.g disk
> space), or am I overlooking something?

I think it's still necessary in the case the connection between a slave
and the master breaks. If the transient error is long enough for the
slave requesting WALs that the master no longer has, restore_command
will get used.

IIUC from the mails here, the restore_command is run directly by the
slave itself, so it needs to have access to master archives embedded in
the restore_command.

Take all this with a huge grain of salt, that's my understanding without
having had the time to read the patch or play with it yet.

Regards,
--
dim

PS: maybe the states and transitions and what data flows from where to
where in each of those would be good at least in the documentation, if
not explicit in the code?


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-20 00:16:21
Message-ID: 4B564B55.8050600@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Dimitri Fontaine wrote:
> Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
>
>> I've been using the wiki page
>> (http://wiki.postgresql.org/wiki/Streaming_Replication) as a guide, and I
>> notice that it recommends the master (and replicas) have a non-trivial
>> archive_command even after the backup step is completed. ISTM that after the
>> backup the master's archive_command can be set to '' or '/bin/true' as the
>> walsender does not make any use of the WAL archive (AFAICS anyway). Clearly
>> it might be desirable to have the archived segments around for other reasons
>> - but equally it might be desirable *not* to have to have to (e.g disk
>> space), or am I overlooking something?
>>
>
> I think it's still necessary in the case the connection between a slave
> and the master breaks. If the transient error is long enough for the
> slave requesting WALs that the master no longer has, restore_command
> will get used.
>
> IIUC from the mails here, the restore_command is run directly by the
> slave itself, so it needs to have access to master archives embedded in
> the restore_command.
>
> Take all this with a huge grain of salt, that's my understanding without
> having had the time to read the patch or play with it yet.
>
>
Thanks Dimitri, I'd missed that thread. Ok, slave will need a suitable
restore_comand in addition to primary_conninfo in recovery.conf, and
then extended communication failures (or shutting down the slave for a
while!) will not break the streaming setup (FWIW I tried this just now).

regards

Mark


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-20 20:47:14
Message-ID: 4B576BD2.4050809@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Thanks Dimitri, I'd missed that thread. Ok, slave will need a suitable
> restore_comand in addition to primary_conninfo in recovery.conf, and
> then extended communication failures (or shutting down the slave for a
> while!) will not break the streaming setup (FWIW I tried this just now).

Sure, but if the archived WAL segments are NOT needed, how are they
supposed to get deleted? It doesn't take long to run out of disk space
if they're not being rotated.

Masao-san, can you comment on this?

--Josh


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-20 21:33:25
Message-ID: 4B5776A5.4030909@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Josh Berkus wrote:
>> Thanks Dimitri, I'd missed that thread. Ok, slave will need a suitable
>> restore_comand in addition to primary_conninfo in recovery.conf, and
>> then extended communication failures (or shutting down the slave for a
>> while!) will not break the streaming setup (FWIW I tried this just now).
>>
>
> Sure, but if the archived WAL segments are NOT needed, how are they
> supposed to get deleted? It doesn't take long to run out of disk space
> if they're not being rotated.
>
>

+1

From what I am seeing at the moment (8.5 devel from 2 days ago), the
archived segments are not deleted at all (I have several hundred now
after a number of pgbench runs over the last day or so).

regards

Mark


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-20 23:22:01
Message-ID: 22807.1264029721@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
> Josh Berkus wrote:
>> Sure, but if the archived WAL segments are NOT needed, how are they
>> supposed to get deleted? It doesn't take long to run out of disk space
>> if they're not being rotated.

> From what I am seeing at the moment (8.5 devel from 2 days ago), the
> archived segments are not deleted at all (I have several hundred now
> after a number of pgbench runs over the last day or so).

Huh? *Archived* segments aren't supposed to get deleted, at least not
by any automatic Postgres action. It would be up to the DBA how long
he wants to keep them around.

regards, tom lane


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-21 00:37:44
Message-ID: 4B57A1D8.1000900@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz> writes:
>
>> Josh Berkus wrote:
>>
>>> Sure, but if the archived WAL segments are NOT needed, how are they
>>> supposed to get deleted? It doesn't take long to run out of disk space
>>> if they're not being rotated.
>>>
>
>
>> From what I am seeing at the moment (8.5 devel from 2 days ago), the
>> archived segments are not deleted at all (I have several hundred now
>> after a number of pgbench runs over the last day or so).
>>
>
> Huh? *Archived* segments aren't supposed to get deleted, at least not
> by any automatic Postgres action. It would be up to the DBA how long
> he wants to keep them around.
>
>
>

Exactly - there was a comment in the 'retry from archive' thread that
suggested otherwise. The likely typical use case for streaming
replication makes a good case and automated safe way of pruning these
guys - I've seen a few cases where overly aggressive cleanup has broken
log shipping setups (usually 8.2, before the restart option was available).

regards

Mark


From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-21 00:39:29
Message-ID: 4B57A241.4050709@catalyst.net.nz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Mark Kirkwood wrote:
> The likely typical use case for streaming replication makes a good
> case and automated safe way of pruning these guys

Sorry, stupid typo: should read '...makes a good case for an automated
safe way of pruning these....'


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-21 01:48:22
Message-ID: 4B57B266.2050209@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


> Huh? *Archived* segments aren't supposed to get deleted, at least not
> by any automatic Postgres action. It would be up to the DBA how long
> he wants to keep them around.

OK. The docs indicated that the segments needed to be kept around in
case the slave fell behind. If that's not the case (as it appears not
to be) then they can just be deleted by cron job, or the archive_command
on the master can be changed.

Presumably, however, if the slave falls sufficiently behind and there
are no archive logs, then the slave would not be able to resynch with
the master, no?

--Josh Berkus


From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-21 02:54:54
Message-ID: 3f0b79eb1001201854i6974951fyf9189a349d4dda30@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 21, 2010 at 10:48 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> Presumably, however, if the slave falls sufficiently behind and there
> are no archive logs, then the slave would not be able to resynch with
> the master, no?

In that case, we would need to make a fresh backup of the primary,
and start the standby from that data.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-21 13:57:01
Message-ID: 407d949e1001210557r6fdd1d22m7d1e6b78a261a67e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jan 21, 2010 at 1:48 AM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Huh?  *Archived* segments aren't supposed to get deleted, at least not
>> by any automatic Postgres action.  It would be up to the DBA how long
>> he wants to keep them around.
>
> OK.  The docs indicated that the segments needed to be kept around in
> case the slave fell behind.  If that's not the case (as it appears not
> to be) then they can just be deleted by cron job, or the archive_command
> on the master can be changed.

It's definitely the case.

Generally speaking each base backup image has an oldest archived log
which is needed to make it useful. And each standby database -- which
is just a recovered base backup which has been rolled forward some
distance already -- also has one.

What would be useful is a tool which given a list of standby databases
and list of base backup images can apply a set of policy rules to
determine which base backups and archived logs to delete.

The policy might look something like "keep one base backup per week
going back a month and one per day going back seven days and keep
archived logs going back far enough for any of these base backups or
any of these live replicas."

Bonus points if you can say "also keep one base backup per month going
back three years with just enough archived logs to recover the base
backup to a consistent state".

--
greg


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Josh Berkus" <josh(at)agliodbs(dot)com>,"Greg Stark" <gsstark(at)mit(dot)edu>
Cc: "Mark Kirkwood" <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>, "Dimitri Fontaine" <dfontaine(at)hi-media(dot)com>, "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Streaming Replication and archiving
Date: 2010-01-21 15:31:52
Message-ID: 4B581F08020000250002E947@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> wrote:

> What would be useful is a tool which given a list of standby
> databases and list of base backup images can apply a set of policy
> rules to determine which base backups and archived logs to delete.
>
> The policy might look something like "keep one base backup per
> week going back a month and one per day going back seven days and
> keep archived logs going back far enough for any of these base
> backups or any of these live replicas."
>
> Bonus points if you can say "also keep one base backup per month
> going back three years with just enough archived logs to recover
> the base backup to a consistent state".

Hmmm... Our policy is "Keep the most recent base backup and all WAL
files from the point needed to use it to current on the backup
server local to the source database, keep the most recent two weekly
backups and all the WAL files from the point needed to start the
earlier to current on the central backup server (along with a warm
standby instance running for each source to confirm that the backup
and WAL files are usable), and keep the first weekly backup of each
month and just enough WAL files for a consistent start of each on a
mirrored SAN archive for one year."

Just in case you're looking for real-life policies currently in use.

By the way, the .backup files and the information from
pg_controldata and pg_ctl status currently provide just enough
information for this all to be run from bash scripts without much
human attention, but the effort required to get there is far from
trivial. I'm sure that something which made such a policy easy to
implement would be useful to some shops. Hopefully that can be done
without breaking current scripts.

-Kevin