Re: Off-site storage for PITR logs

Lists: pgsql-admin
From: Nicos Panayides <nicos(at)magneta(dot)com(dot)cy>
To: pgsql-admin(at)postgresql(dot)org
Subject: Off-site storage for PITR logs
Date: 2010-03-12 16:40:43
Message-ID: 4B9A6E8B.4050303@magneta.com.cy
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Hi,
I am planning an off-site backup solution for a fairly busy
(mostly-write) 8.3 database. The database is currently about 200GB in
size. I though about using log shipping and cold standby since it's easy
in terms of administration and also offers point in time recovery.

The database generates about 3 PITR log files per minute. If my
calculations are correct the sites need to be connected with a 7MBit
connection and the logs will need about 68GB of storage per day!

Does anyone have any suggestions on how to significantly reduce the
volume of log files or recommend another off-site backup solution that
would require less bandwidth and storage?

Thanks.

--
Regards,

Nicos Panayides
IT Manager

Magneta Technologies Ltd
Tel: +357 22721919, 22317400
Fax: +357 22721917
Web: http://www.magneta.eu


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Nicos Panayides <nicos(at)magneta(dot)com(dot)cy>
Cc: pgsql-admin(at)postgresql(dot)org, koichi(dot)szk(at)gmail(dot)com
Subject: Re: Off-site storage for PITR logs
Date: 2010-03-12 17:11:17
Message-ID: 201003121711.o2CHBHf27975@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Nicos Panayides wrote:
> Hi,
> I am planning an off-site backup solution for a fairly busy
> (mostly-write) 8.3 database. The database is currently about 200GB in
> size. I though about using log shipping and cold standby since it's easy
> in terms of administration and also offers point in time recovery.
>
> The database generates about 3 PITR log files per minute. If my
> calculations are correct the sites need to be connected with a 7MBit
> connection and the logs will need about 68GB of storage per day!
>
> Does anyone have any suggestions on how to significantly reduce the
> volume of log files or recommend another off-site backup solution that
> would require less bandwidth and storage?

I would use pg_lesslog to reduce the size of the WAL files:

http://pgfoundry.org/projects/pglesslog/

However, there is a bug in pg_lesslog so I wouldn't use it until that is
fixed:

http://archives.postgresql.org/pgsql-announce/2010-02/msg00005.php

Koichi, do you have any update on this? I don't see that a new version
has been uploaded, and I also see no mention on the pgfoundry site about
the bug.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Nicos Panayides" <nicos(at)magneta(dot)com(dot)cy>, <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Off-site storage for PITR logs
Date: 2010-03-12 17:20:07
Message-ID: 4B9A2367020000250002FCEC@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Nicos Panayides <nicos(at)magneta(dot)com(dot)cy> wrote:

> The database generates about 3 PITR log files per minute. If my
> calculations are correct the sites need to be connected with a
> 7MBit connection and the logs will need about 68GB of storage per
> day!

I'd have put the minimum line speed at 8Mb/second, but I figure 10
bits per byte to allow for parity and packet overhead, and the
assumption that I might want to use the line for something else
(like monitoring) at the same time; but yeah, those numbers are in
the right ballpark.

> Does anyone have any suggestions on how to significantly reduce
> the volume of log files or recommend another off-site backup
> solution that would require less bandwidth and storage?

We stream the WAL files through gzip in our archive script. That
reduces them to 4MB to 8MB during normal usage, and 12MB to 15MB
during our routine database vacuums. If you have relatively idle
periods during which the log is pushed by archive_timeout, you
should filter through pg_clearxlogtail or pglesslog before using
gzip; the former yields a 16kB file during idle periods. (I'm not
sure about the other, but it should be capable of getting even
smaller.)

You'll need to determine how far back it's valuable to have the
"point in time recovery" capabilities. We do weekly base backups
and keep WAL files to restore forward from the earlier to current,
but then keep monthly "archival" backups, where each base backup is
kept with just the WAL files needed to successfully restore it.

-Kevin


From: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Nicos Panayides <nicos(at)magneta(dot)com(dot)cy>, pgsql-admin(at)postgresql(dot)org
Subject: Re: Off-site storage for PITR logs
Date: 2010-03-13 00:53:45
Message-ID: ef4f49ae1003121653m2c69a15ar2fd5bb9d87d46d63@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Sorry for that. The bug was incorrect calculation of XNOOP record
size in GiST WAL when incremental log is created from the full backup.
Because the bug impact is so serious, I'm rebuilding the test
environment which is taking long. I'll report the cause of the bug
to hackers and bugs.

If GiST is not used, WAL compression works correctly but I'm not
comfortable to use lesslog with known bug.

----------
Koichi Suzuki

2010/3/13 Bruce Momjian <bruce(at)momjian(dot)us>:
> Nicos Panayides wrote:
>> Hi,
>> I am planning an off-site backup solution for a fairly busy
>> (mostly-write) 8.3 database. The database is currently about 200GB in
>> size. I though about using log shipping and cold standby since it's easy
>> in terms of administration and also offers point in time recovery.
>>
>> The database generates about 3 PITR log files per minute. If my
>> calculations are correct the sites need to be connected with a 7MBit
>> connection and the logs will need about 68GB of storage per day!
>>
>> Does anyone have any suggestions on how to significantly reduce the
>> volume of log files or recommend another off-site backup solution that
>> would require less bandwidth and storage?
>
> I would use pg_lesslog to reduce the size of the WAL files:
>
>        http://pgfoundry.org/projects/pglesslog/
>
> However, there is a bug in pg_lesslog so I wouldn't use it until that is
> fixed:
>
>        http://archives.postgresql.org/pgsql-announce/2010-02/msg00005.php
>
> Koichi, do you have any update on this?  I don't see that a new version
> has been uploaded, and I also see no mention on the pgfoundry site about
> the bug.
>
> --
>  Bruce Momjian  <bruce(at)momjian(dot)us>        http://momjian.us
>  EnterpriseDB                             http://enterprisedb.com
>
>  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
>


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Nicos Panayides <nicos(at)magneta(dot)com(dot)cy>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Off-site storage for PITR logs
Date: 2010-03-14 08:50:12
Message-ID: 1268556612.3825.5025.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Fri, 2010-03-12 at 18:40 +0200, Nicos Panayides wrote:

> I am planning an off-site backup solution for a fairly busy
> (mostly-write) 8.3 database. The database is currently about 200GB in
> size. I though about using log shipping and cold standby since it's easy
> in terms of administration and also offers point in time recovery.
>
> The database generates about 3 PITR log files per minute. If my
> calculations are correct the sites need to be connected with a 7MBit
> connection and the logs will need about 68GB of storage per day!
>
> Does anyone have any suggestions on how to significantly reduce the
> volume of log files or recommend another off-site backup solution that
> would require less bandwidth and storage?

Consider turning off full_page_writes which will permanently minimise
the amount of WAL traffic, as well as being an integrated and
fully-supported option with Postgres. You will need additional
non-volatile cache on your disks to ensure that is a safe option, though
that will yield performance advantages also.

--
Simon Riggs www.2ndQuadrant.com


From: Nicos Panayides <nicos(at)magneta(dot)com(dot)cy>
To: Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Off-site storage for PITR logs
Date: 2010-03-15 09:27:03
Message-ID: 4B9DFD67.3050106@magneta.com.cy
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Hi Simon,
thanks for the suggestion. What kind of space savings should I expected
from turning off full_page_writes? The servers have battery-backed
write-caches for the disk controllers, so it should be safe to give this
a try. Does turning off full_page_writes have any other effect on the
database?

Regards,

Nicos Panayides
IT Manager

Magneta Technologies Ltd
Tel: +357 22721919, 22317400
Fax: +357 22721917
Web: http://www.magneta.eu

On 14/03/2010 10:50, Simon Riggs wrote:
> On Fri, 2010-03-12 at 18:40 +0200, Nicos Panayides wrote:
>
>> I am planning an off-site backup solution for a fairly busy
>> (mostly-write) 8.3 database. The database is currently about 200GB in
>> size. I though about using log shipping and cold standby since it's easy
>> in terms of administration and also offers point in time recovery.
>>
>> The database generates about 3 PITR log files per minute. If my
>> calculations are correct the sites need to be connected with a 7MBit
>> connection and the logs will need about 68GB of storage per day!
>>
>> Does anyone have any suggestions on how to significantly reduce the
>> volume of log files or recommend another off-site backup solution that
>> would require less bandwidth and storage?
>
> Consider turning off full_page_writes which will permanently minimise
> the amount of WAL traffic, as well as being an integrated and
> fully-supported option with Postgres. You will need additional
> non-volatile cache on your disks to ensure that is a safe option, though
> that will yield performance advantages also.
>


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Nicos Panayides <nicos(at)magneta(dot)com(dot)cy>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Off-site storage for PITR logs
Date: 2010-03-15 09:57:01
Message-ID: 1268647021.3825.7469.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Mon, 2010-03-15 at 11:27 +0200, Nicos Panayides wrote:

> thanks for the suggestion. What kind of space savings should I expected
> from turning off full_page_writes?

Substantial, though you should measure it and see, since it is workload
dependent.

--
Simon Riggs www.2ndQuadrant.com


From: Renato Oliveira <renato(dot)oliveira(at)grant(dot)co(dot)uk>
To: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: PITR load on servers - increased 20%
Date: 2010-03-16 10:05:08
Message-ID: 7965A9DCF12CC14984420BCC37B1608F25AAFB989B@Elzar.grant.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Dear all,

I have been testing PITR and I have noticed a 20% increase on the load of the Disk subsystem.

Is that a normal thing to expect? Have you come across this or have you noticed this increase?

I know some of you guys have been using it for quite some time, would you mind in sharing your experiences, please?

Thank you very much

Best regards

Renato

Renato Oliveira
Systems Administrator
e-mail: renato(dot)oliveira(at)grant(dot)co(dot)uk

Tel: +44 (0)1763 260811
Fax: +44 (0)1763 262410
http://www.grant.co.uk/

Grant Instruments (Cambridge) Ltd

Company registered in England, registration number 658133

Registered office address:
29 Station Road,
Shepreth,
CAMBS SG8 6GB
UK

P Please consider the environment before printing this email
CONFIDENTIALITY: The information in this e-mail and any attachments is confidential. It is intended only for the named recipients(s). If you are not the named recipient please notify the sender immediately and do not disclose the contents to another person or take copies.

VIRUSES: The contents of this e-mail or attachment(s) may contain viruses which could damage your own computer system. Whilst Grant Instruments (Cambridge) Ltd has taken every reasonable precaution to minimise this risk, we cannot accept liability for any damage which you sustain as a result of software viruses. You should therefore carry out your own virus checks before opening the attachment(s).

OpenXML: For information about the OpenXML file format in use within Grant Instruments please visit our http://www.grant.co.uk/Support/openxml.html


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Renato Oliveira" <renato(dot)oliveira(at)grant(dot)co(dot)uk>
Cc: <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PITR load on servers - increased 20%
Date: 2010-03-17 13:55:53
Message-ID: 4BA09919020000250002FE36@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Renato Oliveira <renato(dot)oliveira(at)grant(dot)co(dot)uk> wrote:

> I have been testing PITR and I have noticed a 20% increase on the
> load of the Disk subsystem.
>
> Is that a normal thing to expect?

It depends on your workload and archive script, but that doesn't
seem too shocking.

For one thing, turning on archiving causes more to be written to the
WAL files, so you should expect some increase just by enabling
archiving, even if your archive script just consists of 'exit 0'.
Reading the WAL file in the archive script will often be reading
from OS cache, but not necessarily. If you're copying the WAL files
to a local file system (which often makes sense), that obviously
increases I/O. If there's enough delay between the archive script
writing the file to a local drive and something (e.g., rsync)
picking it up, it might need to read from the disk, and rsync is
likely to do some logging.

I would be surprised if you couldn't measure any increase, and 20%
is certainly within a reasonable range, depending on all the above.
If you need to try to minimize the impact, besides considering all
of the above, you might want to look at the full_page_writes
configuration option.

http://www.postgresql.org/docs/8.4/interactive/runtime-config-wal.html#GUC-FULL-PAGE-WRITES

-Kevin


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Renato Oliveira <renato(dot)oliveira(at)grant(dot)co(dot)uk>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: PITR load on servers - increased 20%
Date: 2010-03-18 09:07:57
Message-ID: 1268903277.3827.1699.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

On Tue, 2010-03-16 at 10:05 +0000, Renato Oliveira wrote:

> I have been testing PITR and I have noticed a 20% increase on the load of the Disk subsystem.
>
> Is that a normal thing to expect? Have you come across this or have you noticed this increase?
>
> I know some of you guys have been using it for quite some time, would you mind in sharing your experiences, please?

I think you need to explain what things you've been testing and how.

There are some well documented bulk data load optimizations that we
cannot take advantage of when running archiving, but apart from those
the load is actually very low in normal running.

--
Simon Riggs www.2ndQuadrant.com


From: Iñigo Martinez Lasala <imartinez(at)vectorsf(dot)com>
To: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Missing tsearch2 functions
Date: 2010-03-18 16:41:31
Message-ID: 1268930491.10034.63.camel@deimos
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Hi everybody.

We are trying to migrate a gforge (again) with postgres 8.1/2 to
postgres 8.4.

We have adapted our schema in order to use tsearch2 from contrib folder
for not having to rewrite all functions. We have cleaned most of old
tsearch2 stuff, but some tsearch2 functions have dissapeared or at least
we don't know how to invoke with postgres 8.4

Theses functions are:
exectsq
rexectsq
concat

I've been unable to locate them at pg_catalog. I've not found reference
in tsearch2 documentation for exectsq and rexectsq. But concat should be
in place but it's missing!!
Any hint?

Thanks in advance.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Iñigo Martinez Lasala <imartinez(at)vectorsf(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Missing tsearch2 functions
Date: 2010-03-18 16:46:46
Message-ID: 13151.1268930806@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

=?ISO-8859-1?Q?I=F1igo?= Martinez Lasala <imartinez(at)vectorsf(dot)com> writes:
> We have adapted our schema in order to use tsearch2 from contrib folder
> for not having to rewrite all functions. We have cleaned most of old
> tsearch2 stuff, but some tsearch2 functions have dissapeared or at least
> we don't know how to invoke with postgres 8.4

> Theses functions are:
> exectsq
> rexectsq
> concat

None of those functions were ever meant to be invoked directly. Use the
corresponding operators (@@ and ||) instead.

regards, tom lane


From: Iñigo Martinez Lasala <imartinez(at)vectorsf(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Missing tsearch2 functions
Date: 2010-03-18 19:39:48
Message-ID: 2087950908.632701268941188658.JavaMail.root@hermes.vectorsf.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

Thank you, Tom.

In fact what was failing was the creation of operators || and @@. Both used the procedures concat, exectsq and rexectsq. So we can delete those operator creation sentences.

----- Mensaje original -----
De: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Para: "Iñigo Martinez Lasala" <imartinez(at)vectorsf(dot)com>
CC: "pgsql-admin" <pgsql-admin(at)postgresql(dot)org>
Enviados: Jueves, 18 de Marzo 2010 17:46:46 GMT +01:00 Amsterdam / Berlín / Berna / Roma / Estocolmo / Viena
Asunto: Re: [ADMIN] Missing tsearch2 functions

=?ISO-8859-1?Q?I=F1igo?= Martinez Lasala <imartinez(at)vectorsf(dot)com> writes:
> We have adapted our schema in order to use tsearch2 from contrib folder
> for not having to rewrite all functions. We have cleaned most of old
> tsearch2 stuff, but some tsearch2 functions have dissapeared or at least
> we don't know how to invoke with postgres 8.4

> Theses functions are:
> exectsq
> rexectsq
> concat

None of those functions were ever meant to be invoked directly. Use the
corresponding operators (@@ and ||) instead.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Iñigo Martinez Lasala <imartinez(at)vectorsf(dot)com>
Cc: pgsql-admin <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Missing tsearch2 functions
Date: 2010-03-18 19:48:49
Message-ID: 16234.1268941729@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-admin

=?utf-8?Q?I=C3=B1igo_Martinez_Lasala?= <imartinez(at)vectorsf(dot)com> writes:
> In fact what was failing was the creation of operators || and @@. Both used the procedures concat, exectsq and rexectsq. So we can delete those operator creation sentences.

Actually, it sounds to me like you might want to think about installing
the newer version's contrib/tsearch2 first. See
http://www.postgresql.org/docs/8.4/static/textsearch-migration.html

In any case note that it's not expected that a dump containing the old
tsearch2 object definitions will load without any noise. You should
expect some errors like this. Just ignore the errors, unless they are
preventing loading some object that doesn't belong to the old tsearch2
module.

regards, tom lane