Re: archive_timeout?

Lists: pgsql-hackers
From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: pgsql-hackers(at)postgresql(dot)org
Subject: archive_timeout?
Date: 2006-10-10 13:26:48
Message-ID: 20061010.222648.58446397.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

If archive_timeout is set to non 0, it seems an archive log segment is
created every time checkpoint occurs even there's no database
updation. This leads to creating 16MB log segment files every 5
minutes (default checkpoint period), which will in turn produce 4.6GB
log segments with bogus data. Is this normal?

This is PostgreSQL 8.2 beta1.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: ishii(at)sraoss(dot)co(dot)jp
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: archive_timeout?
Date: 2006-10-10 13:29:05
Message-ID: 20061010.222905.110002911.t-ishii@sraoss.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> If archive_timeout is set to non 0, it seems an archive log segment is
> created every time checkpoint occurs even there's no database
> updation. This leads to creating 16MB log segment files every 5
> minutes (default checkpoint period), which will in turn produce 4.6GB
> log segments with bogus data. Is this normal?

I mean 4.6GB per day.

> This is PostgreSQL 8.2 beta1.
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: archive_timeout?
Date: 2006-10-10 14:11:36
Message-ID: 17470.1160489496@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> writes:
> If archive_timeout is set to non 0, it seems an archive log segment is
> created every time checkpoint occurs even there's no database
> updation. This leads to creating 16MB log segment files every 5
> minutes (default checkpoint period), which will in turn produce 4.6GB
> log segments with bogus data. Is this normal?

Yeah, that was intentional, see discussion a few weeks ago.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: archive_timeout?
Date: 2006-10-10 17:07:02
Message-ID: 1160500022.2515.12.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2006-10-10 at 22:26 +0900, Tatsuo Ishii wrote:
> If archive_timeout is set to non 0, it seems an archive log segment is
> created every time checkpoint occurs even there's no database
> updation. This leads to creating 16MB log segment files every 5
> minutes (default checkpoint period), which will in turn produce 4.6GB
> log segments with bogus data. Is this normal?
>
> This is PostgreSQL 8.2 beta1.

If the WAL is pretty much empty, gzip brings it from 16MB down to about
16KB, which is much more reasonable. I've noticed that even when idle
there are a few files that seem to compress only to about 32KB, and some
only to 880KB. I don't know exactly why those files are different,
perhaps something with the stats collector? Autovacuum was off for this
test.

There should be a documentation note to let people know that the archive
will grow even when idle. Perhaps we should suggest compression in the
docs so that people don't get worried about many gigabytes of mostly-
empty files filling up their backup storage.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: archive_timeout?
Date: 2006-10-10 17:12:16
Message-ID: 25993.1160500336@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> There should be a documentation note to let people know that the archive
> will grow even when idle. Perhaps we should suggest compression in the
> docs so that people don't get worried about many gigabytes of mostly-
> empty files filling up their backup storage.

Actually, per the previous discussion: if you want to reduce WAL traffic
then one of the most important things to do is stretch out
checkpoint_timeout.

regards, tom lane


From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: archive_timeout?
Date: 2006-10-10 18:07:53
Message-ID: 1160503674.2515.46.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, 2006-10-10 at 13:12 -0400, Tom Lane wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> > There should be a documentation note to let people know that the archive
> > will grow even when idle. Perhaps we should suggest compression in the
> > docs so that people don't get worried about many gigabytes of mostly-
> > empty files filling up their backup storage.
>
> Actually, per the previous discussion: if you want to reduce WAL traffic
> then one of the most important things to do is stretch out
> checkpoint_timeout.
>

I assume you refer to this message:
http://archives.postgresql.org/pgsql-hackers/2006-08/msg01190.php

I understand that stretching the checkpoint timeout is useful if you
have steady traffic and want to reduce the WAL volume. Higher checkpoint
intervals mean fewer copies of data pages (at least before 8.2), and
probably other data necessary at checkpoint.

However, if you have a database with long idle times, higher checkpoint
intervals combined with archive_timeout can still waste a lot of data
(unless you stretch out the checkpoint timeout by orders of magnitude).
This situation is also most the most useful situation for
archive_timeout. If someone is concerned about idle time eating up
gigabytes of backup storage, compression seems like a logical choice.

Maybe I just don't understand checkpoint timeout? Could it reasonably be
set to something like 12 hours? I can't think why not, but the config
default is 5 minutes, so I would be hesitant to change it by that much.

Regards,
Jeff Davis


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: archive_timeout?
Date: 2006-10-10 18:17:57
Message-ID: 673.1160504277@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> Maybe I just don't understand checkpoint timeout? Could it reasonably be
> set to something like 12 hours? I can't think why not, but the config
> default is 5 minutes, so I would be hesitant to change it by that much.

The only constraining factor on it is how much WAL data are you willing
to replay in order to recover from a crash. If you've got a low-volume
database then replaying up to 12 hours' worth of activity might not be
unacceptable. Also, if you have spikes of activity, then
checkpoint_segments would kick in after a spike had generated X amount
of data. So I don't see any strong reason why it couldn't be set much
higher than archive_timeout.

Now the other side of the coin is that if you do have a steady low level
of activity then a small archive_timeout is still going to result in
shipping lots of partially-filled WAL files. Compression might help
some, but the bottom line is simply that archive_timeout isn't an
efficient mechanism for dealing with low-volume databases.

regards, tom lane