Archive log compression keeping physical log available in the crash recovery

Lists: pgsql-hackers
From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Archive log compression keeping physical log available in the crash recovery
Date: 2007-01-29 07:15:08
Message-ID: 45BD9EFC.5050207@oss.ntt.co.jp

This is a proposal for compressing the archive log while keeping the physical log in WAL.

In PostgreSQL 8.2, the full_page_writes option came back to cut the
physical log out of both WAL and the archive log. To deal with partial
writes during an online backup, the physical log is still written, but
only while the online backup is in progress.

Although this dramatically reduces the log size, it puts crash recovery
at risk. If any page is left inconsistent by a failure (for example, a
partially written page), crash recovery cannot repair it, because a full
page image is needed to recover the page in that case. For critical use,
especially commercial use, we don't want to give up this chance of crash
recovery, while reducing the archive log size is also crucial for larger
databases. The WAL size itself may be less critical, because WAL
segments are reused cyclically.

Here is a simple idea to reduce the archive log size while keeping the
physical log in xlog:

1. Create a new GUC: full_page_compress.

2. Turn on both full_page_writes and full_page_compress: the physical
log will be written to WAL at the first write to a page after each
checkpoint, just as with conventional full_page_writes on.

3. Unless a physical log record is written during an online backup, it
can be removed from the archive log. One bit in XLR_BKP_BLOCK_MASK
(XLR_BKP_REMOVABLE) is available to indicate this (of the four bits,
only three are in use), and this mark can be set in XLogInsert().
With both full_page_writes and full_page_compress on, both the logical
log and the physical log will be written to WAL, with the
XLR_BKP_REMOVABLE flag on. Having both the physical and the logical log
in the same WAL does no harm to crash recovery: crash recovery uses the
physical log when it is available, and archive recovery uses the logical
log, since the corresponding physical log will have been removed.

4. The archive command (a separate binary) removes physical log records
whose XLR_BKP_REMOVABLE flag is on. They are replaced by a very small
amount of dummy information, which is used later to restore the physical
log so that the LSNs of the other log records stay consistent.

5. The restore command (a separate binary) restores the removed physical
log using the dummy record, which restores the LSNs of the other log
records.

6. We need to modify the redo functions so that they ignore the dummy
records inserted in step 5. The amount of code modification will be
very small.
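A minimal C sketch of steps 1 to 3, for illustration only. The bit values are an assumption modeled on the 8.2-era xl_info layout (backup-block bits 0x08, 0x04 and 0x02, with 0x01 spare); XLR_BKP_REMOVABLE is the name proposed above, and WalSettings, compute_xl_info() and bkp_blocks_removable() are hypothetical helpers, not PostgreSQL functions. Check the real values against the xlog headers (for 8.2, src/include/access/xlog.h) before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, modeled on the 8.2-era xl_info bits: up to three
 * backup blocks use 0x08, 0x04 and 0x02; bit 0x01 is unused. */
#define XLR_BKP_BLOCK_MASK      0x0E
#define XLR_MAX_BKP_BLOCKS      3
#define XLR_SET_BKP_BLOCK(iblk) (0x08 >> (iblk))
/* Proposed flag: the spare fourth bit (0x01). */
#define XLR_BKP_REMOVABLE       XLR_SET_BKP_BLOCK(3)

/* Hypothetical bundle of the settings that matter here. */
typedef struct
{
    bool full_page_writes;
    bool full_page_compress;        /* proposed GUC (step 1) */
    bool online_backup_in_progress; /* between start and stop of a backup */
} WalSettings;

/*
 * Compute the backup-related xl_info bits for a record carrying
 * n_bkp_blocks full-page images (step 2).  Images written outside an
 * online backup are marked removable when full_page_compress is on
 * (step 3); images taken during a backup must stay in the archive.
 */
static uint8_t
compute_xl_info(const WalSettings *s, int n_bkp_blocks)
{
    uint8_t info = 0;

    assert(n_bkp_blocks >= 0 && n_bkp_blocks <= XLR_MAX_BKP_BLOCKS);
    for (int i = 0; i < n_bkp_blocks; i++)
        info |= XLR_SET_BKP_BLOCK(i);

    if (n_bkp_blocks > 0 && s->full_page_writes &&
        s->full_page_compress && !s->online_backup_in_progress)
        info |= XLR_BKP_REMOVABLE;

    return info;
}

/* The test an archive command would apply before stripping images. */
static bool
bkp_blocks_removable(uint8_t info)
{
    return (info & XLR_BKP_REMOVABLE) != 0 &&
           (info & XLR_BKP_BLOCK_MASK) != 0;
}
```

For example, with both settings on and no backup running, a record carrying one image gets the bits 0x08 | 0x01; the same record written during an online backup gets only 0x08, so the archive command keeps its image. The logical data that the proposal writes alongside the image is not modeled here.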

As a result, the archive log becomes as small as with full_page_writes
off, while the physical log is still available for crash recovery,
preserving the chance of crash recovery.
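The archive/restore round trip of steps 4 to 6 can be illustrated with a toy model in which only record sizes matter, since an LSN is simply a byte position in the WAL stream. This is a sketch under stated assumptions: ToyRecord, TOY_HDR_LEN, STUB_LEN and the helper functions are hypothetical and reflect neither the real XLogRecord layout nor the actual archive and restore binaries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_HDR_LEN 24u /* hypothetical fixed record header size */
#define STUB_LEN     4u /* hypothetical stub: holds the original image length */

typedef struct
{
    uint32_t main_len;  /* logical redo data, always kept */
    uint32_t image_len; /* bytes of full-page images currently stored */
    bool     removable; /* XLR_BKP_REMOVABLE was set at insert time */
    bool     stripped;  /* image replaced by a stub in the archive */
    uint32_t saved_len; /* original image length, carried by the stub */
    bool     dummy;     /* image restored as filler; redo must skip it */
} ToyRecord;

/* Byte offset (the toy "LSN") at which record k starts. */
static uint64_t
toy_lsn(const ToyRecord *recs, size_t k)
{
    uint64_t pos = 0;

    for (size_t i = 0; i < k; i++)
        pos += TOY_HDR_LEN + recs[i].main_len + recs[i].image_len;
    return pos;
}

/* Step 4: the archive command replaces removable images with a stub. */
static void
archive_strip(ToyRecord *recs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (recs[i].removable && recs[i].image_len > 0 && !recs[i].stripped)
        {
            recs[i].saved_len = recs[i].image_len;
            recs[i].image_len = STUB_LEN;
            recs[i].stripped = true;
        }
}

/* Step 5: the restore command re-inflates each stub to its original
 * length, so every later record lands at its original LSN again. */
static void
restore_expand(ToyRecord *recs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (recs[i].stripped)
        {
            assert(recs[i].removable); /* only removable images were stripped */
            recs[i].image_len = recs[i].saved_len;
            recs[i].stripped = false;
            recs[i].dummy = true;
        }
}

/* Step 6: redo applies only genuine images and ignores dummy filler. */
static size_t
redo_images_applied(const ToyRecord *recs, size_t n)
{
    size_t applied = 0;

    for (size_t i = 0; i < n; i++)
        if (recs[i].image_len > 0 && !recs[i].dummy && !recs[i].stripped)
            applied++;
    return applied;
}
```

The design point is that the stub carries just enough information (here, only the length) for the restore side to recreate a record of the same size. The filler contents do not matter because redo ignores them, while the logical data kept in the same record drives archive recovery.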

Comments, questions and any input are welcome.

-----
Koichi Suzuki, NTT Open Source Center

--
Koichi Suzuki


From: Jim Nasby <decibel(at)decibel(dot)org>
To: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archive log compression keeping physical log available in the crash recovery
Date: 2007-02-02 04:04:58
Message-ID: 8A8A4F68-FDC5-401B-97E9-D67AECA9387E@decibel.org
Lists: pgsql-hackers

I thought the drive behind full_page_writes = off was to reduce the
amount of data being written to pg_xlog, not to shrink the size of a
PITR log archive.

ISTM that if you want to shrink a PITR log archive you'd be able to
get good results by (b|g)zip'ing the WAL files in the archive. A
quick test on my laptop shows over a 4x reduction in size. Presumably
that'd be even larger if you increased the size of WAL segments.

--
Jim Nasby jim(at)nasby(dot)net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Jim Nasby <decibel(at)decibel(dot)org>
Cc: PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archive log compression keeping physical log available in the crash recovery
Date: 2007-02-09 04:00:10
Message-ID: 45CBF1CA.1020201@oss.ntt.co.jp
Lists: pgsql-hackers

Sorry for the late response.

Gzip can reduce the archive log size to about one fourth. My point is
that this can still be quite large. Removing physical log records from
the archive log (replacing them with logical log records) shrinks the
archive log to one twentieth, in the case of a pgbench test of about ten
hours (3,600,000 transactions) with a database size of about 2GB. With
gzip, maybe because of the higher CPU load, total throughput is lower
than when just copying WAL to the archive. In our case, throughput seems
to be slightly higher than either just copying (preserving the physical
log) or gzip. I'll gather the measurement results and try to post them.

The size of the archive log does not seem to be affected by the size of
the database, but only by the number of transactions. With
full_page_writes=on and full_page_compress=on, the compressed archive
log size seems to depend only on the number of transactions and their
characteristics.

Our evaluation results are as follows:
  Database size:                            2GB
  WAL size (after 10-hour pgbench run):     48.3GB
  gzipped size:                             8.8GB
  After removal of the physical log:        2.36GB
  Log size with full_page_writes=off:       2.42GB

The reason the archive log in our case is slightly smaller than with
full_page_writes=off is that we remove not only the physical logs but
also each page header and the dummy part at the tail of each log
segment.

We can further apply gzip to this archive (2.36GB). The final size is
0.75GB, less than one sixtieth of the original WAL.

The overall duration to gzip the WAL (48.3GB to 8.8GB) was about
4000sec, while our compression to 2.36GB took about 1010sec, less than
a plain cat command (1386sec). When gzip is combined with our
compression (48.3GB to 0.75GB), the total duration was about 1330sec.
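The ratios and rates quoted above can be rechecked from the reported figures. This small C helper is only arithmetic on the numbers in this message; it assumes decimal units (1GB = 1000MB = 10^9 bytes), which the original measurements do not state.

```c
#include <assert.h>

/* Size reduction factor: how many times smaller the result is. */
static double
reduction_factor(double original_gb, double result_gb)
{
    assert(result_gb > 0.0);
    return original_gb / result_gb;
}

/* Effective processing rate in MB/s, assuming 1GB = 1000MB. */
static double
rate_mb_per_s(double input_gb, double seconds)
{
    assert(seconds > 0.0);
    return input_gb * 1000.0 / seconds;
}

/* Archive bytes per transaction, assuming 1GB = 10^9 bytes. */
static double
bytes_per_txn(double archive_gb, double n_txn)
{
    assert(n_txn > 0.0);
    return archive_gb * 1e9 / n_txn;
}
```

With the figures above: 48.3GB/8.8GB is about 5.5 (gzip), 48.3GB/2.36GB about 20.5 (physical log removal) and 48.3GB/0.75GB about 64.4 (both). The removal pass processes about 47.8MB/s, against about 34.8MB/s for cat and about 12.1MB/s for gzip, and 2.36GB over 3,600,000 transactions comes to roughly 656 bytes per transaction.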

This shows that physical log removal is a good choice in the following
case:

1) the same crash recovery possibility as full_page_writes=on is needed, and
2) the archive log must be shrunk so that it can be stored for a longer period.

Of course, if we care about crash recovery on a PITR slave, we still
need the physical log records in the archive log. In this case, because
the archive log is not intended to be kept long, its size will not be an
issue.

I'm planning to evaluate the archive log size with other benchmarks
such as DBT-2 as well.

Materials for this have already been posted to HACKERS and PATCHES. I
hope you try them.

--
Koichi Suzuki


From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Jim Nasby <decibel(at)decibel(dot)org>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archive log compression keeping physical log available in the crash recovery
Date: 2007-02-09 05:14:42
Message-ID: 45CC0342.7020200@oss.ntt.co.jp
Lists: pgsql-hackers

Further information about the evaluation I posted earlier:

pgbench throughput was as follows:
Full WAL archiving (full_page_writes=on), 48.3GB archive: 123TPS
Gzip WAL compress, 8.8GB archive: 145TPS
Physical log removal, 2.36GB archive: 148TPS
full_page_writes=off, 2.42GB archive: 161TPS
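Relative to full WAL archiving (123TPS), these figures amount to roughly +17.9% for gzip, +20.3% for physical log removal and +30.9% for full_page_writes=off. The helper below is plain arithmetic on the reported TPS values and nothing more.

```c
#include <assert.h>

/* Percentage change of tps relative to a baseline tps. */
static double
pct_change(double tps, double baseline_tps)
{
    assert(baseline_tps > 0.0);
    return (tps / baseline_tps - 1.0) * 100.0;
}
```

For instance, pct_change(148.0, 123.0) is about 20.3, and pct_change(148.0, 161.0) is about -8.1, i.e. physical log removal ran about 8% below full_page_writes=off in this test.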

--
Koichi Suzuki


From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Koichi Suzuki" <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>, "Jim Nasby" <decibel(at)decibel(dot)org>
Cc: "PGSQL Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archive log compression keeping physical log availablein the crash recovery
Date: 2007-02-09 09:10:09
Message-ID: E1539E0ED7043848906A8FF995BDA57901C132F1@m0143.s-mxs.net
Lists: pgsql-hackers

> Our evaluation result is as follows:
> Database size: 2GB
> WAL size (after 10hours pgbench run): 48.3GB
> gzipped size: 8.8GB
> removal of the physical log: 2.36GB
> fullpage_writes=off log size: 2.42GB

> I'm planning to do archive log size evalutation with other benchmarks
> such as DBT-2 as well.

Looks promising :-)

Did you use the standard 5 minute checkpoint_timeout?
A run with checkpoint_timeout increased to 30 min would be very nice,
because that is what you would tune if you are concerned about
full-page overhead.
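Why a longer checkpoint_timeout cuts full-page overhead follows from step 2 of the proposal: a page image is logged only at the first modification of each page after a checkpoint. The toy C model below counts such images for a stream of page writes. Measuring the checkpoint interval in number of writes rather than minutes, and the fixed MAX_PAGES relation size, are simplifying assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_PAGES 1024 /* toy relation size (assumption) */

/*
 * Count full-page images for a stream of page writes, with a checkpoint
 * every `interval` writes.  A page gets an image on its first
 * modification after each checkpoint; later writes to it are cheap.
 */
static size_t
count_fpis(const unsigned *pages, size_t n, size_t interval)
{
    bool   touched[MAX_PAGES];
    size_t fpis = 0;

    assert(interval > 0);
    memset(touched, 0, sizeof touched);
    for (size_t i = 0; i < n; i++)
    {
        if (i > 0 && i % interval == 0) /* checkpoint: forget all pages */
            memset(touched, 0, sizeof touched);
        assert(pages[i] < MAX_PAGES);
        if (!touched[pages[i]])
        {
            touched[pages[i]] = true;
            fpis++;
        }
    }
    return fpis;
}
```

When a longer interval is a whole multiple of a shorter one, it merges several short intervals into one, so each page is imaged at most once per merged interval and the count can only stay the same or drop. How much is actually saved depends on how many distinct pages the workload touches per interval.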

Andreas


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Jim Nasby <decibel(at)decibel(dot)org>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archive log compression keeping physical log available in the crash recovery
Date: 2007-02-09 14:11:05
Message-ID: 20070209141105.GA20293@svana.org
Lists: pgsql-hackers

On Fri, Feb 09, 2007 at 01:00:10PM +0900, Koichi Suzuki wrote:
> Further, we can apply gzip to this archive (2.36GB). Final size is
> 0.75GB, less than one sixtieth of the original WAL.

Note that if you were compressing on the fly, you'll have to tell gzip
to regularly flush its buffers to make sure all the data actually hits
disk. That cuts into your compression ratio...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Jim Nasby <decibel(at)decibel(dot)org>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archive log compression keeping physical log available in the crash recovery
Date: 2007-02-13 02:02:14
Message-ID: 45D11C26.6040200@oss.ntt.co.jp
Lists: pgsql-hackers

Further, because pgbench writes many zero values to fixed-length
columns, gzip may achieve unusually good compression on it. There was
another suggestion to test with a longer checkpoint interval. I will
post the result.

Thanks.

Martijn van Oosterhout wrote:
> On Fri, Feb 09, 2007 at 01:00:10PM +0900, Koichi Suzuki wrote:
>> Further, we can apply gzip to this archive (2.36GB). Final size is
>> 0.75GB, less than one sixtieth of the original WAL.
>
> Note that if you were compressing on the fly, you'll have to tell gzip
> to regularly flush its buffers to make sure all the data actually hits
> disk. That cuts into your compression ratio...
>
> Have a nice day,

--
Koichi Suzuki


From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>
Cc: Jim Nasby <decibel(at)decibel(dot)org>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archive log compression keeping physical log availablein the crash recovery
Date: 2007-02-13 02:20:00
Message-ID: 45D12050.5020506@oss.ntt.co.jp
Lists: pgsql-hackers

As you suspected, the results I posted used the standard
checkpoint_timeout.

With a longer checkpoint timeout, the overall WAL size tends to
decrease, so I've also run DBT-2. The measurements show that the WAL
size after physical log removal is reasonably small, while full page
writes to WAL are preserved, keeping the same chance of crash recovery
as in the full_page_writes=on case.

I think this is a good score and practical for archive log recovery
with online backup.

Here are the results:

                                      Run 1     Run 2     Run 3
Database size                         4.13GB    12.35GB   4.13GB
Warehouses                            40        120       40
Checkpoint segments                   1000      1000      1000
Checkpoint timeout                    20min     20min     60min
Total WAL size                        2.98GB    4.20GB    2.14GB
Gzip'ed WAL size                      1.67GB    2.16GB    1.22GB
After physical log removal (patch)    0.38GB    0.32GB    0.39GB
full_page_writes=off                  0.39GB    0.31GB    0.38GB

In all runs, the measurement lasted 60min, after running 30min in
advance for stabilization. For run 3, maybe we need to run longer to
include the checkpoint effect more accurately.

As expected, after the physical log records are removed, the resulting
archive log size does not seem to be affected by the checkpoint timeout.
Rather, the size seems to depend mainly on the number of transactions.
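A quick check of that observation against the posted DBT-2 numbers: the size after physical log removal stays within about 3.5% of full_page_writes=off in every run, and for the 40-warehouse database it changes by under 3% between the 20min and 60min checkpoint_timeout runs, while the total WAL shrinks by about 28%. The arrays hold the reported sizes in the order posted; the helpers are plain arithmetic on them.

```c
#include <assert.h>
#include <stddef.h>

/* Reported DBT-2 sizes in GB, in the order posted:
 * 40WH/20min, 120WH/20min, 40WH/60min. */
static const double total_wal[] = {2.98, 4.20, 2.14};
static const double removed[]   = {0.38, 0.32, 0.39};
static const double fpw_off[]   = {0.39, 0.31, 0.38};
#define N_RUNS (sizeof removed / sizeof removed[0])

/* Relative difference of a against b, in percent. */
static double
rel_diff_pct(double a, double b)
{
    assert(b > 0.0);
    return (a - b) / b * 100.0;
}

/* Largest |difference| between physical log removal and
 * full_page_writes=off across all runs, in percent. */
static double
max_abs_diff_vs_fpw_off(void)
{
    double worst = 0.0;

    for (size_t i = 0; i < N_RUNS; i++)
    {
        double d = rel_diff_pct(removed[i], fpw_off[i]);

        if (d < 0.0)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

Here the worst case is the 120-warehouse run (0.32GB against 0.31GB, about +3.2%), and the 60min run differs from the 20min run on the same database by about +2.6% after removal but about -28.2% in total WAL.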

Zeugswetter Andreas ADI SD wrote:
>> Our evaluation result is as follows:
>> Database size: 2GB
>> WAL size (after 10hours pgbench run): 48.3GB
>> gzipped size: 8.8GB
>> removal of the physical log: 2.36GB
>> fullpage_writes=off log size: 2.42GB
>
>> I'm planning to do archive log size evalutation with other benchmarks
>> such as DBT-2 as well.
>
> Looks promising :-)
>
> Did you use the standard 5 minute checkpoint_timeout?
> Very nice would be a run with checkpoint_timeout increased
> to 30 min, because that is what you would tune if you are concerned
> about fullpage overhead.
>
> Andreas
>

--
Koichi Suzuki


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Jim Nasby <decibel(at)decibel(dot)org>, PGSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Archive log compression keeping physical log available in the crash recovery
Date: 2007-03-27 17:22:37
Message-ID: 200703271722.l2RHMbg23221@momjian.us
Lists: pgsql-hackers


Where are we on this patch idea?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +