Re: Making pg_standby compression-friendly

Lists: pgsql-hackers
From: "Charles Duffy" <charles(at)dyfis(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Making pg_standby compression-friendly
Date: 2008-10-23 03:10:55
Message-ID: e4ccc24e0810222010p12bae2f4xa3a11cb2bc51bd89@mail.gmail.com

Howdy, all.

I'm interested in compressing archived WAL segments in an environment
set up for PITR in the interests of reducing both network traffic and
storage requirements. However, pg_standby presently checks file sizes,
requiring that an archive segment be exactly the right size to be
considered valid. The idea of compressing log segments is not new --
the clearxlogtail project in pgfoundry provides a tool to make such
compression more effective, and is explicitly intended for said
purpose -- but as of 8.3.4, pg_standby appears not to support such
environments; I propose adding such support.
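
A minimal sketch of why an exact-size check rejects compressed segments. It assumes the default 16 MB WAL segment size; the helper name `looks_complete` is hypothetical and is not taken from pg_standby's source.

```python
import os

# Assumption: 16 MB is the default WAL segment size. A server built with
# a different segment size would need a different constant here.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def looks_complete(path: str, expected_size: int = WAL_SEGMENT_SIZE) -> bool:
    """Return True only if path exists and is exactly expected_size bytes.

    Hypothetical helper that mimics an exact-size validity check.
    """
    try:
        return os.stat(path).st_size == expected_size
    except FileNotFoundError:
        return False
```

A gzip-compressed segment is almost never exactly 16 MB, so a check of this shape treats it as incomplete no matter how long the caller waits.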

To allow pg_standby to operate in an environment where archive
segments are compressed, two behaviors are necessary:

- suppressing the file-size checks. This puts the onus on the user to
create these files via an atomic mechanism, but is necessary to allow
compressed files to be considered.
- allowing a custom restore command to be provided. This permits the
user to specify the mechanism to be used to decompress the segment.
One bikeshed is determining whether the user should pass in a command
suitable for use in a pipeline or a command which accepts input and
output as arguments.
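
To make the two candidate command styles concrete, here is a rough sketch of each. Both function names and the `{src}`/`{dst}` placeholder tokens are hypothetical, chosen only for illustration; they are not the syntax used by the attached patch.

```python
import os
import subprocess
from typing import Sequence


def restore_with_filter(filter_cmd: Sequence[str], src: str, dst: str) -> None:
    """Pipeline style: the command reads the archived file on stdin and
    writes the restored segment to stdout."""
    tmp = dst + ".tmp"  # hypothetical temporary name
    try:
        with open(src, "rb") as fin, open(tmp, "wb") as fout:
            subprocess.run(filter_cmd, stdin=fin, stdout=fout, check=True)
        os.replace(tmp, dst)  # publish the final name only once complete
    finally:
        # After a successful replace the temporary name no longer exists.
        if os.path.exists(tmp):
            os.remove(tmp)


def restore_with_args(cmd_template: Sequence[str], src: str, dst: str) -> None:
    """Argument style: the tokens "{src}" and "{dst}" in the template are
    replaced by the input and output paths before the command runs."""
    substitutions = {"{src}": src, "{dst}": dst}
    argv = [substitutions.get(token, token) for token in cmd_template]
    subprocess.run(argv, check=True)
```

With the pipeline style a caller could pass something like `["gzip", "-dc"]` and the wrapper owns the output file, including the rename. With the argument style the command itself decides how the output is written, so atomicity becomes the command's responsibility.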

A sample implementation is attached, intended only to kickstart
discussion; I'm not attached to either its implementation or its
proposed command-line syntax.

Thoughts?

Attachment: pg_standby-pipe.patch (text/x-diff, 4.5 KB)

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Charles Duffy <charles(at)dyfis(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Making pg_standby compression-friendly
Date: 2008-10-23 06:15:39
Message-ID: 4900168B.5020101@enterprisedb.com

Charles Duffy wrote:
> I'm interested in compressing archived WAL segments in an environment
> set up for PITR in the interests of reducing both network traffic and
> storage requirements. However, pg_standby presently checks file sizes,
> requiring that an archive segment be exactly the right size to be
> considered valid. The idea of compressing log segments is not new --
> the clearxlogtail project in pgfoundry provides a tool to make such
> compression more effective, and is explicitly intended for said
> purpose -- but as of 8.3.4, pg_standby appears not to support such
> environments; I propose adding such support.

Can't you decompress the files in whatever script you use to copy them
to the archive location?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: "Charles Duffy" <charles(at)dyfis(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Making pg_standby compression-friendly
Date: 2008-10-23 06:54:41
Message-ID: e4ccc24e0810222354n395c1082r3aef0f8e065659dd@mail.gmail.com

On Thu, Oct 23, 2008 at 1:15 AM, Heikki Linnakangas
<heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:

> Charles Duffy wrote:
>
>> I'm interested in compressing archived WAL segments in an environment
>> set up for PITR in the interests of reducing both network traffic and
>> storage requirements. However, pg_standby presently checks file sizes,
>> requiring that an archive segment be exactly the right size to be
>> considered valid. The idea of compressing log segments is not new --
>> the clearxlogtail project in pgfoundry provides a tool to make such
>> compression more effective, and is explicitly intended for said
>> purpose -- but as of 8.3.4, pg_standby appears not to support such
>> environments; I propose adding such support.
>>
>
> Can't you decompress the files in whatever script you use to copy them to
> the archive location?

To be sure I understand -- you're proposing a scenario in which the
archive_command on the master compresses the files, passes them over to the
slave while compressed, and then decompresses them on the slave for storage
in their decompressed state? That succeeds in the goal of decreasing network
bandwidth, but (1) isn't necessarily easy to implement over NFS, and (2)
doesn't succeed in decreasing storage requirements on the slave.
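
For the archive side, a sketch of one way to create a compressed segment atomically, which matters once the file-size check is no longer available as a completeness test. The function name and the `.partial` suffix are hypothetical, durability steps such as fsync are omitted, and rename semantics over NFS are not addressed here.

```python
import gzip
import os
import shutil


def archive_compressed(segment_path: str, archive_dir: str) -> str:
    """Compress one WAL segment into archive_dir and return the final path.

    The data is written under a temporary name in the same directory and
    renamed once complete, so a reader never sees a partial file under
    the final name.
    """
    final = os.path.join(archive_dir, os.path.basename(segment_path) + ".gz")
    tmp = final + ".partial"  # hypothetical temporary suffix
    with open(segment_path, "rb") as src, gzip.open(tmp, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp, final)  # atomic when both names share a filesystem
    return final
```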

(While pg_standby's behavior is to delete segments which are no longer
needed to keep a warm standby slave running, I maintain a separate archive
for PITR use with hardlinked copies of those same archive segments; storage
on the slave is a much bigger issue in this environment than it would be if
the space used for segments were being deallocated as soon as pg_standby
chose to unlink them).
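
A small sketch of the hardlink arrangement described above, assuming the PITR archive lives on the same filesystem as the directory pg_standby reads from. The function name `keep_pitr_copy` is hypothetical.

```python
import os


def keep_pitr_copy(segment_path: str, pitr_dir: str) -> str:
    """Create a second directory entry (hard link) for an archived segment.

    Both names refer to the same data on disk; the blocks are released
    only when the last remaining link is removed.
    """
    link_path = os.path.join(pitr_dir, os.path.basename(segment_path))
    os.link(segment_path, link_path)  # both paths must be on one filesystem
    return link_path
```

Because the data survives as long as any link exists, unlinking the segment from the standby's directory frees no space while the PITR copy remains, which is why the stored size of each segment matters in this setup.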

[Heikki, please accept my apologies for the initial off-list response; I
wasn't paying enough attention to gmail's default reply behavior].


From: "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com>
To: "Charles Duffy" <charles(at)dyfis(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Making pg_standby compression-friendly
Date: 2008-10-23 15:24:32
Message-ID: a778a7260810230824g6e1ddd32r86de6efa34e22231@mail.gmail.com

For compressing and decompressing WAL during archive and restore, please
take a look at my project pglesslog,
http://pgfoundry.org/projects/pglesslog/

This project compresses WAL segments by replacing full page writes with
the corresponding incremental log records. On restore, it inserts dummy
WAL records to preserve LSNs and the file size.

This can be applied to log-shipping mechanisms, asynchronous or synchronous.

--
------
Koichi Suzuki


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Koichi Suzuki <koichi(dot)szk(at)gmail(dot)com>
Cc: Charles Duffy <charles(at)dyfis(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Making pg_standby compression-friendly
Date: 2008-10-23 15:30:38
Message-ID: 4900989E.3050103@enterprisedb.com

Koichi Suzuki wrote:
> For compressing and decompressing WAL during archive and restore, please
> take a look at my project pglesslog,
> http://pgfoundry.org/projects/pglesslog/
>
> This project compresses WAL segments by replacing full page writes with
> the corresponding incremental log records. On restore, it inserts dummy
> WAL records to preserve LSNs and the file size.
>
> This can be applied to log-shipping mechanisms, asynchronous or synchronous.

I believe Charles' question was: how do you hook that decompression into
pg_standby? I suggested that whatever script is run on the standby
server to copy xlog files to the archive location should also call the
decompression program, like pglesslog, but apparently there is no such
script in his setup. How would you set up a standby server using
pg_lesslog?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Charles Duffy <Charles_Duffy(at)messageone(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Making pg_standby compression-friendly
Date: 2008-10-24 20:12:22
Message-ID: gdta80$7m9$1@ger.gmane.org

In the absence of further feedback from y'all (and in the presence of
some positive results from internal QA), I'm adding the posted patch
as-is to the 2008-11 CommitFest queue. That said, any additional
feedback would be greatly appreciated.


From: "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com>
To: "Charles Duffy" <Charles_Duffy(at)messageone(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Making pg_standby compression-friendly
Date: 2008-10-27 08:41:21
Message-ID: a778a7260810270141o59f3e202ncf54d7b0d500ba86@mail.gmail.com

As Heikki pointed out, the issue is not only how to decompress the
compressed WAL, but also how to keep the archived log compressed after
it has been handled by pg_standby.

I'm afraid pg_standby cannot handle this by itself; it may need some
support from the PostgreSQL core. For example, after closing an archive
log during archive recovery, the core could call some backend to
re-compress the archive log for later use.

I'm not sure whether the archive_command argument would work in this
scenario too, but I'm very skeptical.

Any further thoughts?
-----------------
Koichi Suzuki



From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com>, "Charles Duffy" <Charles_Duffy(at)messageone(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Making pg_standby compression-friendly
Date: 2008-10-27 15:03:06
Message-ID: 490591DA.EE98.0025.0@wicourts.gov

>>> "Koichi Suzuki" <koichi(dot)szk(at)gmail(dot)com> wrote:
> As Heikki pointed out, the issue is not only how to decompress the
> compressed WAL, but also how to keep the archived log compressed after
> it has been handled by pg_standby.
>
> I'm afraid pg_standby cannot handle this by itself; it may need some
> support from the PostgreSQL core. For example, after closing an archive
> log during archive recovery, the core could call some backend to
> re-compress the archive log for later use.

Why decompress and re-compress? We're using simple bash scripts, so I
can't speak to pg_standby; but we just pipe the file through gunzip in
the script called by recovery.conf. The source file isn't modified --
it stays compressed for archiving.
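
The approach described above (decompress into the destination, leave the archived file alone) can be sketched as follows. This is a Python stand-in for a bash-plus-gunzip script, not the actual script in use; the function name is hypothetical.

```python
import gzip
import shutil


def restore_segment(compressed_src: str, dest: str) -> None:
    """Decompress compressed_src into dest.

    The source is opened read-only and is never rewritten, so the
    archived copy stays compressed. A missing source raises
    FileNotFoundError, which a wrapper script would need to turn into
    a nonzero exit status.
    """
    with gzip.open(compressed_src, "rb") as src, open(dest, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

A wrapper around this function could be named in restore_command in recovery.conf and given the file name and destination path through the %f and %p placeholders.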

-Kevin


From: Charles Duffy <Charles_Duffy(at)messageone(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Making pg_standby compression-friendly
Date: 2008-10-27 16:28:25
Message-ID: ge4q83$thl$1@ger.gmane.org

Koichi Suzuki wrote:
> As Heikki pointed out, the issue is not only how to decompress the
> compressed WAL, but also how to keep the archived log compressed after
> it has been handled by pg_standby.

pg_standby makes a *copy* of the segment from the archive, and need only
ensure that the copy is decompressed; it has no reason to ever
decompress the original version in the archive.

I don't see the problem here.