Re: Compression of full-page-writes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: KONDO Mitsumasa <kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2013-10-19 05:58:47
Message-ID: CAA4eK1+4_b1OayphqAzoEr1+b2K9vaBtPvUbeCBHuLMHixQ=zw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Oct 15, 2013 at 11:41 AM, KONDO Mitsumasa
<kondo(dot)mitsumasa(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> (2013/10/15 13:33), Amit Kapila wrote:
>>
>> Snappy is good mainly for incompressible data, see the link below:
>>
>> http://www.postgresql.org/message-id/CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4YebMOA-wA@mail.gmail.com
>
> This result was obtained on an ARM architecture, which is not a typical CPU.
> Please see this document for the details:
> http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9

I think that in general snappy is preferred mostly for its low CPU usage
rather than for its compression ratio, but overall my vote is also for snappy.
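
For illustration, here is a rough, self-contained benchmark along these
lines, using the snappy-c bindings (snappy-c.h). It is only a sketch: the
page contents, iteration count, and build line are assumptions, and real
WAL pages will behave differently.

/* Rough sketch: CPU cost vs. compression ratio of snappy on one
 * PostgreSQL-sized page.  Build with: cc snappy_bench.c -lsnappy */
#include <snappy-c.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define PAGE_SZ 8192            /* PostgreSQL block size */
#define ITERS   100000

int main(void)
{
    char    page[PAGE_SZ];
    size_t  max_len = snappy_max_compressed_length(PAGE_SZ);
    char   *out = malloc(max_len);

    /* Fill the page with mildly repetitive data; this only imitates
     * a heap page, so the ratio printed below is not representative. */
    for (int i = 0; i < PAGE_SZ; i++)
        page[i] = (char) (i % 97);

    clock_t start = clock();
    for (int i = 0; i < ITERS; i++)
    {
        size_t len = max_len;   /* reset the in/out argument each time */

        if (snappy_compress(page, PAGE_SZ, out, &len) != SNAPPY_OK)
            return 1;
        if (i == 0)
            printf("compressed to %.1f%% of original\n",
                   100.0 * len / PAGE_SZ);
    }
    double secs = (double) (clock() - start) / CLOCKS_PER_SEC;
    printf("throughput: %.1f MB/s\n",
           (double) PAGE_SZ * ITERS / secs / 1e6);

    free(out);
    return 0;
}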

> I found a compression algorithm test for HBase. I haven't read it in detail,
> but it indicates that the snappy algorithm gets the best performance.
> http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of

The dataset used in that benchmark is quite different from the data we
are talking about here (WAL):
"These are the scores for a data which consist of 700kB rows, each
containing a binary image data. They probably won’t apply to things
like numeric or text data."

> In fact, most modern NoSQL stores use snappy, because it has good
> performance and a good license (BSD).
>
>
>> I think it is a bit difficult to prove that any one algorithm is best
>> for all kinds of loads.
>
> I think it is better for the community to make a best-effort choice than
> for me to make the best choice through strict testing.

Sure, it is good to make an effort to select the best algorithm, but if
you combine this patch with the inclusion of a new compression algorithm
in PG, it will only make the patch take much longer.

In general, my thinking is that we should prefer compression in order to
reduce I/O (WAL volume), because reducing WAL volume has other benefits as
well, such as sending less data to subscriber nodes. I think it will help
cases where, due to limited network bandwidth, the disk allocated for WAL
fills up under high traffic on the master, and users then need some
alternative method to handle such situations.
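
To make that concrete, here is a hypothetical sketch (not the actual patch
under discussion) of how a full-page image might be compressed before it is
attached to a WAL record, falling back to the raw page when compression
does not pay off. The names compress_full_page_writes and
XLogCompressBlockImage are illustrative assumptions, not existing
PostgreSQL APIs.

#include <snappy-c.h>
#include <stdbool.h>
#include <string.h>

#define BLCKSZ 8192

extern bool compress_full_page_writes;      /* assumed on/off GUC */

/*
 * Copy the page into 'dest' (which must be at least
 * snappy_max_compressed_length(BLCKSZ) bytes), compressed if that is
 * enabled and actually shrinks the image.  *compressed tells the redo
 * side whether it has to decompress.  Returns the stored length.
 */
static size_t
XLogCompressBlockImage(const char *page, char *dest, bool *compressed)
{
    size_t  len = snappy_max_compressed_length(BLCKSZ);

    if (compress_full_page_writes &&
        snappy_compress(page, BLCKSZ, dest, &len) == SNAPPY_OK &&
        len < BLCKSZ)
    {
        *compressed = true;
        return len;
    }

    /* Disabled, or the page did not shrink: store it uncompressed. */
    memcpy(dest, page, BLCKSZ);
    *compressed = false;
    return BLCKSZ;
}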

I think many users would like a method that can reduce WAL volume, and
users who don't find it useful enough in their environments, whether due
to a decrease in TPS or an insignificant reduction in WAL, have the option
to disable it.
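
As a purely hypothetical illustration of that option (the GUC name below
did not exist at the time of this thread and is only an assumption), the
switch could look like any other postgresql.conf setting:

# Compress full-page images written to WAL (hypothetical GUC).
# Turn it off if TPS drops or the WAL reduction is insignificant.
full_page_compression = off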

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
