Re: pluggable compression support

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org, Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
Subject: Re: pluggable compression support
Date: 2013-06-21 00:09:00
Message-ID: 20130621000900.GA12425@alap2.anarazel.de
Lists: pgsql-hackers

On 2013-06-15 12:20:28 +0200, Andres Freund wrote:
> On 2013-06-14 21:56:52 -0400, Robert Haas wrote:
> > I don't think we need it. I think what we need to decide is which
> > algorithm is legally OK to use. And then put it in.
> >
> > In the past, we've had a great deal of speculation about that legal
> > question from people who are not lawyers. Maybe it would be valuable
> > to get some opinions from people who ARE lawyers. Tom and Heikki both
> > work for real big companies which, I'm guessing, have substantial
> > legal departments; perhaps they could pursue getting the algorithms of
> > possible interest vetted. Or, I could try to find out whether it's
> > possible to do something similar through EnterpriseDB.
>
> I personally don't think the legal arguments hold all that much water
> for snappy and lz4. But then the opinion of a European non-lawyer doesn't
> hold much either.
> Both are widely used by a large number of open and closed projects, some
> of which have patent grant clauses in their licenses. E.g. Hadoop and
> Cassandra use lz4, and I'd be surprised if the companies behind those
> have opened themselves to litigation.
>
> I think we should preliminarily decide which algorithm to use before we
> get lawyers involved. I'd be surprised if they could make such an
> analysis faster than we can rule out one of them via benchmarks.
>
> Will post an updated patch that includes lz4 as well.

Attached.

Changes:
* add lz4 compression algorithm (2-clause BSD license)
* move compression algorithms into their own subdirectory
* clean up compression/decompression functions
* allow 258 compression algorithms; all but the first three use 1 extra
byte
* pass data directly to pg_lzcompress.c instead of a varlena
* add pglz_long, a fourth compression method for testing only, which uses
the +1 byte encoding
* use postgres' endian detection in snappy for compatibility with OS X
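The "first three are free, the rest cost one byte" rule can be illustrated with a small encoder. This is only a sketch under assumptions: the mail does not describe the actual header layout, so here a 2-bit field holds IDs 0-2 directly and its fourth value is an escape meaning "the ID follows in the next byte". IDs are capped at 258 to match the count given above; all macro and function names are hypothetical, not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, not the patch's: the low 2 bits of hdr[0] hold
 * ids 0..2 directly; the value 3 is an escape meaning "the id follows
 * in hdr[1]". */
#define ALGO_INLINE_IDS 3u   /* ids 0, 1, 2 need no extra byte */
#define ALGO_ESCAPE     3u   /* 2-bit marker: id stored in the next byte */
#define ALGO_MAX_IDS    258u /* total count quoted in the mail */

/* Writes the id into hdr; returns bytes used (1 or 2), or 0 if out of range. */
static size_t encode_algo_id(unsigned id, uint8_t hdr[2])
{
    if (id >= ALGO_MAX_IDS)
        return 0;
    if (id < ALGO_INLINE_IDS) {
        hdr[0] = (uint8_t) id;
        return 1;
    }
    hdr[0] = (uint8_t) ALGO_ESCAPE;
    /* ids 3..257 map to byte values 0..254; 255 stays unused in this sketch */
    hdr[1] = (uint8_t) (id - ALGO_INLINE_IDS);
    return 2;
}

/* Reads the id back; returns bytes consumed (1 or 2), or 0 if invalid. */
static size_t decode_algo_id(const uint8_t hdr[2], unsigned *id)
{
    unsigned tag = hdr[0] & 0x3u;

    if (tag != ALGO_ESCAPE) {
        *id = tag;
        return 1;
    }
    *id = ALGO_INLINE_IDS + hdr[1];
    return (*id < ALGO_MAX_IDS) ? 2 : 0;
}
```

The point of such a scheme is that the common algorithms pay nothing extra per datum, while rarely used ones remain addressable at the cost of one byte.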

Based on the benchmarks I think we should go with lz4 only for now. The
patch still provides the infrastructure should somebody want to add more
algorithms later, or even proper configurability.
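One way to picture such infrastructure is a table of compress/decompress callbacks whose slot index doubles as the on-disk algorithm ID, plus a name lookup of the kind a setting like the toast_compression_algo GUC would need. This is a minimal sketch; the type, function names and callback signatures are hypothetical and not the patch's, and a trivial "copy" codec stands in for pglz, snappy or lz4 so that it runs on its own.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callback signatures: return bytes written, or -1 on failure. */
typedef int32_t (*compress_fn)(const char *src, int32_t srclen,
                               char *dst, int32_t dstcap);
typedef int32_t (*decompress_fn)(const char *src, int32_t srclen,
                                 char *dst, int32_t rawlen);

typedef struct CompressionAlgo {
    const char   *name;
    compress_fn   compress;
    decompress_fn decompress;
} CompressionAlgo;

/* Placeholder codec that just copies bytes, so the sketch is runnable. */
static int32_t copy_compress(const char *src, int32_t srclen,
                             char *dst, int32_t dstcap)
{
    if (srclen > dstcap)
        return -1;
    memcpy(dst, src, (size_t) srclen);
    return srclen;
}

static int32_t copy_decompress(const char *src, int32_t srclen,
                               char *dst, int32_t rawlen)
{
    if (srclen != rawlen)
        return -1;
    memcpy(dst, src, (size_t) srclen);
    return rawlen;
}

/* Slot index = on-disk id; real entries (pglz, lz4, ...) would go here. */
static const CompressionAlgo algos[] = {
    {"copy", copy_compress, copy_decompress},
};

/* Resolve a configured name to its entry; optionally report its id. */
static const CompressionAlgo *lookup_algo_by_name(const char *name, size_t *id_out)
{
    for (size_t i = 0; i < sizeof algos / sizeof algos[0]; i++) {
        if (strcmp(algos[i].name, name) == 0) {
            if (id_out != NULL)
                *id_out = i;
            return &algos[i];
        }
    }
    return NULL;
}
```

Keeping the ID stable per slot matters because it is what gets stored with each compressed datum; reordering the table would make existing data undecodable.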

Todo:
* Windows build support
* remove toast_compression_algo guc
* remove either snappy or lz4 support
* remove pglz_long support (just there for testing)

New benchmarks:

Table size:
List of relations
Schema | Name | Type | Owner | Size | Description
--------+--------------------+-------+--------+--------+-------------
public | messages_pglz | table | andres | 526 MB |
public | messages_snappy | table | andres | 523 MB |
public | messages_lz4 | table | andres | 522 MB |
public | messages_pglz_long | table | andres | 527 MB |
(4 rows)
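Relative to pglz, the on-disk savings of the new algorithms are small, so the differences between them lie mostly in speed:

```latex
\frac{526 - 522}{526} \approx 0.76\,\% \ (\text{lz4}), \qquad
\frac{526 - 523}{526} \approx 0.57\,\% \ (\text{snappy})
```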

Workstation (2x E5520, shared_buffers large enough to hold everything):

Data load:
pglz: 36643.384 ms
snappy: 24626.894 ms
lz4: 23871.421 ms
pglz_long: 37097.681 ms

COPY messages_* TO '/dev/null' WITH BINARY;
pglz: 3116.083 ms
snappy: 2524.388 ms
lz4: 2349.396 ms
pglz_long: 3104.134 ms

COPY (SELECT rawtxt FROM messages_*) TO '/dev/null' WITH BINARY;
pglz: 1609.969 ms
snappy: 1031.696 ms
lz4: 886.782 ms
pglz_long: 1606.803 ms
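To put the workstation numbers in proportion, the following expresses each timing as a fraction of the pglz time. The figures are copied from the results above; pglz_long is left out since it closely tracks pglz, and the struct and function names are illustrative only.

```c
#include <stddef.h>
#include <stdio.h>

/* Workstation timings in ms, copied from the mail (pglz_long omitted). */
struct timing { const char *bench; double pglz, snappy, lz4; };

static const struct timing workstation[] = {
    {"data load",          36643.384, 24626.894, 23871.421},
    {"COPY table",          3116.083,  2524.388,  2349.396},
    {"COPY rawtxt column",  1609.969,  1031.696,   886.782},
};
static const size_t n_workstation = sizeof workstation / sizeof workstation[0];

/* Time taken as a fraction of pglz's time: 0.65 means 35% less time. */
static double relative_to_pglz(double ms, double pglz_ms)
{
    return ms / pglz_ms;
}

/* Print snappy and lz4 timings as a percentage of the pglz timing. */
static void print_relative_times(const struct timing *rows, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%-20s snappy %5.1f%%   lz4 %5.1f%%\n", rows[i].bench,
               100.0 * relative_to_pglz(rows[i].snappy, rows[i].pglz),
               100.0 * relative_to_pglz(rows[i].lz4, rows[i].pglz));
}
```

For lz4 this works out to roughly 65% of the pglz time for the data load, 75% for the full-table COPY and 55% for the rawtxt-only COPY, i.e. the gap widens when the query is dominated by decompressing the toasted column.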

On my elderly laptop (Core 2 Duo), with shared_buffers too small to hold everything:

Data load:
pglz: 39968.381 ms
snappy: 26952.330 ms
lz4: 29225.472 ms
pglz_long: 39929.568 ms

COPY messages_* TO '/dev/null' WITH BINARY;
pglz: 3920.588 ms
snappy: 3421.938 ms
lz4: 3311.540 ms
pglz_long: 3885.920 ms

COPY (SELECT rawtxt FROM messages_*) TO '/dev/null' WITH BINARY;
pglz: 2238.145 ms
snappy: 1753.403 ms
lz4: 1638.092 ms
pglz_long: 2227.804 ms
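On the laptop the ranking is not uniform across benchmarks. A small sketch that picks the fastest algorithm per benchmark from the laptop figures (data copied from the results above; names are illustrative):

```c
#include <stddef.h>
#include <stdio.h>

enum { N_ALGOS = 4 };
static const char *const algo_names[N_ALGOS] = {"pglz", "snappy", "lz4", "pglz_long"};

/* One benchmark row: timings in ms, indexed like algo_names. */
struct bench { const char *name; double ms[N_ALGOS]; };

static const struct bench laptop[] = {
    {"data load",          {39968.381, 26952.330, 29225.472, 39929.568}},
    {"COPY table",         { 3920.588,  3421.938,  3311.540,  3885.920}},
    {"COPY rawtxt column", { 2238.145,  1753.403,  1638.092,  2227.804}},
};

/* Index of the algorithm with the lowest time in one row. */
static size_t fastest(const struct bench *b)
{
    size_t best = 0;

    for (size_t i = 1; i < N_ALGOS; i++)
        if (b->ms[i] < b->ms[best])
            best = i;
    return best;
}

static void print_winners(const struct bench *rows, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%-20s fastest: %s\n", rows[i].name, algo_names[fastest(&rows[i])]);
}
```

Over the three laptop rows it selects snappy for the data load (26952 ms vs. 29225 ms for lz4) and lz4 for both COPY runs, whereas on the workstation lz4 was fastest in every benchmark.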

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Add-snappy-compression-algorithm-to-contrib.patch text/x-patch 53.8 KB
0002-Add-lz4-compression-algorithm-to-contrib.patch text/x-patch 64.9 KB
0003-Introduce-pluggable-compression.patch text/x-patch 28.3 KB
