pg_lzcompress strategy parameters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>, Jan Wieck <JanWieck(at)Yahoo(dot)com>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: pg_lzcompress strategy parameters
Date: 2007-08-04 22:19:30
Message-ID: 8566.1186265970@sss.pgh.pa.us
Lists: pgsql-hackers

Greg complained here
http://archives.postgresql.org/pgsql-patches/2007-07/msg00342.php
that the default strategy parameters used by the TOAST compressor
might need some adjustment. After thinking about it a little, I wonder
whether they're not even more broken than that. The present behavior
is:

1. Never compress for inputs < min_input_size (256 bytes by default).
2. Compress inputs >= force_input_size (6K by default), as long as
compression produces a result at least 1 byte smaller than the input.
3. For inputs between min_input_size and force_input_size, compress only
if compression of at least min_comp_rate percent is achieved
(20% by default).
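The three rules above can be modeled as a single post-hoc decision, as a rough sketch. The names `CurrentStrategy` and `should_store_compressed` are hypothetical and this is not the pg_lzcompress C code. It also simplifies things: the real compressor can abandon a compression attempt early rather than deciding after the fact. "6K" is taken here as 6 * 1024 bytes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CurrentStrategy:
    """Hypothetical model of the three rules listed above (not PostgreSQL's code)."""
    min_input_size: int = 256          # bytes; smaller inputs are never compressed
    force_input_size: int = 6 * 1024   # bytes; at or above this, any saving counts
    min_comp_rate: int = 20            # percent saving required in between


def should_store_compressed(s: CurrentStrategy, input_size: int,
                            compressed_size: int) -> bool:
    # Rule 1: inputs below min_input_size are never compressed.
    if input_size < s.min_input_size:
        return False
    # Rule 2: at or above force_input_size, a 1-byte saving is enough.
    if input_size >= s.force_input_size:
        return compressed_size <= input_size - 1
    # Rule 3: in between, require at least min_comp_rate percent saved.
    # Integer form of (input - compressed) / input >= rate / 100.
    return 100 * compressed_size <= (100 - s.min_comp_rate) * input_size
```

Under this model, `should_store_compressed(CurrentStrategy(), 1_000_000, 999_999)` returns `True`: a 1-byte saving on a 1,000,000-byte input falls under rule 2, even though it saves only 0.0001%.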

This whole structure seems a bit broken, independently of whether the
particular parameter values are good. If the compressor is given an
input of 1000000 bytes and manages to compress it to 999999 bytes,
we'll store it compressed, and pay for decompression cycles on every
access, even though the I/O savings are nonexistent. That's not sane.

I'm inclined to think that the concept of force_input_size is wrong.
Instead I suggest that we have a min_comp_rate (minimum percentage
savings) and a min_savings (minimum absolute savings), and compress
if either one is met. For instance, with min_comp_rate = 10% and
min_savings = 1MB, inputs below 10MB would need at least 10% savings
to be compressed, but inputs above 10MB would need at least 1MB saved
to be compressed.
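The "either one is met" rule can be sketched as follows. The function name and its defaults are hypothetical, and "1MB" is assumed to mean 1024 * 1024 bytes. The crossover is min_savings / min_comp_rate = 1MB / 10% = 10MB. Below that point, 10% of the input is less than 1MB, so the percentage test governs. Above it, the absolute test governs. Passing `min_savings=None` models the rate-only alternative in the next paragraph.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: "1MB" read as 2**20 bytes


def should_store_compressed_proposed(input_size: int, compressed_size: int,
                                     min_comp_rate: int = 10,
                                     min_savings: Optional[int] = MB) -> bool:
    """Hypothetical sketch: compress if either the percentage or the
    absolute savings threshold is met (min_savings=None disables the latter)."""
    savings = input_size - compressed_size
    if savings <= 0:
        return False  # never store a result that saves nothing
    # Percentage test: at least min_comp_rate percent of the input saved.
    if 100 * savings >= min_comp_rate * input_size:
        return True
    # Absolute test: at least min_savings bytes saved, if enabled.
    return min_savings is not None and savings >= min_savings
```

With these defaults, the 1,000,000-to-999,999-byte case is rejected. A 100MB input that shrinks by 1MB is accepted on absolute savings alone. In the rate-only variant, a 1GB field compressed to 999MB saves about 2.4% and is rejected.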

Or maybe it should just be a min_comp_rate and nothing else.
Compressing a 1GB field to 999MB is probably not very sane either.

This is all independent of what the specific parameter settings should
be, but I concur with Greg that those could do with a fresh look.

Thoughts?

regards, tom lane