Re: Table and Index compression

From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Table and Index compression
Date: 2009-08-07 12:38:36
Message-ID: 20090807123835.GD5407@samason.me.uk
Lists: pgsql-hackers

On Fri, Aug 07, 2009 at 12:59:57PM +0100, Greg Stark wrote:
> On Fri, Aug 7, 2009 at 12:48 PM, Sam Mason<sam(at)samason(dot)me(dot)uk> wrote:
> >> Well most users want compression for the space savings. So running out
> >> of space sooner than without compression when most of the space is
> >> actually unused would disappoint them.
> >
> > Note that, as far as I can tell, for a filesystem you only need to keep
> > enough reserved for the amount of uncompressed dirty buffers you have in
> > memory. As space runs out in the filesystem all that happens is that
> > the amount of (uncompressed?) dirty buffers you can safely have around
> > decreases.
>
> And when it drops to zero?

That was why I said you need to have one page left "to handle the base
case". I was treating the inductive case as the interesting common case
and considered the base case of lesser interest.

> > In PG's case, it would seem possible to do the compression and then
> > check to see if the resulting size is greater than 4kB. If it is you
> > write into the 4kB page size and write uncompressed data. Upon reading
> > you do the inverse, if it's 4kB then no need to decompress. I believe
> > TOAST does this already.
>
> It does, as does gzip and afaik every compression system.

It's still a case that needs to be handled explicitly by the code. Just
for reference, gzip does not appear to do this when I test it:

  echo -n 'a' | gzip > tmp.gz
  gzip -l --verbose tmp.gz

says the compression ratio is "-200%" (an empty string results in
an infinite increase in size yet gets displayed as "0%" for some
strange reason). It's only when you hit six 'a's that you start to get
positive ratios. Note that this is taking headers into account;
the compressed size is 23 bytes for both 'aaa' and 'aaaaaa' but the
uncompressed size obviously changes.
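
The same effect can be reproduced without the command-line tool. The
following is only a rough sketch: it uses Python's gzip module (which
wraps zlib rather than the gzip binary), and the helper name gzip_size
is made up for illustration. Exact byte counts may differ from what
"gzip -l" reports, so none are claimed here; the point is just that
tiny inputs come out larger than they went in.

```python
import gzip


def gzip_size(n: int) -> int:
    """Return the size in bytes of the gzip stream for n copies of 'a'."""
    return len(gzip.compress(b"a" * n))


# Compare input and output sizes for a few short inputs and one long one.
for n in (0, 1, 3, 6, 1000):
    size = gzip_size(n)
    verdict = "grows" if size > n else "shrinks"
    print(f"{n:5d} bytes in -> {size:3d} bytes out ({verdict})")
```

The fixed header and trailer dominate for short inputs, which is why
only the long run of 'a's ends up smaller than the original.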

gzip does indeed have a "copy" method, but it doesn't seem to be used
here.
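
To show what "handled explicitly by the code" might look like, here is
a minimal sketch of the compress-then-check fallback. It is not how PG
or TOAST does it: the one-byte tags and the pack/unpack names are
hypothetical, and it marks the stored case with a tag byte rather than
inferring it from the block size as suggested in the quoted text.

```python
import zlib

RAW = b"\x00"       # hypothetical tag: payload is stored as-is
DEFLATED = b"\x01"  # hypothetical tag: payload is zlib-compressed


def pack(data: bytes) -> bytes:
    """Compress data, but fall back to storing it raw if that is no smaller."""
    compressed = zlib.compress(data)
    if len(compressed) < len(data):
        return DEFLATED + compressed
    return RAW + data


def unpack(blob: bytes) -> bytes:
    """Inverse of pack(): decompress only if the tag says it is needed."""
    tag, payload = blob[:1], blob[1:]
    if tag == DEFLATED:
        return zlib.decompress(payload)
    if tag == RAW:
        return payload
    raise ValueError("unknown tag byte")
```

With this, the worst case costs one extra byte over the input, whatever
the data looks like; a single 'a' is stored raw, while a page full of
'a's is stored compressed.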

--
Sam http://samason.me.uk/
