Re: Table and Index compression

From: Pierre Frédéric Caillaud <lists(at)peufeu(dot)com>
To: "Greg Stark" <gsstark(at)mit(dot)edu>, "Sam Mason" <sam(at)samason(dot)me(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Table and Index compression
Date: 2009-08-07 12:44:48
Message-ID: op.uyaloyw1cke6l8@soyouz
Lists: pgsql-hackers

> For reference what I'm picturing is this:
>
> When a table is compressed it's marked read-only which bars any new
> tuples from being inserted or existing tuples being deleted. Then it's
> frozen and any pages which contain tuples which can't be frozen are
> waited on until they can be. When it's finished every tuple has to be
> guaranteed to be fully frozen.
>
> Then the relation is rewritten in compressed form. Each block is
> compressed one by one and written one after the other to disk.
>
> At the same time a new fork is written which contains a pointer to
> each block. It could just be a directly addressed array of offsets and
> lengths. All block lookups have to first load the page of the
> indirection map, then read the appropriate section of the original
> file and decompress it into shared buffers.
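
For illustration, the lookup path through such an indirection fork could
look roughly like this (just a sketch: the BlockMapEntry layout, the
function name and the use of zlib are my own assumptions, not anything
that exists in the backend):

#include <stdio.h>
#include <stdint.h>
#include <zlib.h>

#define BLCKSZ 8192

/* One directly addressed entry per logical block in the indirection fork. */
typedef struct BlockMapEntry
{
    uint64_t    offset;     /* byte offset of the compressed block in the data file */
    uint32_t    length;     /* compressed length in bytes */
} BlockMapEntry;

/*
 * Read logical block blkno into page (BLCKSZ bytes): look up its entry in
 * the fork, read exactly entry.length compressed bytes from the data file,
 * and decompress into the caller's buffer.  Assumes compressed blocks never
 * exceed BLCKSZ (incompressible blocks would presumably be stored raw,
 * which this sketch does not handle).
 */
int
read_compressed_block(FILE *datafile, FILE *mapfork, uint32_t blkno, char *page)
{
    BlockMapEntry entry;
    char        cbuf[BLCKSZ];
    uLongf      outlen = BLCKSZ;

    if (fseek(mapfork, (long) (blkno * sizeof(BlockMapEntry)), SEEK_SET) != 0 ||
        fread(&entry, sizeof(entry), 1, mapfork) != 1)
        return -1;

    if (fseek(datafile, (long) entry.offset, SEEK_SET) != 0 ||
        fread(cbuf, 1, entry.length, datafile) != entry.length)
        return -1;

    /* zlib stands in for whatever compressor would actually be used */
    if (uncompress((Bytef *) page, &outlen, (const Bytef *) cbuf, entry.length) != Z_OK ||
        outlen != BLCKSZ)
        return -1;

    return 0;
}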

I had pondered the idea of a fork storing the compression status of each
page, because it has several advantages:
- no need to change the page layout to insert an "is compressed" flag
- it can compress any data, not just standard pages
- if you know the compressed size of a page in advance, it is much easier
to prefetch it in its entirety, rather than reading only the first chunk
or reading too much (see the sketch below)
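
To be concrete about that last point, with the exact compressed extent
known up front, prefetching becomes a single precise hint (again only a
sketch with made-up names; posix_fadvise is the only real API here):

#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>      /* posix_fadvise, POSIX_FADV_WILLNEED */
#include <stdint.h>

/*
 * Hint the kernel to read ahead exactly the compressed extent of one block.
 * offset and length would come from the block's entry in the fork, so there
 * is no guessing how many chunks to request and no reading past the block.
 */
void
prefetch_compressed_block(int datafd, uint64_t offset, uint32_t length)
{
    (void) posix_fadvise(datafd, (off_t) offset, (off_t) length,
                         POSIX_FADV_WILLNEED);
}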

> From a programming point of view this is nice and simple. From a
> user's point of view it's a bit of a pain since it means you have to
> rewrite your whole table when you want to compress it. And it means
> you have to rewrite it all again if you decide you want to set it back
> to read-write. My experience with people who have very large tables is
> that they design their whole process around the goal of avoiding
> having to move the data once it's written.

Note that if a table is huge, it is always split into (currently) 1GB
segment files, so you could operate on one segment at a time, release the
lock, let the backlog of queries drain, and then resume with the next
segment; a rough sketch follows.
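
In pseudo-C, the loop would be something like this (all the helpers are
hypothetical placeholders, not existing backend functions):

/* Hypothetical placeholders standing in for real backend operations. */
static void lock_segment(int segno)          { (void) segno; /* block writes to this segment */ }
static void compress_segment_file(int segno) { (void) segno; /* rewrite this 1GB file compressed */ }
static void unlock_segment(int segno)        { (void) segno; /* let queued queries proceed */ }

/*
 * Compress a relation one (currently 1GB) segment file at a time, releasing
 * the lock between segments so the backlog of queries can drain.
 */
void
compress_relation_by_segment(int nsegments)
{
    for (int segno = 0; segno < nsegments; segno++)
    {
        lock_segment(segno);
        compress_segment_file(segno);
        unlock_segment(segno);
    }
}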

Realtime compression would be much less of a hassle to use, though...
