From: Thomas Munro <munro(at)ip9(dot)org>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Compression of tables
Date: 2013-12-10 07:18:40
Message-ID: CADLWmXWgpXayQQpL6XDj3i2KvUoPN2sjPbf1xUodzJ4t-+WnVA@mail.gmail.com
Lists: pgsql-hackers

Hi

I have been wondering what the minimum useful heap table compression
system would be for Postgres, in order to reduce disk footprint of
large mostly static datasets. Do you think an approach similar to the
static row-level compression of DB2 could make sense? I imagine
something like this:

1. You have a table which already has data in it.

2. You run a COMPRESS operation, which builds a static dictionary,
and rewrites the whole table with compressed frozen tuples. Frozen
tuples have CTIDs just like regular tuples, and can be pointed to by
indexes. They are decompressed on the fly when needed.
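
To make that concrete, here is a rough sketch of the workflow I have
in mind (the COMPRESS syntax is invented, and the table and file names
are only examples):

  CREATE TABLE readings_2013_11 (ts timestamptz, sensor_id int, value float8);
  COPY readings_2013_11 FROM '/data/2013-11.csv' WITH (FORMAT csv);

  -- build a static dictionary and rewrite the table as compressed
  -- frozen tuples
  COMPRESS readings_2013_11;

  -- later INSERTs/UPDATEs create ordinary uncompressed tuples, which
  -- only a subsequent pass would compress
  COMPRESS readings_2013_11;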

Clearly things get tricky once you need to update rows. Assume for
simplicity that future UPDATEs and INSERTs produce normal,
non-compressed tuples that would only be compressed by a subsequent
COMPRESS operation. The question is how to deal with the existing
compressed rows when UPDATEd or DELETEd. Some approaches:

1. Just don't allow updates of compressed rows (!).

2. Exclusively lock the whole page when trying to update any
compressed row, while you explode it into regular uncompressed tuples
on new pages which you can work on (!).

3. Pull the minimum header fields out of the compressed tuples so
that the MVCC machinery can work, to support updates of compressed
tuples. Perhaps just the t_xmax and t_ctid values (t_xmin == frozen is
implied), so that a writer can update them. This means an overhead of
at least 10 bytes per tuple (a 4-byte xmax plus a 6-byte ctid) over
the compressed size, plus the item offsets in the page header.

4. Something far cleverer.

Well, these are straw-man suggestions really and I probably don't
understand enough about PG internals (MVCC and implications for
VACUUM) to be making them. But I'm curious to know if anyone has
researched something like this.

For example, I have a system that occupies a couple of TB on disk, but
only a few to a few hundred MB per day change, mostly adding data to
an active partition. I periodically run CLUSTER on any partition that
has pg_stats.correlation < 0.9 (this effectively just re-CLUSTERs the
active one). At the same time I would COMPRESS, and the DB could
potentially fit on much smaller SSDs.
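
That periodic CLUSTER pass looks something like this (the table,
column and index names are made up; the correlation statistic lives
in the pg_stats view):

  SELECT schemaname, tablename, correlation
    FROM pg_stats
   WHERE tablename LIKE 'readings_%'
     AND attname = 'ts'
     AND correlation < 0.9;

  -- then, for each partition reported above, something like:
  CLUSTER readings_2013_12 USING readings_2013_12_ts_idx;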

Most commercial database systems I encounter these days are using
compression of some sort (more sophisticated than the above,
with dynamic dictionaries, and sometimes column based storage etc).

Thanks

Thomas


From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Thomas Munro <munro(at)ip9(dot)org>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of tables
Date: 2013-12-10 15:15:51
Message-ID: CAHyXU0wpqftA+s8ctVVtzJLoJyTeNKMRK7q6cBqQd5VQqpcuCA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 10, 2013 at 1:18 AM, Thomas Munro <munro(at)ip9(dot)org> wrote:
> Hi
>
> I have been wondering what the minimum useful heap table compression
> system would be for Postgres, in order to reduce disk footprint of
> large mostly static datasets. Do you think an approach similar to the
> static row-level compression of DB2 could make sense? I imagine
> something like this:
>
> 1. You have a table which already has data in it.
>
> 2. You run a COMPRESS operation, which builds a static dictionary,
> and rewrites the whole table with compressed frozen tuples. Frozen
> tuples have CTIDs just like regular tuples, and can be pointed to by
> indexes. They are decompressed on the fly when needed.
>
> Clearly things get tricky once you need to update rows. Assume for
> simplicity that future UPDATEs and INSERTs produce normal,
> non-compressed tuples that would only be compressed by a subsequent
> COMPRESS operation. The question is how to deal with the existing
> compressed rows when UPDATEd or DELETEd. Some approaches:
>
> 1. Just don't allow updates of compressed rows (!).
>
> 2. Exclusively lock the whole page when trying to update any
> compressed row, while you explode it into regular uncompressed tuples
> on new pages which you can work on (!).
>
> 3. Pull the minimum header fields out of the compressed tuples so
> that the MVCC machinery can work, to support updates of compressed
> tuples. Perhaps just the t_xmax and t_ctid values (t_xmin == frozen is
> implied), so that a writer can update them. This means an overhead of
> at least 10 bytes per tuple (a 4-byte xmax plus a 6-byte ctid) over
> the compressed size, plus the item offsets in the page header.
>
> 4. Something far cleverer.
>
> Well, these are straw-man suggestions really and I probably don't
> understand enough about PG internals (MVCC and implications for
> VACUUM) to be making them. But I'm curious to know if anyone has
> researched something like this.
>
> For example, I have a system that occupies a couple of TB on disk, but
> only a few to a few hundred MB per day change, mostly adding data to
> an active partition. I periodically run CLUSTER on any partition that
> has pg_stats.correlation < 0.9 (this effectively just re-CLUSTERs the
> active one). At the same time I would COMPRESS, and the DB could
> potentially fit on much smaller SSDs.
>
> Most commercial database systems I encounter these days are using
> compression of some sort (more sophisticated than the above,
> with dynamic dictionaries, and sometimes column based storage etc).

postgres compresses TOASTED data: one strategy could be to arrange
your data somehow to utilize TOAST.
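
For example (speculative, and with made-up names) you could roll
narrow rows up into large array values so that each value crosses the
TOAST threshold (roughly 2kB) and gets compressed; how well pglz does
on your particular data is another question:

  CREATE TABLE readings_rollup (
      sensor_id int,
      day       date,
      vals      float8[]   -- large values are TOASTed and compressed
  );

  INSERT INTO readings_rollup
  SELECT sensor_id, ts::date, array_agg(value ORDER BY ts)
    FROM readings
   GROUP BY sensor_id, ts::date;

  -- optionally keep compressed values inline rather than in the
  -- toast table:
  ALTER TABLE readings_rollup ALTER COLUMN vals SET STORAGE MAIN;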

I doubt you'll ever see generally compressed heap data in the way
you're thinking: postgres has a strong informal policy of not
implementing features which are dubious and/or excessively complicated
with limited benefit, particularly if there are ways to handle this
outside the database; there are various operating system level tricks
that can make a compressed file or even an entire tablespace (an o/s
folder) masquerade as a regular structure. So maybe you are asking
for a feature we already have: CREATE TABLESPACE.

For example take a look here:
https://btrfs.wiki.kernel.org/index.php/Compression#How_do_I_enable_compression.3F
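
The database side would then just be something like this (mount point
and names made up; the directory has to exist on the compressed
filesystem and be owned by the postgres user):

  -- with the filesystem mounted along the lines of:
  --   mount -o compress=zlib /dev/sdb1 /srv/pg_compressed
  CREATE TABLESPACE compressed LOCATION '/srv/pg_compressed';
  ALTER TABLE readings_2013_11 SET TABLESPACE compressed;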

(out of curiosity, if this strategy fits the bill for you I wouldn't
mind seeing a follow up on how this handles your static data use
case).

merlin


From: Thomas Munro <munro(at)ip9(dot)org>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of tables
Date: 2013-12-14 09:46:28
Message-ID: CADLWmXW-jKW2qRuZEtTtmoQdrL8WJOF9FaZSMjM5LpenHHcXpg@mail.gmail.com
Lists: pgsql-hackers

On 10 December 2013 15:15, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:

> I doubt you'll ever see generally compressed heap data in the way
> you're thinking: postgres has a strong informal policy of not
> implementing features which are dubious and/or excessively complicated
> with limited benefit, particularly if there are ways to handle this
> outside the database; there are various operating system level tricks
> that can make a compressed file or even an entire tablespace (an o/s
> folder) masquerade as a regular structure. So maybe you are asking
> for a feature we already have: CREATE TABLESPACE.
>
> For example take a look here:
>
> https://btrfs.wiki.kernel.org/index.php/Compression#How_do_I_enable_compression.3F
>
> (out of curiosity, if this strategy fits the bill for you I wouldn't
> mind seeing a follow up on how this handles your static data use
> case).
>

Thanks for the suggestion. I see your point; those other databases
generally do more themselves (direct I/O, raw disk devices, etc.).

So I started experimenting with btrfs. I copied a 1.1TB pg_data
directory onto a zlib-compressed btrfs RAID1 filesystem, and it used
~840GB of physical disk. Not a great compression ratio, but I didn't
explore options other than the default compression settings. Then my
3.2 kernel printed a few nasty messages and all I/O locked up... I
should probably try a more recent kernel!

Googling, I see a number of people have reported success with PG on
ZFS with compression enabled, seeing 2x to 4x compression ratios and
faster scans due to increased effective I/O bandwidth.

This does seem to make a lot of sense for my append-only static data
use case. I'll have to work on my fear of new filesystems.