page compression

From: Andy Colson <andy(at)squeakycode(dot)net>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: page compression
Date: 2010-12-28 15:10:13
Message-ID: 4D19FDD5.8010704@squeakycode.net
Lists: pgsql-hackers

I know it's been discussed before, and one big problem is licensing and
patents.

Would this project be a problem:

http://oldhome.schmorp.de/marc/liblzf.html

-Andy


From: Joachim Wieland <joe(at)mcknight(dot)de>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: page compression
Date: 2010-12-28 15:33:07
Message-ID: AANLkTimwgCaA_hG9f7A7fpNnH2d2gY0fWM8Kv6A668-d@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 28, 2010 at 10:10 AM, Andy Colson <andy(at)squeakycode(dot)net> wrote:
> I know it's been discussed before, and one big problem is licensing and
> patents.
>
> Would this project be a problem:
>
> http://oldhome.schmorp.de/marc/liblzf.html

It looks like even liblzf is not going to be accepted. I proposed linking
pg_dump against liblzf only when the library is available, and that
proposal somehow failed; see:

http://archives.postgresql.org/pgsql-hackers/2010-11/msg00824.php

Remember that PostgreSQL already has TOAST tables to compress large values
and store them out of line, so it still has to be proven that page
compression would benefit PostgreSQL as much as it does other databases.

Ironically, we already use an LZ compression algorithm for TOAST
compression (implemented in pg_lzcompress.c). I still fail to understand
why linking against liblzf would take us deeper into the compression-patent
minefield than we already are by hardwiring and shipping that other
algorithm in pg_lzcompress.c.
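
Just to make that concrete, here is a rough sketch (not a patch; the
helper name and buffer handling are made up) of what going through
liblzf's public API would look like. lzf_compress() returns 0 when the
output doesn't fit, which maps nicely onto the usual "store it
uncompressed if compression doesn't pay off" fallback:

#include <stdio.h>
#include <string.h>
#include <lzf.h>        /* liblzf: lzf_compress(), lzf_decompress() */

/* Hypothetical helper: compress src, or fall back to a plain copy if
 * the data doesn't shrink (or doesn't fit into dst). */
static unsigned int
compress_or_copy(const void *src, unsigned int srclen,
                 void *dst, unsigned int dstlen, int *compressed)
{
    /* lzf_compress() returns 0 if the result would not fit in dstlen */
    unsigned int clen = lzf_compress(src, srclen, dst, dstlen);

    if (clen > 0 && clen < srclen)
    {
        *compressed = 1;
        return clen;
    }
    /* not worth it (or didn't fit): store the raw bytes */
    *compressed = 0;
    memcpy(dst, src, srclen);
    return srclen;
}

int
main(void)
{
    char        src[8192];
    char        dst[8192];
    int         compressed;
    unsigned int len;

    memset(src, 'x', sizeof(src));  /* trivially compressible dummy page */
    len = compress_or_copy(src, sizeof(src), dst, sizeof(dst), &compressed);
    printf("stored %u bytes, compressed = %d\n", len, compressed);
    return 0;
}

Those two functions are essentially the library's whole public
interface, and they belong to the same LZ family of algorithms that
pg_lzcompress.c already implements.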

Joachim


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joachim Wieland <joe(at)mcknight(dot)de>
Cc: Andy Colson <andy(at)squeakycode(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: page compression
Date: 2010-12-28 15:50:06
Message-ID: 52866E3F-349E-46C9-B9C0-406FEA0C9031@gmail.com
Lists: pgsql-hackers

On Dec 28, 2010, at 10:33 AM, Joachim Wieland <joe(at)mcknight(dot)de> wrote:
> On Tue, Dec 28, 2010 at 10:10 AM, Andy Colson <andy(at)squeakycode(dot)net> wrote:
>> I know it's been discussed before, and one big problem is licensing and
>> patents.
>>
>> Would this project be a problem:
>>
>> http://oldhome.schmorp.de/marc/liblzf.html
>
> It looks like even liblzf is not going to be accepted. I proposed linking
> pg_dump against liblzf only when the library is available, and that
> proposal somehow failed; see:

I thought that was mostly about not wanting multiple changes in one patch. I don't see why liblzf would be objectionable in general.

...Robert


From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Andy Colson <andy(at)squeakycode(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: page compression
Date: 2011-01-02 23:36:02
Message-ID: 1294011362.2090.4214.camel@ebony
Lists: pgsql-hackers

On Tue, 2010-12-28 at 09:10 -0600, Andy Colson wrote:

> I know it's been discussed before, and one big problem is licensing and
> patents.

I'd like to see a design for that. There are a few different ways we
might want to do it, and I'm interested to see whether it's possible to
make compressed pages indexable as well.

For example, if you compress two pages into one 8KB block, then you do one
I/O and out pop two buffers. That would work nicely with ring buffers.
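
Roughly, the on-disk shape of that first variant could be something
like this (made-up names, just to sketch it; one physical 8KB block
that decompresses into two ordinary buffers):

#include <stdint.h>

#define BLCKSZ 8192             /* physical block size, as today */

/* Hypothetical layout: one physical block holding two compressed
 * logical pages back to back.  A single read of BLCKSZ bytes is then
 * decompressed into two regular 8KB buffers. */
typedef struct PackedBlockHeader
{
    uint16_t    npages;         /* number of logical pages packed here */
    uint16_t    complen[2];     /* compressed length of each logical page */
    /* compressed payloads follow, each expanding to BLCKSZ bytes */
} PackedBlockHeader;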

Or you might try to have pages larger than 8KB compressed into one block,
which would mean decompressing every time you access the page. That
wouldn't be much of a problem if we were just seq scanning.

Or you might want to compress the whole table at once, so it can only be
read by seq scan. Efficient, but no indexes.

It would be interesting to explore pre-populating the compression
dictionary with some common patterns.
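
liblzf's public API doesn't seem to expose a preset dictionary, but
zlib does, so purely as an illustration of the idea (the dictionary
contents and function name here are invented):

#include <string.h>
#include <zlib.h>

/* Byte patterns we expect to be common in heap pages; entirely made up. */
static const unsigned char page_dict[] =
    "\0\0\0\0\0\0\0\0"          /* runs of zero padding */
    "public";                   /* e.g. a frequently repeated text value */

static int
compress_with_dict(const unsigned char *in, unsigned int inlen,
                   unsigned char *out, unsigned int outlen,
                   unsigned long *written)
{
    z_stream    zs;
    int         rc;

    memset(&zs, 0, sizeof(zs));
    if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;

    /* pre-populate the compression dictionary with the common patterns */
    if (deflateSetDictionary(&zs, page_dict, sizeof(page_dict) - 1) != Z_OK)
    {
        deflateEnd(&zs);
        return -1;
    }

    zs.next_in = (unsigned char *) in;
    zs.avail_in = inlen;
    zs.next_out = out;
    zs.avail_out = outlen;
    rc = deflate(&zs, Z_FINISH);
    *written = zs.total_out;
    deflateEnd(&zs);
    return (rc == Z_STREAM_END) ? 0 : -1;
}

The reader side would have to supply the same dictionary (zlib's
inflateSetDictionary()), so the dictionary effectively becomes part of
the on-disk format.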

Anyway, interesting topic.

--
Simon Riggs http://www.2ndQuadrant.com/books/
PostgreSQL Development, 24x7 Support, Training and Services


From: Jim Nasby <jim(at)nasby(dot)net>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andy Colson <andy(at)squeakycode(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: page compression
Date: 2011-01-03 09:02:25
Message-ID: 40030A90-48A3-4AD5-AD19-350A90112181@nasby.net
Lists: pgsql-hackers

On Jan 2, 2011, at 5:36 PM, Simon Riggs wrote:
> On Tue, 2010-12-28 at 09:10 -0600, Andy Colson wrote:
>
>> I know it's been discussed before, and one big problem is licensing and
>> patents.
>
> I'd like to see a design for that. There are a few different ways we
> might want to do it, and I'm interested to see whether it's possible to
> make compressed pages indexable as well.
>
> For example, if you compress two pages into one 8KB block, then you do one
> I/O and out pop two buffers. That would work nicely with ring buffers.
>
> Or you might try to have pages larger than 8KB compressed into one block,
> which would mean decompressing every time you access the page. That
> wouldn't be much of a problem if we were just seq scanning.
>
> Or you might want to compress the whole table at once, so it can only be
> read by seq scan. Efficient, but no indexes.

FWIW, last time I looked at how Oracle handled compression, it would only compress existing data. As soon as you modified a row, it ended up uncompressed, presumably in a different page that was also uncompressed.

I wonder if it would be feasible to use a relation fork to store where each compressed page lives inside the heap... If we could do that, I don't see any reason why indexes wouldn't work. The changes required to support that might not be too horrific, either...
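
Something like a per-page map in its own fork, say (names entirely made
up, just to sketch it):

#include <stdint.h>

/* Hypothetical "compression map" fork: one small entry per logical heap
 * page saying where its compressed image lives.  Index TIDs keep using
 * logical block numbers; only this map knows the physical location. */
typedef struct CompressedPageMapEntry
{
    uint32_t    physblkno;      /* physical block holding the compressed data */
    uint16_t    offset;         /* byte offset of the payload in that block */
    uint16_t    complen;        /* compressed length; 0 = stored uncompressed */
} CompressedPageMapEntry;

A lookup would then be: fetch the map entry for logical block N, read
the physical block it points at, decompress.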
--
Jim C. Nasby, Database Architect jim(at)nasby(dot)net
512.569.9461 (cell) http://jim.nasby.net


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andy Colson <andy(at)squeakycode(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: page compression
Date: 2011-01-03 15:03:39
Message-ID: AANLkTim4WJpMrJ_EYZgXPTyj-d48DooWHLOr47iLhsu9@mail.gmail.com
Lists: pgsql-hackers

On Mon, Jan 3, 2011 at 4:02 AM, Jim Nasby <jim(at)nasby(dot)net> wrote:
> FWIW, last time I looked at how Oracle handled compression, it would only compress existing data. As soon as you modified a row, it ended up uncompressed, presumably in a different page that was also uncompressed.

IIUC, InnoDB basically compresses a block as small as it'll go, and
then stores it in a regular size block. That leaves free space at the
end, which can be used to cram additional tuples into the page.
Eventually that free space is exhausted, at which point you try to
recompress the whole page and see if that gives you room to cram in
even more stuff.

I thought that was a pretty clever approach.
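
In a rough C sketch, my understanding of the scheme is something like
this (not InnoDB's actual code; the names and the stubbed-out
recompression are mine):

#include <stdbool.h>
#include <string.h>

#define BLCKSZ 8192

/* Sketch: the fixed-size block holds a compressed image of the page
 * plus a tail of uncompressed, recently added tuples. */
typedef struct CompressedPage
{
    char        block[BLCKSZ];  /* what actually goes to disk */
    int         zlen;           /* bytes used by the compressed page image */
    int         taillen;        /* bytes used by uncompressed new tuples */
} CompressedPage;

/* Stub: real code would decompress block[0..zlen), merge in the tail,
 * and recompress; here we just pretend the tail got absorbed. */
static void
recompress_page(CompressedPage *page)
{
    page->zlen += page->taillen;
    page->taillen = 0;
}

static bool
add_tuple(CompressedPage *page, const char *tuple, int tuplen)
{
    /* Fast path: cram the tuple into the free space after the
     * compressed image, without touching the compressed bytes. */
    if (page->zlen + page->taillen + tuplen <= BLCKSZ)
    {
        memcpy(page->block + page->zlen + page->taillen, tuple, tuplen);
        page->taillen += tuplen;
        return true;
    }

    /* Slow path: recompress the whole page and try once more. */
    recompress_page(page);
    if (page->zlen + tuplen > BLCKSZ)
        return false;           /* still no room: caller has to split */
    memcpy(page->block + page->zlen, tuple, tuplen);
    page->taillen = tuplen;
    return true;
}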

> I wonder if it would be feasible to use a relation fork to store where each compressed page lives inside the heap... If we could do that, I don't see any reason why indexes wouldn't work. The changes required to support that might not be too horrific, either...

At first blush, that sounds like a recipe for large amounts of
undesirable random I/O.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company