Compression

Lists: pgsql-general
From: Yang Zhang <yanghatespam(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Compression
Date: 2011-04-14 23:01:54
Message-ID: BANLkTinQ+GRsJuRGcgP=FBeAgnZ3H-Jkgg@mail.gmail.com

Is there any effort to add compression into PG, a la MySQL's
row_format=compressed or HBase's LZO block compression?


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: Yang Zhang <yanghatespam(at)gmail(dot)com>
Subject: Re: Compression
Date: 2011-04-14 23:04:58
Message-ID: 201104141604.58913.adrian.klaver@gmail.com

On Thursday, April 14, 2011 4:01:54 pm Yang Zhang wrote:
> Is there any effort to add compression into PG, a la MySQL's
> row_format=compressed or HBase's LZO block compression?

TOAST?
http://www.postgresql.org/docs/9.0/interactive/storage-toast.html
--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Compression
Date: 2011-04-14 23:50:44
Message-ID: 4DA78854.8020401@postnewspapers.com.au

On 15/04/2011 7:01 AM, Yang Zhang wrote:
> Is there any effort to add compression into PG, a la MySQL's
> row_format=compressed or HBase's LZO block compression?

There's no row compression, but as mentioned by others there is
out-of-line compression of large values using TOAST.

Row compression would be interesting, but I can't imagine it not having
been investigated already.

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Cc: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>, Yang Zhang <yanghatespam(at)gmail(dot)com>
Subject: Re: Compression
Date: 2011-04-15 00:07:43
Message-ID: 201104141707.43492.adrian.klaver@gmail.com

On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:
> On 15/04/2011 7:01 AM, Yang Zhang wrote:
> > Is there any effort to add compression into PG, a la MySQL's
> > row_format=compressed or HBase's LZO block compression?
>
> There's no row compression, but as mentioned by others there is
> out-of-line compression of large values using TOAST.

I could be misunderstanding but I thought compression happened in the row as
well. From the docs:

"EXTENDED allows both compression and out-of-line storage. This is the default
for most TOAST-able data types. Compression will be attempted first, then
out-of-line storage if the row is still too big."

>
> Row compression would be interesting, but I can't imagine it not having
> been investigated already.

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Yang Zhang <yanghatespam(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject: Re: Compression
Date: 2011-04-15 00:51:21
Message-ID: BANLkTim3LqeHZpfF=URwY4iARhwiVHkHkQ@mail.gmail.com

On Thu, Apr 14, 2011 at 5:07 PM, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com> wrote:
> On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:
>> On 15/04/2011 7:01 AM, Yang Zhang wrote:
>> > Is there any effort to add compression into PG, a la MySQL's
>> > row_format=compressed or HBase's LZO block compression?
>>
>> There's no row compression, but as mentioned by others there is
>> out-of-line compression of large values using TOAST.
>
> I could be misunderstanding but I thought compression happened in the row as
> well. From the docs:
>
> "EXTENDED allows both compression and out-of-line storage. This is the
> default for most TOAST-able data types. Compression will be attempted first,
> then out-of-line storage if the row is still too big."
>
>> Row compression would be interesting, but I can't imagine it not having
>> been investigated already.
>
> --
> Adrian Klaver
> adrian(dot)klaver(at)gmail(dot)com

Already know about TOAST. I could've been clearer, but that's not the
same as the block-/page-level compression I was referring to.

--
Yang Zhang
http://yz.mit.edu/


From: "mark" <dvlhntr(at)gmail(dot)com>
To: "'Yang Zhang'" <yanghatespam(at)gmail(dot)com>, "'Adrian Klaver'" <adrian(dot)klaver(at)gmail(dot)com>
Cc: <pgsql-general(at)postgresql(dot)org>, "'Craig Ringer'" <craig(at)postnewspapers(dot)com(dot)au>
Subject: Re: Compression
Date: 2011-04-15 01:46:17
Message-ID: 009001cbfb0e$eacfec40$c06fc4c0$@com

> -----Original Message-----
> From: pgsql-general-owner(at)postgresql(dot)org
> [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Yang Zhang
> Sent: Thursday, April 14, 2011 6:51 PM
> To: Adrian Klaver
> Cc: pgsql-general(at)postgresql(dot)org; Craig Ringer
> Subject: Re: [GENERAL] Compression
>
> On Thu, Apr 14, 2011 at 5:07 PM, Adrian Klaver
> <adrian(dot)klaver(at)gmail(dot)com> wrote:
> > On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:
> >> On 15/04/2011 7:01 AM, Yang Zhang wrote:
> >> > Is there any effort to add compression into PG, a la MySQL's
> >> > row_format=compressed or HBase's LZO block compression?
> >>
> >> There's no row compression, but as mentioned by others there is
> >> out-of-line compression of large values using TOAST.
> >
> > I could be misunderstanding but I thought compression happened in the
> > row as well. From the docs:
> >
> > "EXTENDED allows both compression and out-of-line storage. This is the
> > default for most TOAST-able data types. Compression will be attempted
> > first, then out-of-line storage if the row is still too big."
> >
> >> Row compression would be interesting, but I can't imagine it not
> >> having been investigated already.
> >
> > --
> > Adrian Klaver
> > adrian(dot)klaver(at)gmail(dot)com
>
> Already know about TOAST. I could've been clearer, but that's not the
> same as the block-/page-level compression I was referring to.

There is a (closed source) PG fork that has row (or column) oriented storage
that can have compression applied to them.... if you are willing to give up
updates and deletes on the table that is.

I haven't seen a lot of people talking about wanting that in the Postgres
core tho.

-M

>
> --
> Yang Zhang
> http://yz.mit.edu/
>
> --
> Sent via pgsql-general mailing list (pgsql-general(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject: Re: Compression
Date: 2011-04-15 02:42:01
Message-ID: 201104141942.02384.adrian.klaver@gmail.com

On Thursday, April 14, 2011 5:51:21 pm Yang Zhang wrote:
> Already know about TOAST. I could've been clearer, but that's not the
> same as the block-/page-level compression I was referring to.

I am obviously missing something. The TOAST mechanism is designed to keep tuple
data below the default 8KB page size. In fact it kicks in at a lower level than
that:

"The TOAST code is triggered only when a row value to be stored in a table is
wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code will
compress and/or move field values out-of-line until the row value is shorter than
TOAST_TUPLE_TARGET bytes (also normally 2 kB) or no more gains can be had.
During an UPDATE operation, values of unchanged fields are normally preserved
as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if
none of the out-of-line values change."

Granted not all data types are TOASTable. Are you looking for something more
aggressive than that?

--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Yang Zhang <yanghatespam(at)gmail(dot)com>
To: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject: Re: Compression
Date: 2011-04-15 02:46:34
Message-ID: BANLkTik_9fGXhhDChwZu+99W5G6uJYztPw@mail.gmail.com

On Thu, Apr 14, 2011 at 7:42 PM, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com> wrote:
> On Thursday, April 14, 2011 5:51:21 pm Yang Zhang wrote:
>> Already know about TOAST. I could've been clearer, but that's not the
>> same as the block-/page-level compression I was referring to.
>
> I am obviously missing something. The TOAST mechanism is designed to keep
> tuple data below the default 8KB page size. In fact it kicks in at a lower
> level than that:
>
> "The TOAST code is triggered only when a row value to be stored in a table
> is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST code
> will compress and/or move field values out-of-line until the row value is
> shorter than TOAST_TUPLE_TARGET bytes (also normally 2 kB) or no more gains
> can be had. During an UPDATE operation, values of unchanged fields are
> normally preserved as-is; so an UPDATE of a row with out-of-line values
> incurs no TOAST costs if none of the out-of-line values change."
>
> Granted not all data types are TOASTable. Are you looking for something more
> aggressive than that?

Yes.

http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html

http://wiki.apache.org/hadoop/UsingLzoCompression

http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-algorithms.html


--
Yang Zhang
http://yz.mit.edu/


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: Yang Zhang <yanghatespam(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject: Re: Compression
Date: 2011-04-15 04:06:58
Message-ID: 201104142106.58731.adrian.klaver@gmail.com

On Thursday, April 14, 2011 7:46:34 pm Yang Zhang wrote:
> On Thu, Apr 14, 2011 at 7:42 PM, Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com> wrote:
> > Granted not all data types are TOASTable. Are you looking for something
> > more aggressive than that?
>
> Yes.
>
> http://blog.oskarsson.nu/2009/03/hadoop-feat-lzo-save-disk-space-and.html
>
> http://wiki.apache.org/hadoop/UsingLzoCompression
>
> http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals-algorithms.html

I can see that as another use case for SQL/MED in 9.1+.


--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: Yang Zhang <yanghatespam(at)gmail(dot)com>
To: mark <dvlhntr(at)gmail(dot)com>
Cc: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Subject: Re: Compression
Date: 2011-04-15 04:16:35
Message-ID: BANLkTinUMVpVWFO7-iDoZ5nu3sqw3_FiEw@mail.gmail.com

On Thu, Apr 14, 2011 at 6:46 PM, mark <dvlhntr(at)gmail(dot)com> wrote:
>> -----Original Message-----
>> From: pgsql-general-owner(at)postgresql(dot)org
>> [mailto:pgsql-general-owner(at)postgresql(dot)org] On Behalf Of Yang Zhang
>> Sent: Thursday, April 14, 2011 6:51 PM
>> To: Adrian Klaver
>> Cc: pgsql-general(at)postgresql(dot)org; Craig Ringer
>> Subject: Re: [GENERAL] Compression
>>
>> On Thu, Apr 14, 2011 at 5:07 PM, Adrian Klaver
>> <adrian(dot)klaver(at)gmail(dot)com> wrote:
>> > On Thursday, April 14, 2011 4:50:44 pm Craig Ringer wrote:
>> >> On 15/04/2011 7:01 AM, Yang Zhang wrote:
>> >> > Is there any effort to add compression into PG, a la MySQL's
>> >> > row_format=compressed or HBase's LZO block compression?
>> >>
>> >> There's no row compression, but as mentioned by others there is
>> >> out-of-line compression of large values using TOAST.
>> >
>> > I could be misunderstanding but I thought compression happened in the
>> > row as well. From the docs:
>> >
>> > "EXTENDED allows both compression and out-of-line storage. This is the
>> > default for most TOAST-able data types. Compression will be attempted
>> > first, then out-of-line storage if the row is still too big."
>> >
>> >> Row compression would be interesting, but I can't imagine it not
>> >> having been investigated already.
>> >
>> > --
>> > Adrian Klaver
>> > adrian(dot)klaver(at)gmail(dot)com
>>
>> Already know about TOAST. I could've been clearer, but that's not the
>> same as the block-/page-level compression I was referring to.
>
> There is a (closed source) PG fork that has row (or column) oriented storage
> that can have compression applied to them.... if you are willing to give up
> updates and deletes on the table that is.

Greenplum and Aster?

We *are* mainly doing analytical (non-updating/deleting) processing.
But it's not a critical pain point - we're mainly interested in FOSS
for now.

>
>
> I haven't seen a lot of people talking about wanting that in the Postgres
> core tho.
>
>
> -M
>

--
Yang Zhang
http://yz.mit.edu/


From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org, Yang Zhang <yanghatespam(at)gmail(dot)com>
Subject: Re: Compression
Date: 2011-04-15 04:37:10
Message-ID: 4DA7CB76.2050208@postnewspapers.com.au

On 15/04/2011 8:07 AM, Adrian Klaver wrote:

> "EXTENDED allows both compression and out-of-line storage. This is the
> default for most TOAST-able data types. Compression will be attempted
> first, then out-of-line storage if the row is still too big."

Good point. I was unclear; thanks for pointing it out.

What I was trying to say is that there's no whole-row compression, ie
compression of the whole tuple except for minimal headers. A value in a
field may be compressed, but you can't (say) compress a 100-column row
of integers in Pg, because the individual fields don't support compression.
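
The difference between compressing individual values and compressing a whole row or page can be seen with a small Python experiment. This is only a rough sketch under stated assumptions: zlib stands in for whatever compressor a database would actually use, the row contents are made up, and compare_sizes is a hypothetical helper, not anything in PostgreSQL.

```python
import struct
import zlib


def compare_sizes(values, rows_per_page=20):
    """Return byte counts for raw, per-field, whole-row and page-level compression."""
    fields = [struct.pack("<i", v) for v in values]  # 4 bytes per integer
    row = b"".join(fields)
    page = row * rows_per_page  # hypothetical page of identical rows
    return {
        "raw_row": len(row),
        # Compressing each 4-byte value on its own only adds overhead.
        "per_field": sum(len(zlib.compress(f)) for f in fields),
        # Compressing the row as one unit can exploit repetition across columns.
        "whole_row": len(zlib.compress(row)),
        "raw_page": len(page),
        # Page-level compression also sees repetition across rows.
        "whole_page": len(zlib.compress(page)),
    }


if __name__ == "__main__":
    sizes = compare_sizes([i % 7 for i in range(100)])  # made-up 100-column row
    for name, size in sizes.items():
        print(f"{name:>10}: {size} bytes")
```

With small fixed-width values, per-field compression comes out larger than the raw data, while the whole-row and page-level figures shrink, which is the gap between TOAST and block-level compression being discussed here.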

--
Craig Ringer

Tech-related writing at http://soapyfrogs.blogspot.com/


From: Adrian Klaver <adrian(dot)klaver(at)gmail(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: pgsql-general(at)postgresql(dot)org, Yang Zhang <yanghatespam(at)gmail(dot)com>
Subject: Re: Compression
Date: 2011-04-15 13:33:37
Message-ID: 201104150633.37778.adrian.klaver@gmail.com

On Thursday, April 14, 2011 9:37:10 pm Craig Ringer wrote:
> On 15/04/2011 8:07 AM, Adrian Klaver wrote:
> > "EXTENDED allows both compression and out-of-line storage. This is the
> > default for most TOAST-able data types. Compression will be attempted
> > first, then out-of-line storage if the row is still too big."
>
> Good point. I was unclear; thanks for pointing it out.
>
> What I was trying to say is that there's no whole-row compression, ie
> compression of the whole tuple except for minimal headers. A value in a
> field may be compressed, but you can't (say) compress a 100-column row
> of integers in Pg, because the individual fields don't support compression.

Got it now, thanks.
--
Adrian Klaver
adrian(dot)klaver(at)gmail(dot)com


From: rtshadow <przemek(at)hadapt(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Compression
Date: 2012-10-10 15:01:49
Message-ID: 1349881309870-5727363.post@n5.nabble.com

Where can I find more information about the PG fork you mentioned?
