Re: Column storage (EXTERNAL/EXTENDED) settings for bytea/text column

Lists: pgsql-general
From: Joel Stevenson <jstevenson(at)bepress(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Column storage (EXTERNAL/EXTENDED) settings for bytea/text column
Date: 2011-04-11 22:19:23
Message-ID: 469BA72B-2E6E-4E59-BAEE-2C9377ABB486@bepress.com

Hi all,

I'm trying to do some comparisons between the EXTERNAL and the EXTENDED storage methods on a bytea column and from the outside the setting doesn't appear to affect the value stored on initial insert, but perhaps I'm looking at the wrong numbers. If I create two new tables with a single bytea column and set one of them to external storage, then insert an existing bytea value from another table into each one, they appear to be of exactly the same size. This is using PG 9.0.3 on Debian Lenny, using the backports-sloppy deb package of PG 9.

(I've verified that the first table has "extended" storage via pg_attribute and that the second table has external.)

create table obj1 ( object bytea );
create table obj2 ( object bytea );
alter table obj2 alter column object set storage external;
insert into obj1 ( object ) select object from serialized_content where id = 12345;
insert into obj2 ( object ) select object from obj1;
select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj2' ) ) as otoast2;
  o1   | otoast1 |  o2   | otoast2
-------+---------+-------+---------
 65536 |   57344 | 65536 |   57344
(1 row)

Now at this point if I perform a vacuum full on one or both, they'll both shrink down to a bit over half that size:

vacuum full obj1;
vacuum full obj2;
select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj2' ) ) as otoast2;
  o1   | otoast1 |  o2   | otoast2
-------+---------+-------+---------
 40960 |   32768 | 40960 |   32768

This shrinkage doesn't occur with a normal vacuum; in fact, the reported sizes increase if I do a normal vacuum at that point, but that's not germane to my question AFAICT.

Can I use the relation size like this to determine whether or not compression is happening for these toast columns? If not, is there a way that I can confirm that it is or isn't active? The results appear to be similar for text columns.

Thanks,
Joel


From: Bill Moran <wmoran(at)potentialtech(dot)com>
To: Joel Stevenson <jstevenson(at)bepress(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Column storage (EXTERNAL/EXTENDED) settings for bytea/text column
Date: 2011-04-12 17:33:45
Message-ID: 20110412133345.882267b4.wmoran@potentialtech.com

In response to Joel Stevenson <jstevenson(at)bepress(dot)com>:

> Hi all,
>
> I'm trying to do some comparisons between the EXTERNAL and the EXTENDED storage methods on a bytea column and from the outside the setting doesn't appear to affect the value stored on initial insert, but perhaps I'm looking at the wrong numbers. If I create two new tables with a single bytea column and set one of them to external storage, then insert an existing bytea value from another table into each one, they appear to be of exactly the same size. This is using PG 9.0.3 on Debian Lenny, using the backports-sloppy deb package of PG 9.
>
> (I've verified that the first table has "extended" storage via pg_attribute and that the second table has external.)
>
> create table obj1 ( object bytea );
> create table obj2 ( object bytea );
> alter table obj2 alter column object set storage external;
> insert into obj1 ( object ) select object from serialized_content where id = 12345;
> insert into obj2 ( object ) select object from obj1;
> select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj2' ) ) as otoast2;
>   o1   | otoast1 |  o2   | otoast2
> -------+---------+-------+---------
>  65536 |   57344 | 65536 |   57344
> (1 row)
>
> Now at this point if I perform a vacuum full on one or both, they'll both shrink down to a bit over half that size:
>
> vacuum full obj1;
> vacuum full obj2;
> select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj2' ) ) as otoast2;
>   o1   | otoast1 |  o2   | otoast2
> -------+---------+-------+---------
>  40960 |   32768 | 40960 |   32768

I'm not an expert, but it looks like you're not storing enough data to
actually see the difference, since the reported sizes of the tables will
always be rounded up to a whole number of pages. With only 1 row, it's
always going to take a minimum amount.
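
As a rough illustration of that rounding, assuming the default block size of 8192 bytes (an assumption on my part, not something confirmed for your build), the figures you posted all divide into whole pages:

```sql
-- Assumes the default block size of 8192 bytes; check yours with:
--   select current_setting('block_size');
select 65536 / 8192 as o1_pages_before,       -- 8 pages
       57344 / 8192 as otoast1_pages_before,  -- 7 pages
       40960 / 8192 as o1_pages_after,        -- 5 pages
       32768 / 8192 as otoast1_pages_after;   -- 4 pages
```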

Also, are you sure you're storing compressible data? For example, if
you're putting PNG or JPEG images in there, they're not going to compress
any.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/


From: Joel Stevenson <jstevenson(at)bepress(dot)com>
To: Bill Moran <wmoran(at)potentialtech(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Column storage (EXTERNAL/EXTENDED) settings for bytea/text column
Date: 2011-04-12 22:47:28
Message-ID: 6B806731-69E5-4AFD-9822-A3F6D22BEB7B@bepress.com


On Apr 12, 2011, at 10:33 AM, Bill Moran wrote:

> In response to Joel Stevenson <jstevenson(at)bepress(dot)com>:
>
>> select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj2' ) ) as otoast2;
>>   o1   | otoast1 |  o2   | otoast2
>> -------+---------+-------+---------
>>  40960 |   32768 | 40960 |   32768
>
> I'm not an expert, but it looks like you're not storing enough data to
> actually see the difference, since the actual sizes of the tables will
> always be rounded to an even page size. With only 1 row, it's always
> going to take a minimum amount.
>
> Also, are you sure you're storing compressible data? For example, if
> you're putting PNG or JPEG images in there, they're not going to compress
> any.
>

Thanks for the reply, Bill.

The data is very compressible: the raw data is 33392 bytes long and gzips down to 6965 bytes. As for not storing enough: from the description of the 'SET STORAGE' clause and the TOAST strategy, it sounds like the TOASTer will try to compress anything that doesn't fit into the PG page (default 8 kB), so I would've thought that compression would be used for the EXTENDED column and not used for the EXTERNAL column, since my single-row data is larger than that.

To be certain I stored 10 rows of that data and rechecked the reported size after a vacuum full:

select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj2' ) ) as otoast2;
   o1   | otoast1 |   o2   | otoast2
--------+---------+--------+---------
 147456 |  139264 | 147456 |  139264

So, again from the outside, the column storage settings don't appear to be behaving as I'd expect them to.

Stumped.

- Joel


From: Noah Misch <noah(at)leadboat(dot)com>
To: Joel Stevenson <jstevenson(at)bepress(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Column storage (EXTERNAL/EXTENDED) settings for bytea/text column
Date: 2011-04-21 21:02:34
Message-ID: 20110421210234.GA30833@tornado.leadboat.com

On Mon, Apr 11, 2011 at 03:19:23PM -0700, Joel Stevenson wrote:
> I'm trying to do some comparisons between the EXTERNAL and the EXTENDED storage methods on a bytea column and from the outside the setting doesn't appear to affect the value stored on initial insert, but perhaps I'm looking at the wrong numbers. If I create two new tables with a single bytea column and set one of them to external storage, then insert an existing bytea value from another table into each one, they appear to be of exactly the same size. This is using PG 9.0.3 on Debian Lenny, using the backports-sloppy deb package of PG 9.
>
> (I've verified that the first table has "extended" storage via pg_attribute and that the second table has external.)
>
> create table obj1 ( object bytea );
> create table obj2 ( object bytea );
> alter table obj2 alter column object set storage external;
> insert into obj1 ( object ) select object from serialized_content where id = 12345;
> insert into obj2 ( object ) select object from obj1;

If the value that shows up for insertion is already compressed, EXTERNAL storage
will not decompress it. Change this line to

insert into obj2 ( object ) select object || '' from obj1;

to observe the effect you seek.

Given the purpose of EXTERNAL storage, this might qualify as a bug.
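
A self-contained way to reproduce this, as a sketch only: the table names are made up for illustration, and the test value is generated with repeat() purely to get something highly compressible. The expectation (not verified output) is that only the last table ends up with a non-trivial TOAST heap, because only there does an uncompressed value arrive for storage.

```sql
-- Hypothetical table names; bytea columns default to EXTENDED storage.
create temp table toast_demo_src   ( object bytea );
create temp table toast_demo_copy  ( object bytea );
create temp table toast_demo_fresh ( object bytea );
alter table toast_demo_copy  alter column object set storage external;
alter table toast_demo_fresh alter column object set storage external;

-- About 40 kB of highly repetitive data; EXTENDED storage may compress it on insert.
insert into toast_demo_src select convert_to(repeat('abcdefgh', 5000), 'UTF8');

-- Plain copy: an already-compressed datum is passed through as-is.
insert into toast_demo_copy select object from toast_demo_src;

-- Concatenation builds a new, uncompressed datum before it is stored.
insert into toast_demo_fresh select object || '' from toast_demo_src;

-- Size of each table's TOAST heap, excluding the TOAST index.
select c.relname, pg_relation_size(c.reltoastrelid) as toast_heap_bytes
from pg_class c
where c.oid in ('toast_demo_src'::regclass,
                'toast_demo_copy'::regclass,
                'toast_demo_fresh'::regclass)
order by c.relname;
```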

> select pg_total_relation_size('obj1') as o1, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj1' ) ) as otoast1, pg_total_relation_size('obj2') as o2, pg_total_relation_size( (select reltoastrelid from pg_class where relname = 'obj2' ) ) as otoast2;
>   o1   | otoast1 |  o2   | otoast2
> -------+---------+-------+---------
>  65536 |   57344 | 65536 |   57344

> Can I use the relation size like this to determine whether or not compression is happening for these toast columns? If not, is there a way that I can confirm that it is or isn't active? The results appear to be similar for text columns.

Yes; the sizes you're seeing through that method should be accurate.
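
For a per-value check, something like the following sketch may also help. It relies on pg_column_size(), which reports the number of bytes used to store a value (possibly compressed), and octet_length(), which reports the uncompressed length. It assumes the obj1 and obj2 tables from your example.

```sql
-- stored_bytes well below raw_bytes suggests the value is stored compressed;
-- roughly equal numbers suggest it is stored uncompressed.
select 'obj1' as tbl,
       octet_length(object)   as raw_bytes,
       pg_column_size(object) as stored_bytes
from obj1
union all
select 'obj2',
       octet_length(object),
       pg_column_size(object)
from obj2;
```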

nm


From: Joel Stevenson <jstevenson(at)bepress(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Column storage (EXTERNAL/EXTENDED) settings for bytea/text column
Date: 2011-04-27 15:52:49
Message-ID: 48F3046A-3DA9-406A-B3AA-02952B7E9355@bepress.com

Thanks for the help with that, Noah. Indeed the sizes do look like I'd expect them to if I force decompression of the bytea value before inserting it into the EXTERNAL column.

On Apr 21, 2011, at 2:02 PM, Noah Misch wrote:

> On Mon, Apr 11, 2011 at 03:19:23PM -0700, Joel Stevenson wrote:
>> create table obj1 ( object bytea );
>> create table obj2 ( object bytea );
>> alter table obj2 alter column object set storage external;
>> insert into obj1 ( object ) select object from serialized_content where id = 12345;
>> insert into obj2 ( object ) select object from obj1;
>
> If the value that shows up for insertion is already compressed, EXTERNAL storage
> will not decompress it. Change this line to
>
> insert into obj2 ( object ) select object || '' from obj1;
>
> to observe the effect you seek.
>
> Given the purpose of EXTERNAL storage, this might qualify as a bug.