Re: jsonb format is pessimal for toast compression

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-14 00:52:30
Message-ID: 14953.1407977550@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Seems we have two issues:
> 1) the header makes testing for compression likely to fail
> 2) use of pointers rather than offsets reduces compression potential

> I understand we are focusing on #1, but how much does compression reduce
> the storage size with and without #2? Seems we need to know that answer
> before deciding if it is worth reducing the ability to do fast lookups
> with #2.

That's a fair question. I did a very very simple hack to replace the item
offsets with item lengths -- turns out that that mostly requires removing
some code that changes lengths to offsets ;-). I then loaded up Larry's
example of a noncompressible JSON value, and compared pg_column_size()
which is just about the right thing here since it reports datum size after
compression. Remembering that the textual representation is 12353 bytes:

json: 382 bytes
jsonb, using offsets: 12593 bytes
jsonb, using lengths: 406 bytes

So this confirms my suspicion that the choice of offsets not lengths
is what's killing compressibility. If it used lengths, jsonb would be
very nearly as compressible as the original text.

Hack attached in case anyone wants to collect more thorough statistics.
We'd not actually want to do it like this because of the large expense
of recomputing the offsets on-demand all the time. (It does pass the
regression tests, for what that's worth.)

regards, tom lane

Attachment Content-Type Size
jsonb-with-lengths-hack.patch text/x-diff 2.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2014-08-14 01:01:43 Re: jsonb format is pessimal for toast compression
Previous Message Stephen Frost 2014-08-14 00:26:12 Re: replication commands and log_statements