Re: jsonb format is pessimal for toast compression

From: Claudio Freire <klaussfreire(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-13 17:41:48
Message-ID: CAGTBQpZb4q0Bg2eL+SYfoQjgpSHriq6aA8Sv2v0zeU3JSW+JBg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Aug 12, 2014 at 8:00 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Mon, Aug 11, 2014 at 01:44:05PM -0700, Peter Geoghegan wrote:
>> On Mon, Aug 11, 2014 at 1:01 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>> > We've got a clear example of someone, quite reasonably, expecting their
>> > JSONB object to be compressed using the normal TOAST mechanism, and
>> > we're failing to do that in cases where it's actually a win to do so.
>> > That's the focus of this discussion and what needs to be addressed
>> > before 9.4 goes out.
>>
>> Sure. I'm not trying to minimize that. We should fix it, certainly.
>> However, it does bear considering that JSON data, with each document
>> stored in a row is not an effective target for TOAST compression in
>> general, even as text.
>
> Seems we have two issues:
>
> 1) the header makes testing for compression likely to fail
> 2) use of pointers rather than offsets reduces compression potential

I do think the best solution for 2 is what's been proposed already, to
do delta-coding of the pointers in chunks (ie, 1 pointer, 15 deltas,
repeat).

But it does make binary search quite more complex.

Alternatively, it could be somewhat compressed as follows:

Segment = 1 pointer head, 15 deltas
Pointer head = pointers[0]
delta[i] = pointers[i] - pointers[0] for i in 1..15

(delta to segment head, not previous value)

Now, you can have 4 types of segments. 8, 16, 32, 64 bits, which is
the size of the deltas. You achieve between 8x and 1x compression, and
even when 1x (no compression), you make it easier for pglz to find
something compressible.

Accessing it is also simple, if you have a segment index (tough part here).

Replace the 15 for something that makes such segment index very compact ;)

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2014-08-13 17:51:24 Re: WAL format and API changes (9.5)
Previous Message Claudio Freire 2014-08-13 17:33:39 Re: Proposal: Incremental Backup