Re: jsonb format is pessimal for toast compression

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Kevin Grittner <kgrittn(at)ymail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-11 19:39:21
Message-ID: CAM3SWZSDMkntNCG8dm-grcke_BjZ6U3sSDdMVWhpC_VXJwQ_Jw@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 11, 2014 at 12:07 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I think that's a good point.

I think that there may be something to be said for the current layout.
Having adjacent keys and values could take better advantage of CPU
cache characteristics. For example, I've heard more than once of
approaches to improving B-Tree locality that force keys and values to
be adjacent on individual B-Tree pages [1]. And FWIW, based on earlier
research into user requirements in this area, I believe that very
large jsonb datums are not considered all that compelling. Document
database systems have considerable limitations here.

> On the general topic, I don't think it's reasonable to imagine that
> we're going to come up with a single heuristic that works well for
> every kind of input data. What pglz is doing - assuming that if the
> beginning of the data is incompressible then the rest probably is too
> - is fundamentally reasonable, notwithstanding the fact that it
> doesn't happen to work out well for JSONB. We might be able to tinker
> with that general strategy in some way that seems to fix this case and
> doesn't appear to break others, but there's some risk in that, and
> there's no obvious reason in my mind why PGLZ should be required to fly
> blind. So I think it would be a better idea to arrange some method by
> which JSONB (and perhaps other data types) can provide compression
> hints to pglz.
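
To make the quoted heuristic concrete, here is a toy analogue in Python.
It is only a sketch: it uses zlib rather than pglz, and the function
name, the probe size and the sample datum are all made up for
illustration. It shows how "give up if the beginning does not compress"
can miss a datum whose incompressible prefix is followed by very
compressible data.

```python
import random
import zlib

PROBE_BYTES = 1024  # probe size chosen for this toy example


def compress_with_early_bailout(data: bytes, probe_bytes: int = PROBE_BYTES):
    """Return compressed bytes, or None if the leading probe did not shrink."""
    probe = data[:probe_bytes]
    if len(zlib.compress(probe)) >= len(probe):
        # Give up: assume the rest of the datum is incompressible too.
        return None
    return zlib.compress(data)


# An incompressible prefix (seeded pseudo-random bytes) followed by a
# highly repetitive body.
rng = random.Random(0)
prefix = rng.getrandbits(8 * PROBE_BYTES).to_bytes(PROBE_BYTES, "big")
body = b'{"status": "active"}' * 500
datum = prefix + body

bailed_out = compress_with_early_bailout(datum) is None
whole_size = len(zlib.compress(datum))
print(bailed_out, len(datum), whole_size)
```

In this toy the heuristic stores the datum uncompressed even though
compressing the whole thing would shrink it considerably.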

If there is to be any effort to make jsonb a more effective target for
compression, I imagine that that would have to target redundancy
between JSON documents. With idiomatic usage, we can expect plenty of
it.
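
As a rough illustration of what redundancy between documents looks
like, the sketch below compresses small JSON documents one at a time,
with and without a preset dictionary taken from a sample document. It
uses the zdict parameter of Python's zlib module. This is not something
jsonb or pglz does; the document shape and field names are invented for
the example.

```python
import json
import zlib

# Hypothetical documents that share keys and much of their structure.
docs = [
    {
        "user_id": i,
        "email": f"user{i}@example.com",
        "status": "active",
        "created_at": "2014-08-11T19:39:21Z",
    }
    for i in range(100)
]
encoded = [json.dumps(d, sort_keys=True).encode("utf-8") for d in docs]

# Use one sample document as the preset dictionary (an assumption; a real
# scheme would need some way to choose and store the dictionary).
shared_dict = encoded[0]


def compress_alone(payload: bytes) -> bytes:
    return zlib.compress(payload)


def compress_with_dict(payload: bytes, zdict: bytes) -> bytes:
    compressor = zlib.compressobj(zdict=zdict)
    return compressor.compress(payload) + compressor.flush()


def decompress_with_dict(blob: bytes, zdict: bytes) -> bytes:
    decompressor = zlib.decompressobj(zdict=zdict)
    return decompressor.decompress(blob) + decompressor.flush()


alone_total = sum(len(compress_alone(p)) for p in encoded[1:])
dict_total = sum(len(compress_with_dict(p, shared_dict)) for p in encoded[1:])
print(alone_total, dict_total)
```

Each document is too small to contain much redundancy on its own, so
per-document compression gains little; nearly all of the savings come
from what the documents have in common.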

[1] http://www.vldb.org/conf/1999/P7.pdf , "We also forced each key
and child pointer to be adjacent to each other physically"
--
Peter Geoghegan
