Re: jsonb format is pessimal for toast compression

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Claudio Freire <klaussfreire(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "David E(dot) Wheeler" <david(at)justatheory(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Jan Wieck <jan(at)wi3ck(dot)info>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-09-16 16:47:39
Message-ID: 541869AB.4080506@agliodbs.com
Lists: pgsql-hackers

On 09/16/2014 06:31 AM, Robert Haas wrote:
> On Mon, Sep 15, 2014 at 7:44 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:
>> On Mon, Sep 15, 2014 at 4:05 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>>> Actually, having the keys all at the same level *is* relevant for the
>>> issue we're discussing. If those 270 keys are organized in a tree, it's
>>> not the same as having them all on one level (and not as problematic).
>>
>> I believe Robert meant that the 270 keys are not at the top level, but
>> are at some level (in other words, some object has 270 pairs). That is
>> equivalent to having them at the top level for the purposes of this
>> discussion.
>
> Yes, that's exactly what I meant.
>
>> FWIW, I am slightly concerned about weighing use cases around very
>> large JSON documents too heavily. Having enormous jsonb documents just
>> isn't going to work out that well, but neither will equivalent designs
>> in popular document database systems for similar reasons. For example,
>> the maximum BSON document size supported by MongoDB is 16 megabytes,
>> and that seems to be something that their users don't care too much
>> about. Having 270 pairs in an object isn't unreasonable, but it isn't
>> going to be all that common either.

Well, I can only judge from the use cases I personally have, none of
which involve more than 100 keys at any one level for most rows. So far
I've seen people argue hypothetical use cases involving hundreds of
keys per level, but nobody who *actually* has such a use case. Also,
note that we don't yet know at what key count "last value" extraction
becomes a performance problem, only that it's somewhere between 200 and
100,000 keys. Nor do we have a test showing the hybrid approach
(Heikki's patch) performing better with thousands of keys.
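
For the record, the kind of probe I have in mind is trivial to gin up.
Here's a rough sketch (the key count and repetition count are arbitrary
placeholders; vary them to find the knee in the curve):

    -- Build a single-level jsonb object with 10,000 keys (k0 .. k9999).
    -- json_object_agg() is new in 9.4; cast the result to jsonb.
    CREATE TEMP TABLE jb_probe AS
    SELECT json_object_agg('k' || i, i)::jsonb AS doc
    FROM generate_series(0, 9999) AS i;

    -- With \timing on in psql, fetch the last key repeatedly, then
    -- rebuild at different key counts to see where extraction degrades.
    \timing on
    SELECT count(doc -> 'k9999')
    FROM jb_probe, generate_series(1, 1000) AS rep;
    \timing off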

Basically, if someone is going to make a serious case for Heikki's
hybrid approach over the simpler lengths-only approach, then please post
some test data showing the benefit ASAP, since I can't demonstrate it.
Otherwise, let's get beta 3 out the door so we can get the 9.4 release
train moving again.
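
And for the compression side of the question, the same generated data
makes the size comparison easy; run something like this sketch against
each build (lengths-only vs. hybrid) and compare. Again, the key count
and value payload here are just placeholders:

    -- Generate a document big enough to be toasted, then compare its
    -- stored (possibly compressed) size with its raw text length.
    CREATE TEMP TABLE jb_size AS
    SELECT json_object_agg('k' || i, md5(i::text))::jsonb AS doc
    FROM generate_series(0, 9999) AS i;

    SELECT pg_column_size(doc) AS stored_bytes,
           length(doc::text)   AS text_bytes
    FROM jb_size;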

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
