Re: jsonb format is pessimal for toast compression

From: Larry White <ljw1001(at)gmail(dot)com>
To: obartunov(at)gmail(dot)com
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Bruce Momjian <bruce(at)momjian(dot)us>, Peter Geoghegan <pg(at)heroku(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-14 18:49:35
Message-ID: CAMdbzVgDx4Lx+bHpazEKGBwQURmegX1-_Gr05R5c-EpDrZ3+EA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I attached a json file of approximately 513K. It contains two repetitions
of a single json structure. The values are quasi-random. It might make a
decent test case of meaningfully sized data.

best

On Thu, Aug 14, 2014 at 2:25 PM, Oleg Bartunov <obartunov(at)gmail(dot)com> wrote:

> I did quick test on the same bookmarks to test performance of 9.4beta2 and
> 9.4beta2+patch
>
> The query was the same we used in pgcon presentation:
> SELECT count(*) FROM jb WHERE jb @> '{"tags":[{"term":"NYC"}]}'::jsonb;
>
> table size | time (ms)
> 9.4beta2: 1374 Mb | 1160
> 9.4beta2+patch: 1373 Mb | 1213
>
>
> Yes, performance degrades, but not much. There is also small win in
> table size, but bookmarks are not big, so it's difficult to say about
> compression.
>
> Oleg
>
>
>
> On Thu, Aug 14, 2014 at 9:57 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Bruce Momjian <bruce(at)momjian(dot)us> writes:
>> > On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote:
>> >> This gets back to the problem of what test case are we going to
>> consider
>> >> while debating what solution to adopt.
>>
>> > Uh, we just one need one 12k JSON document from somewhere. Clearly this
>> > is something we can easily get.
>>
>> I would put little faith in a single document as being representative.
>>
>> To try to get some statistics about a real-world case, I looked at the
>> delicio.us dataset that someone posted awhile back (1252973 JSON docs).
>> These have a minimum length (in text representation) of 604 bytes and
>> a maximum length of 5949 bytes, which means that they aren't going to
>> tell us all that much about large JSON docs, but this is better than
>> no data at all.
>>
>> Since documents of only a couple hundred bytes aren't going to be subject
>> to compression, I made a table of four columns each containing the same
>> JSON data, so that each row would be long enough to force the toast logic
>> to try to do something. (Note that none of these documents are anywhere
>> near big enough to hit the refuses-to-compress problem.) Given that,
>> I get the following statistics for pg_column_size():
>>
>> min max avg
>>
>> JSON (text) representation 382 1155 526.5
>>
>> HEAD's JSONB representation 493 1485 695.1
>>
>> all-lengths representation 440 1257 615.3
>>
>> So IOW, on this dataset the existing JSONB representation creates about
>> 32% bloat compared to just storing the (compressed) user-visible text,
>> and switching to all-lengths would about halve that penalty.
>>
>> Maybe this is telling us it's not worth changing the representation,
>> and we should just go do something about the first_success_by threshold
>> and be done. I'm hesitant to draw such conclusions on the basis of a
>> single use-case though, especially one that doesn't really have that
>> much use for compression in the first place. Do we have other JSON
>> corpuses to look at?
>>
>> regards, tom lane
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>>
>
>

Attachment Content-Type Size
random.json.zip application/zip 45.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2014-08-14 18:55:29 Re: B-Tree support function number 3 (strxfrm() optimization)
Previous Message Andres Freund 2014-08-14 18:40:05 Re: Function to know last log write timestamp