From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Peter Geoghegan <pg(at)heroku(dot)com> |
Cc: | Bruce Momjian <bruce(at)momjian(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Larry White <ljw1001(at)gmail(dot)com> |
Subject: | Re: jsonb format is pessimal for toast compression |
Date: | 2014-08-14 21:47:57 |
Message-ID: | 24077.1408052877@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Peter Geoghegan <pg(at)heroku(dot)com> writes:
> On Thu, Aug 14, 2014 at 10:57 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Maybe this is telling us it's not worth changing the representation,
>> and we should just go do something about the first_success_by threshold
>> and be done. I'm hesitant to draw such conclusions on the basis of a
>> single use-case though, especially one that doesn't really have that
>> much use for compression in the first place. Do we have other JSON
>> corpuses to look at?
> Yes. Pavel posted some representative JSON data a while back:
> http://pgsql.cz/data/data.dump.gz (it's a plain dump)
I did some quick stats on that. 206560 rows:

Representation | min | max | avg
---|---|---|---
external text representation | 220 | 172685 | 880.3
JSON representation (compressed text) | 224 | 78565 | 541.3
pg_column_size, JSONB HEAD repr. | 225 | 82540 | 639.0
pg_column_size, all-lengths repr. | 225 | 66794 | 531.1

So in this data, there definitely is some scope for compression:
just compressing the text gets about 38% savings. The all-lengths
hack is able to beat that slightly (about 40%), but the all-offsets
format currently in HEAD is well behind at 27%.
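
The savings figures quoted above can be rechecked from the average sizes in the stats. The following is a minimal sketch: the variable and function names are illustrative only, and it assumes "savings" means the reduction in average size relative to the external text representation.

```python
# Average sizes as reported in the stats above.
AVG_EXTERNAL_TEXT = 880.3

avg_sizes = {
    "compressed text": 541.3,
    "JSONB HEAD (all-offsets)": 639.0,
    "all-lengths": 531.1,
}


def savings_pct(avg_size, baseline=AVG_EXTERNAL_TEXT):
    """Percent saved relative to the external text representation."""
    return 100.0 * (1.0 - avg_size / baseline)


# Hypothetical helper dict: representation name -> percent saved.
savings = {name: savings_pct(size) for name, size in avg_sizes.items()}

for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% smaller than external text")
```

Under that assumption, the compressed text and all-lengths averages land within about a point of each other, while the HEAD representation saves noticeably less.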
Not sure what to conclude. It looks from both these examples like
we're talking about a 10 to 20 percent size penalty for JSON objects
that are big enough to need compression. Is that beyond our threshold
of pain? I'm not sure, but there is definitely room to argue that the
extra I/O costs will swamp any savings we get from faster access to
individual fields or array elements.
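
For this dataset, the size penalty can be estimated from the same averages. This is only a rough sketch: it assumes the penalty is the average size of the HEAD representation relative to each alternative, and it covers just the one corpus measured here, not the earlier example referred to above. The names are illustrative.

```python
def size_penalty_pct(size, reference):
    """Percent by which `size` exceeds `reference`."""
    return 100.0 * (size / reference - 1.0)


# Average sizes from the stats above.
head_avg = 639.0             # pg_column_size, JSONB HEAD repr.
compressed_text_avg = 541.3  # JSON representation (compressed text)
all_lengths_avg = 531.1      # pg_column_size, all-lengths repr.

penalty_vs_text = size_penalty_pct(head_avg, compressed_text_avg)
penalty_vs_lengths = size_penalty_pct(head_avg, all_lengths_avg)

print(f"HEAD vs compressed text: {penalty_vs_text:.1f}% larger")
print(f"HEAD vs all-lengths:     {penalty_vs_lengths:.1f}% larger")
```

Both figures sit near the upper end of the 10 to 20 percent range mentioned above.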
regards, tom lane