Re: jsonb format is pessimal for toast compression

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Peter Geoghegan <pg(at)heroku(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-15 01:28:08
Message-ID: 53ED6228.2060507@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


> Before changing to to INT_MAX:
>
> thetype | colsize_distribution
> ---------+----------------------------
> json | {1741,1767,1854,1904,2292}
> jsonb | {3551,5866,5910,5958,6168}
>
> After:
>
> thetype | colsize_distribution
> ---------+----------------------------
> json | {1741,1767,1854,1904,2292}
> jsonb | {3515,3543,3636,3690,4038}
>
> So that did improve things, just not as much as we'd like.

And with Tom's test patch:

postgres=# select pg_size_pretty(pg_total_relation_size('jsonic'));

pg_size_pretty
----------------
394 MB
(1 row)

postgres=# select pg_size_pretty(pg_total_relation_size('jsonbish'));
pg_size_pretty
----------------
541 MB
(1 row)

thetype | colsize_distribution
---------+----------------------------
json | {1741,1767,1854,1904,2292}
jsonb | {2037,2114,2288,2348,2746}

Since that improved things a *lot*, just +40% instead of +200%, I
thought I'd test some select queries. I decided to test a GIN lookup
and value extraction, since indexed lookup is really what I care about.

9.4b2 no patches:

postgres=# explain analyze select row_to_json -> 'kt1_total_sum' from
jsonbish where row_to_json @> '{ "rpt_per_dt" : "2003-06-30" }';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on jsonbish (cost=29.55..582.92 rows=200 width=18)
(actual time=20.814..2845.454 rows=100423 loops=1)
Recheck Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
Heap Blocks: exact=1471
-> Bitmap Index Scan on jsonbish_row_to_json_idx (cost=0.00..29.50
rows=200 width=0) (actual time=20.551..20.551 rows=100423 loops=1)
Index Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
Planning time: 0.102 ms
Execution time: 2856.179 ms

9.4b2 TL patch:

postgres=# explain analyze select row_to_json -> 'kt1_total_sum' from
jsonbish where row_to_json @> '{ "rpt_per_dt" : "2003-06-30" }';
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on jsonbish (cost=29.55..582.92 rows=200 width=18)
(actual time=24.071..5201.687 rows=100423 loops=1)
Recheck Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
Heap Blocks: exact=1471
-> Bitmap Index Scan on jsonbish_row_to_json_idx (cost=0.00..29.50
rows=200 width=0) (actual time=23.779..23.779 rows=100423 loops=1)
Index Cond: (row_to_json @> '{"rpt_per_dt": "2003-06-30"}'::jsonb)
Planning time: 0.098 ms
Execution time: 5214.212 ms

... so, an 80% increase in lookup and extraction time for swapping
offsets for lengths. That's actually all extraction time; I tried
removing the extraction from the query, and without it both queries are
close enough to be statstically insignificant.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2014-08-15 02:16:15 run xmllint during build (was Re: need xmllint on borka)
Previous Message Josh Berkus 2014-08-15 01:03:57 Re: Reporting the commit LSN at commit time