Re: jsonb format is pessimal for toast compression

From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Peter Geoghegan <pg(at)heroku(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-14 18:25:31
Message-ID: CAF4Au4wGH1R1_S9+BKJtY1PYCpcPJRrAon8UF6Z3eHCcKWZr9g@mail.gmail.com
Lists: pgsql-hackers

I did a quick test on the same bookmarks dataset to compare the performance of
9.4beta2 and 9.4beta2+patch.

The query was the same one we used in the PGCon presentation:
SELECT count(*) FROM jb WHERE jb @> '{"tags":[{"term":"NYC"}]}'::jsonb;

                 table size | time (ms)
9.4beta2:           1374 MB |      1160
9.4beta2+patch:     1373 MB |      1213

Yes, performance degrades, but not by much. There is also a small win in table
size, but the bookmarks documents are not big, so it's difficult to draw
conclusions about compression.
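For reference, a sketch of how these numbers can be gathered (the table name
jb is taken from the query above; everything else is an illustrative
assumption, not a transcript of the actual session):

```sql
-- Sketch: assumes a table jb with a single jsonb column jb holding
-- the bookmarks documents.
\timing on

-- Total on-disk size of the table, including its TOAST relation.
SELECT pg_size_pretty(pg_total_relation_size('jb'));

-- The containment query whose runtime is reported above.
SELECT count(*) FROM jb WHERE jb @> '{"tags":[{"term":"NYC"}]}'::jsonb;
```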

Oleg

On Thu, Aug 14, 2014 at 9:57 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > On Thu, Aug 14, 2014 at 12:22:46PM -0400, Tom Lane wrote:
> >> This gets back to the problem of what test case are we going to consider
> >> while debating what solution to adopt.
>
> > Uh, we just need one 12k JSON document from somewhere. Clearly this
> > is something we can easily get.
>
> I would put little faith in a single document as being representative.
>
> To try to get some statistics about a real-world case, I looked at the
> delicio.us dataset that someone posted awhile back (1252973 JSON docs).
> These have a minimum length (in text representation) of 604 bytes and
> a maximum length of 5949 bytes, which means that they aren't going to
> tell us all that much about large JSON docs, but this is better than
> no data at all.
>
> Since documents of only a couple hundred bytes aren't going to be subject
> to compression, I made a table of four columns each containing the same
> JSON data, so that each row would be long enough to force the toast logic
> to try to do something. (Note that none of these documents are anywhere
> near big enough to hit the refuses-to-compress problem.) Given that,
> I get the following statistics for pg_column_size():
>
>                                   min     max      avg
> JSON (text) representation        382    1155    526.5
> HEAD's JSONB representation       493    1485    695.1
> all-lengths representation        440    1257    615.3
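> [A query along these lines would produce such per-column statistics
> (a sketch; the table and column names are illustrative assumptions,
> not from the original message):
>
> ```sql
> -- Assumes a table docs4 with four identical jsonb columns j1..j4;
> -- pg_column_size() reports the stored (possibly compressed) size.
> SELECT min(pg_column_size(j1)),
>        max(pg_column_size(j1)),
>        round(avg(pg_column_size(j1)), 1)
> FROM docs4;
> ```
> ]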
>
> So IOW, on this dataset the existing JSONB representation creates about
> 32% bloat compared to just storing the (compressed) user-visible text,
> and switching to all-lengths would about halve that penalty.
>
> Maybe this is telling us it's not worth changing the representation,
> and we should just go do something about the first_success_by threshold
> and be done. I'm hesitant to draw such conclusions on the basis of a
> single use-case though, especially one that doesn't really have that
> much use for compression in the first place. Do we have other JSON
> corpuses to look at?
>
> regards, tom lane
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
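The percentages Tom quotes follow directly from the averages in his table; a
quick check of the arithmetic:

```python
# Average pg_column_size() values from the table above.
text_avg = 526.5     # compressed JSON text representation
jsonb_avg = 695.1    # HEAD's JSONB representation
lengths_avg = 615.3  # all-lengths representation

# Bloat relative to the compressed text representation.
jsonb_bloat = jsonb_avg / text_avg - 1      # about 0.32, i.e. ~32%
lengths_bloat = lengths_avg / text_avg - 1  # about 0.17, roughly half

print(round(jsonb_bloat, 2), round(lengths_bloat, 2))
```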
