Re: jsonb format is pessimal for toast compression

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Peter Geoghegan <pg(at)heroku(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Robert Haas <robertmhaas(at)gmail(dot)com>, Kevin Grittner <kgrittn(at)ymail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Larry White <ljw1001(at)gmail(dot)com>
Subject: Re: jsonb format is pessimal for toast compression
Date: 2014-08-14 15:04:54
Message-ID: 3418.1408028694@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> For comparison, here's a patch that implements the scheme that Alexander
> Korotkov suggested, where we store an offset every 8th element, and a
> length in the others. It compresses Larry's example to 525 bytes.
> Increasing the "stride" from 8 to 16 entries, it compresses to 461 bytes.

> A nice thing about this patch is that it's on-disk compatible with the
> current format, hence initdb is not required.

TBH, I think that's about the only nice thing about it :-(. It's
conceptually a mess. And while I agree that this way avoids creating
a big-O performance issue for large arrays/objects, I think the micro
performance is probably going to be not so good. The existing code is
based on the assumption that JBE_OFF() and JBE_LEN() are negligibly cheap;
but with a solution like this, it's guaranteed that one or the other is
going to be not-so-cheap.

I think if we're going to do anything to the representation at all,
we need to refactor the calling code; at least fixing the JsonbIterator
logic so that it tracks the current data offset rather than expecting to
able to compute it at no cost.

The difficulty in arguing about this is that unless we have an agreed-on
performance benchmark test, it's going to be a matter of unsupported
opinions whether one solution is faster than another. Have we got
anything that stresses key lookup and/or array indexing?

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Davis 2014-08-14 15:39:24 Re: 9.5: Memory-bounded HashAgg
Previous Message Tomas Vondra 2014-08-14 14:17:47 Re: 9.5: Memory-bounded HashAgg