Re: pgsql: Clean up jsonb code.

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
Cc: pgsql-committers <pgsql-committers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: pgsql: Clean up jsonb code.
Date: 2014-05-07 23:25:11
Message-ID: CAM3SWZTVWYaMRWpvRfrDXO_U6V6-ciK1eb=TOoX19_o8UaJ-HQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

Thanks for cleaning this up.

On Wed, May 7, 2014 at 1:18 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)iki(dot)fi> wrote:
> The jsonb_exists_any and jsonb_exists_all functions no longer sort the input
> array. That was a premature optimization, the idea being that if there are
> duplicates in the input array, you only need to check them once. Also,
> sorting the array saves some effort in the binary search used to find a key
> within an object. But there were drawbacks too: the sorting and
> deduplicating obviously isn't free, and in the typical case there are no
> duplicates to remove, and the gain in the binary search was minimal. Remove
> all that, which makes the code simpler too.

This is not the reason why the code did that. De-duplication was not
the point at all. findJsonbValueFromSuperHeader()'s lowbound argument
previously served to establish a low bound for searching when
searching for multiple keys (so the second and subsequent
user-supplied key could skip much of the object). In the case of
jsonb_exists_any(), say, if you only have a reasonable expectation
that about 1 key exists, and that happens to be the last key that the
user passed to the text[] argument (to the existence/? operator), then
n - 1 calls to what is now findJsonbValueFromContainer() (which now
does not accept a lowbound) are wasted. That's elem_count - 1
top-level binary searches of the entire jsonb. Or elem_count such
calls rather than 1 call (plus 1 sort of the supplied array) in the
common case where jsonb_exists_any() will return false.

Granted, that might not be that bad right now, given that it's only
ever (say) elem_count or elem_count - 1 wasted binary searches through
the *top* level, but that might not always be true. And even today,
sorting a presumably much smaller user-passed lookup array once has to
be cheaper than searching through the entire jsonb perhaps elem_count
times per call.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2014-05-08 01:39:31 pgsql: Avoid buffer bloat in libpq when server is consistently faster t
Previous Message Robert Haas 2014-05-07 21:45:26 pgsql: When a background worker exists with code 0, unregister it.

Browse pgsql-hackers by date

  From Date Subject
Next Message Jaime Casanova 2014-05-08 00:00:44 [WIP] showing index maintenance on EXPLAIN
Previous Message Petr Jelinek 2014-05-07 21:58:06 Re: bgworker crashed or not?