From: | Peter Geoghegan <pg(at)heroku(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi> |
Cc: | pgsql-committers <pgsql-committers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com> |
Subject: | Re: pgsql: Clean up jsonb code. |
Date: | 2014-05-07 23:25:11 |
Message-ID: | CAM3SWZTVWYaMRWpvRfrDXO_U6V6-ciK1eb=TOoX19_o8UaJ-HQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-committers pgsql-hackers |
Thanks for cleaning this up.
On Wed, May 7, 2014 at 1:18 PM, Heikki Linnakangas
<heikki(dot)linnakangas(at)iki(dot)fi> wrote:
> The jsonb_exists_any and jsonb_exists_all functions no longer sort the input
> array. That was a premature optimization, the idea being that if there are
> duplicates in the input array, you only need to check them once. Also,
> sorting the array saves some effort in the binary search used to find a key
> within an object. But there were drawbacks too: the sorting and
> deduplicating obviously isn't free, and in the typical case there are no
> duplicates to remove, and the gain in the binary search was minimal. Remove
> all that, which makes the code simpler too.
This is not the reason why the code did that. De-duplication was not
the point at all. findJsonbValueFromSuperHeader()'s lowbound argument
previously served to establish a low bound for searching when
searching for multiple keys (so the second and subsequent
user-supplied key could skip much of the object). In the case of
jsonb_exists_any(), say, if you only have a reasonable expectation
that about 1 key exists, and that happens to be the last key that the
user passed to the text[] argument (to the existence/? operator), then
n - 1 calls to what is now findJsonbValueFromContainer() (which now
does not accept a lowbound) are wasted. That's elem_count - 1
top-level binary searches of the entire jsonb. Or elem_count such
calls rather than 1 call (plus 1 sort of the supplied array) in the
common case where jsonb_exists_any() will return false.
Granted, that might not be that bad right now, given that it's only
ever (say) elem_count or elem_count - 1 wasted binary searches through
the *top* level, but that might not always be true. And even today,
sorting a presumably much smaller user-passed lookup array once has to
be cheaper than searching through the entire jsonb perhaps elem_count
times per call.
--
Peter Geoghegan
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2014-05-08 01:39:31 | pgsql: Avoid buffer bloat in libpq when server is consistently faster t |
Previous Message | Robert Haas | 2014-05-07 21:45:26 | pgsql: When a background worker exists with code 0, unregister it. |
From | Date | Subject | |
---|---|---|---|
Next Message | Jaime Casanova | 2014-05-08 00:00:44 | [WIP] showing index maintenance on EXPLAIN |
Previous Message | Petr Jelinek | 2014-05-07 21:58:06 | Re: bgworker crashed or not? |