Re: [COMMITTERS] pgsql: Clean up jsonb code.

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Peter Geoghegan <pg(at)heroku(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <obartunov(at)gmail(dot)com>
Subject: Re: [COMMITTERS] pgsql: Clean up jsonb code.
Date: 2014-05-09 17:27:34
Message-ID: CAPpHfdvhWPm82t6T3a-dZAYM1oVoo+kK4R_5T_HqZ4cqCUGEgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Thu, May 8, 2014 at 11:45 AM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 05/08/2014 02:25 AM, Peter Geoghegan wrote:
>
>> findJsonbValueFromSuperHeader()'s lowbound argument
>> previously served to establish a low bound for searching when
>> searching for multiple keys (so the second and subsequent
>> user-supplied key could skip much of the object).
>>
>
> Got that.
>
> In the case of
>> jsonb_exists_any(), say, if you only have a reasonable expectation
>> that about 1 key exists, and that happens to be the last key that the
>> user passed to the text[] argument (to the existence/? operator), then
>> n - 1 calls to what is now findJsonbValueFromContainer() (which now
>> does not accept a lowbound) are wasted.
>>
>
> Check.
>
> That's elem_count - 1
>> top-level binary searches of the entire jsonb. Or elem_count such
>> calls rather than 1 call (plus 1 sort of the supplied array) in the
>> common case where jsonb_exists_any() will return false.
>>
>
> Ok, but I don't see any big difference in that regard. It still called
> findJsonbValueFromContainer() elem_count times, before this commit. Each
> call was somewhat cheaper, because the lower bound of the binary search was
> initialized to where the previous search ended, but you still had to
> perform the search.
>
> Granted, that might not be that bad right now, given that it's only
>> ever (say) elem_count or elem_count - 1 wasted binary searches through
>> the *top* level, but that might not always be true.
>>
>
> If we are ever to perform a deep search, I think we'll want to do much
> more to optimize that than just keep track of the lower bound. Like, start
> an iterator of tree and check for all of the keys in one go.
>
> And even today,
>> sorting a presumably much smaller user-passed lookup array once has to
>> be cheaper than searching through the entire jsonb perhaps elem_count
>> times per call.
>>
>
> Well, the quick testing I did suggested otherwise. It's not a big
> difference, but sorting has all kinds of overhead, like pallocing the array
> to sort, copying the elements around etc. For a small array, the startup
> cost of sorting trumps the savings in the binary searches. Possibly the way
> the sorting was done was not optimal, and you could reduce the copying and
> allocations involved in that, but it's hardly worth the trouble.

With current head I can't load delicious dataset into jsonb format. I got
segfault. It looks like memory corruption.

$ gzip -c -d js.copy.gz | psql postgres -c 'copy js from stdin;'
WARNING: problem in alloc set ExprContext: bogus aset link in block
0x14a846000, chunk 0x14a84d278
CONTEXT: COPY js, line 246766
WARNING: problem in alloc set ExprContext: bad single-chunk 0x14a804b18 in
block 0x14a7e6000
CONTEXT: COPY js, line 1009820
WARNING: problem in alloc set ExprContext: bogus aset link in block
0x14a7e6000, chunk 0x14a804b18
CONTEXT: COPY js, line 1009820
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

You can get dataset here:
http://mira.sai.msu.ru/~megera/tmp/js.copy.gz

------
With best regards,
Alexander Korotkov.

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Peter Geoghegan 2014-05-09 18:27:40 Re: [COMMITTERS] pgsql: Clean up jsonb code.
Previous Message Tom Lane 2014-05-09 16:55:48 pgsql: Get rid of bogus dependency on typcategory in to_json() and frie

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-05-09 17:29:03 pg_class.relpages/allvisible probably shouldn't be a int4
Previous Message Tom Lane 2014-05-09 16:59:09 Re: PQputCopyEnd doesn't adhere to its API contract