Re: jsonb access operators inefficiency

Lists: pgsql-hackers
From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: jsonb access operators inefficiency
Date: 2014-05-30 13:35:22
Message-ID: 5388891A.2030709@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Hi!

jsonb operators ->text, ->>text,->int, ->>int use inefficient methods to access
to needed field, proportional O(N/2). Attached patch suggests for text operators
O(log(N)) and for integer - O(1). The fuctions with fast access already are
existed in current code and are used in contains operation, for example.
Attached patch uses that functions instead of sequentual loop over object/array.
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

Attachment Content-Type Size
jsonb_access.patch.gz application/x-tar 1.4 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb access operators inefficiency
Date: 2014-05-30 15:08:21
Message-ID: 53889EE5.3090805@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 05/30/2014 09:35 AM, Teodor Sigaev wrote:
> Hi!
>
> jsonb operators ->text, ->>text,->int, ->>int use inefficient methods
> to access to needed field, proportional O(N/2). Attached patch
> suggests for text operators O(log(N)) and for integer - O(1). The
> fuctions with fast access already are existed in current code and are
> used in contains operation, for example. Attached patch uses that
> functions instead of sequentual loop over object/array.

Teodor, thanks for this.

My only worry is this sort of code, which in a quick search I didn't
find used elsewhere

- (void) JsonbToCString(jtext, &tjb->root , -1);
- result = cstring_to_text_with_len(jtext->data,
jtext->len);
+ appendStringInfoSpaces(jtext, VARHDRSZ);
+ (void) JsonbToCString(jtext, v->val.binary.data,
-1);
+
+ result = (text*)jtext->data;
+ SET_VARSIZE(result, jtext->len);

If we're going to construct varlena objects inside a StringInfo, maybe
we need a proper API for it. Isn't there a danger that data member of
the StringInfo won't be properly aligned to allow us to do this? In any
case, we should get most of the benefit of your patch without this
"optimization".

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb access operators inefficiency
Date: 2014-05-30 16:22:55
Message-ID: 5388B05F.8050203@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 05/30/2014 11:08 AM, Andrew Dunstan wrote:
>
> On 05/30/2014 09:35 AM, Teodor Sigaev wrote:
>> Hi!
>>
>> jsonb operators ->text, ->>text,->int, ->>int use inefficient methods
>> to access to needed field, proportional O(N/2). Attached patch
>> suggests for text operators O(log(N)) and for integer - O(1). The
>> fuctions with fast access already are existed in current code and are
>> used in contains operation, for example. Attached patch uses that
>> functions instead of sequentual loop over object/array.
>
> Teodor, thanks for this.
>
> My only worry is this sort of code, which in a quick search I didn't
> find used elsewhere
>
> - (void) JsonbToCString(jtext, &tjb->root , -1);
> - result = cstring_to_text_with_len(jtext->data,
> jtext->len);
> + appendStringInfoSpaces(jtext, VARHDRSZ);
> + (void) JsonbToCString(jtext, v->val.binary.data,
> -1);
> +
> + result = (text*)jtext->data;
> + SET_VARSIZE(result, jtext->len);
>
>
> If we're going to construct varlena objects inside a StringInfo, maybe
> we need a proper API for it. Isn't there a danger that data member of
> the StringInfo won't be properly aligned to allow us to do this? In
> any case, we should get most of the benefit of your patch without this
> "optimization".

I see that palloc.h says:

The result of palloc() is always word-aligned

so maybe my alignment fear is misplaced. So my remaining question is
whether this is OK stylistically.

cheers

andrew


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb access operators inefficiency
Date: 2014-05-30 16:27:48
Message-ID: 5388B184.205@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

>
> If we're going to construct varlena objects inside a StringInfo, maybe
> we need a proper API for it. Isn't there a danger that data member of
> the StringInfo won't be properly aligned to allow us to do this? In any
> case, we should get most of the benefit of your patch without this
> "optimization".

I believe that StringInfo->data is palloc'ed, it means it's MAXALIGNed.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb access operators inefficiency
Date: 2014-05-30 16:45:26
Message-ID: 5388B5A6.8060002@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> I see that palloc.h says:
>
> The result of palloc() is always word-aligned

void *
repalloc(void *pointer, Size size)
{
...
/*
* Try to detect bogus pointers handed to us, poorly though we can.
* Presumably, a pointer that isn't MAXALIGNED isn't pointing at an
* allocated chunk.
*/
Assert(pointer != NULL);
Assert(pointer == (void *) MAXALIGN(pointer));
...

>
>
> so maybe my alignment fear is misplaced. So my remaining question is
> whether this is OK stylistically.

Something like this?
initStringInfoVarlena()/makeStringInfoVarlena()
{
initStringInfo()
appendStringInfoSpaces(jtext, VARHDRSZ);
}

char*
formStringInfoVarlena()
{
SET_VARSIZE(jtext->data, jtext->len);
return jtext->data;
}

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Oleg Bartunov <obartunov(at)gmail(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb access operators inefficiency
Date: 2014-05-30 17:30:05
Message-ID: CAF4Au4xsQNASJU6LkohQWv55EyBYNi0LEX0kZm3QEkVBdmy_mw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

The patch really improves access performance to jsonb. On the
delicious bookmarks I got 5 times better performance.Now jsonb
outperforms json on simple access (slide 12 of pgcon presentation) by
103 times !

Oleg

On Fri, May 30, 2014 at 9:35 AM, Teodor Sigaev <teodor(at)sigaev(dot)ru> wrote:
> Hi!
>
> jsonb operators ->text, ->>text,->int, ->>int use inefficient methods to
> access to needed field, proportional O(N/2). Attached patch suggests for
> text operators O(log(N)) and for integer - O(1). The fuctions with fast
> access already are existed in current code and are used in contains
> operation, for example. Attached patch uses that functions instead of
> sequentual loop over object/array.
> --
> Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
> WWW:
> http://www.sigaev.ru/
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: obartunov(at)gmail(dot)com
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb access operators inefficiency
Date: 2014-05-30 17:39:17
Message-ID: 5388C245.90306@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 05/30/2014 01:30 PM, Oleg Bartunov wrote:
> The patch really improves access performance to jsonb. On the
> delicious bookmarks I got 5 times better performance.Now jsonb
> outperforms json on simple access (slide 12 of pgcon presentation) by
> 103 times !
>
>

(Oleg, please try not to top-post)

The question is whether the speedup comes from the reduction in lookup
times or the reduction in string copying. I have a strong suspicion that
it's mostly the first, not the second.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb access operators inefficiency
Date: 2014-05-30 17:41:56
Message-ID: 6302.1401471716@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> If we're going to construct varlena objects inside a StringInfo, maybe
> we need a proper API for it. Isn't there a danger that data member of
> the StringInfo won't be properly aligned to allow us to do this? In any
> case, we should get most of the benefit of your patch without this
> "optimization".

As noted, the data buffer is maxaligned; but nonetheless I agree that
this is a serious stylistic abuse, and it's not buying much compared
to the rest of the patch. I'd stick with the cstring_to_text_with_len
coding.

At the other end of the process, why are we using PG_GETARG_TEXT_P
and not PG_GETARG_TEXT_PP to avoid a "detoast" cycle on short-header
inputs? The function body is using VARDATA_ANY/VARSIZE_ANY_EXHDR,
so it's already prepared for unaligned input.

regards, tom lane


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>, obartunov(at)gmail(dot)com
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb access operators inefficiency
Date: 2014-05-30 17:47:06
Message-ID: 5388C41A.4000109@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

> The question is whether the speedup comes from the reduction in lookup
> times or the reduction in string copying. I have a strong suspicion that
> it's mostly the first, not the second.

Agree with about Oleg's test, but some users put huge values into
json... Actually, I don't insist, ITSM, that's easy way to prevent extra
copying.

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: jsonb access operators inefficiency
Date: 2014-06-01 23:13:47
Message-ID: 538BB3AB.5070603@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


On 05/30/2014 01:41 PM, Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
>> If we're going to construct varlena objects inside a StringInfo, maybe
>> we need a proper API for it. Isn't there a danger that data member of
>> the StringInfo won't be properly aligned to allow us to do this? In any
>> case, we should get most of the benefit of your patch without this
>> "optimization".
> As noted, the data buffer is maxaligned; but nonetheless I agree that
> this is a serious stylistic abuse, and it's not buying much compared
> to the rest of the patch. I'd stick with the cstring_to_text_with_len
> coding.
>
> At the other end of the process, why are we using PG_GETARG_TEXT_P
> and not PG_GETARG_TEXT_PP to avoid a "detoast" cycle on short-header
> inputs? The function body is using VARDATA_ANY/VARSIZE_ANY_EXHDR,
> so it's already prepared for unaligned input.
>
>

Committed with these changes.

cheers

andrew