Re: GIN improvements part2: fast scan

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part2: fast scan
Date: 2014-02-26 21:07:21
Message-ID: CAPpHfdvUmurpcqQ83vWr-SKAdWzrOMxH+_XDQ-A=9bQTpjrnxg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Feb 20, 2014 at 1:48 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 02/09/2014 12:11 PM, Alexander Korotkov wrote:
>
>> I've rebased catalog changes with last master. Patch is attached. I've
>> rerun my test suite with both last master ('committed') and attached
>> patch ('ternary-consistent').
>>
>
> Thanks!
>
>
> method | sum
>> ------------------------+------------------
>> committed | 143491.715000001
>> fast-scan-11 | 126916.111999999
>> fast-scan-light | 137321.211
>> fast-scan-light-heikki | 138168.028000001
>> master | 446976.288
>> ternary-consistent | 125923.514
>>
>> I explain regression in last master by change of MAX_MAYBE_ENTRIES from 8
>> to 4.
>>
>
> Yeah, probably. I set MAX_MAYBE_ENTRIES to 8 in initial versions to make
> sure we get similar behavior in Tomas' tests that used 6 search terms. But
> I always felt that it was too large for real queries, once we have the
> catalog changes, that's why I lowered to 4 when committing. If an opclass
> benefits greatly from fast scan, it should provide the ternary consistent
> function, and not rely on the shim implementation.
>
>
> I'm not sure about decision to reserve separate procedure number for
>> ternary consistent. Probably, it would be better to add ginConfig method.
>> It would be useful for post 9.4 improvements.
>>
>
> Hmm, it might be useful for an opclass to provide both, a boolean and
> ternary consistent function, if the boolean version is significantly more
> efficient when all the arguments are TRUE/FALSE. OTOH, you could also do a
> quick check through the array to see if there are any MAYBE arguments,
> within the consistent function. But I'm inclined to keep the possibility to
> provide both versions. As long as we support the boolean version at all,
> there's not much difference in terms of the amount of code to support
> having them both for the same opclass.
>
> A ginConfig could be useful for many other things, but I don't think it's
> worth adding it now.
>
>
> What's the difference between returning GIN_MAYBE and GIN_TRUE+recheck? We
> discussed that earlier, but didn't reach any conclusion. That needs to be
> clarified in the docs. One possibility is to document that they're
> equivalent. Another is to forbid one of them. Yet another is to assign a
> different meaning to each.
>
> I've been thinking that it might be useful to define them so that a MAYBE
> result from the tri-consistent function means that it cannot decide if you
> have a match or not, because some of the inputs were MAYBE. And
> TRUE+recheck means that even if all the MAYBE inputs were passed as TRUE or
> FALSE, the result would be the same, TRUE+recheck. The practical difference
> would be that if the tri-consistent function returns TRUE+recheck, ginget.c
> wouldn't need to bother fetching the other entries, it could just return
> the entry with recheck=true immediately. While with MAYBE result, it would
> fetch the other entries and call tri-consistent again. ginget.c doesn't
> currently use the tri-consistent function that way - it always fetches all
> the entries for a potential match before calling tri-consistent, but it
> could. I had it do that in some of the patch versions, but Tomas' testing
> showed that it was a big loss on some queries, because the consistent
> function was called much more often. Still, something like that might be
> sensible in the future, so it might be good to distinguish those cases in
> the API now. Note that ginarrayproc is already using the return values like
> that: in GinContainedStrategy, it always returns TRUE+recheck regardless of
> the inputs, but in other cases it uses GIN_MAYBE.

Next revision of patch is attached.

In this version opclass should provide at least one consistent function:
binary or ternary. It's expected to achieve best performance when opclass
provide both of them. However, tests shows opposite :( I've to recheck it
carefully.

I've removed recheck flag from ternary consistent function. It have to
return GIN_MAYBE instead.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gin-ternary-consistent-2.patch application/octet-stream 32.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dimitri Fontaine 2014-02-26 21:09:14 Re: extension_control_path
Previous Message Stephen Frost 2014-02-26 21:01:15 Re: extension_control_path