Re: GIN improvements part2: fast scan

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part2: fast scan
Date: 2013-11-20 20:14:11
Message-ID: CAPpHfdsEKFmrOU51EWSbF-RDESasSVa2YjoyFoG+0jCZOn1qjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 20, 2013 at 3:06 AM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:

> On Fri, Nov 15, 2013 at 11:19 AM, Alexander Korotkov <aekorotkov(at)gmail(dot)com
> > wrote:
>
>> On Fri, Nov 15, 2013 at 12:34 AM, Heikki Linnakangas <
>> hlinnakangas(at)vmware(dot)com> wrote:
>>
>>> On 14.11.2013 19:26, Alexander Korotkov wrote:
>>>
>>>> On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <
>>>> hlinnakangas(at)vmware(dot)com
>>>>
>>>>> wrote:
>>>>>
>>>>
>>>> On 28.06.2013 22:31, Alexander Korotkov wrote:
>>>>>
>>>>> Now, I got the point of three state consistent: we can keep only one
>>>>>> consistent in opclasses that support new interface. exact true and
>>>>>> exact
>>>>>> false values will be passed in the case of current patch consistent;
>>>>>> exact
>>>>>> false and unknown will be passed in the case of current patch
>>>>>> preConsistent. That's reasonable.
>>>>>>
>>>>>>
>>>>> I'm going to mark this as "returned with feedback". For the next
>>>>> version,
>>>>> I'd like to see the API changed per above. Also, I'd like us to do
>>>>> something about the tidbitmap overhead, as a separate patch before
>>>>> this, so
>>>>> that we can assess the actual benefit of this patch. And a new test
>>>>> case
>>>>> that demonstrates the I/O benefits.
>>>>>
>>>>
>>>>
>>>> Revised version of patch is attached.
>>>> Changes are so:
>>>> 1) Patch rebased against packed posting lists, not depends on additional
>>>> information now.
>>>> 2) New API with tri-state logic is introduced.
>>>>
>>>
>>> Thanks! A couple of thoughts after a 5-minute glance:
>>>
>>> * documentation
>>>
>>
>> Will provide documented version this week.
>>
>>
>>> * How about defining the tri-state consistent function to also return a
>>> tri-state? True would mean that the tuple definitely matches, false means
>>> the tuple definitely does not match, and Unknown means it might match. Or
>>> does return value true with recheck==true have the same effect? If I
>>> understood the patch, right, returning Unknown or True wouldn't actually
>>> make any difference, but it's conceivable that we might come up with more
>>> optimizations in the future that could take advantage of that. For example,
>>> for a query like "foo OR (bar AND baz)", you could immediately return any
>>> tuples that match foo, and not bother scanning for bar and baz at all.
>>
>>
>> The meaning of recheck flag when input contains unknown is undefined now.
>> :)
>> For instance, we could define it in following ways:
>> 1) Like returning Unknown meaning that consistent with true of false
>> instead of input Unknown could return either true or false.
>> 2) Consistent with true of false instead of input Unknown could return
>> recheck. This meaning is probably logical, but I don't see any usage of it.
>>
>> I'm not against idea of tri-state returning value for consisted, because
>> it's logical continuation of its tri-state input. However, I don't see
>> usage of distinguish True and Unknown in returning value for now :)
>>
>> In example you give we can return foo immediately, but we have to create
>> full bitmap. So we anyway will have to scan (bar AND baz). We could skip
>> part of trees for bar and baz. But it's possible only when foo contains
>> large amount of sequential TIDS so we can be sure that we didn't miss any
>> TIDs. This seems to be very narrow use-case for me.
>>
>> Another point is that one day we probably could immediately return tuples
>> in gingettuple. And with LIMIT clause and no sorting we can don't search
>> for other tuples. However, gingettuple was removed because of reasons of
>> concurrency. And my patches for index-based ordering didn't return it in
>> previous manner: it collects all the results and then returns them
>> one-by-one.
>>
>
> I'm trying to make fastscan work with GinFuzzySearchLimit. Then I figure
> out that I don't understand how GinFuzzySearchLimit works. Why with
> GinFuzzySearchLimit startScan can return without doing startScanKey? Is it
> a bug?
>

Revised version of patch is attached. Changes are so:
1) Support for GinFuzzySearchLimit.
2) Some documentation.
Question about GinFuzzySearchLimit is still relevant.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gin-fast-scan.8.patch.gz application/x-gzip 10.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2013-11-20 20:21:21 Re: Proof of concept: standalone backend with full FE/BE protocol
Previous Message Peter Eisentraut 2013-11-20 20:13:24 Re: Proof of concept: standalone backend with full FE/BE protocol