Re: GIN improvements part2: fast scan

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: GIN improvements part2: fast scan
Date: 2014-01-14 15:35:25
Message-ID: CAPpHfdvs-EHiwmTgfK=uJ4W3OjskVHyfkuehtyrKfKY_KmMpVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 21, 2013 at 12:14 AM, Alexander Korotkov
<aekorotkov(at)gmail(dot)com>wrote:

> On Wed, Nov 20, 2013 at 3:06 AM, Alexander Korotkov <aekorotkov(at)gmail(dot)com>wrote:
>
>> On Fri, Nov 15, 2013 at 11:19 AM, Alexander Korotkov <
>> aekorotkov(at)gmail(dot)com> wrote:
>>
>>> On Fri, Nov 15, 2013 at 12:34 AM, Heikki Linnakangas <
>>> hlinnakangas(at)vmware(dot)com> wrote:
>>>
>>>> On 14.11.2013 19:26, Alexander Korotkov wrote:
>>>>
>>>>> On Sun, Jun 30, 2013 at 3:00 PM, Heikki Linnakangas <
>>>>> hlinnakangas(at)vmware(dot)com
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>
>>>>> On 28.06.2013 22:31, Alexander Korotkov wrote:
>>>>>>
>>>>>> Now, I got the point of three state consistent: we can keep only one
>>>>>>> consistent in opclasses that support new interface. exact true and
>>>>>>> exact
>>>>>>> false values will be passed in the case of current patch consistent;
>>>>>>> exact
>>>>>>> false and unknown will be passed in the case of current patch
>>>>>>> preConsistent. That's reasonable.
>>>>>>>
>>>>>>>
>>>>>> I'm going to mark this as "returned with feedback". For the next
>>>>>> version,
>>>>>> I'd like to see the API changed per above. Also, I'd like us to do
>>>>>> something about the tidbitmap overhead, as a separate patch before
>>>>>> this, so
>>>>>> that we can assess the actual benefit of this patch. And a new test
>>>>>> case
>>>>>> that demonstrates the I/O benefits.
>>>>>>
>>>>>
>>>>>
>>>>> Revised version of patch is attached.
>>>>> Changes are so:
>>>>> 1) Patch rebased against packed posting lists, not depends on
>>>>> additional
>>>>> information now.
>>>>> 2) New API with tri-state logic is introduced.
>>>>>
>>>>
>>>> Thanks! A couple of thoughts after a 5-minute glance:
>>>>
>>>> * documentation
>>>>
>>>
>>> Will provide documented version this week.
>>>
>>>
>>>> * How about defining the tri-state consistent function to also return a
>>>> tri-state? True would mean that the tuple definitely matches, false means
>>>> the tuple definitely does not match, and Unknown means it might match. Or
>>>> does return value true with recheck==true have the same effect? If I
>>>> understood the patch, right, returning Unknown or True wouldn't actually
>>>> make any difference, but it's conceivable that we might come up with more
>>>> optimizations in the future that could take advantage of that. For example,
>>>> for a query like "foo OR (bar AND baz)", you could immediately return any
>>>> tuples that match foo, and not bother scanning for bar and baz at all.
>>>
>>>
>>> The meaning of recheck flag when input contains unknown is undefined
>>> now. :)
>>> For instance, we could define it in following ways:
>>> 1) Like returning Unknown meaning that consistent with true of false
>>> instead of input Unknown could return either true or false.
>>> 2) Consistent with true of false instead of input Unknown could return
>>> recheck. This meaning is probably logical, but I don't see any usage of it.
>>>
>>> I'm not against idea of tri-state returning value for consisted, because
>>> it's logical continuation of its tri-state input. However, I don't see
>>> usage of distinguish True and Unknown in returning value for now :)
>>>
>>> In example you give we can return foo immediately, but we have to create
>>> full bitmap. So we anyway will have to scan (bar AND baz). We could skip
>>> part of trees for bar and baz. But it's possible only when foo contains
>>> large amount of sequential TIDS so we can be sure that we didn't miss any
>>> TIDs. This seems to be very narrow use-case for me.
>>>
>>> Another point is that one day we probably could immediately return
>>> tuples in gingettuple. And with LIMIT clause and no sorting we can don't
>>> search for other tuples. However, gingettuple was removed because of
>>> reasons of concurrency. And my patches for index-based ordering didn't
>>> return it in previous manner: it collects all the results and then returns
>>> them one-by-one.
>>>
>>
>> I'm trying to make fastscan work with GinFuzzySearchLimit. Then I figure
>> out that I don't understand how GinFuzzySearchLimit works. Why with
>> GinFuzzySearchLimit startScan can return without doing startScanKey? Is it
>> a bug?
>>
>
> Revised version of patch is attached. Changes are so:
> 1) Support for GinFuzzySearchLimit.
> 2) Some documentation.
> Question about GinFuzzySearchLimit is still relevant.
>

Attached version is rebased against last version of packed posting lists.

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
gin-fast-scan.9.patch.gz application/x-gzip 11.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2014-01-14 15:39:33 Re: plpgsql.consistent_into
Previous Message Tom Lane 2014-01-14 15:33:08 Re: extension_control_path