Re: KNN-GiST with recheck

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: KNN-GiST with recheck
Date: 2014-02-03 16:02:07
Message-ID: CAPpHfdtpam+Qah0U6R4_7wf+GFa+UDJ3B2603DcdXS6FYVe7+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 28, 2014 at 9:32 PM, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com
> wrote:

> On 01/28/2014 04:12 PM, Alexander Korotkov wrote:
>
>> On Tue, Jan 28, 2014 at 5:54 PM, Heikki Linnakangas <
>> hlinnakangas(at)vmware(dot)com
>>
>>> wrote:
>>>
>>
>> 4. (as you mentioned in the other thread: ) It's a modularity violation
>>> that you peek into the heap tuple from gist. I think the proper way to do
>>> this would be to extend the IndexScan executor node to perform the
>>> re-shuffling of tuples that come from the index in wrong order, or
>>> perhaps
>>> add a new node type for it.
>>>
>>> Of course that's exactly what your partial sort patch does :-). I haven't
>>> looked at that in detail, but I don't think the approach the partial sort
>>> patch takes will work here as is. In the KNN-GiST case, the index is
>>> returning tuples roughly in the right order, but a tuple that it returns
>>> might in reality belong somewhere later in the ordering. In the partial
>>> sort patch, the "input stream" of tuples is divided into non-overlapping
>>> groups, so that the tuples within the group are not sorted, but the
>>> groups
>>> are. I think the partial sort case is a special case of the KNN-GiST
>>> case,
>>> if you consider the lower bound of each tuple to be the leading keys that
>>> you don't need to sort.
>>>
>>
>> Yes. But, for instance btree accesses heap for unique checking. Is really
>> it so crimilal? :-)
>>
>
> Well, it is generally considered an ugly hack in b-tree too. I'm not 100%
> opposed to doing such a hack in GiST, but would very much prefer not to.
>
>
> This is not only question of a new node or extending existing node. We
>> need
>> to teach planner/executor access method can return value of some
>> expression
>> which is lower bound of another expression. AFICS now access method can
>> return only original indexed datums and TIDs. So, I afraid that enormous
>> infrastructure changes are required. And I can hardly imagine what they
>> should look like.
>>
>
> Yeah, I'm not sure either. Maybe a new field in IndexScanDesc, along with
> xs_itup. Or as an attribute of xs_itup itself.

This shouldn't look like a hack too. Otherwise I see no point of it: it's
better to have some isolated hack in access method than hack in
planner/executor. So I see following changes to be needed to implement this
right way:
1) Implement new relation between operators: operator1 is lower bound of
operator2.
2) Extend am interface to let it return values of operators.
3) Implement new node for knn-sorting.
However, it requires a lot of changes in PostgreSQL infrastructure and can
appear to be not enough general too (we don't know until we have another
application).

------
With best regards,
Alexander Korotkov.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2014-02-03 16:08:04 Re: GIN improvements part2: fast scan
Previous Message Robert Haas 2014-02-03 15:49:44 Re: Regression tests failing if not launched on db "regression"