Re: KNN-GiST with recheck

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Emre Hasegeli <emre(at)hasegeli(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: KNN-GiST with recheck
Date: 2014-09-25 17:25:02
Message-ID: CAPpHfds_j-c+WtQWA87U_PkF0wp_X0yHGakDrTR0JTOnwCrt-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 25, 2014 at 9:00 PM, Emre Hasegeli <emre(at)hasegeli(dot)com> wrote:

> > Fixed, thanks.
>
> Here are my questions and comments about the code.
>
> doc/src/sgml/gist.sgml:812:
> > be rechecked from heap tuple before tuple is returned. If
> > <literal>recheck</> flag isn't set then it's true by default for
> > compatibility reasons. The <literal>recheck</> flag can be used
> only
>
> Recheck flag is set to false on gistget.c so I think it should say
> "false by default". On the other hand, it is true by default on
> the consistent function. It is written as "the safest assumption"
> on the code comments. I don't know why the safest is chosen over
> the backwards compatible for the consistent function.
>

Agree. It should be clarified in docs.

src/backend/access/gist/gistget.c:505:
> > /* Recheck distance from heap tuple if needed */
> > if (GISTSearchItemIsHeap(*item) &&
> > searchTreeItemNeedDistanceRecheck(scan,
> so->curTreeItem))
> > {
> > searchTreeItemDistanceRecheck(scan,
> so->curTreeItem, item);
> > continue;
> > }
>
> Why so->curTreeItem is passed to these functions? They can use
> scan->opaque->curTreeItem.
>

I didn't get the difference. Few lines before:

GISTScanOpaque so = (GISTScanOpaque) scan->opaque;

> src/backend/access/gist/gistscan.c:49:
> > /*
> > * When all distance values are the same, items without
> recheck
> > * can be immediately returned. So they are placed first.
> > */
> > if (recheckCmp == 0 && distance_a.recheck !=
> distance_b.recheck)
> > recheckCmp = distance_a.recheck ? 1 : -1;
>
> I don't understand why items without recheck can be immediately
> returned. Do you think it will work correctly when there is
> an operator class which will return recheck true and false for
> the items under the same page?
>

Yes, I believe so. Item with recheck can't decrease it's distance, it can
only increase it. In the corner case item can have same distance after
recheck as it was before. Then anyway items which distances are the same
can be returned in any order.

> src/backend/access/index/indexam.c:258:
> > /* Prepare data structures for getting original indexed values
> from heap */
> > scan->indexInfo = BuildIndexInfo(scan->indexRelation);
> > scan->estate = CreateExecutorState();
> > scan->slot =
> MakeSingleTupleTableSlot(RelationGetDescr(heapRelation));
>
> With the changes in indexam.c, heap access become legal for all index
> access methods. I think it is better than the previous version but
> I am leaving the judgement to someone experienced. I will try to
> summarize the pros and cons of sorting the rows in the GiST access
> method, as far as I understand.
>
> Pros:
>
> * It does not require another queue. It should be effective to sort
> the rows inside the queue the GiST access method already has.
> * It does not complicate index access method infrastructure.
>
> Cons:
>
> * It could be done without additional heap access.
> * Other access methods could make use of the sorting infrastructure
> one day.
> * It could be more transparent to the users. Sorting information
> could be shown on the explain output.
>

It would be also nice to show some information about KNN itself.

> * A more suitable data structure like binary heap could be used
> for the queue to sort the rows.
>

Binary heap seems to be better data structure for whole KNN-GiST. But it's
a subject for a separate patch: replace RB-tree to heap in KNN-GiST. It's
not related to recheck stuff.

------
With best regards,
Alexander Korotkov.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2014-09-25 17:25:24 Re: jsonb format is pessimal for toast compression
Previous Message Andres Freund 2014-09-25 17:20:29 Re: jsonb format is pessimal for toast compression