Re: KNNGiST for knn-search

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: KNNGiST for knn-search
Date: 2009-11-24 11:49:40
Message-ID: 4B0BC854.30705@sigaev.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I think you'll need to work on that. A WHERE qual shouldn't imply a sort
> order. You'll have to teach the planner how to use the index to speed up
> a query in the first form.
Of course, right now it is a working prototype.

>> 1. KNNGiST is about 5% slower than GiST on non-knn search queries, like
>> contains or contained by, because of some overhead of new algorithm of
>> tree traversal
>
> Is it possible to use the regular GiST traversal algorithm on a
> KNNGiST-tree, when performing regular GiST searches that don't require a
> particular order?
New algorithm works much more with memory for allocation/free to manage lists
and it's a single reason of performance loss. Choosing of algorithm could not be
done by consistent function, it should be done at least in amrescan method or
even earlier - in planner.

>
>> 2. KNNGiST can't be used in bitmap index scan, which destroys order of
>> results,
>> We don't know the way to forbid bitmap index scan only for knn queries.
>> Current version of KNNGiST doesn't distinguish knn-search and usual
>> search
>> and postgres doesn't know about ordered output from KNNGiST.
>
> Yeah, you really need to modify the planner to understand the ordering
> and plan accordingly.
Hmm, I thought about it, but still have no a good idea.
One idea:
SELECT p FROM pt WHERE p << '5.0,5.0'::point ORDER BY (p <-> '5.0,5.0'::point)
DESC LIMIT 10;
And add <-> to opclass (but for now any indexable operation should return
boolean type). Of course, KNNGiST should be modified to support not only
k-nearest search but k-"farest" search and NULLS LAST/FIRST.

Not very convenient, because it's needed to look into expression of ORDER BY.
And now you can specify p >< 'one point' AND p >< 'another point', but it's
impossible to do that by ORDER BY clause.

Second idea with non-standard syntax.
SELECT ... ORDER BY PROXIMITY OF expression[, expression [..]] TO expression[,
expression [..]] USING [operator [, operator [..]]
and operator is distance operator, i.e. it's not a member of btree opclass, but
returns non-negative float8 value.

Without index it will be essentially the same as
ORDER BY expression operator expression[ + ..] DESC NULLS LAST

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Magnus Hagander 2009-11-24 12:09:14 Re: enable-thread-safety defaults?
Previous Message Daniel Farina 2009-11-24 11:48:52 Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION