Re: KNNGiST for knn-search

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: KNNGiST for knn-search
Date: 2009-11-25 02:13:10
Message-ID: 1259115190.27757.11194.camel@ebony
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2009-11-23 at 20:44 +0300, Teodor Sigaev wrote:
> Old way:
> SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist FROM
> spots
> order by dist asc LIMIT 10;
>
> Time: 1024.242 ms
>
> knn-search:
> SELECT coordinates, (coordinates <-> '5.0,5.0'::point) AS dist FROM
> spots
> WHERE coordinates >< '5.0,5.0'::point LIMIT 10;
>
> Time: 3.158 ms
>
>
> We didn't patch existing implementation of GiST for several reasons:
>
> 1. KNNGiST is about 5% slower than GiST on non-knn search queries,
> like
> contains or contained by, because of some overhead of new algorithm
> of
> tree traversal
> 2. KNNGiST can't be used in bitmap index scan, which destroys order
> of results,
> We don't know the way to forbid bitmap index scan only for knn
> queries.
> Current version of KNNGiST doesn't distinguish knn-search and usual
> search
> and postgres doesn't know about ordered output from KNNGiST.

Sounds very cool.

Seems like you should look at the way sorted_path works in
query_planner().

If you have a query like this

explain select col1 from s order by col1 limit 10;

then we currently understand that we should use an IndexScan for that.
We don't specifically exclude the bitmap scan, it's just that we know
that the results from the index are ordered and therefore the cost of
sorting the output need not be added. In the bitmap case the cost of the
sort must be added and that's enough to ensure we almost never do that.

I notice that a query like

explain select col1 from s order by abs(col1 - 5) limit 10;

is the one-dimensional equivalent of the type of query you're proposing
and that doesn't work either until you put an index on abs(col1 - 5),
then it just works, but only for k = 5.

Maybe you should look at the above query and see if there are any usable
similarities for the Knn index.

Part of your problem appears to be that cost_sort does not include
anything about the cost of the comparison operators for different
datatypes.

--
Simon Riggs www.2ndQuadrant.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Itagaki Takahiro 2009-11-25 02:16:14 Re: Partitioning option for COPY
Previous Message Tom Lane 2009-11-25 02:10:28 Re: Hot standby and removing VACUUM FULL