Re: high-dimensional knn-GIST tests (was Re: Cube extension kNN support)

From: Marcin Mańk <marcin(dot)mank(at)gmail(dot)com>
To: Gordon Mohr <gojomo-pgsql(at)xavvy(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: high-dimensional knn-GIST tests (was Re: Cube extension kNN support)
Date: 2013-10-27 20:43:54
Message-ID: CAK61fk4gh8qRc_0+yig4VnjCPpizUt-dq=dguxUVQ-D=Ztx_Ng@mail.gmail.com
Lists: pgsql-hackers

On Thu, Oct 24, 2013 at 3:50 AM, Gordon Mohr <gojomo-pgsql(at)xavvy(dot)com> wrote:

> On 9/22/13 4:38 PM, Stas Kelvich wrote:
>
>> Hello, hackers.
>>
>> Here is the patch that introduces kNN search for cubes with
>> euclidean, taxicab and chebyshev distances.
>>
>
> Thanks for this! I decided to give the patch a try at the bleeding edge
> with some high-dimensional vectors, specifically the 1.4 million
> 1000-dimensional Freebase entity vectors from the Google 'word2vec' project:
>

I believe the curse of dimensionality is affecting you here. I think it is
impossible to get an improvement over a sequential scan for 1000-dimensional
vectors. See:

http://en.wikipedia.org/wiki/Curse_of_dimensionality#k-nearest_neighbor_classification
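
To illustrate the effect, here is a minimal sketch (not taken from the patch or
from the test above) that draws uniformly random points and compares the
nearest and farthest distance from a random query point. The function name
`nearest_to_farthest_ratio`, the point count, and the uniform distribution are
all assumptions made for illustration; real word2vec vectors are distributed
differently. As the ratio approaches 1, every point is roughly equally far from
the query, so an index has little to prune.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=200, seed=0):
    """Return min_distance / max_distance from one random query point
    to n_points random points in the unit hypercube [0, 1]^dim.

    Hypothetical helper for illustration only; uniform data is an assumption.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    distances = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        # Euclidean distance (math.dist requires Python 3.8+)
        distances.append(math.dist(query, point))
    return min(distances) / max(distances)


if __name__ == "__main__":
    # The ratio should grow toward 1 as the dimension increases.
    for dim in (2, 10, 100, 1000):
        print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

With a ratio close to 1, the bounding boxes in a tree index overlap the search
region almost everywhere, so a kNN search ends up visiting most of the pages
anyway.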

Regards
Marcin Mańk
