Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)
Date: 2010-10-13 15:53:35
Message-ID: AANLkTik5AkOahj3GL6ssZCOaRWgSmcc=Pp3kj4vdnZyb@mail.gmail.com
Lists: pgsql-hackers

> No doubt, but the actual function runtime is only one component of the
> cost of applying it to a lot of dictionary entries --- I would think
> that the table read costs are the larger component anyway.

The data domain need not be a dictionary; it can also be something like
article titles, URLs, and so on. On such relatively long strings (about 100
characters or more) the function runtime becomes a significant component of
the cost (especially if most of the table is in cache). In that case, a
search for near strings can be accelerated by more than a factor of 10. I
think this use case justifies having a separate levenshtein_less_equal
function.
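
To illustrate why a distance bound helps on long strings, here is a minimal
Python sketch of a bounded Levenshtein computation. It is not the C code from
the patch under discussion. The return convention (max_d + 1 when the bound
is exceeded) and the banded evaluation are assumptions made for this example.
The idea is that only cells within max_d of the diagonal can hold a value
<= max_d, so the rest can be skipped, and the whole computation can stop as
soon as every cell in the current row exceeds the bound.

```python
def levenshtein_less_equal(a: str, b: str, max_d: int) -> int:
    """Return the edit distance of a and b if it is <= max_d,
    otherwise max_d + 1 (assumed convention for this sketch)."""
    n, m = len(a), len(b)
    big = max_d + 1  # sentinel meaning "more than max_d"

    # The distance is at least the length difference.
    if abs(n - m) > max_d:
        return big

    # Row 0: distance from the empty prefix of a to each prefix of b.
    prev = [j if j <= max_d else big for j in range(m + 1)]

    for i in range(1, n + 1):
        # Only columns with |i - j| <= max_d can hold a value <= max_d.
        lo = max(1, i - max_d)
        hi = min(m, i + max_d)

        cur = [big] * (m + 1)
        if i <= max_d:
            cur[0] = i

        for j in range(lo, hi + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion
                cur[j - 1] + 1,      # insertion
                prev[j - 1] + cost,  # substitution or match
                big,                 # cap: exact values above max_d are not needed
            )

        # Early exit: every alignment passes through this row, so if all
        # cells already exceed max_d the final distance does too.
        if min(cur[lo - 1:hi + 1]) > max_d:
            return big

        prev = cur

    return min(prev[m], big)


if __name__ == "__main__":
    print(levenshtein_less_equal("kitten", "sitting", 3))
    print(levenshtein_less_equal("kitten", "sitting", 1))
```

With a small max_d and strings of length around 100, the band covers only a
narrow strip of the full matrix, and clearly dissimilar strings are rejected
after a few rows, which is where the speedup over the unbounded function
would come from.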

----
With best regards,
Alexander Korotkov.
