Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Itagaki Takahiro <itagaki(dot)takahiro(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: levenshtein_less_equal (was: multibyte charater set in levenshtein function)
Date: 2010-10-13 15:22:17
Message-ID: AANLkTimPZMQ=yMuZazrH8v3VP5opo4Ywu5sFxFtMFMo2@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 13, 2010 at 10:51 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Excerpts from Tom Lane's message of mié oct 13 10:32:36 -0300 2010:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>>> I spent some time hacking on this.  It doesn't appear to be too easy
>>>> to get levenshtein_less_equal() working without slowing down plain old
>>>> levenshtein() by about 6%.
>>>
>>> Is that really enough slowdown to be worth contorting the code to avoid?
>>> I've never heard of an application where the speed of this function was
>>> the bottleneck.
>
>> What if it's used on a expression index on a large table?
>
> So?  Expression indexes don't result in multiple evaluations of the
> function.  If anything, that context would probably be even less
> sensitive to the function runtime than non-index use.
>
> But the main point is that 6% performance penalty in a non-core function
> is well below my threshold of pain.  If it were important enough to care
> about that kind of performance difference, it'd be in core.  I'd rather
> see us keeping the code simple, short, and maintainable.

Well, then you have to wonder whether it's worth having the
lesss-than-or-equal-to version around at all. That's only about 2x
faster on the same test case. I do think it's likely that people who
call this function will call it many times, however - e.g. trying to
find the closest matches from a dictionary for a given input string,
so the worry about performance doesn't seem totally out of place.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2010-10-13 15:24:47 Re: SQL command to edit postgresql.conf, with comments
Previous Message Peter Geoghegan 2010-10-13 15:15:26 Re: ISN patch that applies cleanly with git apply