hunspell and tsearch2 ?

Lists: pgsql-hackers
From: Dirk Lutzebäck <dirk(dot)lutzebaeck(at)thinkproject(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: hunspell and tsearch2 ?
Date: 2012-08-27 12:31:15
Message-ID: 503B6893.4020103@thinkproject.com

Hi,

We have issues with compound words in tsearch2 when using the German (ispell)
dictionary. This has been discussed before, but there is no real solution
with the recommended German dictionary at
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 (an old
OpenOffice dict file converted to an ispell format suitable for tsearch):

# select ts_lexize('german_ispell', 'vollklimatisiert');
     ts_lexize
--------------------
 {vollklimatisiert}
(1 row)

This should return at least

{vollklimatisiert, voll, klimatisiert}
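
For context, a minimal sketch of how a dictionary like german_ispell might be
defined, and how one could check whether the parts of the compound are known
on their own. The file names (german.dict, german.affix, german.stop) are
assumptions for illustration and are not taken from this thread.

```sql
-- Hypothetical definition: assumes german.dict, german.affix and german.stop
-- exist in $SHAREDIR/tsearch_data (the file names are illustrative).
CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE  = ispell,
    DictFile  = german,
    AffFile   = german,
    StopWords = german
);

-- Check whether each part of the compound is recognized on its own.
-- A NULL result means the dictionary does not know the word;
-- an empty array means it was treated as a stop word.
SELECT ts_lexize('german_ispell', 'voll');
SELECT ts_lexize('german_ispell', 'klimatisiert');
```

If both parts are recognized individually but the compound is not split, the
compound rules in the affix file are the likely place to look.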

The issue with compound words in ispell has been addressed in hunspell,
but hunspell has not been fully integrated into tsearch2 (according to the
documentation).

Are there any plans to fully integrate hunspell into tsearch2? What is
needed to do this? What functionality is still missing? Maybe
we can help...

Thanks for your help,

Dirk


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dirk Lutzebäck <dirk(dot)lutzebaeck(at)thinkproject(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: hunspell and tsearch2 ?
Date: 2012-08-30 15:39:03
Message-ID: CA+Tgmob3Mr3PznHK0E15yYKX5PB2xmqJcCHN=ffV62akME_qnQ@mail.gmail.com

On Mon, Aug 27, 2012 at 8:31 AM, Dirk Lutzebäck
<dirk(dot)lutzebaeck(at)thinkproject(dot)com> wrote:
> we have issues with compound words in tsearch2 using the german (ispell)
> dictionary. This has been discussed before but there is no real solution
> using the recommended german dictionary at
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 (convert old
> openoffice dict file to ispell suitable for tsearch):
>
> # select ts_lexize('german_ispell', 'vollklimatisiert');
> ts_lexize
> --------------------
> {vollklimatisiert}
> (1 row)
>
> This should return atleast
>
> {vollklimatisiert, voll, klimatisiert}
>
>
> The issue with compound words in ispell has been addressed in hunspell. But
> this has not been integrated fully to tsearch2 (according to the
> documentation).

Just out of curiosity, which part of the documentation are you looking
at? The only mention of hunspell I see in the documentation is a
mention that we apparently support their dictionary-file format.

> Are there any plans to fully integrate hunspell into tsearch2? What is
> needed to do this? What is the functional delta which is missing? Maybe we
> can help...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dirk Lutzebäck <dirk(dot)lutzebaeck(at)thinkproject(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: hunspell and tsearch2 ?
Date: 2012-08-31 13:07:24
Message-ID: 5040B70C.70805@thinkproject.com

Hi Robert,

There is a note in the PostgreSQL documentation, section

12.6.5 Ispell Dictionary

*Note:* MySpell does not support compound words. Hunspell has
sophisticated support for compound words. At present, PostgreSQL
implements only the basic compound word operations of Hunspell.
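
As a hedged illustration of that basic compound support: the ispell template
can split compounds only when the affix file declares a compound flag and the
dictionary entries carry it. The sketch below shows one way to inspect what
the dictionary produces for the word from this thread. The configuration name
german_cfg is hypothetical, and whether the converted German files contain a
compound flag at all is an assumption that would need to be verified.

```sql
-- Assumption: the .affix file contains a line such as
--     compoundwords  controlled z
-- and the .dict entries that may take part in compounds carry that flag
-- (for example "voll/z"; this entry is illustrative only).

-- Hypothetical configuration that routes word tokens to german_ispell first,
-- falling back to the built-in german_stem dictionary.
CREATE TEXT SEARCH CONFIGURATION german_cfg (COPY = pg_catalog.german);
ALTER TEXT SEARCH CONFIGURATION german_cfg
    ALTER MAPPING FOR asciiword, word, hword, hword_part,
                      asciihword, hword_asciipart
    WITH german_ispell, german_stem;

-- ts_debug reports which dictionary handled the token and which lexemes
-- it produced, which shows whether the compound was split.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('german_cfg', 'vollklimatisiert');
```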

Regards
Dirk

