Lists: | pgsql-hackers |
---|
From: | Dirk Lutzebäck <dirk(dot)lutzebaeck(at)thinkproject(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | hunspell and tsearch2 ? |
Date: | 2012-08-27 12:31:15 |
Message-ID: | 503B6893.4020103@thinkproject.com |
Hi,
We have issues with compound words in tsearch2 when using the German (ispell)
dictionary. This has been discussed before, but there is no real solution
with the recommended German dictionary from
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 (an old
OpenOffice dictionary file converted to an ispell format suitable for tsearch):
# select ts_lexize('german_ispell', 'vollklimatisiert');
     ts_lexize
--------------------
 {vollklimatisiert}
(1 row)
This should return at least
{vollklimatisiert, voll, klimatisiert}
The issue with compound words in ispell has been addressed in hunspell,
but hunspell has not been fully integrated into tsearch2 (according to the
documentation).
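For reference, a minimal sketch of how an ispell-template dictionary such as the german_ispell used above is typically defined. The file names are assumptions (german.dict, german.affix and german.stop under $SHAREDIR/tsearch_data), not taken from the setup described here. As far as the PostgreSQL documentation describes it, compound splitting with this template depends on the affix file declaring a compound flag (for example "compoundwords controlled z") and on the dictionary entries carrying that flag, so a converted file that lacks these would not split compounds.

```sql
-- Hypothetical definition: the file names below are assumptions.
-- The files are expected in $SHAREDIR/tsearch_data as
-- german.dict, german.affix and german.stop.
CREATE TEXT SEARCH DICTIONARY german_ispell (
    TEMPLATE  = ispell,
    DictFile  = german,
    AffFile   = german,
    StopWords = german
);

-- Check how a compound word is lexized by this dictionary.
-- Whether it is split depends on the compound flags in the files.
SELECT ts_lexize('german_ispell', 'vollklimatisiert');
```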
Are there any plans to fully integrate hunspell into tsearch2? What is
needed to do this, and which functionality is still missing? Maybe
we can help...
Thanks for your help,
Dirk
From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Dirk Lutzebäck <dirk(dot)lutzebaeck(at)thinkproject(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: hunspell and tsearch2 ? |
Date: | 2012-08-30 15:39:03 |
Message-ID: | CA+Tgmob3Mr3PznHK0E15yYKX5PB2xmqJcCHN=ffV62akME_qnQ@mail.gmail.com |
On Mon, Aug 27, 2012 at 8:31 AM, Dirk Lutzebäck
<dirk(dot)lutzebaeck(at)thinkproject(dot)com> wrote:
> we have issues with compound words in tsearch2 using the german (ispell)
> dictionary. This has been discussed before but there is no real solution
> using the recommended german dictionary at
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2 (convert old
> openoffice dict file to ispell suitable for tsearch):
>
> # select ts_lexize('german_ispell', 'vollklimatisiert');
> ts_lexize
> --------------------
> {vollklimatisiert}
> (1 row)
>
> This should return at least
>
> {vollklimatisiert, voll, klimatisiert}
>
>
> The issue with compound words in ispell has been addressed in hunspell. But
> this has not been integrated fully to tsearch2 (according to the
> documentation).
Just out of curiosity, which part of the documentation are you looking
at? The only mention of hunspell I see in the documentation is a
mention that we apparently support their dictionary-file format.
> Are there any plans to fully integrate hunspell into tsearch2? What is
> needed to do this? What is the functional delta which is missing? Maybe we
> can help...
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From: | Dirk Lutzebäck <dirk(dot)lutzebaeck(at)thinkproject(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: hunspell and tsearch2 ? |
Date: | 2012-08-31 13:07:24 |
Message-ID: | 5040B70C.70805@thinkproject.com |
Hi Robert,
There is a note in the PostgreSQL documentation, section
12.6.5 (Ispell Dictionary):
*Note:* MySpell does not support compound words. Hunspell has
sophisticated support for compound words. At present, PostgreSQL
implements only the basic compound word operations of Hunspell.
Regards
Dirk
--
Mit freundlichen Grüßen / Best regards,
*think project! International GmbH & Co. KG*
Dirk Lutzebäck
Geschäftsführer / Managing Director, CTO
Tel +49 30 921 017 90
Fax +49 30 921 017 50
dirk(dot)lutzebaeck(at)thinkproject(dot)com
Legal information about the sender (imprint), German:
http://www.thinkproject.com/de/info
Legal information (imprint), English:
http://www.thinkproject.com/en/info