Tsearch2 and Snowball

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Tsearch2 and Snowball
Date: 2006-10-03 18:53:29
Message-ID: 1159901609.2659.341.camel@holly
Lists: pgsql-hackers


I'm looking at some of the code in contrib/tsearch2/snowball and see
that the code there is *generated* code. The Snowball compiler produces
this C code from stemmer definitions in much the same way bison produces
C code from gram.y.

My understanding is that the Snowball code is updated regularly and
there are many other stemmers we could be including with the
distribution.

Snowball has a BSD licence: http://snowball.tartarus.org/license.php
Would it be possible to include the Snowball source directly and run it
as part of the make process for tsearch2? Or have configure check
whether Snowball is available at build time? At the very least it would
be good to have a README file explaining how to modify a Snowball stemmer
and regenerate the C code for tsearch2.

That would then encourage people to improve the stemmers, as well as
allow us to include French and Spanish versions, etc.

Perhaps we should ask translators to provide stop word lists for their
languages. It seems a shame to have docs in so many languages, but no
language capability for Tsearch2.

Also, why do we have another crc32 implementation in there?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
