Re: Tsearch2 & Hebrew

Lists: pgsql-general
From: Yonatan Ben-Nes <nimrod(at)canaan(dot)co(dot)il>
To: pgsql-general(at)postgresql(dot)org
Subject: Tsearch2 & Hebrew
Date: 2006-08-30 16:09:19
Message-ID: 44F5B82F.8050201@canaan.co.il

Hello all,

I want to use Tsearch2 for a current project, but I can't seem to
find a way to make it work on Hebrew content.

I found that there is a Hebrew Ispell project, and apparently I can use
it as a dictionary, but I can't find any Hebrew stemmer that works
properly with it. Or am I wrong, and can I work with the dictionary alone?

Any information will be helpful.

Thanks a lot in advance!
Yonatan Ben-Nes


From: Michelle Konzack <linux4michelle(at)freenet(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Tsearch2 & Hebrew
Date: 2006-08-31 22:43:03
Message-ID: ntT1LC.A.Xa.o2D_EB@t1950ct.private

Hello Yonatan,

On 2006-08-30 19:09:19, Yonatan Ben-Nes wrote:
> I want to use Tsearch2 for a current project I have but I can't seem to
> find a way to implement it on hebrew content.

I have the same problem: I have a UTF-8 database of around
380 GByte (growing by 100 MByte per day) in over 60 languages, and
I cannot search in Arabic, Farsi, or Hebrew.

It seems that there is NO solution for those three languages.

Greetings
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant

--
Linux-User #280138 with the Linux Counter, http://counter.li.org/
##################### Debian GNU/Linux Consultant #####################
Michelle Konzack Apt. 917 ICQ #328449886
50, rue de Soultz MSM LinuxMichi
0033/6/61925193 67100 Strasbourg/France IRC #Debian (irc.icq.com)


From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Michelle Konzack <linux4michelle(at)freenet(dot)de>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Tsearch2 & Hebrew
Date: 2006-09-04 16:52:02
Message-ID: Pine.GSO.4.63.0609042051290.16344@ra.sai.msu.su

You need to provide more details.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


From: Michelle Konzack <linux4michelle(at)freenet(dot)de>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Tsearch2 & Hebrew
Date: 2006-09-04 19:36:21
Message-ID: GIozqC.A.UJ.1s8_EB@t1950ct.private

Hello Oleg,

On 2006-09-04 20:52:02, Oleg Bartunov wrote:
> You need to provide more details.
>
> Oleg
------------------------- END OF REPLIED MESSAGE -------------------------

Last year one of my two programmers wrote some code in PHP5 (Unicode
is now working) to search my database à la Google. I am collecting
international material about wars, war crimes, and violations of human
rights.

My database is text/plain Unicode and is currently around 380-390 GB,
which I have split into tables of 10 years each...

It seems there is a problem with BiDi searching. Russian and Chinese
are NO problem. Many texts are mixed, e.g. US-ASCII, Arabic, and Hebrew.

Now if I enter search strings, nothing is returned,
even if I am in psql with a multilingual terminal or in pgAdmin.

So it cannot be a problem with PHP5.

Oh yes, since I switched to one table per 10 years, tsearch2 does
not want to search my whole database... but for tsearch2, I think, I
am looking for a PHP5/PGSQL coder on <www.getacoder.com>, since I am
not a master of PHP5 and PGSQL. (I am more a sysadmin and soldier
than a programmer, even if I can code stuff.)

Greetings
Michelle Konzack
Systemadministrator
Tamay Dogan Network
Debian GNU/Linux Consultant



From: Yonatan Ben-Nes <yonatan(at)epoch(dot)co(dot)il>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Michelle Konzack <linux4michelle(at)freenet(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Tsearch2 & Hebrew
Date: 2006-09-05 12:01:28
Message-ID: 44FD6718.5080806@epoch.co.il

Hi all,

Well, my problem was that I didn't know whether Tsearch2 can work on
Hebrew data without a fitting stemmer. My current solution is to use the
'simple' dictionary, so words are stored as they are, with no stemming.
I wonder if there is a Hebrew stemmer I can use, but I can't seem
to find one, so sadly one of the best features of Tsearch2 isn't working
for me.

If I'm wrong please let me know :)

Thanks a lot in advance,

Yonatan Ben-Nes



From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Yonatan Ben-Nes <yonatan(at)epoch(dot)co(dot)il>
Cc: Michelle Konzack <linux4michelle(at)freenet(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Tsearch2 & Hebrew
Date: 2006-09-05 14:34:24
Message-ID: Pine.GSO.4.63.0609051832510.16344@ra.sai.msu.su

On Tue, 5 Sep 2006, Yonatan Ben-Nes wrote:

> Hi all,
>
>
> Well my problem was that I didn't know if Tsearch2 can work on hebrew data
> without a fitting stemmer, my current solution is to use the 'simple'
> dictionary so no lexem is returned.
> I wonder if there is an hebrew stemmer which I can use but I can't seem to
> find one, so sadly one of the best features of Tsearch2 isn't working for me.
>

Do you use the Hebrew ispell dictionary?


Regards,
Oleg


From: Yonatan Ben-Nes <yonatan(at)epoch(dot)co(dot)il>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Michelle Konzack <linux4michelle(at)freenet(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Tsearch2 & Hebrew
Date: 2006-09-05 15:37:10
Message-ID: 44FD99A6.7030107@epoch.co.il

No, I didn't think it would be useful unless it was accompanied
by a Hebrew stemmer that works with it... Am I wrong?



From: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
To: Yonatan Ben-Nes <yonatan(at)epoch(dot)co(dot)il>
Cc: Michelle Konzack <linux4michelle(at)freenet(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Tsearch2 & Hebrew
Date: 2006-09-05 15:57:54
Message-ID: Pine.GSO.4.63.0609051954180.16344@ra.sai.msu.su

On Tue, 5 Sep 2006, Yonatan Ben-Nes wrote:

> No, I didn't thought that it will be useful if it won't be accompanied by an
> hebrew stemmer which will work with it... I'm wrong?
>

An ispell dictionary and a stemmer do the same job, so you may use an

ispell, simple configuration instead of the "ideal" one: ispell, stemmer.
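
As a rough illustration (a toy model in Python, not Tsearch2 code), a
dictionary list of this kind behaves like a chain: dictionaries are
consulted in order, the first one that recognises the token decides the
result, and a 'simple'-style dictionary at the end accepts whatever is
left. The word list and the names make_ispell_like, simple and normalize
are made up for this sketch.

```python
from typing import Callable, Dict, List, Optional

# A dictionary takes a token and returns lexemes, or None if it does
# not recognise the token (hypothetical interface for this sketch).
Dictionary = Callable[[str], Optional[List[str]]]


def make_ispell_like(forms: Dict[str, str]) -> Dictionary:
    """Build a toy lookup dictionary from a word-form -> base-form map."""
    def lookup(token: str) -> Optional[List[str]]:
        base = forms.get(token)
        return [base] if base is not None else None
    return lookup


def simple(token: str) -> Optional[List[str]]:
    """Accept every token and return it lowercased, with no stemming."""
    return [token.lower()]


def normalize(token: str, chain: List[Dictionary]) -> List[str]:
    """Ask each dictionary in turn; the first one that answers wins."""
    for dictionary in chain:
        lexemes = dictionary(token)
        if lexemes is not None:
            return lexemes
    return []


# Toy word list: one plural form mapped to its base form.
chain = [make_ispell_like({"ספרים": "ספר"}), simple]
known = normalize("ספרים", chain)         # handled by the word list
unknown = normalize("PostgreSQL", chain)  # falls through to simple
```

A word found in the list is reduced to its base form; anything else
falls through to the last dictionary and is kept as typed, only
lowercased.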

Of course, some words will not be recognized and will be left as is.
Also, you may write a very simple stemmer using a collection of very
common endings.
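
A very simple ending-stripping stemmer of this kind might look like the
following Python sketch. The ending list (two common Hebrew plural
endings plus one longer variant), the minimum stem length and the name
strip_common_ending are assumptions chosen for illustration; a real
list would have to be tuned against actual data, and Hebrew prefixes
are not handled at all.

```python
# Hypothetical list of common endings, written longest first so that
# the longest matching ending is the one removed.
COMMON_ENDINGS = ("יות", "ים", "ות")

# Assumption: never cut a word down to fewer than two letters.
MIN_STEM_LENGTH = 2


def strip_common_ending(word: str) -> str:
    """Remove the first matching ending, or return the word unchanged."""
    for ending in COMMON_ENDINGS:
        if word.endswith(ending) and len(word) - len(ending) >= MIN_STEM_LENGTH:
            return word[: -len(ending)]
    return word
```

With this list, a word ending in one of the listed endings loses it,
while a word too short to leave a two-letter stem, or one with no
listed ending, is returned unchanged.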

Oleg


Regards,
Oleg


From: Yonatan Ben-Nes <yonatan(at)epoch(dot)co(dot)il>
To: Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Cc: Michelle Konzack <linux4michelle(at)freenet(dot)de>, pgsql-general(at)postgresql(dot)org
Subject: Re: Tsearch2 & Hebrew
Date: 2006-09-05 16:02:59
Message-ID: 44FD9FB3.9090607@epoch.co.il


Oleg Bartunov wrote:

> On Tue, 5 Sep 2006, Yonatan Ben-Nes wrote:
>
>> No, I didn't thought that it will be useful if it won't be
>> accompanied by an hebrew stemmer which will work with it... I'm wrong?
>>
>
> ispell and stemmer are doing the same job, so you may use
>
> ispell,simple configuration instead of "ideal" one: ispell, stemmer
>
> Of course, some words will not recognized and will leave as is.
> Also, you may write very simple stemmer using collection of very common
> endings.
>
> Oleg
>
Thanks a lot, I'll do that at once!
Yonatan Ben-Nes
