Tsearch2 - spanish

Lists: pgsql-general
From: Felipe de Jesús Molina Bravo <felipe(dot)molina(at)inegi(dot)gob(dot)mx>
To: pgsql-general(at)postgresql(dot)org
Cc: Felipe(dot)molina(at)inegi(dot)gob(dot)mx
Subject: Tsearch2 - spanish
Date: 2007-09-17 21:23:41
Message-ID: 1190064221.6856.35.camel@fjmb

Hi

I have installed postgresql-8.2.4 and tsearch2 with a Spanish dictionary.
My problem is:

prueba=# select to_tsvector('espanol','melón');
ERROR: Affix parse error at 506 line

And if I execute:

prueba=# select lexize('sp','melón');
lexize
---------
{melon}
(1 row)

I tried many dictionaries with the same result. I also changed the
encoding of the .aff and .dict files (from Latin-1 to UTF-8 and from
UTF-8 to ISO-8859-1) and got the same error.

Where can I look to resolve this problem?

My dictionary file has this at line 506:

flag *J: # isimo
E > -E, ÍSIMO # grande grandísimo
E > -E, ÍSIMOS # grande grandísimos
E > -E, ÍSIMA # grande grandísima
E > -E, ÍSIMAS # grande grandísimas
O > -O, ÍSIMO # tonto tontísimo
O > -O, ÍSIMA # tonto tontísima
O > -O, ÍSIMOS # tonto tontísimos
O > -O, ÍSIMAS # tonto tontísimas
L > ÍSIMO # formal formalísimo
L > ÍSIMA # formal formalísima
L > ÍSIMOS # formal formalísimos
L > ÍSIMAS # formal formalísimas
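
For readers unfamiliar with the ispell affix format, each rule above can be read as "if the word ends with the condition, strip the optional -X part and append the suffix". A minimal Python sketch of that reading, assuming literal conditions only (real affix files also allow character-class patterns); `parse_rule` and `apply_rule` are hypothetical helper names, not part of tsearch2 or ispell:

```python
from typing import Optional, Tuple

Rule = Tuple[str, str, str]  # (condition, text to strip, text to append)


def parse_rule(line: str) -> Rule:
    """Parse one suffix rule such as 'E > -E, ÍSIMO # grande grandísimo'."""
    body = line.split("#", 1)[0].strip()      # drop the trailing comment
    condition, _, action = body.partition(">")
    condition, action = condition.strip(), action.strip()
    if action.startswith("-"):                # '-E, ÍSIMO': strip E, then append
        strip_part, _, append = action.partition(",")
        return condition, strip_part[1:].strip(), append.strip()
    return condition, "", action              # 'ÍSIMO': append only


def apply_rule(word: str, rule: Rule) -> Optional[str]:
    """Return the derived form, or None when the rule does not apply."""
    condition, strip, append = rule
    stem = word.upper()
    if not stem.endswith(condition):
        return None
    if strip:
        if not stem.endswith(strip):
            return None
        stem = stem[: -len(strip)]
    return stem + append


rule_e = parse_rule("E > -E, ÍSIMO # grande grandísimo")
rule_l = parse_rule("L > ÍSIMO # formal formalísimo")
print(apply_rule("grande", rule_e))   # GRANDÍSIMO
print(apply_rule("formal", rule_l))   # FORMALÍSIMO
print(apply_rule("tonto", rule_e))    # None (the word does not end in E)
```

The suffix text itself starts with the non-ASCII letter Í, so the parser has to accept that character as a letter before any rule in this block can be loaded.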

If I remove the "Í", the error goes away, but the lexeme is incorrect.

I saw this post:
http://archives.postgresql.org/pgsql-general/2007-07/msg00888.php

Maybe Marcelo has resolved the problem. Marcelo, can you tell me your
tsearch2 configuration?

Best regards

PS: I need to resolve this for my work.


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Felipe de Jesús Molina Bravo <felipe(dot)molina(at)inegi(dot)gob(dot)mx>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Tsearch2 - spanish
Date: 2007-09-18 15:19:00
Message-ID: 46EFEC64.1090207@sigaev.ru

> prueba=# select to_tsvector('espanol','melón');
> ERROR: Affix parse error at 506 line
and
> prueba=# select lexize('sp','melón');
> lexize
> ---------
> {melon}
> (1 row)

Looks very strange. Can you provide the list of dictionaries and the configuration map?

> I tried many dictionaries with the same result. I also changed the
> encoding of the .aff and .dict files (from Latin-1 to UTF-8 and from
> UTF-8 to ISO-8859-1) and got the same error.
>
> Where can I look to resolve this problem?
>
> My dictionary file has this at line 506:
Where did you get this file? And what are the encoding and locale settings of your database?

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Felipe de Jesús Molina Bravo <felipe(dot)molina(at)inegi(dot)gob(dot)mx>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Tsearch2 - spanish
Date: 2007-09-18 19:47:15
Message-ID: 1190144835.6821.55.camel@fjmb

Hi

You are right, the output of "show lc_ctype;" is C.
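
The C locale matters here because, as the diagnosis quoted at the end of this message explains, it treats only the ASCII letters as alphabetic, so a byte such as ó fails the check the ispell dictionary relies on. A rough Python model of that rule, assuming Latin-1 input; `is_alpha_c_locale` is a hypothetical stand-in for C's isalpha() under the C locale, not a tsearch2 function:

```python
def is_alpha_c_locale(byte: int) -> bool:
    """Model isalpha() in the C/POSIX locale: only A-Z and a-z qualify."""
    return 0x41 <= byte <= 0x5A or 0x61 <= byte <= 0x7A


def non_alpha_bytes(text: str, encoding: str = "latin-1") -> list:
    """Return the encoded bytes that the C locale would not treat as letters."""
    return [b for b in text.encode(encoding) if not is_alpha_c_locale(b)]


print([hex(b) for b in non_alpha_bytes("melón")])   # ['0xf3']
print([hex(b) for b in non_alpha_bytes("ÍSIMO")])   # ['0xcd']
print(non_alpha_bytes("melon"))                     # []
```

Under a Spanish locale such as es_ES.ISO8859-1 the same bytes are classified as letters, which is consistent with the affix error disappearing once the cluster was re-initialized.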

This is what I did. The locale is now:

prueba1=# show lc_ctype;
lc_ctype
-----------------
es_MX.ISO8859-1
(1 row)

after running

% initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1

(as you said), then "createdb -E iso8859-1 prueba1", and finally installing tsearch2.

The original problem is resolved:

prueba1=# select to_tsvector('espanol','melón');
to_tsvector
-------------
'melón':1
(1 row)

But if I change the sentence to this:

prueba1=# select to_tsvector('espanol','melón perro mordelón');
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!>

??? I lost the connection even though the server is up. Any idea?

The position of the synonym dictionary is intentional.

Thanks in advance

On Tue, 2007-09-18 at 21:40 +0400, Teodor Sigaev wrote:
> > LC_CTYPE="POSIX"
>
>
> Please send the output of the "show lc_ctype;" command. If it's the C locale then I can identify
> the problem: characters with diacritical marks (such as ó) are not alphabetic characters there, and
> the ispell dictionary will fail. To fix that you should run initdb with one of these options:
> % initdb -D /YOUR/PATH -E LATIN1 --locale es_ES.ISO8859-1
> or
> % initdb -D /YOUR/PATH -E UTF8 --locale es_ES.UTF8
>
> In the latter case you should also recode all of the dictionary's data files to UTF-8.
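
Recoding the dictionary files for a UTF8 cluster is a plain byte-level conversion. A minimal Python sketch, assuming the files are currently in Latin-1; the function names and any paths passed to them are illustrative, not part of tsearch2:

```python
from pathlib import Path


def recode_latin1_to_utf8(data: bytes) -> bytes:
    """Reinterpret Latin-1 bytes as text and re-encode them as UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def recode_file(src: str, dst: str) -> None:
    """Write a UTF-8 copy of a Latin-1 file (hypothetical paths)."""
    Path(dst).write_bytes(recode_latin1_to_utf8(Path(src).read_bytes()))


sample = "E > -E, ÍSIMO".encode("latin-1")
print(sample)                          # b'E > -E, \xcdSIMO'
print(recode_latin1_to_utf8(sample))   # b'E > -E, \xc3\x8dSIMO'
```

Every data file the dictionary loads needs the same treatment, since a file left in Latin-1 would contain byte sequences that are not valid UTF-8.
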
>
> >>> prueba=# select to_tsvector('espanol','melón');
> >>> ERROR: Affix parse error at 506 line
> >> and
> >>> prueba=# select lexize('sp','melón');
> >>> lexize
> >>> ---------
> >>> {melon}
> >>> (1 row)
> sp is a Snowball stemmer; it doesn't require an affix file, so it works.
>
> By the way, why is the synonym dictionary placed after ispell? Is that intentional?
> Usually the synonym dictionary goes first, then ispell, and snowball after all of them.
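
The ordering advice follows from how a configuration consults its dictionaries: they are tried in sequence, and the first one that recognizes the token decides the result. A toy Python model of that lookup; the word lists and the three stand-in dictionaries are invented for illustration and do not reflect the real synonym, ispell, or Snowball implementations:

```python
from typing import Callable, Dict, List, Optional

Dictionary = Callable[[str], Optional[List[str]]]


def table_dictionary(table: Dict[str, str]) -> Dictionary:
    """Build a dictionary that only recognizes the words listed in `table`."""
    def lookup(token: str) -> Optional[List[str]]:
        lexeme = table.get(token)
        return None if lexeme is None else [lexeme]
    return lookup


def catch_all(token: str) -> Optional[List[str]]:
    """Stand-in for a stemmer: recognizes every token (no real stemming)."""
    return [token.lower()]


def lexize_chain(token: str, chain: List[Dictionary]) -> List[str]:
    """Return the output of the first dictionary that recognizes the token."""
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result
    return []


synonym = table_dictionary({"can": "perro"})                   # hypothetical synonym entry
ispell = table_dictionary({"can": "can", "perros": "perro"})   # hypothetical word list

print(lexize_chain("can", [synonym, ispell, catch_all]))    # ['perro']
print(lexize_chain("can", [ispell, synonym, catch_all]))    # ['can']
print(lexize_chain("melón", [synonym, ispell, catch_all]))  # ['melón']
```

With ispell first, the synonym entry for a word ispell already knows is never reached, which is why the synonym dictionary usually leads the chain.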
>


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Felipe de Jesús Molina Bravo <felipe(dot)molina(at)inegi(dot)gob(dot)mx>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Tsearch2 - spanish
Date: 2007-09-19 16:30:42
Message-ID: 46F14EB2.2060506@sigaev.ru

> prueba1=# select to_tsvector('espanol','melón perro mordelón');
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>

Hmm, can you provide a backtrace?

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: marcelo Cortez <jmdc_marcelo(at)yahoo(dot)com(dot)ar>
To: Felipe de Jesús Molina Bravo <felipe(dot)molina(at)inegi(dot)gob(dot)mx>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: Tsearch2 - spanish
Date: 2007-09-20 12:13:18
Message-ID: 694124.69149.qm@web32110.mail.mud.yahoo.com

Felipe

--- Felipe de Jesús Molina Bravo
<felipe(dot)molina(at)inegi(dot)gob(dot)mx> wrote:

> But if I change the sentence to this:
>
> prueba1=# select to_tsvector('espanol','melón perro mordelón');
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>

The same thing happened to me the first time I used tsearch2 with
Spanish. I think you need to patch Snowball with the
tsearch_snowball_82 file; a Google search will turn up instructions
on how to do it.
Best regards
mdc


From: "MOLINA BRAVO FELIPE DE JESUS" <felipe(dot)molina(at)inegi(dot)gob(dot)mx>
To: "marcelo Cortez" <jmdc_marcelo(at)yahoo(dot)com(dot)ar>, "Teodor Sigaev" <teodor(at)sigaev(dot)ru>
Cc: "PostgreSQL General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Tsearch2 - spanish
Date: 2007-09-20 16:51:25
Message-ID: 5CE6C20D880B514E88D5A05E9949E15628F44A@CORREOAGS03.inegi.gob.mx

Hi

Thanks, Teodor and Marcelo.

The problem is solved.

Regards



From: "madhtr" <madhtr(at)schif(dot)org>
To: "PostgreSQL General" <pgsql-general(at)postgresql(dot)org>
Subject: How to clear bits?
Date: 2007-09-20 17:01:47
Message-ID: 009001c7fba7$ee8544d0$7b55503f@useronewin2klt

Hello group :)

How do I clear bits in a number in PostgreSQL?

In C++ it's:

0xffffff00 &~ 0x0000ffff

What is it in PostgreSQL, from the psql command-line app?

select ...

Thanks :)


From: "madhtr" <madhtr(at)schif(dot)org>
To: "madhtr" <madhtr(at)schif(dot)org>, "PostgreSQL General" <pgsql-general(at)postgresql(dot)org>
Subject: Re: How to clear bits?
Date: 2007-09-20 18:10:00
Message-ID: 00d101c7fbb1$76553fb0$7b55503f@useronewin2klt
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Never mind, I figured it out...

fails:

0xffffff00 &~ 0x0000ffff

succeeds:

0xffffff00 & ~ 0x0000ffff

I had to add a space.
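
A likely explanation for the fix is that PostgreSQL reads the adjacent characters `&~` as a single operator name, for which no operator is defined, whereas `& ~` is parsed as bitwise AND followed by bitwise NOT. The arithmetic itself can be checked with a short Python sketch; `clear_bits` is a hypothetical helper name:

```python
def clear_bits(value: int, mask: int) -> int:
    """Clear every bit of `value` that is set in `mask`."""
    return value & ~mask


result = clear_bits(0xFFFFFF00, 0x0000FFFF)
print(hex(result))   # 0xffff0000
```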
