create custom collation from case insensitive portuguese

Lists: pgsql-hackers
From: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
To: Alexandre Riveira <alexandre(at)objectdata(dot)com(dot)br>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: create custom collation from case insensitive portuguese
Date: 2010-11-02 15:40:44
Message-ID: 4CD030FC.5000607@timbira.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexandre Riveira escreveu:
> I've achieved some success in changing collate operating system (linux)
> to generate sort of way of Brazil Portuguese hopes by adding the
> following code in LC_COLLATE
>
This was already discussed; search the archives [1] [2].

> So far, I understood the mechanism of change collate and reproduce in
> postgresql, and I could not generate a case-insensitive search, I
> believe that would change within the LC_COLLATE variable, but could not
> go any further than that.
>
PostgreSQL doesn't support case-insensitive searches specifying the collate
per column yet. Look at [3]. But you could use ILIKE or regular expression to
achieve that.

[1] http://pgfoundry.org/pipermail/brasil-usuarios/20060330/001667.html
[2] http://www.mail-archive.com/brasil-usuarios(at)pgfoundry(dot)org/msg00895.html
[3] http://archives.postgresql.org/pgsql-hackers/2010-07/msg00512.php

--
Euler Taveira de Oliveira
http://www.timbira.com/


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Euler Taveira de Oliveira <euler(at)timbira(dot)com>
Cc: Alexandre Riveira <alexandre(at)objectdata(dot)com(dot)br>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: create custom collation from case insensitive portuguese
Date: 2010-11-02 17:57:26
Message-ID: AANLkTi=hYYHp5GH+4SLf5WwmwHAVUG7=uudgJYYpauy7@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Nov 2, 2010 at 8:40 AM, Euler Taveira de Oliveira
<euler(at)timbira(dot)com> wrote:
> Alexandre Riveira escreveu:
>> I've achieved some success in changing collate operating system (linux)
>> to generate sort of way of Brazil Portuguese hopes by adding the
>> following code in LC_COLLATE
>>
> This was already discussed; search the archives [1] [2].
>
>> So far, I understood the mechanism of change collate and reproduce in
>> postgresql, and I could not generate a case-insensitive search, I
>> believe that would change within the LC_COLLATE variable, but could not
>> go any further than that.
>>
> PostgreSQL doesn't support case-insensitive searches specifying the collate
> per column yet. Look at [3]. But you could use ILIKE or regular expression to
> achieve  that.

Is citext also useful for this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


From: Dimitri Fontaine <dimitri(at)2ndQuadrant(dot)fr>
To: Alexandre Riveira <alexandre(at)objectdata(dot)com(dot)br>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Euler Taveira de Oliveira <euler(at)timbira(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: create custom collation from case insensitive portuguese
Date: 2010-11-02 22:35:39
Message-ID: m2d3qnz744.fsf@2ndQuadrant.fr
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Alexandre Riveira <alexandre(at)objectdata(dot)com(dot)br> writes:
> When mentioned in Portuguese case-insensitive in fact we are also talking
> about accent-insensitive

See unaccent dictionary, but don't use only this one in your text search
configuration, IIRC.

http://www.postgresql.org/docs/9/static/unaccent.html

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support


From: Alexandre Riveira <alexandre(at)objectdata(dot)com(dot)br>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Euler Taveira de Oliveira <euler(at)timbira(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: create custom collation from case insensitive portuguese
Date: 2010-11-11 22:11:42
Message-ID: 4CDC6A1E.4050004@objectdata.com.br
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thank you all for your help !

When mentioned in Portuguese case-insensitive in fact we are also
talking about accent-insensitive

A common example is that the name Jose and José also can be written,
citext or ilike only not solve the problem

My progress is ...

Edit file /usr/share/i18n/locales/i18n e alter section tolower /, an
example:

(<U00C9>,<U00E9>) e alter for (<U00C9>,<U0065>)

LOWER reproduce: LOWER("ITAPAGÉ") => "itapage",

Example success:

SELECT * FROM endereco WHERE LOWER(logradouro) LIKE LOWER('itapage%')

this behavior is reproduced in citext (logradouro is column citext)

SELECT * FROM endereco WHERE logradouro = 'itapage'
or
SELECT * FROM endereco WHERE logradouro LIKE 'itapage%'

All examples return the desired value "ITAPAGÉ"
Issue:

SELECT * FROM endereco WHERE logradouro LIKE 'itapage%' NOT USE INDEX
I tried
CREATE INDEX cep_ik_logradouro ON cep USING btree (logradouro);
CREATE INDEX like_index ON cep(logradouro varchar_pattern_ops);
CREATE INDEX cep_ci_index ON cep(lower(logradouro) varchar_pattern_ops);

I've had success with using index
SELECT * FROM endereco WHERE LOWER(logradouro) LIKE LOWER('itapage%')
and CREATE INDEX cep_ci_index ON cep(lower(logradouro) varchar_pattern_ops)
But I want to solve using only citext

Tank's

Alexandre Riveira
Brazil

Robert Haas escreveu:
> On Tue, Nov 2, 2010 at 8:40 AM, Euler Taveira de Oliveira
> <euler(at)timbira(dot)com> wrote:
>
>> Alexandre Riveira escreveu:
>>
>>> I've achieved some success in changing collate operating system (linux)
>>> to generate sort of way of Brazil Portuguese hopes by adding the
>>> following code in LC_COLLATE
>>>
>>>
>> This was already discussed; search the archives [1] [2].
>>
>>
>>> So far, I understood the mechanism of change collate and reproduce in
>>> postgresql, and I could not generate a case-insensitive search, I
>>> believe that would change within the LC_COLLATE variable, but could not
>>> go any further than that.
>>>
>>>
>> PostgreSQL doesn't support case-insensitive searches specifying the collate
>> per column yet. Look at [3]. But you could use ILIKE or regular expression to
>> achieve that.
>>
>
> Is citext also useful for this?
>
>


From: Alexandre Riveira <alexandre(at)objectdata(dot)com(dot)br>
To: pgsql-hackers(at)postgresql(dot)org
Subject: create custom collation from case insensitive portuguese
Date: 2010-12-05 10:33:38
Message-ID: 4CFB6A82.1050907@objectdata.com.br
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Thanks to all the work they have done with this incredible database,
postgresql.

I've achieved some success in changing collate operating system (linux)
to generate sort of way of Brazil Portuguese hopes by adding the
following code in LC_COLLATE

LC_COLLATE
copy "iso14651_t1_ci"
reorder-after <U00A0>
<U0020> <CAP>; <CAP>; <CAP>; <U0020>
reorder-end

And in my tests I could change the behavior of the postgresql UPPER
changing the file i18n ... etc.

toupper /
(<U0061>, <U0041>);

So far, I understood the mechanism of change collate and reproduce in
postgresql, and I could not generate a case-insensitive search, I
believe that would change within the LC_COLLATE variable, but could not
go any further than that.

Could someone guide me please.

Thanks,

Alexandre Riveira
Brazil