Collation in ORDER BY not lexicographical

Lists: pgsql-general
From: Paul Gaspar <devlist(at)revolversoft(dot)com>
To: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Collation in ORDER BY not lexicographical
Date: 2009-09-29 08:52:02
Message-ID: 20BC5D30-5677-4E0F-A9EB-A05DBC2CE866@revolversoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi!

We have big problems with collation in ORDER BY, which happens in
binary order, not alphabetic (lexicographical), like:.

A
B
Z
a
z
Ä
Ö
ä
ö

This is what we have done:

1. initdb -U myuser -E UTF-8 --locale=de_DE.UTF-8 -D /mydata

2. pg_ctl -U myuser -D /mydata -l logfile start

3. createdb -U myuser mydb

4. create table mytable(name text)

5. insert into mytable values('Adam'); ....

6. select * from mytable order by name

The result is:

"Adam"
"Berta"
"Mann"
"Zoo"
"Zoodirektor"
"adam"
"Äpfel"
"Öl"
"äpfel"

"locale -a" shows:

de_DE
de_DE.ISO8859-1
de_DE.ISO8859-15
de_DE.UTF-8

"psql -l" shows:

List of databases
Name | Owner | Encoding | Collation | Ctype
| Access privileges
----------+-----------+----------+-------------+-------------
+-----------------------------------
postgres | myuser | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 |
template0 | myuser | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 | =c/myuser
: myuser =CTc/myuser
template1 | myuser | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 | =c/myuser
:
myuser =CTc/myuser
mydb | myuser | UTF8 | de_DE.UTF-8 | de_DE.UTF-8 |

PG is running on Mac OS X 10.5 and 10.6 Intel.

Any help is appreciated. Thanks very much in advance.

Paul


From: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
To: Paul Gaspar <devlist(at)revolversoft(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Collation in ORDER BY not lexicographical
Date: 2009-09-29 09:21:24
Message-ID: dcc563d10909290221p72dc2b08m6398c2cd5c20dcc9@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist(at)revolversoft(dot)com> wrote:
> Hi!
>
> We have big problems with collation in ORDER BY, which happens in binary
> order, not alphabetic (lexicographical), like:.
>
> A
> B
> Z
> a
> z
> Ä
> Ö
> ä
> ö
>

> PG is running on Mac OS X 10.5 and 10.6 Intel.

I seem to recall there were some problem with Mac locales at some
point being broken. Could be you're running into that issue.


From: Maximilian Tyrtania <maximilian(dot)tyrtania(at)onlinehome(dot)de>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, Paul Gaspar <devlist(at)revolversoft(dot)com>
Cc: 'PostgreSQL pg-general List' <pgsql-general(at)postgresql(dot)org>
Subject: Re: Collation in ORDER BY not lexicographical
Date: 2009-09-29 12:36:57
Message-ID: C6E7CC09.3B8A7%maximilian.tyrtania@onlinehome.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

am 29.09.2009 11:21 Uhr schrieb Scott Marlowe unter scott(dot)marlowe(at)gmail(dot)com:

> On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist(at)revolversoft(dot)com> wrote:
>> Hi!
>>
>> We have big problems with collation in ORDER BY, which happens in binary
>> order, not alphabetic (lexicographical), like:.
>>
>> A
>> B
>> Z
>> a
>> z
>> Ä
>> Ö
>> ä
>> ö
>>
>
>> PG is running on Mac OS X 10.5 and 10.6 Intel.
>
> I seem to recall there were some problem with Mac locales at some
> point being broken. Could be you're running into that issue.

Yep, i ran into this as well. Here is my workaround: Create a function like
this:

CREATE OR REPLACE FUNCTION f_getorderbyfriendlyversion(texttoconvert text)

RETURNS text AS
$BODY$
select
replace(replace(replace(replace(replace(replace($1,'Ä','A'),'Ö','O'),'Ü','U'
),'ä','a'),'ö','o'),'ü','u');

$BODY$

LANGUAGE 'sql' IMMUTABLE STRICT
COST 100;

ALTER FUNCTION f_getorderbyfriendlyversion(text) OWNER TO postgres;

Then create an index like this:

create index idx_personen_nachname_orderByFriendly on personen
(f_getorderbyfriendlyversion(nachname))

Now you can do:

select * from personen order by f_getorderbyfriendlyversion(p.nachname)

Seems pretty fast.

Best,

Maximilian Tyrtania


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
Cc: Paul Gaspar <devlist(at)revolversoft(dot)com>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Collation in ORDER BY not lexicographical
Date: 2009-09-30 09:24:16
Message-ID: 1254302656.10072.3.camel@fsopti579.F-Secure.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, 2009-09-29 at 03:21 -0600, Scott Marlowe wrote:
> On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist(at)revolversoft(dot)com> wrote:
> > Hi!
> >
> > We have big problems with collation in ORDER BY, which happens in binary
> > order, not alphabetic (lexicographical), like:.
> >
> > A
> > B
> > Z
> > a
> > z
> > Ä
> > Ö
> > ä
> > ö
> >
>
> > PG is running on Mac OS X 10.5 and 10.6 Intel.
>
> I seem to recall there were some problem with Mac locales at some
> point being broken. Could be you're running into that issue.

Yes, the UTF8 locales on BSD systems (Mac OS X, FreeBSD, etc.) are
dysfunctional. Either switch to a non-UTF8 locale or a different
operating system.


From: Paul Gaspar <devlist(at)revolversoft(dot)com>
To: PostgreSQL pg-general List <pgsql-general(at)postgresql(dot)org>
Subject: Re: Collation in ORDER BY not lexicographical
Date: 2009-10-01 15:46:59
Message-ID: EB5DD4FC-76D0-48CC-BB83-B09F1F252AAD@revolversoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Thank you all very much for your help.

Maximilian, we simplified your replacing code:

> replace(replace(replace(replace(replace(replace
> ($1,'Ä','A'),'Ö','O'),'Ü','U'
> ),'ä','a'),'ö','o'),'ü','u');

to this:

translate(upper($1),'ÄÖÜ','AOU')

Paul

Am 29.09.2009 um 14:36 schrieb Maximilian Tyrtania:

> am 29.09.2009 11:21 Uhr schrieb Scott Marlowe unter scott(dot)marlowe(at)gmail(dot)com
> :
>
>> On Tue, Sep 29, 2009 at 2:52 AM, Paul Gaspar <devlist(at)revolversoft(dot)com
>> > wrote:
>>> Hi!
>>>
>>> We have big problems with collation in ORDER BY, which happens in
>>> binary
>>> order, not alphabetic (lexicographical), like:.
>>
>>> PG is running on Mac OS X 10.5 and 10.6 Intel.
>>
>> I seem to recall there were some problem with Mac locales at some
>> point being broken. Could be you're running into that issue.
>
> Yep, i ran into this as well. Here is my workaround: Create a
> function like
> this:
>
> CREATE OR REPLACE FUNCTION f_getorderbyfriendlyversion(texttoconvert
> text)
>
> RETURNS text AS
> $BODY$
> select
> replace(replace(replace(replace(replace(replace
> ($1,'Ä','A'),'Ö','O'),'Ü','U'
> ),'ä','a'),'ö','o'),'ü','u');
>
> $BODY$
>
> LANGUAGE 'sql' IMMUTABLE STRICT
> COST 100;
>
> ALTER FUNCTION f_getorderbyfriendlyversion(text) OWNER TO postgres;
>
> Then create an index like this:
>
> create index idx_personen_nachname_orderByFriendly on personen
> (f_getorderbyfriendlyversion(nachname))
>
>
> Now you can do:
>
> select * from personen order by f_getorderbyfriendlyversion
> (p.nachname)
>
> Seems pretty fast.
>
> Best,
>
> Maximilian Tyrtania