PostgreSQL, Mac OS X and locales

Lists: pgsql-general
From: Guido Neitzer <guido(dot)neitzer(at)pharmaline(dot)de>
To: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: PostgreSQL, Mac OS X and locales
Date: 2005-11-01 21:56:02
Message-ID: EE44A61A-A63B-447E-9706-5857ED149B87@pharmaline.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

Hi.

I have a problem with the use of PostgreSQL on Mac OS X (10.4.x) and
selects with "order by". When there are german umlauts in the column
which is used to order, the sorted result is wrong.

As an example, I get:

a
b
ä

where it should be:

a
ä
b

Had someone here the same problem and solved it? I think the locales,
that come with Mac OS X are broken for this, as it seems they don't
provide the correct LC_COLLATE sort order.

I tried to figure out, how to get a correct LC_COLLATE file but
wasn't successful so far. Someone else?

I'm using the following configuration:

- Mac OS X (Server) 10.4.x and 10.3.x
- PostgreSQL 8.0.3
- initdb with encoding set to UTF-8 or ISO-Latin-1 (same results)

Some hints? Thank you!

cug


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Guido Neitzer <guido(dot)neitzer(at)pharmaline(dot)de>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL, Mac OS X and locales
Date: 2005-11-01 22:24:22
Message-ID: 20051101222422.GG21137@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Tue, Nov 01, 2005 at 10:56:02PM +0100, Guido Neitzer wrote:
> Hi.
>
> I have a problem with the use of PostgreSQL on Mac OS X (10.4.x) and
> selects with "order by". When there are german umlauts in the column
> which is used to order, the sorted result is wrong.

PostgreSQL should get the same results as the command-line sort for the
same values of LC_COLLATE. However, the value is fixed at initdb time
so maybe that's what's confusing you.

However, MacOS X inherited FreeBSDs locale setup, which is fairly lame.
They've made many improvements though so your problem should be
fixable.

BTW, you didn't actually say what locale you where using...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Guido Neitzer <guido(dot)neitzer(at)pharmaline(dot)de>
To: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Cc: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: PostgreSQL, Mac OS X and locales
Date: 2005-11-02 06:27:49
Message-ID: BF08ADAC-9BB6-40D7-8582-94AAB018E01E@pharmaline.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 01.11.2005, at 23:24 Uhr, Martijn van Oosterhout wrote:

> PostgreSQL should get the same results as the command-line sort for
> the
> same values of LC_COLLATE. However, the value is fixed at initdb time
> so maybe that's what's confusing you.

Yes, it gets the same results as sort.

I have tried several encoding/locale settings with initdb.

> However, MacOS X inherited FreeBSDs locale setup, which is fairly
> lame.
> They've made many improvements though so your problem should be
> fixable.
>
> BTW, you didn't actually say what locale you where using...

I mostly use UTF-8, so I have called initdb with "-E UTF-8 --
locale=de_DE.UTF-8".

The LC_COLLATE for this locale is a link pointing to "../la_LN.US-
ASCII/LC_COLLATE". This is why I don't think they paid much attention
to the correct sort order of umlauts.

But I have also tried ISO-5589-, ISO-8859-15 and others. No change.
May I take an LC_COLLATE file from another system and use this?

cug


From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Guido Neitzer <guido(dot)neitzer(at)pharmaline(dot)de>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL, Mac OS X and locales
Date: 2005-11-02 10:47:57
Message-ID: 20051102104754.GA19550@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On Wed, Nov 02, 2005 at 07:27:49AM +0100, Guido Neitzer wrote:
> The LC_COLLATE for this locale is a link pointing to "../la_LN.US-
> ASCII/LC_COLLATE". This is why I don't think they paid much attention
> to the correct sort order of umlauts.

Ah you noticed that. Yes, many of the UTF-8 locales in FreeBSD simply
point to the ASCII versions which doesn't exactly work very well.

> But I have also tried ISO-5589-, ISO-8859-15 and others. No change.
> May I take an LC_COLLATE file from another system and use this?

Only another FreeBSD system, these files are not portable. However, the
source files for these are plain text so you can edit them. I beleive
the source is in the C library. Find the source files, edit them and
recompile. Then just copy the LC_COLLATE file over.

You might also want to look at some of the other predefined locale
collate orders, maybe someone has put some effert in getting an order
right for another language.

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


From: Guido Neitzer <guido(dot)neitzer(at)pharmaline(dot)de>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: PostgreSQL General <pgsql-general(at)postgresql(dot)org>
Subject: Re: PostgreSQL, Mac OS X and locales
Date: 2005-11-02 11:25:41
Message-ID: 02902D0A-D80A-45A1-9BAD-D40DF81E32AB@pharmaline.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-general

On 02.11.2005, at 11:47 Uhr, Martijn van Oosterhout wrote:

> Ah you noticed that. Yes, many of the UTF-8 locales in FreeBSD simply
> point to the ASCII versions which doesn't exactly work very well.

Right.

> Only another FreeBSD system, these files are not portable. However,
> the
> source files for these are plain text so you can edit them. I beleive
> the source is in the C library. Find the source files, edit them and
> recompile. Then just copy the LC_COLLATE file over.

I have to look how to do this. I have build locales a very long time
ago, but currently I'm looking where to start ... perhaps with the
Darwin sources. We will see.

> You might also want to look at some of the other predefined locale
> collate orders, maybe someone has put some effert in getting an order
> right for another language.

I tried a lot of them and the all didn't work for me.

cug