Lists: | pgsql-bugs |
---|
From: | Klaus Ita <postgres(at)stro(dot)at> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Sorting Problem in UNICODE/german |
Date: | 2005-09-01 11:02:35 |
Message-ID: | 20050901110235.GC1479@aipc54.ai.wu-wien.ac.at |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Hi there!
I have a Problem with a DB that was created in UNICODE
* createdb -E UNICODE
and actually shows that it _is_ in UNICODE.
i was able to input data and can read it and everything is fine.
but when i want to "ORDER BY ..." it does not sort the german Umlauts at the
correct postition.
should be:
m n o p
and is:
a b c d
I have tried starting postgres with LC_ALL=de_AT(dot)utf8(at)euro
locale but that did not help.
what xould i do?
regs,
klaus
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Klaus Ita <postgres(at)stro(dot)at> |
Cc: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Sorting Problem in UNICODE/german |
Date: | 2005-09-01 13:30:15 |
Message-ID: | 12189.1125581415@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Klaus Ita <postgres(at)stro(dot)at> writes:
> I have tried starting postgres with LC_ALL=de_AT(dot)utf8(at)euro
> locale but that did not help.
You need to run initdb under that setting. See "Localization" in
the documentation.
regards, tom lane
From: | Klaus Ita <postgres(at)stro(dot)at> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Sorting Problem in UNICODE/german |
Date: | 2005-09-02 06:36:41 |
Message-ID: | 20050902063641.GD1479@aipc54.ai.wu-wien.ac.at |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
On Thu, Sep 01, 2005 at 09:30:15AM -0400, Tom Lane wrote:
> Klaus Ita <postgres(at)stro(dot)at> writes:
> > I have tried starting postgres with LC_ALL=de_AT(dot)utf8(at)euro
> > locale but that did not help.
>
i did read the docs and am still not quite happy with my sorting results.
ok initdb has been rerun
made sure, i had the locale:
locale -a
created new db-cluster with
LC_ALL=de_AT(dot)utf8(at)euro initdb --locale=de_AT(dot)utf8(at)euro -E UNICODE -D /dev/shm/pgutf8
and then still the sorting was not right when i restored another UNICODE db.
another "funny" thing is:
ita(at)aipc54:~/.mutt$ LC_ALL=de_AT(dot)utf8(at)euro sort /tmp/testfile
Abend
Oma
terreich
erflieger
Unter
Zetrix
this is also wrong (There should be 'Unter' and then 'U:berflieger' [berflieger]). so is this a libc bug?
thank you for your help so far! I more than appreciate it. Support for this
DB is sooo much better than for oracle!
klaus
From: | Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Sorting Problem in UNICODE/german |
Date: | 2005-09-02 07:54:39 |
Message-ID: | 87r7c729c0.fsf@gate450.dyndns.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Klaus Ita schrob:
> On Thu, Sep 01, 2005 at 09:30:15AM -0400, Tom Lane wrote:
>> Klaus Ita <postgres(at)stro(dot)at> writes:
>> > I have tried starting postgres with LC_ALL=de_AT(dot)utf8(at)euro
>> > locale but that did not help.
>>
>
> i did read the docs and am still not quite happy with my sorting results.
> ok initdb has been rerun
>
> made sure, i had the locale:
>
> locale -a
>
> created new db-cluster with
> LC_ALL=de_AT(dot)utf8(at)euro initdb --locale=de_AT(dot)utf8(at)euro -E UNICODE -D /dev/shm/pgutf8
>
> and then still the sorting was not right when i restored another
> UNICODE db.
Well, I used the very same command with 8.0.3 to create a database,
and the sort order was correct:
--8<---------------cut here---------------start------------->8---
scratch=# select w from w order by w;
w
-------------
Abend
Oma
Österreich
Überflieger
Unter
Zetrix
(6 rows)
--8<---------------cut here---------------end--------------->8---
So I guess there was some misconfiguration of your current
client_encoding during import, or maybe the dump of your unicode db
got unexpectedly converted by improper settings during dumping.
> another "funny" thing is:
>
> ita(at)aipc54:~/.mutt$ LC_ALL=de_AT(dot)utf8(at)euro sort /tmp/testfile
> Abend
> Oma
> Ãterreich
> Ãerflieger
> Unter
> Zetrix
>
> this is also wrong (There should be 'Unter' and then 'U:berflieger'
> [Überflieger]). so is this a libc bug?
The sort order is correct, so libc did succeed in its part. Maybe your
terminal is having issues with utf-8? If you're using xterm: Did you
run it with -u8 or some utf-8-enabling X-resource? To verify that the
terminal is working properly, typing
echo ö > /tmp/foo
file /tmp/foo
on a shell should tell you that you have a utf-8 text file.
HTH
Andreas
--
From: | Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org> |
---|---|
To: | pgsql-bugs(at)postgresql(dot)org |
Subject: | Re: Sorting Problem in UNICODE/german |
Date: | 2005-09-02 09:56:49 |
Message-ID: | 87k6hz23oe.fsf@gate450.dyndns.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Sorry, I just reread your mail: Your MUA is declaring it with
> Content-Type: text/plain; charset=unknown-8bit
This makes it even harder to discuss problems with Umlauts :-).
Andreas Seltenreich schrob:
> Klaus Ita schrob:
>
>> another "funny" thing is:
>>
>> ita(at)aipc54:~/.mutt$ LC_ALL=de_AT(dot)utf8(at)euro sort /tmp/testfile
>> Abend
>> Oma
>> Ãterreich
>> Ãerflieger
>> Unter
>> Zetrix
>>
>> this is also wrong (There should be 'Unter' and then 'U:berflieger'
>> [Überflieger]). so is this a libc bug?
I think I got your point now. Libc appears to be using iso-14651
sorting for all "de" locales. I'm afraid you will have compile a
customized locale to depart from that.
regards,
Andreas
--
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org> |
Cc: | pgsql-bugs(at)postgresql(dot)org, Klaus Ita <postgres(at)stro(dot)at> |
Subject: | Re: Sorting Problem in UNICODE/german |
Date: | 2005-09-02 14:25:24 |
Message-ID: | 13915.1125671124@sss.pgh.pa.us |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Lists: | pgsql-bugs |
Andreas Seltenreich <andreas+pg(at)gate450(dot)dyndns(dot)org> writes:
>> Klaus Ita schrob:
>>> this is also wrong (There should be 'Unter' and then 'U:berflieger'
>>> [berflieger]). so is this a libc bug?
> I think I got your point now. Libc appears to be using iso-14651
> sorting for all "de" locales. I'm afraid you will have compile a
> customized locale to depart from that.
I wouldn't call it a libc bug, but a bug in the locale definition.
In any case it doesn't appear to be Postgres' problem --- if we sort
the same way "sort" does under the same locale setting, then we are
doing what we expect.
I think at this point Klaus needs to find some people who know about
hacking locale definitions. I sure don't know enough about them to
help further. Is there a libc mailing list anywhere?
One thing I do know --- if you install a new version of the locale
Postgres is using, you'd better re-initdb, or at least REINDEX all
your indexes on textual columns. Changing sort order is equivalent
to making such indexes corrupt.
regards, tom lane