tsearch with Turkish locale ( was Re: foreign_data test fails with non-C locale)

Lists: pgsql-hackers
From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: foreign_data test fails with non-C locale
Date: 2009-01-09 14:12:15
Message-ID: 49675B3F.20002@enterprisedb.com

The foreign_data test case is failing when I run "make installcheck"
against a server that's been initialized with a locale other than C
(en_GB.UTF-8).

The reason is the different ordering of upper and lower case characters,
per attached diff file. We can simply add an alternative expected output
file, but I'd prefer not to if we can modify the test case instead. We
could rename some of the objects so that they sort the same in all
locales, but that seems a bit awkward in this case.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
regression.diffs text/plain 8.5 KB

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-09 14:17:43
Message-ID: 49675C87.8070303@dunslane.net

Heikki Linnakangas wrote:
> The foreign_data test case is failing when I run "make installcheck"
> against a server that's been initialized with a locale other than C
> (en_GB.UTF-8).
>
> The reason is the different ordering of upper and lower case
> characters, per attached diff file. We can simply add an alternative
> expected output file, but I'd prefer not to if we can modify the test
> case instead. We could rename some of the objects so that they sort the
> same in all locales, but that seems a bit awkward in this case.

Regression tests have always failed on non-C locales AFAIK. The
buildfarm goes out of its way to avoid that.

cheers

andrew


From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-09 14:23:32
Message-ID: 49675DE4.9050603@enterprisedb.com

Andrew Dunstan wrote:
> Heikki Linnakangas wrote:
>> The foreign_data test case is failing when I run "make installcheck"
>> against a server that's been initialized with a locale other than C
>> (en_GB.UTF-8).
>>
>> The reason is the different ordering of upper and lower case
>> characters, per attached diff file. We can simply add an alternative
>> expected output file, but I'd prefer not to if we can modify the test
>> case instead. We could rename some of the objects so that they sort the
>> same in all locales, but that seems a bit awkward in this case.
>
> Regression tests have always failed on non-C locales AFAIK. The
> buildfarm goes out of its way to avoid that.

No, that's the only test case that's failing.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-09 14:25:18
Message-ID: 49675E4E.1080805@gmx.net

Andrew Dunstan wrote:
> Regression tests have always failed on non-C locales AFAIK. The
> buildfarm goes out of its way to avoid that.

The regression tests should work just fine in non-C locales. If the
buildfarm goes out of its way to avoid non-C locales, then it loses some
significant code coverage, considering that there are several variant
code paths for locales, and considering the number of users who use them.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-09 14:51:44
Message-ID: 49676480.3020202@gmx.net

Heikki Linnakangas wrote:
> The foreign_data test case is failing when I run "make installcheck"
> against a server that's been initialized with a locale other than C
> (en_GB.UTF-8).

I have removed one of the differences but can't reproduce the other
right now (although it looks consequential). I'll check that on a
different machine.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-09 15:21:46
Message-ID: 49676B8A.8020707@dunslane.net

Peter Eisentraut wrote:
> Andrew Dunstan wrote:
>> Regression tests have always failed on non-C locales AFAIK. The
>> buildfarm goes out of its way to avoid that.
>
> The regression tests should work just fine in non-C locales. If the
> buildfarm goes out of its way to avoid non-C locales, then it loses
> some significant code coverage, considering that there are several
> variant code paths for locales, and considering the number of users
> who use them.

It was discussed here at the time, IIRC, and we put in the check
precisely because other locales broke the buildfarm. Originally
buildfarm just inherited the locale from its environment.

If it is no longer true that other locales break the tests, then I'm
happy to examine alternatives.

cheers

andrew


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-09 16:24:55
Message-ID: 26314.1231518295@sss.pgh.pa.us

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> Peter Eisentraut wrote:
>> The regression tests should work just fine in non-C locales.

> It was discussed here at the time, IIRC, and we put in the check
> precisely because other locales broke the buildfarm. Originally
> buildfarm just inherited the locale from its environment.

I don't think we are prepared to buy into a general policy that the
regression tests should pass in *any* locale; maintaining a large
number of variant expected-files isn't very practical. However, the
de facto policy is that we try to keep them passing in locales that
are used by any of the regular developers. I think it would be useful
to have buildfarm members testing in a few common locales.

regards, tom lane


From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Andrew Dunstan" <andrew(at)dunslane(dot)net>, "Peter Eisentraut" <peter_e(at)gmx(dot)net>, "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-09 16:45:05
Message-ID: 1d4e0c10901090845h1d647297w68f6f9a70bb757ec@mail.gmail.com

On Fri, Jan 9, 2009 at 5:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> However, the
> de facto policy is that we try to keep them passing in locales that
> are used by any of the regular developers. I think it would be useful
> to have buildfarm members testing in a few common locales.

If you define common locales, I can set up as many new animals as
needed to cover the locales needed for any branch we'd like to test.

Perhaps we should add a parameter to the buildfarm config file so that
the buildfarm script can check the locale is accepted and set it
directly. Considering that we won't have the locale information in the
animal description, it's a good way to have it in the report.

Just let me know.

--
Guillaume


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-09 17:16:09
Message-ID: 49678659.4040904@dunslane.net

Guillaume Smet wrote:
> On Fri, Jan 9, 2009 at 5:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> However, the
>> de facto policy is that we try to keep them passing in locales that
>> are used by any of the regular developers. I think it would be useful
>> to have buildfarm members testing in a few common locales.
>>
>
> If you define common locales, I can set up as many new animals as
> needed to cover the locales needed for any branch we'd like to test.
>
> Perhaps we should add a parameter to the buildfarm config file so that
> the buildfarm script can check the locale is accepted and set it
> directly. Considering that we won't have the locale information in the
> animal description, it's a good way to have it in the report.
>
>
>

Sure, we can easily have buildfarm's initdb step set any locale (and
encoding, for that matter) we like. That's a simple change.

cheers

andrew


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-11 09:41:59
Message-ID: 200901111142.00220.peter_e@gmx.net

On Friday 09 January 2009 16:51:44 Peter Eisentraut wrote:
> Heikki Linnakangas wrote:
> > The foreign_data test case is failing when I run "make installcheck"
> > against a server that's been initialized with a locale other than C
> > (en_GB.UTF-8).
>
> I have removed one of the differences but can't reproduce the other
> right now (although it looks consequential). I'll check that on a
> different machine.

Also fixed now.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-11 10:54:02
Message-ID: 200901111254.03722.peter_e@gmx.net

On Friday 09 January 2009 18:24:55 Tom Lane wrote:
> I don't think we are prepared to buy into a general policy that the
> regression tests should pass in *any* locale; maintaining a large
> number of variant expected-files isn't very practical. However, the
> de facto policy is that we try to keep them passing in locales that
> are used by any of the regular developers. I think it would be useful
> to have buildfarm members testing in a few common locales.

This called for an extensive test ... :-)

My glibc installation supplies 668 locales (locale -a), which appear to
represent about 225 distinct language/country combinations. (The rest are
encoding variants.)
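
The reduction from locale names to language/country combinations can be sketched as follows. This is only an illustration: the helper names and the sample list are hypothetical, and it assumes the usual `language_COUNTRY.encoding@modifier` naming, which may not be exactly how the 225 figure above was counted.

```python
import re
from collections import defaultdict


def language_country(locale_name):
    """Strip the encoding (".utf8") and modifier ("@euro") from a locale name."""
    return re.split(r"[.@]", locale_name, maxsplit=1)[0]


def group_by_language_country(locale_names):
    """Map each language/country combination to the locale names that share it."""
    groups = defaultdict(list)
    for name in locale_names:
        groups[language_country(name)].append(name)
    return dict(groups)


# Hypothetical sample; real input would be the lines printed by `locale -a`.
sample = ["tr_TR", "tr_TR.iso88599", "tr_TR.utf8",
          "de_DE", "de_DE@euro", "de_DE.utf8"]
groups = group_by_language_country(sample)
print(len(sample), "locales,", len(groups), "language/country combinations")
```

Names without a country part, such as `C` or `POSIX`, would each end up as their own group under this scheme.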

I ran the regression tests with all of them, and got 95 failures (out of 668).

15 out of the 95 failures are initdb not completing because the encoding
specified by the locale is not supported by PostgreSQL. But it appears that
at least xx_XX.utf8 works for each of these cases, so the language is
supported in some way.

The remaining 80 failures are more-or-less linguistic issues that belong to
the following 26 language/country combinations:

az_AZ   sorts k < q < l; Turkish i
br_FR   sorts ch separately
crh_UA  Turkish i
cs_CZ   sorts ch separately; sorts st = s
cy_GB   sorts ch separately
da_DK   sorts aa = å > z
es_EC   sorts ch separately
es_US   sorts ch separately
et_EE   sorts v = w
fo_FO   sorts aa = å > z
ha_NG   sorts sh separately
hsb_DE  sorts ch separately
ig_NG   sorts ch separately; sorts sh separately
ik_CA   sorts ch separately
kl_GL   sorts aa = å > z
nb_NO   sorts aa = å > z
nn_NO   sorts aa = å > z
om_ET   sorts ch separately (> z); sorts sh separately
om_KE   sorts ch separately (> z); sorts sh separately
pl_PL   (some other inexplicable sorting regression)
sk_SK   sorts ch separately; sorts st = s
sv_SE   sorts v = w
tk_TM   sorts v = w
tr_CY   Turkish i
tr_TR   Turkish i
tt_RU   sorts k < q < l

The "Turkish i" failures are in the tsearch tests. I'm not completely
comfortable that it's doing the right thing there.

We could easily get rid of the aa, ch, and v/w failures by adjusting the test
data, since the data is completely coincidental anyway. I propose to do
that, and document these issues so that they can be avoided in future tests.

I'm not so worried about the other cases.

Also, considering that some of these alternative sorting rules appear to be
controversial even among users of the language (e.g., we have had actual bug
reports that the es_EC rule is wrong, and the sv_SE rule is also obsolete
according to the language regulators), it might be interesting to write a
small test program that can tell users how their current locale behaves in
known corner cases.
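
A small probe of that kind might look roughly like the sketch below. It is not an existing tool: the function name, the probe words, and the list of locale names are assumptions chosen for illustration, and any locale that is not installed is simply reported as such.

```python
import locale


def sorted_in_locale(words, locale_name):
    """Return words sorted by locale_name's collation, or None if unavailable."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        return None
    try:
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # always restore


# Probe words picked from the corner cases listed above (case, ch, aa, v/w).
PROBES = ["KEY1-3", "key1-3", "ch", "d", "aa", "z", "v", "w"]

# Hypothetical locale names; the spelling of installed names varies by system.
for name in ["C", "en_US.UTF-8", "cs_CZ.UTF-8", "da_DK.UTF-8", "sv_SE.UTF-8"]:
    result = sorted_in_locale(PROBES, name)
    print(name, "->", result if result is not None else "(not installed)")
```

Comparing each line of output against the C locale line shows at a glance which of the known rules the installed locale definition actually applies.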


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-11 16:46:30
Message-ID: 21941.1231692390@sss.pgh.pa.us

Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> This called for an extensive test ... :-)

> My glibc installation supplies 668 locales (locale -a), which appear to
> represent about 225 distinct language/country combinations. (The rest are
> encoding variants.)

> I ran the regression tests with all of them, and got 95 failures (out of 668).

Fascinating data. I assume you did not remove the existing
locale-variant expected files? IOW this isn't "all the locale
dependencies", but "all the ones we didn't fix previously"?

> We could easily get rid of the aa, ch, and v/w failures by adjusting the test
> data, since the data is completely coincidental anyway. I propose to do
> that, and document these issues so that they can be avoided in future tests.

I have no confidence in the ability of some documentation to keep the
tests clean. However, if we had buildfarm members testing in locales
that exercise each of those cases, it'd be all right.

If we try to fix those cases I think we should try to fix Turkish i
as well ... but I concur that first requires determining if it's
behaving wrong or not. Devrim, or someone?

> Also, considering that some of these alternative sorting rules appear to be
> controversial even among users of the language (e.g., we have had actual bug
> reports that the es_EC rule is wrong, and the sv_SE rule is also obsolete
> according to the language regulators), it might be interesting to write a
> small test program that can tell users how their current locale behaves in
> known corner cases.

Considering the number of people who complain about en_US (expecting C
sort order instead), I'm not sure you should consider this a corner
case.

regards, tom lane


From: Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-11 19:45:00
Message-ID: 1231703100.3285.12.camel@laptop.gunduz.org

On Sun, 2009-01-11 at 12:54 +0200, Peter Eisentraut wrote:
> The "Turkish i" failures are in the tsearch tests. I'm not completely
> comfortable that it's doing the right thing there.

AFAIK, ISO-8859-9 is broken in a way, and the Turkish maintainers are
not interested in fixing it -- they ask us to move to tr_TR.UTF-8.
--
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org


From: Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-11 20:01:57
Message-ID: 1231704117.3285.15.camel@laptop.gunduz.org

On Sun, 2009-01-11 at 11:46 -0500, Tom Lane wrote:

> If we try to fix those cases I think we should try to fix Turkish i
> as well ... but I concur that first requires determining if it's
> behaving wrong or not. Devrim, or someone?

What exactly do you want to see?

Regards,
--
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-12 10:06:23
Message-ID: 496B161F.9080206@gmx.net

Devrim GÜNDÜZ wrote:
> On Sun, 2009-01-11 at 11:46 -0500, Tom Lane wrote:
>
>> If we try to fix those cases I think we should try to fix Turkish i
>> as well ... but I concur that first requires determining if it's
>> behaving wrong or not. Devrim, or someone?
>
> What exactly do you want to see?

Using a glibc system, initdb with --locale=tr_TR (or tr_TR.utf8 or
whatever) and run make installcheck. You should see test failures in
the tsearch and tsdicts tests that appear to relate to issues with
lowercasing the "I" letter correctly. And then use your language skills
to determine what the correct behavior is. ;-)

Note that on Mac OS X with tr_TR locales, the tests do not fail.

I actually suspect that both current answers are wrong.


From: Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-12 10:40:16
Message-ID: 1231756816.4331.21.camel@laptop.gunduz.org

Hi,

On Mon, 2009-01-12 at 12:06 +0200, Peter Eisentraut wrote:
> Using a glibc system, initdb with --locale=tr_TR (or tr_TR.utf8 or
> whatever) and run make installcheck. You should see test failures in
> the tsearch and tsdicts tests that appear to relate to issues with
> lowercasing the "I" letter correctly.

Yep, I ran them already, and as you wrote, I'm getting 3 errors (tsearch
tests + foreign_data test).

> And then use your language skills to determine what the correct
> behavior is. ;-)

SKIES would be skıes (dotless i).

Here is the conversion table:

I (capital) <-> ı
İ (capital) <-> i

We also have a few more chars, but I did not test them yet:

ş <-> Ş (capital) (S with a tail)
ü <-> Ü (capital) (U with dots)
ç <-> Ç (capital) (C with a tail)
ğ <-> Ğ (capital) (G with a hat)
ö <-> Ö (capital) (O with dots)
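
The conversion table above can be written down as a minimal sketch. The function names are hypothetical, and this shows only the mapping itself, not how PostgreSQL or glibc implement it. Only the two i pairs need special handling here, on the assumption that the remaining letters follow the default Unicode case mapping.

```python
# Dotted and dotless i, written as escapes to avoid any encoding ambiguity.
DOTLESS_SMALL_I = "\u0131"   # ı
DOTTED_CAPITAL_I = "\u0130"  # İ

TR_LOWER = str.maketrans({"I": DOTLESS_SMALL_I, DOTTED_CAPITAL_I: "i"})
TR_UPPER = str.maketrans({DOTLESS_SMALL_I: "I", "i": DOTTED_CAPITAL_I})


def turkish_lower(text):
    """Lowercase text using the Turkish I/ı and İ/i pairs."""
    return text.translate(TR_LOWER).lower()


def turkish_upper(text):
    """Uppercase text using the Turkish I/ı and İ/i pairs."""
    return text.translate(TR_UPPER).upper()


print(turkish_lower("SKIES"))  # Turkish rules: dotless i
print("SKIES".lower())         # non-Turkish rules: plain i
print(turkish_upper("skies"))  # Turkish rules: dotted capital I
```

The two i pairs are translated before the generic `lower()`/`upper()` call so that the default mapping never sees them.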

Regards,
--
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: tsearch with Turkish locale ( was Re: foreign_data test fails with non-C locale)
Date: 2009-01-19 15:03:33
Message-ID: 49749645.5070801@gmx.net

Devrim GÜNDÜZ wrote:
> Yep, I ran them already, and as you wrote, I'm getting 3 errors (tsearch
> tests + foreign_data test).
>
>> And then use your language skills to determine what the correct
>> behavior is. ;-)
>
> SKIES would be skıes (dotless i).
>
> Here is the conversion table:
>
> I (capital) <-> ı
> İ (capital) <-> i

I think the test shows that there is a bug in the tsearch support for
Turkish. Here is the test diff:

--- expected/tsearch.out 2008-10-18 12:56:29.000000000 +0300
+++ results/tsearch.out 2009-01-19 16:26:51.000000000 +0200
@@ -962,38 +962,38 @@
 SELECT to_tsvector('SKIES My booKs');
         to_tsvector
 ----------------------------
- 'books':3 'my':2 'skies':1
+ 'books':3 'my':2 'skIes':1
 (1 row)
[and more of the same]

This is not correct under either Turkish or non-Turkish language rules.

Note that

postgres=# select lower('SKIES');
lower
-------
skıes
(1 row)


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: tsearch with Turkish locale ( was Re: foreign_data test fails with non-C locale)
Date: 2009-01-19 17:45:16
Message-ID: 4974BC2C.6010806@sigaev.ru

> I think the test shows that there is a bug in the tsearch support for
> Turkish. Here is the test diff:
How to reproduce that?

% psql -l
List of databases
Name | Owner | Encoding | Collation | Ctype | Access privileges
------------+--------+----------+-------------+-------------+-------------------
postgres | pgsql | UTF8 | tr_TR.UTF-8 | tr_TR.UTF-8 |
regression | teodor | UTF8 | tr_TR.UTF-8 | tr_TR.UTF-8 |

% ./pg_regress --inputdir=. --dlpath=. --multibyte=UTF8 --load-language=plpgsql
--top-builddir=../../.. --schedule=./parallel_schedule
...
=======================
All 120 tests passed.
=======================

--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: tsearch with Turkish locale ( was Re: foreign_data test fails with non-C locale)
Date: 2009-01-19 18:20:26
Message-ID: 1232389226.3331.113.camel@laptop.gunduz.org

On Mon, 2009-01-19 at 20:45 +0300, Teodor Sigaev wrote:
> How to reproduce that?

-bash-3.2$ psql -l
List of databases
Name | Owner | Encoding | Collation | Ctype | Access Privileges
------------+----------+----------+-------------+-------------+-------------------------------------
postgres | postgres | UTF8 | tr_TR.UTF-8 | tr_TR.UTF-8 |
regression | postgres | UTF8 | tr_TR.UTF-8 | tr_TR.UTF-8 |
template0 | postgres | UTF8 | tr_TR.UTF-8 | tr_TR.UTF-8 | {=c/postgres,postgres=CTc/postgres}
template1 | postgres | UTF8 | tr_TR.UTF-8 | tr_TR.UTF-8 | {=c/postgres,postgres=CTc/postgres}
(4 rows)

-bash-3.2$ ./pg_regress --inputdir=. --dlpath=. --multibyte=UTF8 --load-language=plpgsql --top-builddir=../../.. --schedule=./parallel_schedule
(using postmaster on Unix socket, default port)

<snip>
timestamp ... FAILED
timestamptz ... FAILED
<snip>
tsearch ... FAILED
tsdicts ... FAILED
foreign_data ... FAILED

========================
5 of 120 tests failed.
========================

This is on a Fedora-9 x86 box, and:

-bash-3.2$ rpm -qv glibc
glibc-2.8-8.i686

Regards,
--
Devrim GÜNDÜZ, RHCE
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
http://www.gunduz.org


From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: tsearch with Turkish locale ( was Re: foreign_data test fails with non-C locale)
Date: 2009-01-19 18:45:32
Message-ID: 4974CA4C.2030904@sigaev.ru

> ========================
> 5 of 120 tests failed.
> ========================
>
> This is on a Fedora-9 x86 box, and:
>
> -bash-3.2$ rpm -qv glibc
> glibc-2.8-8.i686

Interesting. On my notebook all is ok.
% uname -a
FreeBSD ... 7.1-RELEASE-p2 FreeBSD 7.1-RELEASE-p2

Is there any possibility of a broken locale?
--
Teodor Sigaev E-mail: teodor(at)sigaev(dot)ru
WWW: http://www.sigaev.ru/


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-19 20:13:34
Message-ID: 1232396014.1406.7.camel@localhost


Andrew Dunstan wrote on Fri, 09 Jan 2009 at 12:16 -0500:
>
> Guillaume Smet wrote:
> > On Fri, Jan 9, 2009 at 5:24 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> >
> >> However, the
> >> de facto policy is that we try to keep them passing in locales that
> >> are used by any of the regular developers. I think it would be useful
> >> to have buildfarm members testing in a few common locales.
> >>
> >
> > If you define common locales, I can set up as many new animals as
> > needed to cover the locales needed for any branch we'd like to test.
> >
> > Perhaps we should add a parameter to the buildfarm config file so that
> > the buildfarm script can check the locale is accepted and set it
> > directly. Considering that we won't have the locale information in the
> > animal description, it's a good way to have it in the report.
> >
> >
> >
>
> Sure, we can easily have buildfarm's initdb step set any locale (and
> encoding, for that matter) we like. That's a simple change.

Will it be possible to set several locales and run the tests on all of
them without recompiling? For example, I have installed all of Solaris's
locales on my animal, but currently that means I need to perform the
whole cycle for each locale.

Zdenek


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-19 20:39:30
Message-ID: 1232397570.1406.29.camel@localhost


Peter Eisentraut wrote on Sun, 11 Jan 2009 at 12:54 +0200:

> The remaining 80 failures are more-or-less linguistic issues that belong to
> the following 26 language/country combinations:
>

> cs_CZ sorts ch separately; sorts st = s

s < st

Zdenek


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Zdenek Kotala <Zdenek(dot)Kotala(at)sun(dot)com>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-20 05:44:59
Message-ID: 200901200745.00404.peter_e@gmx.net

On Monday 19 January 2009 22:39:30 Zdenek Kotala wrote:
> Peter Eisentraut wrote on Sun, 11 Jan 2009 at 12:54 +0200:
> > The remaining 80 failures are more-or-less linguistic issues that belong
> > to the following 26 language/country combinations:
> >
> >
> > cs_CZ sorts ch separately; sorts st = s
>
> s < st

I had initially misinterpreted the failures. The real difference is that
Czech sorts numbers after letters, whereas most other locales do it the
other way around.


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Cc: Devrim GÜNDÜZ <devrim(at)gunduz(dot)org>, pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: tsearch with Turkish locale ( was Re: foreign_data test fails with non-C locale)
Date: 2009-01-20 07:25:44
Message-ID: 49757C78.3090909@gmx.net

Teodor Sigaev wrote:
>> ========================
>> 5 of 120 tests failed.
>> ========================
>>
>> This is on a Fedora-9 x86 box, and:
>>
>> -bash-3.2$ rpm -qv glibc
>> glibc-2.8-8.i686
>
> Interesting. On my notebook all is ok.
> % uname -a
> FreeBSD ... 7.1-RELEASE-p2 FreeBSD 7.1-RELEASE-p2
>
> Is there any possibility of a broken locale?

Assuming that the locales on FreeBSD are the same or closely related to
the ones on Mac OS X, I would rather say that the BSD locales are
broken, because they don't actually support the Turkish case conversion
rules:

regression=# show lc_ctype;
lc_ctype
-------------
tr_TR.utf-8
(1 row)

regression=# select lower('SKIES');
lower
-------
skies
(1 row)

regression=# select upper('skies');
upper
-------
SKIES
(1 row)

Thus, the problem that the glibc locales appear to expose is masked here.


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-24 04:57:03
Message-ID: 497A9F9F.6030600@dunslane.net

Zdenek Kotala wrote:
> Andrew Dunstan wrote on Fri, 09 Jan 2009 at 12:16 -0500:
>
>
>> Sure, we can easily have buildfarm's initdb step set any locale (and
>> encoding, for that matter) we like. That's a simple change.
>>
>
> Will it be possible to set several locales and run the tests on all of
> them without recompiling? For example, I have installed all of Solaris's
> locales on my animal, but currently that means I need to perform the
> whole cycle for each locale.
>

I'm working on this. Yes, you will be able to specify a list of locales
to check. For each locale the following tests will be run:
installcheck, pl-installcheck, and contrib-installcheck.
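
The plan described here amounts to repeating the three check steps for every configured locale after a single build. A rough sketch of that test matrix follows. It is not the buildfarm client's actual code, and the function name and the locale list are hypothetical.

```python
from itertools import product

# Step names as given in the message above.
STEPS = ["installcheck", "pl-installcheck", "contrib-installcheck"]


def locale_test_plan(locales, steps=STEPS):
    """Return (locale, step) pairs: every step is run once per locale."""
    return list(product(locales, steps))


# Hypothetical locale list; an animal owner would choose their own.
plan = locale_test_plan(["C", "en_US.UTF-8", "tr_TR.UTF-8"])
for loc, step in plan:
    print(f"{loc}: {step}")
```

Keeping the locale as the outer loop means each locale needs only one freshly initialized cluster for all three steps.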

However, our tests are still a bit short of working across locales.

PL-check gives the diff below on PLTCL tests under en_US locale. I guess
the simplest answer is to add an alternative result file.

cheers

andrew

  select * from T_pkey1 order by key1 using @<, key2;
   key1 |         key2         |                   txt
  ------+----------------------+------------------------------------------
-     1 | KEY1-3               | should work
      1 | key1-1               | test key
      1 | key1-2               | test key
      1 | key1-3               | test key
      2 | key2-3               | test key
      2 | key2-9               | test key
  (6 rows)
--- 166,175 ----
  select * from T_pkey1 order by key1 using @<, key2;
   key1 |         key2         |                   txt
  ------+----------------------+------------------------------------------
      1 | key1-1               | test key
      1 | key1-2               | test key
      1 | key1-3               | test key
+     1 | KEY1-3               | should work
      2 | key2-3               | test key
      2 | key2-9               | test key
  (6 rows)


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-24 11:18:57
Message-ID: 1232795937.1385.7.camel@localhost
Lists: pgsql-hackers


Andrew Dunstan wrote on Fri 23. 01. 2009 at 23:57 -0500:
>
> Zdenek Kotala wrote:
> > Andrew Dunstan wrote on Fri 09. 01. 2009 at 12:16 -0500:
> >
> >
> >> Sure, we can easily have buildfarm's initdb step set any locale (and
> >> encoding, for that matter) we like. That's a simple change.
> >>
> >
> > Will it be possible to set more locales and run the tests on all of them
> > without recompilation? For example, I have installed all of Solaris's
> > locales on my animal, but currently that means I need to perform the whole
> > cycle for each locale.
> >
>
> I'm working on this. Yes, you will be able to specify a list of locales
> to check. For each locale the following tests will be run:
> installcheck, pl-installcheck, and contrib-installcheck.

thanks

> However, our tests are still a bit short of working across locales.

Yes, they are. Peter cleaned up some of them, but there are still open
issues. And MacOS has a broken locale, which is a different problem.

> PL-check gives the diff below on PLTCL tests under en_US locale. I guess
> the simplest answer is to add an alternative result file.

Yes, I thought about adding a locale suffix for alternative result files,
but it could be useless overhead.

But some tests can be modified. For example

select * from T_pkey1 order by key1 using @<, key2;

can be rewritten as

select * from T_pkey1 order by key1 using @<, key2::name;
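
A rough Python sketch of why the cast might help, assuming that name values are compared byte by byte (strcmp-style) rather than with the database locale's collation. The locale_like key is only a crude stand-in for en_US ordering, not a real collation, and both function names are made up for illustration.

```python
# key2 values of the rows with key1 = 1 in the PL/Tcl regression diff.
keys = ["key1-1", "KEY1-3", "key1-2", "key1-3"]


def byte_order(value: str) -> bytes:
    """Sort key for a byte-wise comparison (assumed behaviour of name)."""
    return value.encode("utf-8")


def locale_like(value: str):
    """Crude stand-in for en_US collation: compare case-insensitively
    first, then put lowercase before uppercase on ties."""
    return (value.casefold(), value.swapcase())


print(sorted(keys, key=byte_order))   # uppercase KEY1-3 sorts first
print(sorted(keys, key=locale_like))  # KEY1-3 sorts after key1-3
```

The first ordering matches the expected output in the diff and the second matches the en_US result, which is why forcing a byte-wise comparison would make the test output independent of the locale.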

Zdenek


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-26 16:09:01
Message-ID: 497DE01D.4070907@dunslane.net
Lists: pgsql-hackers

Zdenek Kotala wrote:
> Andrew Dunstan wrote on Fri 23. 01. 2009 at 23:57 -0500:
>
>> Zdenek Kotala wrote:
>>
>>> Andrew Dunstan wrote on Fri 09. 01. 2009 at 12:16 -0500:
>>>
>>>
>>>
>>>> Sure, we can easily have buildfarm's initdb step set any locale (and
>>>> encoding, for that matter) we like. That's a simple change.
>>>>
>>>>
>>> Will it be possible to set more locales and run the tests on all of them
>>> without recompilation? For example, I have installed all of Solaris's
>>> locales on my animal, but currently that means I need to perform the whole
>>> cycle for each locale.
>>>
>>>
>> I'm working on this. Yes, you will be able to specify a list of locales
>> to check. For each locale the following tests will be run:
>> installcheck, pl-installcheck, and contrib-installcheck.
>>
>
> thanks
>
>
>

Example run with locales C, en_US.utf8 and french:
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=dungbeetle&dt=2009-01-26%2012:44:01

cheers

andrew


From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-01-31 22:08:15
Message-ID: 4984CBCF.1040703@dunslane.net
Lists: pgsql-hackers

Zdenek Kotala wrote:
>> PL-check gives the diff below on PLTCL tests under en_US locale. I guess
>> the simplest answer is to add an alternative result file.
>>
>
> Yes, I thought about adding a locale suffix for alternative result files,
> but it could be useless overhead.
>
> But some tests can be modified. For example
>
> select * from T_pkey1 order by key1 using @<, key2;
>
> can be rewritten as
>
> select * from T_pkey1 order by key1 using @<, key2::name;
>
>
>
>

Is that the preferred solution? I want to fix this so I can re-enable
building with TCL in dungbeetle.

cheers

andrew


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Guillaume Smet <guillaume(dot)smet(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: foreign_data test fails with non-C locale
Date: 2009-02-02 07:51:36
Message-ID: 1233561096.1367.1.camel@localhost
Lists: pgsql-hackers


Andrew Dunstan wrote on Sat 31. 01. 2009 at 17:08 -0500:
>
> Zdenek Kotala wrote:
> >> PL-check gives the diff below on PLTCL tests under en_US locale. I guess
> >> the simplest answer is to add an alternative result file.
> >>
> >
> > Yes, I thought about adding a locale suffix for alternative result files,
> > but it could be useless overhead.
> >
> > But some tests can be modified. For example
> >
> > select * from T_pkey1 order by key1 using @<, key2;
> >
> > can be rewritten as
> >
> > select * from T_pkey1 order by key1 using @<, key2::name;
> >
> >
> >
> >
>
> Is that the preferred solution? I want to fix this so I can re-enable
> building with TCL in dungbeetle.

Probably not in all cases.

Zdenek