initdb of regression test failed.

Lists: pgsql-patches
From: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: initdb of regression test failed.
Date: 2007-10-02 17:09:36
Message-ID: 05be01c80517$02657790$c601a8c0@HP22720319231

Hi Tom-san.

initdb fails in the regression test because of a locale mismatch.

-
Running in noclean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.932.
initdb: could not find suitable encoding for locale "Japanese_Japan.932"
Rerun initdb with the -E option.
Try "initdb --help" for more information.
Running in noclean mode. Mistakes will not be cleaned up.
-

I think this is required....
Did I miss something?

Regards,
Hiroshi Saito

Attachment Content-Type Size
initdb_patch application/octet-stream 829 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: initdb of regression test failed.
Date: 2007-10-02 17:30:05
Message-ID: 16776.1191346205@sss.pgh.pa.us

"Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp> writes:
> The database cluster will be initialized with locale Japanese_Japan.932.
> initdb: could not find suitable encoding for locale "Japanese_Japan.932"

So, what encoding *should* we use for that locale?

> I think this is required....

We are certainly not going to disable pg_regress's ability to test in
non-C locales. ISTM a proper fix is an addition to the table in
src/port/chklocale.c. This example suggests actually that we need
a boatload more table entries to handle Windows locale names :-(
(count on Microsoft to ignore standards...)

regards, tom lane
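
A minimal sketch of the table-lookup idea discussed above, written in Python purely for illustration (the real table is C code in src/port/chklocale.c). The table name, the "CP<number>" key format, and the selection of entries are assumptions made for this sketch; only the 932/SJIS and 20932/EUC_JP pairings are taken from this thread.

```python
# Hypothetical codeset-to-encoding table; NOT the actual contents of chklocale.c.
CODESET_TO_ENCODING = {
    "UTF-8": "UTF8",
    "EUC-JP": "EUC_JP",
    "CP932": "SJIS",      # as reported for Japanese_Japan.932
    "CP20932": "EUC_JP",  # as reported for Japanese_Japan.20932
}


def find_encoding(codeset):
    """Return the matching encoding name, or None if the table has no entry."""
    return CODESET_TO_ENCODING.get(codeset.upper())


for name in ("CP932", "CP20932", "CP1252"):
    encoding = find_encoding(name)
    if encoding is None:
        # A missing entry corresponds to the "could not find suitable encoding" case.
        print(f"{name}: no table entry")
    else:
        print(f"{name}: {encoding}")
```

A lookup miss is what produces the first error in this thread; a hit on an encoding that is not allowed on the server side still has to be rejected separately.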


From: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: initdb of regression test failed.
Date: 2007-10-03 02:37:27
Message-ID: 083a01c80566$566a76c0$c601a8c0@HP22720319231

Hi.

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp> writes:
>> The database cluster will be initialized with locale Japanese_Japan.932.
>> initdb: could not find suitable encoding for locale "Japanese_Japan.932"
>
> So, what encoding *should* we use for that locale?
>
>> I think this is required....
>
> We are certainly not going to disable pg_regress's ability to test in
> non-C locales. ISTM a proper fix is an addition to the table in
> src/port/chklocale.c. This example suggests actually that we need
> a boatload more table entries to handle Windows locale names :-(
> (count on Microsoft to ignore standards...)

Ah Ok, Please check it.

However, this problem now occurs:
-
Running in noclean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.932.
initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
Encoding SJIS is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.
Running in noclean mode. Mistakes will not be cleaned up.
-
I think that the check of this server side is the right action.!
I desire the further suggestion....

Regards,
Hiroshi Saito

Attachment Content-Type Size
chklocale_patch application/octet-stream 1.3 KB

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: initdb of regression test failed.
Date: 2007-10-03 03:11:04
Message-ID: 20071003115649.261B.ITAGAKI.TAKAHIRO@oss.ntt.co.jp


"Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp> wrote:

> The database cluster will be initialized with locale Japanese_Japan.932.
> initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
> Encoding SJIS is not allowed as a server-side encoding.
> -
> I think that the check of this server side is the right action.!
> I desire the further suggestion....

How about changing initdb to use encoding=UTF-8 and no-locale when the
encoding of the default locale is not supported in the server? I think it is
the most frequently used combination when we cannot use the default
encoding in the server.

The present initdb without options always fails in such environments.
Using UTF-8 with no-locale is better than error.
(Error is better than using wrong locale, though.)

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: initdb of regression test failed.
Date: 2007-10-03 03:26:04
Message-ID: 08be01c8056d$21262020$c601a8c0@HP22720319231

Hi.

From: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>

>
> "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp> wrote:
>
>> The database cluster will be initialized with locale Japanese_Japan.932.
>> initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
>> Encoding SJIS is not allowed as a server-side encoding.
>> -
>> I think that the check of this server side is the right action.!
>> I desire the further suggestion....
>
> How about changing initdb to use encoding=UTF-8 and no-locale when the
> encoding of the default locale is not supported in the server? I think it is
> the most frequently used combination when we cannot use the default
> encoding in the server.

Yes, at least for Japanese, I think your suggestion is right.
But how would it work in other countries? That worries me.

>
> The present initdb without options always fails in such environments.
> Using UTF-8 with no-locale is better than error.
> (Error is better than using wrong locale, though.)

Shouldn't we specify a method in the documentation so that users can avoid
the problem, rather than handle it ad hoc?

Regards,
Hiroshi Saito


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: initdb of regression test failed.
Date: 2007-10-03 08:28:28
Message-ID: 20071003171253.2626.ITAGAKI.TAKAHIRO@oss.ntt.co.jp


"Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp> wrote:

> Ah Ok, Please check it.

Your patch looks useful to prevent mismatch of encoding and locale on Windows,
but I found there is a limitation that users will not be able to specify a locale.
I added an alternative to nl_langinfo(CODESET) for Win32.

Please check the following commands:
initdb --encoding=EUC_jp --locale=Japanese_Japan.932
vs.
initdb --encoding=EUC_jp --locale=Japanese_Japan.20932

One problem is that users need to know codepage numbers. It might
be possible to replace the default codepage with the one for the server
encoding automatically if we have a mapping table from encoding to codepage.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
chklocale-v2.patch application/octet-stream 4.3 KB
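
A minimal sketch of the two ideas in the message above, in Python for illustration only: deriving a codeset name from the numeric suffix of a Windows locale name, and swapping that suffix using an encoding-to-codepage table. The function names and the table are hypothetical and are not taken from chklocale-v2.patch; the two codepage numbers are the ones used in the commands above.

```python
# Hypothetical encoding-to-codepage table (illustrative entries only).
ENCODING_TO_CODEPAGE = {
    "EUC_JP": 20932,
    "SJIS": 932,
}


def windows_locale_codeset(locale_name):
    """Return 'CP<number>' for names like 'Japanese_Japan.932', else None."""
    _, dot, suffix = locale_name.rpartition(".")
    if dot and suffix.isdigit():
        return "CP" + suffix
    return None


def locale_for_encoding(locale_name, encoding):
    """Replace the codepage suffix with the one mapped to the given encoding.

    The locale name is returned unchanged when it has no suffix or the
    encoding has no entry in the table.
    """
    base, dot, _ = locale_name.rpartition(".")
    codepage = ENCODING_TO_CODEPAGE.get(encoding.upper())
    if not dot or codepage is None:
        return locale_name
    return f"{base}.{codepage}"


print(windows_locale_codeset("Japanese_Japan.932"))
print(locale_for_encoding("Japanese_Japan.932", "EUC_JP"))
```

With such a table, a user asking for EUC_JP would not need to know that the matching codepage is 20932.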

From: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: initdb of regression test failed.
Date: 2007-10-03 17:15:49
Message-ID: 05c901c805e1$0c1878d0$0b01a8c0@yourc3ftrhkaod

Hi.

----- Original Message -----
From: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
>
> "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp> wrote:
>
>> Ah Ok, Please check it.
>
> Your patch looks useful to prevent mismatch of encoding and locale on Windows,
> but I found there is a limitation that users will not be able to specify a locale.
> I added an alternative to nl_langinfo(CODESET) for Win32.
>
> Please check the following commands:
> initdb --encoding=EUC_jp --locale=Japanese_Japan.932
> vs.
> initdb --encoding=EUC_jp --locale=Japanese_Japan.20932
>
>
> One problem is that users need to know codepage numbers. It might
> be possible to replace the default codepage with the one for the server
> encoding automatically if we have a mapping table from encoding to codepage.

Yes, I think your approach looks very good. It then seems necessary to
reconsider the original problem of the default value. I am thinking about
documenting it or managing it some other way. Anyway, thanks.

Regards,
Hiroshi Saito


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: initdb of regression test failed.
Date: 2007-10-03 17:23:09
Message-ID: 26876.1191432189@sss.pgh.pa.us

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> Your patch looks useful to prevent mismatch of encoding and locale on Windows,
> but I found there is a limitation that users will not be able to specify a locale.
> I added an alternative to nl_langinfo(CODESET) for Win32.

Applied with small correction --- it looked like you'd put in the wrong
PG_ENC code for GBK and BIG5. Not terribly important since we'd reject
them anyway, but we might as well reject with the correct error message.

This still leaves the policy decision of whether we want to have
initdb assume "-E UTF8 --no-locale" if it sees the current locale
has an unusable encoding. I'm not really happy with that idea
because it would disable localization of messages. I think what we
want, at least on Windows, is to switch to the "corresponding" locale
that uses UTF8. Is there a simple way to do that? Or at least some
simple recipe we can put into the documentation? "If you get this
sort of error, use this --locale setting..."

regards, tom lane


From: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: initdb of regression test failed.
Date: 2007-10-04 02:01:41
Message-ID: 011f01c8062a$81efd230$c601a8c0@HP22720319231

Hi.

regression test surely goes wrong.!

hedule --multibyte=SQL_ASCII --load-language=plpgsql
============== creating temporary installation ==============
============== initializing database system ==============

pg_regress: initdb failed
Examine ./log/initdb.log for the reason.
Command was:
""C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/initdb"
-D "C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/data" -L
"C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/share" --noclean
> "./log/initdb.log" 2>&1"
make[2]: *** [check] Error 2
make[2]: Leaving directory `/home/hiroshi/pgsql/src/test/regress'
make[1]: *** [check] Error 2
make[1]: Leaving directory `/home/hiroshi/pgsql/src/test'
make: *** [check] Error 2

-initdb.log-
Running in noclean mode. Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.932.
initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
Encoding SJIS is not allowed as a server-side encoding.
Rerun initdb with a different locale selection.
Running in noclean mode. Mistakes will not be cleaned up.
-

after the patch..

============== shutting down postmaster ==============
server stopped

=======================
All 112 tests passed.
=======================

Anyway, It surely fails now.:-(

Regards,
Hiroshi Saito

Attachment Content-Type Size
pg_regress_patch application/octet-stream 606 bytes

From: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: initdb of regression test failed.
Date: 2007-10-04 02:28:57
Message-ID: 016001c8062e$51082650$c601a8c0@HP22720319231

Oops, please disregard the patch to pg_regress.c.
Sorry, I think the attached one is preferable.

> Hi.
>
> regression test surely goes wrong.!
>
> hedule --multibyte=SQL_ASCII --load-language=plpgsql
> ============== creating temporary installation ==============
> ============== initializing database system ==============
>
> pg_regress: initdb failed
> Examine ./log/initdb.log for the reason.
> Command was:
> ""C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/bin/initdb"
> -D "C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/data" -L
> "C:/MinGW/home/hiroshi/pgsql/src/test/regress/./tmp_check/install//usr/local/pgsql/share"
> --noclean
> > "./log/initdb.log" 2>&1"
> make[2]: *** [check] Error 2
> make[2]: Leaving directory `/home/hiroshi/pgsql/src/test/regress'
> make[1]: *** [check] Error 2
> make[1]: Leaving directory `/home/hiroshi/pgsql/src/test'
> make: *** [check] Error 2
>
> -initdb.log-
> Running in noclean mode. Mistakes will not be cleaned up.
> The files belonging to this database system will be owned by user "hiroshi".
> This user must also own the server process.
>
> The database cluster will be initialized with locale Japanese_Japan.932.
> initdb: locale Japanese_Japan.932 requires unsupported encoding SJIS
> Encoding SJIS is not allowed as a server-side encoding.
> Rerun initdb with a different locale selection.
> Running in noclean mode. Mistakes will not be cleaned up.
> -
>
> after the patch..
>
> ============== shutting down postmaster ==============
> server stopped
>
> =======================
> All 112 tests passed.
> =======================
>
> Anyway, It surely fails now.:-(
>
> Regards,
> Hiroshi Saito
>

Attachment Content-Type Size
regress_patch application/octet-stream 385 bytes

From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: initdb of regression test failed.
Date: 2007-10-04 02:43:29
Message-ID: 20071004110719.BD0A.ITAGAKI.TAKAHIRO@oss.ntt.co.jp


"Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp> wrote:

> regression test surely goes wrong.!

This fix does nothing against the regression failure.

It is probably reasonable to choose UTF-8 as the server encoding when we cannot
support the encoding of the current locale. The remaining issue is which to
choose in that case: no-locale, a locale with another encoding, or reporting an error.

At least on Windows, a locale with another encoding works correctly because
we already have some Windows-specific hacks. (try grep MultiByteToWideChar)
In fact, we can accept options like:
initdb -E UTF8 --locale=Japanese_Japan.932 -- CP932 is SJIS in nature

I suggest using UTF8 if the encoding is UTF-8 or NOT specified and
we don't support the locale encoding on Windows, i.e. locale is always
enabled on regression tests.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: <pgsql-patches(at)postgresql(dot)org>
Cc: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
Subject: Re: initdb of regression test failed.
Date: 2007-10-04 04:22:34
Message-ID: 20071004130126.BD13.ITAGAKI.TAKAHIRO@oss.ntt.co.jp


I wrote:
> I suggest using UTF8 if the encoding is UTF-8 or NOT specified and
> we don't support the locale encoding on Windows, i.e. locale is always
> enabled on regression tests.

Here is a patch to do it on Windows.
1. Use UTF-8 if the locale encoding is not available for server.
2. Allow mismatch between server and locale encodings if the server
encoding is UTF-8.

I succeeded in running the regression tests on a Japanese version of Windows
with the patch, but please test it on other language versions.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
utf8_win32.patch application/octet-stream 2.6 KB
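
The two rules listed above can be sketched as follows. This is a Python illustration under stated assumptions, not the logic of utf8_win32.patch itself: the function names are hypothetical, and the set of client-only encodings contains just the three mentioned in this thread (SJIS, BIG5, GBK).

```python
# Encodings named in this thread as not usable on the server side (subset only).
CLIENT_ONLY_ENCODINGS = {"SJIS", "BIG5", "GBK"}


def choose_server_encoding(locale_encoding, requested=None):
    """Rule 1: fall back to UTF8 when the locale's encoding is client-only."""
    if requested is not None:
        return requested
    if locale_encoding in CLIENT_ONLY_ENCODINGS:
        return "UTF8"
    return locale_encoding


def mismatch_allowed(server_encoding, locale_encoding):
    """Rule 2: a server/locale encoding mismatch is tolerated only for UTF8."""
    return server_encoding == locale_encoding or server_encoding == "UTF8"


# Japanese_Japan.932 implies SJIS, so the default becomes UTF8 and is accepted.
encoding = choose_server_encoding("SJIS")
print(encoding, mismatch_allowed(encoding, "SJIS"))
```

Note that the locale itself is kept in this plan; only the encoding changes.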

From: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: initdb of regression test failed.
Date: 2007-10-04 05:43:51
Message-ID: 037e01c80649$8b5916f0$c601a8c0@HP22720319231

Hi.

Um, I think this is material to examine for 8.4, because it changes
a feature. Of course, your proposal can be considered as one
solution, but more discussion is required.
I feel that it is dangerous for 8.3.

Regards,
Hiroshi Saito

>
> I wrote:
>> I suggest using UTF8 if the encoding is UTF-8 or NOT specified and
>> we don't support the locale encoding on Windows, i.e. locale is always
>> enabled on regression tests.
>
> Here is a patch to do it on Windows.
> 1. Use UTF-8 if the locale encoding is not available for server.
> 2. Allow mismatch between server and locale encodings if the server
> encoding is UTF-8.
>
> I succeeded in running the regression tests on a Japanese version of Windows
> with the patch, but please test it on other language versions.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center
>


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: initdb of regression test failed.
Date: 2007-10-04 05:44:35
Message-ID: 9131.1191476675@sss.pgh.pa.us

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> In fact, we can accept options like:
> initdb -E UTF8 --locale=Japanese_Japan.932 -- CP932 is SJIS in nature

Hmm, but does that really work safely? I think varstr_cmp() does work,
because it forces our data into wchar format and then calls wcscoll().
The thing that scares me is that various random other operating-system
calls might deliver strings in an unexpected encoding. We've been
through similar problems with timezone names reported by strftime, for
example.

regards, tom lane
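
The "convert to wide characters, then collate" approach described above can be illustrated with a small Python sketch. It is only an analogy to varstr_cmp(), not its implementation: the function name is hypothetical, and it relies on the assumption that CPython's locale.strcoll compares wide-character strings (typically via wcscoll()). The "C" locale is used here only so that the example runs anywhere.

```python
import locale


def compare_utf8(a: bytes, b: bytes) -> int:
    """Compare two UTF-8 byte strings under the current LC_COLLATE setting.

    Both inputs are decoded to str (wide characters) first, so the comparison
    does not depend on the multibyte encoding that the locale itself expects.
    """
    return locale.strcoll(a.decode("utf-8"), b.decode("utf-8"))


# "C" is available everywhere; a real test would use a language-specific locale.
locale.setlocale(locale.LC_COLLATE, "C")

print(compare_utf8(b"abc", b"abd") < 0)
```

This covers collation only. The concern raised above is about other operating-system calls, such as strftime, that return strings in the locale's own encoding.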


From: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: initdb of regression test failed.
Date: 2007-10-04 07:02:11
Message-ID: 040301c80654$7c8faf70$c601a8c0@HP22720319231

Hi Tom-san.

This may be mere information...

In 8.3, if each database is to have a different encoding, the locale must be C.
That is the reason why I want the regression test to use C.

--
in>initdb -E EUC_JP -D../data --locale=Japanese_Japan.20932

The files belonging to this database system will be owned by user "hiroshi".
This user must also own the server process.

The database cluster will be initialized with locale Japanese_Japan.20932.
initdb: could not find suitable text search configuration for locale "Japanese_Japan.20932"
The default text search configuration will be set to "simple".

creating directory ../data ... ok
creating subdirectories ... ok
selecting default max_connections ... 100
selecting default shared_buffers/max_fsm_pages ... 32MB/204800
creating configuration files ... ok
creating template1 database in ../data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

--
in>psql template1
Welcome to psql 8.3devel, the PostgreSQL interactive terminal.

Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit

template1=# \l
List of databases
Name | Owner | Encoding
-----------+---------+----------
postgres | hiroshi | EUC_JP
template0 | hiroshi | EUC_JP
template1 | hiroshi | EUC_JP
(3 rows)

template1=# create database hiroshi;
CREATE DATABASE
template1=# \l
List of databases
Name | Owner | Encoding
-----------+---------+----------
hiroshi | hiroshi | EUC_JP
postgres | hiroshi | EUC_JP
template0 | hiroshi | EUC_JP
template1 | hiroshi | EUC_JP
(4 rows)

template1=# show LC_CTYPE;
lc_ctype
----------------------
Japanese_Japan.20932
(1 row)

template1=# create database utfdb encoding='UTF8';
ERROR: encoding UTF8 does not match server's locale Japanese_Japan.20932
DETAIL: The server's LC_CTYPE setting requires encoding EUC_JP.
template1=#


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org, "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
Subject: Re: initdb of regression test failed.
Date: 2007-10-04 10:11:27
Message-ID: 20071004171712.BD1E.ITAGAKI.TAKAHIRO@oss.ntt.co.jp


Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> > initdb -E UTF8 --locale=Japanese_Japan.932 -- CP932 is SJIS in nature
>
> Hmm, but does that really work safely? I think varstr_cmp() does work,
> because it forces our data into wchar format and then calls wcscoll().
> The thing that scares me is that various random other operating-system
> calls might deliver strings in an unexpected encoding. We've been
> through similar problems with timezone names reported by strftime, for
> example.

Hmm, I see we might need to replace all locale-aware functions with their
wchar_t versions, for example, wcsftime instead of strftime.
That requires more testing. It should be saved for 8.4.

The attached is the second plan. It uses UTF-8 and locale=C when
the default locale encoding is not supported and neither encoding nor
locale is passed to initdb. It would help users who use the default
settings (including the regression test).

At the moment, it resets all of the lc_* variables, but it might be possible
to use the default locale for lc_messages, lc_monetary, lc_numeric and lc_time
even if lc_collate and lc_ctype are reset to C.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
utf8-nolocale-on-failure.patch application/octet-stream 3.2 KB
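
A minimal Python sketch of the second plan as described above, for illustration only. It is not the code of utf8-nolocale-on-failure.patch: the function name and return shape are hypothetical, and the client-only set lists just the encodings mentioned in this thread.

```python
# Encodings named in this thread as not usable on the server side (subset only).
CLIENT_ONLY_ENCODINGS = {"SJIS", "BIG5", "GBK"}

LC_CATEGORIES = (
    "lc_collate", "lc_ctype", "lc_messages",
    "lc_monetary", "lc_numeric", "lc_time",
)


def second_plan(locale_encoding, encoding_opt=None, locale_opt=None):
    """Return (encoding, lc_settings) for the fallback, or None if it does not apply.

    The fallback applies only when the user passed neither an encoding nor a
    locale and the default locale's encoding cannot be used by the server.
    """
    if encoding_opt is not None or locale_opt is not None:
        return None  # an explicit choice is never overridden
    if locale_encoding not in CLIENT_ONLY_ENCODINGS:
        return None  # the default locale is usable as it is
    # Current behaviour of the plan: every lc_* variable is reset to C.
    return "UTF8", {category: "C" for category in LC_CATEGORIES}


print(second_plan("SJIS"))
```

Keeping the default locale for lc_messages, lc_monetary, lc_numeric and lc_time would only change the dictionary built in the last step.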

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
Subject: Re: initdb of regression test failed.
Date: 2007-10-04 19:26:53
Message-ID: 5409.1191526013@sss.pgh.pa.us

ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> writes:
> The attached is the second plan. It uses UTF-8 and locale=C when
> the default locale encoding is not supported and neither encoding nor
> locale is passed to initdb. It would help users who use the default
> settings (including the regression test).

I'm not very happy with this proposal, because for people who don't
actually care about non-ASCII data (which is still a lot of people),
forcing UTF-8 as the default encoding will impose pretty substantial
overhead compared to SQL_ASCII --- it turns on all those
multibyte-encoding checks.

Implicitly selecting --no-locale doesn't seem like a big step forward
either, since then you've just given up whatever you might have learned
from the locale setting. Besides, if that's the behavior the user
wants, he can specify it.

I still think that what we should try to do in the default case is find
a locale that is the same language but UTF-8 encoding.

> At the moment, it resets all of the lc_* variables, but it might be possible
> to use the default locale for lc_messages, lc_monetary, lc_numeric and lc_time
> even if lc_collate and lc_ctype are reset to C.

Well, that just leaves me wondering what encoding the localized messages
would be presented in ...

regards, tom lane