Lists: | pgsql-hackers |
---|
From: | Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | upper()/lower() truncates the result under Japanese Windows |
Date: | 2008-12-14 10:22:02 |
Message-ID: | 4944DE4A.8050001@tpf.co.jp |
Hi,
The upper(), lower(), and initcap() functions truncate their results
under Japanese Windows when, e.g., the server encoding is UTF-8
and LC_CTYPE is Japanese_Japan.932.
Below is an example.
$ psql
psql (8.4devel)
Type "help" for help.
inoue=# \encoding sjis
inoue=# show server_encoding;
server_encoding
-----------------
UTF8
(1 row)
inoue=# show LC_CTYPE;
lc_ctype
--------------------
Japanese_Japan.932
(1 row)
inoue=# \set jpnstr '''カタカナ'''
inoue=# select char_length(:jpnstr);
char_length
-------------
4
(1 row)
inoue=# select upper(:jpnstr);
upper
--------
カタカ
(1 row)
inoue=# select char_length(upper(:jpnstr));
char_length
-------------
3
(1 row)
The last command should return 4, not 3: upper() dropped the final character.
Attached is a patch to fix the bug.
After applying the patch, the result is:
inoue=# select upper(:jpnstr);
upper
----------
カタカナ
(1 row)
inoue=# select char_length(upper(:jpnstr));
char_length
-------------
4
(1 row)
regards,
Hiroshi Inoue
Attachment | Content-Type | Size |
---|---|---|
formatting.patch | text/plain | 3.4 KB |
From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: upper()/lower() truncates the result under Japanese Windows |
Date: | 2008-12-14 16:59:35 |
Message-ID: | 25758.1229273975@sss.pgh.pa.us |
Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> writes:
> Upper(), lower() or initcap() function truncates the result
> under Japanese Windows with e.g. the server encoding=UTF-8
> and the LC_CTYPE setting Japanese_japan.932 .
Hmm, I guess that makes sense, since that LC_CTYPE implies an encoding
other than UTF-8, and MB_CUR_MAX would be set according to LC_CTYPE
rather than the server encoding.
The proposed patch seems pretty ugly though. Why don't we just stop
using MB_CUR_MAX altogether? These three functions are the only
references to it AFAICS.
regards, tom lane
From: | Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: upper()/lower() truncates the result under Japanese Windows |
Date: | 2008-12-15 22:19:30 |
Message-ID: | 4946D7F2.3070908@tpf.co.jp |
Tom Lane wrote:
> Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> writes:
>> Upper(), lower() or initcap() function truncates the result
>> under Japanese Windows with e.g. the server encoding=UTF-8
>> and the LC_CTYPE setting Japanese_japan.932 .
>
> Hmm, I guess that makes sense, since the LC_CTYPE implies an encoding
> other than UTF-8; MB_CUR_MAX should be set according to LC_CTYPE.
>
> The proposed patch seems pretty ugly though. Why don't we just stop
> using MB_CUR_MAX altogether? These three functions are the only
> references to it AFAICS.
It may look ugly, but it only follows what wchar2char() does.
I don't like using MB_CUR_MAX either, but it seems safe as long as
wchar2char() calls wcstombs().
regards,
Hiroshi Inoue