Re: trouble with to_char('L')

Lists: pgsql-general pgsql-hackers
From: Mikko <mhannesy(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: trouble with to_char('L')
Date: 2009-04-20 21:00:07
Message-ID: 40c6d9160904201400n79f19a05w64455ba59428a920@mail.gmail.com
Lists: pgsql-general pgsql-hackers

Hi,

My database has UTF8 encoding and a Finnish locale; the client_encoding
and the console are set to WIN1252. I created a table with a single
NUMERIC(5,2) column and inserted a few values. Running a query 'SELECT
to_char(money, '999D99L') FROM table' through psql gives the following
error message:

ERROR:  invalid byte sequence for encoding "UTF8": 0x80
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

The graphical Query tool returns a set of empty rows. The query works
ok without the 'L'.

Thanks in advance,
Mikko


From: "Albe Laurenz" <laurenz(dot)albe(at)wien(dot)gv(dot)at>
To: "Mikko *EXTERN*" <mhannesy(at)gmail(dot)com>, <pgsql-general(at)postgresql(dot)org>
Subject: Re: trouble with to_char('L')
Date: 2009-04-21 10:36:44
Message-ID: D960CB61B694CF459DCFB4B0128514C202FF6590@exadv11.host.magwien.gv.at

Mikko wrote:
> My database has UTF8 encoding and a Finnish locale; the client_encoding
> and the console are set to WIN1252. I created a table with a single
> NUMERIC(5,2) column and inserted a few values. Running a query 'SELECT
> to_char(money, '999D99L') FROM table' through psql gives the following
> error message:
>
> ERROR:  invalid byte sequence for encoding "UTF8": 0x80
> HINT:  This error can also happen if the byte sequence does not match
> the encoding expected by the server, which is controlled by
> "client_encoding".
>
> The graphical Query tool returns a set of empty rows. The query works
> ok without the 'L'.

That is strange.

What is your psql version?

What is the output of the following commands:

SHOW server_version;
SHOW server_encoding;
SHOW client_encoding;
SHOW lc_numeric;
SHOW lc_monetary;
SELECT to_char(3.1415::numeric(5,2), '999D99L');

Yours,
Laurenz Albe


From: Mikko <mhannesy(at)gmail(dot)com>
To: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2009-04-21 13:58:21
Message-ID: 40c6d9160904210658y590377cfw6dbbecb53d2b8be0@mail.gmail.com

psql (PostgreSQL) 8.3.7

server_version 8.3.7
server_encoding UTF8
client_encoding win1252
lc_numeric Finnish, Finland
lc_monetary Finnish, Finland

testdb=# SELECT to_char(3.1415::numeric(5,2), '999D99L');

ERROR: invalid byte sequence for encoding "UTF8": 0x80
HINT: This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".

If connected to postgres database the query returns 3,14.

Mikko


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Mikko <mhannesy(at)gmail(dot)com>
Cc: Albe Laurenz <laurenz(dot)albe(at)wien(dot)gv(dot)at>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2009-04-21 17:13:38
Message-ID: 20090421171338.GR10358@alvh.no-ip.org

Mikko escribió:
> psql (PostgreSQL) 8.3.7
>
> server_version 8.3.7
> server_encoding UTF8
> client_encoding win1252
> lc_numeric Finnish, Finland
> lc_monetary Finnish, Finland
>
> testdb=# SELECT to_char(3.1415::numeric(5,2), '999D99L');
>
> ERROR: invalid byte sequence for encoding "UTF8": 0x80
> HINT: This error can also happen if the byte sequence does not match
> the encoding expected by the server, which is controlled by
> "client_encoding".

FWIW 0x80 is the Euro symbol in Win1252 according to
http://en.wikipedia.org/wiki/Windows-1252

Maybe the problem here is that the chosen locales are not UTF8. Does it
work if you set lc_numeric and lc_monetary to "Finnish_Finland.65001"
instead? Those should match the server_encoding.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
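
The error message fits this reading: a lone 0x80 is a UTF-8 continuation byte and can never start a character, whereas the Euro sign in UTF-8 is the three bytes E2 82 AC. A minimal sketch of such a structural check follows. `utf8_seq_len` is a hypothetical helper written for illustration, not PostgreSQL's own validator, and it does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not PostgreSQL code): return the length in bytes of
 * the UTF-8 sequence starting at s, or 0 if the bytes do not form a
 * structurally valid sequence. Only lead/continuation byte patterns are
 * checked; overlong forms and surrogates are not rejected.
 */
static size_t
utf8_seq_len(const unsigned char *s, size_t avail)
{
    size_t need;
    size_t i;

    assert(s != NULL);
    if (avail == 0)
        return 0;

    if (s[0] < 0x80)
        need = 1;                   /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0)
        need = 2;
    else if ((s[0] & 0xF0) == 0xE0)
        need = 3;
    else if ((s[0] & 0xF8) == 0xF0)
        need = 4;
    else
        return 0;                   /* 0x80..0xBF or 0xF8..0xFF as lead byte */

    if (need > avail)
        return 0;                   /* sequence is truncated */

    for (i = 1; i < need; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* not a continuation byte */

    return need;
}
```

Called on the single byte 0x80 this returns 0, which matches the "invalid byte sequence" report; called on E2 82 AC it returns 3.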


From: Mikko <mhannesy(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2009-04-21 20:32:18
Message-ID: 40c6d9160904211332i73e44103s375f45fc46af77da@mail.gmail.com

On Tue, Apr 21, 2009 at 8:13 PM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Maybe the problem here is that the chosen locales are not UTF8.  Does it
> work if you set lc_numeric and lc_monetary to "Finnish_Finland.65001"
> instead?  Those should match the server_encoding.

alter database testdb set lc_monetary(or numeric) to
'Finnish_Finland.65001' returns:
ERROR: invalid value for parameter "lc_monetary": "Finnish_Finland.65001"

However, I noticed that both lc_collate and lc_ctype are set to
Finnish_Finland.1252 by the installer. Should I have just run initdb
with --locale fi_FI.UTF8 at the very start? The to_char('L') works
fine with a database with win1252 encoding.

Mikko


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Mikko <mhannesy(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2009-04-21 23:13:22
Message-ID: 20090421231322.GV10358@alvh.no-ip.org

Mikko escribió:
> On Tue, Apr 21, 2009 at 8:13 PM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
> > Maybe the problem here is that the chosen locales are not UTF8.  Does it
> > work if you set lc_numeric and lc_monetary to "Finnish_Finland.65001"
> > instead?  Those should match the server_encoding.
>
> alter database testdb set lc_monetary(or numeric) to
> 'Finnish_Finland.65001' returns:
> ERROR: invalid value for parameter "lc_monetary": "Finnish_Finland.65001"

Ouch ... I thought that was the way that Windows designated UTF8
locales, but maybe I am wrong.

> However, I noticed that both lc_collate and lc_ctype are set to
> Finnish_Finland.1252 by the installer. Should I have just run initdb
> with --locale fi_FI.UTF8 at the very start? The to_char('L') works
> fine with a database with win1252 encoding.

Hmm, it should have disallowed the creation of a UTF8 database then.
Maybe that part is what is broken here.

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Mikko <mhannesy(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2009-04-22 17:06:24
Message-ID: 40c6d9160904221006k202284c8y305123f66beeaacb@mail.gmail.com

On Wed, Apr 22, 2009 at 2:13 AM, Alvaro Herrera
<alvherre(at)commandprompt(dot)com> wrote:
> Ouch ... I thought that was the way that Windows designated UTF8
> locales, but maybe I am wrong.

OK, now I found out that Windows doesn't support locales with encodings
that use more than two bytes per character, and initdb falls back to 1252.

http://msdn.microsoft.com/en-us/library/x99tb11d.aspx

I guess I'll have to manage with win1252 encoded dbs for the moment.
Thanks for the answers!

Mikko


From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Mikko <mhannesy(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2009-04-22 17:58:48
Message-ID: 20090422175848.GC10358@alvh.no-ip.org

Mikko escribió:
> On Wed, Apr 22, 2009 at 2:13 AM, Alvaro Herrera
> <alvherre(at)commandprompt(dot)com> wrote:
> > Ouch ... I thought that was the way that Windows designated UTF8
> > locales, but maybe I am wrong.
>
> OK, now I found out that Windows doesn't support locales with encodings
> that use more than two bytes per character, and initdb falls back to 1252.
>
> http://msdn.microsoft.com/en-us/library/x99tb11d.aspx

Hmm.

Does this imply that we shouldn't allow UTF8 database on Windows at all?

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2009-04-22 18:35:22
Message-ID: 8328.1240425322@sss.pgh.pa.us

Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> Does this imply that we shouldn't allow UTF8 database on Windows at all?

That would be pretty unfortunate :-(

I think what this suggests is that there probably needs to be some
encoding conversion logic near the places we examine localeconv()
output.

regards, tom lane
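
To make the suggestion concrete, here is a minimal portable sketch of the point at which localeconv() output is examined. The bytes it hands back are in whatever encoding the C library uses for the selected locale, which on Windows need not be the database encoding, so this is where a conversion step would have to go. `currency_symbol_for` is a hypothetical helper for illustration, not a function from pg_locale.c, and it performs no encoding conversion itself.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'ed copy of the currency symbol that
 * localeconv() reports for the given LC_MONETARY locale, or NULL if the
 * locale cannot be set or memory runs out. The previous LC_MONETARY
 * setting is restored before returning. The caller frees the result.
 */
static char *
currency_symbol_for(const char *locale_name)
{
    const char *cur;
    char       *saved = NULL;
    char       *result = NULL;
    struct lconv *lc;

    assert(locale_name != NULL);

    /* setlocale() may reuse its return buffer, so copy the old name. */
    cur = setlocale(LC_MONETARY, NULL);
    if (cur != NULL)
    {
        saved = malloc(strlen(cur) + 1);
        if (saved == NULL)
            return NULL;
        strcpy(saved, cur);
    }

    if (setlocale(LC_MONETARY, locale_name) != NULL)
    {
        /* localeconv() returns static storage; copy before it changes. */
        lc = localeconv();
        result = malloc(strlen(lc->currency_symbol) + 1);
        if (result != NULL)
            strcpy(result, lc->currency_symbol);
    }

    if (saved != NULL)
    {
        setlocale(LC_MONETARY, saved);
        free(saved);
    }
    return result;
}
```

In the "C" locale the returned string is empty. With a Finnish locale whose code page is 1252, the thread suggests the result would be the single byte 0x80, which is what a UTF8 database then rejects.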


From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Cc: Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2009-04-22 21:35:28
Message-ID: 49EF8DA0.90008@tpf.co.jp

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>> Does this imply that we shouldn't allow UTF8 database on Windows at all?
>
> That would be pretty unfortunate :-(
>
> I think what this suggests is that there probably needs to be some
> encoding conversion logic near the places we examine localeconv()
> output.

Attached is a patch to the current CVS.
It uses an approach similar to what the LC_TIME code does.

regards,
Hiroshi Inoue

Attachment Content-Type Size
pg_locale_20090423.patch text/plain 3.1 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2009-05-29 18:16:26
Message-ID: 21710.1243620986@sss.pgh.pa.us

Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> writes:
> Tom Lane wrote:
>> I think what this suggests is that there probably needs to be some
>> encoding conversion logic near the places we examine localeconv()
>> output.

> Attached is a patch to the current CVS.
> It uses an approach similar to what the LC_TIME code does.

I'm not really in a position to test/commit this, since I don't have a
Windows machine. However, since no one else is stepping up to deal with
it, here's a quick review:

* This seems to be assuming that the user has set LC_MONETARY and
LC_NUMERIC the same. What if they're different?

* What if the selected locale corresponds to Unicode (ie UTF16)
encoding?

* #define'ing strdup() to do something rather different from strdup
seems pretty horrid from the standpoint of code readability and
maintainability, especially with nary a comment explaining it.

* Code will dump core on malloc failure.

* Since this code is surely not performance critical, I wouldn't bother
with trying to optimize it; hence drop the special case for all-ASCII.

* Surely we already have a symbol somewhere that can be used in
place of this:
#define MAX_BYTES_PER_CHARACTER 4

regards, tom lane
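
Two of the review points above (redefining strdup() and crashing on malloc failure) can be illustrated with a small sketch: a separately named copy routine that reports out-of-memory to the caller instead of dereferencing a null pointer. `copy_lconv_string` is a hypothetical name chosen for this example; it is not taken from the patch, and a real backend function would presumably raise an error through the server's own error reporting rather than return a flag.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: store a malloc'ed copy of src in *dst.
 * Returns 1 on success and 0 on out of memory, in which case *dst is
 * left untouched so the caller can clean up and report the failure.
 */
static int
copy_lconv_string(char **dst, const char *src)
{
    size_t  len;
    char   *copy;

    assert(dst != NULL);
    assert(src != NULL);

    len = strlen(src) + 1;          /* include the terminating NUL */
    copy = malloc(len);
    if (copy == NULL)
        return 0;                   /* caller decides how to report this */

    memcpy(copy, src, len);
    *dst = copy;
    return 1;
}
```

Because the helper has its own name, a reader of the calling code can see that something other than plain strdup() is happening, and any encoding conversion could be added inside it without a macro.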


From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2009-06-02 03:22:13
Message-ID: 4A249AE5.9060006@tpf.co.jp

Tom Lane wrote:
> Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> writes:
>> Tom Lane wrote:
>>> I think what this suggests is that there probably needs to be some
>>> encoding conversion logic near the places we examine localeconv()
>>> output.
>
>> Attached is a patch to the current CVS.
>> It uses an approach similar to what the LC_TIME code does.
>
> I'm not really in a position to test/commit this, since I don't have a
> Windows machine. However, since no one else is stepping up to deal with
> it, here's a quick review:

Thanks for the review.
I had forgotten about the patch because the Japanese locale doesn't have
trouble with this issue (the currency symbol is the ASCII character \).
If this is really expected to be fixed, I will update the patch according
to your suggestions.

> * This seems to be assuming that the user has set LC_MONETARY and
> LC_NUMERIC the same. What if they're different?

Strictly speaking, they should be handled individually.

> * What if the selected locale corresponds to Unicode (ie UTF16)
> encoding?

As far as I tested, setlocale(LC_MONETARY, xxx.65001) causes an error.

> * #define'ing strdup() to do something rather different from strdup
> seems pretty horrid from the standpoint of code readability and
> maintainability, especially with nary a comment explaining it.

Maybe it would be better to replace strdup() with a function that calls
dbstr_win32() on Windows.
BTW, grouping and mon_grouping seem to be outside the scope of encoding
conversion. Are they guaranteed to be null terminated?

> * Code will dump core on malloc failure.

I can take care of it.

> * Since this code is surely not performance critical, I wouldn't bother
> with trying to optimize it; hence drop the special case for all-ASCII.

I can take care of it.
>
> * Surely we already have a symbol somewhere that can be used in
> place of this:
> #define MAX_BYTES_PER_CHARACTER 4

I can't find it.
max(pg_encoding_max_length(encoding), pg_encoding_max_length(PG_UTF8))
may be better.

regards,
Hiroshi Inoue


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2009-06-03 18:06:00
Message-ID: 8122.1244052360@sss.pgh.pa.us

Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> writes:
> Tom Lane wrote:
>> * This seems to be assuming that the user has set LC_MONETARY and
>> LC_NUMERIC the same. What if they're different?

> Strictly speaking, they should be handled individually.

I thought about this some more, and I wonder why you did it like this at
all. The patch claimed to be copying the LC_TIME code, but the LC_TIME
code isn't trying to temporarily change any locale settings. What we
are doing in that code is assuming that the system will give us back
the localized strings in the encoding identified by CP_ACP; so all we
have to do is convert CP_ACP to wide chars and then to UTF8. Can't we
use a similar approach for the output of localeconv?

regards, tom lane
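
The second half of the route described here (wide characters to UTF8) is done on Windows by a system call, as in the posted patch. As a portable illustration of what that step produces, the sketch below encodes a single Unicode code point as UTF-8. `codepoint_to_utf8` is a hypothetical helper for this example only. The first half (code page to wide characters) is not shown; the one mapping assumed in the usage note is that byte 0x80 in Windows-1252 is U+20AC, the Euro sign.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: encode code point cp as UTF-8 into out, which must
 * have room for 4 bytes. Returns the number of bytes written, or 0 if cp
 * is a surrogate or lies beyond U+10FFFF.
 */
static size_t
codepoint_to_utf8(unsigned long cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;               /* surrogates are not characters */
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF)
    {
        out[0] = (unsigned char) (0xF0 | (cp >> 18));
        out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char) (0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For U+20AC this writes the three bytes E2 82 AC, which a UTF8 database accepts, in place of the single Windows-1252 byte 0x80 that triggered the original error.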


From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)commandprompt(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2009-06-03 23:41:41
Message-ID: 4A270A35.3020705@tpf.co.jp

Tom Lane wrote:
> Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> writes:
>> Tom Lane wrote:
>>> * This seems to be assuming that the user has set LC_MONETARY and
>>> LC_NUMERIC the same. What if they're different?
>
>> Strictly speaking, they should be handled individually.
>
> I thought about this some more, and I wonder why you did it like this at
> all. The patch claimed to be copying the LC_TIME code, but the LC_TIME
> code isn't trying to temporarily change any locale settings.

LC_TIME and LC_CTYPE (on Windows) settings are changed temporarily
in cache_locale_time() in pg_locale.c.

> What we
> are doing in that code is assuming that the system will give us back
> the localized strings in the encoding identified by CP_ACP;

AFAIK that's not right. Output related to LC_TIME, LC_MONETARY, or
LC_NUMERIC is encoded according to the LC_CTYPE setting.

> so all we
> have to do is convert CP_ACP to wide chars and then to UTF8. Can't we
> use a similar approach for the output of localeconv?

What the LC_TIME code and my patch intend is to set LC_CTYPE to an
appropriate value so that the related output is converted correctly.
If we could set LC_CTYPE to xxx_xxx.65001 (UTF8), we could eliminate
two steps, but that causes an error on Windows.

regards,
Hiroshi Inoue


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2010-02-25 23:21:58
Message-ID: 201002252321.o1PNLw603637@momjian.us


Where are we on this issue?

---------------------------------------------------------------------------

Hiroshi Inoue wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> >> Does this imply that we shouldn't allow UTF8 database on Windows at all?
> >
> > That would be pretty unfortunate :-(
> >
> > I think what this suggests is that there probably needs to be some
> > encoding conversion logic near the places we examine localeconv()
> > output.
>
> Attached is a patch to the current CVS.
> It uses an approach similar to what the LC_TIME code does.
>
> regards,
> Hiroshi Inoue

> Index: pg_locale.c
> ===================================================================
> RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/pg_locale.c,v
> retrieving revision 1.49
> diff -c -c -r1.49 pg_locale.c
> *** pg_locale.c 1 Apr 2009 09:17:32 -0000 1.49
> --- pg_locale.c 22 Apr 2009 21:08:33 -0000
> ***************
> *** 386,391 ****
> --- 386,449 ----
> free(s->positive_sign);
> }
>
> + #ifdef WIN32
> + #define MAX_BYTES_PER_CHARACTER 4
> + static char *dbstr_win32(bool matchenc, const char *str)
> + {
> + int encoding = GetDatabaseEncoding();
> + bool is_ascii = true;
> + size_t len, ilen, wclen, dstlen;
> + wchar_t *wbuf;
> + char *dst, *ibuf;
> +
> + if (matchenc)
> + return strdup(str);
> + /* Is the str an ascii string ? */
> + for (ibuf = str; *ibuf; ibuf++)
> + {
> + if (!isascii(*ibuf))
> + {
> + is_ascii = false;
> + break;
> + }
> + }
> + /* Simply returns the strdup()ed ascii string */
> + if (is_ascii)
> + return strdup(str);
> +
> + ilen = strlen(str) + 1;
> + wclen = ilen * sizeof(wchar_t);
> + wbuf = (wchar_t *) palloc(wclen);
> + len = mbstowcs(wbuf, str, ilen);
> + if (len == -1)
> + elog(ERROR,
> + "could not convert string to Wide characters:error %lu", GetLastError());
> +
> + dstlen = len * MAX_BYTES_PER_CHARACTER + 1;
> + dst = malloc(dstlen);
> +
> + len = WideCharToMultiByte(CP_UTF8, 0, wbuf, len, dst, dstlen, NULL, NULL);
> + pfree(wbuf);
> + if (len == 0)
> + elog(ERROR,
> + "could not convert string to UTF-8:error %lu", GetLastError());
> +
> + dst[len] = '\0';
> + if (encoding != PG_UTF8)
> + {
> + char *convstr = pg_do_encoding_conversion(dst, len, PG_UTF8, encoding);
> + if (dst != convstr)
> + {
> + strlcpy(dst, convstr, dstlen);
> + pfree(convstr);
> + }
> + }
> +
> + return dst;
> + }
> +
> + #define strdup(str) dbstr_win32(is_encoding_match, str)
> + #endif /* WIN32 */
>
> /*
> * Return the POSIX lconv struct (contains number/money formatting
> ***************
> *** 398,403 ****
> --- 456,466 ----
> struct lconv *extlconv;
> char *save_lc_monetary;
> char *save_lc_numeric;
> + #ifdef WIN32
> + char *save_lc_ctype = NULL;
> + bool lc_ctype_change = false, is_encoding_match;
> + #endif /* WIN32 */
> +
>
> /* Did we do it already? */
> if (CurrentLocaleConvValid)
> ***************
> *** 413,418 ****
> --- 476,492 ----
> if (save_lc_numeric)
> save_lc_numeric = pstrdup(save_lc_numeric);
>
> + #ifdef WIN32
> + save_lc_ctype = setlocale(LC_CTYPE, NULL);
> + if (save_lc_ctype && stricmp(locale_monetary, save_lc_ctype) != 0)
> + {
> + lc_ctype_change = true;
> + save_lc_ctype = pstrdup(save_lc_ctype);
> + setlocale(LC_CTYPE, locale_monetary);
> + }
> + is_encoding_match = (pg_get_encoding_from_locale(locale_monetary) == GetDatabaseEncoding());
> + #endif
> +
> setlocale(LC_MONETARY, locale_monetary);
> setlocale(LC_NUMERIC, locale_numeric);
>
> ***************
> *** 437,442 ****
> --- 511,524 ----
> CurrentLocaleConv.n_sign_posn = extlconv->n_sign_posn;
>
> /* Try to restore internal settings */
> + #ifdef WIN32
> + #undef strdup
> + if (lc_ctype_change)
> + {
> + setlocale(LC_CTYPE, save_lc_ctype);
> + pfree(save_lc_ctype);
> + }
> + #endif /* WIN32 */
> if (save_lc_monetary)
> {
> setlocale(LC_MONETARY, save_lc_monetary);
>
>

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2010-02-25 23:32:05
Message-ID: 8452.1267140725@sss.pgh.pa.us

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Where are we on this issue?

According to my files, I complained about the extreme ugliness of the
patch (redefining strdup for pete's sake) and the fact that it did not
actually do things anything like the LC_TIME code as was claimed.
Hiroshi rejected those criticisms. I don't know where we are, but
I don't want to see this patch applied in this form.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2010-02-25 23:33:29
Message-ID: 201002252333.o1PNXTX05550@momjian.us

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Where are we on this issue?
>
> According to my files, I complained about the extreme ugliness of the
> patch (redefining strdup for pete's sake) and the fact that it did not
> actually do things anything like the LC_TIME code as was claimed.
> Hiroshi rejected those criticisms. I don't know where we are, but
> I don't want to see this patch applied in this form.

Right, but you are saying it is still an open issue, which says we
should look at it.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2010-02-25 23:46:53
Message-ID: 9162.1267141613@sss.pgh.pa.us

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Right, but you are saying it is still an open issue, which says we
> should look at it.

Sure. Maybe put it on TODO?

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2010-02-26 00:16:15
Message-ID: 201002260016.o1Q0GFt14601@momjian.us

Tom Lane wrote:
> Bruce Momjian <bruce(at)momjian(dot)us> writes:
> > Right, but you are saying it is still an open issue, which says we
> > should look at it.
>
> Sure. Maybe put it on TODO?

OK, TODO is:

Fix locale-aware handling (e.g. monetary) for specific
server/client encoding combinations

* http://archives.postgresql.org/pgsql-general/2009-04/msg00799.php

If someone wants to work on it, go ahead.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2010-02-26 00:44:37
Message-ID: 4B871975.2010500@tpf.co.jp

Bruce Momjian wrote:
> Where are we on this issue?

Oops, I forgot about it completely.
I have a slightly improved version and will post it tonight.

regards,
Hiroshi Inoue

>
> ---------------------------------------------------------------------------
>
> Hiroshi Inoue wrote:
>> Tom Lane wrote:
>>> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
>>>> Does this imply that we shouldn't allow UTF8 database on Windows at all?
>>> That would be pretty unfortunate :-(
>>>
>>> I think what this suggests is that there probably needs to be some
>>> encoding conversion logic near the places we examine localeconv()
>>> output.
>> Attached is a patch to the current CVS.
>> It uses an approach similar to what the LC_TIME code does.
>>
>> regards,
>> Hiroshi Inoue


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2010-02-26 00:52:48
Message-ID: 201002260052.o1Q0qmC20676@momjian.us

Hiroshi Inoue wrote:
> Bruce Momjian wrote:
> > Where are we on this issue?
>
> Oops, I forgot about it completely.
> I have a slightly improved version and will post it tonight.

Ah, very good. Thanks.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com
PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do
+ If your life is a hard drive, Christ can be your backup. +


From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2010-02-26 23:06:31
Message-ID: 4B8853F7.6030908@tpf.co.jp

Bruce Momjian wrote:
> Hiroshi Inoue wrote:
>> Bruce Momjian wrote:
>>> Where are we on this issue?
>> Oops, I forgot about it completely.
>> I have a slightly improved version and will post it tonight.
>
> Ah, very good. Thanks.

Attached is an improved version.

regards,
Hiroshi Inoue

Attachment Content-Type Size
pg_locale.patch text/plain 5.9 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: trouble with to_char('L')
Date: 2010-02-27 21:42:34
Message-ID: 201002272142.o1RLgY624520@momjian.us

Hiroshi Inoue wrote:
> Bruce Momjian wrote:
> > Hiroshi Inoue wrote:
> >> Bruce Momjian wrote:
> >>> Where are we on this issue?
> >> Oops, I forgot about it completely.
> >> I have a slightly improved version and will post it tonight.
> >
> > Ah, very good. Thanks.
>
> Attached is an improved version.

FYI, I am working on this patch now and will post an updated version.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-02-28 04:28:12
Message-ID: 201002280428.o1S4SCb29156@momjian.us

Hiroshi Inoue wrote:
> Bruce Momjian wrote:
> > Hiroshi Inoue wrote:
> >> Bruce Momjian wrote:
> >>> Where are we on this issue?
> >> Oops, I forgot about it completely.
> >> I have a slightly improved version and will post it tonight.
> >
> > Ah, very good. Thanks.
>
> Attached is an improved version.

I spent many hours on this patch and am attaching an updated version.
I have restructured the code and added many comments, but this is the
main one:

* Ideally, the server encoding and locale settings would
* always match. Unfortunately, WIN32 does not support UTF-8
* values for setlocale(), even though PostgreSQL runs fine with
* a UTF-8 encoding on Windows:
*
* http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
*
* Therefore, we must set LC_CTYPE to match LC_NUMERIC and
* LC_MONETARY, call localeconv(), and use mbstowcs() to
* convert the locale-aware string, e.g. Euro symbol, which
* is not in UTF-8 to the server encoding.

I need someone with WIN32 experience to review and test this patch.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do

Attachment Content-Type Size
/pgpatches/pg_locale text/x-diff 8.1 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-02-28 15:13:56
Message-ID: 201002281513.o1SFDuw20144@momjian.us

Bruce Momjian wrote:
> Hiroshi Inoue wrote:
> > Bruce Momjian wrote:
> > > Hiroshi Inoue wrote:
> > >> Bruce Momjian wrote:
> > >>> Where are we on this issue?
> > >> Oops I forgot it completely.
> > >> I have a little improved version and would post it tonight.
> > >
> > > Ah, very good. Thanks.
> >
> > Attached is an improved version.
>
> I spent many hours on this patch and am attaching an updated version.
> I have restructured the code and added many comments, but this is the
> main one:
>
> * Ideally, the server encoding and locale settings would
> * always match. Unfortunately, WIN32 does not support UTF-8
> * values for setlocale(), even though PostgreSQL runs fine with
> * a UTF-8 encoding on Windows:
> *
> * http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
> *
> * Therefore, we must set LC_CTYPE to match LC_NUMERIC and
> * LC_MONETARY, call localeconv(), and use mbstowcs() to
> * convert the locale-aware string, e.g. Euro symbol, which
> * is not in UTF-8 to the server encoding.
>
> I need someone with WIN32 experience to review and test this patch.

I don't understand why cache_locale_time() works on Windows. It sets
the LC_CTYPE but does not do any encoding conversion. Do month and
day-of-week names not work either, or do they work and the encoding
conversion for numeric/money, e.g. Euro, is not necessary?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-03-01 11:00:15
Message-ID: 4B8B9E3F.6060605@tpf.co.jp
Lists: pgsql-general pgsql-hackers

Bruce Momjian wrote:
> Bruce Momjian wrote:
>> Hiroshi Inoue wrote:
>>> Bruce Momjian wrote:
>>>> Hiroshi Inoue wrote:
>>>>> Bruce Momjian wrote:
>>>>>> Where are we on this issue?
>>>>> Oops I forgot it completely.
>>>>> I have a little improved version and would post it tonight.
>>>> Ah, very good. Thanks.
>>> Attached is an improved version.
>> I spent many hours on this patch and am attaching an updated version.
>> I have restructured the code and added many comments, but this is the
>> main one:
>>
>> * Ideally, the server encoding and locale settings would
>> * always match. Unfortunately, WIN32 does not support UTF-8
>> * values for setlocale(), even though PostgreSQL runs fine with
>> * a UTF-8 encoding on Windows:
>> *
>> * http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
>> *
>> * Therefore, we must set LC_CTYPE to match LC_NUMERIC and
>> * LC_MONETARY, call localeconv(), and use mbstowcs() to
>> * convert the locale-aware string, e.g. Euro symbol, which
>> * is not in UTF-8 to the server encoding.
>>
>> I need someone with WIN32 experience to review and test this patch.
>
> I don't understand why cache_locale_time() works on Windows. It sets
> the LC_CTYPE but does not do any encoding conversion.

Doesn't strftime_win32 do the conversion?

> Do month and
> day-of-week names not work either, or do they work and the encoding
> conversion for numeric/money, e.g. Euro, is not necessary?

db_strdup does the conversion.

regards,
Hiroshi Inoue


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-03-02 03:36:44
Message-ID: 201003020336.o223ai221533@momjian.us
Lists: pgsql-general pgsql-hackers

Hiroshi Inoue wrote:
> Bruce Momjian wrote:
> > Bruce Momjian wrote:
> >> Hiroshi Inoue wrote:
> >>> Bruce Momjian wrote:
> >>>> Hiroshi Inoue wrote:
> >>>>> Bruce Momjian wrote:
> >>>>>> Where are we on this issue?
> >>>>> Oops I forgot it completely.
> >>>>> I have a little improved version and would post it tonight.
> >>>> Ah, very good. Thanks.
> >>> Attached is an improved version.
> >> I spent many hours on this patch and am attaching an updated version.
> >> I have restructured the code and added many comments, but this is the
> >> main one:
> >>
> >> * Ideally, the server encoding and locale settings would
> >> * always match. Unfortunately, WIN32 does not support UTF-8
> >> * values for setlocale(), even though PostgreSQL runs fine with
> >> * a UTF-8 encoding on Windows:
> >> *
> >> * http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
> >> *
> >> * Therefore, we must set LC_CTYPE to match LC_NUMERIC and
> >> * LC_MONETARY, call localeconv(), and use mbstowcs() to
> >> * convert the locale-aware string, e.g. Euro symbol, which
> >> * is not in UTF-8 to the server encoding.
> >>
> >> I need someone with WIN32 experience to review and test this patch.
> >
> > I don't understand why cache_locale_time() works on Windows. It sets
> > the LC_CTYPE but does not do any encoding conversion.
>
> Doesn't strftime_win32 do the conversion?

Oh, I now see strftime is redefined as a macro in that C file. Thanks.

> > Do month and
> > day-of-week names not work either, or do they work and the encoding
> > conversion for numeric/money, e.g. Euro, is not necessary?
>
> db_strdup does the conversion.

Should we pull the encoding conversion into a separate function and have
strftime_win32() and db_strdup() both call it?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-03-02 14:40:46
Message-ID: 4B8D236E.8090601@tpf.co.jp
Lists: pgsql-general pgsql-hackers

Bruce Momjian wrote:
> Hiroshi Inoue wrote:
>> Bruce Momjian wrote:
>>> Bruce Momjian wrote:
>>>> Hiroshi Inoue wrote:
>>>>> Bruce Momjian wrote:
>>>>>> Hiroshi Inoue wrote:
>>>>>>> Bruce Momjian wrote:
>>>>>>>> Where are we on this issue?
>>>>>>> Oops I forgot it completely.
>>>>>>> I have a little improved version and would post it tonight.
>>>>>> Ah, very good. Thanks.
>>>>> Attached is an improved version.
>>>> I spent many hours on this patch and am attaching an updated version.
>>>> I have restructured the code and added many comments, but this is the
>>>> main one:
>>>>
>>>> * Ideally, the server encoding and locale settings would
>>>> * always match. Unfortunately, WIN32 does not support UTF-8
>>>> * values for setlocale(), even though PostgreSQL runs fine with
>>>> * a UTF-8 encoding on Windows:
>>>> *
>>>> * http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
>>>> *
>>>> * Therefore, we must set LC_CTYPE to match LC_NUMERIC and
>>>> * LC_MONETARY, call localeconv(), and use mbstowcs() to
>>>> * convert the locale-aware string, e.g. Euro symbol, which
>>>> * is not in UTF-8 to the server encoding.
>>>>
>>>> I need someone with WIN32 experience to review and test this patch.
>>> I don't understand why cache_locale_time() works on Windows. It sets
>>> the LC_CTYPE but does not do any encoding conversion.
>> Doesn't strftime_win32 do the conversion?
>
> Oh, I now see strftime is redefined as a macro in that C file. Thanks.
>
>>> Do month and
>>> day-of-week names not work either, or do they work and the encoding
>>> conversion for numeric/money, e.g. Euro, is not necessary?
>> db_strdup does the conversion.
>
> Should we pull the encoding conversion into a separate function and have
> strftime_win32() and db_strdup() both call it?

We may be able to pull the conversion WideChars => UTF8 =>
a PG encoding into a function.

BTW both PGLC_localeconv() and cache_locale_time() save the current
LC_CTYPE first and restore it just before returning.
I'm not sure that is OK when an error occurs in the middle of the function.
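
To make the concern concrete, here is a minimal stand-alone sketch, not backend code. It assumes the error is delivered by a non-local jump, which is modelled here with plain setjmp()/longjmp(); in the backend the equivalent would presumably be a PG_TRY/PG_CATCH block. The names error_jump, failing_work(), copy_string() and run_with_ctype() are all hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <setjmp.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the point an error handler jumps back to. */
static jmp_buf error_jump;

/* Hypothetical worker that fails part-way through, like a raised error. */
static void
failing_work(void)
{
    longjmp(error_jump, 1);
}

/* Copy a string with malloc(); returns NULL on allocation failure. */
static char *
copy_string(const char *src)
{
    size_t      len = strlen(src) + 1;
    char       *dst = malloc(len);

    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}

/*
 * Run work() with LC_CTYPE temporarily set to ctype.
 * Returns 0 on success, -1 if work() raised an error, -2 on setup failure.
 * LC_CTYPE is restored on both the normal and the error path.
 */
static int
run_with_ctype(const char *ctype, void (*work) (void))
{
    const char *current;
    char       *saved;
    volatile int result = 0;

    assert(ctype != NULL && work != NULL);

    current = setlocale(LC_CTYPE, NULL);
    if (current == NULL)
        return -2;
    /* setlocale() may reuse its buffer, so keep a private copy */
    saved = copy_string(current);
    if (saved == NULL)
        return -2;
    if (setlocale(LC_CTYPE, ctype) == NULL)
    {
        free(saved);
        return -2;
    }

    if (setjmp(error_jump) == 0)
        work();
    else
        result = -1;            /* arrived here via longjmp() */

    setlocale(LC_CTYPE, saved);
    free(saved);
    return result;
}
```

Without the setjmp() branch, an error raised inside work() would skip the restore and leave the process running with the temporary LC_CTYPE.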

regards,
Hiroshi Inoue


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Mikko <mhannesy(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-03-02 18:14:00
Message-ID: 201003021814.o22IE1s26092@momjian.us
Lists: pgsql-general pgsql-hackers

Hiroshi Inoue wrote:
> >>>> I need someone with WIN32 experience to review and test this patch.
> >>> I don't understand why cache_locale_time() works on Windows. It sets
> >>> the LC_CTYPE but does not do any encoding conversion.
> >> Doesn't strftime_win32 do the conversion?
> >
> > Oh, I now see strftime is redefined as a macro in that C file. Thanks.
> >
> >>> Do month and
> >>> day-of-week names not work either, or do they work and the encoding
> >>> conversion for numeric/money, e.g. Euro, is not necessary?
> >> db_strdup does the conversion.
> >
> > Should we pull the encoding conversion into a separate function and have
> > strftime_win32() and db_strdup() both call it?
>
> We may be able to pull the conversion WideChars => UTF8 =>
> a PG encoding into a function.

OK, I have created a new function, win32_wchar_to_db_encoding(), to
share the conversion from wide characters to the database encoding.
New patch attached.
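
As a rough illustration of the first hop of such a conversion (wide characters to UTF-8), here is a self-contained sketch. It is not the body of win32_wchar_to_db_encoding(); on Windows the conversion would more likely go through the platform API. utf16_to_utf8() is a hypothetical name, and the input is assumed to be UTF-16 code units as used by Windows wide strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: encode in_len UTF-16 code units as UTF-8.
 * Returns the number of bytes written, or (size_t) -1 on malformed
 * input or insufficient output space.  No terminator is written.
 */
static size_t
utf16_to_utf8(const uint16_t *in, size_t in_len,
              unsigned char *out, size_t out_cap)
{
    size_t      i = 0;
    size_t      o = 0;

    assert(in != NULL && out != NULL);

    while (i < in_len)
    {
        uint32_t    cp = in[i++];
        size_t      need;

        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            /* high surrogate: must be followed by a low surrogate */
            if (i >= in_len || in[i] < 0xDC00 || in[i] > 0xDFFF)
                return (size_t) -1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t) (in[i++] - 0xDC00);
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            return (size_t) -1; /* stray low surrogate */

        need = (cp < 0x80) ? 1 : (cp < 0x800) ? 2 : (cp < 0x10000) ? 3 : 4;
        if (out_cap - o < need)
            return (size_t) -1;

        if (need == 1)
            out[o++] = (unsigned char) cp;
        else if (need == 2)
        {
            out[o++] = (unsigned char) (0xC0 | (cp >> 6));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
        else if (need == 3)
        {
            out[o++] = (unsigned char) (0xE0 | (cp >> 12));
            out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
        else
        {
            out[o++] = (unsigned char) (0xF0 | (cp >> 18));
            out[o++] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char) (0x80 | (cp & 0x3F));
        }
    }
    return o;
}
```

The Euro sign is the relevant case for this thread: it is the single byte 0x80 in WIN1252 (the byte in the original error message) but U+20AC as a wide character, which needs three bytes in UTF-8.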

> BTW both PGLC_localeconv() and cache_locale_time() save the current
> LC_CTYPE first and restore it just before returning.
> I'm not sure that is OK when an error occurs in the middle of the function.

Yea, I added a comment questioning if that is a problem.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do

Attachment Content-Type Size
/pgpatches/pg_locale text/x-diff 11.9 KB

From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-03-12 06:52:07
Message-ID: 20100312155207.968B.52131E4D@oss.ntt.co.jp
Lists: pgsql-general pgsql-hackers


Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> OK, I have created a new function, win32_wchar_to_db_encoding(), to
> share the conversion from wide characters to the database encoding.
> New patch attached.

Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
db_encoding_strdup() with the function. Like this:

static char *
db_encoding_strdup(const char *str)
{
    char       *pstr;
    char       *mstr;

    /* convert the string to the database encoding */
    pstr = (char *) pg_do_encoding_conversion(
                        (unsigned char *) str, strlen(str),
                        GetPlatformEncoding(), GetDatabaseEncoding());
    mstr = strdup(pstr);
    if (pstr != str)
        pfree(pstr);

    return mstr;
}

I believe the code is harmless on all platforms and we can use it
instead of strdup() without any #ifdef WIN32 blocks.

BTW, I found that we should add "ANSI_X3.4-1968" as an alias for
PG_SQL_ASCII. My Fedora 12 returns that name when --no-locale is used.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-03-12 17:13:48
Message-ID: 201003121713.o2CHDm828424@momjian.us
Lists: pgsql-general pgsql-hackers

Takahiro Itagaki wrote:
>
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> > OK, I have created a new function, win32_wchar_to_db_encoding(), to
> > share the conversion from wide characters to the database encoding.
> > New patch attached.
>
> Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> db_encoding_strdup() with the function. Like this:
>
> static char *
> db_encoding_strdup(const char *str)
> {
>     char       *pstr;
>     char       *mstr;
>
>     /* convert the string to the database encoding */
>     pstr = (char *) pg_do_encoding_conversion(
>                         (unsigned char *) str, strlen(str),
>                         GetPlatformEncoding(), GetDatabaseEncoding());
>     mstr = strdup(pstr);
>     if (pstr != str)
>         pfree(pstr);
>
>     return mstr;
> }
>
> I believe the code is harmless on all platforms and we can use it
> instead of strdup() without any #ifdef WIN32 blocks.

OK, I don't have any Win32 people testing this patch so if we want this
fixed for 9.0 someone is going to have to test my patch to see that it
works. Can you make the adjustments suggested above to my patch and
test it to see that it works so we can apply it for 9.0?

> BTW, I found that we should add "ANSI_X3.4-1968" as an alias for
> PG_SQL_ASCII. My Fedora 12 returns that name when --no-locale is used.

OK.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-03-18 03:34:20
Message-ID: 20100318123420.9BAC.52131E4D@oss.ntt.co.jp
Lists: pgsql-general pgsql-hackers


Bruce Momjian <bruce(at)momjian(dot)us> wrote:

> Takahiro Itagaki wrote:
> > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> > db_encoding_strdup() with the function. Like this:
>
> OK, I don't have any Win32 people testing this patch so if we want this
> fixed for 9.0 someone is going to have to test my patch to see that it
> works. Can you make the adjustments suggested above to my patch and
> test it to see that it works so we can apply it for 9.0?

Here is a full patch that can be applied cleanly to HEAD.
Can anyone test it on Windows?

I'm not sure why the temporary change of LC_CTYPE was required in the
original patch. That code is not included in my patch, so please
let me know if it is still needed.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachment Content-Type Size
pg_locale_20100318.patch application/octet-stream 3.6 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-03-22 20:14:53
Message-ID: 201003222014.o2MKErr17486@momjian.us
Lists: pgsql-general pgsql-hackers

Takahiro Itagaki wrote:
>
> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>
> > Takahiro Itagaki wrote:
> > > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
> > > db_encoding_strdup() with the function. Like this:
> >
> > OK, I don't have any Win32 people testing this patch so if we want this
> > fixed for 9.0 someone is going to have to test my patch to see that it
> > works. Can you make the adjustments suggested above to my patch and
> > test it to see that it works so we can apply it for 9.0?
>
> Here is a full patch that can be applied cleanly to HEAD.
> Can anyone test it on Windows?
>
> I'm not sure why the temporary change of LC_CTYPE was required in the
> original patch. That code is not included in my patch, so please
> let me know if it is still needed.

Sorry for the delay in replying to you.

I considered your idea of using the existing Postgres encoding
conversion routines to do the conversion of localeconv() strings, but
found two problems.

First, GetPlatformEncoding() caches its result, so it assumes the
LC_CTYPE never changes for the server, while fixing this issue actually
requires us to change LC_CTYPE. We could avoid the caching but that
then involves complex table lookups, etc, which seems overly complex:

+       /* convert the string to the database encoding */
+       pstr = (char *) pg_do_encoding_conversion(
+                           (unsigned char *) str, strlen(str),
+                           GetPlatformEncoding(), GetDatabaseEncoding());

Second, having our backend routines do the conversion seems wrong
because it is possible for someone to set LC_MONETARY to an encoding
that our database does not understand, e.g. UTF16, but one that WIN32
can convert to a valid encoding.

The reason we are doing all this is because of this updated comment in
my patch:

ftp://momjian.us/pub/postgresql/mypatches/pg_locale

+ * Ideally, monetary and numeric local symbols could be returned in
+ * any server encoding. Unfortunately, the WIN32 API does not allow
+ * setlocale() to return values in a codepage/CTYPE that uses more
+ * than two bytes per character, like UTF-8:
+ *
+ * http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
+ *
+ * Evidently, LC_CTYPE allows us to control the encoding used
+ * for strings returned by localeconv(). The Open Group
+ * standard, mentioned at the top of this C file, doesn't
+ * explicitly state this.
+ *
+ * Therefore, we set LC_CTYPE to match LC_NUMERIC and
+ * LC_MONETARY, call localeconv(), and use mbstowcs() to
+ * convert the locale-aware string, e.g. Euro symbol (which
+ * is not in UTF-8), to the server encoding.

One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
on Win32 and then just convert that always to the server encoding with
win32_wchar_to_db_encoding(), instead of using the encoding from
LC_MONETARY to set LC_CTYPE and having to do double-conversion.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

PG East: http://www.enterprisedb.com/community/nav-pg-east-2010.do


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-16 10:52:27
Message-ID: g2o9837222c1004160352n319ac670p647bef30d121e50c@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Mon, Mar 22, 2010 at 9:14 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> Takahiro Itagaki wrote:
>>
>> Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>>
>> > Takahiro Itagaki wrote:
>> > > Since 9.0 has GetPlatformEncoding() for the purpose, we could simplify
>> > > db_encoding_strdup() with the function. Like this:
>> >
>> > OK, I don't have any Win32 people testing this patch so if we want this
>> > fixed for 9.0 someone is going to have to test my patch to see that it
>> > works.  Can you make the adjustments suggested above to my patch and
>> > test it to see that it works so we can apply it for 9.0?
>>
>> Here is a full patch that can be applied cleanly to HEAD.
>> Can anyone test it on Windows?
>>
>> I'm not sure why the temporary change of LC_CTYPE was required in the
>> original patch. That code is not included in my patch, so please
>> let me know if it is still needed.
>
> Sorry for the delay in replying to you.
>
> I considered your idea of using the existing Postgres encoding
> conversion routines to do the conversion of localeconv() strings, but
> found two problems.
>
> First, GetPlatformEncoding() caches its result, so it assumes the
> LC_CTYPE never changes for the server, while fixing this issue actually
> requires us to change LC_CTYPE.  We could avoid the caching but that
> then involves complex table lookups, etc, which seems overly complex:
>
> +       /* convert the string to the database encoding */
> +       pstr = (char *) pg_do_encoding_conversion(
> +                                               (unsigned char *) str, strlen(str),
> +                                               GetPlatformEncoding(), GetDatabaseEncoding());
>
> Second, having our backend routines do the conversion seems wrong
> because it is possible for someone to set LC_MONETARY to an encoding
> that our database does not understand, e.g. UTF16, but one that WIN32
> can convert to a valid encoding.
>
> The reason we are doing all this is because of this updated comment in
> my patch:
>
>        ftp://momjian.us/pub/postgresql/mypatches/pg_locale
>
> +    *  Ideally, monetary and numeric local symbols could be returned in
> +    *  any server encoding.  Unfortunately, the WIN32 API does not allow
> +    *  setlocale() to return values in a codepage/CTYPE that uses more
> +    *  than two bytes per character, like UTF-8:
> +    *
> +    *      http://msdn.microsoft.com/en-us/library/x99tb11d.aspx
> +    *
> +    *  Evidently, LC_CTYPE allows us to control the encoding used
> +    *  for strings returned by localeconv().  The Open Group
> +    *  standard, mentioned at the top of this C file, doesn't
> +    *  explicitly state this.
> +    *
> +    *  Therefore, we set LC_CTYPE to match LC_NUMERIC and
> +    *  LC_MONETARY, call localeconv(), and use mbstowcs() to
> +    *  convert the locale-aware string, e.g. Euro symbol (which
> +    *  is not in UTF-8), to the server encoding.
>
> One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
> on Win32 and then just convert that always to the server encoding with
> win32_wchar_to_db_encoding(), instead of using the encoding from
> LC_MONETARY to set LC_CTYPE and having to do double-conversion.

So, hugely late, reviving this thread.

Ideally, we should definitely consider doing that. Internally, Windows
will do it in UTF16 anyway. So we're basically doing
UTF16->db->UTF16->UTF8->db or something like that with this patch.

But I'm unsure how that would work. We're talking about the output of
localeconv(), right? I don't see a version of localeconv() that does
wide chars anywhere. (You can't just set LC_CTYPE and use the regular
function - Windows has a separate set of functions for dealing with
UTF16).

Looking at the patch, you're passing "item" to db_encoding_strdup()
but it doesn't seem to be used anywhere. Leftover from previous
experiments, or forgot to use it? Perhaps you intended for it to be in
the error messages?

Also, won't this need special-casing for UTF8? Per comment in
mbutils.c, wcstombs() doesn't work for UTF8 encodings - you need to
use MultiByteToWideChar().

I also note that we have char2wchar() already - we should perhaps just
call that? Or will that use the wrong locale?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-19 01:59:42
Message-ID: 20100419105942.A6B4.52131E4D@oss.ntt.co.jp
Lists: pgsql-general pgsql-hackers


Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> But I'm unsure how that would work. We're talking about the output of
> localeconv(), right? I don't see a version of localeconv() that does
> wide chars anywhere. (You can't just set LC_CTYPE and use the regular
> function - Windows has a separate set of functions for dealing with
> UTF16).

Yeah, msvcrt doesn't have wlocaleconv :-( . Since localeconv() returns
characters in the encoding specified in LC_CTYPE, we need to handle the
issue with code something like:

1. setlocale(LC_CTYPE, lc_monetary)
2. setlocale(LC_MONETARY, lc_monetary)
3. lc = localeconv()
4. pg_do_encoding_conversion(lc->xxx,
      FROM pg_get_encoding_from_locale(lc_monetary),
      TO GetDatabaseEncoding())
5. Revert LC_CTYPE and LC_MONETARY.
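
The five steps above can be sketched in stand-alone C as follows. This is an illustration under stated assumptions, not the attached patch: fetch_currency_symbol() and copy_string() are hypothetical names, and the convert_fn callback merely stands in for the pg_do_encoding_conversion() call in step 4, which is only available inside the backend.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback type standing in for the step 4 conversion. */
typedef char *(*convert_fn) (const char *src);

/* Copy a string with malloc(); returns NULL on allocation failure. */
static char *
copy_string(const char *src)
{
    size_t      len = strlen(src) + 1;
    char       *dst = malloc(len);

    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}

/*
 * Return the converted currency symbol for lc_monetary, or NULL on
 * failure.  The caller frees the result.
 */
static char *
fetch_currency_symbol(const char *lc_monetary, convert_fn convert)
{
    const char *current;
    char       *save_ctype = NULL;
    char       *save_monetary = NULL;
    char       *result = NULL;

    assert(lc_monetary != NULL && convert != NULL);

    /* remember current settings; setlocale() may reuse its buffer */
    current = setlocale(LC_CTYPE, NULL);
    if (current != NULL)
        save_ctype = copy_string(current);
    current = setlocale(LC_MONETARY, NULL);
    if (current != NULL)
        save_monetary = copy_string(current);

    if (save_ctype != NULL && save_monetary != NULL)
    {
        /* steps 1 and 2: make LC_CTYPE agree with LC_MONETARY */
        if (setlocale(LC_CTYPE, lc_monetary) != NULL &&
            setlocale(LC_MONETARY, lc_monetary) != NULL)
        {
            struct lconv *lc = localeconv();            /* step 3 */

            result = convert(lc->currency_symbol);      /* step 4 */
        }

        /* step 5: revert both categories, even if a step failed */
        setlocale(LC_CTYPE, save_ctype);
        setlocale(LC_MONETARY, save_monetary);
    }

    free(save_ctype);
    free(save_monetary);
    return result;
}
```

The result is copied before the locale is reverted because the struct returned by localeconv() may be overwritten by later setlocale() calls.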

Another idea is to use GetLocaleInfoW() [1], the native Win32 locale
function, instead of the libc one. It returns locale characters in wide
chars, so we can safely convert them as UTF16->UTF8->db. But it requires
an additional branch in our locale code only for Windows.

[1] http://msdn.microsoft.com/en-us/library/dd318101

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-19 12:02:32
Message-ID: y2l9837222c1004190502of3ee98f6v11e847c9ef9d6c6b@mail.gmail.com
Lists: pgsql-general pgsql-hackers

On Mon, Apr 19, 2010 at 03:59, Takahiro Itagaki
<itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>
> Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
>> But I'm unsure how that would work. We're talking about the output of
>> localeconv(), right? I don't see a version of localeconv() that does
>> wide chars anywhere. (You can't just set LC_CTYPE and use the regular
>> function - Windows has a separate set of functions for dealing with
>> UTF16).
>
> Yeah, msvcrt doesn't have wlocaleconv :-( . Since localeconv() returns
> characters in the encoding specified in LC_CTYPE, we need to handle the
> issue with code something like:
>
>    1. setlocale(LC_CTYPE, lc_monetary)
>    2. setlocale(LC_MONETARY, lc_monetary)
>    3. lc = localeconv()
>    4. pg_do_encoding_conversion(lc->xxx,
>          FROM pg_get_encoding_from_locale(lc_monetary),
>          TO GetDatabaseEncoding())
>    5. Revert LC_CTYPE and LC_MONETARY.
>
>
> Another idea is to use GetLocaleInfoW() [1], the native Win32 locale
> function, instead of the libc one. It returns locale characters in wide
> chars, so we can safely convert them as UTF16->UTF8->db. But it requires
> an additional branch in our locale code only for Windows.

If we can go UTF16->db directly, it might be a good idea. If we're
going via UTF8 anyway, I doubt it's going to be worth it.

Let's work off what we have now to start with at least. Bruce, can you
comment on that thing about the extra parameter? And UTF8?

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/


From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-20 08:34:09
Message-ID: 20100420173406.93AF.52131E4D@oss.ntt.co.jp
Lists: pgsql-general pgsql-hackers


Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> > 1. setlocale(LC_CTYPE, lc_monetary)
> > 2. setlocale(LC_MONETARY, lc_monetary)
> > 3. lc = localeconv()
> > 4. pg_do_encoding_conversion(lc->xxx,
> >       FROM pg_get_encoding_from_locale(lc_monetary),
> >       TO GetDatabaseEncoding())
> > 5. Revert LC_CTYPE and LC_MONETARY.

Attached is a patch that implements the above straightforwardly. Does it work?
Note that #ifdef WIN32 parts in the patch are harmless on other platforms
even if they are enabled.

> Let's work off what we have now to start with at least. Bruce, can you
> comment on that thing about the extra parameter? And UTF8?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachment Content-Type Size
pg_locale_20100420.patch application/octet-stream 4.3 KB

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-20 13:10:18
Message-ID: 201004201310.o3KDAIR27248@momjian.us
Lists: pgsql-general pgsql-hackers

Magnus Hagander wrote:
> > One new idea would be to set LC_CTYPE to UTF16/widechars unconditionally
> > on Win32 and then just convert that always to the server encoding with
> > win32_wchar_to_db_encoding(), instead of using the encoding from
> > LC_MONETARY to set LC_CTYPE and having to do double-conversion.
>
> So, hugely late, reviving this thread.
>
> Ideally, we should definitely consider doing that. Internally, Windows
> will do it in UTF16 anyway. So we're basically doing
> UTF16->db->UTF16->UTF8->db or something like that with this patch.
>
> But I'm unsure how that would work. We're talking about the output of
> localeconv(), right? I don't see a version of localeconv() that does
> wide chars anywhere. (You can't just set LC_CTYPE and use the regular
> function - Windows has a separate set of functions for dealing with
> UTF16).

I thought there was an LC_CTYPE for UTF16 that we could use without a
wide version of that function. If not, forget that idea.

> Looking at the patch, you're passing "item" to db_encoding_strdup()
> but it doesn't seem to be used anywhere. Leftover from previous
> experiments, or forgot to use it? Perhaps you intended for it to be in
> the error messages?

It originally was in the error message but can be removed. I have now
removed 'item' from my version of the patch.

> Also, won't this need special-casing for UTF8? Per comment in
> mbutils.c, wcstombs() doesn't work for UTF8 encodings - you need to
> use MultiByteToWideChar().

Well, we don't support UTF8 for any of the non-encoding locales, e.g.
monetary, numeric, so I never considered that we would support it. If
we did support it, we would have to _pick_ a locale that is <= 2 bytes
per character and use that, and then convert to UTF8, but what locale
would we pick? They could use an LC_CTYPE that is <= 2 bytes and a
numeric that is UTF8, but I never suspected we would want to support
that, and we would need some logic to detect that case.

> I also note that we have char2wchar() already - we should perhaps just
> call that? Or will that use the wrong locale?

I see char2wchar() calling GetDatabaseEncoding() right away, which does
use the cached value for the server encoding, so I don't think it will
work. We can't use our existing routines to convert _from_ the current
encoding to wide characters (because our numeric encoding might not
match the server encoding). However, we can use existing code that
converts from wide to the server encoding, perhaps replacing
win32_wchar_to_db_encoding().

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-20 13:23:45
Message-ID: 201004201323.o3KDNjv28954@momjian.us
Lists: pgsql-general pgsql-hackers

Takahiro Itagaki wrote:
>
> Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
> > > 1. setlocale(LC_CTYPE, lc_monetary)
> > > 2. setlocale(LC_MONETARY, lc_monetary)
> > > 3. lc = localeconv()
> > > 4. pg_do_encoding_conversion(lc->xxx,
> > > FROM pg_get_encoding_from_locale(lc_monetary),
> > > TO GetDatabaseEncoding())
> > > 5. Revert LC_CTYPE and LC_MONETARY.
>
> A patch implementing the above straightforwardly is attached. Does this work?
> Note that #ifdef WIN32 parts in the patch are harmless on other platforms
> even if they are enabled.

I like this patch. Instead of having special code to convert from the
_current_ locale, you pass the encoding name to our routines. This does
mean we are bound by supporting only the encodings PG supports, not the
full range of Win32 encodings, but that seems fine.
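
For readers following along, the five steps quoted above can be sketched in standard C. This is only an illustration, not the patch itself: the helper names copy_string() and fetch_currency_symbol() are hypothetical, and step 4 (the conversion done by pg_do_encoding_conversion() in the backend) is reduced to a plain byte copy because that routine is not available outside PostgreSQL.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: portable stand-in for strdup(); returns NULL on failure. */
static char *copy_string(const char *s)
{
    size_t len;
    char *copy;

    if (s == NULL)
        return NULL;
    len = strlen(s) + 1;
    copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/*
 * Hypothetical helper: fetch the currency symbol of monetary_locale into out
 * and leave the process locale as it was found. Returns 0 on success, -1 on
 * failure.
 */
int fetch_currency_symbol(const char *monetary_locale, char *out, size_t outlen)
{
    char *save_ctype;
    char *save_monetary;
    int rc = -1;

    if (outlen == 0)
        return -1;

    /* setlocale(cat, NULL) may return storage that a later call overwrites. */
    save_ctype = copy_string(setlocale(LC_CTYPE, NULL));
    save_monetary = copy_string(setlocale(LC_MONETARY, NULL));

    if (save_ctype != NULL && save_monetary != NULL)
    {
        /* Steps 1 and 2: make LC_CTYPE agree with LC_MONETARY. */
        if (setlocale(LC_CTYPE, monetary_locale) != NULL &&
            setlocale(LC_MONETARY, monetary_locale) != NULL)
        {
            /* Step 3: read the locale's conventions. */
            struct lconv *lc = localeconv();

            /* Step 4 would convert to the database encoding; here we only copy. */
            strncpy(out, lc->currency_symbol, outlen - 1);
            out[outlen - 1] = '\0';
            rc = 0;
        }

        /* Step 5: revert both categories, whether or not the switch worked. */
        setlocale(LC_CTYPE, save_ctype);
        setlocale(LC_MONETARY, save_monetary);
    }

    free(save_ctype);
    free(save_monetary);
    return rc;
}
```

Saving copies of the old locale names before switching matters because the pointer returned by setlocale() is only guaranteed valid until the next setlocale() call.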

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-20 13:38:24
Message-ID: 201004201338.o3KDcOU00788@momjian.us
Lists: pgsql-general pgsql-hackers

Magnus Hagander wrote:
> > Another idea is to use GetLocaleInfoW() [1], that is win32 native locale
> > functions, instead of the libc one. It returns locale characters in wide
> > chars, so we can safely convert them as UTF16->UTF8->db. But it requires
> > an additional branch in our locale codes only for Windows.
>
> If we can go UTF16->db directly, it might be a good idea. If we're
> going via UTF8 anyway, I doubt it's going to be worth it.
>
> Let's work off what we have now to start with at least. Bruce, can you
> comment on that thing about the extra parameter? And UTF8?

I do like the idea of using UTF16 directly because that would eliminate
our need to even set LC_CTYPE for Win32 in this routine. That would
also eliminate any need to refer to the encoding for numeric/monetary,
so we could get rid of the odd case where their encoding is UTF8 but
their numeric/monetary locale settings have to use a non-UTF8 encoding.
For example, the original bug report has these locale settings:

http://archives.postgresql.org/pgsql-general/2009-04/msg00829.php

psql (PostgreSQL) 8.3.7

server_version 8.3.7
server_encoding UTF8
client_encoding win1252
lc_numeric Finnish, Finland
lc_monetary Finnish, Finland

but really needed to use "Finnish_Finland.1252":

http://archives.postgresql.org/pgsql-general/2009-04/msg00859.php

However, I noticed that both lc_collate and lc_ctype are set to
Finnish_Finland.1252 by the installer. Should I have just run initdb
with --locale fi_FI.UTF8 at the very start? The to_char('L') works
fine with a database with win1252 encoding.

Of course, that still does not work with our current CVS code if the
database encoding is UTF8, which is what we are trying to fix now.
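
As background on the error in the original report: in WIN1252 the euro sign is the single byte 0x80, and that byte can never start a UTF-8 sequence, which would explain the "invalid byte sequence" message if the currency symbol was the euro sign (an assumption, though a likely one for a Finnish locale). A minimal sketch of the lead-byte rule, with a hypothetical function name:

```c
/*
 * Hypothetical helper: return the length of the UTF-8 sequence that starts
 * with this byte, or 0 if the byte cannot start a sequence at all.
 */
int utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* plain ASCII */
    if (lead >= 0xC2 && lead <= 0xDF)
        return 2;
    if (lead >= 0xE0 && lead <= 0xEF)
        return 3;
    if (lead >= 0xF0 && lead <= 0xF4)
        return 4;
    return 0;                   /* 0x80..0xC1 and 0xF5..0xFF are never lead bytes */
}
```

The real backend check is more thorough (it also validates the continuation bytes); this only shows why a bare 0x80 is rejected.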

I am not even sure how users set these things properly but I assume the
installer does all that magic. And, of course, if someone manually runs
initdb on Windows, they can easily set things wrong.

Magnus, if I remember correctly, all our non-UTF8 to UTF8 conversion
already has to pass through UTF16 as an intermediary case, so going to
UTF16 directly seems fine.
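
To make the UTF16->UTF8 leg concrete, here is a small, self-contained sketch of that conversion in portable C. It is not the backend's code (on Windows the server has its own routines for this); the function name utf16_to_utf8() is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: convert a zero-terminated UTF-16 string to UTF-8.
 * Returns the number of bytes written (excluding the terminator), or -1 on
 * malformed input or if dst is too small.
 */
int utf16_to_utf8(const uint16_t *src, char *dst, size_t dstlen)
{
    size_t n = 0;

    if (dstlen == 0)
        return -1;

    while (*src != 0)
    {
        uint32_t cp = *src++;
        size_t need;

        if (cp >= 0xD800 && cp <= 0xDBFF)
        {
            /* A high surrogate must be followed by a low surrogate. */
            uint32_t lo = *src;

            if (lo < 0xDC00 || lo > 0xDFFF)
                return -1;
            src++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        }
        else if (cp >= 0xDC00 && cp <= 0xDFFF)
            return -1;          /* stray low surrogate */

        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (n + need + 1 > dstlen)
            return -1;          /* keep room for the terminator */

        if (need == 1)
            dst[n++] = (char) cp;
        else if (need == 2)
        {
            dst[n++] = (char) (0xC0 | (cp >> 6));
            dst[n++] = (char) (0x80 | (cp & 0x3F));
        }
        else if (need == 3)
        {
            dst[n++] = (char) (0xE0 | (cp >> 12));
            dst[n++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char) (0x80 | (cp & 0x3F));
        }
        else
        {
            dst[n++] = (char) (0xF0 | (cp >> 18));
            dst[n++] = (char) (0x80 | ((cp >> 12) & 0x3F));
            dst[n++] = (char) (0x80 | ((cp >> 6) & 0x3F));
            dst[n++] = (char) (0x80 | (cp & 0x3F));
        }
    }
    dst[n] = '\0';
    return (int) n;
}
```

Fed the single code unit 0x20AC (the euro sign), this produces the three bytes E2 82 AC, which is what a UTF8 database expects in place of the WIN1252 byte 0x80.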

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com


From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-20 14:28:23
Message-ID: 4BCDBA07.3040109@tpf.co.jp
Lists: pgsql-general pgsql-hackers

Takahiro Itagaki wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>
>>> 1. setlocale(LC_CTYPE, lc_monetary)
>>> 2. setlocale(LC_MONETARY, lc_monetary)
>>> 3. lc = localeconv()
>>> 4. pg_do_encoding_conversion(lc->xxx,
>>> FROM pg_get_encoding_from_locale(lc_monetary),
>>> TO GetDatabaseEncoding())
>>> 5. Revert LC_CTYPE and LC_MONETARY.
>
> A patch implementing the above straightforwardly is attached. Does this work?

I have 2 questions about this patch.

1. How does it work when LC_MONETARY and LC_NUMERIC are different?
2. Is calling db_encoding_strdup() for lconv->grouping appropriate?

regards,
Hiroshi Inoue

> Note that #ifdef WIN32 parts in the patch are harmless on other platforms
> even if they are enabled.
>
>> Let's work off what we have now to start with at least. Bruce, can you
>> comment on that thing about the extra parameter? And UTF8?
>
> Regards,
> ---
> Takahiro Itagaki
> NTT Open Source Software Center


From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-21 01:50:26
Message-ID: 20100421105026.9130.52131E4D@oss.ntt.co.jp
Lists: pgsql-general pgsql-hackers


Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp> wrote:

> 1. How does it work when LC_MONETARY and LC_NUMERIC are different?

I think it is rarely used, but possible. Fixed.

> 2. Is calling db_encoding_strdup() for lconv->grouping appropriate?

Ah, we didn't need it. Removed.
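
The reason no conversion is needed: in standard C, lconv->grouping is not text but a sequence of small integers, each giving the size of one digit group counted from the right. A 0 entry means "repeat the previous size" and CHAR_MAX means "no further grouping". A minimal sketch of how such a value is interpreted, using a hypothetical helper group_digits():

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: insert sep into digits according to a grouping
 * specification in the style of lconv->grouping. Returns 0 on success,
 * -1 if out is too small.
 */
int group_digits(const char *digits, const char *grouping, char sep,
                 char *out, size_t outlen)
{
    size_t len = strlen(digits);
    size_t o = 0;               /* characters written so far (in reverse) */
    const char *g = grouping;
    int group = (*g > 0 && *g != CHAR_MAX) ? *g : 0;   /* 0 = stop grouping */
    int in_group = 0;
    size_t i, a, b;

    if (outlen == 0)
        return -1;

    /* Walk the digits from right to left, building the result reversed. */
    for (i = len; i > 0; i--)
    {
        if (group > 0 && in_group == group)
        {
            if (o + 1 >= outlen)
                return -1;
            out[o++] = sep;
            in_group = 0;

            /* Move to the next group size; a 0 entry repeats the last one. */
            if (g[1] == CHAR_MAX)
                group = 0;
            else if (g[1] > 0)
            {
                g++;
                group = *g;
            }
        }
        if (o + 1 >= outlen)
            return -1;
        out[o++] = digits[i - 1];
        in_group++;
    }
    out[o] = '\0';

    /* Reverse in place to restore left-to-right order. */
    for (a = 0, b = o; a + 1 < b; a++, b--)
    {
        char t = out[a];

        out[a] = out[b - 1];
        out[b - 1] = t;
    }
    return 0;
}
```

With grouping "\3" the digits 1234567 come out as 1,234,567; with "\3\2" they come out as 12,34,567. Running an encoding conversion over those bytes would be meaningless at best.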

Revised patch attached. Please test it.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center

Attachment Content-Type Size
pg_locale_20100421.patch application/octet-stream 4.7 KB

From: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, Magnus Hagander <magnus(at)hagander(dot)net>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-22 02:00:50
Message-ID: 20100422110050.92BA.52131E4D@oss.ntt.co.jp
Lists: pgsql-general pgsql-hackers


Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:

> Revised patch attached. Please test it.

I applied this version of the patch.
Please check whether the bug is fixed and watch for any buildfarm failures.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-24 23:03:28
Message-ID: 201004242303.o3ON3Sk05147@momjian.us
Lists: pgsql-general pgsql-hackers

Takahiro Itagaki wrote:
>
> Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>
> > Revised patch attached. Please test it.
>
> I applied this version of the patch.
> Please check whether the bug is fixed and watch for any buildfarm failures.

Great. I have merged my C comments into the code with the attached
patch so we remember why the code is set up as it is.

One thing I am confused about is that, for Win32, our numeric/monetary
handling sets lc_ctype to match numeric/monetary, while our time code in
the same file uses that method _and_ uses wcsftime() to return the value
in wide characters. So, why do we do both for time? Is there any value
to that?

Seems we should do the same for both numeric/monetary and time.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

Attachment Content-Type Size
/rtmp/diff text/x-diff 7.6 KB

From: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-26 00:19:49
Message-ID: 4BD4DC25.9090003@tpf.co.jp
Lists: pgsql-general pgsql-hackers

Bruce Momjian wrote:
> Takahiro Itagaki wrote:
>> Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>>
>>> Revised patch attached. Please test it.
>> I applied this version of the patch.
>> Please check whether the bug is fixed and watch for any buildfarm failures.
>
> Great. I have merged my C comments into the code with the attached
> patch so we remember why the code is set up as it is.
>
> One thing I am confused about is that, for Win32, our numeric/monetary
> handling sets lc_ctype to match numeric/monetary, while our time code in
> the same file uses that method _and_ uses wcsftime() to return the value
> in wide characters. So, why do we do both for time? Is there any value
> to that?

Unfortunately, wcsftime() is only a halfway convenience function: it uses
the ANSI versions of the underlying functions internally.
AFAICS the only way to remove the dependency on LC_CTYPE is to call
GetLocaleInfoW() directly.
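
For reference, this is roughly what a wcsftime() call looks like in standard C. The sketch uses a hypothetical wrapper, format_date_wide(), and a purely numeric format so that the result does not depend on the locale. The point made above is that, on Windows, locale-dependent conversions such as month names still pass through the ANSI code page selected by LC_CTYPE even though the output buffer is wide.

```c
#include <stddef.h>
#include <time.h>
#include <wchar.h>

/*
 * Hypothetical wrapper: format a calendar date as wide characters.
 * Returns the number of wide characters written, or 0 if out is too small.
 */
size_t format_date_wide(int year, int month, int day,
                        wchar_t *out, size_t outlen)
{
    struct tm tm = {0};

    tm.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    tm.tm_mon = month - 1;      /* and months from 0 */
    tm.tm_mday = day;

    return wcsftime(out, outlen, L"%Y-%m-%d", &tm);
}
```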

regards,
Hiroshi Inoue


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Hiroshi Inoue <inoue(at)tpf(dot)co(dot)jp>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] trouble with to_char('L')
Date: 2010-04-26 13:32:11
Message-ID: 201004261332.o3QDWBM19719@momjian.us
Lists: pgsql-general pgsql-hackers

Hiroshi Inoue wrote:
> Bruce Momjian wrote:
> > Takahiro Itagaki wrote:
> >> Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
> >>
> >>> Revised patch attached. Please test it.
> >> I applied this version of the patch.
> >> Please check whether the bug is fixed and watch for any buildfarm failures.
> >
> > Great. I have merged my C comments into the code with the attached
> > patch so we remember why the code is set up as it is.
> >
> > One thing I am confused about is that, for Win32, our numeric/monetary
> > handling sets lc_ctype to match numeric/monetary, while our time code in
> > the same file uses that method _and_ uses wcsftime() to return the value
> > in wide characters. So, why do we do both for time? Is there any value
> > to that?
>
> Unfortunately, wcsftime() is only a halfway convenience function: it uses
> the ANSI versions of the underlying functions internally.
> AFAICS the only way to remove the dependency on LC_CTYPE is to call
> GetLocaleInfoW() directly.

Thanks. I have documented this fact in a C comment; patch attached.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

Attachment Content-Type Size
/rtmp/diff text/x-diff 1.2 KB