Re: BUG #6277: Money datatype conversion wrong with Russian locale

Lists: pgsql-bugs
From: "Alexander LAW" <exclusion(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6277: Money datatype conversion wrong with Russian locale
Date: 2011-10-29 12:10:17
Message-ID: 201110291210.p9TCAHBA041757@wwwmaster.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 6277
Logged by: Alexander LAW
Email address: exclusion(at)gmail(dot)com
PostgreSQL version: 9.1.1
Operating system: Windows
Description: Money datatype conversion wrong with Russian locale
Details:

When lc_monetary is set to Russian, money values having 4+ digits formatted
incorrectly.
E.g. following select:

set lc_monetary='Russian';
select '1000'::money;

returns a string that can't be displayed.
It's caused by wrong mon_thousands_sep processing in
backend/utils/adt/cash.c, cash_out function.
The code assumes that the thousands separator fits in one character. But in
Russian locale we have non-breakable space as the thousands separator (0xC2
0xA0 in UTF-8).
The cash_out function uses only the first char (0xC2) of it and thus returns
an invalid string: 1\xC2000.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Alexander LAW" <exclusion(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6277: Money datatype conversion wrong with Russian locale
Date: 2011-10-29 16:17:15
Message-ID: 16309.1319905035@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

"Alexander LAW" <exclusion(at)gmail(dot)com> writes:
> It's caused by wrong mon_thousands_sep processing in
> backend/utils/adt/cash.c, cash_out function.
> The code assumes that the thousands separator fits in one character. But in
> Russian locale we have non-breakable space as the thousands separator (0xC2
> 0xA0 in UTF-8).

Hmm ... looks like cash_out really needs a significant rewrite to make
that work nicely. It's combining counting of digit positions with
counting of output bytes, which was messy enough already, but gets
worse fast if the thousands separator isn't a single byte.

Does anyone know of locales where the decimal point isn't a single byte?
I'm wondering if that assumption needs to be got rid of too.

regards, tom lane


From: Alexander Lakhin <exclusion(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6277: Money datatype conversion wrong with Russian locale
Date: 2011-10-30 03:46:55
Message-ID: 4EACC8AF.2000708@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

I think there is no need to leave such assumptions. I would propose the
following fix: http://pastebin.com/EBw5YB65 (it corrects a BUG #6144 too)
I can send it as a patch if you wish. Please notice a comments regarding
regression tests. IMHO at least currency symbol separator should be
processed as specified in lconv. And maybe mon_decimal_point,
currency_symbol and negative_sign should be allowed to be empty too if
it's defined by a locale.

Best regards,
Alexander

29.10.2011 20:17, Tom Lane writes:
> "Alexander LAW"<exclusion(at)gmail(dot)com> writes:
>> It's caused by wrong mon_thousands_sep processing in
>> backend/utils/adt/cash.c, cash_out function.
>> The code assumes that the thousands separator fits in one character. But in
>> Russian locale we have non-breakable space as the thousands separator (0xC2
>> 0xA0 in UTF-8).
> Hmm ... looks like cash_out really needs a significant rewrite to make
> that work nicely. It's combining counting of digit positions with
> counting of output bytes, which was messy enough already, but gets
> worse fast if the thousands separator isn't a single byte.
>
> Does anyone know of locales where the decimal point isn't a single byte?
> I'm wondering if that assumption needs to be got rid of too.
>
> regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6277: Money datatype conversion wrong with Russian locale
Date: 2011-10-30 19:09:34
Message-ID: 25672.1320001774@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
> I think there is no need to leave such assumptions. I would propose the
> following fix: http://pastebin.com/EBw5YB65 (it corrects a BUG #6144 too)
> I can send it as a patch if you wish. Please notice a comments regarding
> regression tests. IMHO at least currency symbol separator should be
> processed as specified in lconv. And maybe mon_decimal_point,
> currency_symbol and negative_sign should be allowed to be empty too if
> it's defined by a locale.

I've committed a patch that improves the handling of cs_precedes,
sign_posn et al. I don't agree with your proposal to slavishly follow
the locale definition no matter how brain-dead it is, because that would
destroy the ability to dump and reload data without data loss --- for
example, if we obeyed an empty negative_sign setting, we'd lose the
information that a value had been negative. AFAICT there are no such
locales anyway, other than POSIX for which we definitely don't want to
believe the empty-string setting.

regards, tom lane


From: Alexander LAW <exclusion(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6277: Money datatype conversion wrong with Russian locale
Date: 2011-10-31 10:45:44
Message-ID: 4EAE7C58.1070404@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Thank you, Tom.
I tried patched file and it works fine now (with Russian locale both in
Linux and Windows).
I've noticed that you strict mon_decimal_point to one char, but at least
two locales (fa_IR and ps_AF) (I looked at glibc-2.14) define
mon_decimal_point as "<U066B>" so it takes two bytes.
And IMHO the locale sanity check better move to PGLC_localeconv and
don't perform these checks with each number conversion.

Best regards,
Alexander.

30.10.2011 23:09, Tom Lane пишет:
> Alexander Lakhin<exclusion(at)gmail(dot)com> writes:
>> I think there is no need to leave such assumptions. I would propose the
>> following fix:http://pastebin.com/EBw5YB65 (it corrects a BUG #6144 too)
>> I can send it as a patch if you wish. Please notice a comments regarding
>> regression tests. IMHO at least currency symbol separator should be
>> processed as specified in lconv. And maybe mon_decimal_point,
>> currency_symbol and negative_sign should be allowed to be empty too if
>> it's defined by a locale.
> I've committed a patch that improves the handling of cs_precedes,
> sign_posn et al. I don't agree with your proposal to slavishly follow
> the locale definition no matter how brain-dead it is, because that would
> destroy the ability to dump and reload data without data loss --- for
> example, if we obeyed an empty negative_sign setting, we'd lose the
> information that a value had been negative. AFAICT there are no such
> locales anyway, other than POSIX for which we definitely don't want to
> believe the empty-string setting.
>
> regards, tom lane