Re: Clear up strxfrm() in UTF-8 with locale on Windows

Lists: pgsql-patches
From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: pgsql-patches(at)postgresql(dot)org
Subject: Clear up strxfrm() in UTF-8 with locale on Windows
Date: 2007-04-09 09:22:12
Message-ID: 20070409174312.8794.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

The attached patch clears up the usage of strxfrm() on Windows. If the
server encoding is UTF-8 and the locale is not C, we should use wcsxfrm()
instead of strxfrm() because UTF-8 locale are not supported on Windows.
We've already have a special version of strcoll() for Windows, but the
usage of strxfrm() was still broken.

When we are caught up in the bug, we see the next error message.
| ERROR: invalid memory alloc request size 2147483648
If the server is wrong configured between the server encoding and the
locale, strxfrm() could be failed and return values like INT_MAX or
(size_t)-1. We've passed the result+1 straight to palloc(), so the server
tried to allocale more than 1GB of memory and gave up.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center

Attachment Content-Type Size
fix_strxfrm.patch application/octet-stream 2.5 KB

From: "Hiroshi Saito" <z-saito(at)guitar(dot)ocn(dot)ne(dot)jp>
To: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: <pgsql-patches(at)postgresql(dot)org>
Subject: Re: Clear up strxfrm() in UTF-8 with locale on Windows
Date: 2007-04-09 13:21:30
Message-ID: 04d101c77aa9$fc7164e0$c701a8c0@wiseknot.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

From: "ITAGAKI Takahiro" <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>

> The attached patch clears up the usage of strxfrm() on Windows. If the
> server encoding is UTF-8 and the locale is not C, we should use wcsxfrm()
> instead of strxfrm() because UTF-8 locale are not supported on Windows.
> We've already have a special version of strcoll() for Windows, but the
> usage of strxfrm() was still broken.
>
> When we are caught up in the bug, we see the next error message.
> | ERROR: invalid memory alloc request size 2147483648
> If the server is wrong configured between the server encoding and the
> locale, strxfrm() could be failed and return values like INT_MAX or
> (size_t)-1. We've passed the result+1 straight to palloc(), so the server
> tried to allocale more than 1GB of memory and gave up.

Ahh..., Certainly, the bug lurked there. probably, your patch will help it.
It was not pursued in Japan for the reasons that the locale had been
recommended to be used by C up to now. however, It seems to have
caused the user's confusion. In that sense, I vote.+1

But, I am skeptic the locale setting still functions correctly.

However, I think it is great work.:-)

Thanks.

Regards,
Hiroshi Saito


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org
Subject: Re: Clear up strxfrm() in UTF-8 with locale on Windows
Date: 2007-04-10 01:15:12
Message-ID: 200704100115.l3A1FCP17357@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


Your patch has been added to the PostgreSQL unapplied patches list at:

http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews
and approves it.

---------------------------------------------------------------------------

ITAGAKI Takahiro wrote:
> The attached patch clears up the usage of strxfrm() on Windows. If the
> server encoding is UTF-8 and the locale is not C, we should use wcsxfrm()
> instead of strxfrm() because UTF-8 locale are not supported on Windows.
> We've already have a special version of strcoll() for Windows, but the
> usage of strxfrm() was still broken.
>
> When we are caught up in the bug, we see the next error message.
> | ERROR: invalid memory alloc request size 2147483648
> If the server is wrong configured between the server encoding and the
> locale, strxfrm() could be failed and return values like INT_MAX or
> (size_t)-1. We've passed the result+1 straight to palloc(), so the server
> tried to allocale more than 1GB of memory and gave up.
>
> Regards,
> ---
> ITAGAKI Takahiro
> NTT Open Source Software Center

[ Attachment, skipping... ]

>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
Cc: pgsql-patches(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Clear up strxfrm() in UTF-8 with locale on Windows
Date: 2007-05-02 20:42:27
Message-ID: 4638F7B3.7040602@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

ITAGAKI Takahiro wrote:
> The attached patch clears up the usage of strxfrm() on Windows. If the
> server encoding is UTF-8 and the locale is not C, we should use wcsxfrm()
> instead of strxfrm() because UTF-8 locale are not supported on Windows.
> We've already have a special version of strcoll() for Windows, but the
> usage of strxfrm() was still broken.
>
> When we are caught up in the bug, we see the next error message.
> | ERROR: invalid memory alloc request size 2147483648
> If the server is wrong configured between the server encoding and the
> locale, strxfrm() could be failed and return values like INT_MAX or
> (size_t)-1. We've passed the result+1 straight to palloc(), so the server
> tried to allocale more than 1GB of memory and gave up.

I was just about to commit this with the following two changes:
* wcsxfrm() sets errno, so you can't use GetLastError() to report problems
* The code added a check for return value >= INT_MAX on Unix as well,
but the spec for strxfrm() says that there is no specific return value
for failure.

Put those in there for reference. But I also recalled a previous
discussion, and found this:
http://archives.postgresql.org/pgsql-hackers/2005-08/msg00760.php

Given this, perhaps the proper approach should instead be to just check
the return value, and go from there? Should be a simple enough patch,
something like the attached.

Tom, can you comment?

Takahiro, can you test if this patch fixes your problem?

//Magnus

Attachment Content-Type Size
xf.patch text/plain 899 bytes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Clear up strxfrm() in UTF-8 with locale on Windows
Date: 2007-05-02 21:25:39
Message-ID: 21456.1178141139@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Magnus Hagander <magnus(at)hagander(dot)net> writes:
> Given this, perhaps the proper approach should instead be to just check
> the return value, and go from there? Should be a simple enough patch,
> something like the attached.

> Tom, can you comment?

Testing against INT_MAX seems like a type pun, or something. Maybe use
MaxAllocSize instead?

if (xfrmlen >= MaxAllocSize)
return val;

Also, since as you note returning (size_t) -1 is not at all standard,
it would be helpful to readers to note that that's what Windows does
on failure and that's what you're testing for. In fact you could
make a good case that the test should be just

if (xfrmlen == (size_t) -1)
return val;

regards, tom lane


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Clear up strxfrm() in UTF-8 with locale on Windows
Date: 2007-05-03 07:31:31
Message-ID: 20070503073131.GA17603@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

On Wed, May 02, 2007 at 05:25:39PM -0400, Tom Lane wrote:
> Magnus Hagander <magnus(at)hagander(dot)net> writes:
> > Given this, perhaps the proper approach should instead be to just check
> > the return value, and go from there? Should be a simple enough patch,
> > something like the attached.
>
> > Tom, can you comment?
>
> Testing against INT_MAX seems like a type pun, or something. Maybe use
> MaxAllocSize instead?

The windows API documentation specifically says:
On an error, each function sets errno and returns INT_MAX.

So actually an equality test against INT_MAX would be correct. But making
that clear in the comment would probably not be a bad idea :-)

//Magnus


From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Clear up strxfrm() in UTF-8 with locale on Windows
Date: 2007-05-05 17:07:38
Message-ID: 463CB9DA.8010005@hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

Magnus Hagander wrote:
> On Wed, May 02, 2007 at 05:25:39PM -0400, Tom Lane wrote:
>> Magnus Hagander <magnus(at)hagander(dot)net> writes:
>>> Given this, perhaps the proper approach should instead be to just check
>>> the return value, and go from there? Should be a simple enough patch,
>>> something like the attached.
>>> Tom, can you comment?
>> Testing against INT_MAX seems like a type pun, or something. Maybe use
>> MaxAllocSize instead?
>
> The windows API documentation specifically says:
> On an error, each function sets errno and returns INT_MAX.
>
> So actually an equality test against INT_MAX would be correct. But making
> that clear in the comment would probably not be a bad idea :-)

I have applied a fix for this, because it obviously needed fixing
regardless of if it fixes the original issue all the way. Still looking
for confirmation if it does, though.

//Magnus


From: ITAGAKI Takahiro <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-patches(at)postgresql(dot)org
Subject: Re: Clear up strxfrm() in UTF-8 with locale on Windows
Date: 2007-05-07 01:54:45
Message-ID: 20070507103222.884A.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches


Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> > So actually an equality test against INT_MAX would be correct. But making
> > that clear in the comment would probably not be a bad idea :-)
>
> I have applied a fix for this, because it obviously needed fixing
> regardless of if it fixes the original issue all the way. Still looking
> for confirmation if it does, though.

The fix works well. I tested it with UTF-8 encoding and Japanese_Japan.932
(SJIS) locale and I didn't see any errors. I think it it is enough to cover
the buggy UTF-8 treatment on Windows.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center


From: "Magnus Hagander" <magnus(at)hagander(dot)net>
To: itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-patches(at)postgresql(dot)org
Subject: Re: Clear up strxfrm() in UTF-8 with locale on Windows
Date: 2007-05-07 05:41:45
Message-ID: 20070507054405.89B7DDCC860@svr2.hagander.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-patches

> > > So actually an equality test against INT_MAX would be correct. But making
> > > that clear in the comment would probably not be a bad idea :-)
> >
> > I have applied a fix for this, because it obviously needed fixing
> > regardless of if it fixes the original issue all the way. Still looking
> > for confirmation if it does, though.
>
> The fix works well. I tested it with UTF-8 encoding and Japanese_Japan.932
> (SJIS) locale and I didn't see any errors. I think it it is enough to cover
> the buggy UTF-8 treatment on Windows.
>

Thanks for testing.

Bruce, that means the original patch is not needed, so you can take it off the patch queue. Thanks.

/Magnus