FW: Win32 unicode vs ICU

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "PostgreSQL-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: FW: Win32 unicode vs ICU
Date: 2005-08-04 21:33:12
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCE094656@algol.sollentuna.se
Lists: pgsql-hackers pgsql-patches

I just realised this mail didn't go through. Probably because it was too
large for -hackers. So: repost to -patches. Sorry about that. If it's a
duplicate, even more sorry, but I couldn't find it in the archives.

(This may explain why nobody answered me :P)

//Magnus

> -----Original Message-----
> From: Magnus Hagander
> Sent: Sunday, July 31, 2005 2:09 PM
> To: PostgreSQL-development
> Cc: pgsql-hackers-win32(at)postgresql(dot)org
> Subject: Win32 unicode vs ICU
>
> Hi!
>
> I've been working with Palle's ICU patch to make it work on
> win32, and I believe I have it done. While doing it I noticed
> that ICU basically converts to UTF16 and back; I had previously
> thought it worked directly on UTF8 strings. Based on this I also
> tried out an implementation for the win32-unicode problem that
> does *not* require ICU. It uses the native win32 functions to map
> to UTF16 and back, and then processes the text there. It took
> much less code than the ICU version, while doing the same thing.
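The patch itself is not reproduced in this archive. As a rough illustration of the UTF8-to-UTF16 step described above, here is a portable C sketch. On win32 this step would presumably be done with MultiByteToWideChar(CP_UTF8, ...), and the resulting wide strings would then be compared with something like wcscoll(). That is an assumption and is not taken from win32_utf8.patch. The function name utf8_to_utf16 is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode len bytes of UTF-8 into UTF-16 code units.
 * Returns the number of code units written to out (not NUL-terminated),
 * or -1 if the input is malformed or out (outcap units) is too small.
 */
static long
utf8_to_utf16(const unsigned char *in, size_t len,
              uint16_t *out, size_t outcap)
{
    /* smallest code point allowed for 1-, 2-, 3- and 4-byte sequences */
    static const uint32_t min_cp[] = {0, 0x80, 0x800, 0x10000};
    size_t i = 0, n = 0;

    while (i < len)
    {
        unsigned char c = in[i];
        uint32_t cp;
        size_t extra, k;

        if (c < 0x80) { cp = c; extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else
            return -1;          /* stray continuation or invalid lead byte */

        if (extra > len - i - 1)
            return -1;          /* sequence truncated by end of input */
        for (k = 1; k <= extra; k++)
        {
            if ((in[i + k] & 0xC0) != 0x80)
                return -1;      /* expected a continuation byte */
            cp = (cp << 6) | (in[i + k] & 0x3F);
        }
        if (cp < min_cp[extra] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;          /* overlong, out of range, or a surrogate */
        i += extra + 1;

        if (cp < 0x10000)
        {
            if (n + 1 > outcap)
                return -1;
            out[n++] = (uint16_t) cp;
        }
        else
        {
            /* code points above the BMP become a surrogate pair */
            if (n + 2 > outcap)
                return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t) (0xD800 | (cp >> 10));
            out[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
    }
    return (long) n;
}
```

Two such buffers would then be handed to a wide-character collation routine. Malformed input is rejected here rather than silently replaced, since a comparison on garbage would be meaningless.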
>
> I am unsure of how to proceed. As I see it there are three paths:
> 1) Use native win32 functionality only on win32
> 2) Use ICU functionality only on win32
> 3) Allow both ICU and native functionality, compile time
> switch --with-icu (same as unix with the ICU patch)
>
>
> The main downsides of ICU vs the native ones are:
> * ICU does not accept win32 locale names. When doing
> setlocale("sv_se"), for example, win32 will return this in
> later calls as "Swedish_Sweden.1252". To get around this in
> the ICU patch, I had to implement a lookup map that converts
> it back to sv_se for ICU.
>
> * ICU is yet another build and runtime dependency, and a
> large one (comes in at 11 MB for the DLL files alone in the
> win32 download)
>
>
> I guess the main upside of ICU is that we'd get
> consistent behaviour: if there are issues in either ICU
> or the native win32 functions, the platforms would otherwise
> differ. And there would be only one new codepath. But we
> already live with the platform-inconsistency today...
>
> Another upside is that ICU handles more encodings; my
> native implementation does *only* UTF8 and relies on existing
> functionality to deal with other encodings. It could of
> course be extended if necessary, but from what I can tell
> UTF8 is the big one.
>
>
>
> I have attached both patches. For the native version, only
> win32_utf8.patch is required. For the ICU version,
> icu_win32.patch is needed and also the files
> localemap.c, localemap.pl, iso639 and iso3166 need to go in
> src/backend/port/win32. (the localemap needs to be updated to
> do a better-than-linear search, but I wanted to include an example)
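localemap.c is not reproduced in this archive either. Below is a minimal sketch of the lookup it describes: mapping a win32 locale name such as "Swedish_Sweden.1252" back to an ICU-style name. It assumes the table is kept sorted by the win32 name, so that bsearch() provides the better-than-linear search mentioned above. The struct, function names and table contents are hypothetical. Only the Swedish_Sweden / sv_se pair comes from the message, written here in the conventional sv_SE form.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table entry: win32 "Language_Country" name -> ICU locale ID */
struct locale_map_entry
{
    const char *win32_name;
    const char *icu_name;
};

/* Must stay sorted by win32_name in strcmp() order for bsearch() */
static const struct locale_map_entry locale_map[] = {
    {"English_United States", "en_US"},
    {"German_Germany", "de_DE"},
    {"Swedish_Sweden", "sv_SE"},
};

static int
locale_map_cmp(const void *key, const void *elem)
{
    return strcmp((const char *) key,
                  ((const struct locale_map_entry *) elem)->win32_name);
}

/* Map e.g. "Swedish_Sweden.1252" to "sv_SE"; returns NULL if unknown. */
static const char *
win32_locale_to_icu(const char *win32_locale)
{
    char name[128];
    const char *dot = strchr(win32_locale, '.');
    size_t len = dot ? (size_t) (dot - win32_locale) : strlen(win32_locale);
    const struct locale_map_entry *e;

    if (len >= sizeof(name))
        return NULL;
    memcpy(name, win32_locale, len);
    name[len] = '\0';           /* drop the ".1252" code page suffix */

    e = bsearch(name, locale_map,
                sizeof(locale_map) / sizeof(locale_map[0]),
                sizeof(locale_map[0]), locale_map_cmp);
    return e ? e->icu_name : NULL;
}
```

A real table would presumably be generated, perhaps by something like localemap.pl from the iso639 and iso3166 lists. The only hard requirement bsearch() imposes is that the generated array is sorted with the same comparison function used for lookups.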
>
>
> Thoughts on the options?
>
>
> And another question - my native patch touches the same
> functions as the ICU patch. Can somebody who knows the
> internals confirm or deny that these are all the required
> locations, or do we need to modify more?
>
> (I have run simple tests in the Swedish locale and both behave
> the same, and correctly, but I'm unsure of exactly how much
> would be affected)
>
> Finally, the win32 patch also changes the normal path to use
> strncoll(). The comment above the function states that we'd
> like to use strncoll but it's not available. Well, on win32
> it is, so it should provide a speedup on win32. It is
> currently not included in the ICU patch, but should probably
> be included whichever path we choose.
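For context, here is a minimal sketch of what a comparison of counted (not NUL-terminated) strings has to do when only strcoll() is available: copy both inputs into terminated buffers first. The helper name counted_strcoll is hypothetical. In PostgreSQL the function the message refers to is presumably varstr_cmp() in src/backend/utils/adt/varlena.c, but that is an assumption. The Microsoft C runtime provides _strncoll(), which compares at most a given number of characters and so can avoid these copies. A caller still has to handle the two strings having different lengths.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: compare two counted strings under the current
 * LC_COLLATE. Without a length-limited strcoll(), both arguments must be
 * copied into NUL-terminated buffers before strcoll() can see them.
 */
static int
counted_strcoll(const char *a, size_t alen, const char *b, size_t blen)
{
    char *abuf = malloc(alen + 1);
    char *bbuf = malloc(blen + 1);
    int result;

    if (abuf == NULL || bbuf == NULL)
    {
        free(abuf);
        free(bbuf);
        abort();                /* a real backend would raise an error */
    }
    memcpy(abuf, a, alen);
    abuf[alen] = '\0';
    memcpy(bbuf, b, blen);
    bbuf[blen] = '\0';

    result = strcoll(abuf, bbuf);

    free(abuf);
    free(bbuf);
    return result;
}
```

In the default "C" locale strcoll() orders like strcmp(), which is why the C-locale case can skip the copies entirely. The two allocations and copies are the per-comparison overhead that a native _strncoll() path would avoid.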
>
>
> //Magnus
>

Attachment Content-Type Size
win32_utf8.patch application/octet-stream 5.9 KB
icu_win32.patch application/octet-stream 21.8 KB
localemap.pl application/octet-stream 1.4 KB
localemap.c application/octet-stream 1.4 KB

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCHES] FW: Win32 unicode vs ICU
Date: 2005-08-13 02:28:13
Message-ID: 200508130228.j7D2SDJ29887@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches


Folks, we need to address the questions asked in this email and get it
into CVS soon.


--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Magnus Hagander" <mha(at)sollentuna(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org, "Palle Girgensohn" <girgen(at)pingpong(dot)net>
Subject: Re: Win32 unicode vs ICU
Date: 2005-08-20 16:17:47
Message-ID: 24642.1124554667@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

[ moving to -hackers for wider discussion ]

"Magnus Hagander" <mha(at)sollentuna(dot)net> wrote in
http://archives.postgresql.org/pgsql-patches/2005-08/msg00039.php

>> I've been working with Palle's ICU patch to make it work on
>> win32, and I believe I have it done. While doing it I noticed
>> that ICU basically converts to UTF16 and back - I previously
>> thought it worked on UTF8 strings. Based on this I also tried
>> out an implementation for the win32-unicode problem that does
>> *not* require ICU. It uses the win32 native functions to map
>> to utf16 and back, and then to process the text there. And I
>> got through with much less code than the ICU version, while
>> doing the same thing.
>>
>> I am unsure of how to proceed. As I see it there are three paths:
>> 1) Use native win32 functionality only on win32
>> 2) Use ICU functionality only on win32
>> 3) Allow both ICU and native functionality, compile time
>> switch --with-icu (same as unix with the ICU patch)

We need to figure out what we're going to do about this. Given where
we are in the release cycle, I am pretty strongly tempted to just apply
the smaller patch (just map utf8/utf16 using Windows native functions)
for PG 8.1.

I think that ICU would be interesting as the base for a much larger
patch that gets us away from depending on libc's locale support at all
(in particular, getting rid of the "one locale per database" problem).
But it seems like a heck of a big dependency to incur for any lesser goal.

I feel it makes sense to apply the smaller patch in any case, so that
there's a Win32 solution not requiring ICU (ie, I can't see an argument
for doing (2) rather than (3)).

Comments?

Also,

> And another question - my native patch touches the same
> functions as the ICU patch. Can somebody who knows the
> internals confirm or deny that these are all the required
> locations, or do we need to modify more?

There is a strxfrm() call in src/backend/utils/adt/selfuncs.c,
which probably needs to be looked at too.

regards, tom lane
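Tom's note about strxfrm() can be illustrated with the usual two-call idiom. First call it with a zero-size buffer to learn the key length, then allocate and transform. The C standard guarantees that strcmp() on two transformed keys orders them the same way strcoll() orders the original strings. A UTF8 path on win32 would presumably need the wide-character counterpart (wcsxfrm() on UTF16 text), but that is an assumption. The helper name make_sort_key is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd sort key for s under the current
 * LC_COLLATE, such that strcmp() on two keys orders like strcoll() on the
 * originals. The caller frees the result; NULL on allocation failure.
 */
static char *
make_sort_key(const char *s)
{
    /* with a size of 0 the destination may be NULL; returns key length */
    size_t needed = strxfrm(NULL, s, 0);
    char *key = malloc(needed + 1);

    if (key == NULL)
        return NULL;
    strxfrm(key, s, needed + 1);
    return key;
}
```

Because the key length depends on the locale and can be several times the input length, sizing the buffer with a first call is safer than guessing a fixed multiple.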


From: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgreSQL(dot)org, Palle Girgensohn <girgen(at)pingpong(dot)net>
Subject: Re: Win32 unicode vs ICU
Date: 2005-08-20 17:18:51
Message-ID: 20050820171850.GA21765@surnet.cl
Lists: pgsql-hackers pgsql-patches

On Sat, Aug 20, 2005 at 12:17:47PM -0400, Tom Lane wrote:

> I think that ICU would be interesting as the base for a much larger
> patch that gets us away from depending on libc's locale support at all
> (in particular, getting rid of the "one locale per database" problem).
> But it seems like a heck of a big dependency to incur for any lesser goal.

There is a locale project from the Gnome guys, with an eye towards a
wider audience. The announcement, which states the goals of the
project, is here:

http://mail.gnome.org/archives/locale-list/2005-August/msg00000.html

The project website is at http://live.gnome.org/LocaleProject

The big problem with this is that the license is likely to be LGPL, so
there's probably not much code we could use. OTOH, it's possible that
we could borrow some ideas from them. In particular, they are based
mostly on the Common Locale Data Repository,
http://www.unicode.org/cldr/

However, this thread on their list, which is about the license they will
choose, hints that rewriting the whole CLDR handling from scratch would
be very painful:

http://mail.gnome.org/archives/locale-list/2005-August/msg00004.html

This is precisely the reason they are using LGPL: they do not want to
have to rewrite it all, which they would were they to choose a license
like BSD. (Personally I think this is folly -- someone else will have
to rewrite it again with a BSD license sometime, and then the value of
their work would be decreased.)

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"A wizard is never late, Frodo Baggins, nor is he early.
He arrives precisely when he means to." (Gandalf, in LoTR FoTR)


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Win32 unicode vs ICU
Date: 2005-08-22 09:40:09
Message-ID: 78B90B1229BE2814D73F0E3E@rambutan.pingpong.net
Lists: pgsql-hackers pgsql-patches

--On Saturday, August 20, 2005 12:17:47 -0400 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
wrote:

> [ moving to -hackers for wider discussion ]
>
> "Magnus Hagander" <mha(at)sollentuna(dot)net> wrote in
> http://archives.postgresql.org/pgsql-patches/2005-08/msg00039.php
>
>>> I've been working with Palle's ICU patch to make it work on
>>> win32, and I believe I have it done. While doing it I noticed
>>> that ICU basically converts to UTF16 and back - I previously
>>> thought it worked on UTF8 strings. Based on this I also tried
>>> out an implementation for the win32-unicode problem that does
>>> *not* require ICU. It uses the win32 native functions to map
>>> to utf16 and back, and then to process the text there. And I
>>> got through with much less code than the ICU version, while
>>> doing the same thing.
>>>
>>> I am unsure of how to proceed. As I see it there are three paths:
>>> 1) Use native win32 functionality only on win32
>>> 2) Use ICU functionality only on win32
>>> 3) Allow both ICU and native functionality, compile time
>>> switch --with-icu (same as unix with the ICU patch)
>
> We need to figure out what we're going to do about this. Given where
> we are in the release cycle, I am pretty strongly tempted to just apply
> the smaller patch (just map utf8/utf16 using Windows native functions)
> for PG 8.1.
>
> I think that ICU would be interesting as the base for a much larger
> patch that gets us away from depending on libc's locale support at all
> (in particular, getting rid of the "one locale per database" problem).
> But it seems like a heck of a big dependency to incur for any lesser goal.
>
> I feel it makes sense to apply the smaller patch in any case, so that
> there's a Win32 solution not requiring ICU (ie, I can't see an argument
> for doing (2) rather than (3)).
>
> Comments?

I don't mind either way, but while Win32 will work with Magnus' patch,
FreeBSD won't; it needs the ICU patch to work. OTOH, I maintain the FreeBSD
port where I already have the patch as an ("experimental") option. Not
every FreeBSD user uses the ports system, though.

So, it is a question whether FreeBSD's unicode support is important or not,
I guess? Win32 will work both ways.

/Palle


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Win32 unicode vs ICU
Date: 2005-08-22 13:19:58
Message-ID: 200508221319.j7MDJw802018@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches

Palle Girgensohn wrote:
> > I feel it makes sense to apply the smaller patch in any case, so that
> > there's a Win32 solution not requiring ICU (ie, I can't see an argument
> > for doing (2) rather than (3)).
> >
> > Comments?
>
> I don't mind either way, but while Win32 will work with Magnus' patch,
> FreeBSD won't; it needs the ICU patch to work. OTOH, I maintain the FreeBSD
> port where I already have the patch as an ("experimental") option. Not
> every FreeBSD user uses the ports system, though.
>
> So, it is a question whether FreeBSD's unicode support is important or not,
> I guess? Win32 will work both ways.

How is FreeBSD's Unicode support broken? I was not aware of that.



From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Win32 unicode vs ICU
Date: 2005-08-22 13:49:16
Message-ID: 7FDA7CFE72CE5A0A78CBE4D4@rambutan.pingpong.net
Lists: pgsql-hackers pgsql-patches

--On Monday, August 22, 2005 09:19:58 -0400 Bruce Momjian
<pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:

> Palle Girgensohn wrote:
>> > I feel it makes sense to apply the smaller patch in any case, so that
>> > there's a Win32 solution not requiring ICU (ie, I can't see an argument
>> > for doing (2) rather than (3)).
>> >
>> > Comments?
>>
>> I don't mind either way, but while Win32 will work with Magnus' patch,
>> FreeBSD won't; it needs the ICU patch to work. OTOH, I maintain the
>> FreeBSD port where I already have the patch as an ("experimental")
>> option. Not every FreeBSD user uses the ports system, though.
>>
>> So, it is a question whether FreeBSD's unicode support is important or
>> not, I guess? Win32 will work both ways.
>
> How is FreeBSD's Unicode support broken? I was not aware of that.

FreeBSD has no unicode collation support. Hence the need for ICU.

/Palle


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Palle Girgensohn <girgen(at)pingpong(dot)net>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Win32 unicode vs ICU
Date: 2005-08-22 14:12:11
Message-ID: 15587.1124719931@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Palle Girgensohn <girgen(at)pingpong(dot)net> writes:
> <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>> How is FreeBSD's Unicode support broken? I was not aware of that.

> FreeBSD has no unicode collation support. Hence the need for ICU.

Well, this obviously doesn't bother anyone who uses FreeBSD, so it need
not bother us either. I do not feel a need to take on ICU in order to
implement features that are not present anywhere else on the platform.

regards, tom lane


From: Palle Girgensohn <girgen(at)pingpong(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Magnus Hagander <mha(at)sollentuna(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Win32 unicode vs ICU
Date: 2005-08-22 14:19:00
Message-ID: 3C44968DF3CD8138D379192A@rambutan.pingpong.net
Lists: pgsql-hackers pgsql-patches

--On Monday, August 22, 2005 10:12:11 -0400 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
wrote:

> Palle Girgensohn <girgen(at)pingpong(dot)net> writes:
>> <pgman(at)candle(dot)pha(dot)pa(dot)us> wrote:
>>> How is FreeBSD's Unicode support broken? I was not aware of that.
>
>> FreeBSD has no unicode collation support. Hence the need for ICU.
>
> Well, this obviously doesn't bother anyone who uses FreeBSD, so it need
> not bother us either. I do not feel a need to take on ICU in order to
> implement features that are not present anywhere else on the platform.

It bothered me enough to patch PostgreSQL. :) And I use it with Java,
which has working Unicode support, so... Oh well, I can live with that;
I'll maintain my patch locally for the time being, if that's what's
required.

/Palle


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: FW: Win32 unicode vs ICU
Date: 2005-08-24 15:59:08
Message-ID: 200508241559.j7OFx8008679@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches


This has been saved for the 8.2 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold



From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: FW: Win32 unicode vs ICU
Date: 2006-03-21 03:41:59
Message-ID: 200603210341.k2L3fx717718@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches


Is this patch moving toward completion?


--
Bruce Momjian http://candle.pha.pa.us
SRA OSS, Inc. http://www.sraoss.com

+ If your life is a hard drive, Christ can be your backup. +


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: PostgreSQL-patches <pgsql-patches(at)postgresql(dot)org>
Subject: Re: FW: Win32 unicode vs ICU
Date: 2006-06-14 18:49:02
Message-ID: 200606141849.k5EIn2h17524@candle.pha.pa.us
Lists: pgsql-hackers pgsql-patches


Added to TODO.detail.


--
Bruce Momjian http://candle.pha.pa.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +