Re: FW: Win32 unicode vs ICU

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: "PostgreSQL-patches" <pgsql-patches(at)postgresql(dot)org>
Subject: Re: FW: Win32 unicode vs ICU
Date: 2006-03-21 16:05:35
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCEA351A2@algol.sollentuna.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

Not from me. I'm not working on it anymore. My work on it was directed
to making unicode work on win32, which was accomplished without ICU.

The original patch was to fix it for FreeBSD. But I think it was put
back pending an actual decision on how to deal with multi-locale
situations, since we'd want to pick a solution that will work for that.

//Magnus

>
> Is this patch moving toward competion?
>
> --------------------------------------------------------------
> -------------
>
> Magnus Hagander wrote:
> > I just realised this mail didn't go through. Probably
> because it was
> > too large for -hackers. So: repost to -patches. Sorry about
> that. If
> > it's a duplicate, even more sorry, but I couldn't find it
> in the archives.
> >
> > (This may explain that nobody answered me :P)
> >
> > //Magnus
> >
> >
> > > -----Original Message-----
> > > From: Magnus Hagander
> > > Sent: Sunday, July 31, 2005 2:09 PM
> > > To: PostgreSQL-development
> > > Cc: pgsql-hackers-win32(at)postgresql(dot)org
> > > Subject: Win32 unicode vs ICU
> > >
> > > Hi!
> > >
> > > I've been working with Palles ICU patch to make it work on win32,
> > > and I believe I have it done. While doing it I noticed that ICU
> > > basically converts to UTF16 and back - I previously thought it
> > > worked on UTF8 strings. Based on this I also tried out an
> > > implementation for the win32-unicode problem that does
> > > *not* require ICU. It uses the win32 native functions to map to
> > > utf16 and back, and then to process the text there. And I got
> > > through with much less code than the ICU version, while doing the
> > > same thing.
> > >
> > > I am unsure of how to proceed. As I see it there are three paths:
> > > 1) Use native win32 functionality only on win32
> > > 2) Use ICU functionality only on win32
> > > 3) Allow both ICU and native functionality, compile time switch
> > > --with-icu (same as unix with the ICU patch)
> > >
> > >
> > > The main downsides of ICU vs the native ones are:
> > > * ICU does not accept win32 locale names. When doing
> > > setlocale("sv_se"), for example, win32 will return this in later
> > > calls as "Swedish_Sweden.1252". To get around this in the
> ICU patch,
> > > I had to implement a lookup map that converts it back to
> sv_se for
> > > ICU.
> > >
> > > * ICU is yet another build and runtime dependency, and a
> large one
> > > (comes in at 11Mb for the DLL files alone in the
> > > win32 download)
> > >
> > >
> > > I guess that the main upside of it is that we'd get constistent
> > > behaviour - in case there are issues with either ICU or
> win32 native
> > > they'd otherwise differ. And only one new codepath. But
> we already
> > > live with the platform-inconsistency today...
> > >
> > > Another upside is that it handles more encodings in ICU -
> my native
> > > implementation does *only* UTF8 and relies on existing
> functionality
> > > to deal with other encodings. It could of course be extended if
> > > necessary, but from what I can tell
> > > UTF8 is the big one.
> > >
> > >
> > >
> > > I have attached both patches. For the native version, only
> > > win32_utf8.patch is required. For the ICU version,
> icu_win32.patch
> > > is needed and also the files localemap.c,localemap.pl, iso639 and
> > > iso3166 needs to go in src/backend/port/win32. (the
> localemap needs
> > > to be updated to do a better-than-linear search, but I wanted to
> > > include an example)
> > >
> > >
> > > Thoughts on the options?
> > >
> > >
> > > And anohter question - my native patch touches the same
> functions as
> > > the ICU patch. Can somebody who knows the internals
> confirm or deny
> > > that these are all the required locations, or do we need
> to modify
> > > more?
> > >
> > > (I have run simple tests in swedish locale and both
> behave the same
> > > and correct, but I'm unsure of exactly how much would be affected)
> > >
> > > Finally, the win32 patch also changes the normal path to use
> > > strncoll(). The comment above the function states that
> we'd like to
> > > use strncoll but it's not available. Well, on win32 it is, so it
> > > should provide a speedup on win32. It is currently not
> included in
> > > the ICU patch, but should probably be included whichever
> path we'd
> > > chose.
> > >
> > >
> > > //Magnus
> > >
>
> Content-Description: win32_utf8.patch
>
> [ Attachment, skipping... ]
>
> Content-Description: icu_win32.patch
>
> [ Attachment, skipping... ]
>
> Content-Description: localemap.pl
>
> [ Attachment, skipping... ]
>
> Content-Description: localemap.c
>
> [ Attachment, skipping... ]
>
> >
> > ---------------------------(end of
> > broadcast)---------------------------
> > TIP 4: Have you searched our list archives?
> >
> > http://archives.postgresql.org
>
> --
> Bruce Momjian http://candle.pha.pa.us
> SRA OSS, Inc. http://www.sraoss.com
>
> + If your life is a hard drive, Christ can be your backup. +
>

Browse pgsql-patches by date

  From Date Subject
Next Message Simon Riggs 2006-03-21 20:05:50 Re: Automatically setting work_mem
Previous Message Tom Lane 2006-03-21 14:52:47 Re: be-secure.c patch