Re: UNICODE/UTF-8 on win32

From: "Magnus Hagander" <mha(at)sollentuna(dot)net>
To: "Tatsuo Ishii" <t-ishii(at)sra(dot)co(dot)jp>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers-win32(at)postgresql(dot)org>
Subject: Re: UNICODE/UTF-8 on win32
Date: 2005-01-01 13:48:04
Message-ID: 6BCB9D8A16AC4241919521715F4D8BCE4764A4@algol.sollentuna.se
Lists: pgsql-hackers pgsql-hackers-win32

UNICODE/UTF-8 does not work on the win32 server. The reason is that
strcoll() and friends don't work with it. To support it on win32, the
data needs to be converted to UTF-16 and passed to the wide-character
versions of those functions (e.g. wcscoll()), which we do not do.
(See
http://archives.postgresql.org/pgsql-hackers-win32/2004-11/msg00036.php
and
http://archives.postgresql.org/pgsql-hackers-win32/2004-12/msg00106.php)
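To make the conversion step concrete, here is a minimal, portable sketch of decoding UTF-8 into UTF-16 code units. The helper name utf8_to_utf16() is hypothetical, and it skips some validation (overlong forms, encoded surrogates). On Windows, real code would more likely call MultiByteToWideChar() with CP_UTF8 and then pass the resulting wide strings to wcscoll().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: decode a NUL-terminated UTF-8 string into UTF-16
 * code units. Returns the number of code units written, or -1 on malformed
 * input or if the output buffer is too small.
 * Simplified: overlong encodings and encoded surrogates are not rejected.
 */
static int utf8_to_utf16(const unsigned char *in, uint16_t *out, size_t outlen)
{
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*in)
    {
        unsigned char c = in[0];
        uint32_t cp;
        int extra;

        /* The lead byte tells how many continuation bytes follow. */
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else                         return -1;   /* invalid lead byte */

        for (int i = 1; i <= extra; i++)
        {
            /* Also stops at the terminating NUL, so we never read past it. */
            if ((in[i] & 0xC0) != 0x80)
                return -1;
            cp = (cp << 6) | (in[i] & 0x3F);
        }
        in += extra + 1;

        if (cp >= 0x10000)
        {
            /* Outside the Basic Multilingual Plane: emit a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 > outlen)
                return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t) (0xD800 | (cp >> 10));
            out[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
        else
        {
            if (n + 1 > outlen)
                return -1;
            out[n++] = (uint16_t) cp;
        }
    }
    return (int) n;
}
```

For example, the UTF-8 bytes for "aéあ😀" decode to the code units 0061 00E9 3042 D83D DE00; the last character lies outside the Basic Multilingual Plane and therefore becomes a surrogate pair.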

I don't *think* we need to disable it on the client. AFAIK, the client
interfaces don't use any of these functions, and I've seen reports of
people using it long before we had a native win32 server.

//Magnus

>-----Original Message-----
>From: Tatsuo Ishii [mailto:t-ishii(at)sra(dot)co(dot)jp]
>Sent: 1 January 2005 01:10
>To: tgl(at)sss(dot)pgh(dot)pa(dot)us
>Cc: Magnus Hagander; pgsql-hackers-win32(at)postgresql(dot)org
>Subject: Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32
>
>
>Sorry, but I don't subscribe to the pgsql-hackers-win32 list. What's the
>problem here?
>--
>Tatsuo Ishii
>
>> "Magnus Hagander" <mha(at)sollentuna(dot)net> writes:
>> > We know it's broken and won't be fixed for 8.0.
>>
>> > If we just #ifndef WIN32 the definitions in utils/mb/encnames.c it won't
>> > be possible to select that encoding, right? Will that have any other
>> > unwanted effects (such as breaking client encodings)? If not, I suggest
>> > this is done.
>>
>> I believe the subscripts in those arrays have to match the encoding
>> enum type, so you can't just ifdef out individual entries.
>>
>> > (Or perhaps something can be done in pg_valid_server_encoding?)
>>
>> Making the valid_server_encoding function reject it might work.
>> Tatsuo-san would know for sure.
>>
>> Should we also reject it as a client encoding, or does that work OK?
>>
>> regards, tom lane
>>
>
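Tom's point about array subscripts, and the alternative of rejecting the encoding in the validity check, can be illustrated with a heavily simplified model of the encoding table. The enum, the table, and all functions below are hypothetical (the real check discussed here is pg_valid_server_encoding()), and the on_win32 parameter stands in for an #ifdef WIN32 so the logic can be exercised on any platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, heavily simplified encoding IDs (the real list is longer). */
typedef enum
{
    ENC_SQL_ASCII,
    ENC_EUC_JP,
    ENC_UTF8,
    ENC_LATIN1,
    ENC_COUNT                   /* number of encodings, not an encoding */
} enc_id;

/*
 * Names listed positionally, so enc_names[id] must line up with enc_id.
 * Wrapping the "UTF8" row in #ifndef WIN32 would shift "LATIN1" into the
 * UTF8 slot; the static assertion catches the resulting length mismatch.
 */
static const char *const enc_names[] = {
    "SQL_ASCII",
    "EUC_JP",
    "UTF8",
    "LATIN1",
};
_Static_assert(sizeof enc_names / sizeof enc_names[0] == ENC_COUNT,
               "enc_names must have exactly one entry per enc_id");

/* Exact-match name lookup (real code may normalize names first); -1 if unknown. */
static int enc_lookup(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < ENC_COUNT; i++)
        if (strcmp(name, enc_names[i]) == 0)
            return i;
    return -1;
}

/* Client side: every known encoding stays selectable. */
static int valid_client_encoding(const char *name)
{
    return enc_lookup(name);
}

/*
 * Server side: same table, but UTF8 is refused on win32. The table itself
 * is untouched, so no subscripts shift.
 */
static int valid_server_encoding(const char *name, bool on_win32)
{
    int enc = enc_lookup(name);

    if (on_win32 && enc == ENC_UTF8)
        return -1;
    return enc;
}
```

Because UTF8 stays in the table, client-side lookups keep working, which matches the view above that client encodings need not be disabled.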


From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Magnus Hagander <mha(at)sollentuna(dot)net>
Cc: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers-win32(at)postgresql(dot)org
Subject: Re: UNICODE/UTF-8 on win32
Date: 2005-01-01 16:36:46
Message-ID: 200501011636.j01Gakj12690@candle.pha.pa.us
Lists: pgsql-hackers pgsql-hackers-win32


TODO updated:

o Disallow encodings like UTF8 which PostgreSQL supports
but the operating system does not (already disallowed by
pginstaller)

To fix UTF8, the data needs to be converted to UTF16 and then
the wide-character Win32 collation function, wcscoll(), can be used.


--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup. | Newtown Square, Pennsylvania 19073


From: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
To: mha(at)sollentuna(dot)net
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers-win32(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: UNICODE/UTF-8 on win32
Date: 2005-01-02 11:55:55
Message-ID: 20050102.205555.71548768.t-ishii@sra.co.jp
Lists: pgsql-hackers pgsql-hackers-win32

I do understand the problem, but I don't understand the decision you
guys made. The fact that UPPER/LOWER and some other functions do not
work on win32 is surely a problem for some languages, but not a
problem for others. For example, Japanese (and probably Chinese and
Korean) does not have the concept of upper/lower case, so the fact that
UPPER/LOWER does not work with UTF-8/win32 is not a problem for Japanese
(or for some other languages). Just using the C locale with UTF-8 is
enough in this case.

In summary, I think you are going overboard by disabling multibyte
support for UTF-8/win32 just because some languages do not work.

The same can be said of EUC-JP, EUC-CN, EUC-KR and so on.

I strongly object to the policy of unconditionally disabling UTF-8
support on win32.
--
Tatsuo Ishii
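Tatsuo's argument that the C locale is enough can be illustrated for sorting: in the C locale, strcoll() behaves like strcmp(), and byte-wise comparison of UTF-8 strings yields Unicode code point order, with no UTF-16 conversion involved. The helper names below are hypothetical. Note that code point order is not dictionary order for Japanese or any other language; the sketch only shows that the C locale gives a consistent, well-defined order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * qsort comparator: plain byte-wise comparison, which is what strcoll()
 * reduces to in the C locale. strcmp() compares bytes as unsigned char.
 */
static int cmp_c_locale(const void *pa, const void *pb)
{
    const char *a = *(const char *const *) pa;
    const char *b = *(const char *const *) pb;
    return strcmp(a, b);
}

/*
 * Sort NUL-terminated UTF-8 strings in C-locale order. Since UTF-8 byte
 * order matches code point order, the result is ordered by code point.
 */
static void sort_utf8_c_locale(const char **items, size_t n)
{
    assert(items != NULL || n == 0);
    qsort(items, n, sizeof items[0], cmp_c_locale);
}
```

For example, sorting the strings for い (U+3044), "b", あ (U+3042), and "a" yields "a", "b", あ, い.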



From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: mha(at)sollentuna(dot)net, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers-win32(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32
Date: 2005-02-22 03:43:02
Message-ID: 200502220343.j1M3h2P07627@candle.pha.pa.us
Lists: pgsql-hackers pgsql-hackers-win32


Magnus, where are we on this? It seems we should allow the UNICODE
encoding and just not a Unicode locale in pginstaller.

Also, the encoding name UNICODE is changing to UTF8 in 8.1.
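Allowing the UTF8 encoding but not together with a Unicode locale amounts to a check on the encoding/locale pair at install time. A minimal sketch, assuming an installer that has the chosen encoding and locale names as strings; check_win32_combo() and the result enum are hypothetical, every non-UTF8 encoding is simply passed through, and UNICODE is accepted as an alias because of the rename.

```c
#include <assert.h>
#include <string.h>

typedef enum { COMBO_OK, COMBO_REJECT } combo_result;

/* True for the locales in which no locale-aware text handling is needed. */
static int is_c_locale(const char *locale)
{
    return strcmp(locale, "C") == 0 || strcmp(locale, "POSIX") == 0;
}

/* True for the Unicode encoding under its old (UNICODE) and new (UTF8) names. */
static int is_utf8_encoding(const char *encoding)
{
    return strcmp(encoding, "UTF8") == 0 || strcmp(encoding, "UNICODE") == 0;
}

/*
 * Hypothetical installer-side check for win32: any non-UTF8 encoding passes,
 * and UTF8 is accepted only with the C/POSIX locale.
 */
static combo_result check_win32_combo(const char *encoding, const char *locale)
{
    assert(encoding != NULL && locale != NULL);
    if (!is_utf8_encoding(encoding))
        return COMBO_OK;
    return is_c_locale(locale) ? COMBO_OK : COMBO_REJECT;
}
```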



From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: mha(at)sollentuna(dot)net, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers-win32(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [pgsql-hackers-win32] UNICODE/UTF-8 on win32
Date: 2005-03-16 00:47:50
Message-ID: 200503160047.j2G0loZ15356@candle.pha.pa.us
Lists: pgsql-hackers pgsql-hackers-win32

I have just applied a patch to CVS HEAD and 8.0.X that disables
locale-aware handling of upper/lower/initcap when the locale is C or
POSIX.

With these changes, it seems safe to allow pginstaller to use UTF8
encoding if the locale is C/POSIX. If we don't do that, I am concerned
that Asian users will either make a hacked installer or be forced to
run initdb manually by following complex instructions.

As a compromise, we could throw a warning if that combination is selected.
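The patch described above can be sketched as a dispatch on the current LC_CTYPE. In the C/POSIX locale only ASCII letters are mapped, which is harmless for UTF-8 because every byte of a multibyte UTF-8 sequence is 0x80 or above; otherwise the text goes through wide characters and towupper(). This is a hypothetical illustration using standard C library calls, not the code that was committed.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wctype.h>

/* True if the current LC_CTYPE is "C" or "POSIX" (query only, no change). */
static int ctype_is_c(void)
{
    const char *loc = setlocale(LC_CTYPE, NULL);
    return loc != NULL && (strcmp(loc, "C") == 0 || strcmp(loc, "POSIX") == 0);
}

/*
 * Hypothetical sketch of upper(): returns a newly allocated uppercased copy
 * of s (caller frees), or NULL on allocation or conversion failure.
 */
static char *sketch_upper(const char *s)
{
    assert(s != NULL);

    if (ctype_is_c())
    {
        /* C/POSIX: map only 'a'..'z'; UTF-8 multibyte bytes pass through. */
        char *out = malloc(strlen(s) + 1);
        if (out == NULL)
            return NULL;
        strcpy(out, s);
        for (char *p = out; *p; p++)
            if (*p >= 'a' && *p <= 'z')
                *p = (char) (*p - 'a' + 'A');
        return out;
    }

    /* Other locales: multibyte -> wide characters -> towupper() -> multibyte. */
    size_t n = mbstowcs(NULL, s, 0);
    if (n == (size_t) -1)
        return NULL;                      /* invalid for the current locale */
    wchar_t *w = malloc((n + 1) * sizeof(wchar_t));
    if (w == NULL)
        return NULL;
    mbstowcs(w, s, n + 1);
    for (size_t i = 0; i < n; i++)
        w[i] = (wchar_t) towupper((wint_t) w[i]);

    size_t cap = n * MB_CUR_MAX + 1;      /* room for any result plus NUL */
    char *out = malloc(cap);
    if (out == NULL || wcstombs(out, w, cap) == (size_t) -1)
    {
        free(out);
        out = NULL;
    }
    free(w);
    return out;
}
```

A C program starts in the "C" locale, so without a setlocale() call the ASCII-only branch is taken and Japanese text comes back byte-for-byte unchanged.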



From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: Tatsuo Ishii <t-ishii(at)sra(dot)co(dot)jp>
Cc: mha(at)sollentuna(dot)net, tgl(at)sss(dot)pgh(dot)pa(dot)us, pgsql-hackers-win32(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] UNICODE/UTF-8 on win32
Date: 2005-04-24 12:35:15
Message-ID: 200504241235.j3OCZF927011@candle.pha.pa.us
Lists: pgsql-hackers pgsql-hackers-win32


Where are we on this? As far as I can tell, we never disabled UTF8 on
Win32 in our code. The only thing we did was disable UTF8 in
pginstaller. See this FAQ item:

http://pginstaller.projects.postgresql.org/faq/FAQ_windows.html#2.6

Is the current setup OK? Should we allow UTF8 on Win32 for languages
that can use the C locale, such as Japanese, Chinese, and Korean?
