Re: UTF8 national character data type support WIP patch and list of open issues.

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: <robertmhaas(at)gmail(dot)com>, "Tatsuo Ishii" <ishii(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <maksymb(at)fast(dot)au(dot)fujitsu(dot)com>, <hlinnakangas(at)vmware(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date: 2013-09-25 11:43:13
Message-ID: 39433F3837CB4CAE90753D4B1F514615@maumau
Lists: pgsql-hackers

From: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
> On Tue, 2013-09-24 at 21:04 +0900, MauMau wrote:
>> "4. I guess some users really want to continue to use ShiftJIS or EUC_JP
>> for database encoding, and use NCHAR for a limited set of columns to
>> store international text in Unicode:
>> - to avoid code conversion between the server and the client for
>>   performance
>> - because ShiftJIS and EUC_JP require less amount of storage (2 bytes
>>   for most Kanji) than UTF-8 (3 bytes)
>> This use case is described in chapter 6 of "Oracle Database Globalization
>> Support Guide"."
>
> But your proposal wouldn't address the first point, because data would
> have to go client -> server -> NCHAR.
>
> The second point is valid, but it's going to be an awful amount of work
> for that limited result.

I (or rather, Oracle's use case) meant something like the following:

initdb -E EUC_JP
CREATE DATABASE mydb ENCODING EUC_JP NATIONAL ENCODING UTF-8;
CREATE TABLE mytable (
    col1 char(10),  -- EUC_JP text
    col2 nchar(10)  -- UTF-8 text
);
client encoding = EUC_JP

That is,

1. Currently, the user is only handling Japanese text. To avoid unnecessary
conversion, he uses EUC_JP for both client and server.
2. He needs to store a limited amount of international (non-Japanese)
text in a few columns for a new feature of the system. Because the
international text is limited, he does not want to sacrifice performance
and storage cost for it, that is, code conversion for most text and more
bytes for each character.
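
As a rough illustration of the storage point quoted above (2 bytes for most
Kanji in ShiftJIS or EUC_JP versus 3 bytes in UTF-8), here is a minimal
Python sketch. It is not part of the proposal: the sample string and the
helper name encoded_sizes are made up for illustration, and it assumes
Python's built-in euc_jp, shift_jis, and utf-8 codecs.

```python
# Compare the encoded size of the same Japanese text in three encodings.
# The sample string is an arbitrary illustration, not taken from the thread.
sample = "日本語"  # three Kanji characters


def encoded_sizes(text):
    """Return a dict mapping codec name to the encoded length in bytes."""
    codecs = ("euc_jp", "shift_jis", "utf-8")
    return {codec: len(text.encode(codec)) for codec in codecs}


sizes = encoded_sizes(sample)
for codec, size in sizes.items():
    print(f"{codec}: {size} bytes")
```

For this sample the two Japanese encodings need 2 bytes per character and
UTF-8 needs 3, so keeping the bulk of the data in EUC_JP would save roughly
a third of the text storage for Kanji-heavy columns.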

Regards
MauMau
