Re: Proposal - Support for National Characters functionality

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: arul(at)fast(dot)au(dot)fujitsu(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Proposal - Support for National Characters functionality
Date: 2013-07-05 05:02:13
Message-ID: 20130705.140213.799971806521596931.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

Arul Shaji,

NCHAR support has been on our TODO list for some time, and I would like
to welcome efforts to implement it. However, I have a few
questions:

> This is a proposal to implement functionalities for the handling of
> National Characters.
>
> [Introduction]
>
> The aim of this proposal is to eventually have a way to represent
> 'National Characters' in a uniform way, even in non-UTF8 encoded
> databases. Many of our customers in the Asian region are now, as
> part of their platform modernization, moving away from mainframes
> where they have used National Characters representation in COBOL and
> other databases. Having stronger support for national characters
> representation will also make it easier for these customers to look at
> PostgreSQL more favourably when migrating from other well known RDBMSs,
> all of which have varying degrees of NCHAR/NVARCHAR support.
>
> [Specifications]
>
> Broadly speaking, the national characters implementation ideally will
> include the following:
> - Support for NCHAR/NVARCHAR data types
> - Representing NCHAR and NVARCHAR columns in UTF-8 encoding in non-UTF8
> databases

I think this is not trivial work because we do not have a framework to
allow mixed encodings in a database. I'm interested in how you are
going to solve the problem.
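As a rough illustration of the mixed-encoding problem (plain Python, not PostgreSQL code; EUC_JP is just an assumed example of a non-UTF8 database encoding), the same text has different byte representations under different encodings, and the bytes themselves do not say which encoding they are in:

```python
# Illustration only: the same text stored under two encodings yields
# different bytes, so each column would need to carry its own encoding
# and the server would need to convert at every boundary.
text = "日本語"  # "Japanese", written in Japanese

utf8_bytes = text.encode("utf-8")    # what a UTF-8 NCHAR column would hold
eucjp_bytes = text.encode("euc_jp")  # what an EUC_JP database would hold

print(len(utf8_bytes), utf8_bytes.hex())    # 3 bytes per character
print(len(eucjp_bytes), eucjp_bytes.hex())  # 2 bytes per character

# Interpreting the UTF-8 bytes as EUC_JP does not give the text back.
misread = utf8_bytes.decode("euc_jp", errors="replace")
print(misread == text)  # False
```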

> - Support for UTF16 column encoding and representing NCHAR and NVARCHAR
> columns in UTF16 encoding in all databases.

Why do you need UTF-16 as the database encoding? UTF-8 is already
supported, and any UTF-16 character can be represented in UTF-8 as far
as I know.
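A small sketch of that point (plain Python, not PostgreSQL code; the helper names are hypothetical): UTF-8 and UTF-16 both encode every Unicode code point, so well-formed UTF-16 text converts to UTF-8 and back without loss, including characters outside the BMP that UTF-16 stores as surrogate pairs:

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Convert well-formed UTF-16LE bytes to UTF-8 bytes."""
    return data.decode("utf-16-le").encode("utf-8")


def utf8_to_utf16(data: bytes) -> bytes:
    """Convert UTF-8 bytes to UTF-16LE bytes."""
    return data.decode("utf-8").encode("utf-16-le")


samples = [
    "ASCII",
    "日本語",      # BMP characters: 2 bytes each in UTF-16
    "\U00020B9F",  # CJK Extension B: a surrogate pair in UTF-16
]

for s in samples:
    u16 = s.encode("utf-16-le")
    u8 = utf16_to_utf8(u16)
    # The round trip is lossless in both directions.
    assert u8 == s.encode("utf-8")
    assert utf8_to_utf16(u8) == u16
    print(f"{s!r}: {len(u16)} bytes in UTF-16, {len(u8)} bytes in UTF-8")
```

Only the byte size differs between the two forms, which is a storage trade-off rather than a question of which characters can be represented.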

> - Support for NATIONAL_CHARACTER_SET GUC variable that will determine
> the encoding that will be used in NCHAR/NVARCHAR columns.

You said NCHAR's encoding is UTF-8. Why do you need the GUC if NCHAR's
encoding is fixed to UTF-8?

> The above points are at the moment a 'wishlist' only. Our aim is to
> tackle them one-by-one as we progress. I will send a detailed proposal
> later with more technical details.
>
> The main aim at the moment is to get some feedback on the above to know
> if this feature is something that would benefit PostgreSQL in general,
> and if users maintaining DBs in non-English speaking regions will find
> this beneficial.
>
> Rgds,
> Arul Shaji
>
>
>
> P.S.: It has been quite some time since I sent anything to this
> list. Our mail server adds a standard legal disclaimer to all outgoing
> mails, which I know this list is not a huge fan of. I used to have
> an exemption for the mails I send to this list. If the disclaimer
> appears, apologies in advance. I will rectify that on the next one.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp
