Re: UTF8 national character data type support WIP patch and list of open issues.

From: "MauMau" <maumau307(at)gmail(dot)com>
To: <robertmhaas(at)gmail(dot)com>, "Tatsuo Ishii" <ishii(at)postgresql(dot)org>
Cc: <ishii(at)postgresql(dot)org>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <maksymb(at)fast(dot)au(dot)fujitsu(dot)com>, <hlinnakangas(at)vmware(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date: 2013-09-23 06:53:02
Message-ID: D0A2FE73E8354EDCBEE56EC79268CA4E@maumau
Lists: pgsql-hackers

From: "Tatsuo Ishii" <ishii(at)postgresql(dot)org>
> I don't think the bind placeholder case is a problem. It is processed by
> exec_bind_message() in postgres.c, which has enough information about
> the type of the placeholder, so I think we can easily deal with NCHAR.
> The same can be said of the COPY case.

Yes, I understand that now, and I agree. If we allow NCHAR to use an
encoding different from the database encoding, nchar_in(), for example, can
convert text from the client encoding to the NCHAR encoding. We can retrieve
the NCHAR encoding from pg_database and store it in a global variable at
session start.
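
To make the idea concrete, here is a rough sketch, written in Python rather
than backend C only for brevity. Everything in it is an assumption for
illustration: the "nchar_encoding" field does not exist in pg_database today,
and load_nchar_encoding() and this nchar_in() signature are hypothetical, not
existing PostgreSQL functions.

```python
# Hypothetical sketch only; none of these names are real PostgreSQL APIs.

# Session-wide NCHAR encoding, filled in once at session start.
NCHAR_ENCODING = None


def load_nchar_encoding(pg_database_row):
    """Remember the NCHAR encoding for the rest of the session.

    'nchar_encoding' is an assumed field; pg_database has no such
    column today.
    """
    global NCHAR_ENCODING
    NCHAR_ENCODING = pg_database_row["nchar_encoding"]


def nchar_in(raw, client_encoding):
    """Convert an input value from the client encoding to the NCHAR encoding."""
    text = raw.decode(client_encoding)
    return text.encode(NCHAR_ENCODING)


# Example: the client sends SHIFT-JIS, NCHAR values are kept in EUC-JP.
load_nchar_encoding({"nchar_encoding": "euc_jp"})
stored = nchar_in("日本語".encode("shift_jis"), "shift_jis")
```

This corresponds to the bind and COPY paths, where the input function knows
the type of the value it receives.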

> The problem is an ordinary query (simple protocol "Q" message), as you
> pointed out. Encoding conversion happens at a very early stage (note
> that the fast-path case has the same issue). If a query message contains,
> say, both SHIFT-JIS and EUC-JP, we get into trouble because the
> encoding conversion routine (pg_client_to_server) assumes that the
> message from the client uses only one encoding. However, my question
> is, does that really happen? There isn't any text editor that can
> create mixed SHIFT-JIS and EUC-JP text. So my guess is that when a user
> wants to use NCHAR as SHIFT-JIS text, the rest of the query consists of
> either SHIFT-JIS or plain ASCII. If so, all the user needs to do is set
> the client encoding to SHIFT-JIS, and everything should be fine.
>
> Maumau, is my guess correct?

Yes, I believe you are right. Regardless of whether we support multiple
encodings in one database or not, a single client encoding will be
sufficient for one session. When receiving the "Q" message, the whole SQL
text is converted from the client encoding to the database encoding. This
part needs no modification. During execution of the "Q" message, NCHAR
values are converted from the database encoding to the NCHAR encoding.
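
The two steps can be sketched as follows, again in Python for brevity. The
encodings chosen (SHIFT-JIS client, UTF-8 database, EUC-JP NCHAR) and the
helper names client_to_server() and to_nchar() are assumptions for
illustration; client_to_server() only stands in for what
pg_client_to_server does, and pulling the literal out with split() is a
stand-in for the real parser.

```python
# Hypothetical sketch of the "Q" message path; not PostgreSQL code.

def client_to_server(message, client_encoding, database_encoding):
    """Step 1: convert the whole SQL text, assuming a single client encoding."""
    return message.decode(client_encoding).encode(database_encoding)


def to_nchar(value, database_encoding, nchar_encoding):
    """Step 2: convert one NCHAR value from the database encoding."""
    return value.decode(database_encoding).encode(nchar_encoding)


# The client sends the entire query in SHIFT-JIS.
query = "SELECT N'日本語'".encode("shift_jis")

# On receipt, the whole text is converted to the database encoding.
server_query = client_to_server(query, "shift_jis", "utf-8")

# During execution, only the NCHAR literal is converted again.
literal = server_query.decode("utf-8").split("'")[1].encode("utf-8")
stored = to_nchar(literal, "utf-8", "euc_jp")
```

Step 1 is unchanged from today's behaviour; only step 2 would be new.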

Thank you very much, Tatsuo-san. Everybody, are there any other challenges
we should consider in supporting NCHAR/NVARCHAR as distinct types?

Regards
MauMau
