Re: UTF8 national character data type support WIP patch and list of open issues.

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: tgl(at)sss(dot)pgh(dot)pa(dot)us, maumau307(at)gmail(dot)com, laurenz(dot)albe(at)wien(dot)gv(dot)at, robertmhaas(at)gmail(dot)com, peter_e(at)gmx(dot)net, arul(at)fast(dot)au(dot)fujitsu(dot)com, stark(at)mit(dot)edu, Maksym(dot)Boguk(at)au(dot)fujitsu(dot)com, hlinnakangas(at)vmware(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Date: 2013-11-13 20:19:50
Message-ID: 20131113201950.GA800@svana.org
Lists: pgsql-hackers

On Tue, Nov 12, 2013 at 03:57:52PM +0900, Tatsuo Ishii wrote:
> I have been thinking about this for years, and I think the key idea for
> this is implementing a "universal encoding". The universal encoding
> should have the following characteristics to support N>2 encodings in a
> single database.
>
> 1) no loss of round trip encoding conversion
>
> 2) no mapping table is necessary to convert from/to existing encodings
>
> Once we implement the universal encoding, other problems such as the
> "pg_database with multiple encoding problem" can be solved easily.

Isn't this essentially what the MULE internal encoding is?

> Currently there's no such universal encoding in the universe, so I
> think the only way is to invent it ourselves.

This sounds like a terrible idea. In the future people are only going
to want more advanced text functions, regular expressions, and indexing;
inventing encodings that don't exist anywhere else seems like a way to
make a lot of work for little benefit.

A better idea, it seems to me, is to (if Postgres is configured
properly) embed the non-round-trippable characters in the custom
character part (the Private Use Area) of the Unicode character set. In
other words, adjust the mapping tables on demand and voilà.
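To make the proposal concrete, here is a minimal sketch in Python. It assumes a legacy encoding can be modeled as a dict from integer character codes to Unicode code points; `PuaMappingTable`, `to_text`, and `to_legacy` are hypothetical names for illustration, not PostgreSQL's conversion API. Legacy codes that would not survive a round trip (either unmapped, or sharing a Unicode code point with another legacy code) are assigned code points from the BMP Private Use Area (U+E000..U+F8FF), and the table grows on demand.

```python
from typing import Dict, List

PUA_FIRST = 0xE000  # first code point of the BMP Private Use Area
PUA_LAST = 0xF8FF   # last code point of the BMP Private Use Area


class PuaMappingTable:
    """Hypothetical bidirectional legacy-code <-> Unicode mapping table.

    Legacy codes that cannot round-trip (unmapped, or many-to-one onto the
    same Unicode code point) are parked in the Private Use Area instead.
    """

    def __init__(self, base: Dict[int, int]) -> None:
        self.to_unicode: Dict[int, int] = {}
        self.from_unicode: Dict[int, int] = {}
        # Never hand out a PUA code point the base table already uses.
        self._reserved = set(base.values())
        self._next_pua = PUA_FIRST
        for legacy, cp in sorted(base.items()):
            if cp in self.from_unicode:
                # Many-to-one mapping: converting back would yield a
                # different legacy code, so use a private code point.
                cp = self._allocate_pua()
            self._add(legacy, cp)

    def _allocate_pua(self) -> int:
        while self._next_pua in self.from_unicode or self._next_pua in self._reserved:
            self._next_pua += 1
        if self._next_pua > PUA_LAST:
            raise OverflowError("BMP Private Use Area exhausted")
        cp = self._next_pua
        self._next_pua += 1
        return cp

    def _add(self, legacy: int, cp: int) -> None:
        self.to_unicode[legacy] = cp
        self.from_unicode[cp] = legacy

    def to_text(self, codes: List[int]) -> str:
        """Convert legacy codes to str, extending the table on demand."""
        out = []
        for legacy in codes:
            if legacy not in self.to_unicode:
                self._add(legacy, self._allocate_pua())
            out.append(chr(self.to_unicode[legacy]))
        return "".join(out)

    def to_legacy(self, text: str) -> List[int]:
        """Convert str back to legacy codes; fails only for characters
        that never came from this legacy encoding."""
        try:
            return [self.from_unicode[ord(ch)] for ch in text]
        except KeyError as exc:
            raise ValueError(f"U+{exc.args[0]:04X} has no legacy mapping") from None


# Toy table: legacy 0x5C and 0x80 both map to U+005C; 0xFE is unmapped.
table = PuaMappingTable({0x41: 0x41, 0x5C: 0x5C, 0x80: 0x5C})
codes = [0x41, 0x5C, 0x80, 0xFE]
text = table.to_text(codes)
print([hex(ord(c)) for c in text])  # ['0x41', '0x5c', '0xe000', '0xe001']
assert table.to_legacy(text) == codes  # lossless round trip
```

The trade-off is that PUA code points carry no meaning outside this installation: the assignments must be stored and reused consistently, and other software will not know what those characters are. In exchange, all the existing Unicode machinery (regular expressions, collation, indexing) keeps working without a new encoding.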

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
-- Arthur Schopenhauer
