From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
---|---|
To: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
Cc: | "Boguk, Maksym" <maksymb(at)fast(dot)au(dot)fujitsu(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: UTF8 national character data type support WIP patch and list of open issues. |
Date: | 2013-09-03 13:58:19 |
Message-ID: | 15548.1378216699@sss.pgh.pa.us |
Lists: | pgsql-hackers |
Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> writes:
> On 03.09.2013 05:28, Boguk, Maksym wrote:
>> Target usage: ability to store UTF8 national characters in some
>> selected fields inside a single-byte encoded database.
> I think we should take a completely different approach to this. Two
> alternatives spring to mind:
> 1. Implement a new encoding. The new encoding would be some variant of
> UTF-8 that encodes languages like Russian more efficiently.
+1. I'm not sure that SCSU satisfies the requirement (which I read as
that Russian text should be pretty much 1 byte/character). But surely
we could devise a variant that does. For instance, it could look like
koi8r (or any other single-byte encoding of your choice) with one byte
value, say 255, reserved as a prefix. 255 means that a UTF8 character
follows. The main complication here is that you don't want to allow more
than one way to represent a character --- else you break text hashing,
for instance. So you'd have to take care that you never emit the 255+UTF8
representation for a character that can be represented in the single-byte
encoding. In particular, you'd never encode ASCII that way, and thus this
would satisfy the all-multibyte-chars-must-have-all-high-bits-set rule.
Ideally we could make a variant like this for each supported single-byte
encoding, and thus you could optimize a database for "mostly but not
entirely LATIN1 text", etc.
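
To make the scheme above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not code from this thread or from PostgreSQL: it assumes KOI8-R as the base encoding and 0xFF as the reserved prefix, and the names `BASE`, `ESCAPE`, `encode`, and `decode` are hypothetical. One consequence of reserving 0xFF is that the character KOI8-R normally assigns to that byte (Ъ) has to use the escaped form.

```python
BASE = "koi8_r"   # assumed base single-byte encoding
ESCAPE = 0xFF     # assumed reserved prefix byte ("say 255")


def encode(text: str) -> bytes:
    """Encode text, using the single-byte form whenever one exists."""
    out = bytearray()
    for ch in text:
        try:
            single = ch.encode(BASE)
        except UnicodeEncodeError:
            single = None
        if single is not None and len(single) == 1 and single[0] != ESCAPE:
            # Canonical form: the base encoding's own byte.
            out += single
        else:
            # No usable single byte: emit prefix + UTF-8 sequence.
            out.append(ESCAPE)
            out += ch.encode("utf-8")
    return bytes(out)


def _utf8_length(lead: int) -> int:
    """Length of a multibyte UTF-8 sequence, judged from its lead byte."""
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError("escape prefix not followed by a multibyte UTF-8 lead byte")


def decode(data: bytes) -> str:
    """Decode, rejecting any escaped form that has a single-byte equivalent."""
    chars = []
    i = 0
    while i < len(data):
        if data[i] != ESCAPE:
            chars.append(data[i:i + 1].decode(BASE))
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated escape sequence")
        n = _utf8_length(data[i + 1])
        raw = data[i:i + 1 + n]
        ch = raw[1:].decode("utf-8")   # raises on malformed or short input
        if encode(ch) != raw:
            # Only one representation per character is allowed,
            # otherwise byte-wise hashing and comparison would break.
            raise ValueError("non-canonical escaped character")
        chars.append(ch)
        i += 1 + n
    return "".join(chars)
```

For example, `encode("Привет")` yields the same six bytes as plain KOI8-R, while a character outside KOI8-R such as "€" costs one prefix byte plus its three UTF-8 bytes. Because ASCII always has a single-byte form, every byte of an escaped sequence has its high bit set.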
regards, tom lane