Re: Fixed length data types issue

From: mark(at)mark(dot)mielke(dot)cc
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Martijn van Oosterhout <kleptog(at)svana(dot)org>, Gregory Stark <gsstark(at)mit(dot)edu>, andrew(at)supernews(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixed length data types issue
Date: 2006-09-08 22:07:30
Message-ID: 20060908220730.GA19800@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 08, 2006 at 04:49:22PM -0400, Andrew Dunstan wrote:
> mark(at)mark(dot)mielke(dot)cc wrote:
> >Only ASCII values store more space efficiently in UTF-8. All values
> >over 127 store more space efficiently using UTF-16.
> This second statement is demonstrably not true. Only values above 0x07ff
> require more than 2 bytes in UTF-8. All chars up to that point are
> stored in UTF-8 with greater or equal efficiency than that of UTF-16.
> See http://www.zvon.org/tmRFC/RFC2279/Output/chapter2.html

You are correct - I should have said "All values over 127 store
at least as space efficiently using UTF-16 as UTF-8."

From the ICU page: "Most of the time, the memory throughput of the
hard drive and RAM is the main performance constraint. UTF-8 is 50%
smaller than UTF-16 or US-ASCII, but UTF-8 is 50% larger than UTF-16
or East and South Asian scripts. There is no memory difference for
Latin extensions, Greek, Cyrillic, Hebrew, and Arabic.

For processing Unicode data, UTF-16 is much easier to handle. You get
a choice between either one or two units per character, not a choice
among four lengths. UTF-16 also does not have illegal 16-bit unit
values, while you might want to check or illegal bytes in UTF-8.
Incomplete character sequences in UTF-16 are less important and more
benign. If you want to quickly convert small strings between the
different UTF encodings or get a UChar32 value, you can use the macros
provided in utf.h and ..."

I didn't think of the iterators for simple uses.

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message elein 2006-09-08 23:44:49 Re: Domains and subtypes, a brief proposal
Previous Message Mark Woodward 2006-09-08 21:26:55 Mapping arbitriary and heirachical XML to tuple