Re: using Core Foundation locale functions

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: using Core Foundation locale functions
Date: 2014-12-02 08:17:58
Message-ID: CAM3SWZS4OrMdjbgsH97DrJ0hLmCE5=wvgh+DHT-mmOyq1h69Ag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Nov 28, 2014 at 8:43 AM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
> At the moment, this is probably just an experiment that shows where
> refactoring and better abstractions might be suitable if we want to
> support multiple locale libraries. If we want to pursue ICU, I think
> this could be a useful third option.

FWIW, I think that the richer API that ICU provides for string
transformations could be handy in optimizing sorting using abbreviated
keys. For example, ICU will happily only produce parts of sort keys
(the equivalent of strxfrm() blobs) if that is all that is required
[1].

I think that ICU also allows clients to parse individual primary
weights in a principled way (primary weights tend to be isomorphic to
the Unicode code points in the original string). I think that this
will enable order-preserving compression of the type anticipated by
the Unicode collation algorithm [2]. That could be useful for certain
languages, like Russian, where the primary weight level usually
contains multi-byte code points with glibc's strxfrm() (this is
generally not true of languages that use the Latin alphabet, or of
East Asian languages).

Note that there is already naturally a form of what you might call
compression with strxfrm() [3]. This is very useful for abbreviated
keys.

[1] http://userguide.icu-project.org/collation/architecture
[2] http://www.unicode.org/reports/tr10/#Run-length_Compression
[3] http://www.postgresql.org/message-id/CAM3SWZTyWe5J69TaPvZf2CM7mhSKKE3UhHnK9gLuQckkWqoL5w@mail.gmail.com
--
Peter Geoghegan

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2014-12-02 09:59:14 Re: excessive amounts of consumed memory (RSS), triggering OOM killer
Previous Message Xiaoyulei 2014-12-02 08:07:55 Many processes blocked at ProcArrayLock