Re: Locales and Encodings

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Peter Eisentraut" <peter_e(at)gmx(dot)net>
Cc: <pgsql-hackers(at)postgresql(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Locales and Encodings
Date: 2007-10-12 13:03:47
Message-ID: 87ve9cfpsc.fsf@oxford.xeocode.com
Lists: pgsql-hackers

"Peter Eisentraut" <peter_e(at)gmx(dot)net> writes:

> On Friday, 12 October 2007, Gregory Stark wrote:
>> . when creating a new database from a template the new locale and encoding
>>   must be identical to the template database's encoding and locale. Unless
>> the template is template0 in which case we rebuild all indexes after
>> copying.
>
> Why would you restrict the index rebuilding only to this particular case? It
> could be done for any database.

Well, there's no guarantee that other databases don't contain 8-bit data which
would be invalid in the new encoding. I think it's reasonable to assume
there's only 7-bit ASCII in template0, however.
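
As a rough illustration of why 7-bit data is safe to copy into a database of
any encoding, here is a small Python sketch. The helper names and the list of
encodings are mine, chosen only for illustration; they are not anything from
the PostgreSQL sources.

```python
# Hypothetical helpers: show that pure 7-bit ASCII bytes decode to the same
# text under several ASCII-compatible encodings, while 8-bit data does not
# carry that guarantee.

# A few ASCII-compatible encodings, picked arbitrarily for the demonstration.
ASCII_COMPATIBLE = ["utf-8", "latin-1", "iso8859_15", "koi8_r"]


def is_7bit_ascii(data: bytes) -> bool:
    """Return True if every byte has its high bit clear."""
    return all(b < 0x80 for b in data)


def decodes_identically(data: bytes) -> bool:
    """Return True if data decodes to the same string in every listed encoding."""
    return len({data.decode(enc) for enc in ASCII_COMPATIBLE}) == 1


catalog_name = b"pg_catalog"              # the kind of text expected in template0
eight_bit = "é".encode("latin-1")         # a single byte with the high bit set
```

Anything that passes is_7bit_ascii() needs no recoding at all, which is the
property the template0 assumption relies on.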

An alternative would be introducing an ASCII7 encoding which template0 would
use; any other database in that encoding could then be used as a template for
any encoding. However, that would still require index rebuilds, which could
potentially take a long time. Another alternative would be recoding all the
data from the template database's encoding to the new encoding and throwing an
error if a non-encodable character is found.
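
The recoding alternative could be sketched like this in Python. The recode()
function is a hypothetical stand-in for whatever the server would actually do
while copying; it only shows the "convert or fail" behaviour.

```python
# Hypothetical sketch of "recode everything, error out on a character the
# target encoding cannot represent". Not PostgreSQL code.

def recode(data: bytes, src: str, dst: str) -> bytes:
    """Convert data from encoding src to encoding dst.

    Raises UnicodeDecodeError if data is not valid in src, and
    UnicodeEncodeError if some character has no representation in dst.
    """
    text = data.decode(src)
    return text.encode(dst)


# latin1 -> UTF8 always works for valid latin1 input: é becomes two bytes.
converted = recode("é".encode("latin-1"), "latin-1", "utf-8")

# UTF8 -> latin1 fails for characters outside latin1, such as the euro sign.
try:
    recode("€".encode("utf-8"), "utf-8", "latin-1")
    euro_failed = False
except UnicodeEncodeError:
    euro_failed = True
```

The failure case is what makes this more work than the template0 assumption:
the copy has to inspect every string and may abort partway through.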

I think it's a lot simpler to just declare it a non-problem by saying there
won't be any non-ascii text in template0.

> The other issue is shared catalogs.

This approach doesn't address that, but I don't think it makes the problems
there any worse either. That is, I think we already have these problems around
shared tables.

. If you have two databases with locales that don't agree then the indexes on
those tables won't function properly.

. What happens if you create a user with an é in his username while connected
to a latin1 database and then connect to a UTF8 database? That username is now
an invalidly encoded UTF8 string.
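
The username case in the second point is easy to reproduce outside the
database. A minimal Python sketch, where the name "rené" and the valid_in()
helper are made up for the example:

```python
# Hypothetical example: bytes stored for a username by a latin1 session,
# later interpreted by a UTF8 session.

def valid_in(data: bytes, encoding: str) -> bool:
    """Return True if data is a well-formed string in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# What a latin1 client would store in the shared catalog: é is the byte 0xE9.
stored = "rené".encode("latin-1")

ok_in_latin1 = valid_in(stored, "latin-1")   # the writer's view
ok_in_utf8 = valid_in(stored, "utf-8")       # the view from a UTF8 database
```

In UTF8 the byte 0xE9 announces a three-byte sequence, and here nothing
follows it, so the stored name is not a valid string for the second session.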

Perhaps we should be using pattern_ops for the indexes on the shared tables?
Or using bytea with UTF8 encoded strings instead of name and text? That
actually sounds reasonable now that we have convert() functions which take and
generate bytea, at least for the text fields like in pltemplate -- less so for
the name columns.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
