Re: Fixed length data types issue

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: andrew(at)supernews(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Fixed length data types issue
Date: 2006-09-07 12:27:01
Message-ID: 873bb3dfqi.fsf@enterprisedb.com
Lists: pgsql-hackers


Andrew - Supernews <andrew+nonews(at)supernews(dot)com> writes:

> Are you sure? Perhaps you are assuming that a char(1) field can be made
> to be fixed-length; this is not the case (consider utf-8 for example).

Well, that could still be fixed length; it would just be a longer fixed length.
(In theory it would have to be 6 bytes long which I suppose would open up the
argument that if you're usually storing 7-bit ascii then a varlena would
usually be shorter.)
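
To put rough numbers on that trade-off, here is a minimal Python sketch. It assumes the 6-byte worst case mentioned above (RFC 3629 later capped UTF-8 at 4 bytes per character) and a 4-byte varlena length header, and it ignores alignment padding. The function names are made up for illustration.

```python
# Worst-case bytes per character. 6 is the figure used in the message
# (original UTF-8 definition); this is an assumption of the sketch.
MAX_BYTES_PER_CHAR = 6


def fixed_width_size(n_chars: int) -> int:
    """Bytes reserved for a hypothetical fixed-length char(n) slot."""
    return n_chars * MAX_BYTES_PER_CHAR


def varlena_size(value: str, header_bytes: int = 4) -> int:
    """Bytes for a length-prefixed value: header plus the UTF-8 payload."""
    return header_bytes + len(value.encode("utf-8"))


# Compare one-character values of increasing encoded width.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    encoded = len(ch.encode("utf-8"))
    print(repr(ch), encoded, fixed_width_size(1), varlena_size(ch))
```

Under these assumptions a 7-bit ASCII character costs 5 bytes as a varlena against 6 bytes in the fixed slot, and the fixed slot only wins once the character needs 3 or more bytes.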

In any case I think the intersection of columns for which you care about i18n
and columns that you're storing according to an old-fashioned fixed column
layout is pretty much nil. And not just because it hasn't been updated to
modern standards either. If you look again at the columns in my example you'll
see none of them are appropriate targets for i18n anyways. They're all codes
and even numbers.

In other words if you're actually storing localized text then you almost
certainly will be using a text or varchar and probably won't even have a
maximum size. The use case for CHAR(n) is when you have fixed-length,
statically defined strings that are always the same length. It doesn't make
sense to store these in UTF8.

Currently Postgres has a limitation that you can only have one encoding per
database and one locale per cluster. Personally I'm of the opinion that the
only correct choice for that is "C" and all localization should be handled in
the client and with pg_strxfrm. Putting the whole database into non-C locales
guarantees that the columns that should not be localized will have broken
semantics and there's no way to work around things in the other direction.
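
As a rough illustration of that split, the Python sketch below sorts a list of codes by raw bytes, which is what the "C" locale amounts to, and separately applies a locale-aware sort on the client side. Python's locale.strxfrm stands in for the client-side step here; it is not pg_strxfrm, and the helper name client_sort and the sample codes are assumptions of the example.

```python
import locale

codes = ["b2", "A10", "a1", "B20"]

# "C" locale semantics: plain byte-wise comparison, so uppercase ASCII
# sorts before lowercase regardless of any language rules.
c_order = sorted(codes, key=lambda s: s.encode("ascii"))


def client_sort(values, locale_name):
    """Sort on the client using the collation rules of locale_name."""
    previous = locale.setlocale(locale.LC_COLLATE)  # remember current setting
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        return sorted(values, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)  # always restore


print(c_order)
print(client_sort(codes, "C"))
```

Passing another locale name (for instance one installed on the client machine) changes only the second result; the byte-wise order of the codes stays the same. Which locale names are available depends on the system.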

Perhaps, given the current situation, what we should have are cvarchar and
cchar data types that are like varchar and char but guaranteed to always be
interpreted in the C locale with ASCII encoding.
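
To make the idea concrete, here is a small Python model of what such a type might guarantee: ASCII-only contents, a fixed byte length, and byte-wise comparison. This is only a sketch of the semantics, not a proposed implementation; the class name CChar and its behaviour on overlong input are assumptions.

```python
from functools import total_ordering


@total_ordering
class CChar:
    """Hypothetical fixed-length, ASCII-only string compared byte by byte."""

    def __init__(self, value: str, length: int):
        data = value.encode("ascii")  # raises UnicodeEncodeError on non-ASCII
        if len(data) > length:
            raise ValueError(f"value longer than {length} bytes")
        # Blank-pad to the declared length, as char(n) does.
        self.data = data.ljust(length, b" ")

    def __eq__(self, other):
        if not isinstance(other, CChar):
            return NotImplemented
        return self.data == other.data

    def __lt__(self, other):
        if not isinstance(other, CChar):
            return NotImplemented
        return self.data < other.data  # byte order, independent of locale

    def __hash__(self):
        return hash(self.data)


print(CChar("US", 3).data)
print(CChar("A1", 2) < CChar("a1", 2))
```

Because every character is one byte, the stored size equals the declared length, and the ordering never depends on the database locale.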

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
