Re: Converting MySQL tinyint to PostgreSQL

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Dawid Kuroczko <qnex42(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Joe <svn(at)freedomcircle(dot)net>, pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Converting MySQL tinyint to PostgreSQL
Date: 2005-07-13 10:46:20
Message-ID: 20050713104616.GA19317@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Jul 13, 2005 at 10:48:56AM +0200, Dawid Kuroczko wrote:
> As for the char/varchar type -- I was wondering. Worst case
> scenario for UTF-8 (correct me on this) is when 1 character
> takes 4 bytes. And biggest problem with char/varchar is that
> length indicator takes 4 bytes... How much overhead would
> it be to make a length variable, for example:
>
> (var)char(1)-char(63) -- 1 byte length + string
> char(64)-char(16383) -- 2 byte length + string
> char(16384)-text -- 4 byte length + string, like now

Well, you get another issue, alignment. If you squeeze your string
down, the next field, if it is an int or string, will get padded to a
multiple of 4 negating most of the gains. Like in C structures, there
is padding to optimise access.

> This would reduce length of char(5) string from 9 bytes to
> 6 bytes, char(2) from 6 bytes to 3 bytes (for multibyte chars
> it would be a win also).

The only types that won't require padding if they are next field are
bool, "char" and cstring. So char(1-4) will actually go from 8 to 4
bytes in most cases. char(5-8) will go from 12 to 8 bytes.

> I don't know the internals too well (read: at all), but I guess there
> would be a problem of choosing which length of length to use --
> would it be possible to make some sort of on-the-fly mapping
> when creating tables -- varchar(224) is text_2bytelength,
> text is text_4bytelength, char(1) is text_1bytelength...

At the moment there are two basic types: variable length and fixed
length as defined by the type id. Fixed length are stored as is.
Variable length is a 4 byte length plus the string. The two highest
bits are flags. AFAIK it's the typlen value that decides the decoding,
the type itself is irrelevent.

I suppose somebody could create a new encoding but I don't know how
hard that would be...

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2005-07-13 10:53:41 Re: NTFS partition autodetection during instalation
Previous Message Evandro's mailing lists (Please, don't send personal messages to this address) 2005-07-13 09:12:43 NTFS partition autodetection during instalation