Re: Fixed length data types issue

From: mark(at)mark(dot)mielke(dot)cc
To: Peter Eisentraut <peter_e(at)gmx(dot)net>
Cc: Gregory Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, pgsql-hackers(at)postgresql(dot)org, Martijn van Oosterhout <kleptog(at)svana(dot)org>
Subject: Re: Fixed length data types issue
Date: 2006-09-08 13:28:21
Message-ID: 20060908132821.GA24823@mark.mielke.cc
Lists: pgsql-hackers

On Fri, Sep 08, 2006 at 08:57:12AM +0200, Peter Eisentraut wrote:
> Gregory Stark wrote:
> > I think we have to find a way to remove the varlena length header
> > entirely for fixed length data types since it's going to be the same
> > for every single record in the table.
> But that won't help in the example you posted upthread, because char(N)
> is not fixed-length.

It can be fixed-length, or at least have an upper bound. If the column
is marked as containing only ASCII characters, its length doesn't vary,
at least in theory, and even if it is Unicode, no character is going to
need more than 4 bytes. char(2) through char(16) need only 4 bits to
store the length header, leaving 4 bits for encoding information.
bytea(2) through bytea(16) should, at least in theory, need none.
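
To make the 4-bit idea concrete, here is a minimal sketch in C. The
nibble layout (low nibble holds length minus one, high nibble holds an
encoding tag) and the names hdr_pack, hdr_len and hdr_encoding are
assumptions for illustration only; nothing like this exists in the
PostgreSQL source. The length is counted in characters, so a multibyte
encoding would still need its byte length derived separately.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical one-byte header for char(1)..char(16):
 *   low nibble  = character count minus one (0..15 represents 1..16)
 *   high nibble = encoding tag (0..15), meaning left to the caller */
enum { HDR_MAX_LEN = 16 };

static inline uint8_t hdr_pack(unsigned len, unsigned encoding)
{
    assert(len >= 1 && len <= HDR_MAX_LEN);
    assert(encoding <= 0x0F);
    return (uint8_t)((encoding << 4) | (len - 1));
}

/* Recover the character count (1..16) from the header byte. */
static inline unsigned hdr_len(uint8_t hdr)
{
    return (hdr & 0x0Fu) + 1u;
}

/* Recover the encoding tag (0..15) from the header byte. */
static inline unsigned hdr_encoding(uint8_t hdr)
{
    return (unsigned)(hdr >> 4);
}
```

Storing the count minus one is what lets sixteen distinct lengths fit
in four bits; a zero-length value would need some other representation.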

For my own uses, I would like for bytea(16) to have no length header.
The length is constant, as with a UUID or an MD5 sum. Store the length
at the head of the table, or look up the information from the schema.
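
A minimal sketch of what "look up the length from the schema" could
mean, in C. ColumnDesc, stored_size and store_value are hypothetical
names, not PostgreSQL structures, and the sketch assumes a 4-byte
length word on variable-length values and ignores alignment padding
and tuple headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical column descriptor kept with the schema, not with each value. */
typedef struct {
    size_t fixed_len;   /* > 0: every value is exactly this wide; 0: variable */
} ColumnDesc;

enum { VAR_HDR_SIZE = 4 };  /* assumed per-value length word for variable data */

/* Bytes one value occupies when stored, ignoring alignment padding. */
static size_t stored_size(const ColumnDesc *col, size_t data_len)
{
    if (col->fixed_len > 0) {
        assert(data_len == col->fixed_len);
        return col->fixed_len;          /* no per-value header at all */
    }
    return VAR_HDR_SIZE + data_len;     /* header + payload */
}

/* Write one value into buf and return the number of bytes written. */
static size_t store_value(const ColumnDesc *col, const void *data,
                          size_t data_len, uint8_t *buf)
{
    size_t off = 0;

    if (col->fixed_len == 0) {
        /* Variable length: prefix the total size, header included. */
        uint32_t total = (uint32_t)(VAR_HDR_SIZE + data_len);
        memcpy(buf, &total, sizeof total);
        off = VAR_HDR_SIZE;
    } else {
        assert(data_len == col->fixed_len);
    }
    memcpy(buf + off, data, data_len);
    return off + data_len;
}
```

Under these assumptions a 16-byte value takes 20 bytes with the header
and 16 without, which is a 20% reduction for that column.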

I see the complexity argument. Existing code is too heavy to change
completely. People are talking about compromises such as allowing the
on-disk layout to differ from the in-memory layout. I wonder
whether the change could be small enough not to significantly
increase CPU usage, while still having a significant effect. I find
myself doubting the CPU-bound numbers. If even 20% of the data is
saved, this means 20% more RAM for caching, 20% fewer pages touched
when scanning, and 20% less RAM read. When people say CPU-bound, are we
sure they do not mean bound by RAM speed? How do they tell the
difference between the two? RAM lookups count as CPU time on most
performance counters I've ever used. RAM is also slower than the
CPU, which leaves room for calculations between accesses, assuming
the loop allows prefetching to be possible and accurate.
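
The pages-touched part of that argument can be checked with a little
arithmetic. The sketch below is in C; pages_needed is a hypothetical
helper, and it deliberately ignores page headers, tuple headers and
alignment, all of which would dilute the saving in a real table.

```c
#include <assert.h>
#include <stddef.h>

/* Pages needed to hold nrows rows of row_bytes each, assuming rows
 * never span a page boundary and pages carry no other overhead. */
static size_t pages_needed(size_t nrows, size_t row_bytes, size_t page_bytes)
{
    assert(row_bytes > 0 && row_bytes <= page_bytes);
    size_t rows_per_page = page_bytes / row_bytes;

    /* Round up so a partially filled last page is still counted. */
    return (nrows + rows_per_page - 1) / rows_per_page;
}
```

With 8192-byte pages and one million rows, pages_needed gives 2445
pages at 20 bytes per row and 1954 pages at 16 bytes per row, so a
scan touches roughly 20% fewer pages under these simplified
assumptions.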

Cheers,
mark

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/
