From: | Simon Riggs <simon(at)2ndquadrant(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Gregory Stark <gsstark(at)mit(dot)edu>, Bruce Momjian <bruce(at)momjian(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org, Martijn van Oosterhout <kleptog(at)svana(dot)org> |
Subject: | Re: Fixed length data types issue |
Date: | 2006-09-11 16:28:23 |
Message-ID: | 1157992103.2692.392.camel@holly |
Lists: | pgsql-hackers |
On Sun, 2006-09-10 at 21:16 -0400, Tom Lane wrote:
> After further thought I have an alternate proposal
(snip)
> * If high order bit of datum's first byte is 0, then it's an
> uncompressed datum in what's essentially the same as our current
> in-memory format except that the 4-byte length word must be big-endian
> (to ensure that the leading bit can be kept zero). In particular this
> format will be aligned on 4- or 8-byte boundary as called for by the
> datatype definition.
>
> * If high order bit of first byte is 1, then it's some compressed
> variant. I'd propose divvying up the code space like this:
>
> * 0xxxxxxx uncompressed 4-byte length word as stated above
> * 10xxxxxx 1-byte length word, up to 62 bytes of data
> * 110xxxxx 2-byte length word, uncompressed inline data
> * 1110xxxx 2-byte length word, compressed inline data
> * 1111xxxx 1-byte length word, out-of-line TOAST pointer
>
> This limits us to 8K uncompressed or 4K compressed inline data without
> toasting, which is slightly annoying but probably still an insignificant
> limitation. It also means more distinct cases for the heap_deform_tuple
> inner loop to think about, which might be a problem.
>
> Since the compressed forms would not be aligned to any boundary,
> there's an important special case here: how can heap_deform_tuple tell
> whether the next field is compressed or not? The answer is that we'll
> have to require pad bytes between fields to be zero. (They already are
> zeroed by heap_form_tuple, but now it'd be a requirement.) So the
> algorithm for decoding a non-null field is:
>
> * if looking at a byte with high bit 0, then we are either
> on the start of an uncompressed field, or on a pad byte before
> such a field. Advance to the declared alignment boundary for
> the datatype, read a 4-byte length word, and proceed.
>
> * if looking at a byte with high bit 1, then we are at the
> start of a compressed field (which will never have any preceding
> pad bytes). Decode length as per rules above.
>
> The good thing about this approach is that it requires zero changes to
> fundamental system structure. The pack/unpack rules in heap_form_tuple
> and heap_deform_tuple change a bit, and the mechanics of
> PG_DETOAST_DATUM change, but a Datum is still just a pointer and you
> can always tell what you've got by examining the pointed-to data.
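To make the quoted header layout concrete, here is a minimal C sketch of how a heap_deform_tuple-style loop might classify the first byte of a field and locate the field under the rules above. It is hypothetical: `HeaderKind`, `classify_header`, and `next_field` are illustrative names, not PostgreSQL code. It also rests on assumptions the proposal does not state outright. Each length word is assumed to count its own header bytes, which is inferred from "up to 62 bytes of data" (6 bits give at most 63, minus the 1 header byte). The low bits of the 2-byte forms are assumed to be stored big-endian, and the declared alignment is assumed to be a power of two. The TOAST-pointer form is recognized but not decoded, because the excerpt does not spell out its length encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names mirroring the quoted bit patterns; not PostgreSQL's
 * actual varlena macros. */
typedef enum {
    HDR_4B_UNCOMPRESSED,  /* 0xxxxxxx: aligned, big-endian 4-byte length */
    HDR_1B_SHORT,         /* 10xxxxxx: 1-byte length, up to 62 data bytes */
    HDR_2B_UNCOMPRESSED,  /* 110xxxxx: 2-byte length, uncompressed inline */
    HDR_2B_COMPRESSED,    /* 1110xxxx: 2-byte length, compressed inline */
    HDR_1B_TOAST_POINTER  /* 1111xxxx: out-of-line TOAST pointer */
} HeaderKind;

static HeaderKind classify_header(uint8_t b)
{
    if ((b & 0x80) == 0x00) return HDR_4B_UNCOMPRESSED;
    if ((b & 0xC0) == 0x80) return HDR_1B_SHORT;
    if ((b & 0xE0) == 0xC0) return HDR_2B_UNCOMPRESSED;
    if ((b & 0xF0) == 0xE0) return HDR_2B_COMPRESSED;
    return HDR_1B_TOAST_POINTER;
}

/* Locate the non-null field whose first byte (or a zero pad byte before it)
 * is at data[off]. On success, store the field's start offset in *start and
 * its total length (assumed to include the header) in *total_len, and return
 * true. Returns false for a TOAST pointer, which this sketch does not decode. */
static bool next_field(const uint8_t *data, size_t off, size_t align,
                       size_t *start, uint32_t *total_len)
{
    const uint8_t *p = data + off;

    switch (classify_header(p[0])) {
    case HDR_4B_UNCOMPRESSED:
        /* Either a required-zero pad byte or the first byte of an aligned
         * 4-byte header: advance to the declared alignment boundary. */
        assert(align != 0 && (align & (align - 1)) == 0);
        off = (off + align - 1) & ~(align - 1);
        p = data + off;
        *total_len = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8) | (uint32_t)p[3];
        break;
    case HDR_1B_SHORT:
        *total_len = p[0] & 0x3F;                            /* 6 bits: <= 63 */
        break;
    case HDR_2B_UNCOMPRESSED:
        *total_len = ((uint32_t)(p[0] & 0x1F) << 8) | p[1];  /* 13 bits: <= 8191 */
        break;
    case HDR_2B_COMPRESSED:
        *total_len = ((uint32_t)(p[0] & 0x0F) << 8) | p[1];  /* 12 bits: <= 4095 */
        break;
    case HDR_1B_TOAST_POINTER:
        return false;
    }
    *start = off;
    return true;
}
```

Walking a tuple then amounts to calling `next_field`, adding `*total_len` to `*start`, and repeating. The zero-pad requirement is what makes the high-bit-0 branch safe: a pad byte is 0x00, so it can never be mistaken for one of the short headers, which all have the high bit set.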
Seems like a great approach to this pain point.
More fun than lots of new datatypes also.
Is this an 8.2 thing? If not, is Numeric508 applied?
--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com