Saving space for common kinds of numeric values

From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Saving space for common kinds of numeric values
Date: 2007-02-22 20:31:50
Message-ID: 878xeqkk7d.fsf@stark.xeocode.com
Lists: pgsql-hackers


The numeric data type's minimum data size is 8 bytes, and it only gets that
small for "0". Storing even "1" requires 10 bytes. That seems pretty
abysmal.

It occurs to me that we could assign special-case meanings for any datum
smaller than 8 bytes. In just 2 or 4 bytes (including the 1-byte varlena
header) I think we could encode many common small values.

This would be a pretty straightforward change to numeric.c because all the
hard work there is done using an internal representation that never reaches
disk. As a result only set_var_from_num and make_result and a handful of
simple functions that work directly on the packed Numeric representation would
need to be adjusted at all.

I'm thinking of the following two cases:

1 byte (plus 1 byte header): integer between -128..127 with dscale and weight
implicitly defined to display as an integer. I.e., weight is always the number
of digits in the value and dscale is always 0. That doesn't restrict the
values Numeric supports, only the values it can use this representation for, so
if you store "-1.0" it would be stored normally (or using the following option).
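
As a rough sketch of the 1+1 byte case (the function names and the
unscaled-integer-plus-dscale interface are assumptions made for illustration,
not anything in numeric.c), the eligibility check and packing could look like
this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: try to pack a value into the proposed 1-byte payload.
 * "unscaled" is the value's digits as an integer and "dscale" its display
 * scale, so "-1.0" arrives as unscaled = -10, dscale = 1.
 */
bool
short_int_encode(int64_t unscaled, int dscale, int8_t *payload)
{
    assert(payload != NULL);

    if (dscale != 0)
        return false;           /* e.g. "-1.0" must use another format */
    if (unscaled < INT8_MIN || unscaled > INT8_MAX)
        return false;           /* outside -128..127 */

    *payload = (int8_t) unscaled;
    return true;
}

/* Hypothetical inverse: dscale is implicitly 0, so only the value returns. */
int64_t
short_int_decode(int8_t payload)
{
    return payload;
}
```

A caller would fall back to the existing representation whenever
short_int_encode returns false.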

3 bytes (plus 1 byte header): 1 byte to store weight and dscale (one nibble
each) and 2 bytes to store the value. This would let us handle the extremely
common case of currency quantities which have 2 decimal places. It could store
-327.68 .. 327.67 in four bytes including the varlena header.
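
A minimal sketch of that 3-byte payload follows. The byte order, the choice of
which nibble holds weight, and the function names are all assumptions made for
illustration; only the "one nibble each plus a 2-byte value" split comes from
the description above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout: byte 0 = weight (high nibble) | dscale (low nibble),
 * bytes 1-2 = signed 16-bit unscaled value, most significant byte first.
 */
void
pack3(uint8_t out[3], unsigned weight, unsigned dscale, int16_t value)
{
    uint16_t    u = (uint16_t) value;   /* two's complement bit pattern */

    assert(weight <= 15 && dscale <= 15);
    out[0] = (uint8_t) ((weight << 4) | dscale);
    out[1] = (uint8_t) (u >> 8);
    out[2] = (uint8_t) (u & 0xFF);
}

void
unpack3(const uint8_t in[3], unsigned *weight, unsigned *dscale,
        int16_t *value)
{
    uint16_t    u = (uint16_t) ((in[1] << 8) | in[2]);

    *weight = in[0] >> 4;
    *dscale = in[0] & 0x0F;
    /* Portable sign conversion of the 16-bit pattern. */
    *value = (u >= 0x8000) ? (int16_t) ((int32_t) u - 0x10000) : (int16_t) u;
}
```

With dscale = 2 the 16-bit range -32768 .. 32767 reads as -327.68 .. 327.67,
which is where the currency range quoted above comes from.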

Alternatively we could do away with weight entirely for the 3 byte
representation as with the 1 byte representation. That would let us store
-10485.76 .. 10485.75 in 21 bits and use the remaining 3 bits to store a
dscale of up to 8. I actually favour this option.
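
To make the bit arithmetic concrete, here is a sketch of the weight-less
variant. The bit positions and function names are assumptions made for
illustration. It also stores dscale directly in 3 bits, which gives 0..7;
reaching a dscale of 8 would need an offset or some other convention that this
sketch does not attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define V21_MIN (-(INT32_C(1) << 20))       /* -1048576 */
#define V21_MAX ((INT32_C(1) << 20) - 1)    /*  1048575 */

/* Assumed layout of the 24 bits: top 3 = dscale, low 21 = signed value. */
bool
pack21(uint8_t out[3], int32_t unscaled, unsigned dscale)
{
    uint32_t    bits;

    assert(out != NULL);
    if (dscale > 7 || unscaled < V21_MIN || unscaled > V21_MAX)
        return false;           /* caller falls back to another format */

    bits = ((uint32_t) dscale << 21) | ((uint32_t) unscaled & 0x1FFFFF);
    out[0] = (uint8_t) (bits >> 16);
    out[1] = (uint8_t) ((bits >> 8) & 0xFF);
    out[2] = (uint8_t) (bits & 0xFF);
    return true;
}

void
unpack21(const uint8_t in[3], int32_t *unscaled, unsigned *dscale)
{
    uint32_t    bits = ((uint32_t) in[0] << 16) |
                       ((uint32_t) in[1] << 8) | in[2];
    int32_t     v = (int32_t) (bits & 0x1FFFFF);

    if (v & 0x100000)
        v -= 0x200000;          /* sign-extend from 21 bits */
    *dscale = bits >> 21;
    *unscaled = v;
}
```

With dscale = 2 the 21-bit range -1048576 .. 1048575 reads as
-10485.76 .. 10485.75.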

There are lots of options and we could go nuts defining a meaning for every
possible length up to 8, but really I think just defining a 1+1 byte encoding
for small integers and a 3+1 byte encoding for many common applications would
be reasonable.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
