Re: Reducing the overhead of NUMERIC data

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing the overhead of NUMERIC data
Date: 2005-11-01 21:54:11
Message-ID: 28566.1130882051@sss.pgh.pa.us
Lists: pgsql-hackers pgsql-patches

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> varlen is int32 to match the standard varlena header. However, the max
> number of digits of the datatype is less than the threshold at which
> values get toasted. So no NUMERIC values ever get toasted - in which
case, why worry about matching the size of varlena - let's reduce it to 2
bytes, which still gives us up to 1000 digits as we have now.

Because that will require an extra case in the code that disassembles
tuples, which will slow down *everything* even in databases that don't
contain one single NUMERIC value. I think you need more than "let's
save 2 bytes in NUMERICs" to justify that.
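
For reference, the layout under discussion looks roughly like this (a sketch
following numeric.c; exact declarations may differ a bit by version):

    typedef int16 NumericDigit;      /* one base-10000 digit */

    typedef struct NumericData
    {
        int32        varlen;         /* standard 4-byte varlena length header */
        int16        n_weight;       /* weight of first digit, in base-10000 units */
        uint16       n_sign_dscale;  /* sign bits plus display scale */
        NumericDigit n_data[1];      /* the digits themselves, base 10000 */
    } NumericData;

so the fixed per-value overhead is 8 bytes, 4 of which are the varlena header
that every varlena type pays.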

> n_weight seems to exist because we do not store trailing zeroes. So
> 1000000 is stored as 1 with a weight of 6. My experience is that large
> numbers of trailing zeroes do not occur with any frequency in real
> measurement or financial data and that this is an over-optimization.

That seems debatable. Keep in mind that not storing extra zeroes cuts
computation time as well as storage.
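
A concrete illustration, using my own numbers under the base-10000 scheme:

    /*
     * 1000000    = 100 * 10000^1                  -> digits {100},      n_weight = 1
     * 1234000000 = 12 * 10000^2 + 3400 * 10000^1  -> digits {12, 3400}, n_weight = 2
     *
     * The zero digits to the right are simply never stored, so the inner
     * loops in add/mul/etc walk shorter digit arrays, in addition to the
     * values taking less space.
     */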

> n_sign_dscale shows us where the decimal point is. We could actually
> store a marker representing the decimal point, which would cost us 0.5
byte rather than 2 bytes. Since we have 4 bits to represent a decimal
digit, that leaves a few bits spare to represent either a decimal-
point-and-positive-sign or a decimal-point-and-negative-sign marker. (We would
> still need to store trailing zeroes even after the decimal point).

This is completely bogus. How are you going to remember the sign except
by always storing a marker? ISTM this proposal just moves the
sign/decimal-point overhead from one place to another, i.e., somewhere in
the NumericDigit array instead of in a fixed field.

Also, you can't drop dscale without abandoning the efficient base-10000
representation, at least not unless you want people complaining that the
database shows NUMERIC(3) data with four decimal places.
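
Spelling that out with an example of mine: in base 10000, 1.230 and 1.2300
are the same digit array, and only dscale tells them apart:

    /*
     * value 1.230 declared as NUMERIC(5,3):
     *     digits = {1, 2300}     i.e. 1 * 10000^0  +  2300 * 10000^-1
     *     weight = 0
     *     dscale = 3             -> displays as "1.230"
     *
     * Without a stored dscale all you have is the digit array, so the
     * best you can do is print the whole last digit: "1.2300".
     */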

It might be reasonable to restrict the range of NUMERIC to the point
that we could fit the weight/sign/dscale into 2 bytes instead of 4,
thereby saving 2 bytes per NUMERIC. I'm not excited about the other
aspects of this, though.
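
Just to show it fits, one hypothetical bit split (mine, purely illustrative,
not a worked-out proposal): 2 bits of sign, 7 bits of dscale, 7 bits of
biased weight in a single uint16:

    /* Hypothetical packed 16-bit header -- the range limits below are
     * consequences of this particular split, nothing more. */
    #define NUMERIC_HDR_SIGN(h)     ((uint16) (h) >> 14)          /* positive / negative / NaN */
    #define NUMERIC_HDR_DSCALE(h)   (((h) >> 7) & 0x7F)           /* display scale 0 .. 127 */
    #define NUMERIC_HDR_WEIGHT(h)   (((int) ((h) & 0x7F)) - 64)   /* weight -64 .. 63 base-10000 digits */

A 7-bit weight in base-10000 units caps values at roughly 250-odd decimal
digits either side of the point, which is the sort of range restriction
this would entail.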

regards, tom lane
