Re: Reducing NUMERIC size for 8.3

Lists: pgsql-hackers
From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: <pgsql-hackers(at)postgresql(dot)org>
Subject: Reducing NUMERIC size for 8.3
Date: 2007-06-18 14:31:29
Message-ID: 1182177090.6855.175.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

We've changed the on-disk database format in 8.3, so we have an
opportunity to change other things also. There is a patch thats been on
the patch queue for some time called numeric508, submitted Dec 2005;
I've updated this patch now for 8.3 to remove bit rot (an hour's work).
This is posted to pgsql-patches now and it works.

The benefit of the patch is that it reduces each NUMERIC value by 2
bytes, so will speed up things considerably. This is now especially
important if we are looking to reduce the speed of numeric division by a
factor of 4 (recent WIP patch).

The objections to applying this patch originally were:
1. it changes on-disk format (we've done this, so argument is void)
2. it would restrict number of digits to 508 and there are allegedly
some people that want to store > 508 digits.

The current patch passes all regression tests, but currently fails
numeric_big.sql since this explicitly checks for support of
numeric(1000,800).

We could:
a) accept the patch as-is and restrict NUMERIC to 508 digits
b) refine the patch somewhat to allow 1000 digits

(b) is possible in a couple of ways, both fairly quick:
- extend the patch so that one of the spare bits from the second digit
is used to represent dscale 508-1000.
- extend the patch so that if weight > 127 or dscale > 127 we would use
the first byte in the digits as an extra indicator byte holding the high
bits of both fields.
Neither change makes any difference to numbers below
1,000,000,000,000,000....(127 zeroes in total)...000 which probably
covers the vast majority of people's usage.

Objections: True, we are passed feature freeze, but this patch has been
on the queue for 14 months prior to freeze and has been waiting on disk
format changes to make patch application acceptable. We definitely want
to reduce the size of Numeric by 2 bytes at some point. The question in
my mind is: When is the best time to make this change? If we put this
off until 8.4, then it will get rejected again because we won't want to
change the disk format again. So the best time to do this is now,
otherwise we'll put it off forever.

Can I get somebody other than Tom to agree to review the patch? Clearly
waiting for Tom to review this is just going to delay release, which I
don't want to do.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 14:54:19
Message-ID: 1775.1182178459@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> We've changed the on-disk database format in 8.3, so we have an
> opportunity to change other things also. There is a patch thats been on
> the patch queue for some time called numeric508, submitted Dec 2005;

I thought that idea had been rejected long since as being an unreasonable
reduction in the flexibility of numeric arithmetic.

> The benefit of the patch is that it reduces each NUMERIC value by 2
> bytes, so will speed up things considerably. This is now especially
> important if we are looking to reduce the speed of numeric division by a
> factor of 4 (recent WIP patch).

The only way that patch would make division faster is by making the
slowest cases (longest input numbers) impossible. Which is hardly a
solution.

regards, tom lane


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 14:56:20
Message-ID: 46769D14.6090107@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> The objections to applying this patch originally were:
> 2. it would restrict number of digits to 508 and there are allegedly
> some people that want to store > 508 digits.
>
If 508 digits are not enough, are1000 digits be sufficient? Both limits
appear quite arbitrary to me.

Regards,
Andreas


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 15:18:47
Message-ID: 1182179927.6855.190.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-06-18 at 10:54 -0400, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > We've changed the on-disk database format in 8.3, so we have an
> > opportunity to change other things also. There is a patch thats been on
> > the patch queue for some time called numeric508, submitted Dec 2005;
>
> I thought that idea had been rejected long since as being an unreasonable
> reduction in the flexibility of numeric arithmetic.

I've re-read all of the posts from Dec 2005 and I see everyone in favour
of committing the patch, subject to review of whether limitation to 508
was acceptable. For me that was never finalised, at least I can't see a
clear decision. The important point is that the patch can be easily
modified to overcome these restrictions, a line of thought that was
never explored.

The current patch can be easily modified to allow 1000 digits, yet still
saving 2 bytes per value when storing more common values. Or we can use
the same techniques to represent arbitrary number of digits, again still
saving 2 bytes on common values.

We have the ground work done, we just need to agree further changes.

> > The benefit of the patch is that it reduces each NUMERIC value by 2
> > bytes, so will speed up things considerably. This is now especially
> > important if we are looking to reduce the speed of numeric division by a
> > factor of 4 (recent WIP patch).
>
> The only way that patch would make division faster is by making the
> slowest cases (longest input numbers) impossible. Which is hardly a
> solution.

It isn't directly related to division, but is a speed up nonetheless
which can offset the loss.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 15:19:01
Message-ID: 1182179941.6855.191.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-06-18 at 16:56 +0200, Andreas Pflug wrote:
> Simon Riggs wrote:
> > The objections to applying this patch originally were:
> > 2. it would restrict number of digits to 508 and there are allegedly
> > some people that want to store > 508 digits.
> >
> If 508 digits are not enough, are1000 digits be sufficient? Both limits
> appear quite arbitrary to me.

Thats the current limit; I agree, but I didn't choose it. IIRC if you
don't specify a limit then you can have arbitrarily long numbers.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Michael Paesold <mpaesold(at)gmx(dot)at>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 15:24:16
Message-ID: 4676A3A0.2090509@gmx.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andreas Pflug wrote:
> Simon Riggs wrote:
>> The objections to applying this patch originally were:
>> 2. it would restrict number of digits to 508 and there are allegedly
>> some people that want to store > 508 digits.
>>
> If 508 digits are not enough, are1000 digits be sufficient? Both limits
> appear quite arbitrary to me.

This 1000 is just a restriction on the typmod of numeric.

You can still use a much higher number of digits, if you use
unconstrained numeric:

test=> create table test (n numeric);

CREATE TABLE
test=> insert into test values (10::numeric ^ 99999);
INSERT 0 1
test=> select length(n) from test;
length
--------
100017
(1 row)

Best Regards
Michael Paesold


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 15:32:41
Message-ID: 4129.1182180761@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> Simon Riggs wrote:
>> The objections to applying this patch originally were:
>> 2. it would restrict number of digits to 508 and there are allegedly
>> some people that want to store > 508 digits.
>>
> If 508 digits are not enough, are1000 digits be sufficient? Both limits
> appear quite arbitrary to me.

As per the recent discussion about factorial, the current limit of
numeric format is 10^131071 --- there is a whole lot of daylight between
that and 10^508.

I had a thought though: it's possible to reduce the header overhead for
typical-size numbers without giving up the ability to store large ones.
This is because the POS/NEG/NAN sign possibilities leave one unused bit
pattern. Hence:

1. Switch the positions of the n_sign_dscale and n_weight fields in the
long format, so that the sign bits are in the first word.

2. Reserve the fourth "sign" bit pattern to denote a compressed-header
format in which there's just one uint16 header word and the
NumericDigits start right after that. The header word could contain:
2 bits: "sign" distinguishing this from the two-word-header format
1 bit: actual number sign (POS or NEG, disallow NaN)
6 bits: weight, room for -32 .. 31
7 bits: dscale, room for 0 .. 127

3. When packing a NumericVar into a Numeric, use this short format when
it's not a NaN and the weight and dscale are in range, else use the long
format.

Since the weight is in base-10000 digits, this bit allocation allows a
dynamic range of about +- 10^127 which fits well with the dscale range.
But I suspect that most of the use-cases for long numerics involve large
integers, so it might be more useful to shave another bit or two from
dscale and give 'em to weight.

In any case, no capability is lost, unlike the original proposal; and
this would be much less invasive than the original patch since there's
no need to play tricks with the content of the digit array.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 15:46:30
Message-ID: 1182181591.6855.203.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-06-18 at 11:32 -0400, Tom Lane wrote:
> Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> > Simon Riggs wrote:
> >> The objections to applying this patch originally were:
> >> 2. it would restrict number of digits to 508 and there are allegedly
> >> some people that want to store > 508 digits.
> >>
> > If 508 digits are not enough, are1000 digits be sufficient? Both limits
> > appear quite arbitrary to me.
>
> As per the recent discussion about factorial, the current limit of
> numeric format is 10^131071 --- there is a whole lot of daylight between
> that and 10^508.
>
> I had a thought though: it's possible to reduce the header overhead for
> typical-size numbers without giving up the ability to store large ones.
> This is because the POS/NEG/NAN sign possibilities leave one unused bit
> pattern. Hence:
>
> 1. Switch the positions of the n_sign_dscale and n_weight fields in the
> long format, so that the sign bits are in the first word.
>
> 2. Reserve the fourth "sign" bit pattern to denote a compressed-header
> format in which there's just one uint16 header word and the
> NumericDigits start right after that. The header word could contain:
> 2 bits: "sign" distinguishing this from the two-word-header format
> 1 bit: actual number sign (POS or NEG, disallow NaN)
> 6 bits: weight, room for -32 .. 31
> 7 bits: dscale, room for 0 .. 127
>
> 3. When packing a NumericVar into a Numeric, use this short format when
> it's not a NaN and the weight and dscale are in range, else use the long
> format.
>
> Since the weight is in base-10000 digits, this bit allocation allows a
> dynamic range of about +- 10^127 which fits well with the dscale range.
> But I suspect that most of the use-cases for long numerics involve large
> integers, so it might be more useful to shave another bit or two from
> dscale and give 'em to weight.
>
> In any case, no capability is lost, unlike the original proposal; and
> this would be much less invasive than the original patch since there's
> no need to play tricks with the content of the digit array.

Sounds good. I thought there'd be a way.

Since this is your idea, would you like to do this, or should I?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 15:49:03
Message-ID: 4676A96F.9090705@pse-consulting.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:
> Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
>
>> Simon Riggs wrote:
>>
>>> The objections to applying this patch originally were:
>>> 2. it would restrict number of digits to 508 and there are allegedly
>>> some people that want to store > 508 digits.
>>>
>>>
>> If 508 digits are not enough, are1000 digits be sufficient? Both limits
>> appear quite arbitrary to me.
>>
>
> As per the recent discussion about factorial, the current limit of
> numeric format is 10^131071 --- there is a whole lot of daylight between
> that and 10^508.
>
> I had a thought though: it's possible to reduce the header overhead for
> typical-size numbers without giving up the ability to store large ones.
> This is because the POS/NEG/NAN sign possibilities leave one unused bit
> pattern. Hence:
>
> 1. Switch the positions of the n_sign_dscale and n_weight fields in the
> long format, so that the sign bits are in the first word.
>
> 2. Reserve the fourth "sign" bit pattern to denote a compressed-header
> format in which there's just one uint16 header word and the
> NumericDigits start right after that. The header word could contain:
> 2 bits: "sign" distinguishing this from the two-word-header format
> 1 bit: actual number sign (POS or NEG, disallow NaN)
> 6 bits: weight, room for -32 .. 31
> 7 bits: dscale, room for 0 .. 127
>
> 3. When packing a NumericVar into a Numeric, use this short format when
> it's not a NaN and the weight and dscale are in range, else use the long
> format.
>
> Since the weight is in base-10000 digits, this bit allocation allows a
> dynamic range of about +- 10^127 which fits well with the dscale range.
> But I suspect that most of the use-cases for long numerics involve large
> integers, so it might be more useful to shave another bit or two from
> dscale and give 'em to weight.
>
> In any case, no capability is lost, unlike the original proposal; and
> this would be much less invasive than the original patch since there's
> no need to play tricks with the content of the digit array.
>

I wonder if the currently waiting patch isn't Good Enough for
999.9999999999999999 % of use cases, and "all" others can use numeric
instead of numeric(1000,800) or so. Especially since there are many
patches waiting that do need further investigation and refining.

Regards,
Andreas


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 15:53:50
Message-ID: 1182182031.6855.205.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-06-18 at 17:49 +0200, Andreas Pflug wrote:

> I wonder if the currently waiting patch isn't Good Enough for
> 999.9999999999999999 % of use cases, and "all" others can use numeric
> instead of numeric(1000,800) or so. Especially since there are many
> patches waiting that do need further investigation and refining.

That still has problems.

Another approach would be to restrict the existing datatype NUMERIC to
508 digits, but introduce a new datatype of LONGNUM which allows
arbitrary length numerics for those that *do* care. It might be more
beneficial in the longer run to separate the use cases so we can further
tune them (not sure how just yet...)

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andreas Pflug <pgadmin(at)pse-consulting(dot)de>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 15:54:13
Message-ID: 6294.1182182053@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andreas Pflug <pgadmin(at)pse-consulting(dot)de> writes:
> Tom Lane wrote:
>> In any case, no capability is lost, unlike the original proposal; and
>> this would be much less invasive than the original patch since there's
>> no need to play tricks with the content of the digit array.

> I wonder if the currently waiting patch isn't Good Enough for
> 999.9999999999999999 % of use cases, and "all" others can use numeric
> instead of numeric(1000,800) or so.

Apparently you misunderstand that patch: it takes capability away from
unconstrained numeric too.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 15:55:55
Message-ID: 6671.1182182155@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> Since this is your idea, would you like to do this, or should I?

Go for it. I'm not actually convinced this is worth spending time on,
as Greg Stark's 1-byte-varlena patch already saved more for typical
numerics than this will.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 15:57:27
Message-ID: 1182182248.6855.208.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-06-18 at 11:55 -0400, Tom Lane wrote:
> "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > Since this is your idea, would you like to do this, or should I?
>
> Go for it.

OK

> I'm not actually convinced this is worth spending time on,
> as Greg Stark's 1-byte-varlena patch already saved more for typical
> numerics than this will.

Understood, thats why I left this till last. This will save even more on
top of those savings. My concern is to make this change now while we
can, since we will be aiming for 8.3->8.4 to be a binary upgrade.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 16:03:22
Message-ID: 87sl8pmer9.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> I had a thought though: it's possible to reduce the header overhead for
> typical-size numbers without giving up the ability to store large ones.
> This is because the POS/NEG/NAN sign possibilities leave one unused bit
> pattern. Hence:

I had a whack and doing something similar to this a while back -- though I was
more concerned with optimizing very small numeric variables, especially small
integers and typical currency-sized quantities.

The bottleneck I ran into was desupporting the macros which access the sign,
display scale, etc. Those are currently in numeric.h making them--in theory--
publicly available outside the internal functions. Afaik nothing actively uses
them outside of numeric.c though.

If we're happy desupporting those macros and not replacing them with anything
comparable then there are additional bits of state hidden in the varlena size.
Anything shorter than the shortest possible numeric representation can
implicitly be interpreted as some alternate compact representation. I already
had a patch that stored small integers in a single NumericDigit without any
numeric header at all.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 16:08:52
Message-ID: 8683.1182182932@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> Anything shorter than the shortest possible numeric representation can
> implicitly be interpreted as some alternate compact representation. I already
> had a patch that stored small integers in a single NumericDigit without any
> numeric header at all.

That's getting well beyond the realm of reason IMHO. I doubt you can
merge it with this proposal anyway --- how will you disambiguate from
zero with a positive dscale ("0.00")?

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>, "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 16:44:45
Message-ID: 10570.1182185085@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

I wrote:
> Gregory Stark <stark(at)enterprisedb(dot)com> writes:
>> Anything shorter than the shortest possible numeric representation can
>> implicitly be interpreted as some alternate compact representation. I
>> already had a patch that stored small integers in a single
>> NumericDigit without any numeric header at all.

> That's getting well beyond the realm of reason IMHO. I doubt you can
> merge it with this proposal anyway --- how will you disambiguate from
> zero with a positive dscale ("0.00")?

Hmm, I take that back: it could be merged with this proposal pretty
easily. Again assuming that we move the POS/NEG/NAN/SHORT "sign"
flags to the first header word, we can have:

Sign=NAN: it's a NAN, regardless of anything else. We may as well
store NAN as just 2 bytes.

Sign=SHORT: it's a non-NAN with limited weight and dscale range,
as per my proposal. A zero value would still be only 2 bytes,
but anything else is longer. (This would be needed only for zero
with nonzero dscale, though.)

Sign=POS or NEG: if length == 2 bytes then interpret the remaining
14 bits as a single NBASE digit, with assumed weight and dscale 0.
This allows storing integers up to +/-9999 in 2 bytes (+ 1 byte
varlena header). If length > 2 then it is a traditional-format
numeric.

I'm not entirely convinced this is worth the extra pack/unpack
logic, since said integers would be 4 bytes (+1 byte header) in
the sign=SHORT representation, which is not that much more.
Also, this means at least three different representations of some
values, which is going to be a headache for comparison and hashing.
But the bit-space is there.

regards, tom lane


From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 18:45:38
Message-ID: 1182192338.6855.232.camel@silverbirch.site
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, 2007-06-18 at 12:44 -0400, Tom Lane wrote:
> I wrote:
> > Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> >> Anything shorter than the shortest possible numeric representation can
> >> implicitly be interpreted as some alternate compact representation. I
> >> already had a patch that stored small integers in a single
> >> NumericDigit without any numeric header at all.
>
> > That's getting well beyond the realm of reason IMHO. I doubt you can
> > merge it with this proposal anyway --- how will you disambiguate from
> > zero with a positive dscale ("0.00")?
>
> Hmm, I take that back: it could be merged with this proposal pretty
> easily. Again assuming that we move the POS/NEG/NAN/SHORT "sign"
> flags to the first header word, we can have:
>
> Sign=NAN: it's a NAN, regardless of anything else. We may as well
> store NAN as just 2 bytes.

> I'm not entirely convinced this is worth the extra pack/unpack
> logic, since said integers would be 4 bytes (+1 byte header) in
> the sign=SHORT representation, which is not that much more.
> Also, this means at least three different representations of some
> values, which is going to be a headache for comparison and hashing.
> But the bit-space is there.

I think representing zero as compactly as possible is worth the trouble,
so that in itself is enough for me to say Yes. Rest seems good too.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
Cc: "Gregory Stark" <stark(at)enterprisedb(dot)com>, "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 18:55:52
Message-ID: 24625.1182192952@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> I think representing zero as compactly as possible is worth the trouble,

Either of these proposals can do that in 2 bytes.

regards, tom lane


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 21:19:15
Message-ID: 200706182119.l5ILJFs25029@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:
> On Mon, 2007-06-18 at 11:55 -0400, Tom Lane wrote:
> > "Simon Riggs" <simon(at)2ndquadrant(dot)com> writes:
> > > Since this is your idea, would you like to do this, or should I?
> >
> > Go for it.
>
> OK
>
> > I'm not actually convinced this is worth spending time on,
> > as Greg Stark's 1-byte-varlena patch already saved more for typical
> > numerics than this will.
>
> Understood, thats why I left this till last. This will save even more on
> top of those savings. My concern is to make this change now while we
> can, since we will be aiming for 8.3->8.4 to be a binary upgrade.

Would someone please explain why we are considering this so far past
features freeze, and who suggtested that the 8.3->8.4 upgrade being a binary
upgrade was anything more than a pipe dream?

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Andreas Pflug <pgadmin(at)pse-consulting(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 21:27:13
Message-ID: 7176.1182202033@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Bruce Momjian <bruce(at)momjian(dot)us> writes:
> Would someone please explain why we are considering this so far past
> features freeze, and who suggtested that the 8.3->8.4 upgrade being a binary
> upgrade was anything more than a pipe dream?

Well, Greg had left further squeezing of numerics as an open item in his
varlena patch, so it's not totally unreasonable to consider a patch for
that now --- as long as it's pretty small and simple.

I agree that in-place upgrade is a pipe dream until we see someone
actually step forward and do the work to build a usable pg_upgrade
utility.

If nothing else, we should consider swapping the n_sign_dscale and
n_weight fields now, since that would enable upward-compatible
implementation of these ideas later. Otherwise any such patch
would probably get rejected if pg_upgrade did happen to emerge out
of nowhere.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Bruce Momjian" <bruce(at)momjian(dot)us>
Cc: "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Andreas Pflug" <pgadmin(at)pse-consulting(dot)de>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-06-18 21:45:25
Message-ID: 87ps3trl6y.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Bruce Momjian" <bruce(at)momjian(dot)us> writes:

> Would someone please explain why we are considering this so far past
> features freeze, and who suggtested that the 8.3->8.4 upgrade being a binary
> upgrade was anything more than a pipe dream?

Simon just updated a patch he had originally submitted over a year ago.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: What does Page Layout version mean? (Was: Re: Reducing NUMERIC size for 8.3)
Date: 2007-06-19 09:06:43
Message-ID: 46779CA3.2050601@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Simon Riggs wrote:

>
> The objections to applying this patch originally were:
> 1. it changes on-disk format (we've done this, so argument is void)

I'm little bit confused when we introduce new page layout version? I
expect that new version become with changes with pageheader, tuple
header or data encoding (varlen/TOAST ...). But in case when there is
new data type internal implementation, there was not reason to update
version (see inet/cidr between 8.1 -> 8.2). Can me somebody clarify this?

However, In upgrade point of view. It will be necessary to keep
information which postgreSQL version creates page to correctly track
data type implementation (or special structure). Information in
pg_control file and PG_VERSION files are unusable for upgrade purpose.

Zdenek


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: What does Page Layout version mean? (Was: Re: Reducing NUMERIC size for 8.3)
Date: 2007-06-19 14:15:50
Message-ID: 27263.1182262550@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> I'm little bit confused when we introduce new page layout version? I
> expect that new version become with changes with pageheader, tuple
> header or data encoding (varlen/TOAST ...). But in case when there is
> new data type internal implementation, there was not reason to update
> version (see inet/cidr between 8.1 -> 8.2). Can me somebody clarify this?

Well, we've changed it when there was a benefit to an existing tool to
do so. So far that's meant page header and tuple header changes. If
we ever had a working in-place upgrade solution, I think we'd be willing
to make the page version account for datatype format changes too.

regards, tom lane


From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Zdenek Kotala <Zdenek(dot)Kotala(at)sun(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>
Subject: Re: What does Page Layout version mean? (Was: Re: Reducing NUMERIC size for 8.3)
Date: 2007-06-20 16:34:21
Message-ID: 200706201234.22368.xzilla@users.sourceforge.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tuesday 19 June 2007 10:15, Tom Lane wrote:
> Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM> writes:
> > I'm little bit confused when we introduce new page layout version? I
> > expect that new version become with changes with pageheader, tuple
> > header or data encoding (varlen/TOAST ...). But in case when there is
> > new data type internal implementation, there was not reason to update
> > version (see inet/cidr between 8.1 -> 8.2). Can me somebody clarify this?
>
> Well, we've changed it when there was a benefit to an existing tool to
> do so. So far that's meant page header and tuple header changes. If
> we ever had a working in-place upgrade solution, I think we'd be willing
> to make the page version account for datatype format changes too.
>

FWIW pg_migrator is a pretty good swing at an in-place upgrade tool for
8.1->8.2. Unfortunately until the PGDG decides that in-place upgrade is a
constraint their willing to place on development, I see them a good
chicken/egg away from making it a continually usefull tool.

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


From: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: What does Page Layout version mean? (Was: Re: Reducing NUMERIC size for 8.3)
Date: 2007-06-20 16:43:44
Message-ID: 20070620164344.GU31426@phlogiston.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jun 20, 2007 at 12:34:21PM -0400, Robert Treat wrote:
> FWIW pg_migrator is a pretty good swing at an in-place upgrade tool for
> 8.1->8.2. Unfortunately until the PGDG decides that in-place upgrade is a
> constraint their willing to place on development, I see them a good
> chicken/egg away from making it a continually usefull tool.

Or maybe cart/horse. It seems to me that the rule more likely needs
to be that the migrator follow the development of the database than
that the database engine be strongly constrained by the needs of an
upgrade tool. I agree that some commitment is needed, though.

A

--
Andrew Sullivan | ajs(at)crankycanuck(dot)ca
The whole tendency of modern prose is away from concreteness.
--George Orwell


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: What does Page Layout version mean? (Was: Re: Reducing NUMERIC size for 8.3)
Date: 2007-06-21 11:25:05
Message-ID: 467A6011.90207@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Andrew Sullivan wrote:
> On Wed, Jun 20, 2007 at 12:34:21PM -0400, Robert Treat wrote:
>> FWIW pg_migrator is a pretty good swing at an in-place upgrade tool for
>> 8.1->8.2. Unfortunately until the PGDG decides that in-place upgrade is a
>> constraint their willing to place on development, I see them a good
>> chicken/egg away from making it a continually usefull tool.
>
> Or maybe cart/horse. It seems to me that the rule more likely needs
> to be that the migrator follow the development of the database than
> that the database engine be strongly constrained by the needs of an
> upgrade tool. I agree that some commitment is needed, though.

I don't think that upgrade will strongly constrain a database engine.
But some constrains we can expect. Any new development in some area will
have to take care about upgrade. If you break some "upgrade coding
rules" you will not to able create upgrade procedure or upgrade process
will be risky.

For example upgrade between 8.1 -> 8.2 is risky, because you are not
able easy determine what format of cidr/inet is used on the page. If
something fails during table structure upgrade, then it is not easy to
recovery from this situation.

Zdenek


From: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Subject: Re: What does Page Layout version mean? (Was: Re: Reducing NUMERIC size for 8.3)
Date: 2007-06-21 12:28:41
Message-ID: 467A6EF9.6040501@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Since we're discussing upgrades, let me summarize the discussions we had
over dinner in Ottawa for the benefit of all:

* pg_migrator is a sound approach to handling catalog changes.

* Handling any page format change that doesn't grow the space taken by
data is straightforward. Note that all 8.2 -> 8.3 changes fall in this
category.

* Page format changes that grow data size are problematic, because there
can be pages that can't be expanded to the new format because there's
not enough space. However, it would be possible to write a pre-upgrade
program to look for the problematic pages, and do a dummy UPDATE on some
tuples to move them away from such pages, to make room. The pre-upgrade
program could be run while the old database is still up, so it doesn't
cause downtime.

* Page format conversion can be done at upgrade time, scanning all
relations and rewriting them. Or it can be done at runtime in ReadBuffer
whenever a page in the old format is accessed.

We don't know what changes upcoming releases will bring, but 8.2 -> 8.3
upgrade is feasible. Any future changes will need to be considered on a
case-by-case basis, but we have a pretty good basic strategy that
doesn't impose any strict restrictions on future development. Even if we
can't guarantee that we can do a non dump/restore upgrade between every
release, if we can do it say between every other release, that's *much*
better than none.

As before, someone just needs to step up and do it.

Zdenek Kotala wrote:
> Andrew Sullivan wrote:
>> On Wed, Jun 20, 2007 at 12:34:21PM -0400, Robert Treat wrote:
>>> FWIW pg_migrator is a pretty good swing at an in-place upgrade tool
>>> for 8.1->8.2. Unfortunately until the PGDG decides that in-place
>>> upgrade is a constraint their willing to place on development, I see
>>> them a good chicken/egg away from making it a continually usefull tool.
>>
>> Or maybe cart/horse. It seems to me that the rule more likely needs
>> to be that the migrator follow the development of the database than
>> that the database engine be strongly constrained by the needs of an
>> upgrade tool. I agree that some commitment is needed, though.
>
> I don't think that upgrade will strongly constrain a database engine.
> But some constrains we can expect. Any new development in some area will
> have to take care about upgrade. If you break some "upgrade coding
> rules" you will not to able create upgrade procedure or upgrade process
> will be risky.
>
> For example upgrade between 8.1 -> 8.2 is risky, because you are not
> able easy determine what format of cidr/inet is used on the page. If
> something fails during table structure upgrade, then it is not easy to
> recovery from this situation.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Subject: Re: What does Page Layout version mean? (Was: Re: Reducing NUMERIC size for 8.3)
Date: 2007-06-21 18:31:16
Message-ID: 2012.1182450676@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
> * Page format changes that grow data size are problematic, because there
> can be pages that can't be expanded to the new format because there's
> not enough space. However, it would be possible to write a pre-upgrade
> program to look for the problematic pages, and do a dummy UPDATE on some
> tuples to move them away from such pages, to make room. The pre-upgrade
> program could be run while the old database is still up, so it doesn't
> cause downtime.

That sounds good, but there are corner cases where it wouldn't work ---
consider a page containing a single maximum-length tuple.

In general I don't see us accepting changes that would increase data
size, but there's at least one troubling exception on the horizon:
per-column or per-value locale support. Another problem is that a strict
rule of "no data size increase ever" might forbid acceptance of changes
that achieve average space savings at the cost of increasing the size of
some lesser-used cases.

In short, this point seems to need more thought.

regards, tom lane


From: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Subject: Re: What does Page Layout version mean? (Was: Re: Reducing NUMERIC size for 8.3)
Date: 2007-06-22 08:02:35
Message-ID: 467B821B.4010509@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Heikki Linnakangas wrote:
> Since we're discussing upgrades, let me summarize the discussions we had
> over dinner in Ottawa for the benefit of all:
>

Thanks for summary.

> As before, someone just needs to step up and do it.

I'm now working on proposal. I hope that it will ready soon.

Zdenek


From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Zdenek Kotala <Zdenek(dot)Kotala(at)sun(dot)com>, Andrew Sullivan <ajs(at)crankycanuck(dot)ca>
Subject: Re: What does Page Layout version mean? (Was: Re: Reducing NUMERIC size for 8.3)
Date: 2007-06-28 01:38:58
Message-ID: 200706271838.58498.josh@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom,

> That sounds good, but there are corner cases where it wouldn't work ---
> consider a page containing a single maximum-length tuple.

Certainly any mature upgrade-in-place tool will require a "checker" which you
run first which determines if you have a prohibitive corner case.

Besides, I thought we didn't allow tuples to grow to more than 1/2 page size?
Or am I thinking indexes here?

> In general I don't see us accepting changes that would increase data
> size, but there's at least one troubling exception on the horizon:
> per-column or per-value locale support. Another problem is that a strict
> rule of "no data size increase ever" might forbid acceptance of changes
> that achieve average space savings at the cost of increasing the size of
> some lesser-used cases.

Our attitude at the meeting was "let's burn that bridge when we come to it".
If we can develop a solid in-place upgrade too which will work for 8.1->8.2
and 8.2->8.3, then we've done something worthwhile even if we break it in 8.4
or 8.5. It's possible that we won't implement anything that breaks it for
five years or that someone will invent another solution before then, in which
case we'd feel pretty dumb for having kept it on the drawing board all that
time.

Or to put it another way, the page-grows-on-upgrade problem is hard enough
that it will probably take as much effort as the whole rest of the upgrade
process to solve. So let's tackle one problem at a time.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Reducing NUMERIC size for 8.3
Date: 2007-07-17 22:01:41
Message-ID: 200707172201.l6HM1f226532@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers


This has been saved for the 8.4 release:

http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Simon Riggs wrote:
> We've changed the on-disk database format in 8.3, so we have an
> opportunity to change other things also. There is a patch thats been on
> the patch queue for some time called numeric508, submitted Dec 2005;
> I've updated this patch now for 8.3 to remove bit rot (an hour's work).
> This is posted to pgsql-patches now and it works.
>
> The benefit of the patch is that it reduces each NUMERIC value by 2
> bytes, so will speed up things considerably. This is now especially
> important if we are looking to reduce the speed of numeric division by a
> factor of 4 (recent WIP patch).
>
> The objections to applying this patch originally were:
> 1. it changes on-disk format (we've done this, so argument is void)
> 2. it would restrict number of digits to 508 and there are allegedly
> some people that want to store > 508 digits.
>
> The current patch passes all regression tests, but currently fails
> numeric_big.sql since this explicitly checks for support of
> numeric(1000,800).
>
> We could:
> a) accept the patch as-is and restrict NUMERIC to 508 digits
> b) refine the patch somewhat to allow 1000 digits
>
> (b) is possible in a couple of ways, both fairly quick:
> - extend the patch so that one of the spare bits from the second digit
> is used to represent dscale 508-1000.
> - extend the patch so that if weight > 127 or dscale > 127 we would use
> the first byte in the digits as an extra indicator byte holding the high
> bits of both fields.
> Neither change makes any difference to numbers below
> 1,000,000,000,000,000....(127 zeroes in total)...000 which probably
> covers the vast majority of people's usage.
>
> Objections: True, we are passed feature freeze, but this patch has been
> on the queue for 14 months prior to freeze and has been waiting on disk
> format changes to make patch application acceptable. We definitely want
> to reduce the size of Numeric by 2 bytes at some point. The question in
> my mind is: When is the best time to make this change? If we put this
> off until 8.4, then it will get rejected again because we won't want to
> change the disk format again. So the best time to do this is now,
> otherwise we'll put it off forever.
>
> Can I get somebody other than Tom to agree to review the patch? Clearly
> waiting for Tom to review this is just going to delay release, which I
> don't want to do.
>
> --
> Simon Riggs
> EnterpriseDB http://www.enterprisedb.com
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +