Re: reducing NUMERIC size for 9.1

Lists: pgsql-hackers
From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: reducing NUMERIC size for 9.1
Date: 2010-07-09 14:58:40
Message-ID: AANLkTimMdFxyPPwPlkpWIigBYHTsIt_A2oOVG-erJXHP@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

EnterpriseDB asked me to develop the attached patch to reduce the
on-disk size of numeric and to submit it for inclusion in PG 9.1.
After searching the archives, I found a possible design for this by
Tom Lane based on an earlier proposal by Simon Riggs.

http://archives.postgresql.org/pgsql-hackers/2007-06/msg00715.php

The attached patch implements more or less the design described there,
and will essentially knock 2 bytes of the on-disk size of nearly all
numeric values anyone is likely to want to store, but without reducing
the overall range of the type; so, for people who are storing a lot of
numerics, it should save a great deal of storage space and, more
importantly, I/O. However, it does so in a way that should be
completely backward-compatible from a binary format standpoint, so
that pg_upgrade does not break.

I'm not entirely happy with the way I handled the variable-length
struct, although I don't think it's horrible, either. I'm willing to
rework it if someone has a better idea.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment Content-Type Size
numeric_2b.patch application/octet-stream 11.1 KB

From: Brendan Jurd <direvus(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-15 16:58:29
Message-ID: AANLkTilFDeixXXgrmb4Rrr-9LZuNbI7DDv55JG496gTh@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 10 July 2010 00:58, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> EnterpriseDB asked me to develop the attached patch to reduce the
> on-disk size of numeric and to submit it for inclusion in PG 9.1.
> After searching the archives, I found a possible design for this by
> Tom Lane based on an earlier proposal by Simon Riggs.

Hi Robert,

I'm reviewing this patch for the commitfest, and so far everything in
the patch looks good. Compile and regression tests worked fine.

However, I was trying to find a simple way to verify that it really
was reducing the on-disk size of compact numeric values and didn't get
the results I was expecting.

I dropped one thousand numerics with value zero into a table and
checked the on-disk size of the relation with your patch and on a
stock 8.4 instance. In both cases the result was exactly the same.

Shouldn't the table be smaller with your patch? Or is there something
wrong with my test?

CREATE TEMP TABLE numeric_short (a numeric);

INSERT INTO numeric_short (a)
SELECT 0::numeric FROM generate_series(1, 1000) i;

Regards,
BJ


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Brendan Jurd <direvus(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-15 17:47:31
Message-ID: 5CED6F11-584B-44A2-B6F1-4967684A926A@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 15, 2010, at 11:58 AM, Brendan Jurd <direvus(at)gmail(dot)com> wrote:
> On 10 July 2010 00:58, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> EnterpriseDB asked me to develop the attached patch to reduce the
>> on-disk size of numeric and to submit it for inclusion in PG 9.1.
>> After searching the archives, I found a possible design for this by
>> Tom Lane based on an earlier proposal by Simon Riggs.
>
> Hi Robert,
>
> I'm reviewing this patch for the commitfest, and so far everything in
> the patch looks good. Compile and regression tests worked fine.
>
> However, I was trying to find a simple way to verify that it really
> was reducing the on-disk size of compact numeric values and didn't get
> the results I was expecting.
>
> I dropped one thousand numerics with value zero into a table and
> checked the on-disk size of the relation with your patch and on a
> stock 8.4 instance. In both cases the result was exactly the same.
>
> Shouldn't the table be smaller with your patch? Or is there something
> wrong with my test?
>
> CREATE TEMP TABLE numeric_short (a numeric);
>
> INSERT INTO numeric_short (a)
> SELECT 0::numeric FROM generate_series(1, 1000) i;

Well, on that test, you'll save only 2000 bytes, which is less than a full block, so there's no guarantee the difference would be noticeable at the relation level. Scale it up by a factor of 10 and the difference should be measurable.

You might also look at testing with pg_column_size().

...Robert


From: Brendan Jurd <direvus(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-16 12:44:39
Message-ID: AANLkTincjaM0KcL723jkAaRqmts7pUpHE3tw3LVMZo_3@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16 July 2010 03:47, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Jul 15, 2010, at 11:58 AM, Brendan Jurd <direvus(at)gmail(dot)com> wrote:
>> I dropped one thousand numerics with value zero into a table and
>> checked the on-disk size of the relation with your patch and on a
>> stock 8.4 instance.  In both cases the result was exactly the same.
>>
>> Shouldn't the table be smaller with your patch?  Or is there something
>> wrong with my test?
>
> Well, on that test, you'll save only 2000 bytes, which is less than a full block, so there's no guarantee the difference would be noticeable at the relation level.  Scale it up by a factor of 10 and the difference should be measurable.
>
> You might also look at testing with pg_column_size().
>

pg_column_size() did return the results I was expecting.
pg_column_size(0::numeric) is 8 bytes on 8.4 and it's 6 bytes on HEAD
with your patch.

However, even with 1 million rows of 0::numeric in my test table,
there was no difference at all in the on-disk relation size (36290560
with 36249600 in the table and 32768 in the fsm).

At this scale we should be seeing around 2 million bytes saved, but
instead the tables are identical. Is there some kind of disconnect in
how the new short numeric is making it to the disk, or perhaps another
effect interfering with my test?

Cheers,
BJ


From: Richard Huxton <dev(at)archonet(dot)com>
To: Brendan Jurd <direvus(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-16 12:51:23
Message-ID: 4C4055CB.8070801@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16/07/10 13:44, Brendan Jurd wrote:
>
> pg_column_size() did return the results I was expecting.
> pg_column_size(0::numeric) is 8 bytes on 8.4 and it's 6 bytes on HEAD
> with your patch.

> At this scale we should be seeing around 2 million bytes saved, but
> instead the tables are identical. Is there some kind of disconnect in
> how the new short numeric is making it to the disk, or perhaps another
> effect interfering with my test?

You've probably got rows being aligned to a 4-byte boundary. You're
probably not going to see any change unless you have a couple of 1-byte
columns that get placed after the numeric. If you went from 10 bytes
down to 8, that should be visible.

--
Richard Huxton
Archonet Ltd


From: Brendan Jurd <direvus(at)gmail(dot)com>
To: Richard Huxton <dev(at)archonet(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-16 13:14:56
Message-ID: AANLkTikOauIAJVYlhRDr6x0PNxBz2qsknQf-WmXW9cLK@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16 July 2010 22:51, Richard Huxton <dev(at)archonet(dot)com> wrote:
> On 16/07/10 13:44, Brendan Jurd wrote:>
>> At this scale we should be seeing around 2 million bytes saved, but
>> instead the tables are identical.  Is there some kind of disconnect in
>> how the new short numeric is making it to the disk, or perhaps another
>> effect interfering with my test?
>
> You've probably got rows being aligned to a 4-byte boundary. You're probably
> not going to see any change unless you have a couple of 1-byte columns that
> get placed after the numeric. If you went from 10 bytes down to 8, that
> should be visible.

Ah, thanks for the hint Richard. I didn't see any change with two
1-byte columns after the numeric, but with four such columns I did
finally see a difference.

Test script:

BEGIN;

CREATE TEMP TABLE foo (a numeric, b bool, c bool, d bool, e bool);

INSERT INTO foo (a, b, c, d, e)
SELECT 0::numeric, false, true, i % 2 = 0, i % 2 = 1
FROM generate_series(1, 1000000) i;

SELECT pg_total_relation_size('foo'::regclass);

ROLLBACK;

Results:

8.4: 44326912
HEAD with patch: 36290560

That settles my concern and I'm happy to pass this along to a commiter.

Cheers,
BJ


From: Thom Brown <thombrown(at)gmail(dot)com>
To: Brendan Jurd <direvus(at)gmail(dot)com>
Cc: Richard Huxton <dev(at)archonet(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-16 13:17:34
Message-ID: AANLkTil37D74d_Kpsq4sun0fxEo721ZKKCEYj2xaLzL6@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On 16 July 2010 14:14, Brendan Jurd <direvus(at)gmail(dot)com> wrote:
> On 16 July 2010 22:51, Richard Huxton <dev(at)archonet(dot)com> wrote:
>> On 16/07/10 13:44, Brendan Jurd wrote:>
>>> At this scale we should be seeing around 2 million bytes saved, but
>>> instead the tables are identical.  Is there some kind of disconnect in
>>> how the new short numeric is making it to the disk, or perhaps another
>>> effect interfering with my test?
>>
>> You've probably got rows being aligned to a 4-byte boundary. You're probably
>> not going to see any change unless you have a couple of 1-byte columns that
>> get placed after the numeric. If you went from 10 bytes down to 8, that
>> should be visible.
>
> Ah, thanks for the hint Richard.  I didn't see any change with two
> 1-byte columns after the numeric, but with four such columns I did
> finally see a difference.
>
> Test script:
>
> BEGIN;
>
> CREATE TEMP TABLE foo (a numeric, b bool, c bool, d bool, e bool);
>
> INSERT INTO foo (a, b, c, d, e)
> SELECT 0::numeric, false, true, i % 2 = 0, i % 2 = 1
> FROM generate_series(1, 1000000) i;
>
> SELECT pg_total_relation_size('foo'::regclass);
>
> ROLLBACK;
>
> Results:
>
> 8.4: 44326912
> HEAD with patch: 36290560
>
> That settles my concern and I'm happy to pass this along to a commiter.
>
> Cheers,
> BJ
>

Joy! :) Nice patch Robert.

Thom


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Thom Brown <thombrown(at)gmail(dot)com>
Cc: Brendan Jurd <direvus(at)gmail(dot)com>, Richard Huxton <dev(at)archonet(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-16 16:03:27
Message-ID: 68BC8B47-11DD-48E7-BD9A-BF8998E9B93E@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 16, 2010, at 6:17 AM, Thom Brown wrote:

> Joy! :) Nice patch Robert.

Indeed.

What are the implications for pg_upgrade? Will a database with values created before the patch continue to work after the patch has been applied (as happened with the new hstore in 9.0), or will pg_upgrade need to be taught how to upgrade the old storage format?

Best,

David


From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
Cc: Thom Brown <thombrown(at)gmail(dot)com>, Brendan Jurd <direvus(at)gmail(dot)com>, Richard Huxton <dev(at)archonet(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-16 16:04:47
Message-ID: 201007161604.o6GG4lM11754@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David E. Wheeler wrote:
> On Jul 16, 2010, at 6:17 AM, Thom Brown wrote:
>
> > Joy! :) Nice patch Robert.
>
> Indeed.
>
> What are the implications for pg_upgrade? Will a database with values
> created before the patch continue to work after the patch has been
> applied (as happened with the new hstore in 9.0), or will pg_upgrade
> need to be taught how to upgrade the old storage format?

Robert told me the old format continues to work in the upgraded
databases.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ None of us is going to be here forever. +


From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Thom Brown <thombrown(at)gmail(dot)com>, Brendan Jurd <direvus(at)gmail(dot)com>, Richard Huxton <dev(at)archonet(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-16 16:06:21
Message-ID: 0A1801D5-ABDC-4EBF-A331-2F02B87612B9@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Jul 16, 2010, at 9:04 AM, Bruce Momjian wrote:

>> What are the implications for pg_upgrade? Will a database with values
>> created before the patch continue to work after the patch has been
>> applied (as happened with the new hstore in 9.0), or will pg_upgrade
>> need to be taught how to upgrade the old storage format?
>
> Robert told me the old format continues to work in the upgraded
> databases.

Awesome. rhaas++

Best,

David


From: Hitoshi Harada <umi(dot)tanuki(at)gmail(dot)com>
To: Brendan Jurd <direvus(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-16 17:03:03
Message-ID: AANLkTinnsrHj5EfHKDAhp99o2tb_NJH0rdpb7ez5X4tw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

2010/7/16 Brendan Jurd <direvus(at)gmail(dot)com>:
> On 16 July 2010 03:47, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> You might also look at testing with pg_column_size().
>>
>
> pg_column_size() did return the results I was expecting.
> pg_column_size(0::numeric) is 8 bytes on 8.4 and it's 6 bytes on HEAD
> with your patch.
>
> However, even with 1 million rows of 0::numeric in my test table,
> there was no difference at all in the on-disk relation size (36290560
> with 36249600 in the table and 32768 in the fsm).
>
> At this scale we should be seeing around 2 million bytes saved, but
> instead the tables are identical.  Is there some kind of disconnect in
> how the new short numeric is making it to the disk, or perhaps another
> effect interfering with my test?

What about large ARRAY of numeric type? Once upon a time I develop
tinyint for myself, the array size could get reduced.

Regards,

--
Hitoshi Harada


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-16 18:39:05
Message-ID: 19619.1279305545@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> I'm not entirely happy with the way I handled the variable-length
> struct, although I don't think it's horrible, either. I'm willing to
> rework it if someone has a better idea.

I don't like the way you did that either (specifically, not the kluge
in NUMERIC_DIGITS()). It would probably work better if you declared
two different structs, or a union of same, to represent the two layout
cases.

A couple of other thoughts:

n_sign_dscale is now pretty inappropriately named, probably better to
change the field name. This will also help to catch anything that's
not using the macros. (Renaming the n_weight field, or at least burying
it in an extra level of struct, would be helpful for the same reason.)

It seems like you've handled the NAN case a bit awkwardly. Since the
weight is uninteresting for a NAN, it's okay to not store the weight
field, so I think what you should do is consider that the dscale field
is still full-width, ie the format of the first word remains old-style
not new-style. I don't remember whether dscale is meaningful for a NAN,
but if it is, your approach is constraining what is possible to store,
and is also breaking compatibility with old databases.

Also, I wonder whether you can do anything with depending on the actual
bit values of the flag bits --- specifically, it's short header format
iff first bit is set. The NUMERIC_HEADER_SIZE macro in particular could
be made more efficient with that.

The sign extension code in the NUMERIC_WEIGHT() macro seems a bit
awkward; I wonder if there's a better way. One solution might be to
offset the value (ie, add or subtract NUMERIC_SHORT_WEIGHT_MIN) rather
than try to sign-extend per se.

Please do NOT commit this:

(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
! errmsg("value overflows numeric format %x w=%d s=%u",
! result->n_sign_dscale,
! NUMERIC_WEIGHT(result), NUMERIC_DSCALE(result))));

or at least hide it in "#ifdef DEBUG_NUMERIC" or some such.

Other than that the code changes look pretty clean, I'm mostly just
dissatisfied with the access macros.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-20 15:28:06
Message-ID: AANLkTimVuM_DnEavTYWwG9q7Zms9Bx0qzh8itY=ww4n_@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 16, 2010 at 2:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I'm not entirely happy with the way I handled the variable-length
>> struct, although I don't think it's horrible, either. I'm willing to
>> rework it if someone has a better idea.
>
> I don't like the way you did that either (specifically, not the kluge
> in NUMERIC_DIGITS()).  It would probably work better if you declared
> two different structs, or a union of same, to represent the two layout
> cases.
>
> A couple of other thoughts:
>
> n_sign_dscale is now pretty inappropriately named, probably better to
> change the field name.  This will also help to catch anything that's
> not using the macros.  (Renaming the n_weight field, or at least burying
> it in an extra level of struct, would be helpful for the same reason.)

I'm not sure what you have in mind here. If we create a union of two
structs, we'll still have to pick one of them to use to check the high
bits of the first word, so I'm not sure we'll be adding all that much
in terms of clarity. One possibility would be to name the fields
something like n_header1 and n_header2, or even just n_header[], but
I'm not sure if that's any better. If it is I'm happy to do it.

> It seems like you've handled the NAN case a bit awkwardly.  Since the
> weight is uninteresting for a NAN, it's okay to not store the weight
> field, so I think what you should do is consider that the dscale field
> is still full-width, ie the format of the first word remains old-style
> not new-style.  I don't remember whether dscale is meaningful for a NAN,
> but if it is, your approach is constraining what is possible to store,
> and is also breaking compatibility with old databases.

There is only one NaN value. Neither weight or dscale is meaningful.
I think if the high two bits of the first word are 11 we never examine
anything else - do you see somewhere that we're doing otherwise?

> Also, I wonder whether you can do anything with depending on the actual
> bit values of the flag bits --- specifically, it's short header format
> iff first bit is set.  The NUMERIC_HEADER_SIZE macro in particular could
> be made more efficient with that.

Right, OK.

> The sign extension code in the NUMERIC_WEIGHT() macro seems a bit
> awkward; I wonder if there's a better way.  One solution might be to
> offset the value (ie, add or subtract NUMERIC_SHORT_WEIGHT_MIN) rather
> than try to sign-extend per se.

Hmm... so, if the weight is X we store the value
X-NUMERIC_SHORT_WEIGHT_MIN as an unsigned integer? That's kind of a
funny representation - I *think* it works out to sign extension with
the high bit flipped. I guess we could do it that way, but it might
make it harder/more confusing to do bit arithmetic with the weight
sign bit later on.

> Please do NOT commit this:
>
>                                (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> !                                errmsg("value overflows numeric format %x w=%d s=%u",
> !                                       result->n_sign_dscale,
> !                                       NUMERIC_WEIGHT(result), NUMERIC_DSCALE(result))));
>
> or at least hide it in "#ifdef DEBUG_NUMERIC" or some such.

Woopsie. That's debugging leftovers, sorry.

> Other than that the code changes look pretty clean, I'm mostly just
> dissatisfied with the access macros.

Thanks for the review.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-28 19:00:46
Message-ID: 20885.1280343646@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ gradually catching up on email ]

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Jul 16, 2010 at 2:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I don't like the way you did that either (specifically, not the kluge
>> in NUMERIC_DIGITS()). It would probably work better if you declared
>> two different structs, or a union of same, to represent the two layout
>> cases.
>>
>> n_sign_dscale is now pretty inappropriately named, probably better to
>> change the field name. This will also help to catch anything that's
>> not using the macros. (Renaming the n_weight field, or at least burying
>> it in an extra level of struct, would be helpful for the same reason.)

> I'm not sure what you have in mind here. If we create a union of two
> structs, we'll still have to pick one of them to use to check the high
> bits of the first word, so I'm not sure we'll be adding all that much
> in terms of clarity.

No, you can do something like this:

typedef struct numeric_short
{
uint16 word1;
NumericDigit digits[1];
} numeric_short;

typedef struct numeric_long
{
uint16 word1;
int16 weight;
NumericDigit digits[1];
} numeric_long;

typedef union numeric
{
uint16 word1;
numeric_short short;
numeric_long long;
} numeric;

and then access word1 either directly or (after having identified which
format it is) via one of the sub-structs. If you really wanted to get
pedantic you could have a third sub-struct representing the format for
NaNs, but since those are just going to be word1 it may not be worth the
trouble.

>> It seems like you've handled the NAN case a bit awkwardly. Since the
>> weight is uninteresting for a NAN, it's okay to not store the weight
>> field, so I think what you should do is consider that the dscale field
>> is still full-width, ie the format of the first word remains old-style
>> not new-style. I don't remember whether dscale is meaningful for a NAN,
>> but if it is, your approach is constraining what is possible to store,
>> and is also breaking compatibility with old databases.

> There is only one NaN value. Neither weight or dscale is meaningful.
> I think if the high two bits of the first word are 11 we never examine
> anything else - do you see somewhere that we're doing otherwise?

I hadn't actually looked. I think though that it's a mistake to break
compatibility on both dscale and weight when you only need to break one.
Also, weight is *certainly* uninteresting for NaNs since it's not even
meaningful unless there are digits. dscale could conceivably be worth
something.

>> The sign extension code in the NUMERIC_WEIGHT() macro seems a bit
>> awkward; I wonder if there's a better way. One solution might be to
>> offset the value (ie, add or subtract NUMERIC_SHORT_WEIGHT_MIN) rather
>> than try to sign-extend per se.

> Hmm... so, if the weight is X we store the value
> X-NUMERIC_SHORT_WEIGHT_MIN as an unsigned integer? That's kind of a
> funny representation - I *think* it works out to sign extension with
> the high bit flipped. I guess we could do it that way, but it might
> make it harder/more confusing to do bit arithmetic with the weight
> sign bit later on.

Yeah, it was just an idea. It seems like there should be an easier way
to extract the sign-extended value, though.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-29 17:04:55
Message-ID: AANLkTindFE6p5owv7ZARW222xgistX-QqVLEWk6UjY28@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Jul 28, 2010 at 3:00 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Jul 16, 2010 at 2:39 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> I don't like the way you did that either (specifically, not the kluge
>>> in NUMERIC_DIGITS()).  It would probably work better if you declared
>>> two different structs, or a union of same, to represent the two layout
>>> cases.
>>>
>>> n_sign_dscale is now pretty inappropriately named, probably better to
>>> change the field name.  This will also help to catch anything that's
>>> not using the macros.  (Renaming the n_weight field, or at least burying
>>> it in an extra level of struct, would be helpful for the same reason.)
>
>> I'm not sure what you have in mind here.  If we create a union of two
>> structs, we'll still have to pick one of them to use to check the high
>> bits of the first word, so I'm not sure we'll be adding all that much
>> in terms of clarity.
>
> No, you can do something like this:
>
> typedef struct numeric_short
> {
>        uint16  word1;
>        NumericDigit digits[1];
> } numeric_short;
>
> typedef struct numeric_long
> {
>        uint16  word1;
>        int16   weight;
>        NumericDigit digits[1];
> } numeric_long;
>
> typedef union numeric
> {
>        uint16  word1;
>        numeric_short   short;
>        numeric_long    long;
> } numeric;

That doesn't quite work because there's also a varlena header that has
to be accounted for, so the third member of the union can't be a
simple uint16. I'm wondering if it makes sense to do something along
these lines:

typedef struct NumericData
{
int32 varlen;
int16 n_header;
union {
struct {
char n_data[1];
} short;
struct {
uint16 n_weight;
char n_data[1];
} long;
};
} NumericData;

Why n_data as char[1] instead of NumericDigit, you ask? It's that way
now, mostly I think so that the rest of the system isn't allowed to
know what underlying type is being used for NumericDigit; it looks
like previously it was signed char, but now it's int16.

>>> It seems like you've handled the NAN case a bit awkwardly.  Since the
>>> weight is uninteresting for a NAN, it's okay to not store the weight
>>> field, so I think what you should do is consider that the dscale field
>>> is still full-width, ie the format of the first word remains old-style
>>> not new-style.  I don't remember whether dscale is meaningful for a NAN,
>>> but if it is, your approach is constraining what is possible to store,
>>> and is also breaking compatibility with old databases.
>
>> There is only one NaN value.  Neither weight or dscale is meaningful.
>> I think if the high two bits of the first word are 11 we never examine
>> anything else - do you see somewhere that we're doing otherwise?
>
> I hadn't actually looked.  I think though that it's a mistake to break
> compatibility on both dscale and weight when you only need to break one.
> Also, weight is *certainly* uninteresting for NaNs since it's not even
> meaningful unless there are digits.  dscale could conceivably be worth
> something.

I don't think I'm breaking compatibility on anything. Can you clarify
what part of the code you're referring to here? I'm sort of lost.

>>> The sign extension code in the NUMERIC_WEIGHT() macro seems a bit
>>> awkward; I wonder if there's a better way.  One solution might be to
>>> offset the value (ie, add or subtract NUMERIC_SHORT_WEIGHT_MIN) rather
>>> than try to sign-extend per se.
>
>> Hmm... so, if the weight is X we store the value
>> X-NUMERIC_SHORT_WEIGHT_MIN as an unsigned integer?  That's kind of a
>> funny representation - I *think* it works out to sign extension with
>> the high bit flipped.  I guess we could do it that way, but it might
>> make it harder/more confusing to do bit arithmetic with the weight
>> sign bit later on.
>
> Yeah, it was just an idea.  It seems like there should be an easier way
> to extract the sign-extended value, though.

It seemed a bit awkward to me, too, but I'm not sure there's a better one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-29 17:20:06
Message-ID: 27621.1280424006@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Wed, Jul 28, 2010 at 3:00 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> No, you can do something like this:
>>
>> typedef union numeric
>> {
>> uint16 word1;
>> numeric_short short;
>> numeric_long long;
>> } numeric;

> That doesn't quite work because there's also a varlena header that has
> to be accounted for, so the third member of the union can't be a
> simple uint16.

Yeah, you would need an additional layer of struct to represent the
numeric with a length word in front of it. I think this is not
necessarily bad because it would perhaps open the door to working
directly with short-varlena-header values, which is never going to
be possible with this:

> typedef struct NumericData
> {
> int32 varlen;
> int16 n_header;
> union { ...

OTOH alignment considerations may make that idea hopeless anyway.

> Why n_data as char[1] instead of NumericDigit, you ask?

Yes, we'd have to export NumericDigit if we wanted to declare these
structs "properly" in numeric.h. I wonder if that decision should
be revisited. I'd lean to making the whole struct local to numeric.c
though. Is there anyplace else that really ought to see it?

>> I hadn't actually looked. I think though that it's a mistake to break
>> compatibility on both dscale and weight when you only need to break one.
>> Also, weight is *certainly* uninteresting for NaNs since it's not even
>> meaningful unless there are digits. dscale could conceivably be worth
>> something.

> I don't think I'm breaking compatibility on anything. Can you clarify
> what part of the code you're referring to here? I'm sort of lost.

On-disk is what I'm thinking about. Right now, a NaN's first word is
all dscale except the sign bits. You're proposing to change that
but I think it's unnecessary to do so. If we do it the way I'm
thinking, dscale would still mean the same in a NaN, and we'd simply
be ignoring the weight field (which might or might not be there
physically).

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-29 18:35:12
Message-ID: AANLkTin1tGEVWwJ3CZD8iN5N6JWXVzPC_aiuKa8grbaz@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 29, 2010 at 1:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Yeah, you would need an additional layer of struct to represent the
> numeric with a length word in front of it.  I think this is not
> necessarily bad because it would perhaps open the door to working
> directly with short-varlena-header values, which is never going to
> be possible with this:
>
>> typedef struct NumericData
>> {
>>     int32           varlen;
>>     int16           n_header;
>>     union { ...
>
> OTOH alignment considerations may make that idea hopeless anyway.

My understanding of our alignment rules for on-disk storage is still a
bit fuzzy, but as I understand it we don't align varlenas. So
presumably if we get a pointer directly into a disk block, the first
byte might happen to be not aligned, which would make the rest of the
structure aligned; and from playing around with the system, it looks
like if we get a value from anywhere else it's typically using the
4-byte varlena header. So it seems like it might be possible to write
code that aligns the data only if needed and otherwise skips a
palloc-and-copy cycle. I'm not totally sure that would be a win, but
it could be. Actually, I had a thought that it might be even more of
a win if you added a flag to the NumericVar representation indicating
whether the digit array was palloc'd or from the original tuple. Then
you might be able to avoid TWO palloc-and-copy cycles, although at the
price of a fairly significant code restructuring.

Which is a long-winded way of saying - it's probably not hopeless.

>> Why n_data as char[1] instead of NumericDigit, you ask?
>
> Yes, we'd have to export NumericDigit if we wanted to declare these
> structs "properly" in numeric.h.  I wonder if that decision should
> be revisited.  I'd lean to making the whole struct local to numeric.c
> though.  Is there anyplace else that really ought to see it?

Probably not. btree_gist is using it, but that's it, at least as far
as our tree is concerned. Attached please find a patch to make the
numeric representation private and add a convenience function
numeric_is_nan() for the benefit of btree_gist. If this looks sane,
I'll go ahead and commit it, which will simplify review of the main
patch once I rebase it over these changes.

>>> I hadn't actually looked.  I think though that it's a mistake to break
>>> compatibility on both dscale and weight when you only need to break one.
>>> Also, weight is *certainly* uninteresting for NaNs since it's not even
>>> meaningful unless there are digits.  dscale could conceivably be worth
>>> something.
>
>> I don't think I'm breaking compatibility on anything.  Can you clarify
>> what part of the code you're referring to here?  I'm sort of lost.
>
> On-disk is what I'm thinking about.  Right now, a NaN's first word is
> all dscale except the sign bits.  You're proposing to change that
> but I think it's unnecessary to do so.

*Where* am I proposing this? The new versions of NUMERIC_WEIGHT() and
NUMERIC_DSCALE() determine where to look for the bits in question
using NUMERIC_IS_SHORT(), which just tests NUMERIC_FLAGBITS(n) ==
NUMERIC_SHORT. There's nothing in there about the NaN case at all.
Even if there were, it's irrelevant because those bits are never
examined and, as far as I can tell, will always be zero barring a
cosmic ray hit. But even if they WERE examined, I don't see where I'm
changing the interpretation of them; in fact, I think I'm very
explicitly NOT doing that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment Content-Type Size
make_numericdata_private.patch application/octet-stream 4.0 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-29 20:37:09
Message-ID: 10760.1280435829@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Thu, Jul 29, 2010 at 1:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> On-disk is what I'm thinking about. Right now, a NaN's first word is
>> all dscale except the sign bits. You're proposing to change that
>> but I think it's unnecessary to do so.

> *Where* am I proposing this?

Um, your patch has the comment

! * If the high bits of n_scale_dscale are NUMERIC_NAN, the two-byte header
! * format is also used, but the low bits of n_scale_dscale are discarded in
! * this case.

but now that I look a bit more closely, I don't think that's what the
code is doing. You've got the NUMERIC_DSCALE and NUMERIC_WEIGHT access
macros testing specifically for NUMERIC_IS_SHORT, not for high-bit-set
which I think is what I was assuming they'd do. So actually that code
is good as is: a NAN still has the old header format. It's just the
comment that's wrong.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-29 20:45:03
Message-ID: AANLkTi=2BJ3zsLEjAUMH_-5Qpf5O_wkb=mH4Hm3GGieS@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 29, 2010 at 4:37 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Jul 29, 2010 at 1:20 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> On-disk is what I'm thinking about.  Right now, a NaN's first word is
>>> all dscale except the sign bits.  You're proposing to change that
>>> but I think it's unnecessary to do so.
>
>> *Where* am I proposing this?
>
> Um, your patch has the comment
>
> !  * If the high bits of n_scale_dscale are NUMERIC_NAN, the two-byte header
> !  * format is also used, but the low bits of n_scale_dscale are discarded in
> !  * this case.
>
> but now that I look a bit more closely, I don't think that's what the
> code is doing.  You've got the NUMERIC_DSCALE and NUMERIC_WEIGHT access
> macros testing specifically for NUMERIC_IS_SHORT, not for high-bit-set
> which I think is what I was assuming they'd do.  So actually that code
> is good as is: a NAN still has the old header format.  It's just the
> comment that's wrong.

OK. I think you're misinterpreting the point of that comment, which
may mean that it needs some clarification. By "the two byte header
format is also used", I think I really meant "the header (and in fact
the entire value) is just 2 bytes". Really, the low order bits have
neither the old interpretation nor the new interpretation: they don't
have any interpretation at all - they're completely meaningless.
That's what the part after the word "but" was intended to clarify.
Every routine in numeric.c checks for NUMERIC_IS_NAN() and gives it
some special handling before doing anything else, so NUMERIC_WEIGHT()
and NUMERIC_DSCALE() are never called in that case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-29 21:03:43
Message-ID: 19479.1280437423@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> OK. I think you're misinterpreting the point of that comment, which
> may mean that it needs some clarification. By "the two byte header
> format is also used", I think I really meant "the header (and in fact
> the entire value) is just 2 bytes". Really, the low order bits have
> neither the old interpretation nor the new interpretation: they don't
> have any interpretation at all - they're completely meaningless.
> That's what the part after the word "but" was intended to clarify.
> Every routine in numeric.c checks for NUMERIC_IS_NAN() and gives it
> some special handling before doing anything else, so NUMERIC_WEIGHT()
> and NUMERIC_DSCALE() are never called in that case.

I would suggest the comment ought to read something like

NaN values also use a two-byte header (in fact, the
whole value is always only two bytes). The low order bits of
the header word are available to store dscale, though dscale
is not currently used with NaNs.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-29 21:25:10
Message-ID: AANLkTinKngSbVfqKi=BG2KHHYSRs6NuyDeoHuVacEo1O@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 29, 2010 at 5:03 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> OK.  I think you're misinterpreting the point of that comment, which
>> may mean that it needs some clarification.  By "the two byte header
>> format is also used", I think I really meant "the header (and in fact
>> the entire value) is just 2 bytes".  Really, the low order bits have
>> neither the old interpretation nor the new interpretation: they don't
>> have any interpretation at all - they're completely meaningless.
>> That's what the part after the word "but" was intended to clarify.
>> Every routine in numeric.c checks for NUMERIC_IS_NAN() and gives it
>> some special handling before doing anything else, so NUMERIC_WEIGHT()
>> and NUMERIC_DSCALE() are never called in that case.
>
> I would suggest the comment ought to read something like
>
>        NaN values also use a two-byte header (in fact, the
>        whole value is always only two bytes).  The low order bits of
>        the header word are available to store dscale, though dscale
>        is not currently used with NaNs.

I can do something along those lines, though I'm reluctant to mention
dscale specifically since we have no agreement that such a thing makes
sense, or is the only/best use for those bits. Did you look at the
patch to move the numeric stuff out of the .h file that I attached a
few emails back? If that looks OK, I can commit it and then redo the
rest of this along the lines we've discussed.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-29 21:35:07
Message-ID: 20134.1280439307@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Did you look at the
> patch to move the numeric stuff out of the .h file that I attached a
> few emails back? If that looks OK, I can commit it and then redo the
> rest of this along the lines we've discussed.

A couple of thoughts:

* It'd be good to get NUMERIC_HDRSZ out of there too, especially since
it'll have only the shakiest connection to reality after this patch goes in.
It looks like doing that would only require modifying type_maximum_size().
I'd suggest inventing another small function along the lines of
int32 numeric_maximum_size(int32 typemod)
so that the NUMERIC-specific knowledge can be pulled out of format_type.c.

* I'd suggest leaving a comment along the lines of
/* The actual contents of Numeric are private to numeric.c */
with the now-opaque typedef for Numeric.

Otherwise +1.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-30 04:36:26
Message-ID: AANLkTimkRXT5d7RCAWpazio4YB3VVXpFrJm8sS5pif7E@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Thu, Jul 29, 2010 at 5:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> Did you look at the
>> patch to move the numeric stuff out of the .h file that I attached a
>> few emails back?  If that looks OK, I can commit it and then redo the
>> rest of this along the lines we've discussed.
>
> A couple of thoughts:
>
> * It'd be good to get NUMERIC_HDRSZ out of there too, especially since
> it'll have only the shakiest connection to reality after this patch goes in.
> It looks like doing that would only require modifying type_maximum_size().
> I'd suggest inventing another small function along the lines of
>        int32 numeric_maximum_size(int32 typemod)
> so that the NUMERIC-specific knowledge can be pulled out of format_type.c.
>
> * I'd suggest leaving a comment along the lines of
>        /* The actual contents of Numeric are private to numeric.c */
> with the now-opaque typedef for Numeric.
>
> Otherwise +1.

Done, with those changes.

But, looking at it a bit more carefully, isn't the maximum-size logic
for numeric rather bogus?

rhaas=# \d n
Table "public.n"
Column | Type | Modifiers
--------+--------------+-----------
a | numeric(2,1) |

rhaas=# select atttypmod from pg_attribute where attrelid =
'n'::regclass and attname = 'a';
atttypmod
-----------
131077
(1 row)

(gdb) p numeric_maximum_size(131077)
$1 = 9

rhaas=# select a, pg_column_size(a),
pg_column_size(a::text::numeric(2,1)) from n;
a | pg_column_size | pg_column_size
-----+----------------+----------------
1.1 | 9 | 12
(1 row)

i.e. According to the max-size logic, the ostensible maximum size of a
numeric(2,1) is 9 bytes, but in fact the real maximum is 12 bytes = 4
byte varlena header + 2 bytes for sign/dscale + 2 bytes for weight +
(2 NumericDigits * 2 bytes/digit).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-30 05:13:53
Message-ID: 16754.1280466833@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> But, looking at it a bit more carefully, isn't the maximum-size logic
> for numeric rather bogus?

Perhaps, but I think you're confused on at least one point.
numeric(2,1) has to be able to hold 2 decimal digits, not 2
NumericDigits (which'd actually be 8 decimal digits given
the current code).

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-30 11:22:51
Message-ID: AANLkTimRX+XQRH4fdjrA=8cvbuR5GQom+w4=hh7z90JY@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 30, 2010 at 1:13 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> But, looking at it a bit more carefully, isn't the maximum-size logic
>> for numeric rather bogus?
>
> Perhaps, but I think you're confused on at least one point.
> numeric(2,1) has to be able to hold 2 decimal digits, not 2
> NumericDigits (which'd actually be 8 decimal digits given
> the current code).

I get that. The point is: if one of those 2 decimal digits is before
the decimal point and the other is after it, then two NumericDigits
will be used. The value '11'::numeric is only size 10 (untoasted),
but the value '1.1'::numeric is size 12 (untoasted).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-30 13:16:45
Message-ID: 24562.1280495805@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Jul 30, 2010 at 1:13 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Perhaps, but I think you're confused on at least one point.
>> numeric(2,1) has to be able to hold 2 decimal digits, not 2
>> NumericDigits (which'd actually be 8 decimal digits given
>> the current code).

> I get that. The point is: if one of those 2 decimal digits is before
> the decimal point and the other is after it, then two NumericDigits
> will be used.

Ah, I see. Maybe we should allow for one more NumericDigit in the
calculation for such cases. I guess you could look at the scale too
to detect if the case is possible, but not sure it's worth the trouble.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-30 17:42:04
Message-ID: AANLkTinHRCH7h+rY1nba3yV2Fz-cjiJQ77xVzzZRC+sh@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 30, 2010 at 9:16 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Jul 30, 2010 at 1:13 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Perhaps, but I think you're confused on at least one point.
>>> numeric(2,1) has to be able to hold 2 decimal digits, not 2
>>> NumericDigits (which'd actually be 8 decimal digits given
>>> the current code).
>
>> I get that.  The point is: if one of those 2 decimal digits is before
>> the decimal point and the other is after it, then two NumericDigits
>> will be used.
>
> Ah, I see.  Maybe we should allow for one more NumericDigit in the
> calculation for such cases.  I guess you could look at the scale too
> to detect if the case is possible, but not sure it's worth the trouble.

Since this just needs to be a reasonable upper bound, probably not.

Another small but related issue is this comment is out-of-date:

/* Numeric stores 2 decimal digits/byte, plus header */
return (precision + 1) / 2 + NUMERIC_HDRSZ;

Currently, numeric stores 4 decimal digits/2 bytes, which is not quite
the same thing. I think that for clarity it would make sense to break
this calculation in two parts. First, estimate the number of
NumericDigits that could be needed; then, estimate the amount of space
that could be needed to store them. Maybe something like this,
obviously with a suitable comment which I haven't written yet:

numeric_digits = (precision + 6) / 4;
return (numeric_digits * sizeof(int16)) + NUMERIC_HDRSZ;

The first line would be incorrect for precision 0 (it would produce 1,
not 0) but precision 0 isn't allowed anyway, and overestimating just
that one case would be OK even if it did. As far as I can see, it's
correct for all higher values. If you have 1 decimal digit, you can't
need more than one NumericDigit, but if you have 2 decimal digits, you
can need one for each side of the decimal point. But you can't manage
to get a third NumericDigit until you get up to 6 decimal digits,
because with only 5 you can split them 1-4 or 3-2 across the decimal
point, but getting to a second NumericDigit on either side of the
decimal point uses up all 5 digits, leaving nothing for the other
side. Similarly, to get a fourth NumericDigit, you have to have
enough decimal digits to split them 1-4-4-1 (with the decimal point
somewhere in the middle), so 10 is the minimum.

Another question here is whether we should just fix this in CVS HEAD,
or whether there's any reason to back-patch it. I can't see much
reason to back-patch, unless there's third-party code depending on
this for something more than what we use it for. The only callers of
type_maximum_size are get_typavgwidth(), which doesn't need to be
exact, and needs_toast_table(). So it seems that the worst thing that
could happen as a result of this problem is that we might fail to
create a toast table when one is needed, and even that seems extremely
unlikely. First, you'd need to have a table containing no text or
unlimited-length varchar columns, because those will force toast table
creation anyway. Second, numerics are typically going to be stored
with a short varlena header on disk, so a 2-byte underestimate of the
max size is going to be swamped by a 3-byte savings on the varlena
header. Third, even if you can find a case where the above doesn't
matter, it's not that hard to force a toast table to get created.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-30 18:08:19
Message-ID: 19060.1280513299@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> .... Maybe something like this,
> obviously with a suitable comment which I haven't written yet:

> numeric_digits = (precision + 6) / 4;
> return (numeric_digits * sizeof(int16)) + NUMERIC_HDRSZ;

This is OK for the base-10K case, but there's still code in there
for the base-10 and base-100 cases. Can you express this logic in
terms of DEC_DIGITS and sizeof(NumericDigit) ? I think you might
find it was actually clearer that way, cf Polya.

regards, tom lane


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-30 18:11:05
Message-ID: 19122.1280513465@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

[ forgot to answer this part ]

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> Another question here is whether we should just fix this in CVS HEAD,
> or whether there's any reason to back-patch it.

Agreed, fixing it in HEAD seems sufficient.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-07-31 01:55:45
Message-ID: AANLkTikjqXc2rX6GuG1tKhkM+=zGXZ7MaCOEzWX2sJ9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 30, 2010 at 2:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> ....  Maybe something like this,
>> obviously with a suitable comment which I haven't written yet:
>
>>     numeric_digits = (precision + 6) / 4;
>>     return (numeric_digits * sizeof(int16)) + NUMERIC_HDRSZ;
>
> This is OK for the base-10K case, but there's still code in there
> for the base-10 and base-100 cases.  Can you express this logic in
> terms of DEC_DIGITS and sizeof(NumericDigit) ?  I think you might
> find it was actually clearer that way, cf Polya.

It appears to work out to:

numeric_digits = (precision + 2 * (DEC_DIGITS - 1)) / DEC_DIGITS
return (numeric_digits * sizeof(NumericDigits)) + NUMERIC_HDRSZ;

The smallest value for precision which requires 2 numeric_digits is
always 2; and the required number of numeric_digits increases by 1
each time the number of base-10 digits increases by DEC_DIGITS.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-08-03 23:52:18
Message-ID: AANLkTim5iZanHc6AKc7MUg52Cz92wSdX5LW+by4KVNXJ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Fri, Jul 30, 2010 at 9:55 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jul 30, 2010 at 2:08 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> ....  Maybe something like this,
>>> obviously with a suitable comment which I haven't written yet:
>>
>>>     numeric_digits = (precision + 6) / 4;
>>>     return (numeric_digits * sizeof(int16)) + NUMERIC_HDRSZ;
>>
>> This is OK for the base-10K case, but there's still code in there
>> for the base-10 and base-100 cases.  Can you express this logic in
>> terms of DEC_DIGITS and sizeof(NumericDigit) ?  I think you might
>> find it was actually clearer that way, cf Polya.
>
> It appears to work out to:
>
>    numeric_digits = (precision + 2 * (DEC_DIGITS - 1)) / DEC_DIGITS
>    return (numeric_digits * sizeof(NumericDigits)) + NUMERIC_HDRSZ;
>
> The smallest value for precision which requires 2 numeric_digits is
> always 2; and the required number of numeric_digits increases by 1
> each time the number of base-10 digits increases by DEC_DIGITS.

And here is a patch implementing that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

Attachment Content-Type Size
numeric_maximum_size.patch application/octet-stream 1.6 KB

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-08-04 15:05:42
Message-ID: 18977.1280934342@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Fri, Jul 30, 2010 at 9:55 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> The smallest value for precision which requires 2 numeric_digits is
>> always 2; and the required number of numeric_digits increases by 1
>> each time the number of base-10 digits increases by DEC_DIGITS.

> And here is a patch implementing that.

Looks good to me.

One thought --- to make this look more like the typmod-whacking in
the numeric() function, perhaps the first line of numeric_maximum_size
ought to be

if (typemod <= (int32) (VARHDRSZ))
return -1;

I think the author of numeric() was concerned that VARHDRSZ might be
unsigned, in which case the cast-less comparison would do the Wrong
Thing. Reference to c.h shows that it's signed, so no bug, but having
the cast in the comparison seems like good conservative programming.

Or, if you prefer, you can take out the cast in numeric(). I merely
opine that the two bits of code should match.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-08-04 16:49:49
Message-ID: AANLkTin4bO8dKBmWGtSeKFBR2V4ZxgjvcK+H4mKsEmgb@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 4, 2010 at 11:05 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Fri, Jul 30, 2010 at 9:55 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>>> The smallest value for precision which requires 2 numeric_digits is
>>> always 2; and the required number of numeric_digits increases by 1
>>> each time the number of base-10 digits increases by DEC_DIGITS.
>
>> And here is a patch implementing that.
>
> Looks good to me.
>
> One thought --- to make this look more like the typmod-whacking in
> the numeric() function, perhaps the first line of numeric_maximum_size
> ought to be
>
>        if (typemod <= (int32) (VARHDRSZ))
>                return -1;
>
> I think the author of numeric() was concerned that VARHDRSZ might be
> unsigned, in which case the cast-less comparison would do the Wrong
> Thing.  Reference to c.h shows that it's signed, so no bug, but having
> the cast in the comparison seems like good conservative programming.
>
> Or, if you prefer, you can take out the cast in numeric().  I merely
> opine that the two bits of code should match.

*scratches head*

One of those tests uses < and the other uses <=

Which one is wrong?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-08-04 16:55:22
Message-ID: 21484.1280940922@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> *scratches head*

> One of those tests uses < and the other uses <=

> Which one is wrong?

Well, neither, since NUMERIC(0,0) is disallowed. However, it's probably
not the place of these functions to assume that, so I'd suggest treating
equality as valid.

regards, tom lane


From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: reducing NUMERIC size for 9.1
Date: 2010-08-04 17:35:10
Message-ID: AANLkTinLTOANWzZ=jG23UTMWcdq0SxyyzA2Yfwiw+TDk@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Wed, Aug 4, 2010 at 12:55 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> *scratches head*
>
>> One of those tests uses < and the other uses <=
>
>> Which one is wrong?
>
> Well, neither, since NUMERIC(0,0) is disallowed.  However, it's probably
> not the place of these functions to assume that, so I'd suggest treating
> equality as valid.

OK, done. I renamed the argument from typemod to typmod so the tests
would all match byte-for-byte.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company