About numeric division again

Lists: pgsql-hackers
From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: About numeric division again
Date: 2008-04-03 18:32:05
Message-ID: 6232.1207247525@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

One of the items on the commit-fest list is my patch from last year
to rewrite the numeric division operator using "schoolbook division":
http://archives.postgresql.org/pgsql-patches/2007-06/msg00173.php

The code that's currently in there sometimes has to propagate rounding
to the left, meaning that you can never be certain whether all of the
digits you have so far are good, and that means that it can sometimes
generate an incorrect truncated output. This leads to the bugs cited
in the above message.

The reason I didn't just commit it last year is that I was dissatisfied
with the speed penalty --- on very long inputs (dozens or hundreds of
digits) division is about 4X slower than with our existing code.
However, no one has come up with a better answer; and as a wise man once
said, "I can make this program arbitrarily fast, if it doesn't have to
give the right answer". Correctness has to trump speed.

One thing that occurs to me is that we could keep the existing
"approximate division" code in there too, and use it internally in the
transcendental function implementations. Those are not particularly
interested in getting exact truncated results, and they are the worst
case for the speed penalty because they do lots of divisions on values
that are likely to be long. However this idea could fairly be charged
with being code bloat.

Comments?

Also, there was some discussion of providing a SQL-level numeric
"integer division" operator or function, that is the equivalent of
trunc(x/y) except faster (since it'd not need to compute fractional
digits that would then be thrown away). Is this worth doing, and if so
what should we call it exactly? The amount of new code needed should
be pretty small (just an interface function), so I'm willing to take
care of it if we want one.

regards, tom lane


From: Gregory Stark <stark(at)enterprisedb(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: About numeric division again
Date: 2008-04-04 00:00:11
Message-ID: 8763uyh4l0.fsf@oxford.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

"Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:

> One of the items on the commit-fest list is my patch from last year
> to rewrite the numeric division operator using "schoolbook division":
> http://archives.postgresql.org/pgsql-patches/2007-06/msg00173.php
>
> The code that's currently in there sometimes has to propagate rounding
> to the left, meaning that you can never be certain whether all of the
> digits you have so far are good, and that means that it can sometimes
> generate an incorrect truncated output. This leads to the bugs cited
> in the above message.

The case I looked into could be traced specifically to the fact that it
calculates an intermediate value which is the reciprocal of the divisor and
multiplies by that. The case where the digits were inaccurate was a case where
there reciprocal wasn't representable and multiplying by the reciprocal didn't
produce the same value as dividing.

I assume you're right about there being bigger problems but I don't follow how
the division is actually being done in enough detail to judge that for my
self.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's 24x7 Postgres support!


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: About numeric division again
Date: 2008-04-04 01:04:23
Message-ID: 29619.1207271063@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Gregory Stark <stark(at)enterprisedb(dot)com> writes:
> "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>> The code that's currently in there sometimes has to propagate rounding
>> to the left, meaning that you can never be certain whether all of the
>> digits you have so far are good, and that means that it can sometimes
>> generate an incorrect truncated output. This leads to the bugs cited
>> in the above message.

> The case I looked into could be traced specifically to the fact that it
> calculates an intermediate value which is the reciprocal of the divisor and
> multiplies by that.

Perhaps, but no matter how you slice it, that algorithm sometimes needs
to go back and "fix" digits it previously emitted. That's okay for
approximate answers and not okay for exact ones, because you never
know if you're done fixing the last digit you need to return.

I fooled around with the idea of using the existing fast division code
in the sqrt/exp/log/power functions, and got these timings for the
numeric_big regression test:

Xeon HPPA

CVS HEAD 5.446s 2m10.97s
Knuth division only 11.574s 3m9.29s
old code for trans. fns 5.477s 2m15.97s

The HPPA is an old machine with excruciatingly slow integer division,
so it's probably about the worst case for the Knuth-based code.

Now you can certainly object that numeric_big might not be
representative of real-world applications, but these results seem to
me to justify implementing both functions. wc says that numeric.c is

5457 15914 122662 CVS HEAD
5466 16050 123401 Knuth only
5732 17202 130764 both

A couple hundred more lines seems like an acceptable price.

Also, I remembered the reason for proposing an explicit "integer
division" function: you can't always get the right answer from
trunc(x/y). The / operator will round its output at some number of
fractional digits chosen by itself, and it's possible that it rounds
.999... up to the next integer, whereupon your trunc() result is simply
wrong. (Indeed this is just the same problem at the SQL level as mod
was facing at the C level in one of the bugs that started this
discussion.) So if you want to do any high-precision integer math,
this is really not a negotiable feature.

So IMHO the only question left is what to call it at the SQL level.
A few options:

Expose it as a function:

div(x, y)
quotient(x, y)

Expose it as an operator:

x // y

There are probably better ideas I haven't thought of. Comments?

regards, tom lane