Re: Slaying the HYPOTamus

Lists: pgsql-hackers
From: Paul Matthews <plm(at)netspace(dot)net(dot)au>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-23 07:00:50
Message-ID: 4A90E922.408@netspace.net.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Tom Lane wrote:

> Greg Stark <gsstark(at)mit(dot)edu> writes:
>
>> If there's a performance advantage then we could add a configure test
>> and define the macro to call hypot(). You said it existed before C99
>> though, how widespread was it? If it's in all the platforms we support
>> it might be reasonable to just go with it.
>>
>
> For one data point, I see hypot() in HPUX 10.20, released circa 1996.
> I suspect we would want a configure test and a substitute function
> anyway. Personally I wouldn't have a problem with the substitute being
> the naive sqrt(x*x+y*y), particularly if it's replacing existing code
> that overflows in the same places.
>
> regards, tom lane
>
>

A hypot() substitute might look something like this psudo-code, this is
how Python does it if the real hypot() is missing.

double hypot( double dx, double dy )
{
double yx;

if( isinf(dx) || ifinf(dy) ) {
return INFINITY;
}

dx = fabs(dx);
dy = fabs(dy);
if (dx < dy) {
double temp = dx;
dx = dy;
dy = temp;
}
if (x == 0.)
return 0.;
else {
yx = dy/dx;
return dx*sqrt(1.0+yx*yx);
}
}

As the following link shows, a lot of care could be put into getting a
substitute hypot() correct.
http://gforge.inria.fr/plugins/scmsvn/viewcvs.php/trunk/hypot.c?rev=5677&root=mpfr&view=markup


From: Paul Matthews <plm(at)netspace(dot)net(dot)au>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-24 13:14:19
Message-ID: 4A92922B.1040100@netspace.net.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

This is to go with the hypot() patch I posted the other day.

As other code, such as found in adt/float.c and adt/numeric.c, simply
assumes that isnan() exists, despite it being a C99 function, I have
assumed the same.

The below code should be placed into a file called src/port/hypot.c.

Unfortunately I do not know how to drive configure and all the other
associated build magics. Could some kind soul please implemented that
portion. (Or shove me in the right direction)

#include <math.h>
#include "c.h"
#include "utils/builtins.h"

/*
* Find the hypotenuse. Firstly x and y are swapped, if required, to make
* x the larger number. The traditional formulae of x^2+y^2 is rearranged
* to bring x outside the sqrt. This allows computation of the hypotenuse
* for much larger magnitudes than otherwise normally possible.
*
* sqrt( x^2 + y^2 ) = sqrt( x^2( 1 + y^2/x^2) )
* = x * sqrt( 1 + y^2/x^2 )
* = x * sqrt( 1 + y/x * y/x )
*/
double hypot( double x, double y )
{
double yx;

if( isinf(x) || isinf(y) )
return get_float8_infinity();

if( isnan(x) || isnan(y) )
return get_float8_nan();

x = fabs(x);
y = fabs(y);
if (x < y) {
double temp = x;
x = y;
y = temp;
}
if (x == 0.0)
return 0.0;
else {
yx = y/x;
return x*sqrt(1.0+yx*yx);
}
}


From: David Fetter <david(at)fetter(dot)org>
To: Paul Matthews <plm(at)netspace(dot)net(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-24 14:07:13
Message-ID: 20090824140713.GC5896@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 24, 2009 at 11:14:19PM +1000, Paul Matthews wrote:
> This is to go with the hypot() patch I posted the other day.
>
> As other code, such as found in adt/float.c and adt/numeric.c, simply
> assumes that isnan() exists, despite it being a C99 function, I have
> assumed the same.
>
> The below code should be placed into a file called src/port/hypot.c.
>
> Unfortunately I do not know how to drive configure and all the other
> associated build magics. Could some kind soul please implemented that
> portion. (Or shove me in the right direction)

Comments below :)

> #include <math.h>
> #include "c.h"
> #include "utils/builtins.h"
>
> /*
> * Find the hypotenuse. Firstly x and y are swapped, if required, to make
> * x the larger number. The traditional formulae of x^2+y^2 is rearranged
> * to bring x outside the sqrt. This allows computation of the hypotenuse
> * for much larger magnitudes than otherwise normally possible.
> *
> * sqrt( x^2 + y^2 ) = sqrt( x^2( 1 + y^2/x^2) )
> * = x * sqrt( 1 + y^2/x^2 )
> * = x * sqrt( 1 + y/x * y/x )
> */
> double hypot( double x, double y )
> {
> double yx;
>
> if( isinf(x) || isinf(y) )
> return get_float8_infinity();
>
> if( isnan(x) || isnan(y) )
> return get_float8_nan();
>
> x = fabs(x);
> y = fabs(y);
>
> if (x < y) {
> double temp = x;
> x = y;
> y = temp;
> }

How about putting:

if (x >= 0.0 && y == 0.0)
return x;

here?

These next two lines are a teensy bit baroque. Is there some
significant speed increase that would justify them?

> if (x == 0.0)
> return 0.0;
> else {
> yx = y/x;
> return x*sqrt(1.0+yx*yx);
> }
> }

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: "Paul Matthews" <plm(at)netspace(dot)net(dot)au>
Cc: "David Fetter" <david(at)fetter(dot)org>,<pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-24 17:47:42
Message-ID: 4A928BEE0200002500029FF2@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

David Fetter <david(at)fetter(dot)org> wrote:
> On Mon, Aug 24, 2009 at 11:14:19PM +1000, Paul Matthews wrote:

> These next two lines are a teensy bit baroque. Is there some
> significant speed increase that would justify them?
>
>> if (x == 0.0)
>> return 0.0;
>> else {
>> yx = y/x;
>> return x*sqrt(1.0+yx*yx);
>> }
>> }

I think the reason is overflow. From the function comment:

>> * The traditional formulae of x^2+y^2 is rearranged
>> * to bring x outside the sqrt. This allows computation of the
hypotenuse
>> * for much larger magnitudes than otherwise normally possible.

Although I don't see why the first part isn't:

if (y == 0.0)
return x;

-Kevin


From: David Fetter <david(at)fetter(dot)org>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Paul Matthews <plm(at)netspace(dot)net(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-24 17:52:59
Message-ID: 20090824175258.GF5896@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 24, 2009 at 12:47:42PM -0500, Kevin Grittner wrote:
> David Fetter <david(at)fetter(dot)org> wrote:
> > On Mon, Aug 24, 2009 at 11:14:19PM +1000, Paul Matthews wrote:
>
> > These next two lines are a teensy bit baroque. Is there some
> > significant speed increase that would justify them?
> >
> >> if (x == 0.0)
> >> return 0.0;
> >> else {
> >> yx = y/x;
> >> return x*sqrt(1.0+yx*yx);
> >> }
> >> }
>
> I think the reason is overflow. From the function comment:
>
> >> * The traditional formulae of x^2+y^2 is rearranged
> >> * to bring x outside the sqrt. This allows computation of the
> hypotenuse
> >> * for much larger magnitudes than otherwise normally possible.
>
> Although I don't see why the first part isn't:
>
> if (y == 0.0)
> return x;

D'oh!

Good point :)

So the code should read as follows?

#include <math.h>
#include "c.h"
#include "utils/builtins.h"

/*
* Find the hypotenuse. Firstly x and y are swapped, if required, to make
* x the larger number. The traditional formulae of x^2+y^2 is rearranged
* to bring x outside the sqrt. This allows computation of the hypotenuse
* for much larger magnitudes than otherwise normally possible.
*
* sqrt( x^2 + y^2 ) = sqrt( x^2( 1 + y^2/x^2) )
* = x * sqrt( 1 + y^2/x^2 )
* = x * sqrt( 1 + y/x * y/x )
*/
double hypot( double x, double y )
{
double yx;

if( isinf(x) || isinf(y) )
return get_float8_infinity();

if( isnan(x) || isnan(y) )
return get_float8_nan();

x = fabs(x);
y = fabs(y);

if (x < y) {
double temp = x;
x = y;
y = temp;
}
if (y == 0.0)
return x;
yx = y/x;
return x*sqrt(1.0+yx*yx);
}

Cheers,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-24 17:59:38
Message-ID: 20090824175938.GL5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 24, 2009 at 07:07:13AM -0700, David Fetter wrote:
> These next two lines are a teensy bit baroque. Is there some
> significant speed increase that would justify them?

Just noticed with your revised code that the following check:

> On Mon, Aug 24, 2009 at 11:14:19PM +1000, Paul Matthews wrote:
> > if (x == 0.0)
> > return 0.0;
> > else {
> > yx = y/x;

is preventing a divide by zero on the line above. So it's not a
performance hack, it's just allowing it to remain correct as a result of
changing the maths around.

> > return x*sqrt(1.0+yx*yx);
> > }
> > }

--
Sam http://samason.me.uk/


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-24 18:04:56
Message-ID: 20090824180456.GM5407@samason.me.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 24, 2009 at 06:59:38PM +0100, Sam Mason wrote:
> > On Mon, Aug 24, 2009 at 11:14:19PM +1000, Paul Matthews wrote:
> > > if (x == 0.0)
> > > return 0.0;
> > > else {
> > > yx = y/x;
>
> is preventing a divide by zero on the line above. So it's not a
> performance hack, it's just allowing it to remain correct as a result of
> changing the maths around.

I've also just realized why it's safe to return zero here; "y" contains
the smaller number and so if "x" is zero, "y" must be zero as well.

--
Sam http://samason.me.uk/


From: Greg Stark <gsstark(at)mit(dot)edu>
To: David Fetter <david(at)fetter(dot)org>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Paul Matthews <plm(at)netspace(dot)net(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-24 18:37:33
Message-ID: 407d949e0908241137s7bd7f473o873cc52ddcd3e35e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Mon, Aug 24, 2009 at 6:52 PM, David Fetter<david(at)fetter(dot)org> wrote:
> double hypot( double x, double y )
> {
>    double yx;
>
>    if( isinf(x) || isinf(y) )
>    return get_float8_infinity();
>
>    if( isnan(x) || isnan(y) )
>    return get_float8_nan();

For what it's worth though, check out the code in float.c to see how
postgres handles floating point overflows. I'm not sure whether it's
forced by the standard, we discussed and settled on this behaviour, or
the person who wrote it just thought it was a good idea but we may as
well be consistent.

The behaviour we have now is that if you pass Inf or -Inf then we
happily return Inf or -Inf (or whatever the result is) as appropriate.
But if the operation overflows despite being passed reasonable values
then it throws an error.

Afaict hypot() can still overflow even with this new coding if you
have, say, hypot(MAX_FLOAT, MAX_FLOAT) which will return MAX_FLOAT *
sqrt(2). In that case we should throw an error unless either input was
Inf.

Also, the question arises what should be returned for hypot(Inf,NaN)
which your code returns Inf for. Empirically, it seems the normal
floating point behaviour is to return NaN so I think the NaN test
should be first.

Lastly I find the swap kind of confusing and also think it might
perform less well than just having one branch and simple logic in each
branch. This is just a style question that you could see either way
though, the performance difference is probably immeasurable if even if
my guess is right.

so I would suggest:

if (isnan(x) || isnan(y)
return float8_get_nan();
else if (isinf(x) || isinf(y))
return float8_get_inf();
else if (x == 0.0 && y == 0.0)
return 0.0;

x = fabs(x);
y = fabs(y);

if (x > y)
retval = x * sqrt((y/x)*(y/x) + 1);
else
retval = y * sqrt((x/y)*(x/y) + 1);

if (isinf(retval))
ereport(overflow...)

return retval;
}

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Paul Matthews <plm(at)netspace(dot)net(dot)au>
To: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-24 22:38:06
Message-ID: 4A93164E.7010706@netspace.net.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> Also, the question arises what should be returned for hypot(Inf,NaN)
> which your code returns Inf for. Empirically, it seems the normal
> floating point behaviour is to return NaN so I think the NaN test
> should be first.
>
>
See http://www.opengroup.org/onlinepubs/000095399/functions/hypot.html

If /x/ or /y/ is ±Inf, +Inf shall be returned (even if one of /x/ or
/y/ is NaN).
If /x/ or /y/ is NaN, and the other is not ±Inf, a NaN shall be
returned.

Just trying to implement correct C99 behaviour here.


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Paul Matthews <plm(at)netspace(dot)net(dot)au>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-25 00:14:31
Message-ID: 9413.1251159271@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Paul Matthews <plm(at)netspace(dot)net(dot)au> writes:
> Just trying to implement correct C99 behaviour here.

Around here we tend to read the Single Unix Spec before C99, and SUS
saith something different:

http://www.opengroup.org/onlinepubs/007908799/xsh/hypot.html

It would be serious folly for us to suppose that every platform's
version of hypot() behaves exactly the same for these corner cases,
anyway. If you're proposing to write code that would depend on that,
we need to call it something else and not pretend that it's just a
fill-in for platforms that lack hypot() entirely.

regards, tom lane


From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Paul Matthews <plm(at)netspace(dot)net(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-25 01:17:35
Message-ID: 407d949e0908241817w167bc369ua23d22cd68801b45@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

On Tue, Aug 25, 2009 at 1:14 AM, Tom Lane<tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Paul Matthews <plm(at)netspace(dot)net(dot)au> writes:
>> Just trying to implement correct C99 behaviour here.
>
> Around here we tend to read the Single Unix Spec before C99, and SUS
> saith something different:

It doesn't seem to anticipate NaN at all.

Neither of these seems on-point since we're neither implementing SuS
nor a C compiler. In fact we're not even implementing hypot().

We're implementing things like box_distance and point_distance which
as it happens will already be doing earlier arithmetic on the doubles,
so whatever HYPOT() does had better be consistent with that or the
results will be just an inexplicable mishmash.

Incidentally, what on earth does it mean to multiply or divide a
circle by a point?

--
greg
http://mit.edu/~gsstark/resume.pdf


From: Paul Matthews <plm(at)netspace(dot)net(dot)au>
To: Greg Stark <gsstark(at)mit(dot)edu>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Slaying the HYPOTamus
Date: 2009-08-25 10:04:03
Message-ID: 4A93B713.8040905@netspace.net.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-hackers

Greg Stark wrote:
> We're implementing things like box_distance and point_distance which
> as it happens will already be doing earlier arithmetic on the doubles,
> so whatever HYPOT() does had better be consistent with that or the
> results will be just an inexplicable mishmash.
>
>
Let's look at what the HYPOT macro in PostgreSQL does right now:

#define HYPOT(A, B) sqrt((A)*(A)+(B)*(B))

And let's see how it is used in point_distance:

Datum
point_distance(PG_FUNCTION_ARGS)
{
Point *pt1 = PG_GETARG_POINT_P(0);
Point *pt2 = PG_GETARG_POINT_P(1);

PG_RETURN_FLOAT8(HYPOT(pt1->x - pt2->x, pt1->y - pt2->y));
}

Oh. Surprise. It does not look out for NaN's, InF's, overflows,
underflows or anything else. In addition, for the majority of it's input
space, it fails to work correctly at all. If x and y are both 1E200 the
hypotenuse should be 1.4142135e200, well within the range of the double.
However by naively squaring x yields 1E400, outside the range of a
double. Currently HYPOT() returns INFINITY, which is not the correct answer.

I am trying to introduce the hypot() function (where required) and
associated patch, in order to start correcting some of the more obvious
defects with the current geometry handling. Maybe my proposed hypot()
function is not perfect, but it's so far in front of current the HYPOT
macro is not funny.

> Incidentally, what on earth does it mean to multiply or divide a
> circle by a point?
>
>
Basically it's complex math. This comment, from my new geometry
implementation, might help.

/*-------------------------------------------------------------------------
* Additional Operators
*
* The +, -, * and / operators treat IPoint's as complex numbers.
* (This would indicate a requirement for a Complex type?)
*
* (a+bi)+(c+di) = ((a+c)+(b+d)i)
* (a+bi)-(c+di) = ((a-c)+(b-d)i)
* (a+bi)*(c+di) = ac + adi + bci + bdi^2
* = ac + (ad+bc)i - bd # as i^2 = -1
* = ((ac-bd)+(ad+bc)i)
* (a+bi)/(c+di) = (a+bi)(c-di) / (c+di)(c-di)
* = ((ac+bd) + (bc-ad)i) / (c^2+d^2)
* = ((ac + bd)/(c^2+d^2) + ((bc-ad)/(c^2+d^2))i)
*
* translation + IPoint_IPoint_add
* translation - IPoint_IPoint_sub
* multiplication * IPoint_IPoint_mul
* division / IPoint_IPoint_div
*-----------------------------------------------------------------------*/