Quick Links

Re: Current int & float overflow checking is slow.

Lists:	pgsql-hackers

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Current int & float overflow checking is slow.
Date:	2017-10-24 10:39:54
Message-ID:	20171024103954.ztmatprlglz3rwke@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

In analytics queries that involve a large amounts of integers and/or
floats (i.e. a large percentage) it's quite easy to see the functions
underlying the operators in profiles. Partially that's the function call
overhead, but even *after* removing most of that via JITing, they're
surprisingly expensive.

Largely that's due to the overflow checks.

For integers we currently do:

#define SAMESIGN(a,b) (((a) < 0) == ((b) < 0))

/*
* Overflow check. If the inputs are of different signs then their sum
* cannot overflow. If the inputs are of the same sign, their sum had
* better be that sign too.
*/
if (SAMESIGN(arg1, arg2) && !SAMESIGN(result, arg1))
ereport(ERROR,
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
errmsg("integer out of range")));

which means that we turn a single integer instruction into ~10,
including a bunch of branches. All that despite the fact that most
architectures have flag registers signalling integer overflow. It's just
that C doesn't easily make that available.

gcc exposes more efficient overflow detection via intrinsics:
https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Integer-Overflow-Builtins.html

Using that turns the non-error path from int4pl from:

0x0000000000826ec0 <+0>: mov 0x20(%rdi),%rcx # arg1
0x0000000000826ec4 <+4>: mov 0x28(%rdi),%rdx # arg2
0x0000000000826ec8 <+8>: mov %ecx,%esi
0x0000000000826eca <+10>: lea (%rdx,%rcx,1),%eax # add
# overflow check
0x0000000000826ecd <+13>: shr $0x1f,%edx
0x0000000000826ed0 <+16>: not %esi
0x0000000000826ed2 <+18>: shr $0x1f,%esi
0x0000000000826ed5 <+21>: cmp %dl,%sil
0x0000000000826ed8 <+24>: je 0x826f30 <int4pl+112>
0x0000000000826eda <+26>: mov %eax,%edx
0x0000000000826edc <+28>: shr $0x1f,%ecx
0x0000000000826edf <+31>: shr $0x1f,%edx
0x0000000000826ee2 <+34>: cmp %cl,%dl
0x0000000000826ee4 <+36>: je 0x826f30 <int4pl+112>
/* overflow error code */
0x0000000000826f30 <+112>: retq

into

0x0000000000826ec0 <+0>: mov 0x28(%rdi),%rax # arg2
0x0000000000826ec4 <+4>: add 0x20(%rdi),%eax # arg1 + arg2
0x0000000000826ec7 <+7>: jo 0x826ecc <int4pl+12> # jump if overflowed
0x0000000000826ec9 <+9>: mov %eax,%eax # clear high bits
0x0000000000826ecb <+11>: retq

which, not that surprisingly, is faster. Not to speak of easier to read
;)

Besides the fact that the code is faster, there's also the issue that
the current way to do overflow checks is not actually correct C, and
requires compiler flags like -fwrapv.

For floating point it's even worse.

/*
* check to see if a float4/8 val has underflowed or overflowed
*/
#define CHECKFLOATVAL(val, inf_is_valid, zero_is_valid) \
do { \
if (isinf(val) && !(inf_is_valid)) \
ereport(ERROR, \
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), \
errmsg("value out of range: overflow"))); \
\
if ((val) == 0.0 && !(zero_is_valid)) \
ereport(ERROR, \
(errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE), \
errmsg("value out of range: underflow"))); \
} while(0)

result = arg1 + arg2;

/*
* There isn't any way to check for underflow of addition/subtraction
* because numbers near the underflow value have already been rounded to
* the point where we can't detect that the two values were originally
* different, e.g. on x86, '1e-45'::float4 == '2e-45'::float4 ==
* 1.4013e-45.
*/
CHECKFLOATVAL(result, isinf(arg1) || isinf(arg2), true);

The disassembled code for float4pl is:
0x000000000043ce90 <+0>: vmovss 0x20(%rdi),%xmm1
0x000000000043ce95 <+5>: vmovss 0x28(%rdi),%xmm2
0x000000000043ce9a <+10>: vmovss 0x2b6a7e(%rip),%xmm3 # 0x6f3920
0x000000000043cea2 <+18>: vaddss %xmm1,%xmm2,%xmm0
0x000000000043cea6 <+22>: vmovaps %xmm0,%xmm4
0x000000000043ceaa <+26>: vandps %xmm3,%xmm4,%xmm4
0x000000000043ceae <+30>: vucomiss 0x2b6a4a(%rip),%xmm4 # 0x6f3900
0x000000000043ceb6 <+38>: jbe 0x43ced4 <float4pl+68>
0x000000000043ceb8 <+40>: vandps %xmm3,%xmm1,%xmm1
0x000000000043cebc <+44>: vucomiss 0x2b6a3c(%rip),%xmm1 # 0x6f3900
0x000000000043cec4 <+52>: ja 0x43ced4 <float4pl+68>
0x000000000043cec6 <+54>: vandps %xmm3,%xmm2,%xmm2
0x000000000043ceca <+58>: vucomiss 0x2b6a2e(%rip),%xmm2 # 0x6f3900
0x000000000043ced2 <+66>: jbe 0x43ced9 <float4pl+73>
0x000000000043ced4 <+68>: vmovd %xmm0,%eax
0x000000000043ced8 <+72>: retq
0x000000000043ced9 <+73>: push %rbx
# call to ereport

clang's code is much worse, it generates *external* function calls for
isinf (can be fixed by redefining isinf to __builtin_isinf).

Entirely removing the overflow checks results in:
0x0000000000801850 <+0>: vmovss 0x28(%rdi),%xmm1 # arg2
0x0000000000801855 <+5>: vaddss 0x20(%rdi),%xmm1,%xmm0 # arg1 + arg2
0x000000000080185a <+10>: vmovd %xmm0,%eax # convert to int
0x000000000080185e <+14>: mov %eax,%eax # clear upper bits
0x0000000000801860 <+16>: retq

which unsurprisingly is a good bit faster. float4mul etc generate even
worse code.

There's no comparable overflow handling to the above integer
intrinsics. But I think we can still do a lot better. Two very different
ways:

1) Just give up on detecting overflows for floats. Generating inf in
these cases actually seems entirely reasonable. We already don't
detect them in a bunch of cases anyway. I can't quite parse the
standard's language around this.
2) Use platform specific float exception handling where available. We
could at backend start, and in FloatExceptionHandler(), us
feenableexcept() (windows has similar) to trigger SIGFPE on float
overflow.
3) Magic?

Greetings,

Andres Freund

From:	Greg Stark <stark(at)mit(dot)edu>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-24 12:26:32
Message-ID:	CAM-w4HOFuqLWBsHfo+WOOhsxU3UV7JHkDGSd0uz-r4dHH9AQ3A@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

We already know this integer overflow checking is non-standard and
compilers keep trying to optimize them out. Our only strategy to
defeat that depends on compiler flags like -fwrapv that vary by
compiler and may or may not be working on less well tested compiler.

So if there's a nice readable and convenient way to portably use cpu
flags That would be brilliant. And I'm not too concerned if it doesn't
run on VAX.

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-24 14:09:09
Message-ID:	22057.1508854149@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> There's no comparable overflow handling to the above integer
> intrinsics. But I think we can still do a lot better. Two very different
> ways:

> 1) Just give up on detecting overflows for floats. Generating inf in
> these cases actually seems entirely reasonable. We already don't
> detect them in a bunch of cases anyway. I can't quite parse the
> standard's language around this.

There's an ancient saying that code can be arbitrarily fast if it
doesn't have to get the right answer. I think this proposal falls
in that category.

> 2) Use platform specific float exception handling where available. We
> could at backend start, and in FloatExceptionHandler(), us
> feenableexcept() (windows has similar) to trigger SIGFPE on float
> overflow.

SIGFPE isn't going to be easy to recover from, nor portable.

I think what you actually want to do is *disable* SIGFPE (see
feholdexcept), and then have individual functions use feclearexcept
and fetestexcept. These functions were standardized by C99 so
they should be pretty widely available ... of course, whether they
actually are widely portable remains to be seen. Whether they're
faster than what we're doing now also remains to be seen.

regards, tom lane

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-24 14:17:21
Message-ID:	20171024141721.45msqb64hhcg3cp5@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2017-10-24 10:09:09 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > There's no comparable overflow handling to the above integer
> > intrinsics. But I think we can still do a lot better. Two very different
> > ways:
>
> > 1) Just give up on detecting overflows for floats. Generating inf in
> > these cases actually seems entirely reasonable. We already don't
> > detect them in a bunch of cases anyway. I can't quite parse the
> > standard's language around this.
>
> There's an ancient saying that code can be arbitrarily fast if it
> doesn't have to get the right answer. I think this proposal falls
> in that category.

Does it? In plenty of cases getting infinity rather than an error is
just about as useful.

This was argued by a certain Tom Lane a few years back ;)
http://archives.postgresql.org/message-id/19208.1167246902%40sss.pgh.pa.us

> > 2) Use platform specific float exception handling where available. We
> > could at backend start, and in FloatExceptionHandler(), us
> > feenableexcept() (windows has similar) to trigger SIGFPE on float
> > overflow.
>
> SIGFPE isn't going to be easy to recover from, nor portable.

Hm? A trivial hack implementing the above survives the regression test,
with the exception of one output change because some functions currently
do *not* check for overflow. What's the issue you're concerned about?

The portability indeed is a problem.

> I think what you actually want to do is *disable* SIGFPE (see
> feholdexcept), and then have individual functions use feclearexcept
> and fetestexcept. These functions were standardized by C99 so
> they should be pretty widely available ... of course, whether they
> actually are widely portable remains to be seen. Whether they're
> faster than what we're doing now also remains to be seen.

I tested it, and they're fairly slow on at least gcc-7 + glibc 2.24.

Greetings,

Andres Freund

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-24 14:36:42
Message-ID:	23311.1508855802@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Andres Freund <andres(at)anarazel(dot)de> writes:
> On 2017-10-24 10:09:09 -0400, Tom Lane wrote:
>> There's an ancient saying that code can be arbitrarily fast if it
>> doesn't have to get the right answer. I think this proposal falls
>> in that category.

> Does it? In plenty of cases getting infinity rather than an error is
> just about as useful.
> This was argued by a certain Tom Lane a few years back ;)
> http://archives.postgresql.org/message-id/19208.1167246902%40sss.pgh.pa.us

Yeah, but I lost the argument. For better or worse, our expected
behavior is now that we throw errors. You don't get to change that
just because it would save a few cycles.

>> SIGFPE isn't going to be easy to recover from, nor portable.

> Hm? A trivial hack implementing the above survives the regression test,
> with the exception of one output change because some functions currently
> do *not* check for overflow. What's the issue you're concerned about?

The real problem with it is that it's a process-wide setting, and would
for example probably break PL/R, or other libraries that are not expecting
to lose control to overflows.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-24 18:27:04
Message-ID:	CA+TgmoaBi-5B1wxTF4NfhAK5iA_tZfFJE3W62PaRVxrow7NJyA@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 24, 2017 at 4:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Does it? In plenty of cases getting infinity rather than an error is
>> just about as useful.
>> This was argued by a certain Tom Lane a few years back ;)
>> http://archives.postgresql.org/message-id/19208.1167246902%40sss.pgh.pa.us
>
> Yeah, but I lost the argument. For better or worse, our expected
> behavior is now that we throw errors. You don't get to change that
> just because it would save a few cycles.

I don't know that we can consider the results of a discussion in 2006
to be binding policy for the indefinite future. A lot of things get
relitigated more than once per decade on this mailing list, and if we
know things now that we didn't know then (e.g. that one choice has a
far more severe performance consequence than the other) that's
reasonable justification for deciding to change our mind. Also, it's
not like there were a million votes on one side vs. just you on the
other; reading the thread, it's not at all clear that you were in the
minority with that position.

That's not to say I necessarily support Andres's proposal. Changing
query behavior is a big deal; we can't do it very often without
causing a lot of hassles for users (and maybe damaging our reputation
for stability in the process). And it's not very clear to me that
someone who does a SUM(a * b) over many rows will be happy to get
infinity rather than an error. It could be true, but I don't have the
experience to be sure of it -- and I'm a bit worried that if we change
anything, we'll only find out whether users like it after we cut the
release.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-24 19:28:17
Message-ID:	1162.1508873297@sss.pgh.pa.us
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:
> On Tue, Oct 24, 2017 at 4:36 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Yeah, but I lost the argument. For better or worse, our expected
>> behavior is now that we throw errors. You don't get to change that
>> just because it would save a few cycles.

> I don't know that we can consider the results of a discussion in 2006
> to be binding policy for the indefinite future. A lot of things get
> relitigated more than once per decade on this mailing list, and if we
> know things now that we didn't know then (e.g. that one choice has a
> far more severe performance consequence than the other) that's
> reasonable justification for deciding to change our mind.

I don't like changing well-defined, user-visible query behavior for
no other reason than a performance gain (of a size that hasn't even
been shown to be interesting, btw). Will we change it back in another
ten years if the performance tradeoff changes?

Also, if I recall the old discussion properly, one concern was getting
uniform behavior across different platforms. I'm worried that if we do
what Andres suggests, we'll get behavior that is not only different but
platform-specific. Now, to the extent that you believe that every modern
platform implements edge-case IEEE float behavior the same way, that worry
may be obsolete. But I don't think I believe that.

regards, tom lane

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-25 05:33:46
Message-ID:	CA+TgmoYDBNX-PDHQiiwrhGjsxbVLmOAn_yRdQNVnYoLTGd_SGQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Tue, Oct 24, 2017 at 9:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I don't like changing well-defined, user-visible query behavior for
> no other reason than a performance gain (of a size that hasn't even
> been shown to be interesting, btw). Will we change it back in another
> ten years if the performance tradeoff changes?
>
> Also, if I recall the old discussion properly, one concern was getting
> uniform behavior across different platforms. I'm worried that if we do
> what Andres suggests, we'll get behavior that is not only different but
> platform-specific. Now, to the extent that you believe that every modern
> platform implements edge-case IEEE float behavior the same way, that worry
> may be obsolete. But I don't think I believe that.

Yeah, those are reasonable concerns.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-25 05:53:49
Message-ID:	20171025055349.acz4fqaxk7scm5ht@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2017-10-25 07:33:46 +0200, Robert Haas wrote:
> On Tue, Oct 24, 2017 at 9:28 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > I don't like changing well-defined, user-visible query behavior for
> > no other reason than a performance gain (of a size that hasn't even
> > been shown to be interesting, btw). Will we change it back in another
> > ten years if the performance tradeoff changes?

That part of the argument seems unconvincing. It's not like the overflow
check is likely to ever have been beneficial performancewise, nor is it
remotely likely for that to ever be the case.

> > Also, if I recall the old discussion properly, one concern was getting
> > uniform behavior across different platforms. I'm worried that if we do
> > what Andres suggests, we'll get behavior that is not only different but
> > platform-specific. Now, to the extent that you believe that every modern
> > platform implements edge-case IEEE float behavior the same way, that worry
> > may be obsolete. But I don't think I believe that.
>
> Yeah, those are reasonable concerns.

I agree. I'm not really sure what the right way is here. I do however
think it's worth discussing what ways to address the performance penalty
due to the overflow checks, and one obvious way to do so is not to play.

It'd be interesting to write the overflow checking addition in x86
inline asm, and see how much better that gets - just so we know the
maximum we can reach with that. The problem with the C99 stuff seems to
be the external function calls. With either, one problem would be that
we'd have to reset the overflow register before doing math, which isn't
free either - otherwise some external function could have left it set to
on.

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-28 17:16:15
Message-ID:	20171028171615.2cczb2xjx74czex5@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On 2017-10-24 15:28:17 -0400, Tom Lane wrote:
> Also, if I recall the old discussion properly, one concern was getting
> uniform behavior across different platforms. I'm worried that if we do
> what Andres suggests, we'll get behavior that is not only different but
> platform-specific. Now, to the extent that you believe that every modern
> platform implements edge-case IEEE float behavior the same way, that worry
> may be obsolete. But I don't think I believe that.

Hm. Is the current code actually meaningfully less dependent on IEEE
float behaviour? Both with the current behaviour and with the
alternative of not ereporting we rely on infinity op something to result
in infinity. Given that we're not preventing underflows, imprecise
results, denormals from being continued to use, I don't think we're
avoiding edge cases effectively at the moment.

I just spent the last hours digging through intel's architecture
manual. And discovered way too much weird stuff :/.

There indeed doesn't really seem to be any sort of decent way to
implement the overflow checks in an efficient manner. Clearing & testing
the SSE floating point control register, which contains the overflow
bit, is ~10 cycles each. The way gcc implements the isinf check as a
bunch of compares and bitwizzery with constants - I don't see how to
beat that.

Btw, looking at this code I noticed that the current error messages
aren't meaningful:

=# SELECT '-1e38'::float4 + '-3e38'::float4;
ERROR: 22003: value out of range: overflow

The current code gets slightly better if I put an unlikely() around just
the isinf(val) in CHECKFLOATVAL.

Greetings,

Andres Freund

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	pgsql-hackers(at)postgresql(dot)org, Greg Stark <stark(at)mit(dot)edu>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-30 11:27:51
Message-ID:	20171030112751.mukkriz2rur2qkxc@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 2017-10-24 03:39:54 -0700, Andres Freund wrote:
> Largely that's due to the overflow checks.
>
> For integers we currently do:
>
> #define SAMESIGN(a,b) (((a) < 0) == ((b) < 0))
>
> /*
> * Overflow check. If the inputs are of different signs then their sum
> * cannot overflow. If the inputs are of the same sign, their sum had
> * better be that sign too.
> */
> if (SAMESIGN(arg1, arg2) && !SAMESIGN(result, arg1))
> ereport(ERROR,
> (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
> errmsg("integer out of range")));
>
> which means that we turn a single integer instruction into ~10,
> including a bunch of branches. All that despite the fact that most
> architectures have flag registers signalling integer overflow. It's just
> that C doesn't easily make that available.
>
> gcc exposes more efficient overflow detection via intrinsics:
> https://gcc.gnu.org/onlinedocs/gcc-7.1.0/gcc/Integer-Overflow-Builtins.html
>
> Using that turns the non-error path from int4pl from:
>
> 0x0000000000826ec0 <+0>: mov 0x20(%rdi),%rcx # arg1
> 0x0000000000826ec4 <+4>: mov 0x28(%rdi),%rdx # arg2
> 0x0000000000826ec8 <+8>: mov %ecx,%esi
> 0x0000000000826eca <+10>: lea (%rdx,%rcx,1),%eax # add
> # overflow check
> 0x0000000000826ecd <+13>: shr $0x1f,%edx
> 0x0000000000826ed0 <+16>: not %esi
> 0x0000000000826ed2 <+18>: shr $0x1f,%esi
> 0x0000000000826ed5 <+21>: cmp %dl,%sil
> 0x0000000000826ed8 <+24>: je 0x826f30 <int4pl+112>
> 0x0000000000826eda <+26>: mov %eax,%edx
> 0x0000000000826edc <+28>: shr $0x1f,%ecx
> 0x0000000000826edf <+31>: shr $0x1f,%edx
> 0x0000000000826ee2 <+34>: cmp %cl,%dl
> 0x0000000000826ee4 <+36>: je 0x826f30 <int4pl+112>
> /* overflow error code */
> 0x0000000000826f30 <+112>: retq
>
> into
>
> 0x0000000000826ec0 <+0>: mov 0x28(%rdi),%rax # arg2
> 0x0000000000826ec4 <+4>: add 0x20(%rdi),%eax # arg1 + arg2
> 0x0000000000826ec7 <+7>: jo 0x826ecc <int4pl+12> # jump if overflowed
> 0x0000000000826ec9 <+9>: mov %eax,%eax # clear high bits
> 0x0000000000826ecb <+11>: retq
>
> which, not that surprisingly, is faster. Not to speak of easier to read
> ;)
>
> Besides the fact that the code is faster, there's also the issue that
> the current way to do overflow checks is not actually correct C, and
> requires compiler flags like -fwrapv.

Attached is a series of patches that:

0001) Introduces pg_{add,sub,mul}{16,32,64}_overflow(a, b, *result)
These use compiler intrinsics on gcc/clang. If that's not
available, they cast to a wider type and to overflow checks. For
64bit there's a fallback for the case 128bit math is not
available (here I stole from an old patch of Greg's).

These fallbacks are, as far as I can tell, C free of overflow
related undefined behaviour.

Perhaps it should rather be pg_add_s32_overflow, or a similar
naming scheme?

0002) Converts int.c, int8.c and a smattering of other functions to use
the new facilities. This removes a fair amount of code.

It might make sense to split this up further, but right now that's
the set of functions that either are affected performancewise by
previous overflow checks, and/or relied on wraparound
overflow. There's probably more places, but this is what I found
by visual inspection and compiler warnings.

0003) Removes -fwrapv. I'm *NOT* suggesting we apply this right now, but
it seems like an important test for the new facilities. Without
0002, tests would fail after this, after it all tests run
successfully.

Greetings,

Andres Freund

Attachment	Content-Type	Size
0001-Provide-overflow-safe-integer-math-inline-functions.patch	text/x-diff	11.2 KB
0002-Rewrite-overflow-handling-to-be-faster-and-not-rely-.patch	text/x-diff	41.4 KB
0003-Do-Not-Apply-Remove-fwrapv.patch	text/x-diff	2.5 KB

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-30 16:59:42
Message-ID:	CA+TgmoZgNuniF0ONqEifEM48xoR=pkYVtgfBCcjBwcXtos4mrw@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 30, 2017 at 4:57 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> 0001) Introduces pg_{add,sub,mul}{16,32,64}_overflow(a, b, *result)
> These use compiler intrinsics on gcc/clang. If that's not
> available, they cast to a wider type and to overflow checks. For
> 64bit there's a fallback for the case 128bit math is not
> available (here I stole from an old patch of Greg's).
>
> These fallbacks are, as far as I can tell, C free of overflow
> related undefined behaviour.

Looks nice.

> Perhaps it should rather be pg_add_s32_overflow, or a similar
> naming scheme?

Not sure what the s is supposed to be? Signed?

> 0002) Converts int.c, int8.c and a smattering of other functions to use
> the new facilities. This removes a fair amount of code.
>
> It might make sense to split this up further, but right now that's
> the set of functions that either are affected performancewise by
> previous overflow checks, and/or relied on wraparound
> overflow. There's probably more places, but this is what I found
> by visual inspection and compiler warnings.

I lack the patience to review this tonight.

> 0003) Removes -fwrapv. I'm *NOT* suggesting we apply this right now, but
> it seems like an important test for the new facilities. Without
> 0002, tests would fail after this, after it all tests run
> successfully.

I suggest that if we think we don't need -fwrapv any more, we ought to
remove it. Otherwise, we won't find out if we're wrong.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-30 17:54:45
Message-ID:	20171030175445.uayyn7ieptl4hmby@alap3.anarazel.de
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Hi,

On 2017-10-30 22:29:42 +0530, Robert Haas wrote:
> On Mon, Oct 30, 2017 at 4:57 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> > 0001) Introduces pg_{add,sub,mul}{16,32,64}_overflow(a, b, *result)
> > These use compiler intrinsics on gcc/clang. If that's not
> > available, they cast to a wider type and to overflow checks. For
> > 64bit there's a fallback for the case 128bit math is not
> > available (here I stole from an old patch of Greg's).
> >
> > These fallbacks are, as far as I can tell, C free of overflow
> > related undefined behaviour.
>
> Looks nice.

Thanks.

> > Perhaps it should rather be pg_add_s32_overflow, or a similar
> > naming scheme?
>
> Not sure what the s is supposed to be? Signed?

Yes, signed. So we could add a u32 or something complementing the
functions already in the patch. Even though overflow checks are a heck
of a lot easier to write for unsigned ints, the intrinsics are still
faster. I don't have any sort of strong feelings on the naming.

> > 0002) Converts int.c, int8.c and a smattering of other functions to use
> > the new facilities. This removes a fair amount of code.
> >
> > It might make sense to split this up further, but right now that's
> > the set of functions that either are affected performancewise by
> > previous overflow checks, and/or relied on wraparound
> > overflow. There's probably more places, but this is what I found
> > by visual inspection and compiler warnings.
>
> I lack the patience to review this tonight.

Understandable ;)

> > 0003) Removes -fwrapv. I'm *NOT* suggesting we apply this right now, but
> > it seems like an important test for the new facilities. Without
> > 0002, tests would fail after this, after it all tests run
> > successfully.
>
> I suggest that if we think we don't need -fwrapv any more, we ought to
> remove it. Otherwise, we won't find out if we're wrong.

I agree that we should do so at some point not too far away in the
future. Not the least because we don't specify this kind of C dialect in
a lot of other compilers. Additionally the flag causes some slowdown
(because e.g. for loop variables are optimized less). But I'm fairly
certain it needs a bit more care that I've invested as of now - should
probably at least compile with -Wstrict-overflow=some-higher-level, and
with ubsan. I'm fairly certain there's more bogus overflow checks
around...

Greetings,

Andres Freund

From:	ilmari(at)ilmari(dot)org (Dagfinn Ilmari =?utf-8?Q?Manns=C3=A5ker?=)
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Andres Freund <andres(at)anarazel(dot)de>, "pgsql-hackers\(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-30 18:12:58
Message-ID:	d8ja8084i8l.fsf@dalvik.ping.uio.no
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

Robert Haas <robertmhaas(at)gmail(dot)com> writes:

>> 0003) Removes -fwrapv. I'm *NOT* suggesting we apply this right now, but
>> it seems like an important test for the new facilities. Without
>> 0002, tests would fail after this, after it all tests run
>> successfully.
>
> I suggest that if we think we don't need -fwrapv any more, we ought to
> remove it. Otherwise, we won't find out if we're wrong.

Without -fwrapv signed overflow is undefined behaviour. We should test
thoroughly with -ftrapv or -fsanitize=signed-integer-overflow to be
confident the code is free of such things. We might even want to enable
-ftrapv by default in cassert-enabled builds.

- ilmari
--
"I use RMS as a guide in the same way that a boat captain would use
a lighthouse. It's good to know where it is, but you generally
don't want to find yourself in the same spot." - Tollef Fog Heen

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	"pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Greg Stark <stark(at)mit(dot)edu>
Subject:	Re: Current int & float overflow checking is slow.
Date:	2017-10-31 05:50:06
Message-ID:	CA+TgmobAtW6CwkOXzrhWiZRXSOEd+8O88PWNA+y-U8WNX9iWYQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Lists:	pgsql-hackers

On Mon, Oct 30, 2017 at 11:24 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> > Perhaps it should rather be pg_add_s32_overflow, or a similar
>> > naming scheme?
>>
>> Not sure what the s is supposed to be? Signed?
>
> Yes, signed. So we could add a u32 or something complementing the
> functions already in the patch. Even though overflow checks are a heck
> of a lot easier to write for unsigned ints, the intrinsics are still
> faster. I don't have any sort of strong feelings on the naming.

Right, I guess including the s is probably a good idea then.

>> I suggest that if we think we don't need -fwrapv any more, we ought to
>> remove it. Otherwise, we won't find out if we're wrong.
>
> I agree that we should do so at some point not too far away in the
> future. Not the least because we don't specify this kind of C dialect in
> a lot of other compilers. Additionally the flag causes some slowdown
> (because e.g. for loop variables are optimized less). But I'm fairly
> certain it needs a bit more care that I've invested as of now - should
> probably at least compile with -Wstrict-overflow=some-higher-level, and
> with ubsan. I'm fairly certain there's more bogus overflow checks
> around...

Makes sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company