Re: Inconsistent shift operator

Lists: pgsql-bugs
From: Roman Kononov <kononov(at)dls(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Inconsistent shift operator
Date: 2008-04-19 06:25:25
Message-ID: 48099055.9040607@dls.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

The below test cases show the obvious inconsistency between different
integer types.

Regards,

Roman

test=# \t
Showing only tuples.
test=# select 1::int2 << 17;
0

test=# select 1::int4 << 33;
2

test=# select 1::int8 << 65;
2

test=# select 2::int2 >> 17;
0

test=# select 2::int4 >> 33;
1

test=# select 2::int8 >> 65;
1


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Roman Kononov <kononov(at)dls(dot)net>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Inconsistent shift operator
Date: 2008-04-19 15:26:57
Message-ID: 9094.1208618817@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Roman Kononov <kononov(at)dls(dot)net> writes:
> The below test cases show the obvious inconsistency between different
> integer types.

[ shrug... ] The << and >> operators just expose the behavior of the
local C compiler's shift operators, and it's clearly stated in the C
spec that shifting by more than the word width produces unspecified
results. I don't see a bug here, any more than it's a bug that the
floating point operators expose machine-specific behavior.

regards, tom lane


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Inconsistent shift operator
Date: 2008-04-20 13:38:42
Message-ID: 20080420133842.GN6870@frubble.xen.chris-lamb.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sat, Apr 19, 2008 at 11:26:57AM -0400, Tom Lane wrote:
> Roman Kononov <kononov(at)dls(dot)net> writes:
> > The below test cases show the obvious inconsistency between different
> > integer types.
>
> [ shrug... ] The << and >> operators just expose the behavior of the
> local C compiler's shift operators, and it's clearly stated in the C
> spec that shifting by more than the word width produces unspecified
> results.

I thought that getting the correct answer was more important than
getting the "wrong" answer quickly. The current code also introduces
its own set of strangeness in the 16 bit case on my x86_64 box. It does
something closer to:

int16 shl (int val, int n) {
n %= 32;
return n < 16 ? val << n : 0;
}

This is exactly what the code says, it's just the resulting semantics
aren't very nice for the user.

Wouldn't it be easy to put some code like this in:

if (arg2 < 16)
return PG_RETURN_INT16(arg1 << arg2);
return PG_RETURN_INT16(0);

People would have one less platform specific weirdo to worry about.

As an aside, I thought it would be interesting to see what MySQL did and
it seems to check for and handle this case--albeit only the 64bit case,
but as far as I can tell it only really knows about "long long" ints.

Sam


From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Sam Mason <sam(at)samason(dot)me(dot)uk>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Inconsistent shift operator
Date: 2008-04-20 16:27:38
Message-ID: 21647.1208708858@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> Wouldn't it be easy to put some code like this in:

> if (arg2 < 16)
> return PG_RETURN_INT16(arg1 << arg2);
> return PG_RETURN_INT16(0);

This is a straw man. It covers *one* of the behaviors left undefined
by the C standard. I quote from C99:

[#3] The integer promotions are performed on each of the
operands. The type of the result is that of the promoted
left operand. If the value of the right operand is negative
or is greater than or equal to the width of the promoted
left operand, the behavior is undefined.

[#4] The result of E1 << E2 is E1 left-shifted E2 bit
positions; vacated bits are filled with zeros. If E1 has an
unsigned type, the value of the result is E1<<E2, reduced
modulo one more than the maximum value representable in the
result type. If E1 has a signed type and nonnegative value,
and E1<<E2 is representable in the result type, then that is
the resulting value; otherwise, the behavior is undefined.

[#5] The result of E1 >> E2 is E1 right-shifted E2 bit
positions. If E1 has an unsigned type or if E1 has a signed
type and a nonnegative value, the value of the result is the
integral part of the quotient of E1 divided by the quantity,
2 raised to the power E2. If E1 has a signed type and a
negative value, the resulting value is implementation-
defined.

We are shifting signed types so we are exposed to every one of these
unspecified behaviors. In particular, since we don't know whether the
behavior of >> will be zero-fill or sign-fill, it's not going to be
exactly trivial to make a consistent extension of it to shift values
greater than the word width.

By the time you get done replicating all this for the int2, int4,
and int8 shift operators, it's not looking like such a small patch
anymore. And I still find the premise entirely unconvincing.
Maybe the user *wants* to see the local behavior of shift, whatever
it might be. It's certainly not impossible that we'd break applications
that worked fine before (at least on the hardware they were being
used on).

regards, tom lane


From: Peter Eisentraut <peter_e(at)gmx(dot)net>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Sam Mason <sam(at)samason(dot)me(dot)uk>
Subject: Re: Inconsistent shift operator
Date: 2008-04-20 17:08:00
Message-ID: 200804201908.02170.peter_e@gmx.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

Tom Lane wrote:
> And I still find the premise entirely unconvincing.
> Maybe the user *wants* to see the local behavior of shift, whatever
> it might be.  It's certainly not impossible that we'd break applications
> that worked fine before (at least on the hardware they were being
> used on).

Certainly someone who uses shift operators is more prone to accept
architecture-dependent behaviors. Those who like to take a more abstract
view can use regular arithmetic.


From: Sam Mason <sam(at)samason(dot)me(dot)uk>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Inconsistent shift operator
Date: 2008-04-20 18:58:59
Message-ID: 20080420185859.GO6870@frubble.xen.chris-lamb.co.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Lists: pgsql-bugs

On Sun, Apr 20, 2008 at 12:27:38PM -0400, Tom Lane wrote:
> Sam Mason <sam(at)samason(dot)me(dot)uk> writes:
> > Wouldn't it be easy to put some code like this in:
> > if (arg2 < 16)
> > return PG_RETURN_INT16(arg1 << arg2);
> > return PG_RETURN_INT16(0);
>
> This is a straw man. It covers *one* of the behaviors left undefined
> by the C standard. I quote from C99:

[...]

wow, I never realised how little semantics C actually defines. I'm a
fan of formally defined semantics and the above just seems like a big
cop-out. The case of E1 being negative seems completely implementation
defined!

> We are shifting signed types so we are exposed to every one of these
> unspecified behaviors. In particular, since we don't know whether the
> behavior of >> will be zero-fill or sign-fill, it's not going to be
> exactly trivial to make a consistent extension of it to shift values
> greater than the word width.

About the only reasonable thing I can think of that would remain within
C's spec (at least with respect of shifting) would be to always treat
E1 as an unsigned value. This would allow it to be used consistently
with the other bit-wise operators, but would cause any non-bitwise
interpretation of the result to become implementation defined. Not very
nice.

> By the time you get done replicating all this for the int2, int4,
> and int8 shift operators, it's not looking like such a small patch
> anymore. And I still find the premise entirely unconvincing.

After an afternoon of getting utterly bogged down looking into what
other languages do and getting very distracted with ring theory I'm
tempted to reluctantly agree. Maybe the warning about floats could be
extended to cover the shift operators as well. Maybe:

The bit shift operators only return consistent results when both the
RHS is within the bit-size of the arguments' data type and the LHS is
positive, in all other cases the behaviour is platform specific.

I think that's technically correct, but seems to come across as very
pessimistic.

> Maybe the user *wants* to see the local behavior of shift, whatever
> it might be. It's certainly not impossible that we'd break applications
> that worked fine before (at least on the hardware they were being
> used on).

Yes. It always surprises me how hard getting formal semantics into an
existing language seems to be.

Sam