A bit of explanation regarding the quiz in the last post

Pascal Cuoq - 20th Jan 2012

There are no negative integer constants in C, as per section 6.4.4 of the C99 standard:

integer-constant:  
        decimal-constant integer-suffix_opt  
        octal-constant integer-suffix_opt  
        hexadecimal-constant integer-suffix_opt  
decimal-constant:  
        nonzero-digit  
        decimal-constant digit  
octal-constant:  
        0  
        octal-constant octal-digit  
hexadecimal-constant:  
        hexadecimal-prefix hexadecimal-digit  
        hexadecimal-constant hexadecimal-digit  
...

The minus sign is not part of the constant according to the grammar.

The expression -0x80000000 is parsed and typed as the application of the unary negation operator - to the constant 0x80000000. The table in section 6.4.4.1 of the standard shows that, when typing hexadecimal constants, unsigned types must be tried. The list of types to try to fit the hexadecimal constant in is, in order, int, unsigned int, long, unsigned long, long long, unsigned long long.

On many architectures (those where int is 32 bits wide), the first type in the list that fits 0x80000000 is unsigned int. Unary negation applied to an unsigned int yields an unsigned int, computed modulo 2^32, so that -0x80000000 has type unsigned int and value 0x80000000.
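As a minimal sketch of how this can be observed, the functions below test the sign and value of -0x80000000. The function names are hypothetical (they are not from the quiz programs), and the comments assume an int that is 32 bits wide.

```c
#include <stdio.h>

/* Returns 1 if -0x80000000 compares greater than zero.
   Assuming a 32-bit int, the constant has type unsigned int, so the
   negation wraps around modulo 2^32 and the result stays positive. */
int neg_hex_is_positive(void)
{
    return -0x80000000 > 0;
}

/* Returns 1 if -0x80000000 has the same value as 0x80000000. */
int neg_hex_equals_itself(void)
{
    return -0x80000000 == 0x80000000;
}

/* Prints both observations, plus the size of the expression's type. */
void report_neg_hex(void)
{
    printf("-0x80000000 > 0           : %d\n", neg_hex_is_positive());
    printf("-0x80000000 == 0x80000000 : %d\n", neg_hex_equals_itself());
    printf("sizeof(-0x80000000)       : %lu\n",
           (unsigned long)sizeof(-0x80000000));
}
```

With a 32-bit int, both comparisons are expected to yield 1. On a hypothetical platform where int is wide enough to hold 0x80000000, the constant would be a plain int and both would yield 0.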

Following the same reasoning as above, and reading from the "Decimal Constant" column of the table in the C99 standard, the types to try are int, long, and long long. This might lead you to expect the expression -2147483648 to have the value -2147483648 when compiled with GCC. Instead, when compiling this expression on a 32-bit architecture, GCC emits a warning and the expression has the value 2147483648. The warning is:

t.c:6: warning: this decimal constant is unsigned only in ISO C90

Indeed, there is a subtlety here for 32-bit architectures. GCC by default follows the C90 standard. It's not so much that the spirit of the table in section 6.4.4.1 of C99 changed between C90 and C99: the spirit remained the same, with unsigned types being tried for octal and hexadecimal constants and mostly only signed types being tried for decimal constants. Here is the relevant snippet from the C90 standard:

The type of an integer constant is the first of the corresponding list in which its value can be represented. Unsuffixed decimal: int, long int, unsigned long int;

The difference really stems from the fact that C90 did not have a long long type, so the list of types to try for a decimal constant ended in unsigned long, since that type contained values that did not fit in any other type. On a 32-bit architecture where long and int are both 32-bit, 2147483648 fits neither int nor long and so ends up being typed as unsigned long. Note that on an architecture where long is 64-bit, 2147483648 and -2147483648 are both typed as long.
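The "first type in the list that fits" rule can be modelled in a few lines. The sketch below is only an illustration, not how GCC implements it: the enum names and the function type_of_constant are hypothetical, the widths of int and long are passed as parameters, and long long is assumed to be 64 bits wide.

```c
#include <stdio.h>

/* Candidate types, named after the entries in the standards' lists. */
enum ctype { T_INT, T_UINT, T_LONG, T_ULONG, T_LLONG, T_ULLONG, T_NONE };

/* Which list to consult for an unsuffixed constant. */
enum rule { C90_DECIMAL, C99_DECIMAL, HEX_OR_OCTAL_C99 };

/* Largest value representable in a type of the given width in bits. */
static unsigned long long max_of(int bits, int is_signed)
{
    int value_bits = is_signed ? bits - 1 : bits;
    return value_bits >= 64 ? ~0ULL : (1ULL << value_bits) - 1;
}

/* Returns the first type of the selected list that can hold value,
   or T_NONE if none can. Assumes long long is 64 bits wide. */
enum ctype type_of_constant(unsigned long long value, enum rule rule,
                            int int_bits, int long_bits)
{
    /* Each list is terminated by the T_NONE sentinel. */
    static const enum ctype lists[3][7] = {
        /* C90_DECIMAL      */ { T_INT, T_LONG, T_ULONG, T_NONE },
        /* C99_DECIMAL      */ { T_INT, T_LONG, T_LLONG, T_NONE },
        /* HEX_OR_OCTAL_C99 */ { T_INT, T_UINT, T_LONG, T_ULONG,
                                 T_LLONG, T_ULLONG, T_NONE }
    };
    const enum ctype *t;

    for (t = lists[rule]; *t != T_NONE; t++) {
        int is_signed = (*t == T_INT || *t == T_LONG || *t == T_LLONG);
        int bits = (*t == T_INT || *t == T_UINT) ? int_bits
                 : (*t == T_LONG || *t == T_ULONG) ? long_bits
                 : 64;
        if (value <= max_of(bits, is_signed))
            return *t;
    }
    return T_NONE;
}

/* Human-readable name for a candidate type. */
const char *ctype_name(enum ctype t)
{
    static const char *const names[] = {
        "int", "unsigned int", "long", "unsigned long",
        "long long", "unsigned long long", "(no type fits)"
    };
    return names[t];
}
```

For instance, type_of_constant(2147483648ULL, C90_DECIMAL, 32, 32) selects unsigned long, while the same call with C99_DECIMAL selects long long, and with a long_bits of 64 both rules select long.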

Finally, when GCC is told with option -std=c99 to apply C99 rules on an architecture where long is 32-bit, 2147483648 is typed as long long, so that the expression -2147483648 has type long long and value -2147483648.
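The decimal case can be probed in the same spirit. The sketch below uses hypothetical function names and only C90-compatible constructs, so that the same file can be compiled under either set of rules and the results compared.

```c
#include <stdio.h>

/* Returns 1 if -2147483648 compares less than zero.
   Under C90 rules with a 32-bit long, the constant is an unsigned long
   and this returns 0; when the constant is typed as long or long long,
   it returns 1. */
int neg_decimal_is_negative(void)
{
    return -2147483648 < 0;
}

/* Size in bytes of the type given to the expression -2147483648. */
unsigned long neg_decimal_size(void)
{
    return (unsigned long)sizeof(-2147483648);
}

/* Prints both observations. */
void report_neg_decimal(void)
{
    printf("-2147483648 < 0     : %d\n", neg_decimal_is_negative());
    printf("sizeof(-2147483648) : %lu\n", neg_decimal_size());
}
```

Assuming a 32-bit int, the expression should be negative and 8 bytes wide whenever long is 64-bit or C99 rules apply. Only the combination of C90 rules and a 32-bit long is expected to give a non-negative, 4-byte result, together with the warning quoted earlier.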

This should explain the results obtained when compiling the three programs from the last post with GCC on 32-bit and on 64-bit architectures.