Why do signed overflows so often do what programmers expect?
Pascal Cuoq - 29th Mar 2012Semi-serious musings
During the Frama-C random testing experiment described at length on this blog and this page we found a few bugs in Csmith too. John Regehr one of the Csmith developers and not entirely coincidentally a co-author of the article linked in the previous post is also a co-author in the NFM12 article where the random testing of Frama-C is described. Some time ago while in the context of the latter article discussing the fact that Csmith had been used intensively finding 400+ bugs in compilers without the bugs that Frama-C uncovered being an annoyance or even noticed he said:
It would also maybe be useful to remark that the Csmith bugs had not been detected because compilers are pretty weak at exploiting undefined behaviors (though they do this sometimes).
I do not agree about the choice of the adjective “weak”. I find compilers are pretty good at what they are supposed to do which is apparently to ignore undefined behaviors in the program and generate the shortest and/or fastest sequences of instructions that compute the correct result for all defined executions. Sometimes an opportunity arises to exploit the fact that some inputs cause undefined behavior to shorten/fasten the sequence of instructions. But in the general case with industry-standard instruction sets there just isn't any such opportunity. The shortest fastest sequence that works for all defined inputs is the sequence that happens to give 2's complement semantics to signed overflows. I think that this is a fundamental fact of current instruction sets and considering the rate at which these change I do not expect programmers misunderstandings about signed overflows to change in the near or far future. Even with much progress in compilation techniques signed overflow will continue to give 2's complement results most of the times and programmers will continue to be surprised when it doesn't.
And now for some more Regehr links
This blog post shows how good compilers already are.
I put forward a weak version of the argument presented here on the Frama-C mailing list a long time ago and John answered with an example C program where the compiler can take advantage of the undefinedness of signed overflows to generate code that does not give 2's complement results.
John also discussed undefined behaviors on his blog which prompted Chris Lattner to write a 3-post series [removed dead link] with more of the compiler implementer's point of view and revealing examples.