Issue
I wonder how compilers deal with undefined behavior.
I will take GCC 10.4 for the x86 architecture with the -O2 -std=c++03 flags as an example, but please feel free to comment on other compilers.
What does it take to alter the outcome of an operation with UB?
The language standard does not prescribe what should happen if an operation has UB, but the compiler will do something. That is, I'm not asking what happens in UB from C++'s perspective but from the compiler's perspective. I know the C++ standard does not impose any restriction on the behavior of the program.
For example, if I have UB due to the value of the object in a memory location being modified more than once by the evaluation of an expression, like so:
int i = 0;
i = ++i + i++; // UB pre-C++11
the chosen compiler in this setup generates assembly code that reduces the computation to a constant, 3 in this case; see https://godbolt.org/z/MEEGT15dM.
What can cause the constant to become anything rather than 3 if I do not change the compiler, its version, flags or architecture? Could editing the function without changing the value of i before the erroneous statement cause it?
Solution
The C and C++ language standards define “undefined behavior” to be behavior for which the standard imposes no requirements. Note the words “the standard”. This does not mean there are no requirements on the behavior at all, only that there are none from the language standard's perspective. There may be requirements from other specifications that the compiler seeks to conform to, including its own.
Compilers commonly support many things that are “undefined behavior” in the sense of a language standard. A few examples are:
- linking code written in multiple programming languages,
- calling operating system routines that display graphics, perform network communication, or provide other operating system services,
- providing features for special alignment requests and other variable attributes,
- allowing insertion of assembly language into C or C++ code,
- providing routines or operations to count bits in a word, to find the first bit set, or to perform arithmetic with overflow handling,
- providing support for SIMD features, and
- defining functions inside functions.
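As an illustration of the bit-counting and overflow-handling items in the list above, here is a minimal sketch using builtins that GCC and Clang document as extensions. The C++ standard does not define them, but the compiler does. The wrapper names (count_bits, first_set_bit, add_overflows) are hypothetical and chosen only for this example; other compilers may not provide these builtins at all.

```cpp
#include <climits>

// Number of set bits in x, using the GCC/Clang extension __builtin_popcount.
int count_bits(unsigned x) {
    return __builtin_popcount(x);
}

// Zero-based index of the lowest set bit, or -1 if x is zero.
// GCC documents the result of __builtin_ctz(0) as undefined, so guard it.
int first_set_bit(unsigned x) {
    return x == 0 ? -1 : __builtin_ctz(x);
}

// Stores a + b (wrapped) in *sum and returns true if the mathematically
// correct result did not fit in an int. Unlike a plain a + b on ints,
// this is specified by the compiler even when the addition overflows.
bool add_overflows(int a, int b, int* sum) {
    return __builtin_add_overflow(a, b, sum);
}
```

For instance, add_overflows(INT_MAX, 1, &s) reports the overflow instead of executing a signed addition whose behavior the language standard leaves undefined.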
Anything a compiler supports should be stable; it should not be affected by changing optimization switches, language-variant-selection switches, or other switches except as documented by the compiler. So these “undefined behaviors” should be consistent.
Outside of these, there are things that are neither defined by the applicable language standard nor by the compiler (directly in its own documentation or indirectly through specifications it seeks to conform to). For the most part, you should regard these as not stable. Behaviors that are not at all part of the compiler design may change when optimization switches are changed, when other code is changed, when patterns of memory use or contents of memory are changed, and so on.
Although you generally cannot rely on such behaviors, this does not mean they are without pattern. Compilers are not designed randomly; these behaviors are properties that arise out of the compilers' design. Experienced programmers may recognize certain symptoms as clues about what is wrong in a program. Even though the behavior is undefined (by the language standard and by the compiler), it nonetheless may fall into a pattern because of how we design software. For example, overrunning a buffer may corrupt data further up (earlier) on the stack. This is not guaranteed to happen; optimization can change what happens when a buffer is overrun, but it is nonetheless a common result. Furthermore, it is a result some people do rely on. Malicious people may seek to exploit buffer overruns to attack programs and steal information or money, to take control of systems, or to crash or otherwise cause denial of service. The behavior they exploit is not random; it is at least partly predictable, and that is what affords them the opportunity to exploit it. So even fully undefined behavior cannot be regarded as random; good programmers must consider the consequences of undefined behavior and seek to mitigate them.
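One common way to mitigate the buffer overrun case is to use an access that the standard library defines for out-of-range indexes. The sketch below is only an illustration, and checked_store is a hypothetical helper name. It relies on std::vector::at, which throws std::out_of_range instead of writing past the end of the buffer.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Writes value at buf[index] if index is in range and returns true.
// Returns false, leaving buf untouched, if index is out of range.
bool checked_store(std::vector<int>& buf, std::size_t index, int value) {
    try {
        buf.at(index) = value;  // at() checks the bound; operator[] would not
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

With a four-element vector, checked_store(v, 4, 9) returns false, whereas v[4] = 9 would be an overrun with undefined behavior.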
What can cause the constant to become anything rather than 3 if I do not change the compiler, its version, flags or architecture?
For the most part, if you change nothing about a compilation, you should get the same result every time, with a few exceptions. This is because a compiler is a machine; it proceeds mechanically and executes its program mechanically. If the compiler has no bugs, then its behavior should be defined by its source code (even if we, the users, do not know what the definition is), and that means that, given the same input and circumstances, it should produce the same output.
One exception is that compilers might inject date or time information into their output. Similarly, other variations in the execution environment might cause some changes. Another issue is that the output of the compiler is object code, and the object code is not the complete program, so the final program may be influenced by other things. An example is that modern multi-user operating systems commonly use address space layout randomization, so many of the addresses in a program will vary from execution to execution. This is unlikely to affect your i = ++i + i++; example, but it means other bugs resulting in undefined behavior can exhibit some randomness due to the addresses involved.
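The date and time exception can be seen with the standard __DATE__ and __TIME__ macros, which the preprocessor replaces with the date and time of translation. In this small sketch (build_stamp is a hypothetical name), the same source compiled with the same compiler and flags at two different moments will usually yield object code containing different strings.

```cpp
#include <string>

// Returns the translation date and time, e.g. "Mmm dd yyyy hh:mm:ss".
// The text is fixed when the file is compiled, not when the program runs.
std::string build_stamp() {
    return std::string(__DATE__) + " " + __TIME__;
}
```

Build systems that aim for reproducible output typically avoid these macros or pin their values for this reason.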
Answered By - Eric Postpischil Answer Checked By - Cary Denson (WPSolving Admin)