Wednesday, November 17, 2021

[SOLVED] Why do local variables have undetermined values in C if not initialized?

Issue

In C on Linux, when a function is called, the prologue portion of its assembly code creates a stack frame, and the local variables are addressed relative to the base pointer. My question is: what makes a variable hold an undetermined value when we print it without initializing it? My theory is that when we use the variable, the OS brings in the page corresponding to the local variable's address, and whatever value already sits at that address in the page becomes the value of the local variable. Is that correct?


Solution

Consider the compiler compiling a program that correctly initializes an object:

#include <stdio.h>

int main(void) {
    int x = 3;
    printf("%d\n", x);
    int y = 4+x*7;
    printf("%d\n", y);
}

This might result in assembly code:

Store 3 in X.                   // "X" refers to the stack location assigned for x.
Load address of "%d\n" into R0. // R0 is the register used for passing the first argument.
Load from X into R1.            // R1 is the register for the second argument.
Call printf.
Load 4 into R1.                 // Start the 4 of 4+x*7.
Load from X into R2.            // Get x to calculate with it.
Multiply R2 by 7.               // Make x*7.
Add R2 to R1.                   // Finish 4+x*7.
Load address of "%d\n" into R0.
Call printf.

This is a working program. Now suppose we do not initialize x and have int x; instead. Since x is not initialized, the C standard says its value is indeterminate. This means the compiler is allowed to omit all the instructions that get the value of x. So let’s take the working assembly code and remove all the instructions that get the value of x:

Load address of "%d\n" into R0. // R0 is the register used for passing the first argument.
Call printf.
Load 4 into R1.                 // Start the 4 of 4+x*7.
Multiply R2 by 7.               // Make x*7.
Add R2 to R1.                   // Finish 4+x*7.
Load address of "%d\n" into R0.
Call printf.

In this program, the first printf prints whatever was in R1, because the value of x was never loaded into R1. And the calculation of x*7 uses whatever is in R2, because the value of x was never loaded into R2. So this program might print, say, “37” for the first printf, because there happened to be a 37 in R1, but it might print, say, “4” for the second printf, because there happened to be a 0 in R2 (4+0*7 is 4). So the output of this program “looks like” x had the value 37 at one moment and the value 0 at another. The program behaves as if x does not have any fixed value.

This is a very simplified example. In practice, when a compiler removes code during optimization, it would remove more. For example, if it knows x is not initialized, it might remove not only the load of x but also the multiply by 7. However, this example serves to demonstrate the principle: When there is an uninitialized value, the compiler can radically change the code that is generated.



Answered By - Eric Postpischil