Issue
I was playing around with strcmp when I noticed this. Here is the code:
#include <string.h>
#include <stdio.h>

int main(){
    // passing string literals directly
    printf("%d\n", strcmp("ahmad", "fatema"));
    // passing the same strings through pointers
    char *a = "ahmad";
    char *b = "fatema";
    printf("%d\n", strcmp(a, b));
    return 0;
}
The output is:
-1
-5
Shouldn't strcmp work the same in both cases? Why am I given a different value when I pass the strings directly as "ahmad" than when I pass them as char *a = "ahmad"? When you pass values to a function, they are allocated on its stack, right?
Solution
TL;DR: Use gcc -fno-builtin-strcmp so strcmp() isn't treated as equivalent to __builtin_strcmp(). With optimization disabled, GCC will only be able to do constant-propagation within a single statement, not across statements. The actual library version apparently returns the difference between the first pair of differing characters (here 'a' - 'f', which is -5 in ASCII); the compile-time evaluation probably normalizes the result to 1 / 0 / -1, which isn't required or guaranteed by ISO C.
You are most likely seeing the result of a compiler optimization. If we test the code using gcc on godbolt at the -O0 optimization level, we can see that for the first case it does not call strcmp:
movl $-1, %esi #,
movl $.LC0, %edi #,
movl $0, %eax #,
call printf #
Since you are using constants as arguments to strcmp, the compiler is able to perform constant folding: it evaluates a compiler intrinsic at compile time and generates the -1 directly, instead of calling strcmp at run time. The run-time strcmp is implemented in the standard library and will have a different implementation than the likely simpler compile-time one.
In the second case it does generate a call to strcmp:
call strcmp #
movl %eax, %esi # D.2047,
movl $.LC0, %edi #,
movl $0, %eax #,
call printf #
This is consistent with the fact that gcc has a builtin for strcmp, which is what gcc will use during constant folding.
If we test further at the -O1 optimization level or greater, gcc is able to fold both cases, and the result will be -1 for both:
movl $-1, %esi #,
movl $.LC0, %edi #,
xorl %eax, %eax #
call printf #
movl $-1, %esi #,
movl $.LC0, %edi #,
xorl %eax, %eax #
call printf #
With more optimization options turned on, the optimizer is able to determine that a and b also point to constants known at compile time, so it can compute the result of strcmp for this case at compile time as well.
We can confirm that gcc is using a builtin function by building with the -fno-builtin flag and observing that a call to strcmp will be generated for all cases.
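If changing compiler flags is not convenient, one way to get the library version from within the source is to call strcmp through a volatile function pointer. This is only a sketch: runtime_strcmp is a hypothetical helper name, and it relies on the assumption that the compiler will not fold a call made through a volatile pointer, since it must reload the pointer at run time.

```c
#include <string.h>

/* Hypothetical helper (not part of any library): calls strcmp through a
   volatile function pointer so the compiler has to load the pointer and
   emit a real call at run time instead of folding the result. */
static int runtime_strcmp(const char *s1, const char *s2)
{
    int (*volatile cmp)(const char *, const char *) = strcmp;
    return cmp(s1, s2);
}
```

The exact negative value returned for runtime_strcmp("ahmad", "fatema") still depends on the C library in use, so only its sign should be relied upon.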
clang is slightly different in that it does not fold at all at -O0 but will fold at -O1 and above for both cases.
Note that any negative result is entirely conformant, as we can see by going to the draft C99 standard, section 7.21.4.2 The strcmp function, which says (emphasis mine):
int strcmp(const char *s1, const char *s2);
The strcmp function returns an integer greater than, equal to, or less than zero, accordingly as the string pointed to by s1 is greater than, equal to, or less than the string pointed to by s2.
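In practice this means portable code should only test the sign of the result. A minimal sketch follows; cmp_sign and sorts_before are hypothetical helper names introduced here for illustration, not standard functions.

```c
#include <string.h>

/* Hypothetical helper: collapse any strcmp result to -1, 0, or 1,
   so results from different implementations compare equal. */
static int cmp_sign(int r)
{
    return (r > 0) - (r < 0);
}

/* Hypothetical helper: returns 1 when s1 sorts before s2, else 0.
   It tests only the sign, never a specific value such as -1. */
static int sorts_before(const char *s1, const char *s2)
{
    return strcmp(s1, s2) < 0;
}
```

With this, cmp_sign(-1) and cmp_sign(-5) both give -1, so the two outputs from the question are equivalent as far as the standard is concerned.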
technosurus points out that strcmp is specified to treat the strings as if they were composed of unsigned char; this is covered in C99 under 7.21.1, which says:
For all functions in this subclause, each character shall be interpreted as if it had the type unsigned char (and therefore every possible object representation is valid and has a different value).
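To illustrate both points, here is a textbook-style sketch of a strcmp that compares characters as unsigned char and returns their difference. ref_strcmp is a hypothetical name, and this is not the code of any particular C library; real implementations are usually heavily optimized and may return other values with the same sign.

```c
/* Hypothetical reference implementation for illustration only.
   Characters are compared as unsigned char, as C99 7.21.1 requires. */
static int ref_strcmp(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Advance while the strings match and s1 has not ended. */
    while (*p1 != '\0' && *p1 == *p2) {
        p1++;
        p2++;
    }
    /* Difference of the first differing pair (or 0 if both ended). */
    return (int)*p1 - (int)*p2;
}
```

On an ASCII system, ref_strcmp("ahmad", "fatema") yields 'a' - 'f', which is -5, matching the run-time output in the question. Because of the unsigned char interpretation, a byte such as 0x80 compares greater than 'a' even on platforms where plain char is signed.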
Answered By - Shafik Yaghmour