Issue
Here are two functions which I claim do exactly the same thing:
bool fast(int x)
{
return x & 4242;
}
bool slow(int x)
{
return x && (x & 4242);
}
Logically they do the same thing, and just to be 100% sure I wrote a test that ran all four billion possible inputs through both of them, and they matched. (x & 4242
is only non-zero if it has set bits in specific positions, which means x
has a non-zero value, so testing x!=0
separately as the other side of a logical &&
is redundant.) But the assembly code is a different story:
fast:
andl $4242, %edi
setne %al
ret
slow:
xorl %eax, %eax
testl %edi, %edi
je .L3
andl $4242, %edi
setne %al
.L3:
rep
ret
I was surprised that GCC could not make the leap of logic to eliminate the redundant test. I tried g++ 4.4.3 and 4.7.2 with -O2, -O3, and -Os, all of which generated the same code. The platform is Linux x86_64.
Can someone explain why GCC shouldn't be smart enough to generate the same code in both cases?
Edit to add test harness:
#include <cstdlib>
#include <vector>
using namespace std;
int main(int argc, char* argv[])
{
// make vector filled with numbers starting from argv[1]
int seed = atoi(argv[1]);
vector<int> v(100000);
for (int j = 0; j < 100000; ++j)
v[j] = j + seed;
// count how many times the function returns true
int result = 0;
for (int j = 0; j < 100000; ++j)
for (int i : v)
result += slow(i); // or fast(i), try both
return result;
}
I tested the above with clang 5.1 on Mac OS with -O3. It took 2.9 seconds using fast()
and 3.8 seconds using slow()
. If I instead use a vector of all zeros, there is no significant difference in performance between the two functions.
Other compilers:
- mainline clang 3.7 and later do the optimization even for
&&
, clang 3.6 and earlier don't. https://godbolt.org/z/v5bjrvrP1 - latest GCC trunk (march 2022) and 11.2 still don't.
- Current MSVC does both parts with branches, not using
setcc
. - ICC makes asm like GCC's, LLVM-based ICX is like clang. https://godbolt.org/z/cjKfr8r5b
Solution
You are correct that this appears to be a deficiency, and possibly an outright bug, in the optimizer.
Consider:
bool slow(int x)
{
return x && (x & 4242);
}
bool slow2(int x)
{
return (x & 4242) && x;
}
Assembly emitted by GCC 4.8.1 (-O3):
slow:
xorl %eax, %eax
testl %edi, %edi
je .L2
andl $4242, %edi
setne %al
.L2:
rep ret
slow2:
andl $4242, %edi
setne %al
ret
In other words, slow2
is misnamed.
I have only contributed the occasional patch to GCC, so whether my point of view carries any weight is debatable :-). But it is certainly strange, in my view, for GCC to optimize one of these and not the other. I suggest filing a bug report.
[Update]
Surprisingly small changes appear to make a big difference. For example:
bool slow3(int x)
{
int y = x & 4242;
return y && x;
}
...generates "slow" code again. I have no hypothesis for this behavior.
You can experiment with all of these on multiple compilers here.
Answered By - Nemo Answer Checked By - Pedro (WPSolving Volunteer)