Issue
According to cppreference, C++20 has rich (and, to me, useful) support for atomic_flag operations.
However, it's not clear whether GCC supports these features yet; they're nowhere to be found on GNU's feature summary. I'm currently using version 8, with -std=c++2a set.
This code doesn't compile with GCC8:
#include <atomic>
int main() {
    std::atomic_flag myFlag = ATOMIC_FLAG_INIT;
    myFlag.test();
}
error: ‘struct std::atomic_flag’ has no member named ‘test’
I don't want to destabilize my build environment by installing a newer version of g++, and would be grateful to anyone who can report on the support for atomic_flag in version 10 or higher.
Solution
atomic<bool> does everything atomic_flag does, just as efficiently, on all normal C++ implementations. C++20 just added new member functions to atomic_flag to bring it up to the level of atomic<bool>. atomic_flag is the only type guaranteed to be lock-free, but in practice, on all platforms anyone cares about, so is atomic<bool>.
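If you're stuck on a pre-C++20 library, a minimal sketch of that substitution could look like this. The function names are made up for illustration, and the memory orders are just one reasonable choice (leaving them out gives you the seq_cst defaults).

```cpp
#include <atomic>

// Sketch: atomic<bool> standing in for C++20 atomic_flag (C++11 and later).
// Function names are hypothetical, chosen to mirror the atomic_flag members.

// Like atomic_flag::test(): read the flag without modifying it.
inline bool flag_test(const std::atomic<bool> &flag) {
    return flag.load(std::memory_order_acquire);
}

// Like atomic_flag::test_and_set(): set the flag, return the previous value.
inline bool flag_test_and_set(std::atomic<bool> &flag) {
    return flag.exchange(true, std::memory_order_acq_rel);
}

// Like atomic_flag::clear(): reset the flag to false.
inline void flag_clear(std::atomic<bool> &flag) {
    flag.store(false, std::memory_order_release);
}
```

Unlike atomic_flag before C++20, an atomic<bool> can be read with a plain load, which is exactly what the question's myFlag.test() call was after.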
Don't expect GCC8 to have all the C++2a features; at least try it on https://godbolt.org/ with latest release or nightly gcc. (Also note that it's not the compiler proper that needs to support this, just the standard library headers. But libstdc++ is normally distributed with g++.)
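If the same source has to build with both old and new toolchains, one option is to check the standard feature-test macro __cpp_lib_atomic_flag_test, which a library defines once it provides atomic_flag::test(). A sketch, with portable_flag and is_set as made-up names:

```cpp
#include <atomic>

#if defined(__cpp_lib_atomic_flag_test)
// The library provides C++20 atomic_flag::test(), so use atomic_flag directly.
using portable_flag = std::atomic_flag;
inline bool is_set(const portable_flag &f) { return f.test(); }
#else
// Older library, or not compiling as C++20: fall back to atomic<bool>.
using portable_flag = std::atomic<bool>;
inline bool is_set(const portable_flag &f) { return f.load(); }
#endif
```

Setting and clearing would need similar wrappers, since atomic_flag and atomic<bool> spell those operations differently.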
I tweaked your example so it could be compiled with optimization enabled without optimizing away the actual work.
#include <atomic>
int flagtest(std::atomic_flag &myFlag) {
    //std::atomic_flag myFlag = ATOMIC_FLAG_INIT;
    return myFlag.test();
}
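The asm listings below also include a booltest function whose source isn't shown here. Judging by its signature in the compiler output, it was presumably the atomic<bool> counterpart of flagtest, something like this (a reconstruction, not the original code):

```cpp
#include <atomic>

// Assumed counterpart of flagtest(): a seq_cst load of the atomic<bool>,
// with the bool result implicitly converted to int.
int booltest(std::atomic<bool> &myFlag) {
    return myFlag;
}
```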
On the Godbolt compiler explorer with gcc and clang: GCC10.2 doesn't support the new C++20 atomic_flag::test() member function; the GCC nightly trunk build does. Clang 11.0 and trunk do; clang 10.0.1 doesn't.
# GCC trunk for x86-64 -O3 -std=gnu++2a
flagtest(std::atomic_flag&):
    movzx eax, BYTE PTR [rdi]
    ret
booltest(std::atomic<bool>&):
    movzx eax, BYTE PTR [rdi]
    test al, al
    setne al
    movzx eax, al # this is weird, GCC has gone insane.
    ret
With clang, we can also try libc++ (a new implementation of the C++ standard library). By default, clang on Linux (including Godbolt) uses libstdc++, like GCC does.
# clang 11.0 -O3 -std=gnu++2a -stdlib=libc++
flagtest(std::__1::atomic_flag&): # @flagtest(std::__1::atomic_flag&)
    mov al, byte ptr [rdi]
    movzx eax, al
    and eax, 1
    ret
booltest(std::__1::atomic<bool>&): # @booltest(std::__1::atomic<bool>&)
    mov al, byte ptr [rdi]
    movzx eax, al
    and eax, 1
    ret
So that's weird and horrible; even if the value in memory might not be booleanized, there's no reason to merge into the low byte of RAX and then movzx eax,al, vs. just doing a movzx load in the first place.
But and eax,1 is much less bad than GCC's insane test/setne/movzx, if it thinks it needs to re-booleanize. (It doesn't actually need to do that; the ABI guarantees that a bool in memory is an actual 0 or 1 byte, and atomic<bool> uses the same object-representation as bool.)
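The lock-free claim, and at least the size half of the representation claim, can be checked at compile time on whatever platform you care about. A small sketch, assuming C++17 for is_always_lock_free; the helper name is made up, and equal size is only a necessary condition for identical object representation, not proof of it:

```cpp
#include <atomic>

// Hypothetical helper: true if atomic<bool> is always lock-free on this
// target and is the same size as a plain bool.
constexpr bool atomic_bool_is_cheap() {
    return std::atomic<bool>::is_always_lock_free
        && sizeof(std::atomic<bool>) == sizeof(bool);
}

// Fail the build rather than silently getting a locking implementation.
static_assert(atomic_bool_is_cheap(),
              "atomic<bool> isn't a lock-free single bool on this target");
```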
So with clang, both ways have stupid missed-optimizations converting to int. With GCC, for some reason atomic_flag doesn't suffer that problem, but I wouldn't recommend using it just for that reason. Hopefully atomic<bool> will get fixed, and normally you don't convert bool to int.
Normal uses of atomic<bool> or atomic_flag, like branching on it, should not have any of these missed optimizations. e.g.
int g0, g1;
int conditional_load(std::atomic<bool> &myFlag) {
    return myFlag ? g0 : g1;
}
# gcc 11 nightly build -O3
conditional_load(std::atomic<bool>&):
    movzx eax, BYTE PTR [rdi]
    test al, al
    mov eax, DWORD PTR g0[rip]
    cmove eax, DWORD PTR g1[rip]
    ret
So that's pretty normal. Clang chooses to select between addresses, then load once. That puts the load-use latency on the critical path and takes more instructions; it's a worse choice when both variables are adjacent and so probably come from the same cache line. (GCC's choice always touches both variables, which could be worse if one of them could otherwise stay "cold" in cache.)
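Written out at the source level, the two strategies look roughly like this. It's only an illustration of what the generated code does, with made-up function names; a compiler is free to turn either version into the other, so don't expect the source form to control the asm.

```cpp
#include <atomic>

int g0, g1;

// Roughly GCC's strategy: load both variables, then pick one (cmov).
int load_both_then_select(std::atomic<bool> &myFlag) {
    bool flag = myFlag;   // one atomic load of the flag
    int a = g0;
    int b = g1;
    return flag ? a : b;
}

// Roughly clang's strategy: pick an address, then do a single load from it.
int select_address_then_load(std::atomic<bool> &myFlag) {
    const int *p = myFlag ? &g0 : &g1;
    return *p;            // this load has to wait for the address selection
}
```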
Answered By - Peter Cordes Answer Checked By - Candace Johnson (WPSolving Volunteer)