Monday, February 21, 2022

[SOLVED] Are C++20 new atomic_flag features supported in g++ / gcc?


According to cppreference, C++20 has rich (and, to me, useful) support for atomic_flag operations.

However, it's not clear whether gcc supports these features yet; they're nowhere to be found on GNU's feature summary. I'm currently using version 8, with -std=c++2a set.

This code doesn't compile with GCC8:

#include <atomic>

int main() {
  std::atomic_flag myFlag = ATOMIC_FLAG_INIT;
  myFlag.test();
}

error: ‘struct std::atomic_flag’ has no member named ‘test’

I don't want to destabilize my build environment by installing a newer version of g++, and would be grateful to anyone who can report on the support for atomic_flag in version 10 or higher.


atomic<bool> does everything atomic_flag does, just as efficiently on all normal C++ implementations. C++20 just added new stuff to atomic_flag to bring it up to the level of atomic<bool>. atomic_flag is guaranteed to be lock-free, but in practice so is atomic<bool> on all platforms anyone cares about.
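One way to check the lock-free claim on your own target, as a minimal sketch: is_always_lock_free is a C++17 static member of std::atomic, is_lock_free() has been available since C++11, and bool_atomic_is_lock_free is a hypothetical helper name used only for this illustration.

```cpp
#include <atomic>

#if __cplusplus >= 201703L
// Compile-time check (C++17 and later): the build fails on a target
// where atomic<bool> would need a lock.
static_assert(std::atomic<bool>::is_always_lock_free,
              "atomic<bool> is not lock-free on this target");
#endif

// Hypothetical helper: the run-time equivalent, usable since C++11.
inline bool bool_atomic_is_lock_free() {
  std::atomic<bool> b{false};
  return b.is_lock_free();
}
```

On mainstream targets such as x86-64 and AArch64 this should report lock-free, but that is a property of the implementation, not something the standard guarantees for atomic<bool>.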

Don't expect GCC8 to have all the C++2a features; at least try it with the latest release or a nightly build of gcc. (Also note that it's not the compiler proper that needs to support this, just the standard library headers. But libstdc++ is normally distributed with g++.)
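If the same code has to build against both older and newer standard library versions, one option is the standard feature-test macro __cpp_lib_atomic_flag_test, which <atomic> defines when atomic_flag::test() is available. A minimal sketch, falling back to atomic<bool> otherwise; flag_t, flag_set and flag_is_set are hypothetical names made up for this example.

```cpp
#include <atomic>

#if defined(__cpp_lib_atomic_flag_test)
// The library provides C++20 atomic_flag::test().
using flag_t = std::atomic_flag;
inline void flag_set(flag_t &f) { f.test_and_set(); }
inline bool flag_is_set(const flag_t &f) { return f.test(); }
#else
// Older library (or pre-C++20 mode): atomic<bool> offers the same operations.
using flag_t = std::atomic<bool>;
inline void flag_set(flag_t &f) { f.store(true); }
inline bool flag_is_set(const flag_t &f) { return f.load(); }
#endif
```

Usage is the same in both branches: declare the flag as `flag_t f{};` so it starts out clear, then call flag_set(f) and flag_is_set(f).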

I tweaked your example so it could be compiled with optimization enabled without optimizing away the actual work.

#include <atomic>

int flagtest(std::atomic_flag &myFlag) {
  //std::atomic_flag myFlag = ATOMIC_FLAG_INIT;
  return myFlag.test();
}

On the Godbolt compiler explorer with gcc and clang: GCC10.2 doesn't support the new C++20 atomic_flag::test() member function, but the GCC nightly trunk build does. Clang 11.0 and trunk do; clang 10.0.1 doesn't.

# GCC trunk for x86-64 -O3 -std=gnu++2a
flagtest(std::atomic_flag&):
        movzx   eax, BYTE PTR [rdi]
        ret
booltest(std::atomic<bool>&):
        movzx   eax, BYTE PTR [rdi]
        test    al, al
        setne   al
        movzx   eax, al                # this is weird, GCC has gone insane.
        ret
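The asm listings in this answer include a booltest function whose source isn't shown. A plausible sketch of it, assuming it simply does the same load as flagtest on a std::atomic<bool> and converts the result to int (the exact original body is an assumption):

```cpp
#include <atomic>

// Hypothetical reconstruction: the atomic<bool> counterpart of flagtest.
int booltest(std::atomic<bool> &myFlag) {
  return myFlag.load();   // seq_cst load, then bool -> int conversion
}
```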

With clang, we can also try libc++ (a new implementation of the C++ standard library). By default, clang on Linux (including Godbolt) uses libstdc++, like GCC does.

# clang 11.0 -O3 -std=gnu++2a -stdlib=libc++
flagtest(std::__1::atomic_flag&):      # @flagtest(std::__1::atomic_flag&)
        mov     al, byte ptr [rdi]
        movzx   eax, al
        and     eax, 1
        ret
booltest(std::__1::atomic<bool>&):         # @booltest(std::__1::atomic<bool>&)
        mov     al, byte ptr [rdi]
        movzx   eax, al
        and     eax, 1
        ret

So that's weird and horrible; even if the value in memory might not be booleanized, there's no reason to merge into the low byte of RAX with mov al and then movzx eax,al, instead of just doing a movzx load in the first place.

But and eax,1 is much less bad than GCC's insane test/setne/movzx, if it thinks it needs to re-booleanize. (It doesn't actually need to do that; the ABI guarantees that a bool in memory is an actual 0 or 1 byte, and atomic<bool> uses the same object-representation as bool.)

So with clang, both ways have stupid missed-optimizations converting to int. With GCC for some reason atomic_flag doesn't suffer that problem, but I wouldn't recommend using it just for that reason. Hopefully atomic<bool> will get fixed, and normally you don't convert bool to int.

Normal uses of atomic<bool> or atomic_flag, like branching on it, should not have any of these missed optimizations. e.g.

int g0, g1;
int conditional_load(std::atomic<bool> &myFlag) {
    return myFlag ? g0 : g1;
}

# gcc 11 nightly build -O3
conditional_load(std::atomic<bool>&):
        movzx   eax, BYTE PTR [rdi]
        test    al, al
        mov     eax, DWORD PTR g0[rip]
        cmove   eax, DWORD PTR g1[rip]
        ret

So that's pretty normal. Clang chooses to select between addresses, then load once. That puts the load-use latency on the critical path and takes more instructions; it's a worse choice when both variables are adjacent and so probably sit in the same cache line. (GCC's choice always touches both variables, which could be worse if one of them could otherwise stay "cold" in cache.)
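To make the two code-generation strategies concrete, here is a rough source-level sketch of each. The function names select_by_value and select_by_address are made up for this illustration, and a compiler is free to compile either function either way; this only spells out the difference between the two asm shapes.

```cpp
#include <atomic>

int g0, g1;

// GCC-style: load both globals unconditionally, then select a value
// (mov + cmov with a memory operand touches both g0 and g1).
int select_by_value(std::atomic<bool> &myFlag) {
  bool f = myFlag.load();
  int a = g0;
  int b = g1;
  return f ? a : b;
}

// Clang-style: select an address first, then do a single load.
// The load can't start until the address is known.
int select_by_address(std::atomic<bool> &myFlag) {
  const int *p = myFlag.load() ? &g0 : &g1;
  return *p;
}
```

Both functions return the same result; the difference is only which memory gets touched and where the load sits on the dependency chain.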

Answered By - Peter Cordes
Answer Checked By - Candace Johnson (WPSolving Volunteer)