Issue
It is known that asm volatile ("" ::: "memory") can serve as a compiler barrier that prevents the compiler from reordering assembly instructions across it. For example, it is mentioned in https://preshing.com/20120625/memory-ordering-at-compile-time/, section "Explicit Compiler Barriers".
However, all the articles I can find only mention the fact that asm volatile ("" ::: "memory") can serve as a compiler barrier, without explaining why the "memory" clobber effectively forms one. The GCC online documentation only says that all the special "memory" clobber does is tell the compiler that the assembly code may perform memory reads or writes other than those specified in the operand lists. But how do those semantics stop the compiler from reordering memory instructions across it? I tried to answer this myself but failed, so I ask here: why can asm volatile ("" ::: "memory") serve as a compiler barrier, based on the semantics of the "memory" clobber? Please note that I am asking about a "compiler barrier" (in effect at compile time), not a stronger "memory barrier" (in effect at run time). For convenience, I excerpt the semantics of the "memory" clobber from the GCC online doc below:
The "memory" clobber tells the compiler that the assembly code performs memory reads or writes to items other than those listed in the input and output operands (for example, accessing the memory pointed to by one of the input parameters). To ensure memory contains correct values, GCC may need to flush specific register values to memory before executing the asm. Further, the compiler does not assume that any values read from memory before an asm remain unchanged after that asm; it reloads them as needed. Using the "memory" clobber effectively forms a read/write memory barrier for the compiler.
Solution
If a variable is potentially read or written, it matters what order that happens in. The point of a "memory" clobber is to make sure the reads and/or writes in an asm statement happen at the right point in the program's execution.
Any read of a C variable's value that happens in the source after an asm statement must come after the memory-clobbering asm statement in the compiler-generated assembly output for the target machine; otherwise it might read a stale value from before the asm statement would have changed it.
Any read of a C variable in the source before an asm statement similarly must stay sequenced before it; otherwise it might incorrectly read a value the asm statement modified.
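A minimal sketch of the read-ordering rule, assuming GCC or Clang. It uses the __asm__ __volatile__ spelling, which means the same as asm volatile but also compiles under strict -std=c11. The global shared, the macro COMPILER_BARRIER and the function read_twice are hypothetical names for illustration.

```c
/* Compiler-only barrier: emits no instructions, but the "memory" clobber
   says the (empty) asm may read or write any reachable memory. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int shared = 1;  /* hypothetical global: reachable, so the asm "might" touch it */

int read_twice(void)
{
    int a = shared;      /* this load can't sink below the barrier */
    COMPILER_BARRIER();  /* the asm "might have" written shared */
    int b = shared;      /* so this must be a fresh load, not a reuse of a */
    return a + b;
}
```

Without the barrier, an optimizing compiler would be free to load shared once and reuse it for b; with it, the generated code has to contain two separate loads, one on each side of the (empty) asm.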
Similar reasoning applies to assignments to (writes of) C variables before/after any asm statement with a "memory" clobber. This is just like a call to an "opaque" function, one whose definition the compiler can't see.
No reads or writes can reorder with the barrier in either direction, therefore no operation before the barrier can reorder with any operation after the barrier, or vice versa.
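The write side of the same argument, sketched for GCC or Clang with illustrative names (Value, IsPublished and publish are not from any library). Because the asm "might read" both globals, the first store has to be done before it and the second can't be started until after it.

```c
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int Value;        /* payload */
int IsPublished;  /* flag that a reader would check */

void publish(int x)
{
    Value = x;           /* this store can't be sunk below the barrier */
    COMPILER_BARRIER();  /* the asm "might read" Value and IsPublished */
    IsPublished = 1;     /* this store can't be hoisted above the barrier */
}
```

This only fixes the order in the compiler's output. On a weakly ordered CPU the hardware could still make the two stores visible to another core out of order, which is the stronger run-time "memory barrier" that the question deliberately sets aside.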
Another way to look at it: the actual machine memory contents must match the C abstract machine at that point. The compiler-generated asm has to respect that by storing any variable values from registers to memory before the start of an asm("":::"memory") statement, and afterwards it has to assume that any registers holding copies of variable values might no longer be up to date, so they have to be reloaded if they're needed.
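A sketch of the "reload after the barrier" consequence, assuming GCC or Clang; ready and spin_until_ready are hypothetical names. Without the barrier, the compiler could load ready into a register once and spin on that stale copy forever; the clobber invalidates the cached copy on every iteration.

```c
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

int ready;  /* hypothetical flag, assumed to be set elsewhere */

int spin_until_ready(void)
{
    int spins = 0;
    while (!ready) {        /* ready is re-read from memory each time around */
        COMPILER_BARRIER(); /* registers caching ready are considered stale */
        spins++;
    }
    return spins;
}
```

This shows only the compile-time effect. In portable multithreaded C11, polling a plain int like this is a data race, and _Atomic with atomic_load is the tool for that job.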
This reads-everything / writes-everything assumption for the "memory" clobber is what keeps the asm statement from reordering at all at compile time with respect to all accesses, even non-volatile ones. The volatile is already implicit from being an asm() statement with no "=..." output operands, and it is what stops the statement from being optimized away entirely (and the memory clobber with it).
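Two spellings of the barrier, as a sketch for GCC or Clang (the macro names, counter and bump_twice are hypothetical). GCC's manual states that an asm with no output operands is implicitly volatile, so the form without the keyword is not deleted as dead code either.

```c
/* Explicitly volatile. */
#define BARRIER_EXPLICIT() __asm__ __volatile__("" ::: "memory")
/* Implicitly volatile: no output operands, so it can't be discarded. */
#define BARRIER_IMPLICIT() __asm__("" ::: "memory")

int counter;

int bump_twice(void)
{
    counter++;
    BARRIER_IMPLICIT();  /* kept, so counter must be stored before it */
    counter++;
    BARRIER_EXPLICIT();
    return counter;
}
```

Writing volatile explicitly costs nothing and makes the intent obvious to readers who don't know the implicit rule.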
Note that only potentially "reachable" C variables are affected. For example, escape analysis can still let the compiler keep a local int i in a register across a "memory" clobber, as long as the asm statement itself doesn't take its address as an input.
It's just like a function call: for (int i=0;i<10;i++) {foobar("%d\n", i);} can keep the loop counter in a register and just copy it to the second arg-passing register for foobar every iteration. There's no way foobar can have a reference to i, because its address hasn't been stored or passed anywhere.
(This is fine for the memory barrier use-case; no other thread could have its address either.)
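A sketch of the reachability point, assuming GCC or Clang. sum_private, sum_escaped, run_loop, foobar_impl and seen_sum are hypothetical names. Calling through a volatile function pointer is just one way, chosen for this sketch, to make foobar opaque within a single file. The results are identical either way; the difference only shows up in the -O2 assembly, where total is stored and reloaded around the asm in sum_escaped but can stay in a register in sum_private.

```c
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* &total never escapes: i and total may stay in registers across the barrier. */
int sum_private(void)
{
    int total = 0;
    for (int i = 0; i < 10; i++) {
        total += i;
        COMPILER_BARRIER();
    }
    return total;
}

/* Passing &total as an input makes it reachable by the asm, so with the
   "memory" clobber total must be in memory before the asm and reloaded after. */
int sum_escaped(void)
{
    int total = 0;
    for (int i = 0; i < 10; i++) {
        total += i;
        __asm__ __volatile__("" : : "r"(&total) : "memory");
    }
    return total;
}

/* Function-call analogy (hypothetical callee). */
static int seen_sum;

static void foobar_impl(const char *fmt, int v)
{
    (void)fmt;      /* format string unused in this sketch */
    seen_sum += v;  /* a global, so an opaque callee can reach it */
}

/* The volatile pointer hides the callee, so each call is opaque to the optimizer. */
static void (*volatile foobar)(const char *, int) = foobar_impl;

int run_loop(void)
{
    seen_sum = 0;
    for (int i = 0; i < 10; i++)
        foobar("%d\n", i);  /* i passed by value; its address never escapes */
    return seen_sum;        /* reloaded: the opaque call may have changed it */
}
```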
Related:
- How does a mutex lock and unlock functions prevents CPU reordering? - why opaque function calls work as compiler barriers.
- How can I indicate that the memory *pointed* to by an inline ASM argument may be used? - cases where a "memory" clobber is needed for a non-empty asm statement (or other dummy operands to tell the asm statement which memory is read / written).
Answered By - Peter Cordes