Issue
I am very new to the Linux kernel C coding style. I am trying to understand the following implementation of the "atomic_add" function from the "arch/arm64/include/asm/atomic.h" file (lines 112-124).
static inline void atomic_add(int i, atomic_t *v)
{
	unsigned long tmp;
	int result;

	asm volatile("// atomic_add\n"
	"1:	ldxr	%w0, %2\n"
	"	add	%w0, %w0, %w3\n"
	"	stxr	%w1, %w0, %2\n"
	"	cbnz	%w1, 1b"
	: "=&r" (result), "=&r" (tmp), "+Q" (v->counter)
	: "Ir" (i));
}
Please help me to understand the following questions.
1. What is the meaning of %w0 or %w3? I understand that %2 is referring to the counter value.
2. Is %w0 referring to the (result) variable or a general-purpose register?
3. Does the constraint string "Ir" stand for "Immediate Register"?
Solution
1. The "w" is a template modifier. It causes the inline asm to contain the 32-bit name of the register (w0, etc.) instead of its 64-bit name (x0), which would be the default. See the documentation linked by David Wohlferd. You can also try it and note that if you write %0 instead of %w0, the generated instruction uses the 64-bit x register. That is not what you would want, since these should be 32-bit loads and stores.

2. Both. As usual for GCC-style extended asm, %w0 refers to operand number 0 of the inline asm (with, as mentioned, the "w" modifier to use its 32-bit name). Here that is the one declared with "=&r" (result). Since the constraint is "r", this operand will be allocated a general-purpose register, and all mentions of %0 (respectively %w0) in the asm code will be replaced with the name of that register. In the Godbolt example above, the compiler chose x9 (respectively w9).

The (result) means that after the asm statement, the compiler should take whatever is left in w9 and store it in the variable result. It could do this with a store to memory, or a mov to whatever register is being used for result, or it could just allocate result in that register itself. With luck, the optimizer should choose the latter; and since result isn't used for anything after the asm, it should not do anything further with that register. So in effect, an output operand with a variable that isn't used afterwards is a way of telling the compiler "please pick a register that I can use as scratch".

3. This is two constraints, "I" and "r". Constraints are documented by GCC (simple and machine-specific), and when multiple constraints are given, the compiler can choose to satisfy any one of them. "I" asks for an immediate value suitable for use in an AArch64 add instruction, i.e. a compile-time constant that is a 12-bit zero-extended number, optionally shifted left by 12 bits. "r", as you know, asks for a general-purpose register. So if you write any of atomic_add(1, &c) or atomic_add(1+1+1, &c) or atomic_add(4095, &c) or atomic_add(4096, &c), the second line of the asm statement will be emitted as an immediate add instruction, with your constant encoded directly into the instruction: add w9, w9, #1 and so on. But if you write atomic_add(4097, &c) or atomic_add(my_variable, &c), the compiler will generate additional code before the asm to load the appropriate value into some register (say w13) and emit add w9, w9, w13 inside your asm. This lets the compiler generate the more efficient immediate add whenever possible, while still getting correct code in general.
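The rule behind the "I" constraint described above can be sketched as a small check in portable C. This is only an illustration of the encoding rule (a 12-bit unsigned value, optionally shifted left by 12 bits); the helper name fits_add_immediate is hypothetical and is not part of the kernel or of GCC, and the sketch assumes the constant is treated as an unsigned 32-bit value, so negative constants are reported as not encodable.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (an assumption for illustration, not kernel or GCC
 * code): returns true when `value` fits the AArch64 ADD (immediate)
 * encoding, i.e. a 12-bit unsigned number, optionally shifted left by
 * 12 bits.
 */
static bool fits_add_immediate(uint32_t value)
{
    /* Only bits 0..11 set: 0 through 4095, no shift needed. */
    bool fits_unshifted = (value & ~UINT32_C(0xfff)) == 0;

    /* Only bits 12..23 set: a 12-bit number shifted left by 12. */
    bool fits_shifted = (value & ~UINT32_C(0xfff000)) == 0;

    return fits_unshifted || fits_shifted;
}
```

With this helper, fits_add_immediate(1), fits_add_immediate(4095) and fits_add_immediate(4096) are true, while fits_add_immediate(4097) is false, which matches the cases where the compiler can pick the immediate form versus falling back to the "r" alternative.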
Answered By - Nate Eldredge