Issue
Inspired by a recent question.
One use case for gcc-style inline assembly is to encode instructions that neither the compiler nor the assembler is aware of. For example, this is how I suggested using the rdrand instruction on a toolchain too old to support it:
/* "rdrand %%rax ; setc %b1" */
asm volatile (".byte 0x48, 0x0f, 0xc7, 0xf0; setc %b1"
: "=a"(result), "=qm"(success) :: "cc");
Unfortunately, hard-coding the instruction means that you also need to hard-code the registers used with it, greatly reducing the compiler's freedom to perform register allocation.
On some architectures (like RISC-V with its .insn directive), the assembler provides a way to systematically build arbitrary instructions, but that seems to be the exception.
A simple solution would be a way to obtain the bare number of the register so that it can be encoded into the instruction manually. For example, suppose a template modifier X existed that prints the number of the chosen register. Then the above example could be made more flexible like this:
/* "rdrand %0 ; setc %b1" */
asm volatile (".byte 0x48 | (%X0 >> 3), 0x0f, 0xc7, 0xf0 | (%X0 & 7); setc %b1"
: "=r"(result), "=qm"(success) :: "cc");
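To see what the hypothetical X modifier would have to feed into that .byte expression, here is a plain-C sketch of the same arithmetic. It assumes the hardware register numbers used in x86-64 encodings (rax = 0, rcx = 1, ..., rdi = 7, r8 to r15 = 8 to 15), which are not the same as GCC's internal register numbering; encode_rdrand_r64 is a made-up name for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: writes the 4 bytes of "rdrand r64" for a hardware
 * register number 0..15, using the same arithmetic as the template
 * ".byte 0x48 | (%X0 >> 3), 0x0f, 0xc7, 0xf0 | (%X0 & 7)". */
static void encode_rdrand_r64(unsigned reg, uint8_t out[4])
{
    assert(reg < 16);                      /* x86-64 has 16 general-purpose registers */
    out[0] = (uint8_t)(0x48 | (reg >> 3)); /* REX.W, plus REX.B for r8..r15 */
    out[1] = 0x0f;                         /* two-byte opcode escape */
    out[2] = 0xc7;                         /* opcode 0F C7 /6 */
    out[3] = (uint8_t)(0xf0 | (reg & 7));  /* ModRM: mod=11, reg=/6, rm=low 3 bits */
}
```

For example, register 9 (r9) should give the bytes 49 0f c7 f1: the high bit of the register number moves into REX.B and the low three bits go into the ModRM rm field.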
Similarly, if there were a way to have gcc print 12 instead of v12 for SIMD register 12 on ARM64, it would be possible to do something like this:
#include <arm_neon.h>
float32x4_t add3(float32x4_t a, float32x4_t b)
{
    float32x4_t c;
    /* fadd %0.4s, %1.4s, %2.4s */
    asm (".inst 0x4e20d400 + %X0 + (%X1<<5) + (%X2<<16)" : "=w"(c) : "w"(a), "w"(b));
    return c;
}
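The arithmetic behind that .inst line can be checked in plain C. The sketch below assumes the standard A64 layout of the vector FADD instruction, where Rd occupies bits 0-4, Rn bits 5-9 and Rm bits 16-20, and uses 0x4e20d400 (the encoding of fadd v0.4s, v0.4s, v0.4s) as the base; the function name encode_fadd_4s is only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Encoding of "fadd v0.4s, v0.4s, v0.4s": all register fields are zero. */
#define FADD_4S_BASE UINT32_C(0x4e20d400)

/* Hypothetical helper: returns the encoding of
 * "fadd v<rd>.4s, v<rn>.4s, v<rm>.4s", i.e. the value the %X-based
 * .inst template is meant to produce. */
static uint32_t encode_fadd_4s(unsigned rd, unsigned rn, unsigned rm)
{
    assert(rd < 32 && rn < 32 && rm < 32); /* SIMD registers v0..v31 */
    return FADD_4S_BASE
         | (uint32_t)rd            /* Rd: bits 0-4   */
         | ((uint32_t)rn << 5)     /* Rn: bits 5-9   */
         | ((uint32_t)rm << 16);   /* Rm: bits 16-20 */
}
```

Because the three register fields do not overlap, combining them with + or | gives the same result; for example, encode_fadd_4s(0, 1, 2) should yield 0x4e22d420, the encoding of fadd v0.4s, v1.4s, v2.4s.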
Is there a way to obtain the register number? If not, what other options exist for encoding instructions that neither the compiler nor the assembler knows about, without having to hard-code register numbers?
Solution
I've actually had the same problem and came up with the following solution.
#include <arm_neon.h>
#define REG_CONST(n) asm(".equ .L__reg_const__v" #n ", " #n);
REG_CONST(0)
REG_CONST(1)
REG_CONST(2)
REG_CONST(3)
// ... repeat this for all register numbers ...
REG_CONST(27)
REG_CONST(28)
REG_CONST(29)
REG_CONST(30)
REG_CONST(31)
float32x4_t add3(float32x4_t a, float32x4_t b) {
    float32x4_t c;
    // fadd %0.4s, %1.4s, %2.4s
    asm(".inst 0x4e20d400 | .L__reg_const__%0 | (.L__reg_const__%1 << 5) | (.L__reg_const__%2 << 16)" : "=w"(c) : "w"(a), "w"(b));
    return c;
}
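If writing out all 32 REG_CONST lines by hand feels tedious, the GNU assembler's .irp directive can repeat a line once per value in a list. A minimal sketch, assuming GNU as (it should also work with assemblers that implement .irp, such as LLVM's integrated assembler, but that is an assumption); it defines the same .L__reg_const__v0 to .L__reg_const__v31 symbols with a single top-level asm statement:

```c
/* Hypothetical alternative to the REG_CONST(0) ... REG_CONST(31) lines:
 * the assembler's .irp directive repeats the .equ line for every register
 * number, defining .L__reg_const__vN = N for N = 0..31. */
__asm__(
    ".irp reg, 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,"
    "16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31\n"
    /* "\\reg" reaches the assembler as "\reg", the .irp loop variable. */
    ".equ .L__reg_const__v\\reg, \\reg\n"
    ".endr\n");
```

This would replace only the REG_CONST lines; the asm(".inst ...") template in add3 stays the same, since it only relies on the symbol names.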
How does this work?
- Keep in mind that a placeholder like %1 is filled in with a register name by the compiler using simple string replacement, before the result is passed to the assembler.
- Inside assembly files, we can use the .equ directive to define symbols that represent integers. (Symbols that start with .L are local and will not be visible in the generated object file, so we don't needlessly clutter the symbol table.)
- Each invocation of the REG_CONST macro defines one such local symbol: .L__reg_const__v0 equal to 0, .L__reg_const__v1 equal to 1, .L__reg_const__v2 equal to 2, and so on.
- The macros are intentionally placed at the top of the file, outside any function, because the resulting asm(".equ .L__reg_const__v0, 0") statements are supposed to go at the top of the assembly file.
- In the asm(".inst ...") template inside the add3 function, %0, %1 and %2 are then replaced with whatever registers the compiler selected for c, a and b respectively.
- Since we sneakily wrote each placeholder directly after the .L__reg_const__ prefix, without any space, the replacement turns it into a name like .L__reg_const__v7.
- That corresponds exactly to the name of one of the integer symbols we defined at the top, so the assembler picks it up as a symbol and replaces it with the integer value we defined.
- After the symbols are evaluated, the result is a purely numeric expression, and the assembler happily ORs the integer values together, yielding the desired opcode.
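The two stages described in the list above, textual operand substitution by the compiler followed by symbol resolution in the assembler, can be mimicked in ordinary C. This is only a simulation for illustration (neither GCC nor GNU as works this way internally), and the helper names lookup_reg_const and simulate_fadd_inst are hypothetical:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the assembler's symbol table after REG_CONST(0..31):
 * ".L__reg_const__vN" is bound to N. Returns -1 for unknown names. */
static int lookup_reg_const(const char *symbol)
{
    char name[32];
    for (int n = 0; n < 32; n++) {
        snprintf(name, sizeof name, ".L__reg_const__v%d", n);
        if (strcmp(symbol, name) == 0)
            return n;
    }
    return -1;
}

/* Simulates ".inst 0x4e20d400 | .L__reg_const__%0 | (.L__reg_const__%1 << 5)
 * | (.L__reg_const__%2 << 16)" for the register names the compiler prints. */
static uint32_t simulate_fadd_inst(const char *rd, const char *rn, const char *rm)
{
    const char *regs[3] = { rd, rn, rm };
    const unsigned shifts[3] = { 0, 5, 16 };
    uint32_t insn = UINT32_C(0x4e20d400);
    for (int i = 0; i < 3; i++) {
        char symbol[32];
        /* Stage 1: the compiler pastes the register name after the prefix. */
        snprintf(symbol, sizeof symbol, ".L__reg_const__%s", regs[i]);
        /* Stage 2: the assembler resolves the symbol to its integer value. */
        int value = lookup_reg_const(symbol);
        assert(value >= 0);
        insn |= (uint32_t)value << shifts[i];
    }
    return insn;
}
```

Calling simulate_fadd_inst("v0", "v1", "v2") mirrors the compiler assigning v0 to c, v1 to a and v2 to b, and should give 0x4e22d420. A register name without a matching symbol (for example q1 instead of v1) is the case where the real approach would most likely be rejected by the assembler.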
Answered By - Martin Keßler Answer Checked By - Willingham (WPSolving Volunteer)