Issue
I'm writing an RPC library for AVR and need to pass a function address to some inline assembler code and call the function from within the assembler code. However the assembler complains when I try to call the function directly.
This minimal example test.cpp illustrates the issue (in the actual case I'm passing args and the function is an instantiation of a static member of templated class):
void bar () {
    return;
}
void foo() {
    asm volatile (
        "call %0" "\n"
        :
        : "p" (bar)
    );
}
Compiling with avr-gcc -S test.cpp -o test.S -mmcu=atmega328p
works fine but when I try to assemble with avr-gcc -c test.S -o test.o -mmcu=atmega328p
avr-as complains:
test.c: Assembler messages:
test.c:38: Error: garbage at end of line
I have no idea why it writes "test.c", the file it is referring to is test.S, which contains this on line 38:
call gs(_Z3barv)
I have tried every even remotely sensible constraint on the parameter to the inline assembler that I could find here, but none of those I tried worked.
I imagine if the gs() part was removed, everything should work, but all constraints seem to add it. I have no idea what it does.
The odd thing is that doing an indirect call like this assembles just fine:
void bar () {
    return;
}
void foo() {
    asm volatile (
        "ldi r30, lo8(%0)" "\n"
        "ldi r31, hi8(%0)" "\n"
        "icall" "\n"
        :
        : "p" (bar)
    );
}
The assembler produced looks like this:
ldi r30, lo8(gs(_Z3barv))
ldi r31, hi8(gs(_Z3barv))
icall
And avr-as doesn't complain about any garbage.
Solution
There are several issues with the code:
Issue 1: Wrong Constraint
The correct constraint for a call target is "i": an immediate operand whose value is known at link time.
Issue 2: Wrong % print-modifier
To print an address suitable for a call, use %x, which prints a plain symbol without gs(). Generating a linker stub at this place by means of gs() is not valid syntax, hence "garbage at end of line". Apart from that, as you are calling bar directly, there is no need for a linker stub (at least not for this kind of symbol usage).
Issue 3: call instruction might not be available
To abstract over whether a device supports call or just rcall, there is %~, which prints a single r if only rcall is available, and nothing if call is available.
Issue 4: The Call might clobber Registers or have other Side-Effects
It's unlikely that the call has no effect on registers or on memory whatsoever. If your description of the inline asm does not match the side effects of the code, it's likely that you will get wrong code sooner or later.
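Applied to the minimal example from the question, the four points above could look roughly like the following sketch. It assumes that bar follows the normal avr-gcc calling convention, so the clobber list names the call-clobbered registers R18 to R27, R30 and R31 plus "memory". The counter bar_calls and the non-AVR fallback branch are not part of the original code; they are only there so that the snippet also builds and runs on a host compiler.

```cpp
// Hypothetical counter so that the effect of the call is observable.
volatile int bar_calls = 0;

void bar() {
    bar_calls = bar_calls + 1;
}

void foo() {
#if defined(__AVR__)
    // "i":  bar is an immediate that is known at link time.
    // %x0:  print the plain symbol, without gs().
    // %~:   expands to "r" (giving rcall) on devices without call.
    // bar is an ordinary function, so assume that it may clobber all
    // call-clobbered registers and may write to memory.
    __asm__ volatile (
        "%~call %x0"
        :
        : "i" (bar)
        : "r18", "r19", "r20", "r21", "r22", "r23", "r24", "r25",
          "r26", "r27", "r30", "r31", "memory"
    );
#else
    // Host fallback (assumption, for testing only): a plain call.
    bar();
#endif
}
```

If the callee is hand-written assembly with a smaller, known set of side effects, the clobber list can be narrowed accordingly.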
Taking it all together
Let's assume you have a function bar written in assembly that takes two 16-bit operands in R22 and R26, and computes a result in R22. This function does not obey the avr-gcc C/C++ calling convention, so inline assembly is one way to interface to such a function. We cannot write a correct prototype for bar anyway, so we just provide a prototype so that we can use the symbol bar. Register X has constraint "x", but R22 has no register constraint of its own, and therefore we have to use a local asm register:
extern "C" void bar (...);
int call_bar (int x, int y)
{
    register int r22 __asm ("r22") = x;
    __asm ("%~call %x2"
           : "+r" (r22)
           : "x" (y), "i" (bar));
    return r22;
}
Generated code for ATmega32 + optimization:
_Z8call_barii:
movw r26,r22
movw r22,r24
call bar
movw r24,r22
ret
So what's that "generate stub" gs() thing?
Suppose the C/C++ code is taking the address of a function. The only sensible thing to do with it is to call that function, which will be an indirect call in general. Now an indirect call can target 64KiW = 128KiB at most, so on devices with > 128KiB of code memory, special measures must be taken to indirectly call a function beyond the 128KiB boundary. The AVR hardware features an SFR named EIND for that purpose, but the problems with using it are obvious: you'd have to set it prior to a call and then reset it somehow, somewhere; all kinds of evil things would be necessary.
avr-gcc takes a different approach: For each such address taken, the compiler generates gs(func). This will just resolve to func if the address is in the 128KiB range. If not, gs() resolves to an address in section .trampolines, which is located close to the beginning of flash, i.e. in the lower 128KiB. .trampolines contains a list of direct JMPs to targets beyond the 128KiB range.
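The resolution rule just described can be modelled on a host machine. The following is only a simplified sketch, not the linker's actual implementation: the names needs_stub and gs_word_address are made up for illustration, and the stub location is passed in as a parameter instead of being allocated in .trampolines.

```cpp
#include <cstdint>

// Indirect calls use the 16-bit Z register as a word address,
// so they reach 64 KiW = 128 KiB of flash.
constexpr std::uint32_t kIndirectReachBytes = 0x20000;

// True if a function at this byte address lies beyond the 128 KiB boundary.
constexpr bool needs_stub(std::uint32_t func_byte_addr) {
    return func_byte_addr >= kIndirectReachBytes;
}

// Simplified model of the 16-bit word address that gs(func) yields.
// stub_byte_addr is the (hypothetical) location of the JMP in .trampolines.
constexpr std::uint16_t gs_word_address(std::uint32_t func_byte_addr,
                                        std::uint32_t stub_byte_addr) {
    const std::uint32_t byte_addr =
        needs_stub(func_byte_addr) ? stub_byte_addr : func_byte_addr;
    return static_cast<std::uint16_t>(byte_addr / 2);
}
```

With a target at byte address 0x24680 and a stub at byte address 0xe4, gs_word_address returns 0x0072. A target at 0x104 needs no stub and yields its own word address, 0x0082.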
Take for example the following C code:
extern int far_func (void);
int main (void)
{
    int (*pfunc)(void) = far_func;
    __asm ("" : "+r" (pfunc)); /* Forget content of pfunc. */
    return pfunc();
}
The __asm is used to keep the compiler from optimizing the indirect call to a direct one. Then run
> avr-gcc main.c -o main.elf -mmcu=atmega2560 -save-temps -Os -Wl,--defsym,far_func=0x24680
> avr-objdump -d main.elf > main.lst
For brevity, we just define the symbol far_func on the command line.
The assembly dump in main.s shows that far_func might require a linker stub:
main:
ldi r30,lo8(gs(far_func))
ldi r31,hi8(gs(far_func))
eijmp
The final executable listing in main.lst then shows that the stub is actually generated and used:
main.elf: file format elf32-avr
Disassembly of section .text:
...
000000e4 <__trampolines_start>:
e4: 0d 94 40 23 jmp 0x24680 ; 0x24680 <far_func>
...
00000104 <main>:
104: e2 e7 ldi r30, 0x72 ; 114
106: f0 e0 ldi r31, 0x00 ; 0
108: 19 94 eijmp
main loads Z=0x0072 which is a word address for byte address 0x00e4, i.e. the code is indirectly jumping to 0x00e4, and from there it jumps directly to 0x24680.
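The arithmetic behind this reading of the listing can be checked on a host with a small sketch. The helper names below are made up for illustration. The opcode bytes 0d 94 40 23 from the listing are stored little-endian, so they correspond to the two 16-bit words 0x940d and 0x2340.

```cpp
#include <cstdint>

// AVR code addresses in Z are word addresses; one word is two bytes.
constexpr std::uint32_t word_to_byte(std::uint32_t word_addr) {
    return word_addr * 2;
}

// JMP k is encoded as 1001 010k kkkk 110k  kkkk kkkk kkkk kkkk.
constexpr bool is_jmp(std::uint16_t w0) {
    return (w0 & 0xFE0E) == 0x940C;
}

// Byte address targeted by a JMP, given its two 16-bit opcode words.
constexpr std::uint32_t jmp_target_byte(std::uint16_t w0, std::uint16_t w1) {
    const std::uint32_t k =
        (static_cast<std::uint32_t>((w0 >> 4) & 0x1F) << 17)  // k21..k17
        | (static_cast<std::uint32_t>(w0 & 1) << 16)           // k16
        | w1;                                                  // k15..k0
    return word_to_byte(k);
}
```

Here word_to_byte(0x0072) gives 0xe4, the address of the trampoline entry, and jmp_target_byte(0x940d, 0x2340) gives 0x24680, the address of far_func.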
Answered By - emacs drives me nuts Answer Checked By - Pedro (WPSolving Volunteer)