Issue
This code:
const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
const char myTable[] = { 1, 2, 3, 4 };
int keepPadding() {
return (int)(&padding);
}
int foo() {
return (int)(&myTable); // <-- this is the part I'm looking at
}
compiles to the following assembly for the Thumb instruction set (abbreviated for clarity). Note particularly the adds as the second instruction of foo:
...
foo:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r0, .L5
@ sp needed
adds r0, r0, #10
bx lr
.L6:
.align 2
.L5:
.word .LANCHOR0
.size foo, .-foo
.align 1
.global bar
.syntax unified
.code 16
.thumb_func
.type bar, %function
...
myTable:
.ascii "\001\002\003\004"
It looks like it's loading a pointer (ldr) to the top of .rodata and then programmatically offsetting to the location of myTable (adds). But why not just load the address of the table itself directly?
Note: when I remove the const, it seems to do it without the adds instruction (with myTable in .data).
The context of the question is that I'm trying to hand-optimize some C firmware and noticed this adds instruction that seems to be superfluous, so I'm wondering if there's a way to restructure my code to get rid of it.
Note: this is all compiled for the ARM thumb instruction set as follows (using arm-none-eabi-gcc version 11.2.1):
arm-none-eabi-gcc -Os -c -mcpu=cortex-m0 -mthumb temp.c -S
Also note: the example code here is intended to represent a snippet of a larger codebase. If myTable were the only thing compiled, it would land at offset 0 in .rodata and the adds instruction would disappear, but that is not the typical case in a real-world scenario. To represent the typical real-world scenario that produces this assembly, I added padding before the table.
The behavior is also reproduced on Godbolt.
Solution
The question originally contained just this:
const char myTable[] = { 1, 2, 3, 4 };
int foo() {
return (int)(&myTable);
}
arm-none-eabi-gcc -Os -c -mthumb so.c -o so.o
arm-none-eabi-objdump -D so.o
but it did not produce the adds:
Disassembly of section .text:
00000000 <foo>:
0: 4800 ldr r0, [pc, #0] ; (4 <foo+0x4>)
2: 4770 bx lr
4: 00000000 andeq r0, r0, r0
Disassembly of section .rodata:
00000000 <myTable>:
0: 04030201 streq r0, [r3], #-513 ; 0xfffffdff
The question has since been edited to show a repeatable example, and this answer has been edited as a result. I will still work toward the same solution step by step, since it may be of interest that getting to the anchor took a few components to keep the problem from being optimized out.
So from your question and this:
const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
const char myTable[] = { 1, 2, 3, 4 };
int foo() {
return (int)(&myTable);
}
It is obvious why myTable is at an offset of 10.
But padding is optimized out so you still end up with the same result.
So:
const char padding[] = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, };
const char myTable[] = { 1, 2, 3, 4 };
int keepPadding() {
return (int)(&padding);
}
int foo() {
return (int)(&myTable);
}
The name of that function implies you already know all of this and know what it took to make a minimal example.
arm-none-eabi-gcc -Os -c -mthumb so.c -S
foo:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r0, .L5
@ sp needed
adds r0, r0, #10
bx lr
.L6:
.align 2
.L5:
.word .LANCHOR0
.size foo, .-foo
.global myTable
.global padding
.section .rodata
.set .LANCHOR0,. + 0
.type padding, %object
.size padding, 10
padding:
.space 10
.type myTable, %object
.size myTable, 4
myTable:
.ascii "\001\002\003\004"
.ident "GCC: (GNU) 11.2.0"
It is generating an anchor then referencing from the anchor rather than directly to the label.
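To make the anchor-plus-offset arithmetic concrete, here is a minimal sketch in plain C. It assumes the two arrays sit back to back in one block, as in the listing above, and models that block with a struct. The names rodata_model, table_via_anchor and table_direct are hypothetical and exist only for illustration; this is not what GCC does internally, just the same address computation written out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the .rodata block: padding first, then myTable. */
struct rodata_model {
    char padding[10];
    char myTable[4];
};

static const struct rodata_model rodata = { { 0 }, { 1, 2, 3, 4 } };

/* char members have alignment 1, so myTable starts right after padding. */
static_assert(offsetof(struct rodata_model, myTable) == 10,
              "myTable is expected 10 bytes past the start of the block");

/* Mirrors the two-instruction sequence from the listing. */
static uintptr_t table_via_anchor(void)
{
    uintptr_t anchor = (uintptr_t)&rodata;                  /* ldr  r0, .L5 (anchor address) */
    return anchor + offsetof(struct rodata_model, myTable); /* adds r0, r0, #10 */
}

/* Mirrors the single load of the table's own address. */
static uintptr_t table_direct(void)
{
    return (uintptr_t)rodata.myTable;                       /* ldr  r0, =myTable */
}
```

Both functions return the same address; the only difference is whether the offset of 10 is applied at run time (the adds) or is already part of the word that gets loaded.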
I suspect it is to allow for an optimization of the ldr. Let's try:
arm-none-eabi-gcc -Os -c -mthumb -mcpu=cortex-m4 so.c -S
foo:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r0, .L5
bx lr
.L6:
.align 2
.L5:
.word .LANCHOR0+10
.size foo, .-foo
00000008 <foo>:
8: 4800 ldr r0, [pc, #0] ; (c <foo+0x4>)
a: 4770 bx lr
c: 0000000a .word 0x0000000a
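Yeah, so that fixed it: the offset is now folded into the literal pool word (.LANCHOR0+10) instead of being added at run time. But what about linking it?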
Disassembly of section .rodata:
00000000 <padding>:
...
0000000a <myTable>:
a: 04030201 streq r0, [r3], #-513 ; 0xfffffdff
Disassembly of section .text:
00000010 <keepPadding>:
10: 4800 ldr r0, [pc, #0] ; (14 <keepPadding+0x4>)
12: 4770 bx lr
14: 00000000 andeq r0, r0, r0
00000018 <foo>:
18: 4801 ldr r0, [pc, #4] ; (20 <foo+0x8>)
1a: 300a adds r0, #10
1c: 4770 bx lr
1e: 46c0 nop ; (mov r8, r8)
20: 00000000 andeq r0, r0, r0
Nope. I was hoping that the linker would replace the pc-relative load and turn that into a mov r0,#0... Saving the load is (or might be) an optimization for systems that are not Cortex-M (or even for Cortex-M).
Note: this also works
arm-none-eabi-gcc -Os -c -mthumb -fno-section-anchors so.c -o so.o
00000008 <foo>:
8: 4800 ldr r0, [pc, #0] ; (c <foo+0x4>)
a: 4770 bx lr
c: 00000000 andeq r0, r0, r0
foo:
@ Function supports interworking.
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
ldr r0, .L5
@ sp needed
bx lr
.L6:
.align 2
.L5:
.word myTable
.size foo, .-foo
.global myTable
.section .rodata
.type myTable, %object
.size myTable, 4
myTable:
.ascii "\001\002\003\004"
.global padding
.type padding, %object
.size padding, 10
The anchor was not used so the address of myTable was used directly.
From my perspective the "why" is that an anchor was used, and the padding in front placed myTable at an offset from that anchor. So the ldr loads the anchor address, and the adds then gets you from the anchor to the table.
Why the anchor? Exercise for the reader, or someone else.
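As a partial hint: GCC's documentation describes -fsection-anchors as a way to reduce the number of symbolic address calculations by addressing nearby objects from one shared anchor. The sketch below is a hypothetical function (pick is not part of the original code) that touches both arrays, which is the kind of case where a shared anchor could save a literal pool entry. Whether it actually helps depends on the target and compiler version, so compare the -S output with and without -fno-section-anchors rather than taking this as given.

```c
#include <assert.h>

const char padding[10] = { 0 };
const char myTable[] = { 1, 2, 3, 4 };

static_assert(sizeof myTable == 4, "myTable is expected to hold four entries");

/* Hypothetical function that needs the address of both arrays.
 * With section anchors, both can be reached from a single anchor
 * address plus small offsets; without them, each array's address
 * is a separate symbolic reference. */
int pick(unsigned i)
{
    return padding[i % 10] + myTable[i % 4];
}
```

In foo from the question only one object is referenced, so there is nothing to share and the anchor costs an extra adds on Cortex-M0.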
Answered By - old_timer