Issue
Short question: Are there compiler options or function attributes in g++ that force the compiler to pass the members of a struct in registers instead of on the stack?
Long question: In my application I have a list of function handles that I am basically calling in a loop. Since every function does only a small amount of work, the function call overhead needs to be minimized.
I now want to pass the arguments in a struct. The advantage is that a change to the arguments only needs to be made in one place instead of in some 20 places across the code base. Another advantage is that some arguments depend on template parameters, which add or remove arguments; a struct would handle that cleanly.
The problem is that if the struct has more than two members, g++ pushes the struct onto the stack instead of passing the members in registers. This reduces performance by 50%. Here is a small example that demonstrates the problem:
#include <cstddef>   // size_t
#include <cstdint>   // uint8_t
#include <iostream>

// 24 bytes on x86-64: too large for two registers.
struct A {
    uint8_t n;
    size_t& __restrict__ dataPos;
    char* const __restrict__ data;
};

// 16 bytes on x86-64: fits in two registers.
struct B {
    size_t& __restrict__ dataPos;
    char* const __restrict__ data;
};

__attribute__((noinline)) void funcStructA(A a) {
    std::cout << "out struct A: n: " << a.n << " dataPos: " << a.dataPos << " data: " << a.data << std::endl;
}

__attribute__((noinline)) void funcStructB(uint8_t n, B b) {
    std::cout << "out struct B: n: " << n << " dataPos: " << b.dataPos << " data: " << b.data << std::endl;
}

__attribute__((noinline)) void funcDirect(uint8_t n, size_t& __restrict__ dataPos, char* const __restrict__ data) {
    std::cout << "out direct: n: " << n << " dataPos: " << dataPos << " data: " << data << std::endl;
}

int main(int nargs, char** args) {
    char data[1000];
    size_t pos = 100;
    funcStructA(A{10, pos, data});
    funcStructB(10, B{pos, data});
    funcDirect(10, pos, data);
    return 0;
}
The assembly code (g++ -std=c++14 -O3, version 11.2.1 20220127 (Red Hat 11.2.1-9)) in main is:
401119: push QWORD PTR [rsp+0x10]
40111d: push QWORD PTR [rsp+0x10]
401121: push QWORD PTR [rsp+0x38]
401125: call 401280 <funcStructA(A)>
40112a: add rsp,0x20
40112e: mov rsi,rbp
401131: mov rdx,r12
401134: mov edi,0xa
401139: call 4013a0 <funcStructB(unsigned char, B)>
40113e: mov rdx,r12
401141: mov rsi,rbp
401144: mov edi,0xa
401149: call 4014c0 <funcDirect(unsigned char, unsigned long&, char*)>
In funcStructA the struct is pushed onto the stack; in funcStructB the members are passed in registers.
I tried moving n around in the struct and passing it by reference, but the behavior is always the same.
I read through the function attributes documented for GCC (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes, https://gcc.gnu.org/onlinedocs/gcc/x86-Function-Attributes.html#x86-Function-Attributes) but could not find one that addresses my problem. I tried cdecl, fastcall, and ms_abi, but they made little difference.
Passing the structure by reference causes the same problems.
clang++ seems to have the same problem. I will run a test in the next few days.
Any help would be appreciated.
Solution
You could pass the uint8_t or one of the pointers as a separate arg to describe what you want to the compiler, or stuff it into one of the existing 64-bit members (see below).
Unfortunately no, there aren't compiler options that tweak the C ABI / calling-convention rules to pass structs larger than 16 bytes in registers on x86-64 or other ISAs. The x86-64 System V ABI doesn't do that, and there isn't another calling convention GCC knows about which does. The Windows x64 ABI only passes up to 8-byte objects in registers, not even 16.
Also, you can't override the C++ ABI rule that non-trivially-copyable objects (or whatever the exact criterion is) are passed in memory so they always have an address. (e.g. by value on the stack in x86-64 System V.)
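As a rough illustration of that rule, the sketch below uses std::is_trivially_copyable as a stand-in for the ABI's criterion. That trait is only a proxy (the exact ABI wording is not quoted here), and the two struct names are hypothetical, chosen just for this example.

```cpp
#include <type_traits>

// Hypothetical 16-byte aggregate: trivially copyable, so on x86-64 System V
// it is a candidate for being passed in two registers.
struct PlainPair {
    long a;
    long b;
};

// Same layout, but the user-provided destructor makes it non-trivial, so the
// C++ ABI passes it in memory and the object always has an address.
struct PairWithDtor {
    long a;
    long b;
    ~PairWithDtor() {}
};

static_assert(std::is_trivially_copyable<PlainPair>::value,
              "plain aggregate is trivially copyable");
static_assert(!std::is_trivially_copyable<PairWithDtor>::value,
              "a user-provided destructor makes the type non-trivial");
```

Comparing the call sites of two noinline functions taking these types by value (for example with g++ -O3 -S) should show the difference, though the exact instructions depend on the compiler version.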
The only options I know of that modify the calling convention are -mabi=ms or whatever to select an existing calling convention GCC knows about, ones that affect whether certain registers are call-preserved or call-clobbered, like -fcall-used-reg (GCC manual), and some ABI-affecting options like -fpack-struct[=n] that aren't specifically about the calling convention. (And no, -fpack-struct wouldn't help: bringing sizeof(A) down from 24 to 17 doesn't let it fit in 2 regs.)
In theory, with -fwhole-program or maybe -flto, GCC could invent custom calling conventions, but AFAIK it doesn't. As inter-procedural optimization (IPO) other than inlining, it can take advantage of the fact that another function doesn't clobber certain registers, but it doesn't change how args are passed.
The normal way to handle calling-convention overhead is to make sure small functions inline (e.g. by compiling with -flto to allow cross-file inlining), but this doesn't work if you're taking function pointers or using virtual functions.
It's not the number of members, it's the total size, so the x32 ABI (with 32-bit pointers/references and size_t) would be able to pass / return that struct packed into two registers: g++ -O3 -mx32.
(x86-64 SysV packs aggregates into up-to-2 registers using the same layout it would in memory, so smaller members means more members fit in 16 bytes.)
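The size argument can be checked at compile time. The sketch below assumes an LP64 target such as x86-64 Linux and repeats the question's A and B without the __restrict__ qualifiers. Compact and withinTwoEightbytes are hypothetical names introduced only for this example, and the 16-byte test is a necessary size condition, not the ABI's full classification algorithm.

```cpp
#include <cstddef>
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes an LP64 target such as x86-64 Linux");

// The question's structs, minus __restrict__ (which does not affect layout).
struct A {
    std::uint8_t n;        // 1 byte, followed by 7 bytes of padding
    std::size_t& dataPos;  // stored like a pointer: 8 bytes
    char* const data;      // 8 bytes
};

struct B {
    std::size_t& dataPos;
    char* const data;
};

// Hypothetical variant: a 32-bit position held by value instead of a
// reference, so n and dataPos share the first 8 bytes.
struct Compact {
    std::uint8_t n;
    std::uint32_t dataPos;
    char* data;
};

// Size condition only: an aggregate larger than 16 bytes cannot be passed
// in two general-purpose registers under x86-64 System V.
template <class T>
constexpr bool withinTwoEightbytes() {
    return sizeof(T) <= 16;
}

static_assert(sizeof(A) == 24, "three 8-byte slots");
static_assert(sizeof(B) == 16, "two 8-byte slots");
static_assert(sizeof(Compact) == 16, "n and dataPos packed into one slot");
```

With these definitions only A exceeds the 16-byte limit, which matches the assembly in the question: funcStructA gets its argument on the stack while funcStructB receives B in rsi and rdx.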
Or if you can settle for having a 32-bit size by value, or 48-bit size, you could pack the uint8_t into the upper byte of a uint64_t, or even use bitfield members. But since you have a level of indirection (a reference member) for size_t& __restrict__ dataPos;, that member is basically another pointer; using uint32_t& there wouldn't help since a pointer is still 64 bits. I assume you need that to be a reference for some reason.
You could pack your uint8_t into the upper byte of a pointer. Upcoming HW will have an option to optimize this, ignoring high bits instead of enforcing correct sign-extension from 48-bit or 57-bit. Otherwise you just manually do that with shifts and & on a uintptr_t; see "Using the extra 16 bits in 64-bit pointers".
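A minimal sketch of the manual shift-and-mask approach follows. It assumes 64-bit pointers and user-space addresses whose top byte is zero, as is typical on x86-64 Linux. PackedArgs, packArgs, argN and argData are hypothetical names, and the reference member is replaced by a plain pointer to keep the struct a simple 16-byte aggregate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes 64-bit pointers");

// Hypothetical 16-byte replacement for struct A: n rides in the top byte of
// the data pointer, and the reference member becomes a plain pointer.
struct PackedArgs {
    std::size_t* dataPos;
    std::uintptr_t taggedData;  // bits 0..55: char* data, bits 56..63: n
};
static_assert(sizeof(PackedArgs) == 16, "two 8-byte members, no padding");

constexpr unsigned kTagShift = 56;
constexpr std::uintptr_t kPtrMask = (std::uintptr_t{1} << kTagShift) - 1;

inline PackedArgs packArgs(std::uint8_t n, std::size_t& dataPos, char* data) {
    const auto bits = reinterpret_cast<std::uintptr_t>(data);
    // Assumption: a user-space address whose top byte is zero.
    assert((bits & ~kPtrMask) == 0);
    return PackedArgs{&dataPos, bits | (static_cast<std::uintptr_t>(n) << kTagShift)};
}

inline std::uint8_t argN(PackedArgs a) {
    // The tag occupies the top byte, so a plain shift extracts it.
    return static_cast<std::uint8_t>(a.taggedData >> kTagShift);
}

inline char* argData(PackedArgs a) {
    // Clear the tag byte before using the value as a pointer again.
    return reinterpret_cast<char*>(a.taggedData & kPtrMask);
}
```

The callee pays one shift to read n and one AND to recover the pointer, which is usually cheaper than a store and reload through the stack, but that is worth measuring in the real loop. Dereferencing taggedData without masking would fault on current hardware, so every access has to go through argData.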
Answered By - Peter Cordes Answer Checked By - Robin (WPSolving Admin)