Issue
In my program I have a function that does a simple vector addition, c[0:15] = a[0:15] + b[0:15]. The function prototype is:
void vecadd(float * restrict a, float * restrict b, float * restrict c);
Our 32-bit embedded architecture can load and store double words with a single instruction, for example:
r16 = 0x4000 ;
strd r0,[r16] ; stores r0 in [0x4000] and r1 in [0x4004]
The GCC optimizer recognizes the vector nature of the loop and generates two branches of code: one for the case where all three arrays are double-word aligned (using the double-word load/store instructions), and one for the case where the arrays are only word aligned (using the single-word load/store instructions).
The problem is that the address alignment check is costly relative to the addition itself, and I want to eliminate it by hinting to the compiler that a, b and c are always 8-byte aligned. Is there a modifier I can add to the pointer declarations to tell the compiler this?
The arrays used for calling this function have the aligned(8) attribute, but this is not reflected in the function code itself. Is it possible to add this attribute to the function parameters?
Solution
Following a piece of example code I found on my system, I tried the following solution, which incorporates ideas from a few of the earlier answers. Basically, create a union of a small array of floats with a 64-bit type (in this case a SIMD vector of floats), and call the function with the operand float arrays cast to that type:
typedef float f2 __attribute__((vector_size(8)));
typedef union { f2 v; float f[2]; } simdfu;
void vecadd(f2 * restrict a, f2 * restrict b, f2 * restrict c);
float a[16] __attribute__((aligned(8)));
float b[16] __attribute__((aligned(8)));
float c[16] __attribute__((aligned(8)));
int main()
{
    vecadd((f2 *) a, (f2 *) b, (f2 *) c);
    return 0;
}
Now the compiler does not generate the 4-byte-aligned (word-aligned) branch.
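The snippet above shows only the prototype and the call, not the body of vecadd. As a rough sketch of what such a body could look like, assuming it simply adds eight two-float vectors using GCC's vector extensions: the vec16 union and the vecadd_demo helper below are hypothetical names added for illustration, and keeping the storage in a union (rather than casting float arrays) is a variation on the answer, not what was actually run.

```c
#include <assert.h>   /* static_assert (C11) */

/* 8-byte vector of two floats (GCC vector extension), as in the answer. */
typedef float f2 __attribute__((vector_size(8)));

/* The whole point of f2 is that one element is one double word. */
static_assert(sizeof(f2) == 8, "f2 must be exactly one double word");

/* Hypothetical storage type: 16 floats viewed as 8 double-word vectors.
 * The union inherits the 8-byte alignment of f2. */
typedef union { f2 v[8]; float f[16]; } vec16;

/* Assumed body: the original post does not show it. */
void vecadd(f2 * restrict a, f2 * restrict b, f2 * restrict c)
{
    for (int i = 0; i < 8; i++)
        c[i] = a[i] + b[i];   /* element-wise add of two floats at a time */
}

/* Hypothetical helper: fills the inputs, runs vecadd, sums the result. */
float vecadd_demo(void)
{
    static vec16 a, b, c;

    for (int i = 0; i < 16; i++) {
        a.f[i] = (float)i;
        b.f[i] = 1.0f;
    }

    vecadd(a.v, b.v, c.v);

    float sum = 0.0f;
    for (int i = 0; i < 16; i++)
        sum += c.f[i];
    return sum;
}
```

Because the pointee type is itself 8 bytes wide and 8-byte aligned, the compiler has no word-aligned case to consider. Whether it then emits the double-word load/store instructions depends on the target back end.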
However, __builtin_assume_aligned() would be the preferable solution, avoiding the cast and its possible side effects, if only it worked...
EDIT: I noticed that the builtin function is actually buggy on our implementation (i.e., not only does it not work, it also causes calculation errors later in the code).
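For comparison, on a GCC build where the builtin behaves as documented, its intended use would look roughly like the sketch below. This is not something that worked on the toolchain described here. The names vecadd_hinted and vecadd_hinted_demo are hypothetical, and the assert is an added debug-only sanity check rather than part of the original code.

```c
#include <assert.h>
#include <stdint.h>

/* Same signature as the question's vecadd, with alignment hints inside. */
void vecadd_hinted(float * restrict a, float * restrict b, float * restrict c)
{
    /* Debug-only sanity check; disappears when built with -DNDEBUG. */
    assert((((uintptr_t)a | (uintptr_t)b | (uintptr_t)c) & 7u) == 0);

    /* The builtin returns its argument; the compiler may assume the
     * returned pointer is aligned to at least 8 bytes. */
    float *pa = __builtin_assume_aligned(a, 8);
    float *pb = __builtin_assume_aligned(b, 8);
    float *pc = __builtin_assume_aligned(c, 8);

    for (int i = 0; i < 16; i++)
        pc[i] = pa[i] + pb[i];
}

/* Hypothetical helper: fills the inputs, runs the add, sums the result. */
float vecadd_hinted_demo(void)
{
    static float a[16] __attribute__((aligned(8)));
    static float b[16] __attribute__((aligned(8)));
    static float c[16] __attribute__((aligned(8)));

    for (int i = 0; i < 16; i++) {
        a[i] = (float)i;
        b[i] = 1.0f;
    }

    vecadd_hinted(a, b, c);

    float sum = 0.0f;
    for (int i = 0; i < 16; i++)
        sum += c[i];
    return sum;
}
```

The hint applies to the pointer returned by the builtin, so the loop has to use pa, pb and pc rather than the original parameters. Passing a pointer that is not actually 8-byte aligned makes the behaviour undefined.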
Answered By - ysap