Issue
I am working with a company to create a specialized set of routines that use SSE instructions. They are using the Qt Creator IDE. I have provided them with a C header file and a corresponding *.c file that compile in Xcode and Visual Studio, but they don't compile in Qt Creator.
How do I alter the appropriate compiler settings so that SSE instructions will compile in Qt Creator?
The line causing the first error is as follows:
_mm_store_ps(outData, _mm_add_ps(*l, *r));
The error statements read:
Inlining failed in call to always_inline ‘_mm_add_ps’: target specific option mismatch
Inlining failed in call to always_inline ‘_mm_store_ps’: target specific option mismatch
We're also getting this warning:
SSE vector return without SSE enabled changes the ABI [-Wpsabi]
Solution
Compilers like GCC and clang require that the relevant instruction-set extensions be enabled when compiling code that uses intrinsics and vector types, e.g. with -msse2. This also lets the compiler auto-vectorize with SSE2.
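As a minimal sketch of what that looks like for code similar to the line in the question: add4 and the file name sse_add.c are hypothetical, and the unaligned load/store variants are used so the example makes no assumption about pointer alignment. The __SSE__ check relies on the macro GCC and clang predefine when SSE is enabled; without that guard (and without -msse or -msse2), this is where the "target specific option mismatch" errors would appear. If the Qt Creator project is built with qmake (an assumption), the flag would typically be added in the .pro file through QMAKE_CFLAGS for C sources and QMAKE_CXXFLAGS for C++ sources.

```c
/* Build with GCC or clang, for example:
 *     gcc -O2 -msse2 -c sse_add.c
 * (sse_add.c is a hypothetical file name.) */
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE intrinsics: _mm_add_ps, _mm_loadu_ps, ... */
#endif

/* Add four floats element-wise: out[i] = l[i] + r[i] for i in 0..3. */
void add4(float *out, const float *l, const float *r)
{
#if defined(__SSE__)
    /* SSE is enabled for this translation unit, so the intrinsics can inline. */
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(l), _mm_loadu_ps(r)));
#else
    /* Scalar fallback when the compiler was not told it may use SSE. */
    for (size_t i = 0; i < 4; i++)
        out[i] = l[i] + r[i];
#endif
}
```

On x86-64 the SSE path is taken even without extra flags, because SSE and SSE2 are part of the baseline there.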
SSE2 is baseline for x86-64, so I guess you're building a 32-bit binary for some reason?
I think some compilers (maybe MSVC) will let you use intrinsics without enabling the compiler to automatically generate the instructions.
If you want to do runtime CPU dispatching, so you have some functions that use SSE4.1 or AVX, but need your program to work on computers without those: put your SSE4 and AVX functions in separate files, so you can build those compilation units with -msse4.1 and -mavx.
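A rough sketch of runtime dispatching, under some assumptions: to stay in a single self-contained file it uses the GCC/clang target function attribute instead of the separate-files approach described above, and it relies on the GCC/clang builtins __builtin_cpu_init and __builtin_cpu_supports. The names add_scalar, add_avx, add_fn and pick_add are made up for illustration. With separate files, add_avx would simply live in its own .c file compiled with -mavx and the attribute would not be needed.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_DISPATCH 1
#endif

/* Baseline version that runs on any CPU. */
static void add_scalar(float *out, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

#ifdef HAVE_X86_DISPATCH
/* AVX is enabled for this one function only, not for the whole file. */
__attribute__((target("avx")))
static void add_avx(float *out, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)          /* leftover elements */
        out[i] = a[i] + b[i];
}
#endif

typedef void (*add_fn)(float *, const float *, const float *, size_t);

/* Choose an implementation once, based on what the running CPU supports. */
add_fn pick_add(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return add_avx;
#endif
    return add_scalar;
}

/* Convenience wrapper: dispatch and call. */
void add_arrays(float *out, const float *a, const float *b, size_t n)
{
    assert(out != NULL && a != NULL && b != NULL);
    pick_add()(out, a, b, n);
}
```

The important property is that no AVX instruction can execute unless the CPU check passed, which is why the AVX code must not be inlined into, or compiled together with, code that runs unconditionally.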
It's usually a good idea to enable -march=nehalem or -march=haswell to also enable stuff like -mpopcnt (if that's what you want), and more importantly to set -mtune=haswell to optimize for a likely target CPU as well as using extensions it supports.
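One way to see what a given -m or -march option actually enabled is to look at the feature macros GCC and clang predefine (__SSE2__, __SSE4_1__, __POPCNT__, __AVX2__). The sketch below is only an illustration; the enum constants and the function name enabled_isa_mask are hypothetical.

```c
#include <stdio.h>

/* Hypothetical bit flags for a few ISA extensions. */
enum {
    ISA_SSE2   = 1,
    ISA_SSE41  = 2,
    ISA_POPCNT = 4,
    ISA_AVX2   = 8
};

/* Report which extensions the compiler was allowed to use for this file.
 * This reflects compile-time options, not the CPU the program runs on. */
int enabled_isa_mask(void)
{
    int mask = 0;
#ifdef __SSE2__
    mask |= ISA_SSE2;
#endif
#ifdef __SSE4_1__
    mask |= ISA_SSE41;
#endif
#ifdef __POPCNT__
    mask |= ISA_POPCNT;
#endif
#ifdef __AVX2__
    mask |= ISA_AVX2;
#endif
    return mask;
}

void print_enabled_isa(void)
{
    int m = enabled_isa_mask();
    printf("SSE2:%d SSE4.1:%d POPCNT:%d AVX2:%d\n",
           !!(m & ISA_SSE2), !!(m & ISA_SSE41),
           !!(m & ISA_POPCNT), !!(m & ISA_AVX2));
}
```

Building this with -march=haswell should report all four as enabled, while -march=nehalem should report everything except AVX2.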
Related questions with more detailed answers:
- What exactly do the gcc compiler switches (-mavx -mavx2 -mavx512f) do?
- The Effect of Architecture When Using SSE / AVX Intrinisics
- How to enable instrinsic functions from the preprocessor
- Why doesn't gcc resolve _mm256_loadu_pd as single vmovupd? - why you should use -march=haswell instead of just -mavx2, especially with older GCC. These days -march=x86-64-v3 is good for an AVX2+FMA+BMI2 baseline.
- inlining failed in call to always_inline ‘_mm_mullo_epi32’: target specific option mismatch - a script to identify which ISA extension is necessary for an intrinsic.
Answered By - Peter Cordes Answer Checked By - David Goodson (WPSolving Volunteer)