Issue
I am trying to introduce OpenMP into my C++ code to improve performance, using the simple test case shown below:
#include <omp.h>
#include <chrono>
#include <iostream>
#include <cmath>
using std::cout;
using std::endl;
#define NUM 100000
int main()
{
    double data[NUM] __attribute__ ((aligned (128)));
#ifdef _OPENMP
    auto t1 = omp_get_wtime();
#else
    auto t1 = std::chrono::steady_clock::now();
#endif
    for(long int k=0; k<100000; ++k)
    {
        #pragma omp parallel for schedule(static, 16) num_threads(4)
        for(long int i=0; i<NUM; ++i)
        {
            data[i] = cos(sin(i*i+ k*k));
        }
    }
#ifdef _OPENMP
    auto t2 = omp_get_wtime();
    auto duration = t2 - t1;
    cout<<"OpenMP Elapsed time (second): "<<duration<<endl;
#else
    auto t2 = std::chrono::steady_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(t2 - t1).count();
    cout<<"No OpenMP Elapsed time (second): "<<duration/1e6<<endl;
#endif
    double tempsum = 0.;
    for(long int i=0; i<NUM; ++i)
    {
        int nextind = (i == 0 ? 0 : i-1);
        tempsum += i + sin(data[i]) + cos(data[nextind]);
    }
    cout<<"Raw data sum: "<<tempsum<<endl;
    return 0;
}
The program repeatedly overwrites every element of a double array (NUM elements) in a tight loop, either in parallel or serially.
Build with
g++ -o test test.cpp
or
g++ -o test test.cpp -fopenmp
The program reported these results:
No OpenMP Elapsed time (second): 427.44
Raw data sum: 5.00009e+09
OpenMP Elapsed time (second): 113.017
Raw data sum: 5.00009e+09
Environment: 10th-generation Intel CPU, Ubuntu 18.04, GCC 7.5, OpenMP 4.5.
I suspect that false sharing of cache lines causes the poor performance of the OpenMP version.
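The false-sharing suspicion can be checked with a little arithmetic on the chunk size. The sketch below is illustrative and not part of the original post: the helper name chunks_are_line_aligned and the 64-byte cache-line size are assumptions (64 bytes is typical for current Intel CPUs, but the post does not state it), and the second static_assert assumes 8-byte doubles. It tests whether the chunk boundaries of schedule(static, chunk) coincide with cache-line boundaries, in which case two threads should not be writing to the same line.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines (not stated in the post).
constexpr std::size_t kCacheLineBytes = 64;

// Hypothetical helper: true when every chunk boundary of
// schedule(static, chunk_elems) falls on a cache-line boundary,
// given the element size and the base alignment of the array.
constexpr bool chunks_are_line_aligned(std::size_t chunk_elems,
                                       std::size_t elem_bytes,
                                       std::size_t base_align_bytes)
{
    return base_align_bytes % kCacheLineBytes == 0 &&
           (chunk_elems * elem_bytes) % kCacheLineBytes == 0;
}

// The setup in the post: chunks of 16 doubles, array aligned to 128 bytes.
static_assert(chunks_are_line_aligned(16, sizeof(double), 128),
              "chunk size in bytes should be a multiple of the line size");

// For comparison: chunks of 4 doubles (32 bytes with 8-byte doubles)
// would put two chunks, and so possibly two threads, on one line.
static_assert(!chunks_are_line_aligned(4, sizeof(double), 128),
              "a 32-byte chunk should not count as line aligned");
```

Under these assumptions, each chunk of 16 doubles covers two whole cache lines, so the schedule in the post would not by itself make threads share a line. That would point toward other costs, such as the overhead of opening a parallel region on every outer iteration.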
Update: I have updated the test results after increasing the loop size; the OpenMP version now runs faster, as expected.
Thank you!
Solution
- Since you're writing C++, use the C++ <random> generators with one engine object per thread. The legacy C rand() relies on hidden global state and is not guaranteed to be thread-safe.
- Also, you're not using your data array, so the compiler is actually at liberty to remove your loop completely.
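- You should touch all your data once before you start the timed loop. That way the pages are already instantiated, and the data is in cache or not (depending on its size) in the same way for every timed pass.
- Your loop is pretty short, so the cost of opening a parallel region on every outer iteration is significant compared with the work done inside it.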
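The points about warming up the data, keeping the result alive, and the short loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the answerer's own code: the function names timed_fill and checksum are hypothetical, the array is moved into a std::vector, and the default static schedule replaces schedule(static, 16). The parallel region is opened once around the outer loop, so the fork/join cost is paid once rather than once per value of k. The code also compiles without -fopenmp, in which case the pragmas are ignored and the loops run serially.

```cpp
#include <chrono>
#include <cmath>
#include <vector>

// Hypothetical helper: fills `data` `repeats` times and returns the
// wall-clock seconds spent in the timed part only.
double timed_fill(std::vector<double>& data, long repeats)
{
    const long n = static_cast<long>(data.size());

    // Warm-up: touch every element once, outside the timed region, so that
    // page instantiation is not charged to the first timed pass.
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < n; ++i)
        data[i] = 0.0;

    const auto t1 = std::chrono::steady_clock::now();

    // One parallel region for the whole benchmark. Every thread runs the
    // k loop (k is private because it is declared inside the region).
    #pragma omp parallel
    for (long k = 0; k < repeats; ++k)
    {
        // The i iterations are split among the threads; the default static
        // schedule gives each thread one contiguous block of the array.
        #pragma omp for schedule(static)
        for (long i = 0; i < n; ++i)
            data[i] = std::cos(std::sin(static_cast<double>(i * i + k * k)));
    }

    const auto t2 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t2 - t1).count();
}

// Consume the array so the compiler cannot discard the fill loop.
double checksum(const std::vector<double>& data)
{
    double sum = 0.0;
    for (double x : data)
        sum += x;
    return sum;
}
```

Calling timed_fill on a std::vector<double> with 100000 elements and repeats = 100000, then printing the returned time and checksum(data), would mirror the sizes used in the question. Whether this layout is actually faster on a given machine has to be measured.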
Answered By - Victor Eijkhout Answer Checked By - David Marino (WPSolving Volunteer)