c++ - Why can GCC not vectorize this function and loop?
Problem description
I'm attempting to make a function SIMD-enabled and vectorize the loop with a function call.
#include <cmath>

#pragma omp declare simd
double BlackBoxFunction(const double x) {
    return 1.0/sqrt(x);
}

double ComputeIntegral(const int n, const double a, const double b) {
    const double dx = (b - a)/n;
    double I = 0.0;
#pragma omp simd reduction(+: I)
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx*(double(i) + 0.5);
        const double yip12 = BlackBoxFunction(xip12);
        const double dI = yip12*dx;
        I += dI;
    }
    return I;
}
For the code above, if I compile it with icpc:
icpc worker.cc -qopenmp -qopt-report=5 -c
The opt-report shows that the function and loop are both vectorized.
However, if I try to compile it with g++ 6.5:
g++ worker.cc -O3 -fopenmp -fopt-info-vec-missed -funsafe-math-optimizations -c
The output shows "note:not vectorized: control flow in loop." and "note: bad loop form", and the loop is not vectorized.
How can I vectorize the loop with GCC?
EDIT:
If I move the function into a separate file, with worker.cc:
#include "library.h"

double ComputeIntegral(const int n, const double a, const double b) {
    const double dx = (b - a)/n;
    double I = 0.0;
#pragma omp simd reduction(+: I)
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx*(double(i) + 0.5);
        const double yip12 = BlackBoxFunction(xip12);
        const double dI = yip12*dx;
        I += dI;
    }
    return I;
}
library.h:
#ifndef __INCLUDED_LIBRARY_H__
#define __INCLUDED_LIBRARY_H__
#pragma omp declare simd
double BlackBoxFunction(const double x);
#endif
and library.cc:
#include <cmath>

#pragma omp declare simd
double BlackBoxFunction(const double x) {
    return 1.0/sqrt(x);
}
Then I compile it with GCC:
g++ worker.cc library.cc -O3 -fopenmp -fopt-info-vec-missed -funsafe-math-optimizations -c
It shows:
worker.cc:9:31: note: loop vectorized
but
library.cc:5:18: note:not vectorized: control flow in loop.
library.cc:5:18: note:bad loop form.
This confuses me. Is the code already vectorized or not?
Solution
Vectorization is possible with gcc, after some slight modifications of the code:
#include <cmath>

double BlackBoxFunction(const double x) {
    return 1.0/sqrt(x);
}

double ComputeIntegral(const int n, const double a, const double b) {
    const double dx = (b - a)/n;
    double I = 0.0;
    double d_i = 0.0;
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx*(d_i + 0.5);
        d_i = d_i + 1.0;
        const double yip12 = BlackBoxFunction(xip12);
        const double dI = yip12*dx;
        I += dI;
    }
    return I;
}
This was compiled with the compiler options -Ofast -march=haswell -fopt-info-vec-missed -funsafe-math-optimizations. The main loop compiles to:
.L7:
vaddpd ymm2, ymm4, ymm7
inc eax
vaddpd ymm4, ymm4, ymm8
vfmadd132pd ymm2, ymm9, ymm5
vsqrtpd ymm2, ymm2
vdivpd ymm2, ymm6, ymm2
vfmadd231pd ymm3, ymm5, ymm2
cmp eax, edx
jne .L7
See the following Godbolt link
I removed the #pragma omp ... directives because they did not improve the vectorization, although they did not make it worse either.
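Note that changing the compiler option from -O3 to -Ofast is by itself sufficient to enable vectorization. (-Ofast implies -ffast-math, which includes -fno-math-errno; without it, sqrt must be able to set errno for negative arguments, which is the likely source of the "control flow in loop" message.) Nevertheless, it is more efficient to use a double counter than an int counter that is converted to double in each iteration.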
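To make the counter change concrete, here is a minimal sketch that puts both loop forms side by side. The function names ComputeIntegralIntCounter and ComputeIntegralDoubleCounter are hypothetical and not part of the original code; the integrand is the one from the question. Both forms should produce the same value, because a double counter incremented by 1.0 stays exact for every n that fits in an int.

```cpp
#include <cassert>
#include <cmath>

// Integrand from the question.
static inline double BlackBoxFunction(const double x) {
    return 1.0 / std::sqrt(x);
}

// Form used in the question: the int loop index is converted to
// double in every iteration. (Hypothetical name.)
double ComputeIntegralIntCounter(const int n, const double a, const double b) {
    assert(n > 0);
    const double dx = (b - a) / n;
    double I = 0.0;
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx * (static_cast<double>(i) + 0.5);
        I += BlackBoxFunction(xip12) * dx;
    }
    return I;
}

// Form used in the answer: a separate double counter replaces the
// int-to-double conversion. (Hypothetical name.)
double ComputeIntegralDoubleCounter(const int n, const double a, const double b) {
    assert(n > 0);
    const double dx = (b - a) / n;
    double I = 0.0;
    double d_i = 0.0;
    for (int i = 0; i < n; i++) {
        const double xip12 = a + dx * (d_i + 0.5);
        d_i += 1.0;
        I += BlackBoxFunction(xip12) * dx;
    }
    return I;
}
```

For a quick sanity check, the exact integral of 1/sqrt(x) over [a, b] is 2(sqrt(b) - sqrt(a)), so both functions should return a value close to 2 for a = 1, b = 4 and a large n. Whether the second form is actually faster depends on the compiler version and target, so it is worth measuring.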
Note also that the vectorization reports are quite misleading. Inspect the generated assembly code to see whether or not the vectorization was successful.
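One way to make the assembly check easier is to isolate the hot loop in a small function with an unmangled name, so its label is easy to find in the compiler output. The sketch below assumes a hypothetical kernel InverseSqrtSum and a hypothetical file name kernel.cc; neither appears in the original code. The compile command in the comment uses only standard GCC options.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical kernel for inspection. Generate assembly with, for example:
//   g++ -Ofast -march=haswell -S -masm=intel kernel.cc -o kernel.s
// then search kernel.s for the label InverseSqrtSum. Packed instructions
// such as vsqrtpd and vdivpd indicate a vectorized loop; the scalar
// counterparts end in "sd" (vsqrtsd, vdivsd).
extern "C" double InverseSqrtSum(const double* x, std::size_t n) {
    assert(x != nullptr || n == 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; i++) {
        sum += 1.0 / std::sqrt(x[i]);
    }
    return sum;
}
```

The extern "C" linkage is only there to avoid C++ name mangling in the assembly listing; it does not affect vectorization.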