c++ - Error when trying to offload to a GTX 1050 with GCC 9.3 and OpenMP
Question
Build log:
-------------- Clean: Release in OffloadTest (compiler: GNU GCC Compiler)---------------
Cleaned "OffloadTest - Release"
-------------- Build: Release in OffloadTest (compiler: GNU GCC Compiler)---------------
g++ -Wall -m64 -fopenmp -foffload=nvptx-none -fno-stack-protector -O2 -fopenmp -foffload=nvptx-none -fcf-protection=none -fno-stack-protector -c /home/david/CBProjects/OffloadTest/main.cpp -o obj/Release/main.o
g++ -o bin/Release/OffloadTest obj/Release/main.o -m64 -lgomp -s -lgomp
/usr/bin/ld: /tmp/ccfvsLgk.crtoffloadtable.o:(.rodata+0x0): undefined reference to `__offload_func_table'
/usr/bin/ld: /tmp/ccfvsLgk.crtoffloadtable.o:(.rodata+0x8): undefined reference to `__offload_funcs_end'
/usr/bin/ld: /tmp/ccfvsLgk.crtoffloadtable.o:(.rodata+0x10): undefined reference to `__offload_var_table'
/usr/bin/ld: /tmp/ccfvsLgk.crtoffloadtable.o:(.rodata+0x18): undefined reference to `__offload_vars_end'
collect2: error: ld returned 1 exit status
Process terminated with status 1 (0 minute(s), 0 second(s))
5 error(s), 0 warning(s) (0 minute(s), 0 second(s))
I have installed the following packages (shown with their descriptions):
gcc-9-offload-nvptx
Description: The package provides offloading support for NVidia PTX. OpenMP and OpenACC programs linked with -fopenmp will by default add PTX code into the binaries, which can be offloaded to NVidia PTX capable devices if available.
gcc-offload-nvptx
Description: This package contains libgomp plugin for offloading to NVidia PTX. The plugin needs libcuda.so.1 shared library that has to be installed separately.
nvptx-tools
Description: This tool consists of nvptx-none-as: "assembler" for PTX, nvptx-none-ld: "linker" for PTX. Additionally, the following symlinks are installed: nvptx-none-ar: link to the GNU/Linux host system's ar, nvptx-none-ranlib: link to the GNU/Linux host system's ranlib
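I have verified that libcuda.so.1 is located in /lib/x86_64-linux-gnu.
The program is simple, just an example to help me get offloading to build and run. If I take out the "target" keyword, it works fine.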
#include <iostream>
#include <omp.h>
using namespace std;
#define iSize 200000
long *A, *B;
int main()
{
    A = new long[iSize];
    B = new long[iSize];
    long sum = 0;
    double dStart, dEnd;
    int iNumberOfDevices = omp_get_num_devices();
    int iInitialDevice = omp_get_initial_device(); // device number for host computer
    int iDeviceNumber = omp_get_default_device();
    dStart = omp_get_wtime();
    #pragma omp parallel for
    for (long i=0; i<iSize; i++)
    {
        A[i] = i;
        B[i] = i+1;
    }
    #pragma omp target parallel for reduction(+:sum)
    for (long i=0; i<iSize; i++)
    {
        for (long j=0; j<iSize; j++)
        {
            sum += 3 * A[i] - B[j];
        }
    }
    dEnd = omp_get_wtime();
    double dtime = dEnd - dStart;
    cout << "Number of devices = " << iNumberOfDevices << endl;
    cout << "Device number = " << iDeviceNumber << endl;
    cout << "Initial Device number (host processor) = " << iInitialDevice << endl;
    cout << endl;
    cout << "Sum = " << sum << endl;
    cout << "Processing time = " << dtime << " Seconds" << endl;
}
Any help is appreciated.
- David
Answer
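To resolve the undefined references, specify -fopenmp on the link command (and, if it is not the default, possibly -foffload=nvptx-none again) instead of -lgomp (which, by the way, is given twice). When -fopenmp is passed at link time, the GCC driver links libgomp itself and also adds the offload support that the bare -lgomp link line leaves out, which is where symbols such as __offload_func_table come from.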
I think some omp target data (or similar) directives are also missing, to set up the A and B arrays on the device?
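As a rough illustration of that last point, here is a minimal sketch of the reduction loop with explicit map clauses in place of a separate omp target data region. The function name offload_sum and its parameters are hypothetical and not part of the original program. It assumes an OpenMP 4.5 compiler such as GCC 9, where a scalar like sum is firstprivate in a target region unless it is mapped explicitly, and a build along the lines of g++ -O2 -fopenmp -foffload=nvptx-none for both compiling and linking.

```cpp
// Hypothetical helper (not from the original post): computes the sum of
// 3*A[i] - B[j] over all pairs (i, j), with the arrays mapped explicitly.
long offload_sum(const long* A, const long* B, long n)
{
    long sum = 0;

    // map(to: ...) copies the n-element array sections to the device before
    // the region runs. map(tofrom: sum) copies the reduction variable in and
    // back out; without it, OpenMP 4.5 treats sum as firstprivate on the
    // target construct and the host copy would keep its initial value.
    #pragma omp target parallel for reduction(+:sum) \
        map(to: A[0:n], B[0:n]) map(tofrom: sum)
    for (long i = 0; i < n; i++)
    {
        for (long j = 0; j < n; j++)
        {
            sum += 3 * A[i] - B[j];
        }
    }
    return sum;
}
```

In the program from the question this would be called as offload_sum(A, B, iSize) once the arrays have been filled. If no device is available, or the code is built without -fopenmp, the loop simply runs on the host and returns the same value.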