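linux - How do I set up OpenMP so that parallel processing uses all the hyperthreads?
Problem description
Please help: I want my program to use OpenMP so that it runs in parallel on all hardware threads. I set it up like this: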
#pragma omp parallel
{
omp_set_num_threads(272);
region my_routine processing;
}
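When I run it, I use the "top" command to check CPU usage. It only occasionally reaches 6800% and is mostly below 5500%, so it is not stable. I would like it to stay stable, always reaching 6800%, for the whole run of my program.
What is wrong with the way I use OpenMP, or is there another way to use all the threads?
Thank you very much.
This is my platform: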
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 272
On-line CPU(s) list: 0-271
Thread(s) per core: 4
Core(s) per socket: 68
Socket(s): 1
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 87
Model name: Intel(R) Xeon Phi(TM) CPU 7250 @ 1.40GHz
Stepping: 1
CPU MHz: 1392.507
BogoMIPS: 2799.81
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 1024K
NUMA node0 CPU(s): 0-271
NUMA node1 CPU(s):
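Solution
Step 0: Safety first
Check with your cluster provider's HPC support team whether they consider it harmless to run such high levels of workload on the cluster devices they own and operate.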
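Step 1: Set up a "smoke" flight test
Prepare the lstopo command (apt-get it, or ask your admin to install it if necessary), create a map of the system's NUMA topology as a PDF file, and post it here.
Prepare the htop command (apt-get it, or ask your admin to install it if necessary) and configure it via F2 Setup:
METERS: show "CPUs (1&2/4): first half in 2 shorter columns" in the left panel
METERS: show "CPUs (3&4/4): second half in 2 shorter columns" as well
COLUMNS: show at least the fields { PPID, PID, TGID, CPU, CPU%, STATUS, command }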
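The same smoke test can be complemented from inside a program. The following is a minimal sketch, not part of the original answer: it asks the OS how many logical CPUs are online and asks the OpenMP runtime how many processors and threads it is prepared to use. The names cpu_report, probe_cpus and print_cpu_report are hypothetical helpers introduced only for this illustration, and the serial fallback values (1) apply only when the file is compiled without -fopenmp.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>          /* sysconf() */
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical record, for illustration only. */
typedef struct {
    long online_cpus;        /* logical CPUs the OS reports as online      */
    int  omp_procs;          /* processors visible to the OpenMP runtime   */
    int  omp_max_threads;    /* team size a plain parallel region may get  */
} cpu_report;

/* Collect the three numbers; without -fopenmp the OpenMP fields are 1. */
cpu_report probe_cpus(void)
{
    cpu_report r;
    r.online_cpus = sysconf(_SC_NPROCESSORS_ONLN);
#ifdef _OPENMP
    r.omp_procs       = omp_get_num_procs();
    r.omp_max_threads = omp_get_max_threads();
#else
    r.omp_procs       = 1;
    r.omp_max_threads = 1;
#endif
    return r;
}

/* Print the report in one line so it can be compared with lscpu / lstopo. */
void print_cpu_report(const cpu_report *r)
{
    assert(r != NULL);
    printf("online CPUs: %ld, OpenMP procs: %d, OpenMP max threads: %d\n",
           r->online_cpus, r->omp_procs, r->omp_max_threads);
}
```

Calling probe_cpus() and passing the result to print_cpu_report() from main() gives a quick cross-check: if the numbers are already lower than the "CPU(s)" line of the platform listing, the limit is probably set outside the program (job scheduler, affinity mask, environment), not by the OpenMP code.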
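Step 2: With the htop monitor running, run the compiled OpenMP code
Expect something like the following on the terminal CLI, although the htop monitor will show the live landscape of the NUMA CPU workloads better than any single number: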
Real time: 23.027 s
User time: 45.337 s
Sys. time: 0.047 s
Exit code: 0
and stdout will read like this:
WARMUP: OpenMP thread[ 0] instantiated as thread[ 0]
WARMUP: OpenMP thread[ 3] instantiated as thread[ 3]
...
WARMUP: OpenMP thread[272] instantiated as thread[272]
my_routine(): thread[ 0] START_TIME( 2078891848 )
my_routine(): thread[ 2] START_TIME( -528891186 )
...
my_routine(): thread[ 2] ENDED_TIME( 635748478 ) sum = 1370488.801186
HOT RUN: in thread[ 2] my_routine() returned 10.915321 ....
my_routine(): thread[ 4] ENDED_TIME( -1543969584 ) sum = 1370489.030301
HOT RUN: in thread[ 4] my_routine() returned 11.133672 ....
my_routine(): thread[ 1] ENDED_TIME( -213996360 ) sum = 1370489.060176
HOT RUN: in thread[ 1] my_routine() returned 11.158897 ....
...
my_routine(): thread[ 0] ENDED_TIME( -389214506 ) sum = 1370489.079366
HOT RUN: in thread[270] my_routine() returned 11.149798 ....
my_routine(): thread[ 3] ENDED_TIME( -586400566 ) sum = 1370489.125829
HOT RUN: in thread[269] my_routine() returned 11.091430 ....
OpenMP ver(201511)...finito
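The timing figures and the percentages read from top can be related by simple arithmetic. The sketch below is an illustration rather than something from the original answer. It assumes that top runs in its default mode, where a process's %CPU is summed over its threads and 100% stands for one fully busy logical CPU. The function names avg_busy_cpus, top_style_percent and print_utilisation are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Average number of logical CPUs kept busy over a run:
   CPU time consumed (user + sys) divided by wall-clock time. */
double avg_busy_cpus(double real_s, double user_s, double sys_s)
{
    assert(real_s > 0.0);
    return (user_s + sys_s) / real_s;
}

/* The same figure expressed the way top reports it under the assumption
   stated above: 100 % per fully busy logical CPU. */
double top_style_percent(double real_s, double user_s, double sys_s)
{
    return 100.0 * avg_busy_cpus(real_s, user_s, sys_s);
}

/* Convenience printer for the two derived figures. */
void print_utilisation(double real_s, double user_s, double sys_s)
{
    printf("busy CPUs: %.3f  (top-style: %.1f %%)\n",
           avg_busy_cpus(real_s, user_s, sys_s),
           top_style_percent(real_s, user_s, sys_s));
}
```

For the run reported above (Real 23.027 s, User 45.337 s, Sys 0.047 s) this gives (45.337 + 0.047) / 23.027, about 1.97 busy logical CPUs on average. Under the same assumption, the 6800% from the question corresponds to 68 fully busy logical CPUs, which matches the number of cores in the platform listing, while all 272 logical CPUs fully busy would read 27200%.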
The test program that prints the WARMUP and HOT RUN lines:
#include <omp.h>    // ------------------------------------ compile flags: -fopenmp -O3
#include <stdio.h>
#define MAX_COUNT   999999999
#define MAX_THREADS 272
double my_routine( void )
{   // omp_get_wtime() returns a double, so it is printed with %f ( passing it to %d is undefined behaviour )
    printf( "my_routine(): thread[%3d] START_TIME( %20.6f )\n", omp_get_thread_num(), omp_get_wtime() );
    double temp = omp_get_wtime(),
           sum  = 0;
    for ( int count = 0; count < MAX_COUNT; count++ )
    {   // accumulate the time elapsed between two consecutive clock reads
        sum += ( omp_get_wtime() - temp ); temp = omp_get_wtime();
    }
    printf( "my_routine(): thread[%3d] ENDED_TIME( %20.6f ) sum = %15.6f\n", omp_get_thread_num(), omp_get_wtime(), sum );
    return( sum );
}
void warmUp( void ) // --------------------------- prevents performance skewing in-situ
{                   // NOP-alike payload, yet enforces all thread-instantiations to happen
    #pragma omp parallel for num_threads( MAX_THREADS )
    for ( int i = 0; i < MAX_THREADS; i++ )
        printf( "WARMUP: OpenMP thread[%3d] instantiated as thread[%3d]\n", i, omp_get_thread_num() );
}
int main( int argc, char **argv )
{
    omp_set_num_threads( MAX_THREADS ); warmUp(); // ---------- pre-arrange all threads
    #pragma omp parallel for
    for ( int i = 0; i < MAX_THREADS; i++ )
        printf( "HOT RUN: in thread[%3d] my_routine() returned %34.6f ....\n", omp_get_thread_num(), my_routine() );
    printf( "\nOpenMP ver(%d)...finito\n", _OPENMP );
    return 0;
}
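One difference between the listing above and the snippet in the question is where omp_set_num_threads() is called. In the question it is called inside the parallel region, when the team already exists, so it can only influence parallel regions encountered later by the calling thread. The sketch below, which is an illustration and not part of the original answer, requests the thread count before the region and then reports the team size the runtime actually delivered. team_size_for is a hypothetical helper name, and without -fopenmp the function falls back to a serial team of 1.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: request `wanted` threads BEFORE the parallel region
   and return the size of the team that the runtime really created. */
int team_size_for(int wanted)
{
    assert(wanted > 0);
    int team = 1;                     /* serial fallback value */
#ifdef _OPENMP
    omp_set_dynamic(0);               /* ask the runtime not to shrink teams on its own */
    omp_set_num_threads(wanted);      /* applies to the NEXT parallel region            */
#endif
    #pragma omp parallel
    {
        #pragma omp single
        {
#ifdef _OPENMP
            team = omp_get_num_threads();   /* one thread records the team size */
#endif
        }
    }
    return team;
}
```

A call such as team_size_for(272) from main(), followed by printing the result, shows whether the runtime really grants the requested team. The returned value can still be smaller than requested, for example when a thread limit is in force, which is why it is worth checking instead of assuming.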