TensorFlow GPU memory profile

Problem description

I am trying to use the TensorFlow C++ profiling tools to extract the memory footprint of a network on my GPU.

After quite a long journey, I managed to compile all the TF dependencies, and I now have a working example, shown below:

auto tracer = CreateGpuTracer();
tracer->Start();
// inference here
tracer->Stop();

// Collect the traced events into an XSpace.
XSpace xspace;
status = tracer->CollectData(&xspace);

// Take the first GPU device plane and convert it to a memory profile.
std::vector<const XPlane*> device_planes_gpu = FindPlanesWithPrefix(xspace, "/device:GPU:");
const XPlane* plane_gpu = device_planes_gpu[0];
MemoryProfile memory_profile = ConvertXPlaneToMemoryProfile(*plane_gpu);

// Dump the memory profile as JSON.
std::string json_output;
google::protobuf::util::MessageToJsonString(memory_profile, &json_output);
LOG(INFO) << json_output;

My inference runs correctly, and the logs suggest that the capture is working. Here are the logs:

2021-03-26 15:28:19.841126: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1391] Profiler found 1 GPUs
2021-03-26 15:28:19.841822: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcupti.so.10.2
2021-03-26 15:28:20.823429: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2021-03-26 15:24:34.402625: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1513] CUPTI activity buffer flushed
2021-03-26 15:24:34.402648: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:223]  GpuTracer has collected 562 callback api events and 562 activity events. 

Unfortunately, my MemoryProfile comes out essentially empty:

{"memoryProfilePerAllocator":{},"numHosts":1,"memoryIds":[]}

I tried capturing from both "/device:GPU:" and "/host:CUPTI", without success.

Can anyone tell me what I am doing wrong? Thanks!

Tags: c++, tensorflow, gpu

Solution

