How to ensure GPU memory is actually released after OpenCV T-API function calls?

Problem description

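OpenCV's T-API (UMat) is asynchronous by design: tasks run in the background until a synchronization point is reached, for example a call to cv::ocl::finish().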

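From reading the source code, my impression is that, at least in some cases, memory deallocation is asynchronous as well. When memory-hungry calls are executed in a loop, this leads to allocation failures even if cv::ocl::finish() is called explicitly. For example: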

#include <opencv2/opencv.hpp>
#include <opencv2/core/ocl.hpp>
#include <iostream>

int main()
{
    auto const nLoops = 5;
    auto const imageWidth = 46340;  // Image size ~ 2 GiB.

    for (int iLoop = 0; iLoop < nLoops; ++iLoop)
    {
        std::cout << "Loop " << iLoop << " begins.\n";
        {
            // bigImage will be destroyed as soon as it is out of scope.
            auto const bigImage = cv::UMat::zeros(imageWidth, imageWidth, CV_8UC1);
        }
        cv::ocl::finish();
        std::cout << "\n";
    }

    std::cout << "Success!";
}

The expensive function here is cv::UMat::zeros(); I observed the same behavior with cv::fastNlMeansDenoising().

Running the example above, I get:

Loop 0 begins.
[ INFO:0] global C:\tools\vcpkg\buildtrees\opencv4\src\4.1.1-fb9e10326a.clean\modules\core\src\ocl.cpp (888) cv::ocl::haveOpenCL Initialize OpenCL runtime...
[ INFO:0] global C:\tools\vcpkg\buildtrees\opencv4\src\4.1.1-fb9e10326a.clean\modules\core\src\ocl.cpp (430) cv::ocl::OpenCLBinaryCacheConfigurator::OpenCLBinaryCacheConfigurator Successfully initialized OpenCL cache directory: C:\Users\ANGELO~1.PER\AppData\Local\Temp\opencv\4.1\opencl_cache\
[ INFO:0] global C:\tools\vcpkg\buildtrees\opencv4\src\4.1.1-fb9e10326a.clean\modules\core\src\ocl.cpp (454) cv::ocl::OpenCLBinaryCacheConfigurator::prepareCacheDirectoryForContext Preparing OpenCL cache configuration for context: NVIDIA_Corporation--GeForce_GTX_1050--446_14

Loop 1 begins.
OpenCL error CL_MEM_OBJECT_ALLOCATION_FAILURE (-4) during call: clEnqueueNDRangeKernel('set', dims=2, globalsize=11776x46344x1, localsize=NULL) sync=false
OpenCV(4.1.1) Error: Unknown error code -220 (OpenCL error CL_MEM_OBJECT_ALLOCATION_FAILURE (-4) during call: clEnqueueReadBuffer(q, handle=000001C44EA8A970, CL_TRUE, 0, sz=2147395600, data=000001C491D93080, 0, 0, 0)) in cv::ocl::OpenCLAllocator::map, file C:\tools\vcpkg\buildtrees\opencv4\src\4.1.1-fb9e10326a.clean\modules\core\src\ocl.cpp, line 5089
OpenCV: terminate handler is called! The last OpenCV error is:
OpenCV(4.1.1) Error: Unknown error code -220 (OpenCL error CL_MEM_OBJECT_ALLOCATION_FAILURE (-4) during call: clEnqueueReadBuffer(q, handle=000001C44EA8A970, CL_TRUE, 0, sz=2147395600, data=000001C491D93080, 0, 0, 0)) in cv::ocl::OpenCLAllocator::map, file C:\tools\vcpkg\buildtrees\opencv4\src\4.1.1-fb9e10326a.clean\modules\core\src\ocl.cpp, line 5089
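The sz=2147395600 in the error message can be checked against the image dimensions used in the example. The sketch below is only an arithmetic sanity check and does not use OpenCV; the names squareImageBytes and twoGiB are hypothetical helpers introduced for illustration. It assumes one byte per pixel, as for CV_8UC1.

```cpp
#include <cstdint>

// Hypothetical helper: bytes needed for a square, single-channel,
// 8-bit image (CV_8UC1 stores 1 byte per pixel).
constexpr std::uint64_t squareImageBytes(std::uint64_t width)
{
    return width * width;
}

// 2 GiB expressed in bytes.
constexpr std::uint64_t twoGiB = std::uint64_t{1} << 31;

// 46340 x 46340 bytes is exactly the size reported in the log (sz=2147395600).
static_assert(squareImageBytes(46340) == 2147395600ULL, "matches the log");

// 46340 is the largest width whose square image still fits below 2 GiB.
static_assert(squareImageBytes(46340) < twoGiB, "just under 2 GiB");
static_assert(squareImageBytes(46341) > twoGiB, "next width exceeds 2 GiB");
```

So each loop iteration requests a single buffer of just under 2 GiB. That is consistent with the second allocation failing if the first buffer has not actually been released yet on a card with limited memory.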

A hacky workaround is to allocate a small UMat after the call to cv::ocl::finish(): if I understand correctly, the cleanup queue is flushed during a UMat allocation.

[…]
        }
        cv::ocl::finish();
        // It seems that an allocation flushes the cleanup queue!
        auto const cleanupQueueFlusher = cv::UMat::zeros(1, 1, CV_8UC1);
        std::cout << "\n";
    }
[…]

Output:

Loop 0 begins.
[ INFO:0] global C:\tools\vcpkg\buildtrees\opencv4\src\4.1.1-fb9e10326a.clean\modules\core\src\ocl.cpp (888) cv::ocl::haveOpenCL Initialize OpenCL runtime...
[ INFO:0] global C:\tools\vcpkg\buildtrees\opencv4\src\4.1.1-fb9e10326a.clean\modules\core\src\ocl.cpp (430) cv::ocl::OpenCLBinaryCacheConfigurator::OpenCLBinaryCacheConfigurator Successfully initialized OpenCL cache directory: C:\Users\ANGELO~1.PER\AppData\Local\Temp\opencv\4.1\opencl_cache\
[ INFO:0] global C:\tools\vcpkg\buildtrees\opencv4\src\4.1.1-fb9e10326a.clean\modules\core\src\ocl.cpp (454) cv::ocl::OpenCLBinaryCacheConfigurator::prepareCacheDirectoryForContext Preparing OpenCL cache configuration for context: NVIDIA_Corporation--GeForce_GTX_1050--446_14

Loop 1 begins.

Loop 2 begins.

Loop 3 begins.

Loop 4 begins.

Success!

What is the correct/official/API-provided way to ensure that unused memory is actually released after T-API function calls?

There is a related GitHub issue, and this question is cross-posted on the OpenCV forum.

Tags: c++, opencv, gpu
