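python-3.x - Numpy/Numba raises an error when assigning a very large empty set to CUDA
Question
I am writing a Mandelbrot set generator with Numba/Numpy. One of the optimizations is to push the computation to CUDA through Numba, using cudatoolkit. The script works for low-resolution sets, but fails when I try to compute a large one.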
import numpy as np
from pylab import imshow, show
import time
from numba import cuda
import matplotlib


def mandel(x, y, max_iters):
    # Return the iteration at which the point (x, y) escapes, or max_iters.
    c = complex(x, y)
    z = 0.0j
    for i in range(max_iters):
        z = z*z + c
        if (z.real*z.real + z.imag*z.imag) >= 4:
            return i
    return max_iters


# Compile mandel as a device function that kernels can call.
mandel_gpu = cuda.jit(device=True)(mandel)


@cuda.jit
def mandel_kernel(min_x, max_x, min_y, max_y, image, iters):
    height = image.shape[0]
    width = image.shape[1]
    pixel_size_x = (max_x - min_x) / width
    pixel_size_y = (max_y - min_y) / height
    # Grid-stride loops: each thread covers several pixels.
    startX, startY = cuda.grid(2)
    gridX = cuda.gridDim.x * cuda.blockDim.x
    gridY = cuda.gridDim.y * cuda.blockDim.y
    for x in range(startX, width, gridX):
        real = min_x + x * pixel_size_x
        for y in range(startY, height, gridY):
            imag = min_y + y * pixel_size_y
            image[y, x] = mandel_gpu(real, imag, iters) / iters


gimage = np.zeros((65536, 65536), dtype=np.uint8)
#gimage = np.zeros((1024, 1024), dtype=np.uint8)
blockdim = (32, 8)
griddim = (32, 16)
start = time.time()
d_image = cuda.to_device(gimage)
mandel_kernel[griddim, blockdim](-2.0, 2.0, -2.0, 2.0, d_image, 10000)
# Copy the result back into gimage (replaces the older d_image.to_host()).
d_image.copy_to_host(gimage)
dt = time.time() - start
print("Mandelbrot created in " + str(dt) + " seconds")
imshow(gimage, 'gray')
show()
#matplotlib.image.imsave("mandel.png", gimage)
Above 46000 x 46000 pixels, Python raises the following error:
Traceback (most recent call last):
File "C:\_main\Files\Mandel\mandel_cuda.py", line 46, in <module>
d_image = cuda.to_device(gimage)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devices.py", line 212, in _require_cuda_context
return fn(*args, **kws)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\api.py", line 103, in to_device
to, new = devicearray.auto_device(obj, stream=stream, copy=copy)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 688, in auto_device
devobj.copy_to_device(obj, stream=stream)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devices.py", line 212, in _require_cuda_context
return fn(*args, **kws)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 181, in copy_to_device
sentry_contiguous(self)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 657, in sentry_contiguous
core = array_core(ary)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 647, in array_core
return ary[tuple(core_index)]
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devices.py", line 212, in _require_cuda_context
return fn(*args, **kws)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 479, in __getitem__
return self._do_getitem(item)
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\devicearray.py", line 493, in _do_getitem
newdata = self.gpu_data.view(*extents[0])
File "C:\ProgramData\Anaconda3\lib\site-packages\numba\cuda\cudadrv\driver.py", line 1227, in view
raise RuntimeError("non-empty slice into empty slice")
RuntimeError: non-empty slice into empty slice
The script runs on a 1050 Ti with 4 GB of VRAM. At 46000 x 46000 pixels, VRAM usage is only 2.1 GB, so there should be enough VRAM to render images larger than 46000 x 46000.
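For reference, the raw array sizes can be checked with plain arithmetic. This is only a sketch: `image_bytes` is a hypothetical helper name that does not appear in the script, and it counts only the image buffer itself (one byte per pixel for `np.uint8`), not anything else the driver or display keeps in VRAM.

```python
GIB = 1024 ** 3  # bytes per GiB


def image_bytes(side, itemsize=1):
    """Bytes needed for a side x side array (itemsize=1 for np.uint8)."""
    return side * side * itemsize


for side in (1024, 46000, 65536):
    size = image_bytes(side)
    print(f"{side} x {side}: {size} bytes = {size / GIB:.2f} GiB")
```

The 65536 x 65536 case works out to exactly 4 GiB for the image alone, which is the card's entire nominal VRAM.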
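Answer
This appears to be a VRAM overflow problem. During the first 30 seconds of rendering, additional VRAM is used to store the empty set. During initialization the 4 GB limit is reached quickly, which crashes the script.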
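If the limit really is device memory, one possible workaround is to render the image in horizontal strips so that only one strip lives on the GPU at a time. The sketch below covers only the splitting step; `row_tiles` and the 1 GiB budget are illustrative assumptions, not part of the original script or of Numba.

```python
def row_tiles(height, width, itemsize, budget_bytes):
    """Yield (start, stop) row ranges whose tiles each fit in budget_bytes."""
    row_bytes = width * itemsize
    rows_per_tile = budget_bytes // row_bytes
    if rows_per_tile < 1:
        raise ValueError("budget is smaller than a single image row")
    for start in range(0, height, rows_per_tile):
        yield start, min(start + rows_per_tile, height)


# Assumed budget of 1 GiB per tile for a 65536 x 65536 uint8 image.
tiles = list(row_tiles(65536, 65536, 1, 1024 ** 3))
print(len(tiles), tiles[0], tiles[-1])
```

Each strip could then be allocated on the device (for example with `cuda.device_array`), rendered, and copied back into the matching rows of the host array with `copy_to_host`. The kernel shown in the question would need an extra row-offset argument for this, which it does not currently have.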