How do I get a specific chunk from a delayed dask array?

Question

Suppose I have the following array:

import numpy as np
import dask.array as da
import dask

arr_x = list(range(0,100))
arr_y = list(range(0,100))
arr = np.stack([arr_x,arr_y])
arr = arr.T

I then convert it to a dask array and turn its chunks into delayed objects:

arr = da.from_array(arr,chunks = (3,2))
data = arr.to_delayed()

The result looks like this:

[[Delayed(('array-fa3499f6a402676a68a198bef8016ec4', 0, 0))]
 [Delayed(('array-fa3499f6a402676a68a198bef8016ec4', 1, 0))]
 [Delayed(('array-fa3499f6a402676a68a198bef8016ec4', 2, 0))]

...
 [Delayed(('array-fa3499f6a402676a68a198bef8016ec4', 31, 0))]
 [Delayed(('array-fa3499f6a402676a68a198bef8016ec4', 32, 0))]
 [Delayed(('array-fa3499f6a402676a68a198bef8016ec4', 33, 0))]]

Now I want to get one specific chunk:

chunk = da.from_delayed(data[1], shape=(3,2))
print(chunk.compute())

However, I get the following error:

dsk = {(name,) + (0,) * len(shape): value.key}

AttributeError: 'numpy.ndarray' object has no attribute 'key'

What am I doing wrong?

Tags: python, arrays, iterator, dask

Answer


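dask.array.Array.to_delayed() returns a NumPy object array of Delayed objects, one per chunk, with the same number of dimensions as the dask array (here a 34 x 1 grid). data[1] is therefore still a one-element ndarray rather than a Delayed object, which is why from_delayed() fails with "'numpy.ndarray' object has no attribute 'key'". You need to index all the way down to a single Delayed object before passing it to from_delayed():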

In [5]: chunk = da.from_delayed(data[1][0], shape=(3,2), dtype=arr.dtype)

In [6]: chunk.compute()
Out[6]:
array([[3, 3],
       [4, 4],
       [5, 5]])

The same chunk can also be obtained directly, without going through delayed objects:

In [11]: arr.blocks[1, 0].compute()
Out[11]:
array([[3, 3],
       [4, 4],
       [5, 5]])
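
For reference, here is a minimal, self-contained sketch that puts both approaches side by side and checks that they agree. It rebuilds the array from the question; the variable names (grid_shape, block, via_delayed, via_blocks, last) are only illustrative. It also assumes you want the shape passed to from_delayed() to match the actual chunk, which matters for the last chunk because 100 rows do not divide evenly by 3.

```python
import numpy as np
import dask.array as da

# Rebuild the 100 x 2 array from the question and chunk it as (3, 2).
values = np.stack([np.arange(100), np.arange(100)]).T
arr = da.from_array(values, chunks=(3, 2))

# to_delayed() gives a NumPy object array holding one Delayed per chunk.
data = arr.to_delayed()
grid_shape = data.shape  # number of chunks along each axis

# Index with a full (row, column) pair to reach a single Delayed object.
block = data[1, 0]
chunk = da.from_delayed(block, shape=(3, 2), dtype=arr.dtype)
via_delayed = chunk.compute()

# Equivalent shortcut that skips the delayed round trip.
via_blocks = arr.blocks[1, 0].compute()

# The last chunk is smaller, so take its shape from the array itself
# instead of hard-coding (3, 2).
last_shape = arr.blocks[33, 0].shape
last = da.from_delayed(data[33, 0], shape=last_shape, dtype=arr.dtype).compute()
```

If you only need the values of a chunk, arr.blocks[i, j] is usually the simpler choice; going through to_delayed() and from_delayed() is mainly useful when the chunk has to pass through other delayed computations first.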
