Analog of scipy's labeled_comprehension "along an axis"

Problem description

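If you have a 1-D numpy array `points` of length `n` and a 1-D array of labels with the same shape containing `k` unique labels, then `scipy.ndimage.measurements.labeled_comprehension` (`scipy.ndimage.labeled_comprehension` in current SciPy) lets you apply a generic function `f` to the elements corresponding to each label without looping over the whole array `k` times. This is helpful when the number of labels is large relative to `n` and `f` accepts a flat array.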
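A minimal illustration of that flat case, using made-up values: each label's elements are handed to the function as a 1-D array, and the results come back in the order of `index`. The range-computing lambda is only an example reducer.

```python
import numpy as np
from scipy.ndimage import labeled_comprehension

# made-up 1-D data: six values carrying three distinct labels
values = np.array([1.0, 5.0, 2.0, 7.0, 3.0, 4.0])
labels = np.array([1, 1, 2, 2, 3, 3])
index = np.unique(labels)  # labels to evaluate, here [1, 2, 3]

# the reducer is called once per label on the 1-D subset of `values`
# carrying that label; 0.0 is used for labels with no elements
ranges = labeled_comprehension(
    values, labels, index, lambda v: v.max() - v.min(), float, 0.0
)
print(ranges)  # [4. 5. 1.]
```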

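But what if you want to apply a function `f` that operates on non-flat arrays? In my case, I am trying to compute the diameter of each cluster in a set of 2-D points annotated by cluster. Here is the setup: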

import numpy as np
from scipy import spatial
from scipy.ndimage import labeled_comprehension


# set up fake data
np.random.seed(0)
n_centroids = 3
centroids = np.random.rand(n_centroids, 2)
cluster_sizes = np.random.randint(3, 5, size=n_centroids)
# labels for each point for clusters ranging from 1 to n_centroids inclusive
labels = np.repeat(np.arange(n_centroids), cluster_sizes) + 1
# random points around each centroid
points = np.zeros((cluster_sizes.sum(), 2))
points[:,0] = np.repeat(centroids[:,0], cluster_sizes)
points[:,1] = np.repeat(centroids[:,1], cluster_sizes)
points += 0.05 * np.random.randn(cluster_sizes.sum(), 2)

# black box function; we can assume that
# pts.shape[0] is variable
# pts.shape[1:] is fixed
def diameter(pts):
  # print(pts)  # uncomment to inspect the argument that is passed in
  # a single point (or none) has zero diameter
  if pts.shape[0] <= 1:
    return 0.0
  # two points: the diameter is simply their Euclidean distance
  if pts.shape[0] == 2:
    return np.sqrt(((pts[0] - pts[1])**2).sum())
  # need at least 3 (non-collinear) points to construct the convex hull;
  # the two points which are furthest apart will occur as vertices of the hull
  hull = spatial.ConvexHull(pts)
  candidates = pts[hull.vertices]
  return spatial.distance_matrix(candidates, candidates).max()
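As an aside on `diameter` itself: the diameter is just the largest pairwise distance, so for small clusters a brute-force sketch with `scipy.spatial.distance.pdist` gives the same value without building a hull, and it also copes with two points and with collinear points, where `ConvexHull` raises a `QhullError`. Both helper names below are illustrative, not part of the original code; the check compares them on random points.

```python
import numpy as np
from scipy import spatial

def diameter_bruteforce(pts):
    """Largest pairwise Euclidean distance between rows of pts (O(m^2) pairs)."""
    if pts.shape[0] < 2:
        return 0.0
    return spatial.distance.pdist(pts).max()

def diameter_hull(pts):
    """Same quantity, comparing only convex hull vertices.

    Needs at least 3 points that are not all collinear.
    """
    candidates = pts[spatial.ConvexHull(pts).vertices]
    return spatial.distance.pdist(candidates).max()

rng = np.random.default_rng(0)
pts = rng.random((50, 2))
# the farthest pair are hull vertices, so both approaches agree
print(np.isclose(diameter_bruteforce(pts), diameter_hull(pts)))  # True

# collinear points: the hull version would fail, brute force still works
line = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(diameter_bruteforce(line))  # about 2.83, i.e. sqrt(8)
```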

My attempt at applying `labeled_comprehension` is as follows:

labeled_comprehension(points, np.stack((labels, labels), 1), None, diameter, float, None)

This results in the following traceback:

 ***debug printout of argument to diameter***
 [0.54606175 0.70982384 0.61708686 0.71030458 0.42751578 0.69253658
 0.57922483 0.59353398 0.53885592 0.61675172 0.59887815 0.59936469
 0.60759051 0.61581654 0.48206846 0.69325341 0.47792915 0.76500534
 0.40335361 0.65921638]

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

<ipython-input-79-4c2d0fdb5c38> in <module>()
----> 1 labeled_comprehension(points, np.stack((labels, labels), 1), None, diameter, np.float, None)

1 frames

<ipython-input-77-99537dd58821> in diameter(pts)
     25     return ((pts[0] - pts[1])**2).sum()
     26   # two points which are fruthest apart will occur as vertices of the convex hull
---> 27   hull = spatial.ConvexHull(pts)
     28   candidates = pts[spatial.ConvexHull(pts).vertices]
     29   return spatial.distance_matrix(candidates, candidates).max()

qhull.pyx in scipy.spatial.qhull.ConvexHull.__init__()

IndexError: tuple index out of range

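As we can see, `diameter` receives a flat array as input. Because `index` is `None`, it is also called only once, on all elements whose label is positive, rather than once per label.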
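The flattening is not specific to `index=None`: `labeled_comprehension` broadcasts `input` against `labels` and then ravels both, so even when `index` is given and the callback runs once per label, it receives a 1-D slice. A small check with toy data (the `record_shape` callback is purely illustrative):

```python
import numpy as np
from scipy.ndimage import labeled_comprehension

points = np.arange(12, dtype=float).reshape(6, 2)  # six toy 2-D points
labels = np.array([1, 1, 2, 2, 2, 3])
seen_shapes = []

def record_shape(x):
    # store the shape of whatever labeled_comprehension passes in
    seen_shapes.append(x.shape)
    return 0.0

labeled_comprehension(points, np.stack((labels, labels), 1), [1, 2, 3],
                      record_shape, float, 0.0)
print(seen_shapes)  # [(4,), (6,), (2,)]
```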

I can achieve the desired output by masking each label in turn, e.g.

def desired_output(points, labels, f=diameter):
  return np.array([f(points[labels==i]) for i in np.unique(labels)])

# [0.1904019410968485, 0.06874095082563453, 0.12943266211372922]

However, this takes `k` (the number of clusters) passes over the `n` elements (points), i.e. O(kn) work.

Instead, I would like to do something like this:

def desired_output2(points, labels, f=diameter):
  return labelled_comprehension_along_axis(points, labels, f=f, axis=-1)
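One way to get something close to `desired_output2` out of `labeled_comprehension` as it stands (a sketch, not a documented SciPy feature): pass the row numbers `0..n-1` as `input`, so the 1-D slices it hands out are row indices, and let the callback gather the corresponding `(m, 2)` block from `points`. `comprehension_over_rows` is a hypothetical name. Internally, `labeled_comprehension` sorts by label once and then slices contiguous runs, so this avoids the `k` masking passes, although `f` is still called once per label from Python.

```python
import numpy as np
from scipy.ndimage import labeled_comprehension

def comprehension_over_rows(points, labels, f, default=0.0):
    """Hypothetical wrapper: return f(points[labels == i]) for each unique label i.

    labeled_comprehension only hands its callback 1-D slices, so we give it
    the row numbers 0..n-1 as `input`; the callback uses them to gather the
    full (m, ...) block of rows from `points`.
    """
    labels = np.asarray(labels)
    index = np.unique(labels)
    rows = np.arange(labels.shape[0])
    return labeled_comprehension(
        rows, labels, index,
        lambda r: f(points[r]),  # r: 1-D array of row numbers sharing one label
        float, default,
    )

# toy check: sum of the first coordinate within each label
pts = np.arange(10.0).reshape(5, 2)
lbl = np.array([2, 1, 2, 3, 1])
result = comprehension_over_rows(pts, lbl, lambda b: b[:, 0].sum())
# per-label sums for labels 1, 2, 3: 10.0, 4.0, 6.0
```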

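Question: how can I implement a vectorized version of `labeled_comprehension` that accepts a black-box function `f`, which operates on an array of shape `(m, const)` (with `m` varying from label to label) and returns a scalar, without making O(k) passes over the input array?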

Ideally, this would extend to the case where `const` is a tuple of dimensions, similar to the `_over_axes` functions.
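For the tuple-of-`const`-dimensions case, here is a pure-NumPy sketch of the hypothetical `labelled_comprehension_along_axis`. Assumption: `axis` names the axis of `arr` that `labels` runs along, which may not be what `axis=-1` was meant to express above. The idea is to move that axis to the front, sort the labels once, and split the reordered array into one block per label with `np.split`. Each block keeps all trailing dimensions, so `f` sees shape `(m,) + const` for any number of `const` dimensions. The sort costs O(n log n) instead of the O(kn) of repeated masking; because `f` is a black box, it is still invoked once per label from Python.

```python
import numpy as np

def labelled_comprehension_along_axis(arr, labels, f, axis=0, out_dtype=float):
    """Hypothetical sketch: apply f to each labelled slab of `arr`.

    `labels` is 1-D with len(labels) == arr.shape[axis]. For each unique
    label, f receives a sub-array of shape (m,) + const, where const is the
    tuple of all remaining dimensions of `arr`. Returns (unique_labels, out).
    """
    labels = np.asarray(labels)
    arr = np.moveaxis(np.asarray(arr), axis, 0)        # labelled axis first
    order = np.argsort(labels, kind="stable")          # one O(n log n) sort
    uniq, starts = np.unique(labels[order], return_index=True)
    blocks = np.split(arr[order], starts[1:])          # one gather, then views
    out = np.array([f(b) for b in blocks], dtype=out_dtype)
    return uniq, out

# usage with toy 3-D data: labels run along axis 0, const == (2, 3)
arr = np.arange(30.0).reshape(5, 2, 3)
lbl = np.array([2, 1, 2, 3, 1])
uniq, sums = labelled_comprehension_along_axis(arr, lbl, lambda b: b.sum())
# uniq holds 1, 2, 3 and sums holds 210.0, 102.0, 123.0
```

The stable sort keeps rows in their original order within each label, so with the question's data `labelled_comprehension_along_axis(points, labels, diameter)[1]` hands `diameter` exactly the same blocks as `desired_output(points, labels)`.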

Tags: python, performance, numpy, scipy, vectorization
