Removing nearly identical rows from a numpy array

Problem description

Suppose I have the following numpy array:

import numpy as np

arr = np.array([[285, 849],
                [399, 715],
                [399, 716],
                [400, 715],
                [400, 716]])

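How would I remove rows that are nearly identical? I do not mind whether I end up with the row [399, 715], [399, 716], [400, 715], or [400, 716]. As the end result I would like to get, for example: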

out = remove_near_identical(arr)
print(out)

[[285 849]
 [399 715]]

Tags: python, numpy

Solution


A purely distance-based approach:

import numpy as np
from scipy.spatial.distance import cdist

arr = np.array([[285, 849],
                [399, 715],
                [399, 716],
                [400, 715],
                [400, 716]])

# get distances between every set of points
dists = cdist(arr, arr)
dists[np.isclose(dists, 0)] = np.inf # set 0 (self) distances to be large, ie. ignore

# get indices of points less than some threshold value (too close)
i, j = np.where(dists <= 1)
# get the unique indices from either i or j
# and delete all but one of these points from the original array
np.delete(arr, np.unique(i)[1:], axis=0)
>>> array([[285, 849],
           [399, 715]])
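
Note that the snippet above keeps only the first index among all rows that have a close neighbour, so if the array contained several separate groups of close points, only one row would survive across all of the groups. It also ignores exact duplicates, because their zero distance is overwritten with infinity. The following is a minimal sketch of one way to handle these cases, not the answer's own method: a greedy pass that keeps a row only if it is farther than the threshold from every row kept so far. The function name `remove_near_identical` comes from the question; the `threshold` and `metric` parameters, the choice of the Chebyshev metric (so that "within 1" means each coordinate differs by at most 1), and the two extra rows in the sample data are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import cdist


def remove_near_identical(arr, threshold=1.0, metric="chebyshev"):
    """Greedily keep rows farther than `threshold` from every row kept so far."""
    arr = np.asarray(arr)
    if arr.shape[0] == 0:
        return arr
    # pairwise distances between all rows (Chebyshev by default: an assumption)
    dists = cdist(arr, arr, metric=metric)
    keep = []
    for idx in range(arr.shape[0]):
        # keep this row only if no previously kept row lies within the threshold
        if not keep or dists[idx, keep].min() > threshold:
            keep.append(idx)
    return arr[keep]


# the array from the question plus a second, hypothetical group of close rows
arr = np.array([[285, 849],
                [399, 715],
                [399, 716],
                [400, 715],
                [400, 716],
                [100, 100],
                [100, 101]])

print(remove_near_identical(arr))
```

With this ordering the first row of each group is the one that survives. The result depends on row order, and the full distance matrix needs memory quadratic in the number of rows, so for large arrays a spatial index such as `scipy.spatial.cKDTree` may be a better fit.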
