K-means clustering with NumPy

Problem description

I'm new to machine learning and want to build a K-means algorithm with k = 2, but I'm struggling to compute the new centroids. Here is my kmeans code:

import numpy as np

def euclidean_distance(x: np.ndarray, y: np.ndarray):
    # x shape: (N1, D)
    # y shape: (N2, D)
    # output shape: (N1, N2)
    dist = []
    for i in x:
        for j in y:
            new_list = np.sqrt(sum((i - j) ** 2))
            dist.append(new_list)
    distance = np.reshape(dist, (len(x), len(y)))
    return distance

def kmeans(x, centroids, iterations=30):
    assignment = None
    for i in iterations:
        dist = euclidean_distance(x, centroids)
        assignment = np.argmin(dist, axis=1)

    for c in range(len(y)):
        centroids[c] = np.mean(x[assignment == c], 0) #error here
    
        return centroids, assignment
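The double loop in `euclidean_distance` computes one distance per Python-level iteration. A possible improvement, not part of the original answer, is to build the same (N1, N2) matrix with NumPy broadcasting. The name `euclidean_distance_vectorized` is hypothetical. This is a minimal sketch that assumes both inputs are 2-D and have the same number of columns D.

```python
import numpy as np


def euclidean_distance_vectorized(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical helper: pairwise Euclidean distances between rows.

    x has shape (N1, D), y has shape (N2, D); the result has shape (N1, N2).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # x[:, None, :] is (N1, 1, D) and y[None, :, :] is (1, N2, D);
    # broadcasting the subtraction yields all pairwise differences, (N1, N2, D).
    diff = x[:, None, :] - y[None, :, :]
    # Sum squared differences over the feature axis D, then take the square root.
    return np.sqrt(np.sum(diff ** 2, axis=-1))


x = np.array([[1., 0.], [0., 1.], [0.5, 0.5]])
y = np.array([[1., 0.], [0., 1.]])
print(euclidean_distance_vectorized(x, y))
```

If SciPy is available, `scipy.spatial.distance.cdist(x, y)` also computes the Euclidean distance matrix, since that is its default metric.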

My inputs are x = [[1., 0.], [0., 1.], [0.5, 0.5]] and y = [[1., 0.], [0., 1.]], and distance is an array that looks like this:

[[0.         1.41421356]
 [1.41421356 0.        ]
 [0.70710678 0.70710678]]
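This matrix drives the assignment step in `kmeans`. The sketch below reuses the printed values, rounded as displayed. The third point is equidistant from both centroids. `np.argmin` returns the first minimal index, so that point is assigned to cluster 0. The comparison `assignment == c` then produces the boolean mask used to select each cluster's points.

```python
import numpy as np

# The distance matrix shown above: rows are points, columns are centroids.
distance = np.array([
    [0.0, 1.41421356],
    [1.41421356, 0.0],
    [0.70710678, 0.70710678],
])

# Assignment step: each point goes to the column (centroid) with the
# smallest distance. On a tie, np.argmin returns the first such index.
assignment = np.argmin(distance, axis=1)
print(assignment)  # [0 1 0]

# Boolean mask selecting the points assigned to cluster 0.
print(assignment == 0)  # [ True False  True]
```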

When I run kmeans(x, y), it raises this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_40086/2170434798.py
      5 
      6 for c in range(len(y)):
----> 7     centroids[c] = (x[classes == c], 0)
      8     print(centroids)

TypeError: only integer scalar arrays can be converted to a scalar index

Does anyone know how to fix this or improve my code? Thanks in advance!

Tags: python, numpy, k-means

Solution


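The error comes from indexing a plain Python list with a NumPy boolean array (x[assignment == c]). Converting the inputs to NumPy arrays should eliminate it: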

x = np.array([[1., 0.], [0., 1.], [0.5, 0.5]])
y = np.array([[1., 0.], [0., 1.]])
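The following minimal reproduction assumes the assignment [0, 1, 0] from the distance matrix in the question. It shows why the conversion matters: a boolean NumPy mask works as an index into an array but not into a list.

```python
import numpy as np

assignment = np.array([0, 1, 0])
mask = assignment == 0  # boolean array: [True, False, True]

x_list = [[1., 0.], [0., 1.], [0.5, 0.5]]
x_arr = np.array(x_list)

# A plain Python list cannot be indexed with a multi-element NumPy array,
# so this raises a TypeError like the one in the question.
try:
    x_list[mask]
    list_indexing_failed = False
except TypeError as err:
    list_indexing_failed = True
    print("list indexing failed:", err)

# A NumPy array supports boolean-mask indexing and returns the matching rows.
selected = x_arr[mask]
print(selected)
print(np.mean(selected, axis=0))  # centroid of cluster 0: [0.75 0.25]
```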

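It also looks like you need to change for i in iterations to for i in range(iterations) in the kmeans function. iterations is an int, and iterating directly over an int raises TypeError: 'int' object is not iterable.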
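Beyond those two fixes, the posted `kmeans` has a few other issues:

- The centroid update sits outside the `for i` loop, so it runs only once.
- `return` is inside the `for c` loop, so only the first centroid is updated before the function returns.
- `len(y)` relies on a global `y` instead of `centroids`.
- `centroids[c] = ...` modifies the caller's array in place.

Below is a revised sketch that addresses these points. It is not the answerer's code. It assumes float inputs, keeps a fixed iteration count like the original, and makes one choice of its own: an empty cluster keeps its previous centroid.

```python
import numpy as np


def euclidean_distance(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    # Pairwise distances via broadcasting: (N1, 1, D) - (1, N2, D) -> (N1, N2, D).
    return np.sqrt(np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1))


def kmeans(x, centroids, iterations=30):
    x = np.asarray(x, dtype=float)
    # np.array copies, so the caller's initial centroids are left untouched.
    centroids = np.array(centroids, dtype=float)
    assignment = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        dist = euclidean_distance(x, centroids)
        assignment = np.argmin(dist, axis=1)
        # Update step (inside the loop): move each centroid to the mean of its points.
        for c in range(len(centroids)):
            members = x[assignment == c]
            if len(members) > 0:  # assumption: keep the old centroid if a cluster is empty
                centroids[c] = members.mean(axis=0)
    # Return only after all iterations have finished.
    return centroids, assignment


x = np.array([[1., 0.], [0., 1.], [0.5, 0.5]])
y = np.array([[1., 0.], [0., 1.]])
centroids, assignment = kmeans(x, y)
print(centroids)
print(assignment)  # [0 1 0]
```

With the question's inputs, the first iteration assigns [1, 0] and [0.5, 0.5] to cluster 0 and moves that centroid to [0.75, 0.25]. The assignment does not change after that.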

