Python code implementing k-means clustering does not work for k values greater than 4?

Problem description

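I have 115 air quality sensors, and I am trying to use k-means clustering to deploy them across 344 locations based on the population density of each location. However, the following code does not work for values of k greater than 4.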

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

def update_assignments(data, centroids): # updating the assignment of the clusters
    c = []
    for i in data:
        c.append(np.argmin(np.sum((i.reshape((1, 2)) - centroids) ** 2, axis=1)))
    return c

def update_centroids(data, num_clusters, assignments):
    cen = []
    for c in range(len(num_clusters)):
        cen.append(np.mean([data[x] for x in range(len(data)) if assignments[x] == c], axis=0))
    return cen

df1 = pd.read_csv(r"F:\iitd\csv_files\west_bengal_block_population.csv") # Input data.
df1["ID"] = df1.index
df2 = pd.concat([df1["ID"], df1["pop_den"]], axis = 1)
data = df2.to_numpy() 
print(data.shape) # Rows = 344; Columns = 2; Column 1: ID of location; Column 2: Population density of location; 
# No. of locations = 344.

centroids = (np.random.normal(size=(4, 2)) * 0.0001) + np.mean(data, axis=0).reshape((1, 2)) # The value of k is 4. 
# However, I want it to be 115 as I want to know where I should deploy 115 air quality sensors across 344 locations. 
for i in range(100): # No. of iterations = 100.
    a = update_assignments(data, centroids)
    centroids = update_centroids(data, centroids, a)
    centroids = np.array(centroids)

plt.scatter(data[:, 0], data[:, 1])
plt.scatter(centroids[:, 0], centroids[:, 1])
plt.show()

The error I get is shown below:

[Image: screenshot of the error message]

I have also included the beginning of the 2D NumPy array data.

[Image: first rows of data]

This is the end of the data.

[Image: last rows of data]

Tags: python, pandas, numpy, cluster-analysis, k-means

Solution
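
The error screenshot is not reproduced here, so the following is a likely explanation rather than a confirmed diagnosis. In the question's code, all k centroids start within about 0.0001 of the data mean. When k is large, many of these nearly identical centroids can end up with no points assigned to them. For such an empty cluster, np.mean over an empty list returns a scalar NaN (with a "Mean of empty slice" warning), so the list returned by update_centroids mixes arrays of shape (2,) with scalars. Depending on the NumPy version, np.array(centroids) then either raises a ValueError or builds an object array that breaks the next iteration.

A minimal sketch of one possible fix, assuming the same two-column layout (ID, pop_den): initialize the centroids from k distinct data points, and keep the previous centroid whenever a cluster is empty. The kmeans helper, its seed parameter, and the synthetic data below are illustrative assumptions, not part of the original post.

```python
import numpy as np

def update_assignments(data, centroids):
    # Squared Euclidean distance from every point to every centroid,
    # giving an array of shape (n_points, k).
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def update_centroids(data, centroids, assignments):
    new_centroids = centroids.copy()
    for c in range(len(centroids)):
        members = data[assignments == c]
        # An empty cluster has no mean; keep its previous centroid
        # instead of letting np.mean produce NaN.
        if len(members) > 0:
            new_centroids[c] = members.mean(axis=0)
    return new_centroids

def kmeans(data, k, n_iter=100, seed=0):
    # Hypothetical helper wrapping the loop from the question.
    rng = np.random.default_rng(seed)
    # Start from k distinct data points rather than k near-copies of the mean.
    start = rng.choice(len(data), size=k, replace=False)
    centroids = data[start].astype(float)
    for _ in range(n_iter):
        assignments = update_assignments(data, centroids)
        centroids = update_centroids(data, centroids, assignments)
    return centroids, update_assignments(data, centroids)

# Synthetic stand-in for the CSV (assumption): 344 rows of (ID, pop_den).
rng = np.random.default_rng(42)
data = np.column_stack(
    [np.arange(344), rng.uniform(100, 5000, size=344)]
).astype(float)

centroids, labels = kmeans(data, k=115)
print(centroids.shape)  # one centroid per cluster, two columns each
```

To use this with the real data, replace the synthetic array with df2.to_numpy() from the question. Keeping the old centroid is only one way to handle an empty cluster; re-seeding it from a randomly chosen data point is a common alternative.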

