python - How to retrieve the clustering computations inside a class?
Problem description
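I am experimenting with a KM-based algorithm called ODKM, which uses the KMeans clustering algorithm.
I would like an elegant way to retrieve the clustering information: cluster_centers_, labels_, cluster_score, effect and dist.
The ODKM class: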
import math

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


class ODKM:
    def __init__(self, n_clusters=15, effectiveness=500, max_iter=2):
        self.n_clusters = n_clusters
        self.effectiveness = effectiveness
        self.max_iter = max_iter
        self.kmeans = {}         # fitted KMeans model per column
        self.cluster_score = {}  # per-cluster score (pd.Series) per column

    def fit(self, data):
        length = len(data)
        for column in data.columns:
            x = data[column].values.reshape(-1, 1)
            kmeans = KMeans(n_clusters=self.n_clusters, max_iter=self.max_iter)
            self.kmeans[column] = kmeans
            kmeans.fit(x)
            assign = pd.DataFrame(kmeans.predict(x), columns=['cluster'])
            # Fraction of points in each cluster, indexed by cluster label
            cluster_score = assign.groupby('cluster').size() / length
            ratio = cluster_score.copy()
            centers = kmeans.cluster_centers_.ravel()  # shape (n_clusters,)
            max_distance = centers.max() - centers.min()
            for i in range(self.n_clusters):
                for k in range(self.n_clusters):
                    if i != k:
                        # Normalized distance between centers i and k (a plain float)
                        dist = abs(centers[i] - centers[k]) / max_distance
                        effect = ratio[k] * (1 / math.pow(self.effectiveness, dist))
                        cluster_score[i] = cluster_score[i] + effect
            self.cluster_score[column] = cluster_score

    def predict(self, data):
        length = len(data)
        score_array = np.zeros(length)
        for column in data.columns:
            kmeans = self.kmeans[column]
            cluster_score = self.cluster_score[column]
            assign = kmeans.predict(data[column].values.reshape(-1, 1))
            for i in range(length):
                # Accumulate log10 of the row's cluster score over all columns
                score_array[i] = score_array[i] + math.log10(cluster_score[assign[i]])
        return score_array

    def fit_predict(self, data):
        self.fit(data)
        return self.predict(data)
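The double loop in fit gives each cluster i its own share of the points plus a distance-discounted share from every other cluster k: score_i = ratio_i + sum over k != i of ratio_k / effectiveness^(|c_i - c_k| / max_distance), where max_distance is the spread between the largest and smallest center. Because the k = i term would have distance 0 and weight 1, this equals a plain row sum over all k. A minimal NumPy sketch of that computation for one column, assuming 1-D centers and a ratio array indexed by cluster label; odkm_cluster_scores is a hypothetical helper, not part of scikit-learn. It also returns the full dist and effect matrices, which are exactly the intermediate values the loop throws away.

```python
import numpy as np


def odkm_cluster_scores(centers, ratio, effectiveness=500):
    """Hypothetical helper: vectorized form of the double loop in ODKM.fit.

    centers: cluster centers for one column, shape (n_clusters,) or (n_clusters, 1)
    ratio:   fraction of points in each cluster, indexed by cluster label
    Returns (scores, dist, effect); dist and effect are (n_clusters, n_clusters).
    """
    centers = np.asarray(centers, dtype=float).ravel()
    ratio = np.asarray(ratio, dtype=float)
    max_distance = centers.max() - centers.min()
    # dist[i, k] = |c_i - c_k| / max_distance
    dist = np.abs(centers[:, None] - centers[None, :]) / max_distance
    # effect[i, k] = ratio[k] / effectiveness**dist[i, k]; the diagonal
    # (dist == 0) is ratio[i] itself, i.e. the starting cluster_score
    effect = ratio[None, :] / effectiveness ** dist
    return effect.sum(axis=1), dist, effect
```

For centers 0, 1, 2 with ratios 0.5, 0.25, 0.25 and effectiveness=100, the scores come out as 0.5275, 0.325 and 0.28.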
Test results:
import pandas as pd
df = pd.DataFrame(data={'attr1':[1,1,1,1,2,2,2,2,2,2,2,2,3,5,5,6,6,7,7,7,7,7,7,7,15],
'attr2':[1,1,1,1,2,2,2,2,2,2,2,2,3,5,5,6,6,7,7,7,13,13,13,14,15]})
odkm_model = ODKM(n_clusters=3, max_iter=1)
result = odkm_model.fit_predict(df)
df['ODKM_Score']= result
df
# for i in result:
#     print(round(i, 2))
# results:
# -0.51, -0.51, -0.51, -0.51, -0.51, -0.51, -0.51, -0.51, -0.51, -0.51, -0.51, -0.51, -0.51
# -0.78, -0.78, -0.78, -0.78, -0.78, -0.78, -0.78
# -0.99, -0.99, -0.99, -0.99
# -1.99
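So the question is: is there an elegant way to include the clustering information in class ODKM, return it when I run the model, and get the results reflected in the data frame? Just as we do df['ODKM_Score'] = result, I would like to do df['Cluster_labels'] = ..., df['Cluster_centers'] = ... and df['cluster_score'] = ..., so that all the information ends up in the main data frame df and paves the way for visualizing the clustering results.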
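Per-cluster values (centers, scores) can be spread onto rows by indexing the per-cluster array with the label array: centers[labels] has one entry per row. Because ODKM fits a separate KMeans per input column, each column has its own labels, centers and scores, so this sketch prefixes the new column names with the input column name; that naming scheme and the helper add_cluster_columns are hypothetical. It assumes cluster_score has one entry per label 0..n_clusters-1 in label order, which holds for the groupby Series built in fit as long as no cluster is empty.

```python
import numpy as np
import pandas as pd


def add_cluster_columns(df, column, labels, centers, cluster_score):
    """Hypothetical helper: copy per-cluster information onto every row.

    labels:        cluster label of each row, e.g. km.labels_
    centers:       shape (n_clusters, 1) or (n_clusters,), e.g. km.cluster_centers_
    cluster_score: per-cluster scores indexed by label (array or pd.Series)
    """
    labels = np.asarray(labels)
    centers = np.asarray(centers, dtype=float).ravel()
    scores = np.asarray(cluster_score, dtype=float)
    out = df.copy()  # leave the caller's frame unchanged
    # Indexing a per-cluster array with the label array gives one value per row
    out[f'{column}_Cluster_labels'] = labels
    out[f'{column}_Cluster_centers'] = centers[labels]
    out[f'{column}_cluster_score'] = scores[labels]
    return out
```

With the original class, a call could look like add_cluster_columns(df, 'attr1', odkm_model.kmeans['attr1'].labels_, odkm_model.kmeans['attr1'].cluster_centers_, odkm_model.cluster_score['attr1']).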
Normally, without the class, I would do this with km.labels_ and km.cluster_centers_:
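import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

n_clusters = 3
# 'Score' is a numeric column of df
km = KMeans(init='k-means++', n_clusters=n_clusters).fit(df[['Score']])
counts = np.bincount(km.labels_)  # number of points in each cluster
for center, count, label in zip(km.cluster_centers_, counts, range(n_clusters)):
    print(center, count)
    plt.bar(center, count, width=0.2, label=label)
plt.legend()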
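But I would like to know whether I can collect this clustering information after fitting and transforming the model, perhaps with a function named KM_summary defined at the end of the class that returns:
- the predicted cluster labels, km.labels_
- the cluster centers, km.cluster_centers_
- dist, effect and cluster_score from the second function, def fit(self, data):
- score_array from the third function, def predict(self, data):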
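One possible approach, sketched under assumptions: keep every intermediate result as an attribute of the fitted model, keyed by input column, and add a km_summary method that returns a tidy DataFrame. This is a modified, self-contained version of the class above, not the original author's code. The trailing-underscore attribute names (labels_, centers_, dist_, effect_, cluster_score_, score_array_) follow scikit-learn's convention for fitted attributes but are hypothetical here, as is km_summary. The double loop is replaced by the equivalent NumPy computation, and np.bincount(..., minlength=n_clusters) replaces the groupby so that an empty cluster does not cause a KeyError.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


class ODKM:
    """Sketch: ODKM that keeps its intermediate results per column.

    labels_, centers_, dist_, effect_, cluster_score_, score_array_ and
    km_summary are hypothetical additions, not part of scikit-learn.
    """

    def __init__(self, n_clusters=15, effectiveness=500, max_iter=2):
        self.n_clusters = n_clusters
        self.effectiveness = effectiveness
        self.max_iter = max_iter
        self.kmeans = {}
        self.labels_, self.centers_ = {}, {}
        self.dist_, self.effect_, self.cluster_score_ = {}, {}, {}
        self.score_array_ = None

    def fit(self, data):
        for column in data.columns:
            x = data[column].to_numpy(dtype=float).reshape(-1, 1)
            km = KMeans(n_clusters=self.n_clusters, max_iter=self.max_iter).fit(x)
            centers = km.cluster_centers_.ravel()
            # Fraction of points per cluster; minlength keeps empty clusters
            ratio = np.bincount(km.labels_, minlength=self.n_clusters) / len(data)
            max_distance = centers.max() - centers.min()
            dist = np.abs(centers[:, None] - centers[None, :]) / max_distance
            effect = ratio[None, :] / self.effectiveness ** dist
            self.kmeans[column] = km
            self.labels_[column] = km.labels_
            self.centers_[column] = centers
            self.dist_[column] = dist
            self.effect_[column] = effect
            # Row sum = own ratio (diagonal, dist 0) + effects of the other clusters
            self.cluster_score_[column] = effect.sum(axis=1)
        return self

    def predict(self, data):
        score_array = np.zeros(len(data))
        for column in data.columns:
            x = data[column].to_numpy(dtype=float).reshape(-1, 1)
            assign = self.kmeans[column].predict(x)
            score_array += np.log10(self.cluster_score_[column][assign])
        self.score_array_ = score_array
        return score_array

    def fit_predict(self, data):
        return self.fit(data).predict(data)

    def km_summary(self):
        """One row per (input column, cluster): center, size and score."""
        rows = []
        for column, centers in self.centers_.items():
            counts = np.bincount(self.labels_[column], minlength=self.n_clusters)
            scores = self.cluster_score_[column]
            for label, (center, count) in enumerate(zip(centers, counts)):
                rows.append({'column': column, 'cluster': label, 'center': center,
                             'count': count, 'cluster_score': scores[label]})
        return pd.DataFrame(rows)
```

After model = ODKM(n_clusters=3, max_iter=1) and df['ODKM_Score'] = model.fit_predict(df), per-row columns can be filled by indexing the stored arrays, e.g. df['attr1_Cluster_centers'] = model.centers_['attr1'][model.labels_['attr1']], and model.km_summary() gives one row per column and cluster with the center and count needed for plt.bar.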
Any help would be greatly appreciated.