python - 如何循环遍历多个sklearn分类模型?
问题描述
我试图弄清楚如何将我的数据集输入到几个 scikit 分类模型中。
当我运行代码时,出现以下错误:
Traceback (most recent call last):
File "<ipython-input-515-9a3302837c99>", line 3, in <module>
X, y = dataset
ValueError: too many values to unpack (expected 2)
这是我的代码。
X = np.asarray([np.asarray(df['LRMScore']),np.asarray(df['Spread'])]).T
import time
import numpy as np
import matplotlib.pyplot as plt
from sklearn import cluster, datasets
from sklearn.neighbors import kneighbors_graph
from sklearn.preprocessing import StandardScaler
np.random.seed(0)
colors = np.array([x for x in 'bgrcmykbgrcmykbgrcmykbgrcmyk'])
colors = np.hstack([colors] * 20)
clustering_names = [
'MiniBatchKMeans', 'AffinityPropagation', 'MeanShift',
'SpectralClustering', 'Ward', 'AgglomerativeClustering',
'DBSCAN', 'Birch']
plt.figure(figsize=(len(clustering_names) * 2 + 3, 9.5))
plt.subplots_adjust(left=.02, right=.98, bottom=.001, top=.96, wspace=.05,
hspace=.01)
plot_num = 1
datasets = [X]
for i_dataset, dataset in enumerate(datasets):
X, y = dataset
# normalize dataset for easier parameter selection
X = StandardScaler().fit_transform(X)
# estimate bandwidth for mean shift
bandwidth = cluster.estimate_bandwidth(X, quantile=0.3)
# connectivity matrix for structured Ward
connectivity = kneighbors_graph(X, n_neighbors=10, include_self=False)
# make connectivity symmetric
connectivity = 0.5 * (connectivity + connectivity.T)
# create clustering estimators
ms = cluster.MeanShift(bandwidth=bandwidth, bin_seeding=True)
two_means = cluster.MiniBatchKMeans(n_clusters=2)
ward = cluster.AgglomerativeClustering(n_clusters=2, linkage='ward',
connectivity=connectivity)
spectral = cluster.SpectralClustering(n_clusters=2,
eigen_solver='arpack',
affinity="nearest_neighbors")
dbscan = cluster.DBSCAN(eps=.2)
affinity_propagation = cluster.AffinityPropagation(damping=.9,
preference=-200)
average_linkage = cluster.AgglomerativeClustering(
linkage="average", affinity="cityblock", n_clusters=2,
connectivity=connectivity)
birch = cluster.Birch(n_clusters=2)
clustering_algorithms = [
two_means, affinity_propagation, ms, spectral, ward, average_linkage,
dbscan, birch]
for name, algorithm in zip(clustering_names, clustering_algorithms):
# predict cluster memberships
t0 = time.time()
algorithm.fit(X)
t1 = time.time()
if hasattr(algorithm, 'labels_'):
y_pred = algorithm.labels_.astype(np.int)
else:
y_pred = algorithm.predict(X)
# plot
plt.subplot(4, len(clustering_algorithms), plot_num)
if i_dataset == 0:
plt.title(name, size=18)
plt.scatter(X[:, 0], X[:, 1], color=colors[y_pred].tolist(), s=10)
if hasattr(algorithm, 'cluster_centers_'):
centers = algorithm.cluster_centers_
center_colors = colors[:len(centers)]
plt.scatter(centers[:, 0], centers[:, 1], s=100, c=center_colors)
plt.xlim(-2, 2)
plt.ylim(-2, 2)
plt.xticks(())
plt.yticks(())
plt.text(.99, .01, ('%.2fs' % (t1 - t0)).lstrip('0'),
transform=plt.gca().transAxes, size=15,
horizontalalignment='right')
plot_num += 1
plt.show()
我的 X 变量由数据框的两列组成,看起来像这样。
array([[ 8. , 0.06],
[ 8. , 0.06],
[ 8. , 0.06],
...,
[10. , 0.01],
[ 8. , 0.03],
[ 9.75, 0.06]])
这些数据集由两个数组组成:X 和 Y。
noisy_circles = datasets.make_circles(n_samples=n_samples, factor=.5,
noise=.05)
noisy_moons = datasets.make_moons(n_samples=n_samples, noise=.05)
blobs = datasets.make_blobs(n_samples=n_samples, random_state=8)
no_structure = np.random.rand(n_samples, 2), None
我的数据集由一个数组组成。那就是问题所在。我想我的设置必须稍有不同,但我不确定那会是什么样子。
我从下面的链接中获得了代码。
https://scikit-learn.org/0.18/auto_examples/cluster/plot_cluster_comparison.html
解决方案
由于您的X
数组有两列,因此您需要对其进行转置以使用值解包:
x, y = dataset.T
推荐阅读
- javascript - 如何将自定义属性添加到文件表单 javascript / jquery
- laravel - 如何将 FormRequest 用于 API JSON 请求?
- batch-file - .bat 文件夹锁定:如何在 Cmd 中更改密码
- c# - 如何在一组 WinForms 控件中管理失去焦点?
- java - 带有可选 2-4 年组件的 Java 8 日期时间解析
- c - 在c中调用write命令时传递整数代替文件描述符
- java - 寻找不寻常的模板引擎来生成海量数据
- php - PHP Sendgrid - 发件人姓名未在邮件中发送
- jquery - jQuery Marquee 滚动文本在 Bootstrap 模式中不起作用
- javascript - 当我切换到另一个jsp文件时,JSP从另一个会话中获取数据