Question about feeding an array into K-Means.Fit

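Problem description

I am working in Databricks, pulling data from a SQL Server database. The data looks fine; I select a few numeric fields from a larger DataFrame and then put them into an array.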

On this line:

model = kmeans.fit(dataset)

I get this error:

raise ValueError("Params must be either a param map or a list/tuple of param maps, 

Here is my code.

import tensorflow as tf
import numpy as np
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator
from pyspark.ml.feature import VectorAssembler


df = spark.read \
    .jdbc("jdbc:sqlxx//sqlxx.database.windows.net:1433;databaseName=name_of_database", "dbo.name_of_table",
          properties={"user": "user", "password": "pwd"})


dataset = df.select('Rat', 'Cat', 'Coup', 'Mat', 'Pr', 'Sp', 'Co', 'Weight', 'DV')

dataset = dataset.fillna(0)

data_array = np.array(dataset.select('Rat', 'Cat', 'Coup', 'Mat', 'Pr', 'Sp', 'Co', 'Weight', 'DV').collect())


# Loads data.
dataset = data_array

# Trains a k-means model.
kmeans = KMeans().setK(2).setSeed(1)
model = kmeans.fit(dataset)

# Make predictions
predictions = model.transform(dataset)

# Evaluate clustering by computing Silhouette score
evaluator = ClusteringEvaluator()

silhouette = evaluator.evaluate(predictions)
print("Silhouette with squared euclidean distance = " + str(silhouette))

# Shows the result.
centers = model.clusterCenters()
print("Cluster Centers: ")
for center in centers:
    print(center)

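When I run something similar in Spyder/Anaconda (with only minor changes for that environment), it runs fine. There must be something specific that Databricks requires, but I'm not sure exactly what.

The code example comes from this link.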

Tags: python, python-3.x, cluster-analysis, k-means, databricks

Solution
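
One likely cause: `pyspark.ml.clustering.KMeans.fit` expects a Spark DataFrame that contains a vector column (named `features` by default), while the code in the question collects the rows into a NumPy array and passes that array to `fit`. If the Spyder version used a library that accepts NumPy arrays directly (an assumption; the question does not say which library was used there), that would explain why it ran in that environment. The following is a minimal sketch of one possible fix, assuming the selected columns are all numeric: keep the data in Spark and pack the columns into a single vector column with `VectorAssembler`. The helper name `fit_kmeans` and the constant `FEATURE_COLS` are hypothetical, introduced only for illustration.

```python
from pyspark.ml.clustering import KMeans
from pyspark.ml.evaluation import ClusteringEvaluator
from pyspark.ml.feature import VectorAssembler

# Column names taken from the question.
FEATURE_COLS = ['Rat', 'Cat', 'Coup', 'Mat', 'Pr', 'Sp', 'Co', 'Weight', 'DV']


def fit_kmeans(df, feature_cols=FEATURE_COLS, k=2, seed=1):
    """Hypothetical helper: cluster the numeric columns of a Spark DataFrame."""
    # Keep the data in Spark: no collect() and no NumPy conversion.
    dataset = df.select(*feature_cols).fillna(0)

    # KMeans reads a single vector column (named "features" by default),
    # so pack the numeric columns into one.
    assembler = VectorAssembler(inputCols=list(feature_cols), outputCol="features")
    assembled = assembler.transform(dataset)

    # Train the k-means model on the assembled DataFrame.
    kmeans = KMeans().setK(k).setSeed(seed)
    model = kmeans.fit(assembled)

    # Assign each row to a cluster and compute the Silhouette score.
    predictions = model.transform(assembled)
    silhouette = ClusteringEvaluator().evaluate(predictions)
    return model, predictions, silhouette
```

With the `df` returned by `spark.read.jdbc(...)` in the question, a call would look like `model, predictions, silhouette = fit_kmeans(df)`, after which `model.clusterCenters()` can be printed as before. Because nothing is collected to the driver, this approach should also cope better with tables too large to fit in driver memory.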

