kmeans pyspark org.apache.spark.SparkException: Job aborted due to stage failure

Problem description

I want to run k-means on my dataset (6.7 million rows and 22 variables),

base.dtypes

[('anonimisation1', 'double'),
 ('anonimisation2', 'double'),
 ('anonimisation3', 'double'),
 ('anonimisation4', 'double'),
 ('anonimisation5', 'double'),
 ('anonimisation6', 'double'),
 ('anonimisation7', 'double'),
 ('anonimisation8', 'double'),
 ('anonimisation9', 'double'),
 ('anonimisation10', 'double'),
 ('anonimisation11', 'double'),
 ('anonimisation12', 'double'),
 ('anonimisation13', 'double'),
 ('anonimisation14', 'double'),
 ('anonimisation15', 'double'),
 ('anonimisation16', 'double'),
 ('anonimisation17', 'double'),
 ('anonimisation18', 'double'),
 ('anonimisation19', 'double'),
 ('anonimisation20', 'double'),
 ('anonimisation21', 'double'),
 ('anonimisation22', 'double')]

I read that I should use this code:

from pyspark.mllib.linalg import Vectors

def transData(base):
    return base.rdd.map(lambda r: [Vectors.dense(r[:-1])]).toDF(['features'])
transformed= transData(base)
transformed.show(5, False)

Then I wrote this:

from pyspark.ml.clustering import KMeans

kmeans = KMeans().setK(2).setSeed(1)
model = kmeans.fit(transformed)

I got this error:

IllegalArgumentException: 'requirement failed: Column features must be of type equal to one of the following types: [struct<type:tinyint,size:int,indices:array<int>,values:array<double>>, array<double>, array<float>] but was actually of type struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.'

I don't know what to do. If you need more information, please ask. Thanks!

I also tried plain Python with Pandas, but I ran into problems there too.

Tags: apache-spark, pyspark, k-means

Solution


Use from pyspark.ml.linalg import Vectors instead of from pyspark.mllib.linalg import Vectors. The DataFrame-based KMeans in pyspark.ml only accepts pyspark.ml.linalg vectors; the old pyspark.mllib Vector type serializes to an identical-looking struct, which is why the error message appears to reject the very type it lists as allowed.

