How do I convert a Parquet file to libsvm format?

Problem description

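I have a Parquet file with the columns (id, features), and I want to convert it to libsvm format so that I can apply a random forest algorithm. The Parquet file has no label field to train the algorithm on.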

val data = spark.read.format("parquet").load("file:///usr/local/spark/dataset/data/user")

val splits = data.randomSplit(Array(0.7, 0.3))

val (trainingData, testData) = (splits(0), splits(1))

val trainingDf = trainingData.toDF()
val pca = new PCA()
  .setInputCol("features")
  .setOutputCol("pcaFeatures")
  .setK(2)
  .fit(assembled_df)

val pcaTrainingData = pca.transform(assembled_df)
val labeled = pca.transform(assembled_df).rdd.map(row => LabeledPoint(
  row.getAs[Double]("label"),
  row.getAs[org.apache.spark.mllib.linalg.Vector]("pcaFeatures")
))
val numClasses = 10
val categoricalFeaturesInfo = Map[Int, Int]()
val numTrees = 10 // Use more in practice.
val featureSubsetStrategy = "auto" // Let the algorithm choose.
val impurity = "gini"
val maxDepth = 20
val maxBins = 32

val model = RandomForest.trainClassifier(labeled, numClasses, categoricalFeaturesInfo,
  numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins)

But I get an exception:

java.lang.IllegalArgumentException: Field "label" does not exist.

Any help would be appreciated.

Tags: scala, apache-spark, random-forest

Solution
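
The exception is consistent with the question's own description: the Parquet file contains only id and features, so row.getAs[Double]("label") has no column to read. A random forest classifier is supervised, so a label column has to be supplied from some other source before training or exporting to libsvm. The libsvm text format itself is simple: each line is a label followed by index:value pairs with 1-based, ascending indices, and zero entries are usually omitted. The sketch below illustrates only that line format in plain Scala. LibSvmSketch and toLibSvmLine are hypothetical names, not Spark APIs, and the labels in it are placeholders.

```scala
// Hypothetical helper (not part of Spark): renders examples as libsvm text lines.
object LibSvmSketch {

  /** Formats one example as "label idx:value ...".
    * Indices are 1-based and ascending; zero-valued features are skipped.
    */
  def toLibSvmLine(label: Double, features: Seq[Double]): String = {
    val entries = features.zipWithIndex.collect {
      case (value, i) if value != 0.0 => s"${i + 1}:$value"
    }
    (label.toString +: entries).mkString(" ")
  }

  def main(args: Array[String]): Unit = {
    // Placeholder rows of (label, features). Real labels must come from
    // your own data; the Parquet file in the question does not have any.
    val rows = Seq(
      (1.0, Seq(0.0, 2.5, 0.0, 3.0)),
      (0.0, Seq(1.5, 0.0, 0.0, 0.0))
    )
    rows.foreach { case (label, features) =>
      println(toLibSvmLine(label, features))
    }
  }
}
```

In Spark itself, once the DataFrame has a Double label column and a Vector features column, the built-in libsvm data source may be able to write it directly (for example via df.write.format("libsvm")), which would make a hand-written formatter unnecessary.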

