首页 > 解决方案 > 未设置线性回归特征

问题描述

我正在尝试编写一些线性回归来分析我的数据。所以我使用的是scala,我基本上是这样做的,如下所示

import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.{Pipeline, PipelineModel}

val training_data_finalised = training.drop("COUNTRY_REGION", "PROVINCE_STATE", "DATE")
val featuresArray = Array("Active","Confirmed","Deaths","Recovered","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC","AVG_PARKS_CHANGE_PERC","AVG_RESIDENTIAL_CHANGE_PERC","AVG_RETAIL_AND_RECREATION_CHANGE_PERC","AVG_TRANSIT_STATIONS_CHANGE_PERC","AVG_WORKPLACES_CHANGE_PERC","Active_1_day","Active_2_day","Active_7_day","Active_14_day","Confirmed_1_day","Confirmed_2_day","Confirmed_7_day","Confirmed_14_day","Deaths_1_day","Deaths_2_day","Deaths_7_day","Deaths_14_day","Recovered_1_day","Recovered_2_day","Recovered_7_day","Recovered_14_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_1_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_2_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_7_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_14_day","AVG_PARKS_CHANGE_PERC_1_day","AVG_PARKS_CHANGE_PERC_2_day","AVG_PARKS_CHANGE_PERC_7_day","AVG_PARKS_CHANGE_PERC_14_day","AVG_RESIDENTIAL_CHANGE_PERC_1_day","AVG_RESIDENTIAL_CHANGE_PERC_2_day","AVG_RESIDENTIAL_CHANGE_PERC_7_day","AVG_RESIDENTIAL_CHANGE_PERC_14_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_1_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_2_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_7_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_14_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_1_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_2_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_7_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_14_day","AVG_WORKPLACES_CHANGE_PERC_1_day","AVG_WORKPLACES_CHANGE_PERC_2_day","AVG_WORKPLACES_CHANGE_PERC_7_day","AVG_WORKPLACES_CHANGE_PERC_14_day")

val assembler = new VectorAssembler()
  .setInputCols(featuresArray)
  .setOutputCol("features")

val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFeaturesCol("features")   // setting features column
  .setLabelCol("Deaths")       // setting label column

val pipeline = new Pipeline().setStages(Array(assembler,lr))

//fitting the model
val lrModel = pipeline.fit(training_data_finalised.na.fill(0))

但是我如何获得系数值?

有什么建议吗?

为了补充,我尝试按照火花文档(https://spark.apache.org/docs/latest/ml-classification-regression.html)这样做

val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)

// Fit the model
val lrModel = lr.fit(training)

// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

但由于某种原因,这给了我一个

IllegalArgumentException: features does not exist. Available: Active, Confirmed, Deaths

标签: scalaregression

解决方案


推荐阅读