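scala - Linear regression features not set
Problem description
I'm trying to write a linear regression to analyze my data. I'm using Scala with Spark ML, and this is essentially what I do: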
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.ml.regression.LinearRegressionModel
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.{Pipeline, PipelineModel}
val training_data_finalised = training.drop("COUNTRY_REGION", "PROVINCE_STATE", "DATE")
val featuresArray = Array("Active","Confirmed","Deaths","Recovered","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC","AVG_PARKS_CHANGE_PERC","AVG_RESIDENTIAL_CHANGE_PERC","AVG_RETAIL_AND_RECREATION_CHANGE_PERC","AVG_TRANSIT_STATIONS_CHANGE_PERC","AVG_WORKPLACES_CHANGE_PERC","Active_1_day","Active_2_day","Active_7_day","Active_14_day","Confirmed_1_day","Confirmed_2_day","Confirmed_7_day","Confirmed_14_day","Deaths_1_day","Deaths_2_day","Deaths_7_day","Deaths_14_day","Recovered_1_day","Recovered_2_day","Recovered_7_day","Recovered_14_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_1_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_2_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_7_day","AVG_GROCERY_AND_PHARMACY_CHANGE_PERC_14_day","AVG_PARKS_CHANGE_PERC_1_day","AVG_PARKS_CHANGE_PERC_2_day","AVG_PARKS_CHANGE_PERC_7_day","AVG_PARKS_CHANGE_PERC_14_day","AVG_RESIDENTIAL_CHANGE_PERC_1_day","AVG_RESIDENTIAL_CHANGE_PERC_2_day","AVG_RESIDENTIAL_CHANGE_PERC_7_day","AVG_RESIDENTIAL_CHANGE_PERC_14_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_1_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_2_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_7_day","AVG_RETAIL_AND_RECREATION_CHANGE_PERC_14_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_1_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_2_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_7_day","AVG_TRANSIT_STATIONS_CHANGE_PERC_14_day","AVG_WORKPLACES_CHANGE_PERC_1_day","AVG_WORKPLACES_CHANGE_PERC_2_day","AVG_WORKPLACES_CHANGE_PERC_7_day","AVG_WORKPLACES_CHANGE_PERC_14_day")
val assembler = new VectorAssembler()
  .setInputCols(featuresArray)
  .setOutputCol("features")
val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
  .setFeaturesCol("features") // setting features column
  .setLabelCol("Deaths") // setting label column
val pipeline = new Pipeline().setStages(Array(assembler, lr))
// fitting the model
val lrModel = pipeline.fit(training_data_finalised.na.fill(0))
But how do I get the coefficient values?
Any suggestions?
To add, I also tried following the Spark documentation (https://spark.apache.org/docs/latest/ml-classification-regression.html), like this:
val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
// Fit the model
val lrModel = lr.fit(training)
// Print the coefficients and intercept for linear regression
println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")
But for some reason this gives me:
IllegalArgumentException: features does not exist. Available: Active, Confirmed, Deaths
Solution
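One possible approach, offered as a sketch rather than a definitive answer: `pipeline.fit(...)` returns a `PipelineModel`, which has no `coefficients` member of its own, but its `stages` array holds the fitted stages, and the last one here should be the `LinearRegressionModel`. The toy data, the object name `CoefficientsSketch`, the names `featureCols` and `pipelineModel`, and the `local[*]` master below are assumptions made for illustration and are not from the question.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.{LinearRegression, LinearRegressionModel}
import org.apache.spark.sql.SparkSession

object CoefficientsSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption, not from the question).
    val spark = SparkSession.builder()
      .appName("coefficients-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy rows standing in for the question's `training` DataFrame.
    val training = Seq(
      (10.0, 12.0, 1.0),
      (20.0, 25.0, 2.0),
      (30.0, 37.0, 4.0),
      (40.0, 52.0, 5.0)
    ).toDF("Active", "Confirmed", "Deaths")

    // Feature columns only; the label column is deliberately left out.
    val featureCols = Array("Active", "Confirmed")

    val assembler = new VectorAssembler()
      .setInputCols(featureCols)
      .setOutputCol("features")

    val lr = new LinearRegression()
      .setMaxIter(10)
      .setRegParam(0.3)
      .setElasticNetParam(0.8)
      .setFeaturesCol("features")
      .setLabelCol("Deaths")

    // Fitting a Pipeline yields a PipelineModel, not a LinearRegressionModel.
    val pipelineModel = new Pipeline()
      .setStages(Array(assembler, lr))
      .fit(training.na.fill(0))

    // The fitted regression is the last stage; cast it to reach its members.
    val fittedLr = pipelineModel.stages.last.asInstanceOf[LinearRegressionModel]
    println(s"Coefficients: ${fittedLr.coefficients} Intercept: ${fittedLr.intercept}")

    // Pair each coefficient with the name of the input column it belongs to.
    featureCols.zip(fittedLr.coefficients.toArray).foreach {
      case (name, coef) => println(s"$name: $coef")
    }

    spark.stop()
  }
}
```

Applied to the question's code, this would amount to `lrModel.stages.last.asInstanceOf[LinearRegressionModel]`. The `IllegalArgumentException` in the second attempt is consistent with calling `lr.fit(training)` on a DataFrame that was never passed through the `VectorAssembler`: `LinearRegression` looks for a column named `features` by default, and the raw data only has the individual columns. Note also that the sketch leaves `Deaths` out of the feature list because it is the label; the question's `featuresArray` includes it.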