python-3.x - AttributeError: 'LogisticRegressionTrainingSummary' object has no attribute 'areaUnderROC'
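Problem description
I want to compute the area under the ROC curve for my machine learning model, but an AttributeError is raised. My full code and the error details are below. The string indexer, one-hot encoder and vector assembler are already in place. Please refer to the full code below: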
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
import pyspark.sql.types as T
spark = SparkSession.builder.getOrCreate()
df=spark.read.csv("2018-2010_import.csv",inferSchema=True,header=True)
train, test = df.randomSplit([0.7, 0.3], seed=7)
print(f"Train set length: {train.count()} records")
print(f"Test set length: {test.count()} records")
train.dtypes
catCols = [x for (x, dataType) in train.dtypes if dataType == "string"]
numCols = [
    x for (x, dataType) in train.dtypes if dataType == "double" and x != "HSCode"
]
print(numCols)
print(catCols)
train.agg(F.countDistinct("Commodity","Country")).show()
train.groupBy("Commodity","Country").count().show()
from pyspark.ml.feature import (
    OneHotEncoder,
    StringIndexer,
)
string_indexer = [
    StringIndexer(inputCol=x, outputCol=x + "_StringIndexer", handleInvalid="skip")
    for x in catCols
]
one_hot_encoder = [
    OneHotEncoder(
        inputCols=[f"{x}_StringIndexer" for x in catCols],
        outputCols=[f"{x}_OneHotEncoder" for x in catCols],
    )
]
from pyspark.ml.feature import VectorAssembler
assemblerInput = [x for x in numCols]
assemblerInput += [f"{x}_OneHotEncoder" for x in catCols]
vector_assembler = VectorAssembler(
    inputCols=assemblerInput, outputCol="VectorAssembler_features", handleInvalid="skip"
)
stages = []
stages += string_indexer
stages += one_hot_encoder
stages += [vector_assembler]
from pyspark.ml import Pipeline
pipeline = Pipeline().setStages(stages)
model = pipeline.fit(train)
pp_df = model.transform(test)
pp_df.select(
    "HSCode", "Commodity", "value", "Country", "VectorAssembler_features",
).show(truncate=False)
from pyspark.ml.classification import LogisticRegression
data = pp_df.select(
    F.col("VectorAssembler_features").alias("features"),
    F.col("HSCode").alias("label"),
)
model = LogisticRegression().fit(data)
model_summary = model.summary
model_summary.areaUnderROC
AttributeError                            Traceback (most recent call last)
C:\Users\AZMANM~1\AppData\Local\Temp/ipykernel_4856/3039136250.py in <module>
----> 1 model_summary.areaUnderROC
AttributeError: 'LogisticRegressionTrainingSummary' object has no attribute 'areaUnderROC'
model.summary.pr.show()
AttributeError                            Traceback (most recent call last)
C:\Users\AZMANM~1\AppData\Local\Temp/ipykernel_4856/3388404637.py in <module>
----> 1 model.summary.pr.show()
AttributeError: 'LogisticRegressionTrainingSummary' object has no attribute 'pr'
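The class name in both tracebacks is a useful hint. In recent PySpark 3.x releases, `model.summary` is a `BinaryLogisticRegressionTrainingSummary` only when the model was fitted on at most two classes; otherwise it is the multiclass `LogisticRegressionTrainingSummary`, which has no `areaUnderROC`, `roc` or `pr`. The sketch below shows a duck-typed guard for that case. The helper name and the two stand-in objects (with made-up values) are hypothetical and are not part of PySpark.

```python
from types import SimpleNamespace


def area_under_roc_or_none(summary):
    """Return summary.areaUnderROC if the summary exposes it, else None.

    Duck-typed on purpose: a multiclass training summary has no
    areaUnderROC attribute, so getattr falls back to None instead of raising.
    """
    return getattr(summary, "areaUnderROC", None)


# Stand-ins for illustration only (hypothetical objects with made-up values,
# not real PySpark summaries).
binary_like = SimpleNamespace(areaUnderROC=0.91, accuracy=0.88)
multiclass_like = SimpleNamespace(accuracy=0.42)

print(area_under_roc_or_none(binary_like))      # 0.91
print(area_under_roc_or_none(multiclass_like))  # None
```

With a fitted model this would be called as `area_under_roc_or_none(model.summary)`. `model.numClasses` reports how many distinct labels the model saw; for a label column such as HSCode that is probably more than two, which would explain the error.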
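Solution
You will need to use BinaryClassificationEvaluator. After the train/test split, I named the training set train_set and the test set test_set. Here input_columns is the list of all columns except the label column.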
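from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import VectorAssembler
assembler = VectorAssembler(inputCols=input_columns, outputCol='features')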
Then call the vector assembler to transform your DataFrame:
final_data = assembler.transform(your_dataframe)
print("Train test Split...")
train_set, test_set = final_data.randomSplit([0.7, 0.3], seed=4000)
lr = LogisticRegression(labelCol="label", featuresCol="features",
                        maxIter=10, threshold=0.5)
lr_model=lr.fit(train_set)
predict_train=lr_model.transform(train_set)
predict_test=lr_model.transform(test_set)
evaluator = BinaryClassificationEvaluator()
print("Test Area Under ROC: " + str(evaluator.evaluate(predict_test, {evaluator.metricName: "areaUnderROC"})))
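To make the reported number easier to interpret, here is a minimal pure-Python sketch of what area under ROC measures: the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counted as one half. The function name and the toy data are assumptions for illustration. Spark derives its value from a (possibly down-sampled) ROC curve, so its result may differ slightly from this pairwise count. Labels are assumed to be 0/1, which is also what BinaryClassificationEvaluator expects; by default it reads the `label` and `rawPrediction` columns.

```python
def area_under_roc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): three of the four positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(area_under_roc(labels, scores))  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation. The definition only makes sense for two classes, so a multiclass label has to be reduced to a binary one before this metric applies.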