Making predictions with Weka in a Jupyter Notebook

Question

I want to make predictions with python-weka-wrapper in a Jupyter Notebook, but I only get the errors and not the predictions themselves. Here is a sample of my data:

@relation data_new3

@attribute pos_x numeric
@attribute pos_y numeric
@attribute angle numeric
@attribute vel numeric
@attribute vel_x numeric
@attribute vel_y numeric

@data
414049364,21773560,75.06043,16.775027,15.827037,-5.559355
414049656,21773926,43.232657,4.452031,3.258594,-3.033504
414049938,21774287,43.836066,4.347145,4.300749,-0.633426
414050204,21774638,44.704315,4.157368,3.119995,2.747606

Here is my current code:

import weka.core.jvm as jvm
import weka.core.converters as conv
from weka.classifiers import Evaluation, Classifier
from weka.core.classes import Random
import weka.plot.classifiers as plcls
df = conv.load_any_file("data_new3.arff")
df.class_is_last()

cls = Classifier(classname="weka.classifiers.functions.LinearRegression", options=["-C","-S","1"])
evl = Evaluation(df)
evl.crossvalidate_model(cls, df, 10, Random(1))
plcls.plot_classifier_errors(evl.predictions, absolute=False, wait=True)

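I want to get n predictions (a forecast). The result of the code above is shown below. However, I want the data without the errors.

[Image: output of the code above]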

What am I doing wrong? I could not find the right function in https://fracpete.github.io/python-weka-wrapper3/weka.html

Update:

Contents of evl.predictions:

[NUM: 2.341519980578598 -0.46169578717854165 1.0,
 NUM: 0.0324593108656498 -1.250148004408402 1.0,
 NUM: 1.0894596695042125 -0.19638227888390247 1.0,
 NUM: 2.2415137801101532 0.21523000008892268 1.0,
 NUM: 0.8403094260947848 1.4159336571192398 1.0,
 NUM: -0.6246802557143522 0.02694804349266633 1.0,
 NUM: -1.1291083174467442 -0.9771784825188661 1.0,
 NUM: 1.9865123314290711 0.0020180962164886296 1.0,
 NUM: -1.4171054579203468 -0.1185322656565404 1.0,
 NUM: 1.701392413210111 -0.31173498990392545 1.0,
 NUM: 0.9142204697441169 -0.46289838829034125 1.0,
 NUM: 2.2371544887471027 0.26863408847202663 1.0,
 NUM: 1.0104945457853498 0.08266907560209802 1.0,
 NUM: -2.1844092184524277 0.2537549599419435 1.0,
 NUM: -1.64486932964462 -0.0757717380165559 1.0,
 NUM: -0.8833185855520697 0.272053514894651 1.0,
 NUM: 2.023258402624002 0.20075652151535905 1.0,
 NUM: -0.09766261800428815 -0.13010619249325828 1.0,
 NUM: 0.008614470166021827 -2.0206928075313044 1.0,
 NUM: -0.3746145438554381 0.2627094869476423 1.0,
 NUM: 0.321292162562831 -0.39300510611246864 1.0,
 NUM: -0.8603272578575111 -0.24401632088301994 1.0,
 NUM: 1.2917808082313293 0.27398191955035145 1.0,
 NUM: 2.5069928462982736 -0.28666784299093706 1.0,
 NUM: 0.5342954034915244 0.023908866474812385 1.0,
 NUM: -0.8199944215138957 -0.35662294870599 1.0,
 NUM: 1.5190967129296846 -0.494692957136067 1.0,
 NUM: 2.1750768892884005 0.17687020938865317 1.0,
 NUM: -1.3129196874730458 -0.4337196896722162 1.0,
 NUM: 0.9085960948511023 -1.0273595147173182 1.0,
 NUM: -1.790840080776049 -0.7976173866791214 1.0,
 NUM: 0.6226362259069361 0.8034413426921674 1.0,
 NUM: -1.6641718286913476 0.1441503225123597 1.0,
 NUM: 0.9671958169480396 0.4460301975123002 1.0,
 NUM: -0.0762479157090975 -0.014214109052772983 1.0,
 NUM: 0.1274295194870555 -0.7136733953120711 1.0,

Tags: python, jupyter-notebook, wrapper, weka, forecast

Answer


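Cross-validation is for gathering statistics, not for building a model that you can use to make predictions. In your case, 10-fold CV generates (and discards) 10 models during evaluation and collects the predictions from each train/test pair (which you can access through the predictions attribute).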
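
To make that distinction concrete, here is a minimal plain-Python sketch. It does not use Weka and is not how Weka implements cross-validation internally; `fit_line`, `predict`, `cross_validate` and the toy data are all hypothetical names made up for illustration. It only shows that the per-fold models are temporary, while a model you train yourself can be kept and reused on new inputs.

```python
import random

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept on (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def predict(model, x):
    """Apply a fitted (slope, intercept) model to one input."""
    slope, intercept = model
    return slope * x + intercept

def cross_validate(points, folds, seed=1):
    """Return (actual, predicted) pairs; each per-fold model is discarded."""
    shuffled = points[:]
    random.Random(seed).shuffle(shuffled)
    results = []
    for k in range(folds):
        test = shuffled[k::folds]
        train = [p for i, p in enumerate(shuffled) if i % folds != k]
        model = fit_line(train)  # temporary model, gone after this fold
        results.extend((y, predict(model, x)) for x, y in test)
    return results

# Toy data (assumption for illustration): y = 2x + 1 exactly
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]

# Cross-validation: statistics only, no reusable model comes out of it
cv_pairs = cross_validate(data, folds=10)

# Training: the model is kept and can score unseen inputs
final_model = fit_line(data)
new_prediction = predict(final_model, 25.0)
```

After `cross_validate` returns, only the (actual, predicted) pairs remain, which is roughly the role `evl.predictions` plays in the question.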

Instead of evaluating your classifier and plotting the errors:

evl = Evaluation(df)
evl.crossvalidate_model(cls, df, 10, Random(1))
plcls.plot_classifier_errors(evl.predictions, absolute=False, wait=True)

train the classifier and use it. Something like this (for a numeric class):

from weka.core.dataset import Instances
train, test = Instances.train_test_split(df, 80.0)  # 80/20 train/test split
cls.build_classifier(train)
for i in range(test.num_instances):
    print(cls.classify_instance(test.get_instance(i)))
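
If the cross-validated predictions are still of interest, just without the error values, the list shown in the question could probably be split into separate columns. The sketch below assumes that each prediction object exposes `actual` and `predicted` attributes and that the printed `NUM:` lines follow the order actual, predicted, weight; check both against the python-weka-wrapper3 documentation. `FakePrediction` is a hypothetical stand-in so that the example runs without a JVM, and `split_predictions` is an illustrative helper, not part of the library.

```python
from collections import namedtuple

def split_predictions(predictions):
    """Separate actual and predicted values from a list of prediction objects."""
    actual = [p.actual for p in predictions]
    predicted = [p.predicted for p in predictions]
    return actual, predicted

# Hypothetical stand-in for the wrapper's prediction objects (assumption)
FakePrediction = namedtuple("FakePrediction", ["actual", "predicted", "weight"])

# First two entries from the evl.predictions output in the question,
# read as: actual, predicted, weight (assumed ordering)
sample = [
    FakePrediction(2.341519980578598, -0.46169578717854165, 1.0),
    FakePrediction(0.0324593108656498, -1.250148004408402, 1.0),
]

actual, predicted = split_predictions(sample)
print(predicted)
```

With the real library, the same helper would be called as `split_predictions(evl.predictions)`, provided the attribute names above hold.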

Some pointers:

