The correct way to load a model and apply predictions

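Problem description

model4 is a Naive Bayes model trained on a split made with test_size=0.22. The code should load the saved model, apply predictions from the trained model, and then save them as a column alongside the rest of the data.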

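import joblib
import pandas as pd

# Save the trained model (model4 was fitted earlier)
joblib.dump(model4, 'training1.joblib')
# ------------------------loading model-------------------------------
data = pd.read_csv('Documents_data.csv')
model = joblib.load('training1.joblib')
# One-hot encode the features; the dropped columns are not used as inputs
X_all = pd.get_dummies(data.drop(['score', 'size', 'created', 'user'], axis=1))
y_all = data['score']
# Predict for every row, so pred has the same length as data
pred = model.predict(X_all)
data['prediction'] = pred
data.to_csv('predictions.csv', index=False)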

Normally I predict with X_test, for example:

pred = model4.predict(X_test)

I tried splitting the data again:

from sklearn.model_selection import train_test_split

# ------------------------loading model-------------------------------
data = pd.read_csv('Documents_data.csv')
model = joblib.load('training1.joblib')
X_all = pd.get_dummies(data.drop(['score', 'size', 'created', 'user'], axis=1))
y_all = data['score']
X_train, X_test, y_train, y_test = train_test_split(X_all, y_all, test_size=0.22, random_state=0)
pred = model.predict(X_test)
# This assignment raises the error below: pred covers only the test rows, data has all rows
data['prediction'] = pred
data.to_csv('predictions.csv', index=False)

Error:

Traceback (most recent call last):
  File "C:\Users\coldtea\PycharmProjects\pythonProject\machine_learning.py", line 169, in <module>
    data['prediction'] = pred
  File "C:\Users\coldtea\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\frame.py", line 3163, in __setitem__
    self._set_item(key, value)
  File "C:\Users\coldtea\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\frame.py", line 3242, in _set_item
    value = self._sanitize_column(key, value)
  File "C:\Users\coldtea\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\frame.py", line 3899, in _sanitize_column
    value = sanitize_index(value, self.index)
  File "C:\Users\coldtea\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\internals\construction.py", line 751, in sanitize_index
    raise ValueError(
ValueError: Length of values (42) does not match length of index (190)
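The two numbers in the message are consistent with the split: 190 is the number of rows in data, and 42 is the size of the test set that test_size=0.22 produces from 190 rows. A small sketch of that arithmetic, assuming scikit-learn's usual rule of rounding a fractional test size up (the variable names are illustrative only):

```python
import math

n_rows = 190       # length of the DataFrame index, taken from the error message
test_size = 0.22   # fraction passed to train_test_split in the question

# A fractional test_size is rounded up to a whole number of rows;
# the remaining rows form the training set.
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test

print(n_test, n_train)
```

So pred holds one value per test row, while data['prediction'] expects one value per row of data.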

Could someone at least provide an example for a similar requirement?

Tags: python, python-3.x, machine-learning, scikit-learn

Solution
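One possible approach, offered as a minimal sketch rather than a definitive answer: either predict for every row so the result matches the length of data, or predict only for the test rows and attach the values through the test set's index. The example below does not use Documents_data.csv; it builds a hypothetical 190-row frame with made-up columns feature_a and feature_b, and it assumes a GaussianNB model, since the question only says "Naive Bayes".

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Hypothetical stand-in for Documents_data.csv: 190 rows, two numeric features.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "feature_a": rng.normal(size=190),
    "feature_b": rng.normal(size=190),
    "score": rng.integers(0, 2, size=190),
})

X_all = data.drop(columns=["score"])
y_all = data["score"]
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.22, random_state=0
)

# Train, save, and reload the model (GaussianNB is an assumption).
model4 = GaussianNB().fit(X_train, y_train)
model_path = os.path.join(tempfile.mkdtemp(), "training1.joblib")
joblib.dump(model4, model_path)
model = joblib.load(model_path)

# Option 1: predict for every row, so the result has the same length as data.
data["prediction"] = model.predict(X_all)

# Option 2: predict only for the test rows and align on their index labels;
# rows outside the test set are left as NaN.
test_pred = pd.Series(model.predict(X_test), index=X_test.index)
data["test_prediction"] = test_pred

n_all = int(data["prediction"].notna().sum())
n_test = int(data["test_prediction"].notna().sum())
print(n_all, n_test)

data.to_csv(os.path.join(os.path.dirname(model_path), "predictions.csv"), index=False)
```

Option 1 also scores rows the model saw during training, so those predictions are not a fair measure of accuracy. Option 2 keeps the evaluation honest but leaves the training rows empty in the saved file.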

