python - Why is my RMSE not affected by scaling the data (0 - 1)?
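Question
How do I scale the data correctly so that the scaling carries through to my error metrics (RMSE and MAE)? Alternatively, is there a way to normalize an RMSE that has already been computed? I have not inverse-transformed anything back to the original units.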
Import the required libraries:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
Split the data into x/y:
x = data.iloc[:, 1:]
y = data.iloc[:, :1]
Set up the train/test split:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2,
random_state = 0)
Scale the data (training and test sets):
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
data_scaled = scaler.fit_transform(data)
Scaling, continued:
print(data_scaled.mean(axis=0))
print(data_scaled.std(axis=0))
Fit the StandardScaler:
sc_x = StandardScaler()
sc_y = StandardScaler()
x_train = sc_x.fit_transform(x_train)
y_train = sc_y.fit_transform(y_train)
Fit the model (support vector regression):
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(x_train, y_train)
Import the error metrics:
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
from math import sqrt
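Compute the first error (RMSE). This is where everything blows up: I expected an RMSE value within the range of the standardized data, but I get a value in the original units (rmse = 42596.17):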
mse=sqrt(mean_squared_error(y_test,y_pred))
print(mse)
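I also did not inverse-transform back to the original units. Since my RMSE value is not affected by the scaling, I decided to normalize the RMSE with the following code: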
d=pd.DataFrame()
d['y']=y_test
d['X']=y_pred
d=normalize(d)
y_pred = d['X']
y_test = d['y']
mse=sqrt(mean_squared_error(y_test,y_test))
print('Mean Squared Error: ', round(y_test, 3))
After trying to normalize the stubborn RMSE, I get the following error:
ValueError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\frame.py in _ensure_valid_index(self, value)
3165 try:
-> 3166 value = Series(value)
3167 except (ValueError, NotImplementedError, TypeError) as err:
~\anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
230
--> 231 if is_empty_data(data) and dtype is None:
232 # gh-17261
~\anaconda3\lib\site-packages\pandas\core\construction.py in is_empty_data(data)
595 is_list_like_without_dtype = is_list_like(data) and not hasattr(data, "dtype")
--> 596 is_simple_empty = is_list_like_without_dtype and not data
597 return is_none or is_simple_empty
~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1328 def __nonzero__(self):
-> 1329 raise ValueError(
1330 f"The truth value of a {type(self).__name__} is ambiguous. "
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-142-60389a99eb72> in <module>
1 d=pd.DataFrame()
----> 2 d['y']=y_test
3 d['X']=y_pred
4 d=normalize(d)
5 y_pred = d['X']
~\anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
3038 else:
3039 # set column
-> 3040 self._set_item(key, value)
3041
3042 def _setitem_slice(self, key: slice, value):
~\anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
3113 ensure homogeneity.
3114 """
-> 3115 self._ensure_valid_index(value)
3116 value = self._sanitize_column(key, value)
3117 NDFrame._set_item(self, key, value)
~\anaconda3\lib\site-packages\pandas\core\frame.py in _ensure_valid_index(self, value)
3166 value = Series(value)
3167 except (ValueError, NotImplementedError, TypeError) as err:
-> 3168 raise ValueError(
3169 "Cannot set a frame with no defined index "
3170 "and a value that cannot be converted to a Series"
ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series
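The inner frames of the traceback show pandas trying to wrap the assigned value in a `Series` and failing on the ambiguous truth value of a DataFrame. That suggests `y_test` is still a one-column DataFrame, since it comes from `data.iloc[:, :1]`. Below is a minimal sketch of one way around it, assuming made-up values in place of the real `y_test` and `y_pred` and a hypothetical column name `target`:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: y_test as a one-column DataFrame, y_pred as a 1-D array.
y_test = pd.DataFrame({"target": [3.0, 5.0, 7.0]})
y_pred = np.array([2.5, 5.5, 6.0])

# Flatten both to 1-D arrays so each one becomes an ordinary column.
d = pd.DataFrame({
    "y": y_test.to_numpy().ravel(),
    "pred": np.ravel(y_pred),
})
print(d)
```

Building the frame from a dict of 1-D arrays avoids assigning a DataFrame into a column of an empty frame, which is the step that fails in the pandas version shown in the traceback.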
Answer
I expected an RMSE value within the range of the normalized data, but I get a value in the original units (rmse = 42596.17):
mse=sqrt(mean_squared_error(y_test,y_pred))
print(mse)
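That's because you did not scale y_test.
You did scale the entire data dataset, but only after you had already split the training and test sets off from it. The scaled copy, data_scaled, is never passed to the model or to the metric, so y_test keeps its original units.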
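As a rough sketch of the fix, the scalers can be fit on the training split and then reused on the test split, so that the metric compares values on the same scale. The synthetic data below is an assumption standing in for the real dataset, and the `_s` suffixed names are illustrative only:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the real data (assumption): a target with a large scale.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = 50_000 * x[:, 0] + 20_000 * x[:, 1] + rng.normal(scale=5_000, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.2, random_state=0
)

# Fit the scalers on the training split only, then reuse them on the test split.
sc_x = StandardScaler()
sc_y = StandardScaler()
x_train_s = sc_x.fit_transform(x_train)
x_test_s = sc_x.transform(x_test)
# StandardScaler expects 2-D input; SVR expects a 1-D target.
y_train_s = sc_y.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test_s = sc_y.transform(y_test.reshape(-1, 1)).ravel()

regressor = SVR(kernel="rbf")
regressor.fit(x_train_s, y_train_s)
y_pred_s = regressor.predict(x_test_s)

# RMSE in standardized units: both arguments are on the scaled target.
rmse_scaled = np.sqrt(mean_squared_error(y_test_s, y_pred_s))

# RMSE in original units: undo the target scaling on the predictions.
y_pred = sc_y.inverse_transform(y_pred_s.reshape(-1, 1)).ravel()
rmse_original = np.sqrt(mean_squared_error(y_test, y_pred))

print(rmse_scaled, rmse_original)
```

Because standardization divides the target by a constant, the two values differ only by that factor: `rmse_original` equals `rmse_scaled * sc_y.scale_[0]`. Dividing an RMSE that was already computed in original units by the target's standard deviation is therefore one way to normalize it after the fact.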