Why is my RMSE not affected by data normalization (0 - 1)?

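Problem description

How do I normalize/scale my data correctly so that it affects my error metrics (RMSE and MAE)? I did not inverse-transform anything back to real values. Alternatively, is there a way to normalize the RMSE after it has been computed?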

Import the required libraries:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Split the data into x/y:

x = data.iloc[:, 1:]
y = data.iloc[:, :1]

Set the training and test sizes:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size = 0.2,
                                                    random_state = 0)

Scale the data (training and test sets):

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler() 
data_scaled = scaler.fit_transform(data)

Scaling, continued:

print(data_scaled.mean(axis=0))
print(data_scaled.std(axis=0))

Fit the StandardScaler:

sc_x = StandardScaler()
sc_y = StandardScaler()
x_train = sc_x.fit_transform(x_train)
y_train = sc_y.fit_transform(y_train)

Fit the model (support vector regression):

from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(x_train, y_train)

Import the error metrics:

from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
from sklearn.metrics import mean_absolute_error
from math import sqrt

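Compute the first error metric (RMSE). This is where everything goes wrong: I expected an RMSE value within the range of the normalized data, but I get a value in real units (rmse = 42596.17):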

mse=sqrt(mean_squared_error(y_test,y_pred))
print(mse)

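I also did not inverse-transform anything back to real values. Since my RMSE value is not affected by the scaling, I decided to normalize the RMSE with the following code: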

d=pd.DataFrame()
d['y']=y_test
d['X']=y_pred
d=normalize(d)
y_pred = d['X']
y_test = d['y']
mse=sqrt(mean_squared_error(y_test,y_test))
print('Mean Squared Error: ', round(y_test, 3))

After trying to normalize the stubborn RMSE, I got the following error:

    ValueError                                Traceback (most recent call last)
    ~\anaconda3\lib\site-packages\pandas\core\frame.py in _ensure_valid_index(self, value)
       3165             try:
    -> 3166                 value = Series(value)
       3167             except (ValueError, NotImplementedError, TypeError) as err:
    
    ~\anaconda3\lib\site-packages\pandas\core\series.py in __init__(self, data, index, dtype, name, copy, fastpath)
        230 
    --> 231             if is_empty_data(data) and dtype is None:
        232                 # gh-17261
    
    ~\anaconda3\lib\site-packages\pandas\core\construction.py in is_empty_data(data)
        595     is_list_like_without_dtype = is_list_like(data) and not hasattr(data, "dtype")
    --> 596     is_simple_empty = is_list_like_without_dtype and not data
        597     return is_none or is_simple_empty
    
    ~\anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
       1328     def __nonzero__(self):
    -> 1329         raise ValueError(
       1330             f"The truth value of a {type(self).__name__} is ambiguous. "
    
    ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    
    The above exception was the direct cause of the following exception:
    
    ValueError                                Traceback (most recent call last)
    <ipython-input-142-60389a99eb72> in <module>
          1 d=pd.DataFrame()
    ----> 2 d['y']=y_test
          3 d['X']=y_pred
          4 d=normalize(d)
          5 y_pred = d['X']
    
    ~\anaconda3\lib\site-packages\pandas\core\frame.py in __setitem__(self, key, value)
       3038         else:
       3039             # set column
    -> 3040             self._set_item(key, value)
       3041 
       3042     def _setitem_slice(self, key: slice, value):
    
    ~\anaconda3\lib\site-packages\pandas\core\frame.py in _set_item(self, key, value)
       3113         ensure homogeneity.
       3114         """
    -> 3115         self._ensure_valid_index(value)
       3116         value = self._sanitize_column(key, value)
       3117         NDFrame._set_item(self, key, value)
    
    ~\anaconda3\lib\site-packages\pandas\core\frame.py in _ensure_valid_index(self, value)
       3166                 value = Series(value)
       3167             except (ValueError, NotImplementedError, TypeError) as err:
    -> 3168                 raise ValueError(
       3169                     "Cannot set a frame with no defined index "
       3170                     "and a value that cannot be converted to a Series"
    
    ValueError: Cannot set a frame with no defined index and a value that cannot be converted to a Series

Tags: python, machine-learning, valueerror, normalize

Solution


I expected an RMSE value within the range of the normalized data, but I get a value in real units (rmse = 42596.17):

mse=sqrt(mean_squared_error(y_test,y_pred))
print(mse)

That is because you did not scale y_test.
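To see why this matters, note that RMSE is expressed in the same units as the target it is computed on. The sketch below uses only the standard library and made-up numbers (they are not the asker's data). It suggests that standardizing both the true and predicted values with the same mean and standard deviation simply divides the RMSE by that standard deviation.

```python
from math import sqrt
from statistics import mean, pstdev


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


# Made-up targets and predictions in original units (illustrative only).
y_true = [100000.0, 150000.0, 200000.0, 250000.0]
y_pred = [110000.0, 140000.0, 210000.0, 240000.0]

# Stand-in for the mean and (population) standard deviation a fitted scaler would hold.
mu = mean(y_true)
sigma = pstdev(y_true)


def standardize(values):
    """Apply (v - mu) / sigma to every value."""
    return [(v - mu) / sigma for v in values]


rmse_raw = rmse(y_true, y_pred)                               # in original units
rmse_scaled = rmse(standardize(y_true), standardize(y_pred))  # in standardized units

print(rmse_raw, rmse_scaled, rmse_raw / sigma)
```

So an RMSE in the tens of thousands is what one would expect whenever the target being compared is still in its original units.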

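You did scale the entire dataset (data), but only after you had already split the training and test sets out of it, so the scaled copy data_scaled is never used and y_test stays in its original units.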
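One possible way to put this into practice is sketched below. It assumes synthetic data in place of the asker's data (which is not shown), and the variable names with the _s suffix are introduced here only for illustration. The scalers are fitted on the training split and then reused on x_test and y_test, so the RMSE can be reported either in scaled units or, after inverse_transform, in original units.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the question's data (assumption: two features, one target).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))
y = 50000.0 * x[:, 0] + 20000.0 * x[:, 1] + 150000.0 + rng.normal(scale=5000.0, size=200)

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)

# Fit the scalers on the training split only, then reuse them on the test split.
sc_x = StandardScaler()
sc_y = StandardScaler()
x_train_s = sc_x.fit_transform(x_train)
x_test_s = sc_x.transform(x_test)
y_train_s = sc_y.fit_transform(y_train.reshape(-1, 1)).ravel()
y_test_s = sc_y.transform(y_test.reshape(-1, 1)).ravel()

regressor = SVR(kernel="rbf")
regressor.fit(x_train_s, y_train_s)
y_pred_s = regressor.predict(x_test_s)  # predictions come out in scaled units

# RMSE in scaled units: compare scaled predictions with the scaled test target.
rmse_scaled = np.sqrt(mean_squared_error(y_test_s, y_pred_s))

# RMSE in original units: map the predictions back before comparing.
y_pred = sc_y.inverse_transform(y_pred_s.reshape(-1, 1)).ravel()
rmse_original = np.sqrt(mean_squared_error(y_test, y_pred))

print(rmse_scaled, rmse_original)
```

The targets are kept as 1-D arrays (via ravel) throughout. Doing the same for y_test and y_pred before placing them in a DataFrame should also avoid assigning a whole DataFrame to a single column, which appears to be what triggers the ValueError in the question.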

