K-NN: training MSE with K=1 is not equal to 0

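Problem description

In theory, the training MSE for k = 1 should be zero. However, the following script shows otherwise. I first generate some toy data: x represents hours of sleep and y represents happiness. Then I fit the model on the data and predict the outcomes. Finally, I calculate the MSE on the training data by two methods. Can anyone tell me what is going wrong?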

from sklearn.neighbors import KNeighborsRegressor

model = KNeighborsRegressor(n_neighbors=1)

import numpy as np
x = np.array([7,8,6,7,5.7,6.8,8.6,6.5,7.8,5.7,9.8,7.7,8.8,6.2,7.1,5.7]).reshape(16,1)
y = np.array([5,7,4,5,6,9,7,6.8,8,7.6,9.3,8.2,7,6.2,3.8,6]).reshape(16,1)

model = model.fit(x,y)

for hours_slept in range(1,11):
    happiness = model.predict([[hours_slept]])
    print("if you sleep %.0f hours, you will be %.1f happy!" %(hours_slept, happiness))


# calculate MSE

# fast method
def model_mse(model,x,y):
    predictions = model.predict(x)
    return np.mean(np.power(y-predictions,2))
print(model_mse(model,x,y))

Output:

if you sleep 1 hours, you will be 6.0 happy!
if you sleep 2 hours, you will be 6.0 happy!
if you sleep 3 hours, you will be 6.0 happy!
if you sleep 4 hours, you will be 6.0 happy!
if you sleep 5 hours, you will be 6.0 happy!
if you sleep 6 hours, you will be 4.0 happy!
if you sleep 7 hours, you will be 5.0 happy!
if you sleep 8 hours, you will be 7.0 happy!
if you sleep 9 hours, you will be 7.0 happy!
if you sleep 10 hours, you will be 9.3 happy!
0.15999999999999992 #strictly larger than 0!

Tags: machine-learning, scikit-learn, knn, supervised-learning

Solution


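In your data, there are multiple labels in y for x = 5.7: 6, 7.6 and 6. After training, the algorithm assigns the label 6 to the value 5.7, and during evaluation, when it encounters 5.7 the second time, it returns 6 but not 7.6. So the squared error for this pair is (7.6 - 6)**2 = 2.56, and the mean squared error, considering that all the other errors are 0, is 1/16 * 2.56 = 0.16, which is exactly your result.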
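The arithmetic above can be checked with a short sketch in plain Python. It uses the x and y values from the question, but the helper predict_1nn is a hypothetical stand-in for the fitted model, not scikit-learn's implementation: it assumes that ties between equally distant training points are resolved in favour of the first one, which matches the behaviour described in this answer.

```python
from collections import defaultdict

# Toy data copied from the question (hours slept, happiness).
x = [7, 8, 6, 7, 5.7, 6.8, 8.6, 6.5, 7.8, 5.7, 9.8, 7.7, 8.8, 6.2, 7.1, 5.7]
y = [5, 7, 4, 5, 6, 9, 7, 6.8, 8, 7.6, 9.3, 8.2, 7, 6.2, 3.8, 6]

# Group the labels by feature value to find repeated x values.
labels_by_x = defaultdict(list)
for xi, yi in zip(x, y):
    labels_by_x[xi].append(yi)

# Keep only the x values whose labels disagree with each other.
conflicts = {xi: ys for xi, ys in labels_by_x.items() if len(set(ys)) > 1}
print(conflicts)


def predict_1nn(query):
    """Hypothetical 1-NN predictor (assumption: first training point wins ties)."""
    nearest = min(range(len(x)), key=lambda i: abs(x[i] - query))
    return y[nearest]


# Training MSE: only the conflicting sample at x = 5.7 contributes an error.
train_mse = sum((yi - predict_1nn(xi)) ** 2 for xi, yi in zip(x, y)) / len(x)
print(train_mse)  # approximately 0.16
```

The value 7 also appears twice in x, but both copies carry the label 5, so it causes no error. Only x = 5.7 has conflicting labels, which is why a 1-nearest-neighbour model cannot reach a training MSE of zero on this data.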

