How can I improve this KNN algorithm?

Problem description

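I am trying to implement KNN from scratch on the Iris data, but I have hit a snag, which is very confusing for a beginner. Could you take a moment to help me resolve this error: IndexError: index 4 is out of bounds for axis 0 with size 4? Thank you very much.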

I want to use 13 neighbors, but it is not going well.

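from math import sqrt # sqrt is used in euclidean_distance
from sklearn import datasets # import datasets
from sklearn.model_selection import train_test_split # needed for the split below
import numpy as np # import numpy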
iris = datasets.load_iris() # load data 
X = iris.data # get features
y = iris.target # get targets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

def euclidean_distance(row1, row2):
    distance = 0.0
    for i in range(len(row1)-1):
        distance += (row1[i] - row2[i])**2
    return sqrt(distance)

# Locate the most similar neighbors
def call_neighbors(X_train, X_test, num_neighbors):
    distances = list()
    for train_row in X_train:
        dist = euclidean_distance(X_test, train_row)
        distances.append((X_train, dist))
    distances.sort(key=lambda tup: tup[1])
    neighbors = list()
    for i in range(14):
        neighbors.append(distances[i][0])
    return neighbors
neighbors = call_neighbors(X_train, X_test, 13)
for neighbor in neighbors:
    print(neighbor)
# Make a classification prediction with neighbors
def predict_classification(train, test_row, num_neighbors):
    neighbors = get_neighbors(X_train, X_test, 13)
    output_values = [row[-1] for row in neighbors]
    prediction = max(set(output_values), key=output_values.count)
    return prediction

Tags: python, performance-testing, knn

Solution


dist = euclidean_distance(X_test, train_row)

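X_test and train_row appear to have different sizes. X_test is the whole test matrix (30 rows of 4 features with this split), while train_row is a single row of 4 features, so the loop in euclidean_distance counts test rows and indexes past the end of train_row.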
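The mismatch can be reproduced in isolation. The following is a minimal sketch, not the asker's exact data: it assumes zero-filled stand-in arrays with the shapes the question's split would produce (30 test rows of 4 features, one training row of 4 features) and runs the same kind of loop as euclidean_distance.

```python
import numpy as np

# Stand-in arrays (assumed shapes, values are placeholders):
# 150 Iris samples with test_size=0.2 give 30 test rows of 4 features.
X_test = np.zeros((30, 4))
train_row = np.zeros(4)

print(X_test.shape, train_row.shape)

message = None
try:
    # len(X_test) is the number of test rows (30), not the number of features,
    # so i runs past the last valid index of train_row.
    for i in range(len(X_test) - 1):
        _ = train_row[i]
except IndexError as err:
    message = str(err)

print(message)
```

The loop fails as soon as i reaches 4, because train_row only has indices 0 to 3.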

Try printing the shapes of the arrays

print(X_test.shape, train_row.shape)

and correct them accordingly, for example by passing a single row of X_test instead of the whole array.
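One possible way to restructure the code is sketched below. It is an illustration under a few assumptions, not the only fix: labels are passed separately (as y_train is in the question), every feature is used in the distance, each call handles one test row, and the toy data is made up so the snippet runs without any dataset. The function names follow the question, with get_neighbors used consistently.

```python
from collections import Counter
from math import sqrt


def euclidean_distance(row1, row2):
    """Euclidean distance between two feature vectors of equal length."""
    # Labels live in a separate array, so all features are used
    # (no "- 1" on the length).
    return sqrt(sum((a - b) ** 2 for a, b in zip(row1, row2)))


def get_neighbors(X_train, y_train, test_row, num_neighbors):
    """Labels of the num_neighbors training rows closest to one test row."""
    distances = [
        (euclidean_distance(test_row, train_row), label)
        for train_row, label in zip(X_train, y_train)
    ]
    distances.sort(key=lambda pair: pair[0])
    return [label for _, label in distances[:num_neighbors]]


def predict_classification(X_train, y_train, test_row, num_neighbors):
    """Majority vote among the nearest neighbors of a single test row."""
    labels = get_neighbors(X_train, y_train, test_row, num_neighbors)
    return Counter(labels).most_common(1)[0][0]


# Toy data (hypothetical, for illustration only): two well-separated classes.
toy_X = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]]
toy_y = [0, 0, 0, 1, 1, 1]

print(predict_classification(toy_X, toy_y, [1.1, 1.0], 3))
print(predict_classification(toy_X, toy_y, [5.0, 5.1], 3))
```

With the split from the question, the same functions could be applied row by row, for instance [predict_classification(X_train, y_train, row, 13) for row in X_test], which yields one predicted label per test row.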

