Computing kNN in Python

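Problem description

I want to write a function that returns the majority class among the nearest neighbors.

I wrote the following function to compute the distances for a given distance metric (Euclidean, Manhattan, etc.).

xTrainInstances - a DataFrame containing all the training instances

xSeriesTestVector - a Series object taken from the test set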

    import numpy as np

    def calc_distances(xSeriesTestVector, xTrainInstances, distanceMetric):
        # One distance per training row, in the same order as the DataFrame
        distances = np.zeros(xTrainInstances.shape[0])
        for i in range(xTrainInstances.shape[0]):
            distances[i] = distanceMetric(xSeriesTestVector, xTrainInstances.iloc[i])
        return distances

Suppose I have the following DataFrame, where the Survived column is my class label.

                    Survived
 PassengerId          
    1                   0
    2                   1
    3                   1
    4                   1
    5                   0

My question

How can I implement this? I'm stuck: calc_distances returns an array of distances, and from predict_one_instance I want to return the correct class.

Tags: python, machine-learning, knn

Solution


Take a look at this example, which uses the 'manhattan' metric:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd


url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"

# Assign column names to the dataset
names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe
dataset = pd.read_csv(url, names=names)


dataset.head()


X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values


from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)


from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)


from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5, metric='manhattan')
classifier.fit(X_train, y_train)


y_pred = classifier.predict(X_test)


from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

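Running the example with three different metrics gives broadly similar, but slightly different, results. Because train_test_split is called without a random_state, each run uses a different random split (note the differing support counts), so part of the difference comes from the split rather than from the metric.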

# manhattan
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00         9
Iris-versicolor       1.00      1.00      1.00        15
 Iris-virginica       1.00      1.00      1.00         6

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30


# euclidean
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        11
Iris-versicolor       0.90      1.00      0.95         9
 Iris-virginica       1.00      0.90      0.95        10

       accuracy                           0.97        30
      macro avg       0.97      0.97      0.96        30
   weighted avg       0.97      0.97      0.97        30


# minkowski
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        13
Iris-versicolor       1.00      0.85      0.92        13
 Iris-virginica       0.67      1.00      0.80         4

       accuracy                           0.93        30
      macro avg       0.89      0.95      0.91        30
   weighted avg       0.96      0.93      0.94        30

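To reproduce these three runs, just change the metric (you could easily loop over a list of the three values to automate the whole process). Note that scikit-learn spells the metric 'euclidean', and that 'minkowski' with the default p=2 is the same distance as 'euclidean':

metric='manhattan'
metric='euclidean'
metric='minkowski'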
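
The loop mentioned above could be sketched as follows. This is only an illustration under a few assumptions that are not in the original answer: it uses scikit-learn's bundled copy of Iris (load_iris) instead of downloading the CSV, it fixes random_state=42 (an arbitrary value) so that every metric is evaluated on the same split, and it reports plain accuracy rather than the full classification report.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: the bundled Iris copy stands in for the CSV download.
X, y = load_iris(return_X_y=True)

# Fixing random_state (42 is arbitrary) gives every metric the same split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42
)

# Fit the scaler on the training data only, then apply it to both sets.
scaler = StandardScaler().fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

accuracies = {}
for metric in ["manhattan", "euclidean", "minkowski"]:
    classifier = KNeighborsClassifier(n_neighbors=5, metric=metric)
    classifier.fit(X_train, y_train)
    accuracies[metric] = accuracy_score(y_test, classifier.predict(X_test))
    print(f"{metric}: accuracy = {accuracies[metric]:.2f}")
```

With the split held fixed, any remaining difference between the rows comes from the metric itself; 'euclidean' and 'minkowski' should then agree, since the latter defaults to p=2.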

Resources:

https://www.bogotobogo.com/python/scikit-learn/scikit_machine_learning_k-NN_k-nearest-neighbors-algorithm.php
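
Returning to the hand-written approach in the question, the missing majority-vote step could look roughly like the sketch below. It is not the asker's code: the signature of predict_one_instance, the yTrainLabels and k parameters, the euclidean and manhattan helpers, and the toy data are all assumptions made for illustration. The helpers also assume the test vector and the training rows list their features in the same order.

```python
from collections import Counter

import numpy as np
import pandas as pd


def euclidean(a, b):
    """Straight-line (L2) distance between two numeric vectors."""
    return float(np.sqrt(np.sum((np.asarray(a) - np.asarray(b)) ** 2)))


def manhattan(a, b):
    """City-block (L1) distance between two numeric vectors."""
    return float(np.sum(np.abs(np.asarray(a) - np.asarray(b))))


def calc_distances(xSeriesTestVector, xTrainInstances, distanceMetric):
    """Distance from one test vector to every training row (as in the question)."""
    distances = np.zeros(xTrainInstances.shape[0])
    for i in range(xTrainInstances.shape[0]):
        distances[i] = distanceMetric(xSeriesTestVector, xTrainInstances.iloc[i])
    return distances


def predict_one_instance(xSeriesTestVector, xTrainInstances, yTrainLabels, k,
                         distanceMetric):
    """Return the majority class among the k nearest training instances."""
    distances = calc_distances(xSeriesTestVector, xTrainInstances, distanceMetric)
    # Positions of the k smallest distances.
    nearest = np.argsort(distances, kind="stable")[:k]
    # Look up the labels by position, then count the votes.
    votes = Counter(yTrainLabels.iloc[nearest].tolist())
    return votes.most_common(1)[0][0]


# Made-up toy data: two numeric features, labels shaped like the Survived column.
X_toy = pd.DataFrame(
    {"f1": [1.0, 1.2, 0.9, 5.0, 5.2], "f2": [1.0, 0.8, 1.1, 5.0, 4.9]},
    index=pd.Index([1, 2, 3, 4, 5], name="PassengerId"),
)
y_toy = pd.Series([0, 0, 0, 1, 1], index=X_toy.index, name="Survived")

test_vector = pd.Series({"f1": 5.1, "f2": 5.0})
predicted = predict_one_instance(test_vector, X_toy, y_toy, k=3,
                                 distanceMetric=manhattan)
print(predicted)
```

Labels are looked up by position (iloc) because np.argsort returns positions, not PassengerId values. With two classes, choosing an odd k avoids tied votes.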

