I am trying to find the classifier error using sklearn.neighbors.KernelDensity

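Problem description

For the code below, I have tried several ways to compute the accuracy and the classification error on the test sample "M" defined below. Put simply, I want to count how many test points the classifier misclassifies and express that count as a percentage of the total number of test samples across both classes.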

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal as mvn
from sklearn.neighbors import KernelDensity as KD
from matplotlib.colors import ListedColormap

# Fix random state for reproducibility
np.random.seed(1978081)

mm0 = np.array([2,2])
mm1= np.array([4,4])
Sig0 = 4*np.identity(2)
Sig1 = 4*np.identity(2)

N = 50 # number of points in each class
X0 = mvn.rvs(mm0,Sig0,N)
x0,y0 = np.split(X0,2,1)
X1 = mvn.rvs(mm1,Sig1,N)
x1,y1 = np.split(X1,2,1)
X = np.concatenate((X0,X1),axis=0)
y = np.concatenate((np.zeros(N),np.ones(N)))


# Generate Test data
M = 10
M0 = mvn.rvs(mm0, Sig0, M)
mx0, my0 = np.split(M0, 2, 1)
M1 = mvn.rvs(mm1, Sig1, M)
mx1, my1 = np.split(M1, 2, 1)
MX = np.concatenate((M0, M1), axis=0)
My = np.concatenate((np.zeros(M), np.ones(M)))


cmap_light = ListedColormap(['#ffe0c0','#b7faff'])
                             
h = .01  # mesh step size
x_min,x_max = (-3,9)
y_min,y_max = (-3,9)
for b in [1]:  #, 3, 5, 7, 9, 11]:
    clf0 = KD(bandwidth=b)  # bandwidth must be passed by keyword in recent scikit-learn
    clf0.fit(X0)
    clf1 = KD(bandwidth=b)
    clf1.fit(X1)
    xx,yy = np.meshgrid(np.arange(x_min,x_max,h),np.arange(y_min,y_max,h))
    Z0 = clf0.score_samples(np.c_[xx.ravel(), yy.ravel()])
    Z1 = clf1.score_samples(np.c_[xx.ravel(), yy.ravel()])
    Z = Z0<=Z1
    Z = Z.reshape(xx.shape)


    fig,ax=plt.subplots(figsize=(8,8),dpi=150)
    plt.rc('xtick',labelsize=16)
    plt.rc('ytick',labelsize=16)
    plt.plot(x0,y0,'.r',markersize=8) # class 0
    plt.plot(x1,y1,'.b',markersize=8) # class 1
    plt.plot(mx0, my0, '.g', markersize=8)  # class 0 Test
    plt.plot(mx1, my1, '.y', markersize=8)  # class 1 Test
    plt.xlim([-3,9])
    plt.ylim([-3,9])
    plt.title('N = ' + str(N) + ' , bandwidth = ' + str(b))
    plt.pcolormesh(xx,yy,Z,cmap=cmap_light)
    ax.contour(xx,yy,Z,colors='black',linewidths=0.5)
    plt.show()
    fig.savefig('c05_kernel'+str(int(10*b))+'.png',bbox_inches="tight",facecolor="white")

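However, when I use the score function, it gives unreasonable answers. What is the best way to count how many points of the test sample M are misclassified in each of the two classes, and then compute their total?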

Tags: python, scikit-learn, classification, kernel-density

Solution
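
One likely reason the score function looks unreasonable is that KernelDensity.score returns the total log-likelihood of the data under the fitted density, not a classification accuracy. A minimal sketch of one way to count the errors is to apply the same decision rule the question uses for the mesh (predict class 1 wherever Z0 <= Z1) to the test points MX, then compare the predictions with the true labels My. The names log_p0, log_p1, pred, wrong, n_errors, error_rate, errors_class0 and errors_class1 are introduced here for illustration, and the bandwidth of 1.0 is the value from the question's loop.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn
from sklearn.neighbors import KernelDensity

# Same setup as in the question: two Gaussian classes with equal covariance
np.random.seed(1978081)
mm0, mm1 = np.array([2, 2]), np.array([4, 4])
Sig = 4 * np.identity(2)

N, M = 50, 10  # training and test points per class
X0 = mvn.rvs(mm0, Sig, N)
X1 = mvn.rvs(mm1, Sig, N)
MX = np.concatenate((mvn.rvs(mm0, Sig, M), mvn.rvs(mm1, Sig, M)), axis=0)
My = np.concatenate((np.zeros(M), np.ones(M)))

# One density estimate per class
clf0 = KernelDensity(bandwidth=1.0).fit(X0)
clf1 = KernelDensity(bandwidth=1.0).fit(X1)

# score_samples returns one log-density per test point
log_p0 = clf0.score_samples(MX)
log_p1 = clf1.score_samples(MX)

# Same rule as the mesh in the question: class 1 where log_p0 <= log_p1
pred = (log_p0 <= log_p1).astype(int)

# Boolean mask of misclassified test points
wrong = pred != My
n_errors = int(wrong.sum())
error_rate = n_errors / len(My)
accuracy = 1.0 - error_rate

# Errors split by true class
errors_class0 = int(wrong[My == 0].sum())
errors_class1 = int(wrong[My == 1].sum())

print("misclassified:", n_errors, "of", len(My))
print("error rate: {:.1%}, accuracy: {:.1%}".format(error_rate, accuracy))
print("class 0 errors:", errors_class0, "class 1 errors:", errors_class1)
```

Because both classes have the same number of training points, comparing the two log-densities directly amounts to assuming equal class priors. With unequal class sizes, the log of each class prior would need to be added to the corresponding log-density before comparing.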

