python - y 标签是否需要是特定类型才能使 SVM 工作？

问题描述

我正在努力使用 SVM 分类器根据类别对多米诺骨牌图像进行分类，例如 1x3。

我有 28 种不同类别的多米诺骨牌的 2000 多张图像（可以在这里下载）。

我正在使用 scikit-learn 和 SVM 作为算法运行以下脚本：

import matplotlib.pyplot as plt
from sklearn import svm, metrics
from sklearn.model_selection import train_test_split
import numpy as np
import os # Working with files and folders
from PIL import Image # Image processing

rootdir = os.getcwd()

image_file = 'images.npy'
key_file = 'keys.npy'

if (os.path.exists(image_file) and os.path.exists(key_file)):
  print "Loading existing numpy's"
  pixel_arr = np.load(image_file)
  key = np.load(key_file)
else:
  print "Creating new numpy's"  
  key_array = []
  pixel_arr = np.empty((0,10000), "uint8")

  for subdir, dirs, files in os.walk('data'):
    dir_name = subdir.split("/")[-1]    
    if "x" in dir_name:
      for file in files:
        if ".DS_Store" not in file:
          im = Image.open(os.path.join(subdir, file))
          if im.size == (100,100):            
            key_array.append(dir_name)          
            numpied_image = np.array(im.convert('L')).reshape(1,-1)
            #Image.fromarray(np.reshape(numpied_image,(-1,100)), 'L').show()
            pixel_arr = np.append(pixel_arr, numpied_image, axis=0)
          im.close()

  key = np.array(key_array)
  np.save(image_file, pixel_arr)
  np.save(key_file, key)


# Create a classifier: a support vector classifier
classifier = svm.SVC(gamma='auto')

X_train, X_test, y_train, y_test = train_test_split(pixel_arr, key, test_size=0.1,random_state=33)

# We learn the digits on the first half of the digits
print "Fitting classifier"
classifier.fit(X_train, y_train)

# Now predict the value of the digit on the second half:
expected = y_test

print "Predicting"
predicted = classifier.predict(X_test)

print("Classification report for classifier %s:\n%s\n"
      % (classifier, metrics.classification_report(expected, predicted)))
print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))

产生以下结果：

Classification report for classifier SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False):
             precision    recall  f1-score   support

        0x0       0.00      0.00      0.00         9
        1x0       0.00      0.00      0.00         9
        1x1       0.00      0.00      0.00        12
        2x0       0.00      0.00      0.00        12
        2x1       0.00      0.00      0.00        10
        2x2       0.00      0.00      0.00         7
        3x0       0.00      0.00      0.00         7
        3x1       0.00      0.00      0.00         8
        3x2       0.00      0.00      0.00         8
        3x3       0.01      1.00      0.02         3
        4x0       0.00      0.00      0.00        11
        4x1       0.00      0.00      0.00        10
        4x2       0.00      0.00      0.00         8
        4x3       0.00      0.00      0.00        15
        4x4       0.00      0.00      0.00         8
        5x0       0.00      0.00      0.00        12
        5x1       0.00      0.00      0.00         7
        5x2       0.00      0.00      0.00        11
        5x3       0.00      0.00      0.00         7
        5x4       0.00      0.00      0.00         9
        5x5       0.00      0.00      0.00        14
        6x0       0.00      0.00      0.00        11
        6x1       0.00      0.00      0.00        12
        6x2       0.00      0.00      0.00        11
        6x3       0.00      0.00      0.00         9
        6x4       0.00      0.00      0.00         9
        6x5       0.00      0.00      0.00        18
        6x6       0.00      0.00      0.00        13

avg / total       0.00      0.01      0.00       280


>>> print("Confusion matrix:\n%s" % metrics.confusion_matrix(expected, predicted))
Confusion matrix:
[[ 0  0  0  0  0  0  0  0  0  9  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  9  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 12  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 12  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  7  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  7  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  3  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 11  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 10  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 15  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  8  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 12  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  7  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 11  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  7  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  9  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 14  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 11  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 12  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 11  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  9  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0  9  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 18  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]
 [ 0  0  0  0  0  0  0  0  0 13  0  0  0  0  0  0  0  0  0  0  0  0  0  0
   0  0  0  0]]

显然有些事情是非常错误的。即使分类器可以自由猜测，它也会获得更好的精度！我无法确认的怀疑是我创建 key/y 标签的方式不正确。尽管如此，脚本运行没有错误，但无法预测任何事情。

让我认为密钥有问题的一件事是混淆矩阵没有任何标签。

当您得到这样的结果时，可能会出现什么错误？

编辑：我尝试使用 LabelEncoder ，key但结果是一样的。

Edit2：我还尝试了不同的 lambda，并手动将 lambda 设置为 0.00001，结果分类器得分为 0.05（与上述相比有所改进）。我不希望分类器在这些数据上是完美的，但我至少希望在 60-70% 的范围内，而不是 5%。

标签： pythonnumpyscikit-learn

python - y 标签是否需要是特定类型才能使 SVM 工作？

问题描述

解决方案

推荐阅读