Categorical Naive Bayes in scikit-learn raises an IndexError

Problem description

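I am trying to use sklearn to compare different classification methods. I have string data with surname, name, and gender values, and I want to find out how the classifiers handle the gender values. However, I get an error with Categorical Naive Bayes: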

import os
import pandas as pd
import numpy as np
from sklearn.naive_bayes import CategoricalNB
from sklearn.model_selection import train_test_split
from sklearn import metrics

if __name__ == "__main__":
        
    # folder_path_one_level_up is not defined in the snippet as posted;
    # presumably it is set earlier in the full script.
    results_dir = os.path.join(folder_path_one_level_up, "Results")
    csv_with_all_surnames = os.path.join(results_dir, "surnames_labeled_all.csv")
    csv_naive_bayes_categorical_results = os.path.join(results_dir, "Naive_Bayes_Categorical_results_names_only.csv")
    
    total_accuracy_scores = []
    

    data_to_be_tested = pd.read_csv(csv_with_all_surnames,header=None)
    column_names = ['Surname', 'Name', 'Gender']
    data_to_be_tested.columns=column_names
    
    names_only = data_to_be_tested.drop(['Surname', 'Gender'],axis=1)
    genders=data_to_be_tested['Gender']
    
    names_only = names_only.apply(lambda x: pd.factorize(x)[0])
    genders=genders.factorize()
    genders=genders[0].copy()

    accuracy_score = []
    for test_percent in range(99, 0, -1):
        x_train, x_test, y_train, y_test = train_test_split(names_only, genders, test_size=(test_percent/100), shuffle=False)
        classifier_naive_bayes_categorical = CategoricalNB()
        classifier_naive_bayes_categorical = classifier_naive_bayes_categorical.fit(x_train, y_train)
        y_pred_naive_bayes_categorical = classifier_naive_bayes_categorical.predict(x_test)
        naive_bayes_categorical_accuracy_score = round(metrics.accuracy_score(y_test, y_pred_naive_bayes_categorical), 3)
        # `text` is not defined in the snippet as posted; presumably it holds the log line for this run.
        with open("logFile6.txt", "a+") as file_results_log:
            file_results_log.write(text + "\n")
        accuracy_score.append(naive_bayes_categorical_accuracy_score)
    total_accuracy_scores.append(accuracy_score)

Output:

"C:\My Files\Upper level folders\Test\Results\surnames_labeled_all.csv" exists.
Traceback (most recent call last):

  File "C:\My Files\Upper level folders\Test\python\naive_bayes_categorical_comp_no_duplicates_all_names_name.py", line 85, in <module>
    y_pred_naive_bayes_categorical = classifier_naive_bayes_categorical.predict(x_test)

  File "C:\Users\ulvi95\anaconda3\lib\site-packages\sklearn\naive_bayes.py", line 75, in predict
    jll = self._joint_log_likelihood(X)

  File "C:\Users\ulvi95\anaconda3\lib\site-packages\sklearn\naive_bayes.py", line 1303, in _joint_log_likelihood
    jll += self.feature_log_prob_[i][:, indices].T

IndexError: index 815 is out of bounds for axis 1 with size 815

How can this be fixed?

Tags: python, machine-learning, scikit-learn

Solution
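
The traceback suggests that `predict` met a category code the model never saw during `fit`. `pd.factorize` numbers values in order of first appearance, and with `shuffle=False` the test rows come from the end of the file, so names that occur only there get codes larger than anything in the training rows. `CategoricalNB` sizes its per-feature probability table from the training data, which would explain `index 815 is out of bounds for axis 1 with size 815`. One possible fix, sketched here on made-up toy data (the names and labels are assumptions, not the asker's CSV), is the `min_categories` parameter, which to my knowledge is available from scikit-learn 0.24 onward:

```python
import pandas as pd
from sklearn.naive_bayes import CategoricalNB

# Toy stand-in for the name and gender columns (hypothetical data).
names = pd.Series(["Anna", "Boris", "Clara", "Anna", "David", "Elena", "Boris", "Farid"])
genders = pd.Series(["F", "M", "F", "F", "M", "F", "M", "M"])

# factorize assigns integer codes in order of first appearance: 0, 1, 2, ...
X = pd.factorize(names)[0].reshape(-1, 1)
y = pd.factorize(genders)[0]

# Unshuffled split: the training rows only contain codes 0..2,
# while the test rows also contain the unseen codes 3, 4 and 5.
X_train, X_test = X[:4], X[4:]
y_train = y[:4]

# Tell the model how many categories exist for the feature, so its
# probability table has a column for every code met at predict time.
n_categories = int(X.max()) + 1
model = CategoricalNB(min_categories=n_categories)
model.fit(X_train, y_train)
predictions = model.predict(X_test)
```

In the script from the question, this would amount to something like `CategoricalNB(min_categories=names_only.max().to_numpy() + 1)`, computed on the full factorized frame before splitting. Note that names absent from the training rows only receive the smoothed probability, so predictions for them carry little information; a large unshuffled test share will therefore score poorly even once the error is gone.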

