ValueError: Found array with 0 feature(s) (shape=(124, 0)) while a minimum of 1 is required

Problem description

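I am trying to apply PCA (principal component analysis) to a data set with 124 rows and 13 features. I want to find out how many features to use (with logistic regression) to get the most accurate predictions. Here is my code: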

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df_wine = pd.read_csv('https://archive.ics.uci.edu/ml/'
    'machine-learning-databases/wine/wine.data', header=None)

from sklearn.model_selection import train_test_split
X, y = df_wine.iloc[:, 1:].values, df_wine.iloc[:, 0].values
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)
# standardize the features
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
X_test_std = sc.transform(X_test)

from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
# initializing the PCA transformer and
# logistic regression estimator:
pca = PCA()  # prof recommends getting rid of n_components = 3
lr = LogisticRegression()
# dimensionality reduction:
X_train_pca = pca.fit_transform(X_train_std)
X_test_pca = pca.transform(X_test_std)

"""
rows = len(X_train_pca)
columns = len(X_train_pca[0])
print(rows)
print(columns)
"""

# fitting the logistic regression model on the reduced dataset:
for i in range(12):
    lr.fit(X_train_pca[:, :i], y_train)
    y_train_pca = lr.predict(X_train_pca[:, :i])
    print('Training accuracy:', lr.score(X_train_pca[:, :i], y_train))

I get this error message:

    raise ValueError("Found array with %d feature(s) (shape=%s) while"
ValueError: Found array with 0 feature(s) (shape=(124, 0)) while a minimum of 1 is required.

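As I understand it, a for-loop range of 12 is correct because it goes through all 13 features (0 to 12). I want the loop to go through all the features: run logistic regression with one feature first, then two, then three, and so on up to all 13, and then compare the accuracies to see how many features work best.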

Tags: python, for-loop, logistic-regression, pca

Solution


About your error:

X_train_pca[:, :i] with i=0 gives you an empty array (zero columns), which is not a valid input to .fit().
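
A quick way to see this is to print the shapes that the slice produces. The sketch below uses a zero-filled placeholder with the same shape as X_train_pca (an assumption for illustration only, not the real data):

```python
import numpy as np

# Placeholder with the same shape as X_train_pca (124 samples, 13 components).
# The values do not matter here; only the shapes of the slices do.
X_train_pca = np.zeros((124, 13))

# Shapes produced by the slices in the original loop: range(12) gives i = 0..11.
shapes = [X_train_pca[:, :i].shape for i in range(12)]

print(shapes[0])   # (124, 0): no columns at all, so .fit() rejects it
print(shapes[-1])  # (124, 11): the loop never reaches 12 or 13 components
```

Note that range(12) yields 0 to 11, and the slice :i excludes column i, so the original loop covers 0 to 11 components rather than 1 to 13.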

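How to fix it:

If you want to fit a model with only the intercept, you can explicitly set fit_intercept=False in LogisticRegression() and add an extra column (left-most) to X, filled with 1s, to act as the intercept.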
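
One possible way to put this together is sketched below. It fits the intercept-only baseline on a column of ones, then loops over 1 to 13 components so the slice is never empty. Two assumptions: the data comes from sklearn.datasets.load_wine (scikit-learn's bundled copy of the UCI Wine data, swapped in so the sketch runs without a download), and the names ones, baseline, scores and k are illustrative and not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: bundled copy of the Wine data instead of the CSV download.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Standardize, then project onto all principal components.
sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)
pca = PCA()
X_train_pca = pca.fit_transform(X_train_std)

n_samples, n_components = X_train_pca.shape

# Intercept-only baseline: a single column of ones plays the role of the
# intercept, so the built-in intercept is switched off.
ones = np.ones((n_samples, 1))
baseline = LogisticRegression(fit_intercept=False).fit(ones, y_train)
scores = {0: baseline.score(ones, y_train)}

# k runs from 1 to 13, so X_train_pca[:, :k] always has at least one column.
for k in range(1, n_components + 1):
    lr = LogisticRegression()
    lr.fit(X_train_pca[:, :k], y_train)
    scores[k] = lr.score(X_train_pca[:, :k], y_train)

for k, acc in scores.items():
    print(f'{k:2d} components, training accuracy: {acc:.3f}')
```

If the intercept-only baseline is not needed, the loop over range(1, n_components + 1) alone is enough to avoid the error.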

