Joining two numpy arrays with the same number of rows: I get a ValueError

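Problem description

The dataset I am working with has some categorical columns, and I applied OneHotEncoder to them. I then tried to join the array of numeric features with the array produced by OneHotEncoder, to get a single array containing all the features.

The first array has shape (5074382, 82) and the second has shape (5074382, 9276434).

I tried: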

features_final = np.column_stack((features2, features_encoded))

features_final is then meant to be used in place of features.

features_encoded

(5074382, 9276434)    dtype('float64')   scipy.sparse.csr.csr_matrix 

features2

(5074382, 82)       dtype('float64')       numpy.ndarray

Code:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    import seaborn as sns
    pd.options.display.max_columns = None  #Display all dataframe columns in a Jupyter Python Notebook
    pd.set_option('display.max_rows', 1000)
    get_ipython().run_line_magic('matplotlib', 'inline')

    CIC2019 = pd.read_csv(r"DrDoS_DNS.csv")

    remove = lambda x: x.strip()  # remove the blanks in column names

    columns = list(CIC2019.columns)

    new_columns = list(map(lambda x: x.strip(), columns))  # removing blanks

    CIC2019 = pd.read_csv(r"CSV-01-12\DrDoS_DNS.csv", names =new_columns, header = None, skiprows=1,nrows=None)
    CIC2019.rename(columns={"Unnamed: 0": "ID"}, inplace=True)

    CIC2019 = CIC2019.dropna()
    CIC2019.isna().sum()

    features = CIC2019.drop("Label", axis =1)

    # # Handling categorical attributes

    from sklearn.preprocessing import OneHotEncoder
    encoder = OneHotEncoder()

    CIC2019["Label"]

    Label_encoded = encoder.fit_transform(CIC2019["Label"].to_numpy().reshape(1,-1))

    features[["Flow ID","Source IP","Timestamp","SimillarHTTP","Destination IP"]]

    features2 = features.drop(["Flow ID","Source IP","Timestamp","Destination IP","SimillarHTTP"], axis =1)
    features2 = features2.to_numpy()

    features_encoded = encoder.fit_transform(features[["Flow ID","Source IP","Timestamp","Destination IP",]].to_numpy())
    #"SimillarHTTP" : error when you added this
    # # Training - Linear Regression

    features_final = np.column_stack((features2, features_encoded))

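I got the error:

ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 0, the array at index 0 has size 5074382 and the array at index 1 has size 1

What is happening, and how can I fix it?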

Tags: python, numpy, numpy-ndarray

Solution
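
A likely explanation, based on the types listed in the question: features_encoded is a scipy.sparse CSR matrix, not a numpy.ndarray. np.column_stack does not understand SciPy sparse matrices, so it treats the whole matrix as a single object and ends up with a 1 x 1 array. That is why the error reports size 1 for the array at index 1. The sketch below reproduces this on small stand-in data (the shapes and values are made up for illustration) and joins the two blocks with scipy.sparse.hstack, which keeps the result sparse.

```python
import numpy as np
from scipy import sparse

# Small stand-ins for the arrays in the question (shapes and values are illustrative only).
n_rows = 5
features2 = np.arange(n_rows * 3, dtype=np.float64).reshape(n_rows, 3)  # dense numeric block
features_encoded = sparse.csr_matrix(np.eye(n_rows, 4))                 # sparse, one-hot-like block

# np.column_stack cannot interpret a SciPy sparse matrix as a 2-D array,
# so the concatenation fails with a ValueError like the one in the question.
try:
    np.column_stack((features2, features_encoded))
    column_stack_failed = False
except ValueError:
    column_stack_failed = True

# Convert the dense block to sparse and stack horizontally; the result stays sparse (CSR).
features_final = sparse.hstack(
    [sparse.csr_matrix(features2), features_encoded], format="csr"
)

print(column_stack_failed)
print(type(features_final), features_final.shape)
```

Calling features_encoded.toarray() so that np.column_stack works is not practical at the sizes in the question: a dense (5074382, 9276434) float64 array would hold about 4.7e13 values, which is several hundred terabytes. Most scikit-learn estimators accept a CSR matrix directly, so the stacked sparse result can usually be passed on as is. Separately, reshape(1,-1) in the Label encoding step produces a single row with every label as its own column; reshape(-1,1) is probably what was intended there.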

