python - Converting categorical network traffic features to numeric values - ISCX VPN2016 dataset
Problem description
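I am using the ISCX VPN2016 dataset to classify encrypted network traffic, and I want to implement a deep neural network for the classification. The dataset consists of 14 pcap files, one per traffic class. I exported the pcap files to CSV, added a column for the class, and merged them into a single file. The problem is the data types of the features: I cannot convert them to numeric features. I tried the usual methods suggested for NumPy, Pandas and Sklearn, such as OneHotEncoder, LabelEncoder, astype, get_dummies, ... but none of them worked.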
My question is: what should I do to convert these features, and do they need to be converted at all? Here is my code:
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
seed = 9
np.random.seed(seed)
netTraffic = np.loadtxt('netTraffic_100each.csv', delimiter=',', skiprows=1)
# OneHotEncoder
make_column_transformer(
(OneHotEncoder(), ['Source'], ['Destination'], ['Protocol'], ['Info']))
# LabelEncoder
le = preprocessing.LabelEncoder()
le.fit(['Class'])
list(le.classes_)
le.transform(['Class'])
print(netTraffic.Class.dtypes)
X = netTraffic[:, 0:6]
Y = netTraffic[:, 6]
(X_train, X_test, Y_train, Y_test) = train_test_split(X, Y, test_size=0.3, random_state=seed)
model = Sequential()
model.add(Dense(7, input_dim=6, init='uniform', activation='relu'))
model.add(Dense(6, init='uniform', activation='relu'))
model.add(Dense(14, init='uniform', activation='relu'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, Y_train, validation_data=(X_test, Y_test), nb_epoch=20, batch_size=5)
scores = model.evaluate(X_test, Y_test)
print("Accuracy: %.2f%%" % (scores[1] * 100))
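Two of the preprocessing calls in the code above do not do what they appear to. le.fit(['Class']) fits the encoder on a one-element list containing the string "Class", not on the values of the Class column. make_column_transformer expects (transformer, columns) tuples with all column names in one list, and its return value is never stored or applied to the data. The sketch below shows both calls used as the scikit-learn API intends; the rows are made up, and the column names (Source, Destination, Protocol, Length, Class) are assumed from the question rather than taken from the real file.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical rows standing in for the merged CSV; the values are made up.
df = pd.DataFrame({
    "Source": ["10.0.0.1", "10.0.0.2", "Dell_b2:5b:a6", "10.0.0.1"],
    "Destination": ["10.0.0.9", "10.0.0.9", "Broadcast", "10.0.0.8"],
    "Protocol": ["TCP", "UDP", "ARP", "TCP"],
    "Length": [60, 1514, 42, 60],
    "Class": ["chat", "email", "chat", "voip"],
})

categorical = ["Source", "Destination", "Protocol"]

# One (transformer, list_of_columns) tuple; the numeric Length column is
# appended unchanged via remainder="passthrough". sparse_threshold=0 forces
# a dense NumPy array as output.
ct = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), categorical),
    remainder="passthrough",
    sparse_threshold=0,
)
X = ct.fit_transform(df.drop(columns="Class"))

# Fit the LabelEncoder on the column's values, not on the literal ["Class"].
le = LabelEncoder()
y = le.fit_transform(df["Class"])

print(X.shape)
print(list(le.classes_))
print(y)
```

The one-hot columns come first in X and the passed-through Length column is last; le.classes_ holds the class names in sorted order, so le.inverse_transform can map predictions back to names.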
This is the error:
Traceback (most recent call last):
File "C:/Users/PycharmProjects/webmining/testNN/neuralNetusingtfSite.py", line 12, in <module>
netTraffic = np.loadtxt('netTraffic_100each.csv', delimiter=',', skiprows=1)
File "C:\Users\Anaconda3\envs\webmining\lib\site-packages\numpy\lib\npyio.py", line 1141, in loadtxt
for x in read_data(_loadtxt_chunksize):
File "C:\Users\Anaconda3\envs\webmining\lib\site-packages\numpy\lib\npyio.py", line 1068, in read_data
items = [conv(val) for (conv, val) in zip(converters, vals)]
File "C:\Users\Anaconda3\envs\webmining\lib\site-packages\numpy\lib\npyio.py", line 1068, in <listcomp>
items = [conv(val) for (conv, val) in zip(converters, vals)]
File "C:\Users\Anaconda3\envs\webmining\lib\site-packages\numpy\lib\npyio.py", line 775, in floatconv
return float(x)
ValueError: could not convert string to float: 'Dell_b2:5b:a6'
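The traceback is raised by np.loadtxt before any encoding code runs: loadtxt converts every field to a float, so it stops at the first non-numeric value, here the MAC-style address 'Dell_b2:5b:a6' in the Source column. pd.read_csv instead infers a dtype per column. A minimal sketch, assuming an unquoted CSV with Wireshark's default export columns plus the added Class column; the two data rows are made up.

```python
import io

import numpy as np
import pandas as pd

# Assumed layout: Wireshark's default CSV columns plus the added Class column.
csv_text = """No.,Time,Source,Destination,Protocol,Length,Info,Class
1,0.000000,Dell_b2:5b:a6,Broadcast,ARP,42,Who has 10.0.0.1?,chat
2,0.000512,10.0.0.1,10.0.0.9,TCP,60,443 > 50123 [ACK],chat
"""

# np.loadtxt tries to turn every field into a float and fails on text.
try:
    np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
    loadtxt_failed = False
except ValueError as err:
    loadtxt_failed = True
    print("loadtxt:", err)

# pandas infers one dtype per column: numbers become int64/float64,
# addresses, protocol names, Info and Class stay as text.
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)

# These are the columns that still need encoding before training.
text_columns = df.select_dtypes(exclude="number").columns.tolist()
print(text_columns)
```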
The first few rows of the data:
I have also uploaded the CSV file used by this code here: https://gofile.io/?c=L8UNYb
Solution
import pandas as pd
df = pd.read_csv('netTraffic_100each.csv')
df_encoded = pd.get_dummies(df, drop_first=True)
..
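Expanding on the get_dummies approach, here is a sketch of the whole preprocessing step using only pandas and NumPy. The target is separated first so that get_dummies does not one-hot encode it with the features; the class labels are then turned into one-hot rows, which is what categorical_crossentropy expects. Dropping the No. counter and the free-text Info column is an assumption made for illustration (Info would add one dummy column per distinct message), and the data frame is made up.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the merged CSV (column names assumed).
df = pd.DataFrame({
    "No.": [1, 2, 3, 4],
    "Time": [0.0, 0.0005, 0.0011, 0.0020],
    "Source": ["10.0.0.1", "10.0.0.2", "Dell_b2:5b:a6", "10.0.0.1"],
    "Destination": ["10.0.0.9", "10.0.0.9", "Broadcast", "10.0.0.8"],
    "Protocol": ["TCP", "UDP", "ARP", "TCP"],
    "Length": [60, 1514, 42, 60],
    "Info": ["ack", "data", "who-has", "ack"],
    "Class": ["chat", "email", "chat", "voip"],
})

# Remove the target before encoding so it is not turned into dummies.
target = df.pop("Class")

# Assumption: drop the packet counter and the free-text Info column.
features = df.drop(columns=["No.", "Info"])

# One-hot encode the text columns; numeric columns pass through unchanged.
X = pd.get_dummies(features, drop_first=True, dtype="float32").to_numpy()

# Integer class ids (sorted by name), then one-hot rows for the labels.
class_ids, class_names = pd.factorize(target, sort=True)
Y = np.eye(len(class_names), dtype="float32")[class_ids]

print(X.shape, Y.shape)
print(list(class_names))
```

With this encoding the network's input size must be X.shape[1] rather than a fixed 6, and for 14 classes with categorical_crossentropy the last layer would normally be Dense(14, activation='softmax'), since a relu output is not a probability distribution. Note also that init and nb_epoch are Keras 1 argument names; Keras 2 uses kernel_initializer and epochs.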