首页 > 解决方案 > Pandas:循环获取特定列值

问题描述

这是一个非常短的踪迹。但是原始文件太大了

highest_layer,transport_layer,src_ip,dst_ip,src_port,dst_port,ip_flag,packet_length,transport_flag,time,timestamp,geo_country,data
DNS,UDP,192.168.1.6,172.217.12.131,32631,53,0,89,-1,2020-06-10 19:38:08.863846,1591832288863,Unknown,
DNS,UDP,192.168.1.6,192.168.1.1,31708,53,0,79,-1,2020-06-10 19:38:08.864186,1591832288864,Unknown,
DNS,UDP,192.168.1.6,172.217.12.131,32631,53,0,79,-1,2020-06-10 19:38:08.866492,1591832288866,Unknown,
SSDP,UDP,192.168.1.6,172.217.12.131,32631,1900,0,216,-1,2020-06-10 19:38:08.887298,1591832288887,Unknown,
DNS,UDP,192.168.1.1,192.168.1.6,53,32631,16384,105,-1,2020-06-10 19:38:08.888232,1591832288888,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,32631,443,16384,78,2,2020-06-10 19:38:08.888553,1591832288888,Unknown,
DNS,UDP,192.168.1.1,192.168.1.6,53,31708,16384,95,-1,2020-06-10 19:38:08.895148,1591832288895,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,16807,443,16384,78,2,2020-06-10 19:38:08.895594,1591832288895,Unknown,
DNS,UDP,192.168.1.1,192.168.1.6,53,16807,16384,119,-1,2020-06-10 19:38:08.896202,1591832288896,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,16807,443,16384,78,2,2020-06-10 19:38:08.896540,1591832288896,Unknown,
DNS,UDP,192.168.1.6,172.217.12.131,16807,53,0,75,-1,2020-06-10 19:38:08.911968,1591832288911,Unknown,
DATA,UDP,192.168.1.3,192.168.1.6,51216,58185,16384,558,-1,2020-06-10 19:38:08.913276,1591832288913,Unknown,
TCP,TCP,172.217.12.131,192.168.1.6,443,53717,0,74,18,2020-06-10 19:38:08.916735,1591832288916,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,58185,443,16384,66,16,2020-06-10 19:38:08.916860,1591832288916,Unknown,
TLS,TCP,192.168.1.6,172.217.12.131,58185,443,16384,583,24,2020-06-10 19:38:08.917442,1591832288917,Unknown,
TCP,TCP,172.217.10.237,192.168.1.6,443,53718,0,74,18,2020-06-10 19:38:08.919293,1591832288919,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,58185,443,16384,66,16,2020-06-10 19:38:08.919423,1591832288919,Unknown,
TLS,TCP,192.168.1.6,172.217.12.131,32631,443,16384,583,24,2020-06-10 19:38:08.919593,1591832288919,Unknown,
TCP,TCP,172.217.11.14,192.168.1.6,443,53719,0,74,18,2020-06-10 19:38:08.928819,1591832288928,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,16807,443,16384,66,16,2020-06-10 19:38:08.928922,1591832288928,Unknown,
TLS,TCP,192.168.1.6,172.217.12.131,58185,443,16384,583,24,2020-06-10 19:38:08.929100,1591832288929,Unknown,

我删除了一些不需要的列,我想从特定的 src_ip(192.168.1.6)、目标 ip 地址(172.217.12.131) 和 src_port(32631,16807,58185) 累积数据包长度。
我想遍历给定 src_ip 和 dest_ip 的 src_port。在这种情况下,对于 3 个 src_port 中的每一个,我需要计算累积数据包长度。绘制 x 轴(相对时间戳 - 这里是索引) y 轴(累积数据包长度)。我希望一个图表包含每个端口累积数据包长度的 3 行。

df = pd.read_csv('read.csv', sep=',')

#Calculate relative time for each dataframe
df.index = df['timestamp'] - df.loc[0,'timestamp']

#Drop unwanted columns
drop = df.drop(columns=['highest_layer', 'transport_layer','ip_flag', 'transport_flag','geo_country','data'])


df1 = drop[(drop.src_ip == '192.168.1.6') & (drop.dst_ip == '172.217.12.131')]

for i in df1['src_port']:
    df_cumsum = df1.groupby(['src_ip'])['packet_length'].cumsum()
    plt.plot(df.index, df_cumsum,label='i')

如果我明确给出端口号并在没有 for 循环的情况下绘制它,它就可以工作。但是在我遍历 src_port 之后,什么也没有发生。我在这里想念什么。请有任何想法

标签: pythonpandas

解决方案


在您的代码进行后续分组后,我通过将其与原始 DF 组合创建了一个表。我根据该表制作了一个图表。

df2 = df1[['src_port','packet_length']].groupby('src_port')['packet_length'].transform('cumsum').to_frame()
df2.columns = ['cumsum_packets']
df3 = pd.concat([df1,df2],axis=1)
import matplotlib.pyplot as plt
import seaborn as sns

sns.lineplot(x=df3.index, y=df3['cumsum_packets'], hue=df3['src_port'], data=df3, legend='full')

|   timestamp | src_ip      | dst_ip         |   src_port |   dst_port |   packet_length | time                       |     timestamp |   cumsum_packets |
|------------:|:------------|:---------------|-----------:|-----------:|----------------:|:---------------------------|--------------:|-----------------:|
|           0 | 192.168.1.6 | 172.217.12.131 |      32631 |         53 |              89 | 2020-06-10 19:38:08.863846 | 1591832288863 |               89 |
|           3 | 192.168.1.6 | 172.217.12.131 |      32631 |         53 |              79 | 2020-06-10 19:38:08.866492 | 1591832288866 |              168 |
|          24 | 192.168.1.6 | 172.217.12.131 |      32631 |       1900 |             216 | 2020-06-10 19:38:08.887298 | 1591832288887 |              384 |
|          25 | 192.168.1.6 | 172.217.12.131 |      32631 |        443 |              78 | 2020-06-10 19:38:08.888553 | 1591832288888 |              462 |
|          32 | 192.168.1.6 | 172.217.12.131 |      16807 |        443 |              78 | 2020-06-10 19:38:08.895594 | 1591832288895 |               78 |
|          33 | 192.168.1.6 | 172.217.12.131 |      16807 |        443 |              78 | 2020-06-10 19:38:08.896540 | 1591832288896 |              156 |
|          48 | 192.168.1.6 | 172.217.12.131 |      16807 |         53 |              75 | 2020-06-10 19:38:08.911968 | 1591832288911 |              231 |
|          53 | 192.168.1.6 | 172.217.12.131 |      58185 |        443 |              66 | 2020-06-10 19:38:08.916860 | 1591832288916 |               66 |
|          54 | 192.168.1.6 | 172.217.12.131 |      58185 |        443 |             583 | 2020-06-10 19:38:08.917442 | 1591832288917 |              649 |
|          56 | 192.168.1.6 | 172.217.12.131 |      58185 |        443 |              66 | 2020-06-10 19:38:08.919423 | 1591832288919 |              715 |
|          56 | 192.168.1.6 | 172.217.12.131 |      32631 |        443 |             583 | 2020-06-10 19:38:08.919593 | 1591832288919 |             1045 |
|          65 | 192.168.1.6 | 172.217.12.131 |      16807 |        443 |              66 | 2020-06-10 19:38:08.928922 | 1591832288928 |              297 |
|          66 | 192.168.1.6 | 172.217.12.131 |      58185 |        443 |             583 | 2020-06-10 19:38:08.929100 | 1591832288929 |             1298 |

在此处输入图像描述


推荐阅读