python - Pandas:循环获取特定列值
问题描述
这是一个非常短的踪迹。但是原始文件太大了
highest_layer,transport_layer,src_ip,dst_ip,src_port,dst_port,ip_flag,packet_length,transport_flag,time,timestamp,geo_country,data
DNS,UDP,192.168.1.6,172.217.12.131,32631,53,0,89,-1,2020-06-10 19:38:08.863846,1591832288863,Unknown,
DNS,UDP,192.168.1.6,192.168.1.1,31708,53,0,79,-1,2020-06-10 19:38:08.864186,1591832288864,Unknown,
DNS,UDP,192.168.1.6,172.217.12.131,32631,53,0,79,-1,2020-06-10 19:38:08.866492,1591832288866,Unknown,
SSDP,UDP,192.168.1.6,172.217.12.131,32631,1900,0,216,-1,2020-06-10 19:38:08.887298,1591832288887,Unknown,
DNS,UDP,192.168.1.1,192.168.1.6,53,32631,16384,105,-1,2020-06-10 19:38:08.888232,1591832288888,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,32631,443,16384,78,2,2020-06-10 19:38:08.888553,1591832288888,Unknown,
DNS,UDP,192.168.1.1,192.168.1.6,53,31708,16384,95,-1,2020-06-10 19:38:08.895148,1591832288895,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,16807,443,16384,78,2,2020-06-10 19:38:08.895594,1591832288895,Unknown,
DNS,UDP,192.168.1.1,192.168.1.6,53,16807,16384,119,-1,2020-06-10 19:38:08.896202,1591832288896,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,16807,443,16384,78,2,2020-06-10 19:38:08.896540,1591832288896,Unknown,
DNS,UDP,192.168.1.6,172.217.12.131,16807,53,0,75,-1,2020-06-10 19:38:08.911968,1591832288911,Unknown,
DATA,UDP,192.168.1.3,192.168.1.6,51216,58185,16384,558,-1,2020-06-10 19:38:08.913276,1591832288913,Unknown,
TCP,TCP,172.217.12.131,192.168.1.6,443,53717,0,74,18,2020-06-10 19:38:08.916735,1591832288916,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,58185,443,16384,66,16,2020-06-10 19:38:08.916860,1591832288916,Unknown,
TLS,TCP,192.168.1.6,172.217.12.131,58185,443,16384,583,24,2020-06-10 19:38:08.917442,1591832288917,Unknown,
TCP,TCP,172.217.10.237,192.168.1.6,443,53718,0,74,18,2020-06-10 19:38:08.919293,1591832288919,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,58185,443,16384,66,16,2020-06-10 19:38:08.919423,1591832288919,Unknown,
TLS,TCP,192.168.1.6,172.217.12.131,32631,443,16384,583,24,2020-06-10 19:38:08.919593,1591832288919,Unknown,
TCP,TCP,172.217.11.14,192.168.1.6,443,53719,0,74,18,2020-06-10 19:38:08.928819,1591832288928,Unknown,
TCP,TCP,192.168.1.6,172.217.12.131,16807,443,16384,66,16,2020-06-10 19:38:08.928922,1591832288928,Unknown,
TLS,TCP,192.168.1.6,172.217.12.131,58185,443,16384,583,24,2020-06-10 19:38:08.929100,1591832288929,Unknown,
我删除了一些不需要的列,我想从特定的 src_ip(192.168.1.6)、目标 ip 地址(172.217.12.131) 和 src_port(32631,16807,58185) 累积数据包长度。
我想遍历给定 src_ip 和 dest_ip 的 src_port。在这种情况下,对于 3 个 src_port 中的每一个,我需要计算累积数据包长度。绘制 x 轴(相对时间戳 - 这里是索引) y 轴(累积数据包长度)。我希望一个图表包含每个端口累积数据包长度的 3 行。
df = pd.read_csv('read.csv', sep=',')
#Calculate relative time for each dataframe
df.index = df['timestamp'] - df.loc[0,'timestamp']
#Drop unwanted columns
drop = df.drop(columns=['highest_layer', 'transport_layer','ip_flag', 'transport_flag','geo_country','data'])
df1 = drop[(drop.src_ip == '192.168.1.6') & (drop.dst_ip == '172.217.12.131')]
for i in df1['src_port']:
df_cumsum = df1.groupby(['src_ip'])['packet_length'].cumsum()
plt.plot(df.index, df_cumsum,label='i')
如果我明确给出端口号并在没有 for 循环的情况下绘制它,它就可以工作。但是在我遍历 src_port 之后,什么也没有发生。我在这里想念什么。请有任何想法
解决方案
在您的代码进行后续分组后,我通过将其与原始 DF 组合创建了一个表。我根据该表制作了一个图表。
df2 = df1[['src_port','packet_length']].groupby('src_port')['packet_length'].transform('cumsum').to_frame()
df2.columns = ['cumsum_packets']
df3 = pd.concat([df1,df2],axis=1)
import matplotlib.pyplot as plt
import seaborn as sns
sns.lineplot(x=df3.index, y=df3['cumsum_packets'], hue=df3['src_port'], data=df3, legend='full')
| timestamp | src_ip | dst_ip | src_port | dst_port | packet_length | time | timestamp | cumsum_packets |
|------------:|:------------|:---------------|-----------:|-----------:|----------------:|:---------------------------|--------------:|-----------------:|
| 0 | 192.168.1.6 | 172.217.12.131 | 32631 | 53 | 89 | 2020-06-10 19:38:08.863846 | 1591832288863 | 89 |
| 3 | 192.168.1.6 | 172.217.12.131 | 32631 | 53 | 79 | 2020-06-10 19:38:08.866492 | 1591832288866 | 168 |
| 24 | 192.168.1.6 | 172.217.12.131 | 32631 | 1900 | 216 | 2020-06-10 19:38:08.887298 | 1591832288887 | 384 |
| 25 | 192.168.1.6 | 172.217.12.131 | 32631 | 443 | 78 | 2020-06-10 19:38:08.888553 | 1591832288888 | 462 |
| 32 | 192.168.1.6 | 172.217.12.131 | 16807 | 443 | 78 | 2020-06-10 19:38:08.895594 | 1591832288895 | 78 |
| 33 | 192.168.1.6 | 172.217.12.131 | 16807 | 443 | 78 | 2020-06-10 19:38:08.896540 | 1591832288896 | 156 |
| 48 | 192.168.1.6 | 172.217.12.131 | 16807 | 53 | 75 | 2020-06-10 19:38:08.911968 | 1591832288911 | 231 |
| 53 | 192.168.1.6 | 172.217.12.131 | 58185 | 443 | 66 | 2020-06-10 19:38:08.916860 | 1591832288916 | 66 |
| 54 | 192.168.1.6 | 172.217.12.131 | 58185 | 443 | 583 | 2020-06-10 19:38:08.917442 | 1591832288917 | 649 |
| 56 | 192.168.1.6 | 172.217.12.131 | 58185 | 443 | 66 | 2020-06-10 19:38:08.919423 | 1591832288919 | 715 |
| 56 | 192.168.1.6 | 172.217.12.131 | 32631 | 443 | 583 | 2020-06-10 19:38:08.919593 | 1591832288919 | 1045 |
| 65 | 192.168.1.6 | 172.217.12.131 | 16807 | 443 | 66 | 2020-06-10 19:38:08.928922 | 1591832288928 | 297 |
| 66 | 192.168.1.6 | 172.217.12.131 | 58185 | 443 | 583 | 2020-06-10 19:38:08.929100 | 1591832288929 | 1298 |
推荐阅读
- javascript - Javascript 数组 - 全局范围的问题
- c - 如何在C中找到数字中的最小数字及其在向量中的位置?
- c - 无法调试使我的 shell 段错误的 .so 库
- sql - SQL Server SUM 返回 null,它应该返回 0
- python - 定义 __next__ 迭代器方法的正确方法是什么?
- php - 多态关系 - 无法将所有文件存储在数据库中?
- stripe-payments - 从条带中检索“活动订阅”信息
- rust - 为什么在函数内部改变传递的选项不会传播到外部的选项?
- git - GIT 日志图的最佳方法,每个分支都有单独的行
- regex - 提取目录中所有文件的两个关键字之间的字符串