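python - Pandas: merge rows within a single DataFrame
Problem description
I'm new to Pandas and have a question I can't answer on my own. For context, this is output from a firewall. It generates millions of packets, and I'm trying to aggregate that data into a firewall ruleset. The best approach I've come up with is to identify traffic by destination IP.
If the source/destination ports are ephemeral, they will vary, so it's important to aggregate them into the same row. That way I can determine the port range for the ruleset.
Raw CSV: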
dvc,"src_interface",transport,"src_ip","src_port","dest_ip","dest_port",direction,action,cause,count
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1025,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",53,"1.1.1.1",1026,outbound,allowed,"",2
"Firewall-1",outside,tcp,"4.4.4.4",22,"1.1.1.1",1028,outbound,allowed,"",2
"Firewall-1",outside,tcp,"3.3.3.3",22,"2.2.2.2",2200,outbound,allowed,"",2
DataFrame:
dvc src_interface transport src_ip src_port dest_ip dest_port direction action cause count
0 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1025 outbound allowed NaN 2
1 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1026 outbound allowed NaN 2
2 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1028 outbound allowed NaN 2
3 Firewall-1 outside tcp 3.3.3.3 22 2.2.2.2 2200 outbound allowed NaN 2
How would I merge the rows that have the same dest_ip?
Code:
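import glob
import pandas as pd

# Load every CSV in the current directory into one DataFrame
df = pd.concat([pd.read_csv(f) for f in glob.glob('*.csv')], ignore_index=True)
# Attempt: group on every column except dest_ip
index_cols = df.columns.tolist()
index_cols.remove('dest_ip')
df = df.groupby(index_cols, as_index=False)['dest_ip'].apply(list)
print(df)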
Expected output:
Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 1025-1026,1028 outbound allowed nan 2
Firewall-1 outside tcp 3.3.3.3 22 2.2.2.2 2200 outbound allowed nan 2
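Most of the examples I've found online involve joining two DataFrames, whereas I only have one. Any help would be appreciated. Thanks in advance!
Solution
Try this. Group by all the columns whose values you want carried over unchanged, then aggregate the distinct "dest_port" values into a list: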
import pandas as pd

# Sample rows taken from the raw CSV in the question
df = pd.DataFrame([
["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1025,"outbound","allowed","",2],
["Firewall-1","outside","tcp","4.4.4.4",53,"1.1.1.1",1026,"outbound","allowed","",2],
["Firewall-1","outside","tcp","4.4.4.4",22,"1.1.1.1",1028,"outbound","allowed","",2],
["Firewall-1","outside","tcp","3.3.3.3",22,"2.2.2.2",2200,"outbound", "allowed","",2]
],
columns=["dvc","src_interface","transport","src_ip","src_port","dest_ip","dest_port","direction", "action", "cause", "count"])
# Group on every column except dest_port
index_cols = df.columns.tolist()
index_cols.remove("dest_port")
# Collect the dest_port values of each group into a list
df = df.groupby(index_cols)["dest_port"].apply(list)
# Turn the grouping keys back into ordinary columns
df = df.reset_index()
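This leaves 3 rows rather than the 2 in your desired output, because src_port is still a grouping column and the row with dest_port 1028 has src_port 22 rather than 53: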
dvc src_interface transport src_ip src_port dest_ip direction action cause count dest_port
0 Firewall-1 outside tcp 3.3.3.3 22 2.2.2.2 outbound allowed 2 [2200]
1 Firewall-1 outside tcp 4.4.4.4 22 1.1.1.1 outbound allowed 2 [1028]
2 Firewall-1 outside tcp 4.4.4.4 53 1.1.1.1 outbound allowed 2 [1025, 1026]
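The expected output in the question shows the ports as a range string such as 1025-1026,1028 rather than a list. A minimal sketch of one way to get there is below. The helper names format_port_ranges and aggregate_dest_ports are made up for illustration, and the sample rows are an assumption copied from the DataFrame printed in the question (where all three 4.4.4.4 rows have src_port 53). It also passes dropna=False to groupby (available in pandas 1.1 and later), since read_csv usually loads the empty cause field as NaN and groupby drops NaN keys by default.

```python
import pandas as pd


def format_port_ranges(ports):
    """Collapse ports into a string such as '1025-1026,1028' (hypothetical helper)."""
    ordered = sorted({int(p) for p in ports})
    if not ordered:
        return ""
    ranges = []
    start = prev = ordered[0]
    for port in ordered[1:]:
        if port == prev + 1:
            # Still inside a consecutive run
            prev = port
            continue
        # Run ended: store it and start a new one
        ranges.append((start, prev))
        start = prev = port
    ranges.append((start, prev))
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def aggregate_dest_ports(df, port_col="dest_port"):
    """Group on every column except port_col and summarise the ports (hypothetical helper)."""
    keys = [c for c in df.columns if c != port_col]
    # dropna=False keeps groups whose key contains NaN (e.g. an empty "cause")
    return (
        df.groupby(keys, dropna=False)[port_col]
        .agg(format_port_ranges)
        .reset_index()
    )


# Assumed sample: the rows as printed in the question's DataFrame
nan = float("nan")
sample = pd.DataFrame(
    [
        ["Firewall-1", "outside", "tcp", "4.4.4.4", 53, "1.1.1.1", 1025, "outbound", "allowed", nan, 2],
        ["Firewall-1", "outside", "tcp", "4.4.4.4", 53, "1.1.1.1", 1026, "outbound", "allowed", nan, 2],
        ["Firewall-1", "outside", "tcp", "4.4.4.4", 53, "1.1.1.1", 1028, "outbound", "allowed", nan, 2],
        ["Firewall-1", "outside", "tcp", "3.3.3.3", 22, "2.2.2.2", 2200, "outbound", "allowed", nan, 2],
    ],
    columns=["dvc", "src_interface", "transport", "src_ip", "src_port", "dest_ip",
             "dest_port", "direction", "action", "cause", "count"],
)
result = aggregate_dest_ports(sample)
print(result)
```

If src_port also varies between rows that should be merged, as in the raw CSV, one option is to remove it from the grouping keys and aggregate it the same way as dest_port.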