CSV sorting and deletion in Python

Problem description

I have a csv file containing the following data.

192.168.136.192,2848,100.100.100.212,6667,"other"
100.100.100.212,6667,192.168.136.192,2848,"other"
100.100.100.212,6667,192.168.136.192,2848,"CHAT IRC message"
192.168.61.74,4662,69.192.30.179,80,"other"
192.168.107.87,4662,69.192.30.179,80,"other"
192.168.107.87,4662,69.192.30.179,80,"infection"
192.168.177.85,4662,69.192.30.179,80,"infection"
192.168.177.85,4662,69.192.30.179,80,"other"
192.168.118.168,4662,69.192.30.179,80,"infection"
192.168.118.168,4662,69.192.30.179,80,"other"
192.168.110.111,4662,69.192.30.179,80,"infection"

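So far I have been able to remove exact duplicates. Now I need to remove rows where src = dest && dest = src && message == message. In addition, for rows where src = src && dest = dest || src = dest && dest = src, I need to remove the ones marked "other" if a matching row is marked "infection". Basically, treat them as the same connection. Here is what I have so far for removing duplicates: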

with open(r'alerts.csv', 'r') as in_file, open('alertsfix.csv', 'w') as out_file:
    seen = set()  # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen:
            continue  # skip duplicate

        seen.add(line)
        out_file.write(line)

Basically:

src/prt/dest/prt/msg
1. a/a1/b/b1/c
2. 2a/2a1/2b/2b1/2c

Conditions:

if a==2b && a1==2b1 && b==2a && b1==2a1 && c==2c
    delete one of them, since they are equal

or

if a==2b && a1==2b1 && b==2a && b1==2a1 && (c=="other") && (2c=="infected" || 2c=="CNC")
    delete the one that has message "other"

I am very new to Python, and any guidance would be greatly appreciated.

Tags: python, csv

Solution


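First, you have to define what "equal" means. For example, the following code considers two rows equal only when both conditions hold:

  • The two participating addresses (ip and port together) are the same; I use a frozenset to hold the two addresses, so their order does not matter.
  • The message is the same.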
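
As a quick illustration of why the order does not matter, here is a minimal sketch using two rows from the sample data. The helper name `connection_key` is hypothetical and is not part of the original answer.

```python
def connection_key(src_ip, src_port, dst_ip, dst_port):
    """Build an order-independent key for a connection (hypothetical helper)."""
    src = '{}:{}'.format(src_ip, src_port)
    dst = '{}:{}'.format(dst_ip, dst_port)
    return frozenset([src, dst])

# The same connection seen from both directions
forward = connection_key('192.168.136.192', '2848', '100.100.100.212', '6667')
reverse = connection_key('100.100.100.212', '6667', '192.168.136.192', '2848')

print(forward == reverse)       # True: both directions give the same key
print(len({forward, reverse}))  # 1: frozensets are hashable, so they can live in a set
```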

You can use frozenset (the built-in immutable set) to build a key for each row, which makes lookups in the seen set possible:

with open('alerts.csv', 'r') as in_file, open('alertsfix.csv', 'w') as out_file:
    seen = set()
    for line in in_file:
        line = line.strip()
        if len(line) > 0:
            # a plain split assumes the message itself contains no commas
            src_ip, src_port, dst_ip, dst_port, msg = line.split(',')
            src = '{}:{}'.format(src_ip, src_port)
            dst = '{}:{}'.format(dst_ip, dst_port)
            # order-independent key: the pair of endpoints plus the message
            key = frozenset([
                frozenset([src, dst]),
                msg,
            ])

            if key not in seen:
                seen.add(key)                # we add 'key' to the set
                out_file.write(line + '\n')  # strip() removed the newline, so add it back
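
The code above does not cover the second condition from the question: dropping an "other" row when the same connection also has a more specific message such as "infection". One possible approach, sketched here under assumptions, is to make two passes: first collect the connections that have a specific message, then write the rows while skipping the "other" rows that are shadowed. The function name `filter_alerts` is hypothetical, and treating every message other than "other" as specific is my assumption, not something stated in the original answer. The sketch reads rows with the standard csv module so that quoted messages are parsed correctly.

```python
import csv
import io

def filter_alerts(rows):
    """Drop duplicate rows and 'other' rows shadowed by a more specific message.

    Assumption: any message different from 'other' counts as specific.
    """
    rows = [r for r in rows if len(r) == 5]  # skip blank or malformed rows

    def conn(row):
        # Order-independent connection key: the two (ip, port) endpoints
        src_ip, src_port, dst_ip, dst_port, _ = row
        return frozenset([(src_ip, src_port), (dst_ip, dst_port)])

    # Pass 1: connections that carry at least one specific message
    specific = {conn(r) for r in rows if r[4] != 'other'}

    # Pass 2: keep the first occurrence of each (connection, message) pair,
    # skipping 'other' rows whose connection also has a specific message
    seen = set()
    kept = []
    for row in rows:
        key = (conn(row), row[4])
        if key in seen:
            continue
        if row[4] == 'other' and conn(row) in specific:
            continue
        seen.add(key)
        kept.append(row)
    return kept

# A few rows from the sample data in the question
SAMPLE = '''192.168.136.192,2848,100.100.100.212,6667,"other"
100.100.100.212,6667,192.168.136.192,2848,"other"
100.100.100.212,6667,192.168.136.192,2848,"CHAT IRC message"
192.168.61.74,4662,69.192.30.179,80,"other"
192.168.107.87,4662,69.192.30.179,80,"other"
192.168.107.87,4662,69.192.30.179,80,"infection"
'''

kept = filter_alerts(csv.reader(io.StringIO(SAMPLE)))
for row in kept:
    print(','.join(row))
```

To use this with files, the rows could come from `csv.reader(open('alerts.csv', newline=''))` and be written back with `csv.writer`. Note that `csv.writer` by default only quotes fields when necessary, so the messages would come out without quotation marks unless a different quoting option is chosen.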

Does this help you with your task?

