首页 > 解决方案 > python - 将CSV列一分为二

问题描述

我正在尝试将进程哈希字段拆分为两个字段,这样"md5" "sha256" "process_name" "process_effective_reputation",我已经尝试了上面的代码,但我得到了

row = {'md5': data['process_hash'][0], 'sha256': data['process_hash'][1]}
IndexError: list index out of range

json数据:

{'results': [{'device_name': 'faaadc2',
          'device_timestamp': '2020-10-27T00:50:46.176Z',
          'event_id': '9b1bvfaa11eb81b',
          'process_effective_reputation': 'LIST5',
          'process_hash': ['bfc7dcf5935f3avda9df8e9b6425c37a',
                           'ca9f3a2450asd518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
          'process_name': 'c:\\program files '
                          '(x86)\\to122soft\\thcaadf3\\tohossce.exe',
          'process_username': ['JOHN\\user1']},
         {'device_name': 'fk6saadc2',
          'device_timestamp': '2020-10-27T00:50:46.176Z',
          'event_id': '9b151f6e17ee11eb81b',
          'process_effective_reputation': 'LIST1',
          'process_hash': ['bfc7dcf5935f3a9df8e9baaa425c37a',
                           'ca9f3aaa506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
          'process_name': 'c:\\program files '
                          '(x86)\\oaaft\\tf3\\toaaotsice.exe',
          'process_username': ['JOHN\\user2']},
         {'device_name': 'sdddsdc2',
          'device_timestamp': '2020-10-27T00:50:46.176Z',
          'event_id': '9b151f698e11eb81b',
          'process_effective_reputation': 'LIST',
          'process_hash': ['9df8ebfc7dcf5935830f3a9b6asdcd7a',
                           'ca9f3a24506cc518fdfrcv39a33c100b2d557f96e040f7124641ad1734e2f19'],
          'process_name': 'c:\\program files '
                          '(x86)\\toht\\thaa3\\toasce.exe',
          'process_username': ['JOHN\\user3']}]}

response = json.loads(r.text)
r = response['results']

selected_fields = []
for d in r:
    selected_fields.append({k: d[k] for k in ("process_hash", "process_name", "process_effective_reputation")})

new_data = []
for data in selected_fields:
    fieldnames = 'md5 sha256 process_name process_effective_reputation'.split()
    row = {'md5': data['process_hash'][0], 'sha256': data['process_hash'][1]}
    # Copy process_name and process_effective_reputation fields.
    row.update({fieldname: data[fieldname] for fieldname in fieldnames[-2:]})
    new_data.append(row)
return new_data

当前的 csv 数据:

process_hash    process_name    process_effective_reputation
 ['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2']    c:\windows\system32\delltpad\apmsgfwd.exe   ADAPTIVE_WHITE_LIST
 ['73ca11f2acf1adb7802c2914e1026db899a3c851cd9500378c0045e0']    c:\users\zdr3dds01\documents\sap\sap gui\export.mhtml   NOT_LISTED
 ['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2']    c:\windows\system32\delltpad\apmsgfwd.exe   ADAPTIVE_WHITE_LIST
 ['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2']    c:\windows\system32\delltpad\apmsgfwd.exe   ADAPTIVE_WHITE_LIST
 ['582f018bc7a732d63f624d6f92b3d143', '66505bcb9975d61af14dd09cddd9ac0d11a3e2b5ae41845c65117e7e2b046d37']    c:\users\jij09\appdata\local\kingsoft\power word 2016\2016.3.3.0368\powerword.exe   ADAPTIVE_WHITE_LIST

我试图用 CSV 文件实现的目标:

 md5   sha256   process_name  process_effective_reputation

谢谢

更新:感谢该代码有效,但现在它再次返回重复项并且 csv 格式不正确,例如,如果只有一种类型的哈希,该行将向右移动,并且所有列都不会正确排列。对不起,我还在学习python,请帮助

md5 sha256  process_name    process_effective_reputation
082642cf23a33a9c6fd1e5e671c075e4    ad0020c2b55708528edb7e54dc35878b7309084d011357398051d2644fe707c7    \\plaapp01\hupzar\winsad\winsadib.exe   ADAPTIVE_WHITE_LIST
082642cf23a33a9c6fd1e5e671c075e4    ad0020c2b55708528edb7e54dc35878b7309084d011357398051d2644fe707c7    \\plaapp01\hupzar\winsad\winsadib.exe   ADAPTIVE_WHITE_LIST
5c3471076193ef7c1d0df4cd42b58249bfd49fd68332d38c645c35d709b449d9    c:\users\it\appdata\local\temp\{a70cbf04-a246-434a-bd96-b5cfd84e765d}\qualcomm atheros ethernet driver installer.msi    NOT_LISTED  
082642cf23a33a9c6fd1e5e671c075e4    ad0020c2b55708528edb7e54dc35878b7309084d011357398051d2644fe707c7    \\plaapp01\hupzar\winsad\winsadib.exe   ADAPTIVE_WHITE_LIST

标签: pythoncsv

解决方案


这是我的解决方案。md5如果在哈希列表中有一个长度值在范围内,则我为列30创建值,40否则没有md5,我将md5列留空,对于sha256但范围相同,因为在您的数据60中,由于某种原因有时是or和or长度。如果您想严格地将我的代码中的范围替换为and 。如果列表中有多个值,也会采用最左边的值,对于。其余需要的列 (和) 只是被复制过来。70md5sha256313263643264(32, 33)(64, 65)md5md5sha256process_nameprocess_effective_reputation

对于创建 CSV,我使用默认设置(请参阅csv.DictWriter(...)在我的代码中创建对象),这意味着 CSV 具有,作为分隔符,要使用您在问题中提到的例如选项卡,只需添加额外的参数delimiter = '\t'as csv.DictWriter(..., delimiter = '\t' ,...)。如果需要,您可能想要添加其他 CSV 写入参数,请在此处阅读它们

在线尝试!

def create_csv(json_data):
    import csv, io
    fbuf = io.StringIO()
    writer = csv.DictWriter(fbuf, fieldnames = [
        'md5', 'sha256', 'process_name', 'process_effective_reputation'])
    writer.writeheader()
    for device in json_data['results']:
        writer.writerow({
            **{h : ([e for e in device['process_hash'] if l0 <= len(e) < l1] + [''])[0]
                for h, l0, l1 in (('md5', 30, 40), ('sha256', 60, 70))},
            **{e : device[e] for e in ('process_name', 'process_effective_reputation')},
        })
    print(fbuf.getvalue())
    with open('output.csv', 'w', encoding = 'utf-8') as f:
        f.write(fbuf.getvalue())

json_data = {
    "results": [
        {
            "device_name": "faaadc2",
            "device_timestamp": "2020-10-27T00:50:46.176Z",
            "event_id": "9b1bvfaa11eb81b",
            "process_effective_reputation": "LIST5",
            "process_hash": [
                "bfc7dcf5935f3avda9df8e9b6425c37a",
                "ca9f3a2450asd518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19",
            ],
            "process_name": "c:\\program files "
            "(x86)\\to122soft\\thcaadf3\\tohossce.exe",
            "process_username": ["JOHN\\user1"],
        },
        {
            "device_name": "fk6saadc2",
            "device_timestamp": "2020-10-27T00:50:46.176Z",
            "event_id": "9b151f6e17ee11eb81b",
            "process_effective_reputation": "LIST1",
            "process_hash": [
                "bfc7dcf5935f3a9df8e9baaa425c37a",
                "ca9f3aaa506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19",
            ],
            "process_name": "c:\\program files " "(x86)\\oaaft\\tf3\\toaaotsice.exe",
            "process_username": ["JOHN\\user2"],
        },
        {
            "device_name": "sdddsdc2",
            "device_timestamp": "2020-10-27T00:50:46.176Z",
            "event_id": "9b151f698e11eb81b",
            "process_effective_reputation": "LIST",
            "process_hash": [
                "9df8ebfc7dcf5935830f3a9b6asdcd7a",
                "ca9f3a24506cc518fdfrcv39a33c100b2d557f96e040f7124641ad1734e2f19",
            ],
            "process_name": "c:\\program files " "(x86)\\toht\\thaa3\\toasce.exe",
            "process_username": ["JOHN\\user3"],
        },
    ]
}

create_csv(json_data)

输出:

md5,sha256,process_name,process_effective_reputation
bfc7dcf5935f3avda9df8e9b6425c37a,ca9f3a2450asd518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19,c:\program files (x86)\to122soft\thcaadf3\tohossce.exe,LIST5
bfc7dcf5935f3a9df8e9baaa425c37a,ca9f3aaa506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19,c:\program files (x86)\oaaft\tf3\toaaotsice.exe,LIST1
9df8ebfc7dcf5935830f3a9b6asdcd7a,ca9f3a24506cc518fdfrcv39a33c100b2d557f96e040f7124641ad1734e2f19,c:\program files (x86)\toht\thaa3\toasce.exe,LIST

推荐阅读