python - python - 将CSV列一分为二
问题描述
我正在尝试将进程哈希字段拆分为两个字段,这样"md5" "sha256" "process_name" "process_effective_reputation"
,我已经尝试了上面的代码,但我得到了
row = {'md5': data['process_hash'][0], 'sha256': data['process_hash'][1]}
IndexError: list index out of range
json数据:
{'results': [{'device_name': 'faaadc2',
'device_timestamp': '2020-10-27T00:50:46.176Z',
'event_id': '9b1bvfaa11eb81b',
'process_effective_reputation': 'LIST5',
'process_hash': ['bfc7dcf5935f3avda9df8e9b6425c37a',
'ca9f3a2450asd518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
'process_name': 'c:\\program files '
'(x86)\\to122soft\\thcaadf3\\tohossce.exe',
'process_username': ['JOHN\\user1']},
{'device_name': 'fk6saadc2',
'device_timestamp': '2020-10-27T00:50:46.176Z',
'event_id': '9b151f6e17ee11eb81b',
'process_effective_reputation': 'LIST1',
'process_hash': ['bfc7dcf5935f3a9df8e9baaa425c37a',
'ca9f3aaa506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19'],
'process_name': 'c:\\program files '
'(x86)\\oaaft\\tf3\\toaaotsice.exe',
'process_username': ['JOHN\\user2']},
{'device_name': 'sdddsdc2',
'device_timestamp': '2020-10-27T00:50:46.176Z',
'event_id': '9b151f698e11eb81b',
'process_effective_reputation': 'LIST',
'process_hash': ['9df8ebfc7dcf5935830f3a9b6asdcd7a',
'ca9f3a24506cc518fdfrcv39a33c100b2d557f96e040f7124641ad1734e2f19'],
'process_name': 'c:\\program files '
'(x86)\\toht\\thaa3\\toasce.exe',
'process_username': ['JOHN\\user3']}]}
response = json.loads(r.text)
r = response['results']
selected_fields = []
for d in r:
selected_fields.append({k: d[k] for k in ("process_hash", "process_name", "process_effective_reputation")})
new_data = []
for data in selected_fields:
fieldnames = 'md5 sha256 process_name process_effective_reputation'.split()
row = {'md5': data['process_hash'][0], 'sha256': data['process_hash'][1]}
# Copy process_name and process_effective_reputation fields.
row.update({fieldname: data[fieldname] for fieldname in fieldnames[-2:]})
new_data.append(row)
return new_data
当前的 csv 数据:
process_hash process_name process_effective_reputation
['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2'] c:\windows\system32\delltpad\apmsgfwd.exe ADAPTIVE_WHITE_LIST
['73ca11f2acf1adb7802c2914e1026db899a3c851cd9500378c0045e0'] c:\users\zdr3dds01\documents\sap\sap gui\export.mhtml NOT_LISTED
['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2'] c:\windows\system32\delltpad\apmsgfwd.exe ADAPTIVE_WHITE_LIST
['f810a809e9cdf70c3189008e07c83619', '58d44528b60d36b515359fe234c9332ccef6937f5c950472230ce15dca8812e2'] c:\windows\system32\delltpad\apmsgfwd.exe ADAPTIVE_WHITE_LIST
['582f018bc7a732d63f624d6f92b3d143', '66505bcb9975d61af14dd09cddd9ac0d11a3e2b5ae41845c65117e7e2b046d37'] c:\users\jij09\appdata\local\kingsoft\power word 2016\2016.3.3.0368\powerword.exe ADAPTIVE_WHITE_LIST
我试图用 CSV 文件实现的目标:
md5 sha256 process_name process_effective_reputation
谢谢
更新:感谢该代码有效,但现在它再次返回重复项并且 csv 格式不正确,例如,如果只有一种类型的哈希,该行将向右移动,并且所有列都不会正确排列。对不起,我还在学习python,请帮助
md5 sha256 process_name process_effective_reputation
082642cf23a33a9c6fd1e5e671c075e4 ad0020c2b55708528edb7e54dc35878b7309084d011357398051d2644fe707c7 \\plaapp01\hupzar\winsad\winsadib.exe ADAPTIVE_WHITE_LIST
082642cf23a33a9c6fd1e5e671c075e4 ad0020c2b55708528edb7e54dc35878b7309084d011357398051d2644fe707c7 \\plaapp01\hupzar\winsad\winsadib.exe ADAPTIVE_WHITE_LIST
5c3471076193ef7c1d0df4cd42b58249bfd49fd68332d38c645c35d709b449d9 c:\users\it\appdata\local\temp\{a70cbf04-a246-434a-bd96-b5cfd84e765d}\qualcomm atheros ethernet driver installer.msi NOT_LISTED
082642cf23a33a9c6fd1e5e671c075e4 ad0020c2b55708528edb7e54dc35878b7309084d011357398051d2644fe707c7 \\plaapp01\hupzar\winsad\winsadib.exe ADAPTIVE_WHITE_LIST
解决方案
这是我的解决方案。md5
如果在哈希列表中有一个长度值在范围内,则我为列30
创建值,40
否则没有md5
,我将md5
列留空,对于sha256
但范围相同,因为在您的数据60
中,由于某种原因有时是or和or长度。如果您想严格地将我的代码中的范围替换为and 。如果列表中有多个值,也会采用最左边的值,对于。其余需要的列 (和) 只是被复制过来。70
md5
sha256
31
32
63
64
32
64
(32, 33)
(64, 65)
md5
md5
sha256
process_name
process_effective_reputation
对于创建 CSV,我使用默认设置(请参阅csv.DictWriter(...)
在我的代码中创建对象),这意味着 CSV 具有,
作为分隔符,要使用您在问题中提到的例如选项卡,只需添加额外的参数delimiter = '\t'
as csv.DictWriter(..., delimiter = '\t' ,...)
。如果需要,您可能想要添加其他 CSV 写入参数,请在此处阅读它们。
def create_csv(json_data):
import csv, io
fbuf = io.StringIO()
writer = csv.DictWriter(fbuf, fieldnames = [
'md5', 'sha256', 'process_name', 'process_effective_reputation'])
writer.writeheader()
for device in json_data['results']:
writer.writerow({
**{h : ([e for e in device['process_hash'] if l0 <= len(e) < l1] + [''])[0]
for h, l0, l1 in (('md5', 30, 40), ('sha256', 60, 70))},
**{e : device[e] for e in ('process_name', 'process_effective_reputation')},
})
print(fbuf.getvalue())
with open('output.csv', 'w', encoding = 'utf-8') as f:
f.write(fbuf.getvalue())
json_data = {
"results": [
{
"device_name": "faaadc2",
"device_timestamp": "2020-10-27T00:50:46.176Z",
"event_id": "9b1bvfaa11eb81b",
"process_effective_reputation": "LIST5",
"process_hash": [
"bfc7dcf5935f3avda9df8e9b6425c37a",
"ca9f3a2450asd518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19",
],
"process_name": "c:\\program files "
"(x86)\\to122soft\\thcaadf3\\tohossce.exe",
"process_username": ["JOHN\\user1"],
},
{
"device_name": "fk6saadc2",
"device_timestamp": "2020-10-27T00:50:46.176Z",
"event_id": "9b151f6e17ee11eb81b",
"process_effective_reputation": "LIST1",
"process_hash": [
"bfc7dcf5935f3a9df8e9baaa425c37a",
"ca9f3aaa506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19",
],
"process_name": "c:\\program files " "(x86)\\oaaft\\tf3\\toaaotsice.exe",
"process_username": ["JOHN\\user2"],
},
{
"device_name": "sdddsdc2",
"device_timestamp": "2020-10-27T00:50:46.176Z",
"event_id": "9b151f698e11eb81b",
"process_effective_reputation": "LIST",
"process_hash": [
"9df8ebfc7dcf5935830f3a9b6asdcd7a",
"ca9f3a24506cc518fdfrcv39a33c100b2d557f96e040f7124641ad1734e2f19",
],
"process_name": "c:\\program files " "(x86)\\toht\\thaa3\\toasce.exe",
"process_username": ["JOHN\\user3"],
},
]
}
create_csv(json_data)
输出:
md5,sha256,process_name,process_effective_reputation
bfc7dcf5935f3avda9df8e9b6425c37a,ca9f3a2450asd518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19,c:\program files (x86)\to122soft\thcaadf3\tohossce.exe,LIST5
bfc7dcf5935f3a9df8e9baaa425c37a,ca9f3aaa506cc518fc939a33c100b2d557f96e040f712f6dd4641ad1734e2f19,c:\program files (x86)\oaaft\tf3\toaaotsice.exe,LIST1
9df8ebfc7dcf5935830f3a9b6asdcd7a,ca9f3a24506cc518fdfrcv39a33c100b2d557f96e040f7124641ad1734e2f19,c:\program files (x86)\toht\thaa3\toasce.exe,LIST
推荐阅读
- r - 如何根据 R 中的相邻多边形填充缺失值?
- python-3.x - Python SQLite3 按特定字段列出表中的数据
- go - 将未编组的 yaml interface{} 转换为实际结构
- android - 无法通过android studio中的url在webview中打开pdf
- python - Python融化数据框
- javascript - Expo (web) 找到应用程序数据文件,但 Expo (android) 没有
- java - 了解 Hazelcast 中的这些指标名称的含义
- node.js - 在 Cognito 中链接第三方应用程序帐户
- android - Branch.io 是否在 expo 托管工作流中工作?
- html - 验证到