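python - How do I detect whether strings from a CSV file exist in a log file?
Problem description
Task:
My task is to match the strings in the first column of a CSV file against a log file: if a string is present in the log, put the matched string in the third column; otherwise put "undetected".
Contents of my log file: trendx.log
Contents of my CSV file: sha1_vsdt.csv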
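Stated plainly, the task is a membership test: for each value in the first CSV column, check whether it occurs anywhere in the log text. Below is a minimal standard-library sketch. It assumes the CSV has a header row (the code further down uses the columns SHA-1 and VSDT, so the appended column becomes the third one) and that a plain substring check is good enough. The function names and the "Detected" header are hypothetical.

```python
import csv


def detect_strings(csv_lines, log_text, header_name="Detected"):
    """Append a column holding the first-column value if it occurs in log_text, else 'undetected'."""
    reader = csv.reader(csv_lines)
    header = next(reader, None)
    if header is None:
        return []  # empty CSV input
    rows = [header + [header_name]]
    for row in reader:
        if not row:
            continue  # skip blank lines
        needle = row[0].strip()
        found = bool(needle) and needle in log_text  # substring test on the whole log
        rows.append(row + [needle if found else "undetected"])
    return rows


def detect_in_files(log_path="trendx.log", csv_path="sha1_vsdt.csv"):
    """Hypothetical file-based wrapper: read both files and return the annotated rows."""
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        log_text = log_file.read()
    with open(csv_path, newline="", encoding="utf-8") as csv_file:
        return detect_strings(csv_file, log_text)
```

Because `needle in log_text` is a substring test, it also finds a hash embedded inside a longer field. For very large logs, building a set of the log's tokens once and testing membership against it would scale better, at the cost of missing hashes that are glued to punctuation.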
Expected output:
Code:
So far I have tried this with pandas DataFrames and NumPy, simply following someone's suggestion:
import numpy as np
import pandas as pd
import csv
#Log data into dataframe using genfromtxt
logdata = np.genfromtxt("trendx.log", delimiter=" ",invalid_raise = False,dtype=str, comments=None,usecols=np.arange(0,24))
logframe = pd.DataFrame(logdata)
#Dataframe trimmed to use only SHA1, PRG and IP
df2=(logframe[[10,14,15]]).rename(columns={10:'SHA1', 14: 'PRG',15:'IP'})
#sha1_vsdt data into dataframe using read_csv
df1=pd.read_csv("sha1_vsdt.csv",delimiter=r"|",error_bad_lines=False,engine = 'python',quoting=3)
#Using merge to compare the two CSV
df = pd.merge(df1, df2, left_on='SHA-1', right_on='SHA1', how='left').replace(np.nan, 'undetected', regex=True)
print df[['SHA-1','VSDT','PRG','IP']]
Then I get this error:
Warning (from warnings module):
File "C:\Users\Administrator\Desktop\OJT\match.py", line 6
logdata = np.genfromtxt("trendx.log", delimiter=" ",invalid_raise = False,dtype=str, comments=None,usecols=np.arange(0,24))
ConversionWarning: Some errors were detected !
Line #1 - #113 (got 1 columns instead of 24)
Traceback (most recent call last):
File "C:\Users\Administrator\Desktop\OJT\match.py", line 9, in <module>
df2=(logframe[[10,14,15]]).rename(columns={10:'SHA1', 14: 'PRG',15:'IP'})
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2682, in __getitem__
return self._getitem_array(key)
File "C:\Python27\lib\site-packages\pandas\core\frame.py", line 2726, in _getitem_array
indexer = self.loc._convert_to_indexer(key, axis=1)
File "C:\Python27\lib\site-packages\pandas\core\indexing.py", line 1327, in _convert_to_indexer
.format(mask=objarr[mask]))
KeyError: '[10 14 15] not in index'
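The ConversionWarning above can be reproduced on a tiny in-memory sample. The actual trendx.log contents are not shown, so this sketch assumes, hypothetically, that the fields are tab-separated: with `delimiter=" "` each such line stays a single field, while the default `delimiter=None` splits on any run of whitespace.

```python
import io

import numpy as np

# Hypothetical two-line, tab-separated log sample (the real trendx.log is not shown)
SAMPLE = "f0\tf1\tf2\ng0\tg1\tg2\n"


def parse(delimiter):
    """Parse SAMPLE with genfromtxt using the given delimiter."""
    return np.genfromtxt(io.StringIO(SAMPLE), dtype=str, comments=None, delimiter=delimiter)


single_space = parse(" ")  # no spaces in the lines: each line becomes one field
whitespace = parse(None)   # default: split on any run of whitespace, tabs included
print(single_space.shape, whitespace.shape)
```

With `usecols=np.arange(0,24)` and `invalid_raise=False`, lines that come out as one field are reported in the warning and skipped, which leaves a DataFrame without columns 10, 14 and 15.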
Solution
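This code should work. You don't need to pass a delimiter to np.genfromtxt: its default (delimiter=None) splits on any run of whitespace, which is probably what you want. With delimiter=" " every line was read as a single field ("got 1 columns instead of 24"), presumably because the fields are not separated by single spaces; with invalid_raise=False those lines are skipped, so the DataFrame ends up without columns 10, 14 and 15, hence the KeyError.
Also, the delimiter for pd.read_csv should be "," because it is a CSV file.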
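A similar check shows why the delimiter passed to pd.read_csv matters. Assuming, as the answer does, that sha1_vsdt.csv is comma-separated (the sample rows below are made up), reading it with `delimiter="|"` produces one combined column named 'SHA-1,VSDT', so there is no 'SHA-1' column for the merge to use.

```python
import io

import pandas as pd

# Made-up sample in the assumed comma-separated layout of sha1_vsdt.csv
SAMPLE = "SHA-1,VSDT\naaaa,detA\nbbbb,detB\n"

with_pipe = pd.read_csv(io.StringIO(SAMPLE), delimiter="|")
with_comma = pd.read_csv(io.StringIO(SAMPLE), delimiter=",")

print(list(with_pipe.columns))   # a single combined column
print(list(with_comma.columns))  # separate 'SHA-1' and 'VSDT' columns
```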
import csv
import numpy as np
import pandas as pd
# Log data into a DataFrame using genfromtxt.
# delimiter is omitted: the default (None) splits on any run of whitespace.
logdata = np.genfromtxt("trendx.log", invalid_raise=False, dtype=str, comments=None, usecols=np.arange(0, 24))
logframe = pd.DataFrame(logdata)
# Keep only the SHA1, PRG and IP fields (positional columns 10, 14 and 15)
df2 = logframe[[10, 14, 15]].rename(columns={10: 'SHA1', 14: 'PRG', 15: 'IP'})
# sha1_vsdt data into a DataFrame using read_csv; the file is comma-separated.
# on_bad_lines='skip' needs pandas >= 1.3; older versions use error_bad_lines=False instead.
df1 = pd.read_csv("sha1_vsdt.csv", delimiter=",", on_bad_lines='skip', engine='python', quoting=csv.QUOTE_NONE)
# Left merge keeps every CSV row; rows with no match in the log get NaN, filled with 'undetected'
df = pd.merge(df1, df2, left_on='SHA-1', right_on='SHA1', how='left').fillna('undetected')
print(df[['SHA-1', 'VSDT', 'PRG', 'IP']])
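To see what the left merge and the 'undetected' fill do, here is the same pattern on two tiny in-memory frames. The hashes, program name and IP are placeholders, and the drop_duplicates step is an addition that is not in the answer: it keeps the merge from emitting one output row per log occurrence when a hash appears in the log more than once.

```python
import pandas as pd

# Placeholder data shaped like df1 (CSV) and df2 (log fields)
csv_df = pd.DataFrame({"SHA-1": ["aaaa", "bbbb"], "VSDT": ["detA", "detB"]})
log_df = pd.DataFrame({
    "SHA1": ["bbbb", "bbbb"],
    "PRG": ["prog.exe", "prog.exe"],
    "IP": ["10.0.0.1", "10.0.0.1"],
})

# Keep one log row per hash so each CSV row appears once in the result
log_df = log_df.drop_duplicates(subset="SHA1")

# how="left" keeps every CSV row; unmatched rows get NaN in the log columns
merged = csv_df.merge(log_df, left_on="SHA-1", right_on="SHA1", how="left")
merged = merged.fillna("undetected")

# The SHA1 column now holds the matched hash or "undetected"
print(merged[["SHA-1", "VSDT", "SHA1", "PRG", "IP"]])
```

Since the SHA1 column ends up holding either the matched hash or "undetected", selecting ['SHA-1', 'VSDT', 'SHA1'] gives the three-column layout described in the task.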
Output produced by this code:
SHA-1 ... IP
0 0191a23ee122bdb0c69008971e365ec530bf03f5 ... undetected
1 02b809d4edee752d9286677ea30e8a76114aa324 ... undetected
2 0349e0101d8458b6d05860fbee2b4a6d7fa2038d ... undetected
3 035a7afca8b72cf1c05f6062814836ee31091559 ... undetected
4 042065bec5a655f3daec1442addf5acb8f1aa824 ... undetected
5 04939e040d9e85f84d2e2eb28343d94a50ed46ac ... undetected
6 04a1876724b53a016cd9e9c93735985938c91fa4 ... undetected
7 06109df23f7d5deadf0b2c158af1f71c2997d245 ... undetected
8 06194c240c12c51b55d2961ae287fd9628e05751 ... undetected
9 0665de1ad83715cc6e68d00ed700c469944a5925 ... undetected
10 067b448f4c9782489e5ff60c31c62b7059e500b2 ... undetected
11 0688e6966b0e4a1f58d2f3de48f960fce5b42292 ... undetected
12 0689f6f99d10dd8bf396f2d2c73ce9dcb6dcad23 ... undetected
13 06a60c6018a42b1db22e3bf8620861711401c4bb ... undetected
14 0723a895a5f8b2d5d25b4303e9f04d16551791b6 ... undetected
15 07344621cf4480c430f8931af2b2b056775af7e3 ... undetected
16 07831df482f1a34310fc4f5a092c333eeaff4380 ... undetected
17 08386105057cd5867480095696a5ca6701fdb8ad ... undetected
18 0ad5f62b4ec10397b7d13433a8dc794dc6d4f273 ... undetected
19 0bed7d032d5c51f606befd2f10b94e5c75a6a1e3 ... Administrator
20 0c3f8d2cce9e7a6e5604b8d0c9fbe1ff6fd5cebb ... undetected
21 0c793b4f4e0be7f24f93786d7d4a719a7a002a0d ... undetected
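In this output the single matched row shows "Administrator" in the IP column, which looks like a user name rather than an address. That hints that the fixed positions 10, 14 and 15 may not line up in every log line (for example, if some field contains spaces). If only the detected/undetected status is needed, one alternative, not part of the answer, is to pull 40-character hex strings out of each log line with a regular expression, which does not depend on column positions. The pattern and function names below are illustrative assumptions.

```python
import re

import pandas as pd

# Assumption: SHA-1 values appear in the log as standalone 40-character hex strings
SHA1_RE = re.compile(r"\b[0-9a-fA-F]{40}\b")


def sha1s_in_log(lines):
    """Collect every SHA-1-looking token from an iterable of log lines (lower-cased)."""
    found = set()
    for line in lines:
        found.update(match.lower() for match in SHA1_RE.findall(line))
    return found


def mark_detected(csv_df, found, column="SHA-1"):
    """Return the (lower-cased) hash where it occurs in `found`, otherwise 'undetected'."""
    hashes = csv_df[column].str.strip().str.lower()
    return hashes.where(hashes.isin(found), "undetected")


def detect_from_files(log_path="trendx.log", csv_path="sha1_vsdt.csv"):
    """Hypothetical wrapper: read both files and add a 'Detected' column to the CSV data."""
    with open(log_path, encoding="utf-8", errors="replace") as log_file:
        found = sha1s_in_log(log_file)
    csv_df = pd.read_csv(csv_path)  # assumes a comma-separated file with a 'SHA-1' header
    csv_df["Detected"] = mark_detected(csv_df, found)
    return csv_df
```

The word boundaries stop the pattern from matching 40-character slices of longer hex runs such as SHA-256 digests.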