python - Splitting the fields of a .log file for use in a DataFrame
Question
Hello, I have a log in this format:
-------------------------------------------------------
==00043== Found File /home/xxx Failed with Error FAILURE_LOG_WINEXE_IF_3276147548] Error while parsing the PE
==00044== Found File /home/glasswall Passed
==00045== Found File /home/xxx Failed with Error CLI] File type could not be detected by
==00046== Found File /home/xxx Failed with Error CLI] File type could not be detected by
----------------------------------------------------------------
I want to split each line into 3 fields, but I don't understand how to do it.
I would like something like this, to use in a pandas DataFrame:
File number   Status   Description
00043         Failed   Error while parsing the PE
00044         Passed
00045         Failed   Failed with Error CLI] File type could not be detected by
00046         Failed   Failed with Error CLI] File type could not be detected by
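Could you please help me?
Answer
You can use a regular expression to pull the file number and the status text out of each line: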
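import re
import pandas as pd

# Group 1: the file number; group 2: "Passed", or "Failed" plus the rest of the line
pattern = re.compile(r"==(\d+)==.*(Passed|Failed.*)", re.IGNORECASE)

pd_list = []
with open("log.txt") as f:
    for line in f:
        match = pattern.search(line.strip())
        if match is None:
            continue  # skip separator lines and anything that is not a log entry
        number, text = match.groups()
        status = text.split()
        # Keep the full "Failed ..." text as the description; leave it empty for "Passed"
        description = text if len(status) > 1 else ""
        pd_list.append([number, status[0], description])

x = pd.DataFrame(pd_list, columns=["File number", "Status", "Description"])
print(x.to_string())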
Output (columns aligned here for readability):
   File number  Status  Description
0  00043        Failed  Failed with Error FAILURE_LOG_WINEXE_IF_3276147548] Error while parsing the PE
1  00044        Passed
2  00045        Failed  Failed with Error CLI] File type could not be detected by
3  00046        Failed  Failed with Error CLI] File type could not be detected by
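If you would rather have the status and the trailing text captured as separate groups, a variation along these lines may be easier to adapt. This is only a sketch using the standard library: the names LINE_RE and parse_log_lines are made up for illustration, the pattern is case-sensitive, and it assumes the first standalone "Passed" or "Failed" on a line is the status (a path containing one of those words would confuse it). Unlike the answer above, the description here does not repeat the leading "Failed".

```python
import re

# Hypothetical pattern (an assumption, not part of the original answer):
# group 1 = file number, group 2 = status, group 3 = whatever follows the status.
LINE_RE = re.compile(r"==(\d+)==.*?\b(Passed|Failed)\b\s*(.*)")


def parse_log_lines(lines):
    """Return (file_number, status, description) tuples for lines that match."""
    rows = []
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match is None:
            continue  # separator lines such as "-----" are ignored
        rows.append(match.groups())
    return rows


# Sample lines copied from the question
sample = [
    "-------------------------------------------------------",
    "==00043== Found File /home/xxx Failed with Error FAILURE_LOG_WINEXE_IF_3276147548] Error while parsing the PE",
    "==00044== Found File /home/glasswall Passed",
]
rows = parse_log_lines(sample)
for row in rows:
    print(row)
```

The resulting list of tuples can be passed straight to pd.DataFrame(rows, columns=["File number", "Status", "Description"]).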