Split .log file fields for use in a DataFrame

Problem description

Hello, I have a log in this format:

-------------------------------------------------------
==00043== Found File /home/xxx  Failed with Error FAILURE_LOG_WINEXE_IF_3276147548] Error while parsing the PE

==00044== Found File /home/glasswall  Passed

==00045== Found File /home/xxx  Failed with Error CLI] File type could not be detected by

==00046== Found File /home/xxx  Failed with Error CLI] File type could not be detected by

----------------------------------------------------------------

I want to split it into 3 fields, but I don't understand how to do it.

I would like something like this, to use in a pandas DataFrame:

File number              Status     Description 
00043                       Failed      Error while parsing the PE 
00044                       Passed     
00045                       Failed      Failed with Error CLI] File type could not be detected by 
00046                       Failed      Failed with Error CLI] File type could not be detected by

Could you please help me?

Tags: python, regex, pandas, file, split

Solution

You can use:

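import re

import pandas as pd

# Group 1: the file number between the "==" markers.
# Group 2: "Passed", or "Failed" plus the rest of the line.
pattern = re.compile(r"==(\d+)==.*(Passed|Failed.*)", re.IGNORECASE)

pd_list = []
with open("log.txt") as f:
    for line in f:
        match = pattern.search(line.strip())
        if match is None:
            # Skip blank lines and the dashed separator lines.
            continue
        number, result = match.groups()
        status = result.split()[0]
        # Failed entries keep the full failure text; Passed entries have no description.
        description = result if len(result.split()) > 1 else ""
        pd_list.append([number, status, description])

x = pd.DataFrame(pd_list, columns=["File number", "Status", "Description"])
print(x.to_string())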

  File number  Status                    Description
0       00043  Failed    Failed with Error FAILURE_LOG_WINEXE_IF_3276147548] Error while parsing the PE
1       00044  Passed                                                                         
2       00045  Failed    Failed with Error CLI] File type could not be detected by
3       00046  Failed    Failed with Error CLI] File type could not be detected by
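
The output above keeps the whole "Failed with Error ..." text in the Description column, while the first row of the table in the question shows only the message after the closing bracket. If that shorter form is what is wanted, a possible variant is to let pandas do the extraction with `Series.str.extract` and named groups. This is only a sketch under some assumptions that are not in the original answer: every entry contains the literal words "Found File", the file path has no spaces, and the description is whatever follows the first `]`. The names `SAMPLE_LOG`, `LINE_PATTERN` and `parse_log` are made up for illustration.

```python
import pandas as pd

# Sample lines copied from the question, including a separator and blank lines.
SAMPLE_LOG = """\
-------------------------------------------------------
==00043== Found File /home/xxx  Failed with Error FAILURE_LOG_WINEXE_IF_3276147548] Error while parsing the PE

==00044== Found File /home/glasswall  Passed

==00045== Found File /home/xxx  Failed with Error CLI] File type could not be detected by
"""

# Named groups become the column names of the extracted frame.
# Assumption: the path contains no whitespace, so \S+ covers it.
LINE_PATTERN = (
    r"==(?P<number>\d+)==\s+Found File\s+\S+\s+"
    r"(?P<status>Passed|Failed)"
    r"(?:\s+with Error\s+[^\]]*\]\s*(?P<description>.*))?"
)


def parse_log(text: str) -> pd.DataFrame:
    """Turn raw log text into a frame with one row per log entry."""
    lines = pd.Series(text.splitlines())
    extracted = lines.str.extract(LINE_PATTERN)
    # Lines that are not log entries (blank or dashed) come back as all-NaN rows.
    extracted = extracted.dropna(subset=["number"]).reset_index(drop=True)
    # Passed entries have no description; show an empty string instead of NaN.
    extracted["description"] = extracted["description"].fillna("")
    return extracted.rename(
        columns={
            "number": "File number",
            "status": "Status",
            "description": "Description",
        }
    )


if __name__ == "__main__":
    print(parse_log(SAMPLE_LOG).to_string())
```

To read from a file instead, something like `parse_log(open("log.txt").read())` should work. A Failed line without a `]` would end up with an empty Description under this pattern, so the regex may need adjusting for other log variants.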
